Data deduplication with AI: Entity Resolution – Automated data cleansing for greater efficiency
Entity Resolution is an important procedure for avoiding duplicate data records. We rely on AI-based embedding models that convert the data records into vectors.
Entity Resolution: The problem
Duplicates arise when the same person or object is listed several times in a data set. A common example is a list of employees in which the same person appears under different variants.
Let’s take the head of department “Ms. Erika Mustermann” as an example. She may be listed under different entries such as “Erika”, “Ms. Mustermann”, “e.mustermann@firma.de” or “Head of Sample Department”. Although all these entries refer to the same person, they appear individually.
To remove these duplicates, each entry must be clearly assigned to the entity it refers to. The result is a list in which all variants of an entity are merged into a single, unique data record.
Duplicates lead to higher operating costs due to redundant data processing, impair data analysis and tie up working time for manual cleansing. They can also cause confusion, for example when different departments access inconsistent data or when wrong decisions are made based on incomplete information.
Example from everyday life
Consider a customer who needs to assign products from different e-commerce websites to the correct categories, either during purchasing or for integration into a central product database. For example, a product could be listed in the category “Devices > Pneumatics > Control devices” on one website and under “Pneumatic control devices” on another. These different categorizations must be merged correctly so that the products are displayed consistently everywhere.
How is this implemented?
There are several approaches; our focus is on matching by vector similarity. In this process, the data records are first converted into vectors using an embedding model. The cosine similarity between two vectors, i.e. the cosine of the angle between them, then serves as a measure of how similar the records are (the cosine distance is one minus this value). In this way, the meaning of the data records is compared and expressed as a numerical score.
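The matching step can be illustrated with a minimal Python sketch. It is not our production implementation: `toy_embed` is a hypothetical stand-in that counts character trigrams so the example runs without an AI model, and the threshold of 0.5 as well as the function names are assumptions chosen for illustration. A real embedding model would replace `toy_embed` and would also capture semantic matches, such as a name and a job title, that trigram counts cannot.

```python
import math
from collections import Counter


def toy_embed(text: str, n: int = 3) -> Counter:
    """Hypothetical stand-in for an embedding model: character n-gram counts."""
    normalized = f" {text.lower().strip()} "
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse vectors (1.0 = same direction)."""
    dot = sum(value * b[key] for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_duplicates(records: list[str], threshold: float = 0.5) -> list[list[str]]:
    """Merge records into groups when their similarity reaches the threshold."""
    vectors = [toy_embed(record) for record in records]
    parent = list(range(len(records)))  # union-find forest over record indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare every pair once and link pairs that are similar enough.
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if cosine_similarity(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups: dict[int, list[str]] = {}
    for index, record in enumerate(records):
        groups.setdefault(find(index), []).append(record)
    return list(groups.values())


if __name__ == "__main__":
    categories = [
        "Devices > Pneumatics > Control devices",
        "Pneumatic control devices",
        "Garden furniture",
    ]
    for group in group_duplicates(categories):
        print(group)
```

The pairwise comparison grows quadratically with the number of records, which is acceptable for a sketch; larger data sets would typically use an approximate nearest-neighbour index instead. The threshold controls the trade-off between missed duplicates and false merges and has to be tuned on real data.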
Entity Resolution – Your benefits
Saving resources
The automated approach primarily saves working time.
Immediate effects
Once developed, the process can be used immediately without tying up additional resources.
Predictable error tolerance
The data deduplication process delivers a consistent, predictable error rate.
Automation
It enables the automation of processes that were previously not feasible due to high costs.
MORE ABOUT US?
Find out everything you need to know about Medienwerft – experts in customer experience & e-commerce IT for over 25 years – here:
About us