Record Matching

Record matching compares records from one or more datasets to determine which ones refer to the same real-world entity. It's the core engine behind deduplication, data merging, and entity resolution. Matching evaluates multiple fields simultaneously (name, email, phone, company, address) and produces a confidence score for each candidate pair. High-confidence matches merge automatically; lower-confidence matches go to a human for review.

Why It Matters

Matching is deceptively hard. Exact matching catches the easy duplicates but misses 50% or more of the real matches, namely those with variations in spelling, formatting, or completeness. Overly aggressive matching has the opposite problem: it merges records that should stay separate (two different John Smiths at different companies). The balance between catching real duplicates and avoiding false merges determines whether your data gets better or worse after the process.
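To make the gap between exact and fuzzy comparison concrete, here is a minimal sketch using Python's standard-library difflib. The helper names (normalize, exact_match, name_similarity) and the choice of SequenceMatcher as the similarity measure are assumptions for illustration, not a method this page prescribes; real matchers often use other measures and tune them per field.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences vanish."""
    return " ".join(value.lower().split())


def exact_match(a: str, b: str) -> bool:
    """True only when the normalized strings are identical."""
    return normalize(a) == normalize(b)


def name_similarity(a: str, b: str) -> float:
    """Similarity in [0.0, 1.0] based on difflib's matching-block ratio."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Exact matching rejects the pair outright; fuzzy matching gives partial credit.
print(exact_match("Sarah Johnson", "S. Johnson"))      # False
print(name_similarity("Sarah Johnson", "S. Johnson"))  # a score strictly between 0 and 1
```

A fuzzy score on one field is rarely enough to decide on its own, which is why it is usually combined with evidence from other fields before anything is merged.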

Matching Strategies

Example

Two databases are being merged. Database A has "Sarah Johnson" with an @acme.com email address, and Database B has "S. Johnson" with a different @acme.com address. Exact email matching finds no match. But fuzzy name matching (Sarah Johnson vs. S. Johnson = 82% similar) plus domain matching (both @acme.com) produces an 89% overall match score. The auto-merge threshold is 90%, so the pair goes to human review. The reviewer confirms it's the same person.
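The routing decision in this example can be sketched as follows. This is an illustration under stated assumptions: the weighted-average combining rule, the field weights, and the function names are hypothetical, since the page does not say how the 82% name score and the domain match combine into 89%. Only the 82% name score, the 89% overall score, and the 90% auto-merge threshold come from the example.

```python
AUTO_MERGE_THRESHOLD = 0.90  # taken from the example


def overall_score(field_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-field similarity scores (an assumed combining rule)."""
    total_weight = sum(weights[field] for field in field_scores)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    weighted_sum = sum(score * weights[field] for field, score in field_scores.items())
    return weighted_sum / total_weight


def route(score: float, threshold: float = AUTO_MERGE_THRESHOLD) -> str:
    """Merge automatically at or above the threshold; otherwise queue for a human."""
    return "auto-merge" if score >= threshold else "human review"


# Hypothetical weights; a real system would tune these against labeled pairs.
weights = {"name": 0.6, "email_domain": 0.4}
scores = {"name": 0.82, "email_domain": 1.0}

print(round(overall_score(scores, weights), 3))
print(route(0.89))  # human review
```

Keeping the threshold as a single named constant makes the trade-off explicit: raising it sends more pairs to reviewers, while lowering it risks more false merges.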
