Data profiling is the process of examining your database to understand its quality, structure, and content. It answers fundamental questions: What percentage of records have email addresses? How many duplicates exist? Are phone numbers formatted consistently? Which fields have the most gaps? Profiling gives you the diagnostic picture before you start cleaning, like getting an X-ray before surgery.
Why It Matters
You can't fix what you haven't measured. Most teams assume their data is "pretty good" until profiling reveals that 40% of records are missing phone numbers, 18% of emails bounce, and 12% of accounts are duplicates. Profiling quantifies the problem and tells you where to focus cleaning efforts first. It also establishes a baseline so you can measure improvement after cleaning.
What Profiling Measures
- Completeness: What percentage of records have a value in each field, and which fields are most often empty?
- Accuracy: Do email addresses actually deliver? Do phone numbers connect? Are job titles current?
- Consistency: Are formats standardized? Does the same city appear as 'New York', 'NY', 'new york', and 'NYC'?
- Duplication: What's the duplicate rate? How many records share the same email, phone, or name+company combination?
- Freshness: When were records last updated? What percentage haven't been touched in over a year?
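Several of these measures can be computed directly from an export of the records. The following is a minimal sketch in plain Python, not a definitive implementation: the sample records, the field names (`email`, `phone`, `city`, `updated`), and the helper functions are all hypothetical, and the "today" date is fixed so the result is reproducible. Accuracy is left out because checking whether an email delivers or a phone connects requires an external verification step.

```python
from collections import Counter
from datetime import date

# Hypothetical sample records; the field names are assumptions for illustration.
records = [
    {"email": "ana@example.com", "phone": "555-0100", "city": "New York", "updated": date(2024, 5, 1)},
    {"email": "ANA@example.com", "phone": "", "city": "NYC", "updated": date(2021, 3, 9)},
    {"email": None, "phone": "555-0101", "city": "new york", "updated": date(2024, 1, 15)},
    {"email": "bo@example.com", "phone": None, "city": "NY", "updated": date(2020, 7, 4)},
]

def completeness(rows, field):
    """Share of rows with a non-empty value in `field`."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

def duplicate_rate(rows, field):
    """Share of rows whose normalized `field` value repeats an earlier row."""
    values = [r[field].strip().lower() for r in rows if r.get(field)]
    extra = sum(n - 1 for n in Counter(values).values())
    return extra / len(rows)

def distinct_spellings(rows, field):
    """Sorted list of the distinct raw values seen in `field` (a consistency check)."""
    return sorted({r[field] for r in rows if r.get(field)})

def stale_share(rows, field, today, max_age_days=365):
    """Share of rows whose date in `field` is more than `max_age_days` old."""
    stale = sum(1 for r in rows if (today - r[field]).days > max_age_days)
    return stale / len(rows)

profile = {
    "email_completeness": completeness(records, "email"),
    "phone_completeness": completeness(records, "phone"),
    "email_duplicate_rate": duplicate_rate(records, "email"),
    "city_spellings": distinct_spellings(records, "city"),
    "stale_share": stale_share(records, "updated", today=date(2025, 1, 1)),
}

for name, value in profile.items():
    print(f"{name}: {value}")
```

Emails are lowercased before the duplicate check so that `ana@example.com` and `ANA@example.com` count as the same address. Matching on name+company combinations would need fuzzier comparison than this sketch attempts.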
Example
A company profiles its 50,000-record CRM before a migration. Results: 72% of records have emails (but 15% of those bounce), 45% have phone numbers, 88% have company names, 12% are duplicates, and 8,000 records (16%) haven't been updated in two years. The team now knows exactly what needs fixing and can estimate the effort before starting.
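To estimate effort, it helps to turn those percentages into record counts. The short sketch below does that arithmetic for the example figures. The variable names are illustrative, and integer math is used so the counts come out exact.

```python
total = 50_000  # records in the example CRM

with_email = total * 72 // 100          # records that have an email
bouncing = with_email * 15 // 100       # 15% of those emails bounce
deliverable = with_email - bouncing     # emails expected to deliver
missing_email = total - with_email      # records with no email at all
with_phone = total * 45 // 100
with_company = total * 88 // 100
duplicates = total * 12 // 100
stale = 8_000                           # not updated in two years
stale_pct = stale * 100 // total

print(f"emails present:   {with_email:,}")
print(f"emails bouncing:  {bouncing:,}")
print(f"emails working:   {deliverable:,}")
print(f"emails missing:   {missing_email:,}")
print(f"phones present:   {with_phone:,}")
print(f"company present:  {with_company:,}")
print(f"duplicates:       {duplicates:,}")
print(f"stale records:    {stale:,} ({stale_pct}%)")
```

Under these figures, only 30,600 of the 50,000 records have an email that is expected to deliver, which is about 61% of the database rather than the 72% the completeness number alone suggests.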