Contact Data Waterfall Strategy: Maximize Coverage with Multiple Providers
No single data provider has complete coverage. Based on Verum's testing across multiple B2B segments, individual providers typically cover 60-75% of target accounts—but the gaps differ by provider. When you combine them strategically in a waterfall, you can reach 85-95% coverage.
This is the waterfall strategy: query multiple providers in sequence until you get a match. When Provider A doesn't have the data, fall through to Provider B, then Provider C. Like water cascading down, each level catches what the one above missed.
This guide covers how to build, optimize, and operate a contact data waterfall for maximum coverage at reasonable cost.
Why Use a Waterfall?
The math is simple but compelling:
| Approach | Typical Coverage | Relative Cost |
|---|---|---|
| Single provider (premium) | 60-70% | $$ |
| 2-provider waterfall | 75-85% | $$ + $ |
| 3-provider waterfall | 85-92% | $$ + $ + $ |
| 4+ provider waterfall | 90-95% | $$$ (diminishing returns) |
The incremental cost of Provider B only applies to records where Provider A failed. If your primary provider covers 70% of your records, you're only paying Provider B for the remaining 30%. This makes waterfalls surprisingly cost-effective.
How Waterfalls Work
- Step 1: Input Record (name, company, LinkedIn URL)
- Step 2: Provider A, primary (best quality, highest cost)
- Step 3: Provider B, secondary (good coverage, moderate cost)
- Step 4: Provider C, tertiary (wide coverage, lower accuracy)
- Step 5: Validation & Output (verify email, standardize format)
Waterfall Logic
- Input: Start with what you know—name, company, LinkedIn URL, domain
- Primary query: Hit your highest-quality (usually highest-cost) provider first
- Evaluate result: Did you get a match? Is it complete? Does it meet quality thresholds?
- Fall through: If no match or incomplete data, query the next provider
- Validate: Regardless of source, validate emails and phones before use
- Tag source: Track which provider supplied each data point for quality monitoring
What Triggers a Fallthrough?
Define clear rules for when to query the next provider:
- No match: Provider returns no results for the input
- Missing critical fields: Match found but missing email or phone
- Low confidence: Provider returns data but with low confidence score
- Stale data: Data found but last verified date is too old
Decision point: Should you always query the next provider if a field is missing? Not necessarily. If you need both email and phone, fall through when either one is missing. But if phone is optional, you might accept an email-only result from Provider A rather than paying for Provider B.
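The fallthrough rules above can be expressed as a single predicate. This is a minimal sketch, assuming a hypothetical result shape (`email`, `phone`, `confidence`, `lastVerified`) and illustrative thresholds; real provider responses will use different field names and scales.

```javascript
// Hypothetical thresholds; tune these against your own data.
const DEFAULT_RULES = {
  requiredFields: ['email'], // fields that must be present to accept a result
  minConfidence: 0.7,        // same cutoff as the isComplete example in this guide
  maxAgeDays: 180,           // treat data verified longer ago than this as stale
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns a reason string if the waterfall should query the next provider,
// or null if the result is acceptable as-is.
function fallthroughReason(result, rules = DEFAULT_RULES, now = new Date()) {
  if (!result) return 'no_match';

  const missing = rules.requiredFields.filter((field) => !result[field]);
  if (missing.length > 0) return `missing_fields:${missing.join(',')}`;

  if (typeof result.confidence === 'number' && result.confidence < rules.minConfidence) {
    return 'low_confidence';
  }

  if (result.lastVerified) {
    const ageDays = (now - new Date(result.lastVerified)) / MS_PER_DAY;
    if (ageDays > rules.maxAgeDays) return 'stale_data';
  }

  return null;
}
```

Setting `requiredFields` to `['email', 'phone']` makes phone mandatory; leaving it as `['email']` accepts email-only results. Returning a reason rather than a boolean also gives you a value to log for fallthrough-rate monitoring.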
Provider Selection
Choosing Your Primary Provider
Your primary provider should optimize for:
- Data quality: Highest accuracy, most current data
- Coverage for your ICP: Best match rates for your target segments
- Integration ease: Native CRM connectors, API quality
Cost matters less for the primary—you're paying for most records anyway.
Choosing Secondary/Tertiary Providers
Secondary providers should optimize for:
- Complementary coverage: Strong where your primary is weak
- Cost efficiency: Lower per-record cost since you're paying for fewer records
- Specific strengths: Maybe better phone data, or better EMEA coverage
Provider Comparison by Segment
| Segment | Strong Providers | Notes |
|---|---|---|
| Enterprise (Fortune 1000) | ZoomInfo, Dun & Bradstreet | Most providers have good coverage here |
| Mid-Market | ZoomInfo, Apollo, Clearbit | Coverage varies more; test specific segments |
| SMB | Apollo, Lusha, RocketReach | Premium providers often weak here |
| Tech Companies | Clearbit, BuiltWith, HG Insights | Tech stack data valuable for targeting |
| EMEA | Cognism, Lusha | GDPR-compliant providers essential |
| APAC | ZoomInfo, LeadIQ | Coverage generally weaker; verify carefully |
Provider Comparison by Data Type
| Data Type | Strong Providers | Notes |
|---|---|---|
| Work Email | ZoomInfo, Clearbit, Apollo | Highest-value data point; validate regardless |
| Personal Email | ContactOut, RocketReach | Use carefully; higher privacy concerns |
| Direct Dial Phone | ZoomInfo, Cognism | Most difficult data to source accurately |
| Mobile Phone | Lusha, Cognism | Valuable but verify connectivity |
| Firmographics | Clearbit, ZoomInfo, D&B | Relatively commoditized; most providers good |
| Tech Stack | BuiltWith, HG Insights, Clearbit | Specialized providers more accurate |
Implementation Architecture
Option 1: Sequential API Calls
Simplest approach—query providers one at a time:
async function enrichContact(input) {
  // Try Provider A first
  let result = await providerA.enrich(input);
  if (isComplete(result)) {
    return { ...result, source: 'provider_a' };
  }

  // Fall through to Provider B
  result = await providerB.enrich(input);
  if (isComplete(result)) {
    return { ...result, source: 'provider_b' };
  }

  // Fall through to Provider C
  result = await providerC.enrich(input);
  return {
    ...result,
    source: result ? 'provider_c' : 'no_match'
  };
}

function isComplete(result) {
  return result &&
    result.email &&
    result.confidence >= 0.7;
}
Pros: Simple to implement and debug
Cons: Slower (sequential calls); provider order is fixed, so cost isn't optimized per segment
Option 2: Parallel with Priority
Query all providers simultaneously, use results by priority:
async function enrichContactParallel(input) {
  // Query all providers in parallel; a failed call resolves to null
  const [resultA, resultB, resultC] = await Promise.all([
    providerA.enrich(input).catch(() => null),
    providerB.enrich(input).catch(() => null),
    providerC.enrich(input).catch(() => null)
  ]);

  // Use results by priority
  if (isComplete(resultA)) {
    return { ...resultA, source: 'provider_a' };
  }
  if (isComplete(resultB)) {
    return { ...resultB, source: 'provider_b' };
  }
  if (isComplete(resultC)) {
    return { ...resultC, source: 'provider_c' };
  }

  // Return best partial result (mergePartialResults is a helper you supply)
  return mergePartialResults(resultA, resultB, resultC);
}
Pros: Fastest (parallel execution), gets all available data
Cons: Most expensive (pays all providers for every record)
Option 3: Smart Waterfall with Routing
Route to specific providers based on input characteristics:
async function enrichContactSmart(input) {
  // Determine which providers to try based on input
  const providers = selectProviders(input);

  for (const provider of providers) {
    const result = await provider.enrich(input);
    if (isComplete(result)) {
      return { ...result, source: provider.name };
    }
  }

  return { source: 'no_match' };
}

function selectProviders(input) {
  const providers = [];

  // Route EMEA contacts to Cognism first
  if (isEMEA(input.company)) {
    providers.push(cognism, zoomInfo, apollo);
  }
  // Route SMB to Apollo first (better coverage, lower cost)
  else if (isSMB(input.company)) {
    providers.push(apollo, lusha, zoomInfo);
  }
  // Enterprise goes to ZoomInfo first
  else {
    providers.push(zoomInfo, clearbit, apollo);
  }

  return providers;
}
Pros: Optimizes cost and coverage based on segment
Cons: Requires segment detection, more complex to maintain
Handling Data Conflicts
When multiple providers return data, you'll encounter conflicts. Provider A says the person is VP of Sales; Provider B says Director of Business Development. Which do you trust?
Strategy 1: Strict Hierarchy
Primary provider always wins. Simple but may miss better data from secondary sources.
Strategy 2: Recency Wins
Use the most recently updated data, regardless of provider. Requires tracking data freshness.
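A recency rule is short to implement once every candidate value carries a verification date. This is a minimal sketch, assuming each candidate is an object with hypothetical `provider`, `value`, and `lastVerified` (ISO date string) fields; candidates without a parseable date are ignored rather than guessed at.

```javascript
// candidates: [{ provider, value, lastVerified }]
// Returns the candidate with the latest lastVerified date, or null if none qualify.
function pickMostRecent(candidates) {
  const dated = candidates.filter(
    (c) => c.value && !Number.isNaN(Date.parse(c.lastVerified))
  );
  if (dated.length === 0) return null;

  return dated.reduce((best, c) =>
    Date.parse(c.lastVerified) > Date.parse(best.lastVerified) ? c : best
  );
}
```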
Strategy 3: Confidence Scoring
Build a composite score based on:
- Provider reliability (based on your historical accuracy)
- Data freshness (when was it last verified?)
- Verification status (was it validated?)
- Source type (scraped vs. self-reported vs. verified)
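function scoreDataPoint(data, providerName) {
  // Example weights; replace with your own measured accuracy per provider
  const providerWeight = {
    zoominfo: 0.9,
    clearbit: 0.85,
    apollo: 0.75
  };
  let score = providerWeight[providerName] || 0.5;

  // Boost recent data, penalize stale data
  const daysSinceVerified = daysSince(data.lastVerified);
  if (daysSinceVerified < 30) score *= 1.2;
  else if (daysSinceVerified > 180) score *= 0.7;

  // Boost for verified data
  if (data.isVerified) score *= 1.1;

  return Math.min(score, 1.0);
}

// Whole or fractional days between a date and now
function daysSince(date) {
  return (Date.now() - new Date(date).getTime()) / (24 * 60 * 60 * 1000);
}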
Strategy 4: Field-Level Merging
Take the best data for each field from different providers:
- Email from Provider A (highest deliverability)
- Phone from Provider B (best phone coverage)
- Title from Provider A (most recent)
- Company data from Provider C (most complete)
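Field-level merging can be sketched as a lookup over a per-field preference list. This is a minimal sketch, assuming each provider result is a flat object; the `FIELD_PRIORITY` map and the `provider_a`/`provider_b`/`provider_c` names are illustrative placeholders, not recommendations.

```javascript
// Hypothetical per-field preference order; derive yours from measured accuracy.
const FIELD_PRIORITY = {
  email: ['provider_a', 'provider_b', 'provider_c'],
  phone: ['provider_b', 'provider_a', 'provider_c'],
  title: ['provider_a', 'provider_c', 'provider_b'],
};

// resultsByProvider: { provider_a: {...} | null, provider_b: {...} | null, ... }
// Returns the merged field values plus a per-field `sources` map for tracking.
function mergeFields(resultsByProvider, fieldPriority = FIELD_PRIORITY) {
  const merged = {};
  const sources = {};

  for (const [field, providerOrder] of Object.entries(fieldPriority)) {
    for (const providerName of providerOrder) {
      const value = resultsByProvider[providerName]?.[field];
      if (value !== undefined && value !== null && value !== '') {
        merged[field] = value;
        sources[field] = providerName;
        break; // first provider in the preference list with a value wins
      }
    }
  }

  return { ...merged, sources };
}
```

Keeping the source per field, rather than per record, lets you attribute a bounced email or a dead phone number to the provider that actually supplied it.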
Always validate email: Regardless of which provider supplied the email, run it through a validation service before use. A bounced email costs more than the validation fee.
Cost Optimization
Pay-Per-Match vs. Pay-Per-Query
Understand your contract structure:
| Model | How It Works | Waterfall Implication |
|---|---|---|
| Pay-per-match | Only charged when data is returned | Low risk to query; can try multiple providers |
| Pay-per-query | Charged for every API call | Expensive waterfall; optimize routing |
| Credit bucket | Monthly credit allocation | Monitor usage; may need to throttle end of month |
| Unlimited | Flat monthly fee | Query freely; maximize usage |
Optimization Strategies
- Cache results: Don't re-enrich the same contact multiple times. Cache for 30-90 days.
- Batch during off-peak: Some providers offer lower rates for batch processing
- Segment routing: Route SMB to lower-cost providers that have good SMB coverage
- Skip validation for known-good: If you have a recently validated email, don't re-validate
- Pre-filter impossible matches: Don't query for contacts at companies too small to have the role
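The caching strategy from the list above can be sketched as a thin wrapper around any enrichment function. This is a minimal sketch, assuming an in-memory `Map` and a hypothetical input shape (`linkedinUrl`, `name`, `company`); a production setup would more likely use a shared store such as a database or key-value service.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Minimal in-memory cache with a time-to-live measured in days.
class EnrichmentCache {
  constructor(ttlDays = 60) {
    this.ttlMs = ttlDays * MS_PER_DAY;
    this.entries = new Map();
  }

  // Normalize the key so differences in case or whitespace share one entry.
  static keyFor(input) {
    const raw = input.linkedinUrl || `${input.name || ''}|${input.company || ''}`;
    return raw.trim().toLowerCase();
  }

  get(input, now = Date.now()) {
    const entry = this.entries.get(EnrichmentCache.keyFor(input));
    if (!entry) return null;
    if (now - entry.storedAt > this.ttlMs) return null; // expired
    return entry.result;
  }

  set(input, result, now = Date.now()) {
    this.entries.set(EnrichmentCache.keyFor(input), { result, storedAt: now });
  }
}

// Wraps an async enrichment function so repeat lookups skip the paid API call.
function withCache(enrichFn, cache) {
  return async function cachedEnrich(input) {
    const hit = cache.get(input);
    if (hit) return { ...hit, fromCache: true };
    const result = await enrichFn(input);
    if (result) cache.set(input, result);
    return result;
  };
}
```

Only successful results are cached here, so a contact that no provider matched is retried on the next run. Whether to cache misses as well, with a shorter TTL, depends on how your providers bill for failed lookups.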
Sample Cost Calculation
For 10,000 contact enrichments with a 3-provider waterfall:
| Provider | Coverage | Records Queried | Cost/Record | Total |
|---|---|---|---|---|
| Provider A (Primary) | 70% | 10,000 | $0.25 | $2,500 |
| Provider B (Secondary) | 60% of remaining | 3,000 | $0.15 | $450 |
| Provider C (Tertiary) | 50% of remaining | 1,200 | $0.10 | $120 |
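| Total | 94% coverage | - | - | $3,070 |
Effective cost: $0.31 per record submitted for 94% coverage (7,000 + 1,800 + 600 = 9,400 matches), vs. $0.25 per record for 70% coverage with a single provider. Per enriched contact, that works out to about $0.33 for the waterfall vs. about $0.36 for the single provider. These figures assume pay-per-query pricing, where every record sent to a provider is billed.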
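The same arithmetic can be scripted so you can compare configurations before signing contracts. This is a minimal sketch: the function name, the `billing` switch, and the input shape are assumptions for illustration, and each `matchRate` is read as the share of records reaching that provider, as in the table. With the table's inputs, the matches are 7,000 + 1,800 + 600 = 9,400.

```javascript
// providers: ordered list of { name, matchRate, costPerRecord }.
// billing: 'per_query' charges for every record sent; 'per_match' only for hits.
function estimateWaterfall(totalRecords, providers, billing = 'per_query') {
  let remaining = totalRecords;
  let totalCost = 0;
  const steps = [];

  for (const p of providers) {
    const queried = remaining;
    const matched = Math.round(queried * p.matchRate);
    const billable = billing === 'per_match' ? matched : queried;
    const cost = billable * p.costPerRecord;
    steps.push({ name: p.name, queried, matched, cost });
    totalCost += cost;
    remaining -= matched; // only unmatched records fall through
  }

  const enriched = totalRecords - remaining;
  return {
    steps,
    totalCost,
    coverage: enriched / totalRecords,
    costPerInput: totalCost / totalRecords,
    costPerEnriched: enriched > 0 ? totalCost / enriched : null,
  };
}

const estimate = estimateWaterfall(10000, [
  { name: 'Provider A', matchRate: 0.7, costPerRecord: 0.25 },
  { name: 'Provider B', matchRate: 0.6, costPerRecord: 0.15 },
  { name: 'Provider C', matchRate: 0.5, costPerRecord: 0.10 },
]);
console.log(estimate.coverage.toFixed(2), estimate.totalCost.toFixed(2));
```

Passing `'per_match'` as the third argument shows how much cheaper the same waterfall becomes when providers bill only for returned data.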
Monitoring and Quality Control
Key Metrics to Track
- Coverage rate by provider: What % of queries return matches?
- Fallthrough rate: How often does each provider fail, triggering the next?
- Accuracy by provider: Track bounce rates and connection rates by source
- Cost per enriched record: Total cost / successful enrichments
- Time to enrich: Latency matters for real-time use cases
Quality Monitoring Dashboard
| Metric | Provider A | Provider B | Provider C |
|---|---|---|---|
| Match rate | 72% | 58% | 45% |
| Email bounce rate | 3% | 7% | 12% |
| Phone connect rate | 65% | 48% | 35% |
| Avg. confidence score | 0.85 | 0.72 | 0.61 |
Review monthly. If a provider's quality drops significantly, consider reordering or replacing them.
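If every enriched record is tagged with its source, the dashboard rows can be computed with a simple aggregation. This is a minimal sketch, assuming hypothetical `source`, `emailSent`, and `emailBounced` fields that you populate from your own email tooling.

```javascript
// records: enriched contacts tagged with `source` plus delivery outcomes.
// Returns per-provider counts and a bounce rate (null if no emails were sent).
function summarizeBySource(records) {
  const stats = {};

  for (const record of records) {
    if (!stats[record.source]) {
      stats[record.source] = { enriched: 0, emailsSent: 0, bounces: 0 };
    }
    const s = stats[record.source];
    s.enriched += 1;
    if (record.emailSent) s.emailsSent += 1;
    if (record.emailBounced) s.bounces += 1;
  }

  for (const s of Object.values(stats)) {
    s.bounceRate = s.emailsSent > 0 ? s.bounces / s.emailsSent : null;
  }
  return stats;
}
```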
Alerting
Set up alerts for:
- Match rate drops >10% week-over-week
- Bounce rate exceeds threshold (5% is typical)
- API errors or timeouts spike
- Cost per record increases significantly
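The first two alert conditions can be checked with a weekly comparison. This is a minimal sketch with assumed names and thresholds; it interprets the 10% match-rate drop as a relative change (for example, 72% falling to 60% is a drop of about 17%), which you may prefer to define in percentage points instead.

```javascript
// Hypothetical thresholds mirroring the list above.
const ALERT_RULES = { maxMatchRateDrop: 0.10, maxBounceRate: 0.05 };

// previous/current: { matchRate, bounceRate } for one provider, as fractions.
// Returns a list of alert codes; an empty list means nothing fired.
function checkAlerts(previous, current, rules = ALERT_RULES) {
  const alerts = [];

  if (previous.matchRate > 0) {
    const relativeDrop = (previous.matchRate - current.matchRate) / previous.matchRate;
    if (relativeDrop > rules.maxMatchRateDrop) alerts.push('match_rate_drop');
  }

  if (current.bounceRate > rules.maxBounceRate) alerts.push('bounce_rate_high');

  return alerts;
}
```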
Waterfall Platforms
You can build your own waterfall or use platforms that handle orchestration:
DIY Approach
- Pros: Full control, can optimize for your specific needs
- Cons: Engineering effort, need to manage multiple API integrations
- Best for: Teams with engineering resources and complex requirements
Orchestration Platforms
| Platform | Providers Supported | Best For |
|---|---|---|
| Clay | 50+ providers | Outbound sales teams, flexible workflows |
| Clearbit (Breeze) | Clearbit + others | HubSpot users, B2B marketing |
| LeadGenius | Multiple + custom | Enterprise, custom research needs |
| Openprise | Most major providers | Enterprise RevOps, complex routing |
Implementation Checklist
Planning Phase
- Define coverage requirements (what % is acceptable?)
- Identify target segments and their characteristics
- Audit current provider coverage on sample data
- Calculate cost scenarios for different waterfall configurations
- Define quality thresholds (confidence scores, freshness requirements)
Build Phase
- Set up API integrations with selected providers
- Implement waterfall logic with fallthrough rules
- Add caching layer to avoid duplicate enrichments
- Build validation layer (email, phone verification)
- Tag data with source provider for tracking
Launch Phase
- Run pilot on sample data (1,000-5,000 records)
- Measure coverage and quality metrics
- Adjust provider order and fallthrough rules based on results
- Set up monitoring and alerting
- Document procedures for handling quality issues
Frequently Asked Questions
What is a contact data waterfall?
A contact data waterfall is a strategy that queries multiple data providers in sequence to maximize contact data coverage. When the first provider doesn't have data for a contact, the request falls through to the second provider, then the third, and so on—like water cascading down a waterfall.
How many providers should be in a waterfall?
Most waterfalls use 2-4 providers. Beyond 4, you see diminishing returns—each additional provider adds maybe 5-10% more coverage at significant cost and complexity. Start with 2 providers, measure your coverage gap, and add more only if needed.
What order should providers be in the waterfall?
Order providers by: (1) data quality/accuracy, (2) coverage for your target segments, and (3) cost per match. Put your highest-quality provider first, then backfill with providers that have good coverage in segments where your primary is weak. Always validate results before use.
How do you handle conflicting data from different providers?
Establish a provider hierarchy where your most trusted source wins conflicts. Alternatively, implement confidence scoring based on data freshness, source type, and historical accuracy. For critical data like email, validate with a third-party tool regardless of source.
About the Author
Rome Thorndike is the founder of Verum, where he helps B2B companies clean, enrich, and maintain their CRM data. With over 10 years of experience in data at Microsoft, Databricks, and Salesforce, Rome has seen firsthand how data quality impacts revenue operations.