Strategy Guide

Contact Data Waterfall Strategy: Maximize Coverage with Multiple Providers

No single data provider has complete coverage. Based on Verum's testing across multiple B2B segments, individual providers typically cover 60-75% of target accounts—but the gaps differ by provider. When you combine them strategically in a waterfall, you can reach 85-95% coverage.

This is the waterfall strategy: query multiple providers in sequence until you get a match. When Provider A doesn't have the data, fall through to Provider B, then Provider C. Like water cascading down, each level catches what the one above missed.

This guide covers how to build, optimize, and operate a contact data waterfall for maximum coverage at reasonable cost.

Why Use a Waterfall?

The math is simple but compelling:

Approach	Typical Coverage	Relative Cost
Single provider (premium)	60-70%	$$
2-provider waterfall	75-85%	$$ + $
3-provider waterfall	85-92%	$$ + $ + $
4+ provider waterfall	90-95%	$$$ (diminishing returns)

The incremental cost of Provider B only applies to records where Provider A failed. If your primary provider covers 70% of your records, you're only paying Provider B for the remaining 30%. This makes waterfalls surprisingly cost-effective.
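
The same arithmetic can be written generally. As a sketch, assume $c_i$ is the match rate of provider $i$ on the records that actually reach it and $p_i$ is its price per record queried. Both symbols are introduced here for illustration, and per-query pricing is an assumption:

```latex
C_n = 1 - \prod_{i=1}^{n} (1 - c_i),
\qquad
\text{cost per input record} = \sum_{i=1}^{n} p_i \prod_{j=1}^{i-1} (1 - c_j)
```

For example, a primary provider with $c_1 = 0.70$ followed by a secondary that matches half of the remainder ($c_2 = 0.50$) gives $C_2 = 1 - 0.30 \times 0.50 = 0.85$.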

How Waterfalls Work

Step 1: Input Record

Name, Company, LinkedIn URL

↓

Step 2: Provider A (Primary)

Best quality, highest cost

↓ No match?

Step 3: Provider B (Secondary)

Good coverage, moderate cost

↓ No match?

Step 4: Provider C (Tertiary)

Wide coverage, lower accuracy

↓

Step 5: Validation & Output

Verify email, standardize format

Waterfall Logic

Input: Start with what you know—name, company, LinkedIn URL, domain
Primary query: Hit your highest-quality (usually highest-cost) provider first
Evaluate result: Did you get a match? Is it complete? Does it meet quality thresholds?
Fall through: If no match or incomplete data, query the next provider
Validate: Regardless of source, validate emails and phones before use
Tag source: Track which provider supplied each data point for quality monitoring

What Triggers a Fallthrough?

Define clear rules for when to query the next provider:

No match: Provider returns no results for the input
Missing critical fields: Match found but missing email or phone
Low confidence: Provider returns data but with low confidence score
Stale data: Data found but last verified date is too old

Decision point: Should you always query the next provider if a field is missing? Not necessarily. If you need both email and phone, fall through whenever either is missing. But if phone is optional, you might accept an email-only result from Provider A rather than paying for Provider B.
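
The four triggers and the required-versus-optional field decision can be captured in one small rule check. This is a minimal sketch, not a definitive implementation: `DEFAULT_RULES`, `fallthroughReason`, and the threshold values are hypothetical, and it assumes each provider result exposes `confidence` and `lastVerified` fields.

```javascript
// Hypothetical rule set; tune the thresholds against your own data.
const DEFAULT_RULES = {
  requiredFields: ['email'], // add 'phone' here if phone is not optional
  minConfidence: 0.7,        // assumed quality threshold
  maxAgeDays: 180,           // assumed freshness limit
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Returns the reason to query the next provider, or null to accept the result.
function fallthroughReason(result, rules = DEFAULT_RULES, now = new Date()) {
  // Trigger 1: no match
  if (!result) return 'no_match';

  // Trigger 2: missing critical fields
  const missing = rules.requiredFields.filter((field) => !result[field]);
  if (missing.length > 0) return `missing:${missing.join(',')}`;

  // Trigger 3: low confidence (a result without a score is not rejected here)
  if (typeof result.confidence === 'number' && result.confidence < rules.minConfidence) {
    return 'low_confidence';
  }

  // Trigger 4: stale data (only checked when a verification date is present)
  if (result.lastVerified) {
    const ageDays = (now - new Date(result.lastVerified)) / MS_PER_DAY;
    if (ageDays > rules.maxAgeDays) return 'stale';
  }

  return null;
}
```

Returning a reason string rather than a boolean makes it easy to log why each record fell through, which feeds the monitoring described later in this guide.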

Provider Selection

Choosing Your Primary Provider

Your primary provider should optimize for:

Data quality: Highest accuracy, most current data
Coverage for your ICP: Best match rates for your target segments
Integration ease: Native CRM connectors, API quality

Cost matters less for the primary—you're paying for most records anyway.

Choosing Secondary/Tertiary Providers

Secondary providers should optimize for:

Complementary coverage: Strong where your primary is weak
Cost efficiency: Lower per-record cost since you're paying for fewer records
Specific strengths: Maybe better phone data, or better EMEA coverage

Provider Comparison by Segment

Segment	Strong Providers	Notes
Enterprise (Fortune 1000)	ZoomInfo, Dun & Bradstreet	Most providers have good coverage here
Mid-Market	ZoomInfo, Apollo, Clearbit	Coverage varies more; test specific segments
SMB	Apollo, Lusha, RocketReach	Premium providers often weak here
Tech Companies	Clearbit, BuiltWith, HG Insights	Tech stack data valuable for targeting
EMEA	Cognism, Lusha	GDPR-compliant providers essential
APAC	ZoomInfo, LeadIQ	Coverage generally weaker; verify carefully

Provider Comparison by Data Type

Data Type	Strong Providers	Notes
Work Email	ZoomInfo, Clearbit, Apollo	Highest-value data point; validate regardless
Personal Email	ContactOut, RocketReach	Use carefully; higher privacy concerns
Direct Dial Phone	ZoomInfo, Cognism	Most difficult data to source accurately
Mobile Phone	Lusha, Cognism	Valuable but verify connectivity
Firmographics	Clearbit, ZoomInfo, D&B	Relatively commoditized; most providers good
Tech Stack	BuiltWith, HG Insights, Clearbit	Specialized providers more accurate

Implementation Architecture

Option 1: Sequential API Calls

Simplest approach—query providers one at a time:

async function enrichContact(input) {
  // Try Provider A first
  let result = await providerA.enrich(input);

  if (isComplete(result)) {
    return { ...result, source: 'provider_a' };
  }

  // Fall through to Provider B
  result = await providerB.enrich(input);

  if (isComplete(result)) {
    return { ...result, source: 'provider_b' };
  }

  // Fall through to Provider C
  result = await providerC.enrich(input);

  return {
    ...result,
    source: result ? 'provider_c' : 'no_match'
  };
}

function isComplete(result) {
  return result &&
         result.email &&
         result.confidence >= 0.7;
}
        

Pros: Simple to implement and debug
Cons: Slower (sequential calls), uses the same provider order for every record regardless of segment
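
One practical gap in a sequential chain is that a provider error or a hung request stops the whole waterfall. A possible guard is sketched below, assuming each provider object exposes an `enrich(input)` method as in the example above; `safeEnrich` and the 5-second default are hypothetical choices, not part of any provider's API.

```javascript
// Resolves to null instead of throwing when a provider errors or is too slow,
// so the caller can treat the outcome as "no match" and fall through.
// timeoutMs is an assumed budget; tune it per provider.
function safeEnrich(provider, input, timeoutMs = 5000) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve(null), timeoutMs);
  });

  // Promise.resolve().then(...) also converts synchronous throws into rejections.
  const call = Promise.resolve()
    .then(() => provider.enrich(input))
    .catch(() => null);

  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([call, timeout]).finally(() => clearTimeout(timer));
}
```

Note that the timeout only stops waiting; it does not cancel the underlying request, so a provider that bills per query may still charge for a call that timed out.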

Option 2: Parallel with Priority

Query all providers simultaneously, use results by priority:

async function enrichContactParallel(input) {
  // Query all providers in parallel
  const [resultA, resultB, resultC] = await Promise.all([
    providerA.enrich(input).catch(() => null),
    providerB.enrich(input).catch(() => null),
    providerC.enrich(input).catch(() => null)
  ]);

  // Use results by priority
  if (isComplete(resultA)) {
    return { ...resultA, source: 'provider_a' };
  }
  if (isComplete(resultB)) {
    return { ...resultB, source: 'provider_b' };
  }
  if (isComplete(resultC)) {
    return { ...resultC, source: 'provider_c' };
  }

  // Return best partial result
  return mergePartialResults(resultA, resultB, resultC);
}
        

Pros: Fastest (parallel execution), gets all available data
Cons: Most expensive (pays all providers for every record)

Option 3: Smart Waterfall with Routing

Route to specific providers based on input characteristics:

async function enrichContactSmart(input) {
  // Determine which providers to try, in order, based on input
  const providers = selectProviders(input);

  for (const provider of providers) {
    let result = null;
    try {
      result = await provider.enrich(input);
    } catch (err) {
      // A failing provider should not abort the waterfall; fall through
    }
    if (isComplete(result)) {
      return { ...result, source: provider.name };
    }
  }

  return { source: 'no_match' };
}

function selectProviders(input) {
  const providers = [];

  // Route EMEA contacts to Cognism first
  if (isEMEA(input.company)) {
    providers.push(cognism, zoomInfo, apollo);
  }
  // Route SMB to Apollo first (better coverage, lower cost)
  else if (isSMB(input.company)) {
    providers.push(apollo, lusha, zoomInfo);
  }
  // Enterprise goes to ZoomInfo first
  else {
    providers.push(zoomInfo, clearbit, apollo);
  }

  return providers;
}
        

Pros: Optimizes cost and coverage based on segment
Cons: Requires segment detection, more complex to maintain
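
Segment detection is the part that needs the most care. The routing example calls `isEMEA` and `isSMB` without defining them; one way they might look is sketched here. The `country` and `employeeCount` fields, the country subset, and the 200-employee cutoff are all assumptions for illustration.

```javascript
// Assumed input shape: company = { country: 'DE', employeeCount: 45 }.
// The country list is a small illustrative subset, not a complete EMEA list.
const EMEA_COUNTRIES = new Set(['GB', 'DE', 'FR', 'NL', 'ES', 'IT', 'SE', 'AE', 'ZA']);
const SMB_MAX_EMPLOYEES = 200; // assumed cutoff; definitions of SMB vary

function isEMEA(company) {
  const country = (company?.country ?? '').toUpperCase();
  return EMEA_COUNTRIES.has(country);
}

function isSMB(company) {
  const count = company?.employeeCount;
  // Unknown headcount is treated as "not SMB" rather than guessed.
  return Number.isFinite(count) && count <= SMB_MAX_EMPLOYEES;
}
```

With these definitions, records with missing firmographics return false from both checks and land in the default branch of `selectProviders`. Whether that default is the right one for unknown companies is worth verifying against your own match rates.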

Handling Data Conflicts

When multiple providers return data, you'll encounter conflicts. Provider A says the person is VP of Sales; Provider B says Director of Business Development. Which do you trust?

Strategy 1: Strict Hierarchy

Primary provider always wins. Simple but may miss better data from secondary sources.

Strategy 2: Recency Wins

Use the most recently updated data, regardless of provider. Requires tracking data freshness.
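
A minimal sketch of this rule follows, assuming every candidate value carries a `lastVerified` date string. The candidate shape and the `pickMostRecent` name are hypothetical.

```javascript
// Each candidate is assumed to look like:
// { value: 'VP of Sales', provider: 'provider_a', lastVerified: '2024-03-01' }
function pickMostRecent(candidates) {
  let best = null;
  let bestTime = -Infinity;

  for (const candidate of candidates) {
    const time = Date.parse(candidate.lastVerified);
    // Skip candidates with a missing or unparseable date.
    if (Number.isNaN(time)) continue;
    // Strict comparison: on a tie, the earlier (higher-priority) candidate wins.
    if (time > bestTime) {
      best = candidate;
      bestTime = time;
    }
  }

  // If no dates were usable, fall back to the first candidate (provider order).
  return best ?? candidates[0] ?? null;
}
```

The fallback means the rule degrades to a strict hierarchy when freshness data is unavailable, provided the candidates are passed in provider-priority order.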

Strategy 3: Confidence Scoring

Build a composite score based on:

Provider reliability (based on your historical accuracy)
Data freshness (when was it last verified?)
Verification status (was it validated?)
Source type (scraped vs. self-reported vs. verified)

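// Days elapsed since an ISO date string or Date; NaN if undefined or unparseable
function daysSince(date) {
  return (Date.now() - new Date(date).getTime()) / (24 * 60 * 60 * 1000);
}

function scoreDataPoint(data, providerName) {
  // Weights should reflect your own historical accuracy per provider
  const providerWeight = {
    'zoominfo': 0.9,
    'clearbit': 0.85,
    'apollo': 0.75
  };

  // Unknown providers get a neutral default weight
  let score = providerWeight[providerName] || 0.5;

  // Boost recent data, penalize stale data (a NaN age leaves the score unchanged)
  const daysSinceVerified = daysSince(data.lastVerified);
  if (daysSinceVerified < 30) score *= 1.2;
  else if (daysSinceVerified > 180) score *= 0.7;

  // Boost for verified data
  if (data.isVerified) score *= 1.1;

  // Cap at 1.0 so boosts cannot push the score out of range
  return Math.min(score, 1.0);
}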

Strategy 4: Field-Level Merging

Take the best data for each field from different providers:

Email from Provider A (highest deliverability)
Phone from Provider B (best phone coverage)
Title from Provider A (most recent)
Company data from Provider C (most complete)
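
The field-by-field choice above can be expressed as a per-field provider preference list. This is a sketch under assumptions: `FIELD_PRIORITY`, `mergeByField`, and the provider keys are hypothetical, and the fallback order after each first choice is illustrative.

```javascript
// Hypothetical per-field provider preference, most trusted first.
// First choices mirror the list above; the remaining order is illustrative.
const FIELD_PRIORITY = {
  email: ['provider_a', 'provider_b', 'provider_c'],
  phone: ['provider_b', 'provider_a', 'provider_c'],
  title: ['provider_a', 'provider_c', 'provider_b'],
  company: ['provider_c', 'provider_a', 'provider_b'],
};

// resultsByProvider: { provider_a: {...} | null, provider_b: {...} | null, ... }
function mergeByField(resultsByProvider, priority = FIELD_PRIORITY) {
  const merged = {};
  const sources = {};

  for (const [field, providers] of Object.entries(priority)) {
    for (const name of providers) {
      const value = resultsByProvider[name]?.[field];
      if (value !== undefined && value !== null && value !== '') {
        merged[field] = value;
        sources[field] = name; // tag the source of every field
        break; // first provider in the preference list that has the field wins
      }
    }
  }

  return { ...merged, sources };
}
```

A function of this shape could also serve as the `mergePartialResults` step in the parallel option, since it works on whatever subset of providers returned data.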

Always validate email: Regardless of which provider supplied the email, run it through a validation service before use. A bounced email costs more than the validation fee.

Cost Optimization

Pay-Per-Match vs. Pay-Per-Query

Understand your contract structure:

Model	How It Works	Waterfall Implication
Pay-per-match	Only charged when data is returned	Low risk to query; can try multiple providers
Pay-per-query	Charged for every API call	Expensive waterfall; optimize routing
Credit bucket	Monthly credit allocation	Monitor usage; may need to throttle end of month
Unlimited	Flat monthly fee	Query freely; maximize usage

Optimization Strategies

Cache results: Don't re-enrich the same contact multiple times. Cache for 30-90 days.
Batch during off-peak: Some providers offer lower rates for batch processing
Segment routing: Route SMB to lower-cost providers that have good SMB coverage
Skip validation for known-good: If you have a recently validated email, don't re-validate
Pre-filter impossible matches: Don't query for contacts at companies too small to have the role
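
Of these, caching is usually the simplest to add. Below is a minimal in-memory sketch that wraps any async enrich function. `withCache`, the key fields (`linkedinUrl`, `name`, `domain`), and the 60-day default are assumptions; a production setup would more likely use a shared store than a per-process `Map`.

```javascript
const CACHE_TTL_MS = 60 * 24 * 60 * 60 * 1000; // 60 days, inside the 30-90 day range

// Wraps an async enrich function with a TTL cache keyed on the contact identity.
function withCache(enrich, { ttlMs = CACHE_TTL_MS, now = () => Date.now() } = {}) {
  const cache = new Map();

  // Hypothetical key: prefer the LinkedIn URL, else name plus domain.
  const cacheKey = (input) =>
    (input.linkedinUrl || `${input.name}|${input.domain}`).trim().toLowerCase();

  return async function cachedEnrich(input) {
    const key = cacheKey(input);
    const hit = cache.get(key);
    if (hit && now() - hit.storedAt < ttlMs) {
      return hit.result; // fresh enough: no provider call, no charge
    }

    const result = await enrich(input);
    // Misses are cached too, so known gaps are not re-queried until the TTL expires.
    cache.set(key, { result, storedAt: now() });
    return result;
  };
}
```

Usage would look like `const enrich = withCache(enrichContact);`, after which callers use `enrich(input)` exactly as before. Caching misses is a design choice: it saves money under pay-per-query pricing, but it also delays picking up records a provider adds later.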

Sample Cost Calculation

For 10,000 contact enrichments with a 3-provider waterfall:

Provider	Coverage	Records Queried	Cost/Record	Total
Provider A (Primary)	70%	10,000	$0.25	$2,500
Provider B (Secondary)	60% of remaining	3,000	$0.15	$450
Provider C (Tertiary)	50% of remaining	1,200	$0.10	$120
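Total	94% coverage	-	-	$3,070

Effective cost: $0.31 per input record for 94% coverage vs. $0.25 per input record for 70% coverage with a single provider. Per successfully enriched contact, that works out to about $0.33 for the waterfall vs. $0.36 for the single provider.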
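
To rerun this calculation with your own rates, a small simulator helps. The sketch below assumes pay-per-query pricing (cost applies to every record queried, as in the table) and treats each provider's match rate as applying to the records that reach it. `simulateWaterfall` and its field names are hypothetical.

```javascript
// Each provider: matchRate on the records that reach it, costPerQuery in dollars.
function simulateWaterfall(totalRecords, providers) {
  let remaining = totalRecords;
  let totalCost = 0;
  let matched = 0;
  const steps = [];

  for (const { name, matchRate, costPerQuery } of providers) {
    const queried = remaining; // only unmatched records reach this provider
    const hits = Math.round(queried * matchRate);
    totalCost += queried * costPerQuery;
    matched += hits;
    remaining -= hits;
    steps.push({ name, queried, hits });
  }

  return {
    steps,
    coverage: matched / totalRecords,
    totalCost,
    costPerInput: totalCost / totalRecords,
    costPerEnriched: matched > 0 ? totalCost / matched : null,
  };
}

const report = simulateWaterfall(10000, [
  { name: 'Provider A', matchRate: 0.7, costPerQuery: 0.25 },
  { name: 'Provider B', matchRate: 0.6, costPerQuery: 0.15 },
  { name: 'Provider C', matchRate: 0.5, costPerQuery: 0.10 },
]);
console.log(report.coverage.toFixed(2), report.totalCost.toFixed(2)); // 0.94 3070.00
```

Under pay-per-match pricing the cost line would use `hits` instead of `queried`, which lowers the total and changes which provider order is cheapest.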

Monitoring and Quality Control

Key Metrics to Track

Coverage rate by provider: What % of queries return matches?
Fallthrough rate: How often does each provider fail, triggering the next?
Accuracy by provider: Track bounce rates and connection rates by source
Cost per enriched record: Total cost / successful enrichments
Time to enrich: Latency matters for real-time use cases
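
Most of these metrics fall out of a simple log of provider calls. The sketch below assumes one log entry per call with the fields shown in the comment; the log shape and `summarizeAttempts` are hypothetical. Accuracy metrics such as bounce rate are not covered because they depend on downstream delivery data.

```javascript
// Assumed log shape, one entry per provider call:
// { recordId: 'r1', provider: 'provider_a', matched: true, cost: 0.25, latencyMs: 120 }
function summarizeAttempts(attempts) {
  const byProvider = {};
  const enriched = new Set();
  let totalCost = 0;

  for (const { recordId, provider, matched, cost, latencyMs } of attempts) {
    if (!byProvider[provider]) {
      byProvider[provider] = { queries: 0, matches: 0, latencyMs: 0 };
    }
    const stats = byProvider[provider];
    stats.queries += 1;
    stats.latencyMs += latencyMs;
    if (matched) {
      stats.matches += 1;
      enriched.add(recordId);
    }
    totalCost += cost;
  }

  const providers = {};
  for (const [name, s] of Object.entries(byProvider)) {
    const matchRate = s.matches / s.queries;
    providers[name] = {
      matchRate,                      // coverage rate by provider
      fallthroughRate: 1 - matchRate, // share of calls that triggered the next provider
      avgLatencyMs: s.latencyMs / s.queries,
    };
  }

  return {
    providers,
    // Total spend divided by distinct records that got a match from any provider.
    costPerEnrichedRecord: enriched.size > 0 ? totalCost / enriched.size : null,
  };
}
```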

Quality Monitoring Dashboard

Metric	Provider A	Provider B	Provider C
Match rate	72%	58%	45%
Email bounce rate	3%	7%	12%
Phone connect rate	65%	48%	35%
Avg. confidence score	0.85	0.72	0.61

Review monthly. If a provider's quality drops significantly, consider reordering or replacing them.

Alerting

Set up alerts for:

Match rate drops >10% week-over-week
Bounce rate exceeds threshold (5% is typical)
API errors or timeouts spike
Cost per record increases significantly
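
The first two checks can be sketched as a weekly comparison per provider. This is an illustration under assumptions: the 10% drop is interpreted as a relative change, and `ALERT_THRESHOLDS` and `checkAlerts` are hypothetical names. Error-rate and cost alerts would follow the same pattern.

```javascript
// Thresholds mirror the list above; the 10% drop is read as a relative change (an assumption).
const ALERT_THRESHOLDS = { matchRateDrop: 0.10, bounceRate: 0.05 };

// current / previous: { matchRate: 0.72, bounceRate: 0.03 } for one provider, one week each.
function checkAlerts(provider, current, previous, thresholds = ALERT_THRESHOLDS) {
  const alerts = [];

  // Week-over-week match rate drop, relative to last week's rate.
  if (previous.matchRate > 0) {
    const drop = (previous.matchRate - current.matchRate) / previous.matchRate;
    if (drop > thresholds.matchRateDrop) {
      alerts.push(`${provider}: match rate down ${(drop * 100).toFixed(1)}% week-over-week`);
    }
  }

  // Absolute bounce rate ceiling.
  if (current.bounceRate > thresholds.bounceRate) {
    alerts.push(`${provider}: bounce rate ${(current.bounceRate * 100).toFixed(1)}% exceeds threshold`);
  }

  return alerts;
}
```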

Waterfall Platforms

You can build your own waterfall or use platforms that handle orchestration:

DIY Approach

Pros: Full control, can optimize for your specific needs
Cons: Engineering effort, need to manage multiple API integrations
Best for: Teams with engineering resources and complex requirements

Orchestration Platforms

Platform	Providers Supported	Best For
Clay	50+ providers	Outbound sales teams, flexible workflows
Clearbit (Breeze)	Clearbit + others	HubSpot users, B2B marketing
LeadGenius	Multiple + custom	Enterprise, custom research needs
Openprise	Most major providers	Enterprise RevOps, complex routing

Implementation Checklist

Planning Phase

Define coverage requirements (what % is acceptable?)
Identify target segments and their characteristics
Audit current provider coverage on sample data
Calculate cost scenarios for different waterfall configurations
Define quality thresholds (confidence scores, freshness requirements)

Build Phase

Set up API integrations with selected providers
Implement waterfall logic with fallthrough rules
Add caching layer to avoid duplicate enrichments
Build validation layer (email, phone verification)
Tag data with source provider for tracking

Launch Phase

Run pilot on sample data (1,000-5,000 records)
Measure coverage and quality metrics
Adjust provider order and fallthrough rules based on results
Set up monitoring and alerting
Document procedures for handling quality issues

Frequently Asked Questions

What is a contact data waterfall?

A contact data waterfall is a strategy that queries multiple data providers in sequence to maximize contact data coverage. When the first provider doesn't have data for a contact, the request falls through to the second provider, then the third, and so on—like water cascading down a waterfall.

How many providers should be in a waterfall?

Most waterfalls use 2-4 providers. Beyond 4, you see diminishing returns: each additional provider typically adds only a few percentage points of coverage at significant cost and complexity. Start with 2 providers, measure your coverage gap, and add more only if needed.

What order should providers be in the waterfall?

Order providers by: (1) data quality/accuracy, (2) coverage for your target segments, and (3) cost per match. Put your highest-quality provider first, then backfill with providers that have good coverage in segments where your primary is weak. Always validate results before use.

How do you handle conflicting data from different providers?

Establish a provider hierarchy where your most trusted source wins conflicts. Alternatively, implement confidence scoring based on data freshness, source type, and historical accuracy. For critical data like email, validate with a third-party tool regardless of source.

About the Author

Rome Thorndike is the founder of Verum, where he helps B2B companies clean, enrich, and maintain their CRM data. With over 10 years of experience in data at Microsoft, Databricks, and Salesforce, Rome has seen firsthand how data quality impacts revenue operations.