Technical Guide

Data Enrichment API Integration: A Technical Guide for Developers

Integrating data enrichment APIs into your application seems straightforward until you hit your first rate limit, deal with inconsistent response formats, or try to handle partial matches gracefully. This guide covers the technical patterns and gotchas that turn a fragile integration into a robust one.

We'll focus on practical implementation details: authentication patterns, error handling, webhooks, caching, and the provider-specific quirks that documentation often glosses over.

Integration Architecture Patterns

Before writing code, decide on your integration pattern. The right choice depends on your use case, volume, and latency requirements.

Pattern 1: Synchronous Real-Time

Best for: Form submissions, live lookups, low-volume enrichment

User Request → Your API → Enrichment API

Enriched Response → Your API → User Response

Node.js / Express

async function enrichContact(email) {
  const response = await fetch('https://api.enrichment.com/v1/person', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.ENRICHMENT_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ email })
  });

  if (!response.ok) {
    throw new EnrichmentError(response.status, await response.text());
  }

  return response.json();
}

Pros: Simple to implement, immediate feedback

Cons: Adds latency to user requests, vulnerable to provider outages
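
One common way to bound the latency and outage risk noted above is to run the lookup under a time budget and degrade to "not enriched" when the provider is slow or failing. The following is a minimal sketch, not a provider SDK feature: the `enrichWithBudget` name, the 2-second default, and the shape of the fallback object are all assumptions for illustration. It accepts any async enrichment function, such as `enrichContact` above.

```javascript
// Hypothetical helper: run an enrichment call under a time budget.
// `enrichFn` is any async function that takes an email (e.g. enrichContact).
async function enrichWithBudget(enrichFn, email, budgetMs = 2000) {
  let timer;
  const timeout = new Promise((resolve) => {
    timer = setTimeout(() => resolve({ enriched: false, reason: 'timeout' }), budgetMs);
  });

  try {
    // Whichever settles first wins, so a slow provider cannot block the user.
    return await Promise.race([
      enrichFn(email).then((data) => ({ enriched: true, data })),
      timeout,
    ]);
  } catch (error) {
    // Provider errors degrade to "not enriched" instead of failing the request.
    return { enriched: false, reason: 'provider_error' };
  } finally {
    clearTimeout(timer);
  }
}
```

Note that this sketch only stops waiting; it does not cancel the underlying HTTP request. If cancellation matters, `fetch` accepts a `signal` option that can be wired to an `AbortController`.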

Pattern 2: Asynchronous Queue-Based

Best for: Bulk enrichment, background processing, high-volume operations

Records → Queue → Worker → Enrichment API

Enriched Response → Worker → Database

Python / Celery

from celery import Celery

app = Celery('enrichment')

# enrichment_client, update_contact, log_enrichment_failure, RateLimitError, and
# EnrichmentError come from your own codebase or the provider's SDK.

@app.task(bind=True, max_retries=3)
def enrich_contact(self, contact_id, email):
    try:
        response = enrichment_client.enrich_person(email=email)
        update_contact(contact_id, response.data)
    except RateLimitError as e:
        # Re-queue with delay based on rate limit headers.
        # self.retry() raises celery.exceptions.Retry; let Celery own the retry
        # loop rather than stacking a second retry decorator that would catch it.
        raise self.retry(exc=e, countdown=e.retry_after)
    except EnrichmentError as e:
        log_enrichment_failure(contact_id, e)
        raise

Pros: Handles failures gracefully, scales horizontally, doesn't block users

Cons: More infrastructure, eventual consistency
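
For comparison with the Celery version, the same queue-and-worker shape can be sketched in plain JavaScript without a broker. This is an in-process stand-in for illustration only: `runWorkers` and its return shape are hypothetical, and a production setup would normally use a durable queue so jobs survive restarts.

```javascript
// Hypothetical in-process stand-in for a queue plus worker pool.
// `handler` is any async function that enriches and stores one record.
async function runWorkers(records, handler, concurrency = 5) {
  const queue = [...records]; // pending jobs
  const failures = [];        // jobs that threw, kept for re-queueing

  async function worker() {
    // Each worker pulls the next job until the queue is drained.
    while (queue.length > 0) {
      const record = queue.shift();
      try {
        await handler(record);
      } catch (error) {
        failures.push({ record, error });
      }
    }
  }

  const workerCount = Math.min(concurrency, records.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return { processed: records.length - failures.length, failures };
}
```

The `concurrency` value caps how many enrichment calls are in flight at once, which is the main lever for staying under a provider's rate limit while still draining a backlog.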

Pattern 3: Webhook-Based

Best for: Large batch operations, providers with async-only APIs

Submit Batch → Enrichment API → Job ID

Enrichment API → Your Webhook → Database

We'll cover webhook implementation in detail in the webhooks section.

Authentication & Security

API Key Authentication

Most enrichment APIs use API key authentication. Common patterns:

Method	Header Format	Providers
Bearer Token	Authorization: Bearer sk_live_xxx	Clearbit, Apollo
API Key Header	X-API-Key: xxx	Hunter, Snov.io
Basic Auth	Authorization: Basic base64(key:)	Some legacy APIs
Query Parameter	?api_key=xxx	FullContact (deprecated)

Security Warning: Never expose API keys in client-side code. Even "read-only" keys can be abused to exhaust your quota. Always proxy enrichment requests through your backend.
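
The header formats in the table above can be captured in one small backend helper, so switching providers does not scatter auth logic across the codebase. This is a sketch under assumptions: `buildAuthHeaders` and the scheme names (`bearer`, `api-key-header`, `basic`) are hypothetical labels, and the exact header name a given provider expects should be confirmed in its documentation.

```javascript
// Hypothetical helper: map an auth scheme from the table to request headers.
function buildAuthHeaders(scheme, apiKey) {
  switch (scheme) {
    case 'bearer':
      return { Authorization: `Bearer ${apiKey}` };
    case 'api-key-header':
      return { 'X-API-Key': apiKey };
    case 'basic':
      // Basic auth with the key as username and an empty password: base64("key:")
      return { Authorization: `Basic ${Buffer.from(`${apiKey}:`).toString('base64')}` };
    default:
      // Query-parameter keys are left out on purpose: URLs tend to end up in logs.
      throw new Error(`Unsupported auth scheme: ${scheme}`);
  }
}
```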

Secure Key Management

Best Practices

// DON'T: Hardcode keys
const API_KEY = 'sk_live_abc123';  // Never do this

// DO: Use environment variables
const API_KEY = process.env.ENRICHMENT_API_KEY;

// BETTER: Use a secrets manager
const { SecretManagerServiceClient } = require('@google-cloud/secret-manager');
const client = new SecretManagerServiceClient();

async function getApiKey() {
  const [version] = await client.accessSecretVersion({
    name: 'projects/my-project/secrets/enrichment-api-key/versions/latest'
  });
  return version.payload.data.toString();
}

Key Rotation

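Many enterprise providers support multiple active API keys, which makes rotation possible without downtime. A typical rotation sequence:

1. Generate a new key in the provider dashboard
2. Update the secrets manager with the new key
3. Deploy so the application picks up the new key
4. Monitor for authentication errors and for any traffic still using the old key
5. Revoke the old key once the new key is confirmed working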

Common Endpoints & Data Models

Person Enrichment

POST /v1/person/enrich

Enrich a person record by email, name + company, or LinkedIn URL

Request

{
  "email": "jane.doe@acme.com",
  // OR
  "linkedin_url": "https://linkedin.com/in/janedoe",
  // OR
  "name": "Jane Doe",
  "company": "Acme Inc"
}

Response

{
  "id": "per_abc123",
  "email": "jane.doe@acme.com",
  "name": {
    "full": "Jane Doe",
    "first": "Jane",
    "last": "Doe"
  },
  "title": "VP of Engineering",
  "seniority": "vp",
  "department": "engineering",
  "phone": "+1-555-123-4567",
  "linkedin": "https://linkedin.com/in/janedoe",
  "company": {
    "id": "com_xyz789",
    "name": "Acme Inc",
    "domain": "acme.com"
  },
  "confidence": 0.92,
  "last_updated": "2026-01-15T10:30:00Z"
}
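
The three identifier forms in the person request above are alternatives, so it can help to normalize input into exactly one of them before calling the API. The sketch below is an assumption-laden illustration: `buildPersonRequest` is a hypothetical helper, and the precedence (email, then LinkedIn URL, then name plus company) is a design choice made here, not something a provider mandates.

```javascript
// Hypothetical validator for the three identifier forms shown in the request.
// Precedence is an assumption: email is usually the strongest match key.
function buildPersonRequest({ email, linkedin_url, name, company } = {}) {
  if (email) return { email: email.trim().toLowerCase() };
  if (linkedin_url) return { linkedin_url };
  if (name && company) return { name, company };
  throw new Error('Provide email, linkedin_url, or name + company');
}
```

Rejecting incomplete input locally avoids spending a request (and possibly a credit) on a lookup that cannot match.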

Company Enrichment

GET /v1/company/enrich?domain=acme.com

Enrich a company by domain, name, or other identifiers

Response

{
  "id": "com_xyz789",
  "name": "Acme Inc",
  "legal_name": "Acme Incorporated",
  "domain": "acme.com",
  "industry": "Software",
  "sub_industry": "Enterprise Software",
  "employee_count": 500,
  "employee_range": "201-500",
  "revenue": 50000000,
  "revenue_range": "$10M-$50M",
  "founded_year": 2015,
  "location": {
    "city": "San Francisco",
    "state": "CA",
    "country": "US"
  },
  "technologies": ["Salesforce", "HubSpot", "Slack"],
  "social": {
    "linkedin": "https://linkedin.com/company/acme",
    "twitter": "https://twitter.com/acme"
  }
}

Handling Partial Matches

Not every enrichment returns complete data. Design your data model to handle partial results:

TypeScript

interface EnrichedPerson {
  email: string;
  name?: {
    full?: string;
    first?: string;
    last?: string;
  };
  title?: string;
  phone?: string;
  company?: EnrichedCompany;
  confidence: number;  // Always present
  enriched_at: Date;      // Always present
  enrichment_source: string;  // Track which provider
}

function mergeEnrichmentData(existing: Contact, enriched: EnrichedPerson): Contact {
  // Prefer enriched values when present; overwrite the title only when confidence exceeds 0.8
  return {
    ...existing,
    first_name: enriched.name?.first ?? existing.first_name,
    last_name: enriched.name?.last ?? existing.last_name,
    title: enriched.confidence > 0.8 ? (enriched.title ?? existing.title) : existing.title,
    phone: enriched.phone ?? existing.phone,
    enriched_at: enriched.enriched_at,
    enrichment_confidence: enriched.confidence
  };
}

Rate Limiting & Throttling

Every enrichment API has rate limits. Exceeding them results in 429 Too Many Requests responses and potentially temporary bans.

Common Rate Limit Headers

Header	Description
X-RateLimit-Limit	Maximum requests per window
X-RateLimit-Remaining	Requests left in current window
X-RateLimit-Reset	Unix timestamp when window resets
Retry-After	Seconds to wait before retrying (on 429)
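
The headers in the table above can be turned into a single "how long should I wait" value. This is a sketch, assuming the header names listed above and a reset value in Unix seconds; `msUntilRetry` is a hypothetical helper. Providers differ here (some send `Retry-After` as an HTTP date, which this sketch ignores and falls through on), so check the actual responses you receive.

```javascript
// Hypothetical helper: how long to wait before the next request, in ms.
// `headers` needs get(name) and has(name), like a fetch Headers object;
// `nowMs` is injectable so the arithmetic can be tested.
function msUntilRetry(headers, nowMs = Date.now()) {
  // Retry-After (delta-seconds form) takes priority when present.
  const retryAfter = Number(headers.get('Retry-After'));
  if (headers.has('Retry-After') && Number.isFinite(retryAfter)) {
    return Math.max(0, retryAfter * 1000);
  }

  // Otherwise wait for the window reset only if no requests remain.
  const remaining = Number(headers.get('X-RateLimit-Remaining'));
  const reset = Number(headers.get('X-RateLimit-Reset')); // Unix seconds
  if (headers.has('X-RateLimit-Remaining') && remaining <= 0 && Number.isFinite(reset)) {
    return Math.max(0, reset * 1000 - nowMs);
  }
  return 0;
}
```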

Implementing a Rate Limiter

JavaScript - Fixed-Window Limiter

class RateLimiter {
  constructor(maxRequests, windowMs) {
    this.maxRequests = maxRequests;
    this.windowMs = windowMs;
    this.tokens = maxRequests;
    this.lastRefill = Date.now();
  }

  async acquire() {
    this.refill();

    if (this.tokens > 0) {
      this.tokens--;
      return true;
    }

    // Calculate wait time
    const waitTime = this.windowMs - (Date.now() - this.lastRefill);
    await new Promise(resolve => setTimeout(resolve, waitTime));
    return this.acquire();
  }

  refill() {
    const now = Date.now();
    const elapsed = now - this.lastRefill;

    if (elapsed >= this.windowMs) {
      this.tokens = this.maxRequests;
      this.lastRefill = now;
    }
  }

  updateFromHeaders(headers) {
    const remaining = parseInt(headers.get('X-RateLimit-Remaining'), 10);
    if (!isNaN(remaining)) {
      this.tokens = remaining;
    }
  }
}

// Usage
const limiter = new RateLimiter(100, 60000);  // 100 requests per minute

async function enrichWithRateLimit(email) {
  await limiter.acquire();

  const response = await fetch('...');
  limiter.updateFromHeaders(response.headers);

  return response.json();
}
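
The class above restores its full allowance in one step when the window elapses. Where smoother pacing is wanted, a token bucket with continuous refill is a common alternative. The sketch below is one possible shape, not a drop-in from any library: `TokenBucket`, `tryAcquire`, and the injectable `now` clock are names chosen here for illustration.

```javascript
// A sketch of a token bucket with continuous refill.
// `now` is injectable so the refill math can be tested without real time.
class TokenBucket {
  constructor(capacity, refillPerSecond, now = () => Date.now()) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.lastRefill = now();
  }

  refill() {
    const current = this.now();
    const elapsedSeconds = (current - this.lastRefill) / 1000;
    // Tokens accrue in proportion to elapsed time, capped at capacity.
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSeconds * this.refillPerSecond);
    this.lastRefill = current;
  }

  // Returns 0 if a token was taken, otherwise the ms to wait before retrying.
  tryAcquire() {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return 0;
    }
    return Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
  }

  // Waits until a token is available, then takes it.
  async acquire() {
    let waitMs = this.tryAcquire();
    while (waitMs > 0) {
      await new Promise((resolve) => setTimeout(resolve, waitMs));
      waitMs = this.tryAcquire();
    }
  }
}
```

For a limit of 100 requests per minute, `new TokenBucket(100, 100 / 60)` allows an initial burst of 100 and then roughly one request every 0.6 seconds.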

Batch Endpoint Optimization

When available, use batch endpoints to reduce API calls:

Batch Request

// Instead of 100 individual calls...
for (const email of emails) {
  await enrichPerson(email);  // 100 API calls
}

// Use a single batch call
const response = await fetch('https://api.enrichment.com/v1/person/bulk', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    emails: emails,  // Up to 100 emails per batch
    webhook_url: 'https://yourapp.com/webhooks/enrichment'
  })
});  // 1 API call; results are delivered later to webhook_url
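
Because the batch above is capped at 100 emails, larger lists need to be split first. A minimal sketch follows; `chunk` and `prepareBatches` are hypothetical helper names, and the default of 100 simply mirrors the limit mentioned in the example, so adjust it to your provider's actual batch size.

```javascript
// Hypothetical helper: split a list into batches no larger than `size`.
function chunk(items, size = 100) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Deduplicate (case-insensitively) before batching so repeats don't use quota.
function prepareBatches(emails, size = 100) {
  const unique = [...new Set(emails.map((e) => e.trim().toLowerCase()))];
  return chunk(unique, size);
}
```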

Error Handling & Retries

Common Error Codes

Status	Meaning	Action
200	Success (match found)	Process response
202	Accepted (async processing)	Wait for webhook
400	Bad request	Fix request, don't retry
401	Invalid API key	Check credentials
404	No match found	Mark as not enriched
422	Invalid input data	Validate input, don't retry
429	Rate limited	Backoff and retry
500	Server error	Retry with backoff
503	Service unavailable	Retry with backoff
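
The table above maps naturally onto a small classifier that the rest of the client can switch on. This is a sketch: `classifyStatus` and the action labels are hypothetical names, and treating every 5xx as retryable is an assumption that extends the two server errors listed.

```javascript
// Hypothetical mapping from HTTP status to the action in the table above.
function classifyStatus(status) {
  if (status === 200) return 'process';
  if (status === 202) return 'await_webhook';
  if (status === 401) return 'check_credentials';
  if (status === 404) return 'mark_not_enriched';
  if (status === 429 || status >= 500) return 'retry_with_backoff';
  if (status >= 400) return 'fix_request'; // 400, 422, and other client errors
  return 'unexpected';
}
```

Keeping 404 separate from other client errors matters for enrichment: "no match" is a valid outcome to record, not a failure to alert on.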

Exponential Backoff with Jitter

JavaScript

async function enrichWithRetry(email, maxRetries = 3) {
  let lastError;

  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await enrichPerson(email);
      return response;
    } catch (error) {
      lastError = error;

      // Don't retry client errors (except rate limits)
      if (error.status >= 400 && error.status < 500 && error.status !== 429) {
        throw error;
      }

      // Out of attempts: fail now instead of sleeping before the final throw
      if (attempt === maxRetries - 1) {
        break;
      }

      // Calculate delay with exponential backoff + jitter
      const baseDelay = Math.pow(2, attempt) * 1000;  // 1s, 2s, 4s...
      const jitter = Math.random() * 1000;           // 0-1s random
      const delay = Math.min(baseDelay + jitter, 30000);  // Cap at 30s

      // Prefer the provider's Retry-After value (seconds) when available
      await sleep(error.retryAfter ? error.retryAfter * 1000 : delay);
    }
  }

  throw lastError;
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

Why jitter? Without jitter, multiple clients hitting a rate limit will all retry at exactly the same time, causing another rate limit. Random jitter spreads retries across time.

Circuit Breaker Pattern

Prevent cascading failures when an enrichment provider is down:

JavaScript

class CircuitBreaker {
  constructor(failureThreshold = 5, resetTimeMs = 60000) {
    this.failureCount = 0;
    this.failureThreshold = failureThreshold;
    this.resetTimeMs = resetTimeMs;
    this.state = 'CLOSED';  // CLOSED, OPEN, HALF_OPEN
    this.lastFailure = null;
  }

  async execute(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.lastFailure > this.resetTimeMs) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit breaker is OPEN');
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failureCount = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failureCount++;
    this.lastFailure = Date.now();

    if (this.failureCount >= this.failureThreshold) {
      this.state = 'OPEN';
    }
  }
}

// Usage
const breaker = new CircuitBreaker();

async function safeEnrich(email) {
  try {
    return await breaker.execute(() => enrichPerson(email));
  } catch (error) {
    if (error.message.includes('Circuit breaker')) {
      // Return cached data or skip enrichment
      return { enriched: false, reason: 'service_unavailable' };
    }
    throw error;
  }
}

Webhooks & Async Processing

Setting Up a Webhook Endpoint

Express.js

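const crypto = require('crypto');
const express = require('express');

// Capture the raw body for this route: the HMAC is typically computed over the
// exact bytes sent, and JSON.stringify(req.body) may not reproduce them.
// A global express.json() parser must not run before this route.
app.post('/webhooks/enrichment', express.raw({ type: 'application/json' }), async (req, res) => {
  // 1. Verify webhook signature with a constant-time comparison
  const rawBody = Buffer.isBuffer(req.body) ? req.body : Buffer.alloc(0);
  const signature = Buffer.from(req.headers['x-webhook-signature'] || '', 'utf8');
  const expectedSig = Buffer.from(
    crypto
      .createHmac('sha256', process.env.WEBHOOK_SECRET)
      .update(rawBody)
      .digest('hex'),
    'utf8'
  );

  // timingSafeEqual throws on length mismatch, so check lengths first
  if (signature.length !== expectedSig.length || !crypto.timingSafeEqual(signature, expectedSig)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }

  // 2. Acknowledge receipt immediately
  res.status(200).json({ received: true });

  // 3. Process asynchronously
  try {
    const { job_id, status, results } = JSON.parse(rawBody.toString('utf8'));

    if (status === 'completed') {
      for (const result of results) {
        await updateContact(result.request_id, result.data);
      }
    } else if (status === 'failed') {
      await logBatchFailure(job_id, results);
    }
  } catch (error) {
    // Log but don't fail - we already acknowledged
    console.error('Webhook processing error:', error);
  }
});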

Always verify signatures. Without signature verification, anyone can send fake webhook payloads to your endpoint. Most providers include an HMAC signature in headers.

Handling Webhook Retries

Providers retry failed webhooks. Make your handler idempotent:

Idempotent Handler

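async function handleWebhook(payload) {
  const { event_id, results } = payload;
  const key = `webhook:${event_id}`;

  // Atomically claim the event (SET ... NX) so two concurrent deliveries
  // cannot both pass the check. The claim expires after 7 days.
  // Assumes an ioredis-style client, where set() resolves to 'OK' or null.
  const claimed = await redis.set(key, 'processed', 'EX', 604800, 'NX');
  if (claimed !== 'OK') {
    console.log(`Webhook ${event_id} already processed, skipping`);
    return;
  }

  try {
    // Process the webhook
    await processResults(results);
  } catch (error) {
    // Release the claim so the provider's next retry is processed
    await redis.del(key);
    throw error;
  }
}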

Caching Strategies

Enrichment data doesn't change frequently. Caching reduces costs and improves performance.

Cache Key Design

Cache Keys

// Person enrichment - email is the primary key
const personKey = `enrich:person:${email.toLowerCase()}`;

// Company enrichment - domain is the primary key
const companyKey = `enrich:company:${domain.toLowerCase()}`;

// Include provider if using multiple
const keyWithProvider = `enrich:person:clearbit:${email.toLowerCase()}`;
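
Cache keys only deduplicate lookups if equivalent inputs normalize to the same string. The sketch below extends the lowercase rule above with trimming and, for companies, stripping the scheme, a leading `www.`, and any path. `personCacheKey` and `companyCacheKey` are hypothetical helpers, and the extra normalization steps are assumptions about what counts as "the same" record in your data.

```javascript
// Hypothetical key builders; the `enrich:` prefix follows the keys above.
function personCacheKey(email, provider) {
  const normalized = email.trim().toLowerCase();
  return provider
    ? `enrich:person:${provider}:${normalized}`
    : `enrich:person:${normalized}`;
}

function companyCacheKey(domainOrUrl) {
  const domain = domainOrUrl
    .trim()
    .toLowerCase()
    .replace(/^https?:\/\//, '') // drop scheme
    .replace(/^www\./, '')       // drop leading www.
    .replace(/[/?#].*$/, '');    // drop path, query, and fragment
  return `enrich:company:${domain}`;
}
```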

Cache-Aside Pattern

Node.js with Redis

const CACHE_TTL = 86400 * 30;  // 30 days

async function enrichPersonCached(email) {
  const cacheKey = `enrich:person:${email.toLowerCase()}`;

  // 1. Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }

  // 2. Call API
  const result = await enrichPerson(email);

  // 3. Cache result (including "not found" to prevent repeated lookups)
  await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(result));

  return result;
}
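
Caching "not found" results only works if a miss is actually written to the cache, which requires catching the 404 rather than letting it propagate. The following sketch shows one way to do that with a shorter TTL for misses. The function name, the one-day miss TTL, and the `{ found, data }` envelope are assumptions for illustration; `cache` stands in for a Redis-like client and `enrichFn` for a lookup that throws an error with `status: 404` on no match.

```javascript
const FOUND_TTL = 86400 * 30; // 30 days, matching CACHE_TTL above
const NOT_FOUND_TTL = 86400;  // assumption: re-check misses after 1 day

// `cache` needs async get(key) and setex(key, ttlSeconds, value), as in the
// Redis calls above; `enrichFn` is expected to throw { status: 404 } on a miss.
async function enrichPersonWithNegativeCache(cache, enrichFn, email) {
  const cacheKey = `enrich:person:${email.toLowerCase()}`;

  const cached = await cache.get(cacheKey);
  if (cached) {
    const entry = JSON.parse(cached);
    return entry.found ? entry.data : null;
  }

  try {
    const data = await enrichFn(email);
    await cache.setex(cacheKey, FOUND_TTL, JSON.stringify({ found: true, data }));
    return data;
  } catch (error) {
    if (error.status !== 404) throw error; // only cache definitive misses
    await cache.setex(cacheKey, NOT_FOUND_TTL, JSON.stringify({ found: false }));
    return null;
  }
}
```

The shorter TTL for misses reflects that a provider may add a record later; transient errors such as 429 or 500 are deliberately not cached.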

Cache Invalidation

Consider when to refresh enrichment data:

Time-based: Refresh after 30-90 days (balance freshness vs. cost)
Event-based: Refresh when user updates their profile
Confidence-based: Refresh low-confidence matches sooner
On-demand: Allow manual refresh when data seems stale

Conditional Refresh

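async function getEnrichedPerson(email, forceRefresh = false) {
  const existing = await getFromCache(email);

  if (!forceRefresh && existing) {
    // enriched_at is a string after a JSON round trip, so parse it before subtracting
    const age = Date.now() - new Date(existing.enriched_at).getTime();
    // Note: the cache TTL must be at least 90 days for the first branch to matter
    const maxAge = existing.confidence > 0.9
      ? 90 * 24 * 60 * 60 * 1000  // 90 days for high confidence
      : 30 * 24 * 60 * 60 * 1000; // 30 days for low confidence

    if (age < maxAge) {
      return existing;
    }
  }

  // Call the API directly: enrichPersonCached() would return the stale cached entry
  const fresh = await enrichPerson(email);
  await redis.setex(`enrich:person:${email.toLowerCase()}`, CACHE_TTL, JSON.stringify(fresh));
  return fresh;
}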

Testing & Monitoring

Unit Testing with Mocks

Jest

import { enrichPerson } from './enrichment';
import { enrichmentClient } from './enrichment-client';  // mocked below
import { mockEnrichmentResponse } from './fixtures';

jest.mock('./enrichment-client');

describe('enrichPerson', () => {
  it('returns enriched data for valid email', async () => {
    enrichmentClient.enrich.mockResolvedValue(mockEnrichmentResponse);

    const result = await enrichPerson('jane.doe@acme.com');

    expect(result.name.full).toBe('Jane Doe');
    expect(result.confidence).toBeGreaterThan(0.8);
  });

  it('handles rate limit with retry', async () => {
    enrichmentClient.enrich
      .mockRejectedValueOnce({ status: 429, retryAfter: 1 })
      .mockResolvedValue(mockEnrichmentResponse);

    const result = await enrichPerson('jane.doe@acme.com');

    expect(enrichmentClient.enrich).toHaveBeenCalledTimes(2);
    expect(result.name.full).toBe('Jane Doe');
  });

  it('returns null for not found', async () => {
    enrichmentClient.enrich.mockRejectedValue({ status: 404 });

    const result = await enrichPerson('unknown@example.com');

    expect(result).toBeNull();
  });
});

Recording API Responses

Use VCR-style recording for integration tests:

Nock (Node.js)

import nock from 'nock';

// Record mode: capture real API responses
nock.recorder.rec({ output_objects: true });

// Playback mode: use recorded responses
const scope = nock('https://api.enrichment.com')
  .get('/v1/[email protected]')
  .reply(200, recordedResponse);

Key Metrics to Monitor

Enrichment rate: % of records successfully enriched
Match quality: Average confidence score
API latency: P50, P95, P99 response times
Error rate: % of requests that fail
Rate limit hits: How often you're throttled
Cache hit rate: % of requests served from cache
Cost per enrichment: Track spend vs. budget

Prometheus Metrics

const { Counter, Histogram } = require('prom-client');

const enrichmentTotal = new Counter({
  name: 'enrichment_requests_total',
  help: 'Total enrichment requests',
  labelNames: ['provider', 'status', 'type']
});

const enrichmentLatency = new Histogram({
  name: 'enrichment_latency_seconds',
  help: 'Enrichment request latency',
  labelNames: ['provider'],
  buckets: [0.1, 0.5, 1, 2, 5]
});

const enrichmentConfidence = new Histogram({
  name: 'enrichment_confidence',
  help: 'Distribution of enrichment confidence scores',
  buckets: [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0]
});
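
The metric definitions above still need to be updated around each call. One way to keep that out of business logic is a small wrapper, sketched below. `instrumented` is a hypothetical helper and the status labels (`success`, `not_found`, `error`) are assumptions; the wrapper only relies on `inc(labels)` and `observe(labels, value)`, which is how prom-client's `Counter` and `Histogram` are called.

```javascript
// Hypothetical wrapper: records outcome and latency around one enrichment call.
// `counter` needs inc(labels); `histogram` needs observe(labels, value).
async function instrumented(provider, type, fn, { counter, histogram }) {
  const start = process.hrtime.bigint();
  let status = 'success';
  try {
    return await fn();
  } catch (error) {
    // Separate "no match" from real failures so error-rate alerts stay meaningful.
    status = error && error.status === 404 ? 'not_found' : 'error';
    throw error;
  } finally {
    const seconds = Number(process.hrtime.bigint() - start) / 1e9;
    counter.inc({ provider, status, type });
    histogram.observe({ provider }, seconds);
  }
}
```

With the metrics defined above, a call might look like `instrumented('clearbit', 'person', () => enrichPerson(email), { counter: enrichmentTotal, histogram: enrichmentLatency })`.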

Provider-Specific Notes

Clearbit

See the official Clearbit API documentation for complete implementation details.

Auth: Bearer token in Authorization header
Rate limit: Varies by plan (check your API dashboard for current limits)
Gotcha: Returns 202 for async lookups, poll for results or use webhooks
Best for: Real-time enrichment with streaming updates

ZoomInfo

Refer to the ZoomInfo Developer Portal for authentication flows and endpoint specifications.

Auth: OAuth 2.0 with JWT tokens (see authentication docs)
Rate limit: Plan-dependent (contact ZoomInfo for your specific limits)
Gotcha: Complex token refresh flow, check developer docs for token expiration details
Best for: High-quality B2B data, large enterprises

Apollo

The Apollo API documentation provides full endpoint references and usage examples.

Auth: API key in header (see authentication guide)
Rate limit: Check current rate limits in the API docs (varies by plan)
Gotcha: Credit-based pricing, some endpoints cost more
Best for: Sales prospecting, bulk enrichment

Hunter

Auth: API key as query parameter or header
Rate limit: Plan-dependent (25-500/minute)
Gotcha: Email verification separate from enrichment
Best for: Email discovery and verification

Lusha

Auth: API key in header
Rate limit: Contact support for limits
Gotcha: Direct dial phone numbers have different pricing
Best for: Contact phone numbers, especially mobile

Frequently Asked Questions

What authentication methods do enrichment APIs use?

Most enrichment APIs use API key authentication via headers (Authorization: Bearer or X-API-Key). Some enterprise providers also support OAuth 2.0 for more granular access control. Always use HTTPS and never expose API keys in client-side code.

How should I handle rate limits in enrichment APIs?

Implement exponential backoff with jitter when you receive 429 responses. Most APIs return rate limit headers (X-RateLimit-Remaining, X-RateLimit-Reset) that you can use to proactively throttle requests. For bulk operations, use batch endpoints or implement a queue system.

What's the difference between synchronous and asynchronous enrichment APIs?

Synchronous APIs return enriched data immediately in the response, ideal for real-time lookups. Asynchronous APIs accept requests and deliver results via webhooks or polling, better for bulk operations. Many providers offer both modes depending on the endpoint.

How do I test enrichment API integrations?

Use sandbox environments when available. Create mock responses for unit tests. Use rate-limited test accounts for integration tests. Always test error scenarios including timeouts, rate limits, and malformed responses. Consider using tools like VCR to record and replay API responses.

About the Author

Rome Thorndike is the founder of Verum, where he helps B2B companies clean, enrich, and maintain their CRM data. With over 10 years of experience in data at Microsoft, Databricks, and Salesforce, Rome has seen firsthand how data quality impacts revenue operations.
