Data Enrichment API Integration: A Technical Guide for Developers
Integrating data enrichment APIs into your application seems straightforward until you hit your first rate limit, deal with inconsistent response formats, or try to handle partial matches gracefully. This guide covers the technical patterns and gotchas that turn a fragile integration into a robust one.
We'll focus on practical implementation details: authentication patterns, error handling, webhooks, caching, and the provider-specific quirks that documentation often glosses over.
Integration Architecture Patterns
Before writing code, decide on your integration pattern. The right choice depends on your use case, volume, and latency requirements.
Pattern 1: Synchronous Real-Time
Best for: Form submissions, live lookups, low-volume enrichment
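// Minimal error type carrying the HTTP status so callers can decide whether to retry
class EnrichmentError extends Error {
  constructor(status, body) {
    super(`Enrichment request failed with status ${status}`);
    this.status = status;
    this.body = body;
  }
}

async function enrichContact(email) {
  const response = await fetch('https://api.enrichment.com/v1/person', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.ENRICHMENT_API_KEY}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify({ email })
  });
  if (!response.ok) {
    throw new EnrichmentError(response.status, await response.text());
  }
  return response.json();
}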
Pros: Simple to implement, immediate feedback
Cons: Adds latency to user requests, vulnerable to provider outages
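One way to limit the latency and outage exposure noted above is to give each lookup a deadline and fall back to the un-enriched record when it is missed. The sketch below is illustrative only: `withTimeout`, `enrichOrSkip`, and the 2-second default are assumptions, not part of any provider SDK.

```javascript
// Hypothetical helper: reject if `promise` has not settled within `ms` milliseconds.
function withTimeout(promise, ms) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
  });
  // Whichever settles first wins; clear the timer so it cannot keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Hypothetical wrapper: an enrichment failure should not block the user's request.
async function enrichOrSkip(enrichFn, email, ms = 2000) {
  try {
    return { enriched: true, data: await withTimeout(enrichFn(email), ms) };
  } catch (error) {
    return { enriched: false, reason: error.message };
  }
}
```

Note that the race only stops waiting; it does not cancel the underlying HTTP request. Passing an AbortController signal to fetch would be needed for that.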
Pattern 2: Asynchronous Queue-Based
Best for: Bulk enrichment, background processing, high-volume operations
from celery import Celery

app = Celery('enrichment')

# enrichment_client, update_contact, log_enrichment_failure, RateLimitError and
# EnrichmentError are assumed to be defined elsewhere in your application.
# Celery's own retry is used alone: wrapping the task in tenacity's @retry would
# also catch the exception that self.retry() raises and run the task again.
@app.task(bind=True, max_retries=3)
def enrich_contact(self, contact_id, email):
    try:
        response = enrichment_client.enrich_person(email=email)
        update_contact(contact_id, response.data)
    except RateLimitError as e:
        # Re-queue with a delay based on the provider's rate limit headers
        raise self.retry(exc=e, countdown=e.retry_after)
    except EnrichmentError as e:
        log_enrichment_failure(contact_id, e)
        raise
Pros: Handles failures gracefully, scales horizontally, doesn't block users
Cons: More infrastructure, eventual consistency
Pattern 3: Webhook-Based
Best for: Large batch operations, providers with async-only APIs
We'll cover webhook implementation in detail in the webhooks section.
Authentication & Security
API Key Authentication
Most enrichment APIs use API key authentication. Common patterns:
| Method | Header Format | Providers |
|---|---|---|
| Bearer Token | Authorization: Bearer sk_live_xxx | Clearbit, Apollo |
| API Key Header | X-API-Key: xxx | Hunter, Snov.io |
| Basic Auth | Authorization: Basic base64(key:) | Some legacy APIs |
| Query Parameter | ?api_key=xxx | FullContact (deprecated) |
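The header formats in the table can be captured in one small function, which keeps provider differences out of the calling code. This is a sketch under assumptions: `buildAuthHeaders` and its scheme names are hypothetical, and a given provider may expect a different header name.

```javascript
// Hypothetical helper mapping the schemes in the table above to request headers.
function buildAuthHeaders(scheme, apiKey) {
  switch (scheme) {
    case 'bearer':
      return { Authorization: `Bearer ${apiKey}` };
    case 'header':
      return { 'X-API-Key': apiKey };
    case 'basic':
      // Key as the username with an empty password: base64("key:")
      return { Authorization: `Basic ${Buffer.from(`${apiKey}:`).toString('base64')}` };
    default:
      // Query-parameter auth is left out on purpose: keys in URLs tend to end up in logs.
      throw new Error(`Unsupported auth scheme: ${scheme}`);
  }
}
```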
Security Warning: Never expose API keys in client-side code. Even "read-only" keys can be abused to exhaust your quota. Always proxy enrichment requests through your backend.
Secure Key Management
// DON'T: Hardcode keys
const API_KEY = 'sk_live_abc123'; // Never do this
// DO: Use environment variables
const API_KEY = process.env.ENRICHMENT_API_KEY;
// BETTER: Use a secrets manager
const { SecretManagerServiceClient } = require('@google-cloud/secret-manager');
const client = new SecretManagerServiceClient();
async function getApiKey() {
const [version] = await client.accessSecretVersion({
name: 'projects/my-project/secrets/enrichment-api-key/versions/latest'
});
return version.payload.data.toString();
}
Key Rotation
Many enterprise providers let you hold more than one active API key, which makes rotation possible without downtime. A typical rotation runs in this order:
- Generate new key in provider dashboard
- Update secrets manager with new key
- Deploy with new key
- Monitor for errors using old key
- Revoke old key after confirming new key works
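Between deploying the new key and revoking the old one, a client can be written to tolerate either key being rejected. A minimal sketch, assuming a hypothetical `doRequest(key)` function that rejects with an error carrying `status: 401` when a key is invalid:

```javascript
// Try each key in order; move to the next only when the provider rejects the key (401).
async function requestWithKeyFallback(keys, doRequest) {
  let lastError;
  for (const key of keys) {
    try {
      return await doRequest(key);
    } catch (error) {
      if (error.status !== 401) throw error; // Not a credential problem: surface it
      lastError = error;
    }
  }
  throw lastError ?? new Error('No API keys configured');
}
```

Logging each fallback is worthwhile: a steady stream of 401s on the first key is the signal that something is still using the old one.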
Common Endpoints & Data Models
Person Enrichment
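Enrich a person record by email, name + company, or LinkedIn URL. The first block below is a representative request (send one identifier set; the // OR lines are annotations, not valid JSON) and the second is a representative response.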
{
"email": "[email protected]",
// OR
"linkedin_url": "https://linkedin.com/in/janedoe",
// OR
"name": "Jane Doe",
"company": "Acme Inc"
}
{
"id": "per_abc123",
"email": "[email protected]",
"name": {
"full": "Jane Doe",
"first": "Jane",
"last": "Doe"
},
"title": "VP of Engineering",
"seniority": "vp",
"department": "engineering",
"phone": "+1-555-123-4567",
"linkedin": "https://linkedin.com/in/janedoe",
"company": {
"id": "com_xyz789",
"name": "Acme Inc",
"domain": "acme.com"
},
"confidence": 0.92,
"last_updated": "2026-01-15T10:30:00Z"
}
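Because the request accepts several alternative identifiers, it helps to pick one explicitly before calling the API. The sketch below assumes a hypothetical `buildPersonRequest` helper and an arbitrary priority order (email, then LinkedIn URL, then name + company); a real provider may prefer a different order or accept several identifiers at once.

```javascript
// Hypothetical request builder: choose one identifier strategy, most specific first.
function buildPersonRequest({ email, linkedin_url, name, company } = {}) {
  if (email) return { email: email.trim().toLowerCase() };
  if (linkedin_url) return { linkedin_url };
  if (name && company) return { name, company };
  // A name without a company is too ambiguous to match reliably
  throw new Error('Provide email, linkedin_url, or both name and company');
}
```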
Company Enrichment
Enrich a company by domain, name, or other identifiers. A representative response:
{
"id": "com_xyz789",
"name": "Acme Inc",
"legal_name": "Acme Incorporated",
"domain": "acme.com",
"industry": "Software",
"sub_industry": "Enterprise Software",
"employee_count": 500,
"employee_range": "201-500",
"revenue": 50000000,
"revenue_range": "$10M-$50M",
"founded_year": 2015,
"location": {
"city": "San Francisco",
"state": "CA",
"country": "US"
},
"technologies": ["Salesforce", "HubSpot", "Slack"],
"social": {
"linkedin": "https://linkedin.com/company/acme",
"twitter": "https://twitter.com/acme"
}
}
Handling Partial Matches
Not every enrichment returns complete data. Design your data model to handle partial results:
interface EnrichedPerson {
email: string;
name?: {
full?: string;
first?: string;
last?: string;
};
title?: string;
phone?: string;
company?: EnrichedCompany;
confidence: number; // Always present
enriched_at: Date; // Always present
enrichment_source: string; // Track which provider
}
function mergeEnrichmentData(existing: Contact, enriched: EnrichedPerson): Contact {
// Prefer enriched values when present; title is only overwritten when confidence exceeds 0.8
return {
...existing,
first_name: enriched.name?.first ?? existing.first_name,
last_name: enriched.name?.last ?? existing.last_name,
title: enriched.confidence > 0.8 ? (enriched.title ?? existing.title) : existing.title,
phone: enriched.phone ?? existing.phone,
enriched_at: enriched.enriched_at,
enrichment_confidence: enriched.confidence
};
}
Rate Limiting & Throttling
Every enrichment API has rate limits. Exceeding them results in 429 Too Many Requests responses and potentially temporary bans.
Common Rate Limit Headers
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per window |
| X-RateLimit-Remaining | Requests left in current window |
| X-RateLimit-Reset | When the window resets (often a Unix timestamp; some providers send seconds remaining instead) |
| Retry-After | Seconds to wait, or an HTTP date, before retrying (on 429) |
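These headers can be turned into a single "how long should I wait" value. The following is a sketch, not a provider-specific implementation: `msUntilAllowed` is a hypothetical name, and it assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which should be checked against your provider's documentation.

```javascript
// Return how many milliseconds to wait before the next request, or 0 if no wait is needed.
// `headers` is anything with a get(name) method (fetch Headers, Map, ...).
function msUntilAllowed(headers, now = Date.now()) {
  const retryAfter = headers.get('Retry-After');
  if (retryAfter != null) {
    const seconds = Number(retryAfter);
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
    const date = Date.parse(retryAfter); // Retry-After may also be an HTTP date
    if (!Number.isNaN(date)) return Math.max(0, date - now);
  }
  const remainingRaw = headers.get('X-RateLimit-Remaining');
  const resetRaw = headers.get('X-RateLimit-Reset');
  // Only wait when both headers are present and the window is exhausted.
  if (remainingRaw != null && resetRaw != null && Number(remainingRaw) === 0) {
    const reset = Number(resetRaw); // Assumption: Unix timestamp in seconds
    if (Number.isFinite(reset)) return Math.max(0, reset * 1000 - now);
  }
  return 0;
}
```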
Implementing a Rate Limiter
class RateLimiter {
constructor(maxRequests, windowMs) {
this.maxRequests = maxRequests;
this.windowMs = windowMs;
this.tokens = maxRequests;
this.lastRefill = Date.now();
}
async acquire() {
this.refill();
if (this.tokens > 0) {
this.tokens--;
return true;
}
// Calculate wait time
const waitTime = this.windowMs - (Date.now() - this.lastRefill);
await new Promise(resolve => setTimeout(resolve, waitTime));
return this.acquire();
}
refill() {
const now = Date.now();
const elapsed = now - this.lastRefill;
if (elapsed >= this.windowMs) {
this.tokens = this.maxRequests;
this.lastRefill = now;
}
}
updateFromHeaders(headers) {
const remaining = parseInt(headers.get('X-RateLimit-Remaining'), 10);
if (!isNaN(remaining)) {
this.tokens = remaining;
}
}
}
// Usage
const limiter = new RateLimiter(100, 60000); // 100 requests per minute
async function enrichWithRateLimit(email) {
await limiter.acquire();
const response = await fetch('...');
limiter.updateFromHeaders(response.headers);
return response.json();
}
Batch Endpoint Optimization
When available, use batch endpoints to reduce API calls:
// Instead of 100 individual calls...
for (const email of emails) {
await enrichPerson(email); // 100 API calls
}
// Use a single batch call
const results = await fetch('https://api.enrichment.com/v1/person/bulk', {
method: 'POST',
headers: { 'Authorization': `Bearer ${API_KEY}`, 'Content-Type': 'application/json' },
body: JSON.stringify({
emails: emails, // Up to 100 emails per batch
webhook_url: 'https://yourapp.com/webhooks/enrichment'
})
}); // 1 API call
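Lists longer than the batch limit need to be split before they are sent. A small sketch, assuming a hypothetical `toBatches` helper and the 100-item limit used in the example above (actual limits vary by provider); it also removes duplicates so the same address is not paid for twice.

```javascript
// Normalize, de-duplicate, and split a list of emails into batches of at most `size`.
function toBatches(emails, size = 100) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError('size must be a positive integer');
  }
  const unique = [...new Set(emails.map((e) => e.trim().toLowerCase()))];
  const batches = [];
  for (let i = 0; i < unique.length; i += size) {
    batches.push(unique.slice(i, i + size));
  }
  return batches;
}
```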
Error Handling & Retries
Common Error Codes
| Status | Meaning | Action |
|---|---|---|
| 200 | Success (match found) | Process response |
| 202 | Accepted (async processing) | Wait for webhook |
| 400 | Bad request | Fix request, don't retry |
| 401 | Invalid API key | Check credentials |
| 404 | No match found | Mark as not enriched |
| 422 | Invalid input data | Validate input, don't retry |
| 429 | Rate limited | Backoff and retry |
| 500 | Server error | Retry with backoff |
| 503 | Service unavailable | Retry with backoff |
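The table translates directly into a classifier that the rest of the integration can branch on. This is a sketch with a hypothetical function name and outcome labels; note that some providers return 200 with an empty body instead of 404 when there is no match.

```javascript
// Map an HTTP status to the action column of the table above.
function classifyStatus(status) {
  if (status === 202) return 'pending';      // Result arrives later via webhook
  if (status >= 200 && status < 300) return 'success';
  if (status === 404) return 'not_found';    // A valid outcome, not an error
  if (status === 429 || status >= 500) return 'retry';
  return 'fail';                             // 400, 401, 422 and other client errors
}
```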
Exponential Backoff with Jitter
async function enrichWithRetry(email, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      return await enrichPerson(email);
    } catch (error) {
      lastError = error;
      // Don't retry client errors (except rate limits)
      if (error.status >= 400 && error.status < 500 && error.status !== 429) {
        throw error;
      }
      // No point sleeping after the final attempt
      if (attempt === maxRetries - 1) break;
      // Calculate delay with exponential backoff + jitter
      const baseDelay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s...
      const jitter = Math.random() * 1000; // 0-1s random
      const delay = Math.min(baseDelay + jitter, 30000); // Cap at 30s
      // Prefer the provider's Retry-After value (in seconds) when available
      await sleep(error.retryAfter ? error.retryAfter * 1000 : delay);
    }
  }
  throw lastError;
}

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}
Why jitter? Without jitter, multiple clients hitting a rate limit will all retry at exactly the same time, causing another rate limit. Random jitter spreads retries across time.
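To make the effect concrete, the delay calculation can be pulled out into a pure function with an injectable random source, which also makes it easy to test. `backoffDelay` and its option names are illustrative; the formula mirrors the one used in the retry loop above.

```javascript
// Delay before retry number `attempt` (0-based): exponential base plus random jitter, capped.
function backoffDelay(attempt, { baseMs = 1000, jitterMs = 1000, capMs = 30000, random = Math.random } = {}) {
  const exponential = baseMs * 2 ** attempt;
  return Math.min(exponential + random() * jitterMs, capMs);
}

// Two clients that fail at the same moment will usually draw different delays:
const clientA = backoffDelay(1);
const clientB = backoffDelay(1);
console.log(`client A waits ${Math.round(clientA)}ms, client B waits ${Math.round(clientB)}ms`);
```

With the defaults, both values fall between 2000 and 3000 ms, so the two retries rarely land on the same instant.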
Circuit Breaker Pattern
Prevent cascading failures when an enrichment provider is down:
class CircuitBreaker {
constructor(failureThreshold = 5, resetTimeMs = 60000) {
this.failureCount = 0;
this.failureThreshold = failureThreshold;
this.resetTimeMs = resetTimeMs;
this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
this.lastFailure = null;
}
async execute(fn) {
if (this.state === 'OPEN') {
if (Date.now() - this.lastFailure > this.resetTimeMs) {
this.state = 'HALF_OPEN';
} else {
throw new Error('Circuit breaker is OPEN');
}
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
onSuccess() {
this.failureCount = 0;
this.state = 'CLOSED';
}
onFailure() {
this.failureCount++;
this.lastFailure = Date.now();
if (this.failureCount >= this.failureThreshold) {
this.state = 'OPEN';
}
}
}
// Usage
const breaker = new CircuitBreaker();
async function safeEnrich(email) {
try {
return await breaker.execute(() => enrichPerson(email));
} catch (error) {
if (error.message.includes('Circuit breaker')) {
// Return cached data or skip enrichment
return { enriched: false, reason: 'service_unavailable' };
}
throw error;
}
}
Webhooks & Async Processing
Setting Up a Webhook Endpoint
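const crypto = require('crypto');
const express = require('express');
const app = express();

// Capture the raw body: the HMAC must be computed over the exact bytes the provider sent.
// Re-serializing a parsed body with JSON.stringify may not reproduce them.
app.post('/webhooks/enrichment', express.raw({ type: 'application/json' }), async (req, res) => {
  // 1. Verify webhook signature (header name and encoding vary by provider)
  const signature = String(req.headers['x-webhook-signature'] || '');
  const expectedSig = crypto
    .createHmac('sha256', process.env.WEBHOOK_SECRET)
    .update(req.body) // req.body is a Buffer here
    .digest('hex');
  const sigBuf = Buffer.from(signature);
  const expectedBuf = Buffer.from(expectedSig);
  // Constant-time comparison; timingSafeEqual throws on length mismatch, so check length first
  if (sigBuf.length !== expectedBuf.length || !crypto.timingSafeEqual(sigBuf, expectedBuf)) {
    return res.status(401).json({ error: 'Invalid signature' });
  }
  // 2. Acknowledge receipt immediately
  res.status(200).json({ received: true });
  // 3. Process asynchronously
  try {
    const { job_id, status, results } = JSON.parse(req.body.toString('utf8'));
    if (status === 'completed') {
      for (const result of results) {
        await updateContact(result.request_id, result.data);
      }
    } else if (status === 'failed') {
      await logBatchFailure(job_id, results);
    }
  } catch (error) {
    // Log but don't fail - we already acknowledged
    console.error('Webhook processing error:', error);
  }
});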
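Always verify signatures. Without signature verification, anyone can send fake webhook payloads to your endpoint. Most providers include an HMAC signature in a header; compute yours over the raw request body, since re-serialized JSON may not match byte for byte, and compare the two with a constant-time function.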
Handling Webhook Retries
Providers retry failed webhooks. Make your handler idempotent:
async function handleWebhook(payload) {
const { event_id, job_id, results } = payload;
// Check if we've already processed this event
const processed = await redis.get(`webhook:${event_id}`);
if (processed) {
console.log(`Webhook ${event_id} already processed, skipping`);
return;
}
// Process the webhook
await processResults(results);
// Mark as processed (expire after 7 days)
await redis.setex(`webhook:${event_id}`, 604800, 'processed');
}
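A check followed by a later write leaves a small window in which two overlapping deliveries of the same event can both pass the check. One way to close it is to claim the event atomically before processing. The sketch below uses an in-memory stand-in with a hypothetical `claim`/`release` interface; with Redis, a `SET` using the `NX` and `EX` options provides the same "first caller wins" semantics.

```javascript
// In-memory stand-in for a shared store; the claim/release interface is hypothetical.
function createMemoryStore() {
  const seen = new Set();
  return {
    async claim(key) {
      if (seen.has(key)) return false;
      seen.add(key);
      return true;
    },
    async release(key) {
      seen.delete(key);
    },
  };
}

// Process each event_id at most once, even if deliveries overlap.
async function handleOnce(store, payload, processResults) {
  const key = `webhook:${payload.event_id}`;
  if (!(await store.claim(key))) return 'skipped';
  try {
    await processResults(payload.results);
    return 'processed';
  } catch (error) {
    await store.release(key); // Give the provider's next retry a chance to succeed
    throw error;
  }
}
```

The trade-off is that a crash between claim and release leaves the event marked as handled, so the claim should carry an expiry in a real store.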
Caching Strategies
Enrichment data doesn't change frequently. Caching reduces costs and improves performance.
Cache Key Design
// Person enrichment - email is the primary key
const personKey = `enrich:person:${email.toLowerCase()}`;
// Company enrichment - domain is the primary key
const companyKey = `enrich:company:${domain.toLowerCase()}`;
// Include provider if using multiple
const keyWithProvider = `enrich:person:clearbit:${email.toLowerCase()}`;
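Building keys in one place keeps the normalization consistent, so that "Jane@Acme.com " and "jane@acme.com" hit the same entry. A small sketch with a hypothetical `cacheKey` helper that follows the key layout above:

```javascript
// Hypothetical key builder: normalize the identifier so equivalent inputs share a key.
function cacheKey(type, identifier, provider) {
  const normalized = identifier.trim().toLowerCase();
  return provider
    ? `enrich:${type}:${provider}:${normalized}`
    : `enrich:${type}:${normalized}`;
}
```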
Cache-Aside Pattern
const CACHE_TTL = 86400 * 30; // 30 days
async function enrichPersonCached(email) {
const cacheKey = `enrich:person:${email.toLowerCase()}`;
// 1. Try cache first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// 2. Call API
const result = await enrichPerson(email);
// 3. Cache result (including "not found" to prevent repeated lookups)
await redis.setex(cacheKey, CACHE_TTL, JSON.stringify(result));
return result;
}
Cache Invalidation
Consider when to refresh enrichment data:
- Time-based: Refresh after 30-90 days (balance freshness vs. cost)
- Event-based: Refresh when user updates their profile
- Confidence-based: Refresh low-confidence matches sooner
- On-demand: Allow manual refresh when data seems stale
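async function getEnrichedPerson(email, forceRefresh = false) {
  const existing = await getFromCache(email);
  if (!forceRefresh && existing) {
    // enriched_at may be an ISO string after a JSON round trip, so parse it explicitly
    const age = Date.now() - new Date(existing.enriched_at).getTime();
    const maxAge = existing.confidence > 0.9
      ? 90 * 24 * 60 * 60 * 1000 // 90 days for high confidence
      : 30 * 24 * 60 * 60 * 1000; // 30 days for lower confidence
    if (age < maxAge) {
      return existing;
    }
  }
  // Stale, missing, or forced: call the API directly and overwrite the cache entry.
  // (Calling enrichPersonCached here would just return the stale cached value.)
  const fresh = await enrichPerson(email);
  // The cache TTL must be at least the longest maxAge, or entries expire before the age check matters
  await redis.setex(`enrich:person:${email.toLowerCase()}`, 90 * 86400, JSON.stringify(fresh));
  return fresh;
}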
Testing & Monitoring
Unit Testing with Mocks
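import { enrichPerson } from './enrichment';
// Assumes './enrichment-client' has a default export exposing an enrich() method
import enrichmentClient from './enrichment-client';
import { mockEnrichmentResponse } from './fixtures';
jest.mock('./enrichment-client');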
describe('enrichPerson', () => {
it('returns enriched data for valid email', async () => {
enrichmentClient.enrich.mockResolvedValue(mockEnrichmentResponse);
const result = await enrichPerson('[email protected]');
expect(result.name.full).toBe('Jane Doe');
expect(result.confidence).toBeGreaterThan(0.8);
});
it('handles rate limit with retry', async () => {
enrichmentClient.enrich
.mockRejectedValueOnce({ status: 429, retryAfter: 1 })
.mockResolvedValue(mockEnrichmentResponse);
const result = await enrichPerson('[email protected]');
expect(enrichmentClient.enrich).toHaveBeenCalledTimes(2);
expect(result.name.full).toBe('Jane Doe');
});
it('returns null for not found', async () => {
enrichmentClient.enrich.mockRejectedValue({ status: 404 });
const result = await enrichPerson('[email protected]');
expect(result).toBeNull();
});
});
Recording API Responses
Use VCR-style recording for integration tests:
import nock from 'nock';
// Record mode: capture real API responses
nock.recorder.rec({ output_objects: true });
// Playback mode: use recorded responses
const scope = nock('https://api.enrichment.com')
.get('/v1/[email protected]')
.reply(200, recordedResponse);
Key Metrics to Monitor
- Enrichment rate: % of records successfully enriched
- Match quality: Average confidence score
- API latency: P50, P95, P99 response times
- Error rate: % of requests that fail
- Rate limit hits: How often you're throttled
- Cache hit rate: % of requests served from cache
- Cost per enrichment: Track spend vs. budget
const { Counter, Histogram } = require('prom-client');
const enrichmentTotal = new Counter({
name: 'enrichment_requests_total',
help: 'Total enrichment requests',
labelNames: ['provider', 'status', 'type']
});
const enrichmentLatency = new Histogram({
name: 'enrichment_latency_seconds',
help: 'Enrichment request latency',
labelNames: ['provider'],
buckets: [0.1, 0.5, 1, 2, 5]
});
const enrichmentConfidence = new Histogram({
name: 'enrichment_confidence',
help: 'Distribution of enrichment confidence scores',
buckets: [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 1.0]
});
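Several of the metrics listed above are ratios derived from raw counts. The dependency-free sketch below shows one way to compute them; the outcome names and the exact definitions (for example, enrichment rate as matches over matches plus not-found) are assumptions and should be adapted to your own reporting.

```javascript
// Tally outcomes and derive enrichment, error, and cache hit rates from them.
function createEnrichmentStats() {
  const counts = { matched: 0, notFound: 0, failed: 0, rateLimited: 0, cacheHits: 0 };
  const ratio = (n, d) => (d === 0 ? 0 : n / d);
  return {
    record(outcome) {
      if (!Object.prototype.hasOwnProperty.call(counts, outcome)) {
        throw new Error(`Unknown outcome: ${outcome}`);
      }
      counts[outcome]++;
    },
    summary() {
      const apiCalls = counts.matched + counts.notFound + counts.failed + counts.rateLimited;
      return {
        enrichmentRate: ratio(counts.matched, counts.matched + counts.notFound),
        errorRate: ratio(counts.failed + counts.rateLimited, apiCalls),
        cacheHitRate: ratio(counts.cacheHits, apiCalls + counts.cacheHits),
      };
    },
  };
}
```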
Provider-Specific Notes
Clearbit
See the official Clearbit API documentation for complete implementation details.
- Auth: Bearer token in Authorization header
- Rate limit: Varies by plan (check your API dashboard for current limits)
- Gotcha: Returns 202 for async lookups, poll for results or use webhooks
- Best for: Real-time enrichment with streaming updates
ZoomInfo
Refer to the ZoomInfo Developer Portal for authentication flows and endpoint specifications.
- Auth: OAuth 2.0 with JWT tokens (see authentication docs)
- Rate limit: Plan-dependent (contact ZoomInfo for your specific limits)
- Gotcha: Complex token refresh flow, check developer docs for token expiration details
- Best for: High-quality B2B data, large enterprises
Apollo
The Apollo API documentation provides full endpoint references and usage examples.
- Auth: API key in header (see authentication guide)
- Rate limit: Check current rate limits in the API docs (varies by plan)
- Gotcha: Credit-based pricing, some endpoints cost more
- Best for: Sales prospecting, bulk enrichment
Hunter
- Auth: API key as query parameter or header
- Rate limit: Plan-dependent (25-500/minute)
- Gotcha: Email verification separate from enrichment
- Best for: Email discovery and verification
Lusha
- Auth: API key in header
- Rate limit: Contact support for limits
- Gotcha: Direct dial phone numbers have different pricing
- Best for: Contact phone numbers, especially mobile
Frequently Asked Questions
What authentication methods do enrichment APIs use?
Most enrichment APIs use API key authentication via headers (Authorization: Bearer or X-API-Key). Some enterprise providers also support OAuth 2.0 for more granular access control. Always use HTTPS and never expose API keys in client-side code.
How should I handle rate limits in enrichment APIs?
Implement exponential backoff with jitter when you receive 429 responses. Most APIs return rate limit headers (X-RateLimit-Remaining, X-RateLimit-Reset) that you can use to proactively throttle requests. For bulk operations, use batch endpoints or implement a queue system.
What's the difference between synchronous and asynchronous enrichment APIs?
Synchronous APIs return enriched data immediately in the response, ideal for real-time lookups. Asynchronous APIs accept requests and deliver results via webhooks or polling, better for bulk operations. Many providers offer both modes depending on the endpoint.
How do I test enrichment API integrations?
Use sandbox environments when available. Create mock responses for unit tests. Use rate-limited test accounts for integration tests. Always test error scenarios including timeouts, rate limits, and malformed responses. Consider using tools like VCR to record and replay API responses.
About the Author
Rome Thorndike is the founder of Verum, where he helps B2B companies clean, enrich, and maintain their CRM data. With over 10 years of experience in data at Microsoft, Databricks, and Salesforce, Rome has seen firsthand how data quality impacts revenue operations.