Rate limit errors are the most predictable integration failures in MarTech development, and the ones most commonly handled incorrectly. “Retry on 429” is not a rate limiting strategy; it is a reactive response to having already failed. The integrations that survive production load implement rate limiting as a client-side constraint, not as an error handler.
This guide covers the rate limiting architectures that actually hold up, with implementation patterns for the MarTech platforms that developers integrate most frequently.
Why Naive Retry Handling Fails
The standard bad implementation looks like this:
import time
import requests

def call_api(url, params):
    response = requests.get(url, params=params)
    if response.status_code == 429:
        time.sleep(int(response.headers.get('Retry-After', 60)))
        return call_api(url, params)  # Recursive retry
    return response
This fails in several ways:
It sleeps the calling thread. For any concurrent workload, sleeping the thread blocks other work that could be proceeding.
It does not handle cascading retries. When multiple concurrent calls hit the rate limit simultaneously, they all retry with the same Retry-After delay, then all fire simultaneously again, creating a thundering herd.
It does not prevent the limit from being hit. The approach reacts to 429s rather than preventing them. In the time between request bursts and the rate limiter’s response, you may have already failed dozens of calls.
It creates unbounded recursion. Every 429 adds another stack frame, so sustained rate limiting during a retry cycle can exhaust Python's recursion limit and raise RecursionError in high-volume scenarios.
The Token Bucket Implementation
The token bucket algorithm is the correct client-side implementation for rate limiting. Tokens are added to the bucket at the rate allowed by the API. Each request consumes a token. When the bucket is empty, the request waits for a token before executing.
import threading
import time

class TokenBucketRateLimiter:
    def __init__(self, rate_per_second, burst_capacity=None):
        """
        rate_per_second: Sustained rate allowed
        burst_capacity: Max tokens that can accumulate (defaults to rate_per_second)
        """
        self.rate = rate_per_second
        self.capacity = burst_capacity or rate_per_second
        self.tokens = self.capacity
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        # Caller must hold self.lock
        now = time.monotonic()
        elapsed = now - self.last_refill
        refill_amount = elapsed * self.rate
        self.tokens = min(self.capacity, self.tokens + refill_amount)
        self.last_refill = now

    def acquire(self, timeout=None):
        """Block until a token is available, or timeout expires."""
        # Compare against None so that timeout=0 means "do not wait"
        deadline = time.monotonic() + timeout if timeout is not None else None
        while True:
            with self.lock:
                self._refill()
                if self.tokens >= 1:
                    self.tokens -= 1
                    return True
                wait_time = (1 - self.tokens) / self.rate
            if deadline is not None and time.monotonic() + wait_time > deadline:
                return False  # Would exceed timeout
            # Sleep outside the lock so other threads can still refill and acquire
            time.sleep(min(wait_time, 0.1))  # Check frequently
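import time
import requests

class HubSpotClient:
    # HubSpot allows 100 requests per 10 seconds = 10/second.
    # The limiter is a class attribute, so all instances in this process share it.
    # A full 100-token bucket plus refill can exceed 100 calls in the first
    # 10-second window; lower burst_capacity if the limit is enforced strictly.
    rate_limiter = TokenBucketRateLimiter(rate_per_second=10, burst_capacity=100)
    MAX_RETRIES = 5

    def __init__(self, token):
        self.token = token
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        })

    def request(self, method, url, **kwargs):
        # Bounded loop instead of recursion, so sustained 429s cannot grow the stack
        for attempt in range(self.MAX_RETRIES + 1):
            self.rate_limiter.acquire()
            response = self.session.request(method, url, **kwargs)
            # Even with rate limiting, occasional 429s can occur
            if response.status_code != 429:
                response.raise_for_status()
                return response
            if attempt < self.MAX_RETRIES:
                # Assumes Retry-After is given in seconds
                retry_after = int(response.headers.get('Retry-After', 10))
                time.sleep(retry_after)
        # Retries exhausted: surface the final 429 as an HTTPError
        response.raise_for_status()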
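The client above reads Retry-After as an integer number of seconds. The HTTP specification also allows the header to carry an HTTP-date, so a more defensive parser is sketched below. The function name `parse_retry_after` and its `default` and `now` parameters are illustrative assumptions, not part of any library.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, default=10.0, now=None):
    """Return the number of seconds to wait for a Retry-After header value.

    Accepts delta-seconds ("120") or an HTTP-date. Falls back to `default`
    when the header is missing or cannot be parsed.
    """
    if value is None:
        return default
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        target = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError, newer ones ValueError
        return default
    if target.tzinfo is None:
        # Treat a date without zone information as UTC
        target = target.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means "retry immediately", never a negative sleep
    return max(0.0, (target - now).total_seconds())
```

In the client, this would replace the `int(...)` conversion, for example `time.sleep(parse_retry_after(response.headers.get('Retry-After')))`.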
Adaptive Rate Limiting
Static rate limiters calibrated to the documented API limit leave performance on the table. Most APIs have burst allowances that allow higher rates for short periods. An adaptive rate limiter adjusts based on actual API feedback:
import threading

class AdaptiveRateLimiter:
    # Tracks the target rate only; something else must enforce it per request
    def __init__(self, initial_rate, min_rate, max_rate):
        self.current_rate = initial_rate
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.success_streak = 0
        self.backoff_factor = 0.5
        self.recovery_factor = 1.1
        self.lock = threading.Lock()

    def on_success(self):
        with self.lock:
            self.success_streak += 1
            # Gradually increase rate after consecutive successes
            if self.success_streak >= 10:
                self.current_rate = min(
                    self.max_rate,
                    self.current_rate * self.recovery_factor
                )
                self.success_streak = 0

    def on_rate_limited(self):
        with self.lock:
            self.success_streak = 0
            # Halve the rate on a 429, but never drop below min_rate
            self.current_rate = max(
                self.min_rate,
                self.current_rate * self.backoff_factor
            )

    def get_rate(self):
        with self.lock:
            return self.current_rate
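The adaptive limiter above tracks a rate but does not block callers, while the token bucket blocks callers at a fixed rate. One way to combine the two is sketched below. The class name `AdaptiveTokenBucket`, the one-second burst capacity, and the injectable `clock` and `sleep` parameters are assumptions made for illustration and testability.

```python
import threading
import time


class AdaptiveTokenBucket:
    """Token bucket whose refill rate moves with API feedback (illustrative)."""

    def __init__(self, initial_rate, min_rate, max_rate,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = initial_rate
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.tokens = 1.0
        self.success_streak = 0
        self._clock = clock
        self._sleep = sleep
        self._last_refill = clock()
        self._lock = threading.Lock()

    def acquire(self):
        """Block until one token is available at the current rate."""
        while True:
            with self._lock:
                now = self._clock()
                # Capacity is one second of tokens at the current rate (assumption)
                capacity = max(1.0, self.rate)
                earned = (now - self._last_refill) * self.rate
                self.tokens = min(capacity, self.tokens + earned)
                self._last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            # Sleep outside the lock so other threads are not blocked
            self._sleep(wait)

    def on_success(self):
        with self._lock:
            self.success_streak += 1
            if self.success_streak >= 10:
                # Same 10% recovery step as the limiter above
                self.rate = min(self.max_rate, self.rate * 1.1)
                self.success_streak = 0

    def on_rate_limited(self):
        with self._lock:
            self.success_streak = 0
            # Halve the rate and discard any saved-up burst
            self.rate = max(self.min_rate, self.rate * 0.5)
            self.tokens = 0.0
```

A caller would invoke `acquire()` before each request, then `on_success()` or `on_rate_limited()` depending on whether the response was a 429.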
Queue-Based Architecture for High-Volume Sync
For bulk synchronization jobs — importing contacts, syncing product catalogs, updating records — a queue-based architecture separates task production from task execution and naturally rate-limits by controlling worker count and inter-task delay:
import queue
import threading
from dataclasses import dataclass

class RateLimitError(Exception):
    """Raised by the API client when the server responds with 429."""

@dataclass
class Task:
    id: str
    payload: dict
    retry_count: int = 0
    max_retries: int = 3

class RateLimitedWorkerPool:
    def __init__(self, worker_count, rate_limiter, api_client):
        # rate_limiter must provide acquire(), on_success() and on_rate_limited()
        self.task_queue = queue.Queue(maxsize=10000)
        self.dead_letter_queue = queue.Queue()
        self.rate_limiter = rate_limiter
        self.api_client = api_client
        self.workers = []
        self._start_workers(worker_count)

    def _start_workers(self, count):
        for i in range(count):
            t = threading.Thread(target=self._worker, daemon=True)
            t.start()
            self.workers.append(t)

    def _worker(self):
        while True:
            task = self.task_queue.get()
            try:
                self.rate_limiter.acquire()
                self.api_client.process(task.payload)
                self.rate_limiter.on_success()
            except RateLimitError:
                self.rate_limiter.on_rate_limited()
                if task.retry_count < task.max_retries:
                    task.retry_count += 1
                    try:
                        # Non-blocking: a blocking put on a full queue could
                        # stall every worker at once
                        self.task_queue.put_nowait(task)  # Re-queue
                    except queue.Full:
                        self.dead_letter_queue.put(task)
                else:
                    self.dead_letter_queue.put(task)
            except Exception:
                self.dead_letter_queue.put(task)
            finally:
                self.task_queue.task_done()

    def submit(self, task):
        self.task_queue.put(task)

    def wait_completion(self):
        self.task_queue.join()
Platform-Specific Rate Limits and Patterns
HubSpot
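HubSpot’s limits: 100 requests per 10 seconds, 40,000 requests per day. Both limits are per portal, shared across all apps. These figures vary by subscription tier and app type, so confirm them against HubSpot’s current API usage guidelines. The daily limit is the binding constraint for large sync operations.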
The key optimization: HubSpot’s batch endpoints accept up to 100 records per request and count as a single API call. A batch upsert of 100 contacts consumes 1 API call instead of 100. At maximum throughput with batch operations:
100 requests/10 seconds × 100 records/request = 1,000 records/second
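For a portal with 100,000 contacts, a full sync takes 1,000 batch calls, or approximately 100 seconds at the burst limit, which is well within the daily limit. Without batch operations, the same sync needs 100,000 calls: 10,000 seconds at 10 records/second, and more than the 40,000-request daily limit allows in a single day.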
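The arithmetic above can be checked with a few lines of Python. The helper names `chunked` and `estimate_sync` are hypothetical, and the defaults simply restate the figures quoted in this section (100 records per batch, 10 requests per second).

```python
import math


def chunked(records, size=100):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def estimate_sync(record_count, batch_size=100, requests_per_second=10):
    """Return (api_calls, seconds) for a full sync at the burst limit."""
    calls = math.ceil(record_count / batch_size)
    return calls, calls / requests_per_second


batched = estimate_sync(100_000)                  # batch endpoints
unbatched = estimate_sync(100_000, batch_size=1)  # one record per call
print(batched, unbatched)
```

Each slice produced by `chunked` would become the body of one batch request, so the number of slices is also the number of API calls counted against the daily limit.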
Salesforce
Salesforce limits are edition-dependent and also scale with license count, so check the allocation reported for your own org. The key limits:
- API requests: 1,000,000/24 hours (Enterprise/Unlimited) or 15,000 + 2,000 per licensed user (lower editions)
- Concurrent API requests: 25 long-running requests (those lasting 20 seconds or longer)
- Bulk API limits: separate limits, much higher — designed for mass data operations
For Salesforce, the Bulk API 2.0 is the right tool for mass data operations. It uses a job-based model: create a job, upload the data as CSV, mark the upload complete, then poll for results. The daily limit for Bulk API operations is measured in records processed, not API calls:
def create_bulk_upsert_job(client, object_type, external_id_field):
    # Create a Bulk API 2.0 ingest job and return its id
    response = client.post('/services/data/v58.0/jobs/ingest', json={
        'operation': 'upsert',
        'object': object_type,
        'externalIdFieldName': external_id_field,
        'contentType': 'CSV',
        'lineEnding': 'LF'
    })
    return response.json()['id']
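The remaining steps of the job lifecycle (upload, close, poll) might look like the sketch below. It assumes the same kind of `client` wrapper as the function above, with hypothetical `put`, `patch` and `get` methods that take a path relative to the instance URL. The function names, polling interval and poll cap are illustrative choices.

```python
import time

INGEST_PATH = '/services/data/v58.0/jobs/ingest'


def upload_and_close(client, job_id, csv_data):
    """Upload CSV data to an open ingest job, then mark the upload complete."""
    client.put(
        f'{INGEST_PATH}/{job_id}/batches',
        data=csv_data,
        headers={'Content-Type': 'text/csv'},
    )
    # Setting the state to UploadComplete tells Salesforce to start processing
    client.patch(f'{INGEST_PATH}/{job_id}', json={'state': 'UploadComplete'})


def wait_for_job(client, job_id, poll_seconds=5.0, max_polls=120,
                 sleep=time.sleep):
    """Poll the job until it reaches a terminal state and return that state."""
    for _ in range(max_polls):
        state = client.get(f'{INGEST_PATH}/{job_id}').json()['state']
        if state in ('JobComplete', 'Failed', 'Aborted'):
            return state
        sleep(poll_seconds)
    raise TimeoutError(f'job {job_id} did not finish after {max_polls} polls')
```

Polling also consumes regular API calls, so a longer `poll_seconds` is usually the better trade for large jobs.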
Mailchimp
Mailchimp’s API limit: 10 simultaneous connections per API key, no stated per-second limit, but documented rate limit responses occur at high request rates. For list operations, the Batch API accepts up to 500 operations per call:
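import hashlib
import json

def batch_subscribe(client, list_id, members):
    """Submit up to 500 member subscribe/update operations in one call."""
    operations = []
    for m in members[:500]:  # Anything past 500 is not sent; chunk larger lists
        # Mailchimp addresses a member by the MD5 hash of the lowercased email
        subscriber_hash = hashlib.md5(m['email'].lower().encode()).hexdigest()
        operations.append({
            'method': 'PUT',
            'path': f'/lists/{list_id}/members/{subscriber_hash}',
            'body': json.dumps({
                'email_address': m['email'],
                'status_if_new': 'subscribed',
                # Merge fields belong under their own key, not at the top level
                'merge_fields': m.get('merge_fields', {}),
            }),
        })
    return client.post('/batches', json={'operations': operations})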
Google Analytics (Measurement Protocol)
GA4’s Measurement Protocol for server-side event collection has no enforced rate limit for standard use. However, the debug endpoint (/debug/mp/collect) has a 20-hit limit — use it only for validation, not for load testing production volumes.
For GA4 Reporting API reads, the default quota is 10 concurrent requests and 10,000 requests per day per property. Caching API responses is essential for dashboard applications that query GA4 data frequently.
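A small time-to-live cache is often enough to keep a dashboard from repeating the same report query. The sketch below is not tied to any GA4 client library: `TTLCache`, `get_or_fetch` and the `fetch` callable are hypothetical names, and the cache is not thread-safe as written.

```python
import time


class TTLCache:
    """Cache values for a fixed number of seconds (single-threaded sketch)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for `key`, calling `fetch()` only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = fetch()
        self._entries[key] = (now + self.ttl, value)
        return value
```

The key should capture everything that changes the report (property, date range, dimensions, metrics), for example a tuple of those values, and `fetch` would wrap the actual reporting call.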
Frequently Asked Questions
How do we handle rate limits across multiple MarTech tools running simultaneously?
Each tool has its own independent rate limit. The challenge is that your application may be calling multiple APIs concurrently from the same worker pool. Use separate rate limiters per API destination — a shared rate limiter across different APIs produces artificial throttling on API A because API B hit its limit.
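One way to keep limiters separate is a registry keyed by API hostname, sketched below. `LimiterRegistry` and its `factories` argument are hypothetical, and the limiter objects themselves can be any class that exposes `acquire()`.

```python
import threading
from urllib.parse import urlsplit


class LimiterRegistry:
    """Hand out one limiter per API hostname, created lazily on first use."""

    def __init__(self, factories):
        # factories: hostname -> zero-argument callable that builds a limiter
        self._factories = factories
        self._limiters = {}
        self._lock = threading.Lock()

    def for_url(self, url):
        """Return the limiter for the host in `url` (KeyError if unconfigured)."""
        host = urlsplit(url).hostname
        with self._lock:
            if host not in self._limiters:
                self._limiters[host] = self._factories[host]()
            return self._limiters[host]
```

A worker would call `registry.for_url(url).acquire()` before each request, so a slowdown on one API never delays calls to another. Raising `KeyError` for an unconfigured host is a deliberate choice here: it surfaces a missing limit instead of silently sending unthrottled traffic.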
Should we store failed API calls for later replay, and how?
Yes — a dead letter queue (DLQ) pattern is essential for production integrations. Failed calls are written to a persistent queue (SQS, Redis list, database table) with the original payload, error details, and retry count. A separate process reads from the DLQ and retries at a controlled rate. This decouples failure handling from the main sync flow and prevents data loss.
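As a minimal sketch of the database-table variant, the class below stores failed calls in SQLite and replays them through a handler. The table layout and the names `SqliteDeadLetterQueue`, `put`, `replay` and `depth` are assumptions for illustration. A production version would also need access from more than one thread or process, plus a cap on replay attempts.

```python
import json
import sqlite3


class SqliteDeadLetterQueue:
    """Persist failed API calls so a separate process can replay them."""

    def __init__(self, path=':memory:'):
        self.db = sqlite3.connect(path)
        with self.db:
            self.db.execute(
                'CREATE TABLE IF NOT EXISTS dlq ('
                'id INTEGER PRIMARY KEY AUTOINCREMENT, '
                'payload TEXT NOT NULL, '
                'error TEXT NOT NULL, '
                'retry_count INTEGER NOT NULL)'
            )

    def put(self, payload, error, retry_count):
        with self.db:  # Commits on success, rolls back on exception
            self.db.execute(
                'INSERT INTO dlq (payload, error, retry_count) VALUES (?, ?, ?)',
                (json.dumps(payload), str(error), retry_count),
            )

    def replay(self, handler, limit=100):
        """Run `handler(payload)` on the oldest entries; return how many succeeded."""
        rows = self.db.execute(
            'SELECT id, payload FROM dlq ORDER BY id LIMIT ?', (limit,)
        ).fetchall()
        succeeded = 0
        for row_id, payload in rows:
            try:
                handler(json.loads(payload))
            except Exception as exc:
                # Keep the entry, record the latest error and bump the count
                with self.db:
                    self.db.execute(
                        'UPDATE dlq SET retry_count = retry_count + 1, error = ? '
                        'WHERE id = ?',
                        (str(exc), row_id),
                    )
            else:
                with self.db:
                    self.db.execute('DELETE FROM dlq WHERE id = ?', (row_id,))
                succeeded += 1
        return succeeded

    def depth(self):
        return self.db.execute('SELECT COUNT(*) FROM dlq').fetchone()[0]
```

The `limit` argument is what keeps the replay rate controlled: a scheduler can call `replay` with a small limit at a fixed interval instead of draining the table at once.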
What is exponential backoff and when should we use it?
Exponential backoff increases the wait time between retries exponentially after each failure: first retry after 1s, second after 2s, third after 4s, and so on. Add jitter (a random offset) to prevent synchronized retries from multiple workers. Use exponential backoff for transient failures (network errors, 503s) rather than rate limits — rate limit retry timing should follow the Retry-After header, not exponential backoff.
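The schedule described above (1s, 2s, 4s, plus a random offset) can be written in a few lines. The function names, the 60-second cap and the half-second jitter range are illustrative assumptions, as is the `is_transient` hook for deciding which exceptions are worth retrying.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=0.5, rng=random):
    """Delay before retry number `attempt` (0-based): base * 2^attempt + jitter."""
    exponential = min(cap, base * (2 ** attempt))
    return exponential + rng.uniform(0, jitter)


def call_with_backoff(func, max_attempts=5, is_transient=lambda exc: True,
                      sleep=time.sleep, rng=random):
    """Call `func()` and retry transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            # Give up on the last attempt or on errors that will not heal
            if attempt == max_attempts - 1 or not is_transient(exc):
                raise
            sleep(backoff_delay(attempt, rng=rng))
```

For a 429, the caller would skip this helper and wait for the duration given in Retry-After instead, as described above.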
How do we test rate limit handling without triggering actual rate limits?
Build a mock API server that enforces rate limits for local testing. The mock server returns 429 responses with Retry-After headers after exceeding the simulated limit. This allows testing of rate limit handling logic without consuming actual API quota or requiring network connectivity.
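For unit tests, an in-process fake can stand in for the mock server and keeps the tests deterministic. The sketch below assumes a fixed-window limit; the class name `FakeRateLimitedAPI`, its `(status, headers)` return shape and the injectable `clock` are all hypothetical. A real mock server would wrap the same logic in an HTTP handler.

```python
import math
import time


class FakeRateLimitedAPI:
    """In-process stand-in for an API that enforces a fixed-window rate limit."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def request(self):
        """Return (status_code, headers) as the simulated API would."""
        now = self._clock()
        if now - self._window_start >= self.window:
            # A new window begins: reset the counter
            self._window_start = now
            self._count = 0
        if self._count >= self.limit:
            remaining = self._window_start + self.window - now
            return 429, {'Retry-After': str(max(1, math.ceil(remaining)))}
        self._count += 1
        return 200, {}
```

Passing a controllable clock lets a test advance time instantly, so the retry path can be exercised without any real sleeping.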
What metrics should we monitor for rate limit health in production?
Track: 429 response rate (by API endpoint), queue depth (growing queue indicates production rate cannot keep up), retry count distribution (high retry counts indicate sustained rate pressure), and dead letter queue growth (indicates retry-exhausted failures). Alert on 429 rate exceeding 5% of requests or DLQ growing without depletion.
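The 429-rate check can be sketched with two counters per endpoint. `RateLimitMetrics` and its method names are hypothetical, and in practice these counts would usually be exported to an existing metrics system instead of being held in memory.

```python
from collections import Counter


class RateLimitMetrics:
    """Count requests and 429s per endpoint and flag endpoints over a threshold."""

    def __init__(self, alert_threshold=0.05):
        self.alert_threshold = alert_threshold  # 5% of requests, as above
        self.requests = Counter()
        self.rate_limited = Counter()

    def record(self, endpoint, status_code):
        self.requests[endpoint] += 1
        if status_code == 429:
            self.rate_limited[endpoint] += 1

    def rate_429(self, endpoint):
        """Fraction of requests to `endpoint` that were rate limited."""
        total = self.requests[endpoint]
        return self.rate_limited[endpoint] / total if total else 0.0

    def endpoints_over_threshold(self):
        """Endpoints whose 429 rate strictly exceeds the alert threshold."""
        return sorted(
            e for e in self.requests if self.rate_429(e) > self.alert_threshold
        )
```

Queue depth and dead letter queue growth need no extra bookkeeping: they can be sampled directly from the queue sizes on a timer.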


