Rate limit errors are the most predictable integration failures in MarTech development and the most commonly handled incorrectly. “Retry on 429” is not a rate limiting strategy — it is a reactive response to having already failed. The integrations that survive production load implement rate limiting as a client-side constraint, not as an error handler.

This guide covers the rate limiting architectures that actually hold up, with implementation patterns for the MarTech platforms that developers integrate most frequently.

Why Naive Retry Handling Fails

The standard bad implementation looks like this:

def call_api(url, params):
    response = requests.get(url, params=params)
    if response.status_code == 429:
        time.sleep(int(response.headers.get('Retry-After', 60)))
        return call_api(url, params)  # Recursive retry
    return response

This fails in several ways:

It sleeps the calling thread. For any concurrent workload, sleeping the thread blocks other work that could be proceeding.

It does not handle cascading retries. When multiple concurrent calls hit the rate limit simultaneously, they all retry with the same Retry-After delay, then all fire simultaneously again, creating a thundering herd.

It does not prevent the limit from being hit. The approach reacts to 429s rather than preventing them. In the time between request bursts and the rate limiter’s response, you may have already failed dozens of calls.

It creates unbounded recursion. Every retry adds a stack frame, so sustained rate limiting in high-volume scenarios can exhaust the stack (in Python, a RecursionError once the recursion limit is reached).

The Token Bucket Implementation

The token bucket algorithm is the correct client-side implementation for rate limiting. Tokens are added to the bucket at the rate allowed by the API. Each request consumes a token. When the bucket is empty, the request waits for a token before executing.

import threading
import time

import requests

class TokenBucketRateLimiter:
    def __init__(self, rate_per_second, burst_capacity=None):
        """
        rate_per_second: Sustained rate allowed
        burst_capacity: Max tokens that can accumulate (defaults to rate_per_second)
        """
        self.rate = rate_per_second
        self.capacity = burst_capacity or rate_per_second
        self.tokens = self.capacity
        self.last_refill = time.monotonic()
        self.lock = threading.Lock()

    def _refill(self):
        now = time.monotonic()
        elapsed = now - self.last_refill
        refill_amount = elapsed * self.rate
        self.tokens = min(self.capacity, self.tokens + refill_amount)
        self.last_refill = now

    def acquire(self, timeout=None):
        """Block until a token is available, or timeout expires."""
        # Compare against None so that timeout=0 means "do not wait" rather than "wait forever"
        deadline = time.monotonic() + timeout if timeout is not None else None

        while True:
            with self.lock:
                self._refill()
                if self.tokens >= 1:
                    self.tokens -= 1
                    return True
                # Time until the bucket holds one full token at the current rate
                wait_time = (1 - self.tokens) / self.rate

            if deadline is not None and time.monotonic() + wait_time > deadline:
                return False  # Would exceed timeout

            time.sleep(min(wait_time, 0.1))  # Check frequently

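class HubSpotClient:
    # HubSpot allows 100 requests per 10 seconds. A token bucket can send up to
    # burst_capacity + rate × window requests in one window, so 10 + 9 × 10 = 100
    # keeps every 10-second window within the limit.
    rate_limiter = TokenBucketRateLimiter(rate_per_second=9, burst_capacity=10)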
    
    def __init__(self, token):
        self.token = token
        self.session = requests.Session()
        self.session.headers.update({
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        })
    
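    def request(self, method, url, max_retries=3, **kwargs):
        # Bounded loop instead of recursion, so sustained 429s cannot grow the stack
        for attempt in range(max_retries + 1):
            self.rate_limiter.acquire()
            response = self.session.request(method, url, **kwargs)

            # Even with client-side limiting, occasional 429s can occur
            # (for example, when other apps share the same portal quota)
            if response.status_code != 429 or attempt == max_retries:
                break
            try:
                retry_after = float(response.headers.get('Retry-After', 10))
            except ValueError:
                retry_after = 10.0  # Retry-After may also be an HTTP date
            time.sleep(retry_after)

        response.raise_for_status()  # Raises if the final response is still a 429
        return response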

Adaptive Rate Limiting

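Static rate limiters calibrated to the documented API limit leave performance on the table. Most APIs have burst allowances that allow higher rates for short periods. An adaptive rate limiter adjusts its target rate based on actual API feedback, halving it after a 429 and raising it by 10% after every ten consecutive successes. The class below only tracks that target rate; the value returned by get_rate() still has to be applied to whatever limiter enforces it: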

class AdaptiveRateLimiter:
    def __init__(self, initial_rate, min_rate, max_rate):
        self.current_rate = initial_rate
        self.min_rate = min_rate
        self.max_rate = max_rate
        self.success_streak = 0
        self.backoff_factor = 0.5
        self.recovery_factor = 1.1
        self.lock = threading.Lock()
    
    def on_success(self):
        with self.lock:
            self.success_streak += 1
            # Gradually increase rate after consecutive successes
            if self.success_streak >= 10:
                self.current_rate = min(
                    self.max_rate,
                    self.current_rate * self.recovery_factor
                )
                self.success_streak = 0
    
    def on_rate_limited(self):
        with self.lock:
            self.success_streak = 0
            self.current_rate = max(
                self.min_rate,
                self.current_rate * self.backoff_factor
            )
    
    def get_rate(self):
        return self.current_rate
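
One way to enforce the adaptive rate is to fold the feedback hooks into a token bucket, so that a single object offers acquire(), on_success(), and on_rate_limited(). The sketch below is an assumption about how the two pieces could be combined, not a prescribed design; the class name AdaptiveTokenBucket and its clock and sleep parameters are hypothetical and exist only to make the example testable.

```python
import threading
import time


class AdaptiveTokenBucket:
    """Token bucket whose refill rate follows API feedback (illustrative sketch)."""

    def __init__(self, initial_rate, min_rate, max_rate, capacity=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.rate = initial_rate          # current refill rate, tokens per second
        self.min_rate = min_rate          # must be > 0 so waits stay finite
        self.max_rate = max_rate
        self.capacity = capacity
        self.tokens = capacity
        self.success_streak = 0
        self._clock = clock
        self._sleep = sleep
        self._last_refill = clock()
        self._lock = threading.Lock()

    def acquire(self):
        """Block until one token is available at the current rate."""
        while True:
            with self._lock:
                now = self._clock()
                elapsed = now - self._last_refill
                self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
                self._last_refill = now
                if self.tokens >= 1:
                    self.tokens -= 1
                    return
                wait = (1 - self.tokens) / self.rate
            self._sleep(wait)  # sleep outside the lock so other threads can proceed

    def on_success(self):
        """Raise the rate by 10% after ten consecutive successes."""
        with self._lock:
            self.success_streak += 1
            if self.success_streak >= 10:
                self.rate = min(self.max_rate, self.rate * 1.1)
                self.success_streak = 0

    def on_rate_limited(self):
        """Halve the rate after a 429, never dropping below min_rate."""
        with self._lock:
            self.success_streak = 0
            self.rate = max(self.min_rate, self.rate * 0.5)
```

A limiter of this shape can be handed to any caller that expects both the blocking acquire() call and the two feedback hooks.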

Queue-Based Architecture for High-Volume Sync

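For bulk synchronization jobs (importing contacts, syncing product catalogs, updating records), a queue-based architecture separates task production from task execution. Throughput is bounded by the worker count and by a rate limiter shared across the workers. The pool below expects a limiter that exposes both acquire() and the on_success()/on_rate_limited() feedback hooks, that is, a token bucket combined with the adaptive logic above: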

import queue
import threading
from dataclasses import dataclass


class RateLimitError(Exception):
    """Raised by the API client when the server responds with HTTP 429."""

@dataclass
class Task:
    id: str
    payload: dict
    retry_count: int = 0
    max_retries: int = 3

class RateLimitedWorkerPool:
    def __init__(self, worker_count, rate_limiter, api_client):
        self.task_queue = queue.Queue(maxsize=10000)
        self.dead_letter_queue = queue.Queue()
        self.rate_limiter = rate_limiter
        self.api_client = api_client
        self.workers = []
        self._start_workers(worker_count)
    
    def _start_workers(self, count):
        for i in range(count):
            t = threading.Thread(target=self._worker, daemon=True)
            t.start()
            self.workers.append(t)
    
    def _worker(self):
        while True:
            task = self.task_queue.get()
            
            try:
                self.rate_limiter.acquire()
                result = self.api_client.process(task.payload)
                self.rate_limiter.on_success()
                
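            except RateLimitError:
                self.rate_limiter.on_rate_limited()

                if task.retry_count < task.max_retries:
                    task.retry_count += 1
                    try:
                        # Non-blocking: a blocking put on a full queue could deadlock the workers
                        self.task_queue.put_nowait(task)  # Re-queue
                    except queue.Full:
                        self.dead_letter_queue.put(task)
                else:
                    self.dead_letter_queue.put(task)

            except Exception:
                # Non-rate-limit failures are not retried here; park them for inspection
                self.dead_letter_queue.put(task)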
                
            finally:
                self.task_queue.task_done()
    
    def submit(self, task):
        self.task_queue.put(task)
    
    def wait_completion(self):
        self.task_queue.join()

Platform-Specific Rate Limits and Patterns

HubSpot

HubSpot’s limits: 100 requests per 10 seconds, 40,000 requests per day. Both limits are per portal, shared across all apps. The daily limit is the binding constraint for large sync operations.

The key optimization: HubSpot’s batch endpoints accept up to 100 records per request and count as a single API call. A batch upsert of 100 contacts consumes 1 API call instead of 100. At maximum throughput with batch operations:

100 requests/10 seconds × 100 records/request = 1,000 records/second

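For a portal with 100,000 contacts, a full sync needs 1,000 batch calls and takes approximately 100 seconds, well within the daily limit. Without batch operations, the same sync would need 100,000 calls at 10 records/second: 10,000 seconds of runtime and more than double the 40,000-request daily limit, so it could not finish in a single day.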
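
The arithmetic above can be checked with a short sketch. The helper names chunked and estimate_sync are hypothetical, and the defaults simply restate the limits quoted in this section.

```python
def chunked(records, size=100):
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def estimate_sync(record_count, batch_size=100,
                  requests_per_window=100, window_seconds=10):
    """Return (api_calls, seconds) for a sync running at the full request rate."""
    calls = -(-record_count // batch_size)  # ceiling division
    seconds = calls / (requests_per_window / window_seconds)
    return calls, seconds


contacts = [{"email": f"user{i}@example.com"} for i in range(250)]
print([len(batch) for batch in chunked(contacts)])  # [100, 100, 50]

print(estimate_sync(100_000))                # (1000, 100.0)   with batching
print(estimate_sync(100_000, batch_size=1))  # (100000, 10000.0) without
```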

Salesforce

Salesforce limits are edition-dependent. The key limits:

  • API requests: 1,000,000/24 hours (Enterprise/Unlimited) or 15,000 + 2,000 per licensed user (lower editions)
  • Concurrent API requests: 25 (long-running requests)
  • Bulk API limits: separate limits, much higher — designed for mass data operations

For Salesforce, the Bulk API 2.0 is the right tool for mass data operations. It uses a job-based model — create a job, upload data in batches, close the job, poll for results. The daily limit for Bulk API operations is measured in records processed, not API calls:

def create_bulk_upsert_job(client, object_type, external_id_field):
    response = client.post('/services/data/v58.0/jobs/ingest', json={
        'operation': 'upsert',
        'object': object_type,
        'externalIdFieldName': external_id_field,
        'contentType': 'CSV',
        'lineEnding': 'LF'
    })
    return response.json()['id']
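
The remaining steps of the job lifecycle (upload the CSV, mark the upload complete, poll for a terminal state) might look like the sketch below. It assumes the same hypothetical client object as above, with requests-style put, patch, and get methods that prepend the instance URL and authentication; the function name upload_and_wait and its polling parameters are illustrative only.

```python
import time

INGEST_BASE = '/services/data/v58.0/jobs/ingest'
TERMINAL_STATES = {'JobComplete', 'Failed', 'Aborted'}


def upload_and_wait(client, job_id, csv_data, poll_interval=5.0,
                    max_polls=120, sleep=time.sleep):
    """Upload CSV data to an open ingest job, close it, and poll until it finishes."""
    # 1. Upload the CSV payload for the job
    client.put(f'{INGEST_BASE}/{job_id}/batches', data=csv_data,
               headers={'Content-Type': 'text/csv'})

    # 2. Signal that the upload is finished so processing can begin
    client.patch(f'{INGEST_BASE}/{job_id}', json={'state': 'UploadComplete'})

    # 3. Poll the job until it reaches a terminal state, with a bounded number of polls
    for _ in range(max_polls):
        info = client.get(f'{INGEST_BASE}/{job_id}').json()
        if info['state'] in TERMINAL_STATES:
            return info
        sleep(poll_interval)

    raise TimeoutError(f'Bulk job {job_id} did not finish after {max_polls} polls')
```

Each status poll is itself an API request, so a longer poll_interval is the cheaper choice for large jobs.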

Mailchimp

Mailchimp’s API limit: 10 simultaneous connections per API key. There is no stated per-second limit, but the API does return rate limit responses at high request rates. For list operations, the Batch API accepts up to 500 operations per call:

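import hashlib
import json

def batch_subscribe(client, list_id, members):
    """Submit up to 500 member subscribe/update operations in one call."""
    if len(members) > 500:
        # Fail loudly instead of silently dropping members beyond the first 500
        raise ValueError('pass at most 500 members per call')
    operations = [
        {
            'method': 'PUT',
            # Members are addressed by the MD5 hash of the lowercased email address
            'path': f'/lists/{list_id}/members/{hashlib.md5(m["email"].lower().encode()).hexdigest()}',
            # The body is a JSON string; merge fields nest under 'merge_fields'
            'body': json.dumps({'email_address': m['email'], 'status_if_new': 'subscribed',
                                'merge_fields': m.get('merge_fields', {})})
        }
        for m in members
    ]
    return client.post('/batches', json={'operations': operations})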

Google Analytics (Measurement Protocol)

GA4’s Measurement Protocol for server-side event collection has no enforced rate limit for standard use. However, the debug endpoint (/debug/mp/collect) has a 20-hit limit — use it only for validation, not for load testing production volumes.

For GA4 Reporting API reads, the default quota is 10 concurrent requests and 10,000 requests per day per property. Caching API responses is essential for dashboard applications that query GA4 data frequently.
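
A minimal time-to-live cache illustrates the idea. This is a sketch under assumptions: the class name TTLCache, the cache key, and the fetch callable are hypothetical, and a real dashboard would pick the TTL according to how fresh its data needs to be.

```python
import time


class TTLCache:
    """Cache values for a fixed number of seconds (illustrative sketch)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() only when it is missing or stale."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = fetch()
        self._entries[key] = (now, value)
        return value


cache = TTLCache(ttl_seconds=300)

def load_report():
    # Placeholder for the real reporting call; returns static data here
    return {'sessions': 1234}

# Repeated dashboard loads within five minutes reuse one API response
report = cache.get_or_fetch(('property-1', 'sessions-last-7-days'), load_report)
```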

Frequently Asked Questions

How do we handle rate limits across multiple MarTech tools running simultaneously?

Each tool has its own independent rate limit. The challenge is that your application may be calling multiple APIs concurrently from the same worker pool. Use separate rate limiters per API destination — a shared rate limiter across different APIs produces artificial throttling on API A because API B hit its limit.
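
A small registry keeps one limiter per destination so that throttling on one API never slows calls to another. The sketch below is illustrative: LimiterRegistry, StubLimiter, and the rates in RATES are hypothetical stand-ins, and in practice the factory would return a real limiter such as a token bucket.

```python
import threading


class LimiterRegistry:
    """Hands out one limiter per API destination, created lazily."""

    def __init__(self, factory):
        self._factory = factory  # callable: destination name -> limiter
        self._limiters = {}
        self._lock = threading.Lock()

    def for_destination(self, name):
        with self._lock:
            if name not in self._limiters:
                self._limiters[name] = self._factory(name)
            return self._limiters[name]


class StubLimiter:
    """Stand-in for a real limiter; it only records its configured rate."""

    def __init__(self, rate):
        self.rate = rate

    def on_rate_limited(self):
        self.rate = self.rate / 2


# Illustrative requests-per-second values, not authoritative limits
RATES = {'hubspot': 10, 'mailchimp': 5}

registry = LimiterRegistry(lambda name: StubLimiter(RATES[name]))

registry.for_destination('hubspot').on_rate_limited()
print(registry.for_destination('hubspot').rate)    # 5.0
print(registry.for_destination('mailchimp').rate)  # 5, unaffected by HubSpot's 429
```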

Should we store failed API calls for later replay, and how?

Yes — a dead letter queue (DLQ) pattern is essential for production integrations. Failed calls are written to a persistent queue (SQS, Redis list, database table) with the original payload, error details, and retry count. A separate process reads from the DLQ and retries at a controlled rate. This decouples failure handling from the main sync flow and prevents data loss.
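
As one possible shape for the database-table variant, the sketch below stores failed calls in SQLite and replays them through a handler. The class name, table name, and columns are assumptions made for illustration; a production version would add the controlled replay rate described above and use one connection per thread.

```python
import json
import sqlite3
import time


class SqliteDeadLetterQueue:
    """Persist failed API calls and replay them later (illustrative sketch)."""

    def __init__(self, path=':memory:'):
        self.conn = sqlite3.connect(path)
        with self.conn:
            self.conn.execute(
                'CREATE TABLE IF NOT EXISTS dead_letters ('
                ' id INTEGER PRIMARY KEY AUTOINCREMENT,'
                ' payload TEXT NOT NULL,'
                ' error TEXT NOT NULL,'
                ' retry_count INTEGER NOT NULL,'
                ' failed_at REAL NOT NULL)'
            )

    def put(self, payload, error, retry_count):
        """Record the original payload, the error details, and the retry count."""
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                'INSERT INTO dead_letters (payload, error, retry_count, failed_at)'
                ' VALUES (?, ?, ?, ?)',
                (json.dumps(payload), str(error), retry_count, time.time()),
            )

    def pending(self):
        return self.conn.execute('SELECT COUNT(*) FROM dead_letters').fetchone()[0]

    def replay(self, handler, limit=100):
        """Retry up to `limit` entries; delete the ones that succeed."""
        rows = self.conn.execute(
            'SELECT id, payload, retry_count FROM dead_letters ORDER BY id LIMIT ?',
            (limit,),
        ).fetchall()
        replayed = 0
        for row_id, payload, retry_count in rows:
            try:
                handler(json.loads(payload))
            except Exception as exc:
                # Keep the entry, but record the new failure for later inspection
                with self.conn:
                    self.conn.execute(
                        'UPDATE dead_letters SET retry_count = ?, error = ? WHERE id = ?',
                        (retry_count + 1, str(exc), row_id),
                    )
                continue
            with self.conn:
                self.conn.execute('DELETE FROM dead_letters WHERE id = ?', (row_id,))
            replayed += 1
        return replayed
```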

What is exponential backoff and when should we use it?

Exponential backoff increases the wait time between retries exponentially after each failure: first retry after 1s, second after 2s, third after 4s, and so on. Add jitter (a random offset) to prevent synchronized retries from multiple workers. Use exponential backoff for transient failures (network errors, 503s) rather than rate limits — rate limit retry timing should follow the Retry-After header, not exponential backoff.
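
The two retry policies can be sketched side by side. The function names and defaults below are hypothetical; the jitter here is an additive random offset of up to one second, which is only one of several common jitter schemes.

```python
import random


def backoff_delay(attempt, base=1.0, cap=60.0, jitter=1.0, rng=random.random):
    """Exponential delay for a zero-based attempt number, plus a random offset."""
    exponential = min(cap, base * 2 ** attempt)  # 1s, 2s, 4s, ... up to cap
    return exponential + rng() * jitter


def retry_delay(status_code, headers, attempt, rng=random.random):
    """Use Retry-After for rate limits, exponential backoff for everything else."""
    if status_code == 429:
        try:
            return float(headers['Retry-After'])
        except (KeyError, ValueError):
            pass  # header missing or given as an HTTP date: fall back to backoff
    return backoff_delay(attempt, rng=rng)


print(retry_delay(429, {'Retry-After': '7'}, attempt=0))  # 7.0
print(retry_delay(503, {}, attempt=2))                    # between 4.0 and 5.0
```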

How do we test rate limit handling without triggering actual rate limits?

Build a mock API server that enforces rate limits for local testing. The mock server returns 429 responses with Retry-After headers after exceeding the simulated limit. This allows testing of rate limit handling logic without consuming actual API quota or requiring network connectivity.
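
A mock of this kind fits in a few dozen lines of standard-library Python. The sketch below assumes a fixed-window limit, which is a simplification of how real APIs count requests; the names FixedWindowLimit, make_handler, and serve, and the port number, are hypothetical.

```python
import math
import time
from http.server import BaseHTTPRequestHandler, HTTPServer


class FixedWindowLimit:
    """Allow `limit` requests per window; report how long to wait otherwise."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self._limit = limit
        self._window = window_seconds
        self._clock = clock
        self._window_start = clock()
        self._count = 0

    def check(self):
        """Return (allowed, retry_after_seconds)."""
        now = self._clock()
        if now - self._window_start >= self._window:
            self._window_start = now  # start a fresh window
            self._count = 0
        if self._count < self._limit:
            self._count += 1
            return True, 0
        remaining = self._window - (now - self._window_start)
        return False, max(1, math.ceil(remaining))


def make_handler(limit):
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            allowed, retry_after = limit.check()
            if allowed:
                self.send_response(200)
                self.send_header('Content-Type', 'application/json')
                self.end_headers()
                self.wfile.write(b'{"ok": true}')
            else:
                self.send_response(429)
                self.send_header('Retry-After', str(retry_after))
                self.end_headers()

        def log_message(self, *args):
            pass  # keep test output quiet

    return Handler


def serve(port=8099, limit=5, window_seconds=10):
    """Run the mock on localhost until interrupted."""
    handler = make_handler(FixedWindowLimit(limit, window_seconds))
    HTTPServer(('127.0.0.1', port), handler).serve_forever()
```

Calling serve() in a background thread or a separate process gives integration tests a local endpoint that starts returning 429 after the sixth request in any 10-second window.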

What metrics should we monitor for rate limit health in production?

Track: 429 response rate (by API endpoint), queue depth (growing queue indicates production rate cannot keep up), retry count distribution (high retry counts indicate sustained rate pressure), and dead letter queue growth (indicates retry-exhausted failures). Alert on 429 rate exceeding 5% of requests or DLQ growing without depletion.
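
The 429-rate alert can be prototyped with in-process counters before wiring it into a real metrics system. The class and method names below are hypothetical, and the 5% default simply restates the threshold above.

```python
from collections import Counter


class RateLimitMetrics:
    """Count responses per endpoint and flag endpoints with a high 429 rate."""

    def __init__(self, alert_threshold=0.05):
        self.alert_threshold = alert_threshold
        self.total = Counter()
        self.limited = Counter()

    def record(self, endpoint, status_code):
        self.total[endpoint] += 1
        if status_code == 429:
            self.limited[endpoint] += 1

    def rate_429(self, endpoint):
        """Fraction of requests to endpoint that were rate limited."""
        if self.total[endpoint] == 0:
            return 0.0
        return self.limited[endpoint] / self.total[endpoint]

    def endpoints_over_threshold(self):
        """Endpoints whose 429 rate exceeds the alert threshold, sorted by name."""
        return sorted(e for e in self.total if self.rate_429(e) > self.alert_threshold)
```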
