Every MarTech integration needs to answer the same question: how does data move from the source system to my system? The two fundamental patterns — webhooks (push) and polling (pull) — have different latency profiles, reliability characteristics, and implementation complexity. Choosing the wrong one for the use case creates integrations that are either unnecessarily complex or functionally broken.

This guide walks through the decision framework and implementation patterns for both approaches, with specific examples from common MarTech platforms.

The Core Difference

Polling is your system periodically requesting data from the source: “Hey HubSpot, give me all contacts updated in the last 5 minutes.” Your system controls the timing. The source system does nothing until you ask.

Webhooks are the source system pushing data to your system when events occur: “Hey, a contact was just updated — here’s the data.” The source system controls the timing. Your system must be reachable and ready to receive.

The latency difference is significant. Polling at 5-minute intervals means data is at most 5 minutes stale — but on average 2.5 minutes stale. A webhook delivers the event within seconds of it occurring. For use cases where near-real-time data is important (triggered email automation, live CRM updates, immediate fraud alerts), the polling lag is often unacceptable.
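
The staleness arithmetic above can be sketched in a few lines. This is a simplified model, not a measurement: it assumes events arrive uniformly at random between polls and ignores request and processing time. The function name `polling_staleness` is made up for this example.

```python
def polling_staleness(interval_seconds: float) -> tuple[float, float]:
    """Return (worst_case, average) staleness in seconds for a fixed poll interval.

    Assumes events arrive uniformly at random between polls and ignores
    the time taken by the request itself and by downstream processing.
    """
    worst_case = interval_seconds      # event lands just after a poll finishes
    average = interval_seconds / 2     # mean of a uniform distribution on [0, interval]
    return worst_case, average


for minutes in (1, 5, 15):
    worst, avg = polling_staleness(minutes * 60)
    print(f"{minutes:>2}-minute poll: worst {worst:.0f}s, average {avg:.0f}s")
```

Halving the interval halves both figures but doubles the number of API calls, so rate limits usually set the practical floor.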

When to Use Webhooks

Webhooks are the right choice when:

Low latency is required. If your use case requires reacting within seconds of an event, polling at typical intervals cannot meet the latency requirement. Immediate triggered emails (send a welcome email as soon as a user signs up), real-time CRM updates (create a task when a deal stage changes), and live inventory adjustments all require webhook latency.

The source system supports reliable webhook delivery. Not all systems deliver webhooks reliably. A webhook system is only as good as its delivery guarantees — does it retry failed deliveries? How many times? With what backoff? A source that delivers webhooks “at most once” without retry is unreliable for important data.
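
To judge a retry policy, it helps to work out how long an outage it actually covers. The sketch below computes an exponential backoff schedule and its total span. The parameters (60-second base, doubling, 10 attempts, one-hour cap) are hypothetical and do not describe any particular platform's policy.

```python
def retry_schedule(base_seconds: float, factor: float, max_attempts: int,
                   cap_seconds: float) -> list[float]:
    """Delay before each retry: base * factor**n, capped at cap_seconds."""
    return [min(base_seconds * factor ** n, cap_seconds) for n in range(max_attempts)]


# Hypothetical policy: start at 1 minute, double each time, cap at 1 hour
schedule = retry_schedule(base_seconds=60, factor=2, max_attempts=10, cap_seconds=3600)
covered_hours = sum(schedule) / 3600

print(schedule)
print(f"Retries span about {covered_hours:.2f} hours")
```

If your endpoint can be down for longer than the span the source's real policy covers, events will be lost unless you also reconcile by polling.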

Your infrastructure can provide a reliable, publicly reachable endpoint. Your webhook receiver needs to be up, fast (respond within the source’s timeout window, usually 5–30 seconds), and publicly reachable from the internet. Local development environments require tunneling (ngrok, Cloudflare Tunnel) to receive webhooks. An endpoint that repeatedly returns errors or times out can cause some platforms to throttle or disable the subscription.

When to Use Polling

Polling is the right choice when:

The source system does not support webhooks. Many APIs provide rich query capabilities but no webhook infrastructure. In this case, polling is the only option.

Reliability is more important than latency. A polling integration that misses a 5-minute window catches up in the next poll. A webhook that fails to deliver (your endpoint was down, network error, the source didn’t retry) loses the event unless the source provides a reconciliation mechanism. For non-time-critical data synchronization, the simpler failure mode of polling is often preferable.

You need to process historical or bulk data. Webhooks deliver current events. They cannot deliver data from before the webhook subscription was created. For initial data load, historical imports, and reconciliation jobs, polling (or API-based export) is required.

The event volume is low and consistent. A webhook receiver has to be deployed, monitored, and sized so that bursts of events do not overwhelm it. For 100 CRM updates per day, polling is simpler than deploying and maintaining that infrastructure.

Implementing a Robust Webhook Receiver

A production-grade webhook receiver handles signature verification, idempotent processing, and asynchronous execution:

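from flask import Flask, request, jsonify, abort
import base64
import hmac
import hashlib
import os
import threading
import time
import redis

app = Flask(__name__)

# App secret used for signature checks. The environment variable name is an
# assumption; this fails fast at startup if it is not set.
HUBSPOT_CLIENT_SECRET = os.environ['HUBSPOT_CLIENT_SECRET']

redis_client = redis.Redis(host='localhost', port=6379, db=0)

def verify_hubspot_signature(request, secret):
    """Verify HubSpot webhook signature v3."""
    signature = request.headers.get('X-HubSpot-Signature-v3')
    timestamp = request.headers.get('X-HubSpot-Request-Timestamp')
    
    if not signature or not timestamp:
        return False
    
    # Reject malformed timestamps and requests older than 5 minutes (replay protection)
    try:
        age_ms = abs(int(time.time() * 1000) - int(timestamp))
    except ValueError:
        return False
    if age_ms > 300000:
        return False
    
    # v3 signs method + URI + body + timestamp. The HMAC-SHA256 digest is
    # base64-encoded, not hex. Behind a proxy, request.url may differ from
    # the public URL that HubSpot signed.
    body = request.get_data(as_text=True)
    source_string = f"{request.method}{request.url}{body}{timestamp}"
    digest = hmac.new(
        secret.encode(), source_string.encode(), hashlib.sha256
    ).digest()
    expected = base64.b64encode(digest).decode()
    
    return hmac.compare_digest(expected, signature)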

def is_duplicate_event(event_id):
    """Check if we've already processed this event (idempotency check)."""
    key = f"webhook:processed:{event_id}"
    # SET NX: only set if key doesn't exist. Returns True if newly set.
    return not redis_client.set(key, '1', ex=86400, nx=True)

def process_event_async(event):
    """Process event in background thread to respond quickly to webhook."""
    def _process():
        try:
            event_type = event.get('subscriptionType')
            object_id = event.get('objectId')
            
            if event_type == 'contact.propertyChange':
                handle_contact_update(object_id, event)
            elif event_type == 'deal.stageChange':
                handle_deal_stage_change(object_id, event)
        except Exception as e:
            # Log to error tracking, queue for retry if needed
            print(f"Event processing error: {e}")
    
    thread = threading.Thread(target=_process)
    thread.start()

@app.route('/webhooks/hubspot', methods=['POST'])
def hubspot_webhook():
    # 1. Verify signature
    if not verify_hubspot_signature(request, HUBSPOT_CLIENT_SECRET):
        abort(401)
    
    events = request.get_json()
    
    for event in events:
        event_id = event.get('eventId')
        
        # 2. Idempotency check
        if is_duplicate_event(event_id):
            continue  # Already processed, skip silently
        
        # 3. Respond immediately, process asynchronously
        process_event_async(event)
    
    # Return 200 quickly — don't block on processing
    return jsonify({'status': 'received'}), 200

The pattern has three non-negotiable elements: signature verification (ensure the request actually came from HubSpot), idempotency (webhooks are sometimes delivered more than once), and async processing (respond within the source’s timeout window regardless of processing time).
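
One subtlety of the idempotency element: if an event is marked as seen before it is processed and processing then fails, the source's retry will be skipped as a duplicate. A minimal sketch of one way around this is to treat the marker as a claim and release it on failure. `ClaimStore` and `handle_once` are hypothetical names, and the in-memory dictionary only stands in for a shared store such as Redis.

```python
import time


class ClaimStore:
    """In-memory stand-in for a set-if-absent key with expiry (hypothetical helper)."""

    def __init__(self):
        self._claims = {}  # event_id -> expiry time (epoch seconds)

    def claim(self, event_id, ttl_seconds=86400, now=None):
        """Return True if this call claimed the event, False if already claimed."""
        now = time.time() if now is None else now
        expires = self._claims.get(event_id)
        if expires is not None and expires > now:
            return False
        self._claims[event_id] = now + ttl_seconds
        return True

    def release(self, event_id):
        """Drop the claim so a later delivery of the same event is processed."""
        self._claims.pop(event_id, None)


def handle_once(store, event_id, handler):
    """Run handler at most once per event_id; release the claim if it fails."""
    if not store.claim(event_id):
        return "duplicate"
    try:
        handler()
    except Exception:
        store.release(event_id)
        return "failed"
    return "processed"


store = ClaimStore()
print(handle_once(store, "evt-1", lambda: None))   # first delivery
print(handle_once(store, "evt-1", lambda: None))   # redelivery of the same event
```

This sketch is synchronous. With background processing, the release would have to happen inside the worker that runs the handler, and the source must still redeliver the event (or a reconciliation poll must find it) for the release to matter.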

Implementing Reliable Polling

A robust polling implementation tracks the last-successful-sync position and handles gaps:

import time
from datetime import datetime, timezone
import json

class PollingSync:
    def __init__(self, api_client, state_store, process_record):
        self.api_client = api_client
        self.state_store = state_store
        self.process_record = process_record
    
    def get_last_sync_time(self, source):
        """Retrieve the timestamp of the last successful sync."""
        state = self.state_store.get(f'sync_state:{source}')
        if state:
            return datetime.fromisoformat(json.loads(state)['last_sync'])
        return None
    
    def save_sync_state(self, source, timestamp, last_record_id=None):
        state = {
            'last_sync': timestamp.isoformat(),
            'last_record_id': last_record_id,
            'saved_at': datetime.now(timezone.utc).isoformat()
        }
        self.state_store.set(f'sync_state:{source}', json.dumps(state))
    
    def poll_contacts(self):
        """Poll for contacts updated since last sync."""
        last_sync = self.get_last_sync_time('hubspot_contacts')
        sync_start = datetime.now(timezone.utc)
        
        if not last_sync:
            # First sync: start from midnight UTC today (widen this for a real backfill)
            last_sync = sync_start.replace(hour=0, minute=0, second=0, microsecond=0)
        
        page_token = None
        processed_count = 0
        last_record_id = None
        
        while True:
            response = self.api_client.get_contacts_updated_since(
                since=last_sync,
                page_token=page_token,
                limit=100
            )
            
            records = response.get('results', [])
            
            for record in records:
                try:
                    self.process_record(record)
                    last_record_id = record['id']
                    processed_count += 1
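                except Exception as e:
                    # Log and continue so one bad record doesn't stop the sync.
                    # Caveat: the saved position still advances past this record,
                    # so queue it for retry or it will not be fetched again.
                    print(f"Failed to process record {record.get('id')}: {e}")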
            
            page_token = response.get('paging', {}).get('next', {}).get('after')
            if not page_token:
                break  # No more pages
        
        # Save state AFTER all records are processed
        self.save_sync_state('hubspot_contacts', sync_start, last_record_id)
        return processed_count
    
    def run_continuous(self, interval_seconds=300):
        """Run polling on a fixed interval."""
        while True:
            try:
                count = self.poll_contacts()
                print(f"Synced {count} contacts")
            except Exception as e:
                print(f"Sync failed: {e}")
            
            time.sleep(interval_seconds)

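The critical discipline: save the sync position only after all records in the batch are processed. If you save first and processing fails, you have lost those records. If the sync fails before saving, the next poll re-processes the same window, which is slightly inefficient but safe as long as record processing is idempotent. Note also that the saved position is the time the sync started, not the time it finished, so records updated while the sync was running are picked up by the next poll.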
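
The effect of saving state last can be seen in a stripped-down sketch. Unlike the class above, this variant lets a processing error abort the whole cycle, which makes the behavior easy to observe. `poll_once`, the dictionary used as a state store, and the sample records are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def poll_once(fetch_since, process, state):
    """One poll cycle: the position in `state` advances only if every record succeeds."""
    sync_start = datetime.now(timezone.utc)   # captured before the query runs
    for record in fetch_since(state["last_sync"]):
        process(record)                       # an exception here leaves the position untouched
    state["last_sync"] = sync_start           # saved last, after all records succeeded


state = {"last_sync": None}
records = [{"id": 1}, {"id": 2}]
attempts = {"count": 0}
processed = []


def process(record):
    attempts["count"] += 1
    if attempts["count"] == 2:                # simulate one transient failure
        raise RuntimeError("transient failure")
    processed.append(record["id"])


try:
    poll_once(lambda since: records, process, state)
except RuntimeError:
    pass                                      # position was not saved

poll_once(lambda since: records, process, state)   # retries the same window
print(processed, state["last_sync"] is not None)
```

Record 1 is handled twice across the two cycles, which is why the processing function has to tolerate repeats.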

Combining Both Patterns: The Hybrid Approach

The most robust integrations use webhooks for real-time updates and polling as a reconciliation mechanism:

  • Webhooks deliver events within seconds for real-time use cases
  • Periodic polling rescans the same time window to catch any events that webhook delivery missed

The combination handles webhook delivery failures, records that changed before the webhook subscription existed, and edge cases where the source system’s webhook delivery has gaps.

from datetime import datetime, timedelta, timezone

class HybridSync:
    def __init__(self, webhook_receiver, poller):
        self.webhook_receiver = webhook_receiver
        self.poller = poller
    
    def reconcile(self, lookback_hours=6):
        """Poll for recent data to catch any webhook misses."""
        # Subtract a timedelta so the window stays correct across midnight
        since = datetime.now(timezone.utc) - timedelta(hours=lookback_hours)
        # poll_since is assumed to exist on the poller: a variant of
        # poll_contacts that accepts an explicit start time
        return self.poller.poll_since(since)
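
What the reconciliation pass does with the polled data is left open above. One possible approach, sketched here under assumptions, is to compare each polled record against the latest version already received by webhook and re-process only the ones that are missing or newer. The record shape (`id`, `updated_at` as epoch seconds) and the function name `find_missed` are hypothetical.

```python
def find_missed(polled_records, webhook_versions):
    """Return polled records that webhooks never delivered, or delivered in an older version.

    polled_records: iterable of dicts with 'id' and 'updated_at' (epoch seconds)
    webhook_versions: dict mapping record id -> latest 'updated_at' seen via webhook
    """
    missed = []
    for record in polled_records:
        seen_at = webhook_versions.get(record["id"])
        if seen_at is None or seen_at < record["updated_at"]:
            missed.append(record)
    return missed


polled = [
    {"id": "a", "updated_at": 100},   # webhook delivered this version
    {"id": "b", "updated_at": 200},   # webhook delivered an older version
    {"id": "c", "updated_at": 300},   # webhook never arrived
]
seen_by_webhook = {"a": 100, "b": 150}

print([r["id"] for r in find_missed(polled, seen_by_webhook)])
```

Tracking the count of missed records over time also gives a rough measure of how reliable a source's webhook delivery is in practice.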

Frequently Asked Questions

What is the timeout window for responding to webhooks, and what happens if we exceed it?

Most MarTech platforms time out webhook deliveries after somewhere between 5 and 30 seconds. The figures quoted here (20 seconds for HubSpot and Stripe, 10 seconds for GitHub) are a guide only; confirm them against each platform’s current documentation, because limits differ between products and change over time. If your endpoint does not respond within the timeout, the delivery is marked as failed and the platform will typically retry after a delay. The correct pattern is to respond immediately with a 200 status and process the event asynchronously.

How do we handle webhook delivery during deployments or downtime?

Deployments and restarts create unavailability windows where webhooks cannot be received. If the source retries failed deliveries (most do, for 24–72 hours), these will be replayed after the endpoint recovers. For sources that do not retry, implement a reconciliation poll that runs after each deployment to catch the events missed during the downtime window.

Should we use a message queue between the webhook receiver and the event processor?

For production workloads with non-trivial processing requirements, yes. The pattern is: webhook receiver validates and acknowledges immediately, writes the raw event to a queue (SQS, Redis, RabbitMQ), and a separate consumer processes from the queue. This decouples receipt from processing, allows rate-controlled consumption, and provides a durable buffer if the consumer is temporarily unavailable.
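
The shape of that pattern can be sketched with the standard library alone. Here `queue.Queue` is only a stand-in for a durable broker such as SQS, Redis, or RabbitMQ: an in-process queue loses its contents on restart, so this shows the decoupling, not the durability. The function names and status codes chosen are assumptions for the example.

```python
import json
import queue
import threading

events = queue.Queue(maxsize=1000)   # stand-in for a durable broker
processed = []


def receive(raw_body: str) -> int:
    """Receiver: parse, enqueue, acknowledge. Returns the HTTP status to send."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError:
        return 400
    try:
        events.put_nowait(payload)
    except queue.Full:
        return 503   # ask the source to retry later instead of dropping the event
    return 200


def consume():
    """Consumer: drain the queue at its own pace, independent of the receiver."""
    while True:
        event = events.get()
        try:
            if event is None:        # sentinel used in this sketch to stop the worker
                break
            processed.append(event["eventId"])
        finally:
            events.task_done()


worker = threading.Thread(target=consume, daemon=True)
worker.start()

statuses = [receive('{"eventId": 1}'), receive('{"eventId": 2}'), receive("not json")]
events.put(None)
events.join()    # wait until everything enqueued has been handled
worker.join()
print(statuses, processed)
```

Returning 503 when the buffer is full relies on the source retrying failed deliveries; for a source that does not retry, a larger buffer or a reconciliation poll is the safer fallback.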

How do we debug webhook issues in development?

Use ngrok or Cloudflare Tunnel to expose your local endpoint to the internet. Most webhook platforms provide delivery logs showing the payload they sent and the response they received. The combination of tunnel + delivery logs lets you develop webhook handlers locally without staging environment complexity.

Can we use polling for use cases where we previously used webhooks?

Yes, if the latency increase is acceptable. Moving from webhooks to polling raises worst-case data staleness to the polling interval (5 minutes for a 5-minute poll cycle). If the use case tolerates this (daily batch reporting or non-time-sensitive CRM sync, for example), polling is simpler to operate and, on many platforms, more reliable.
