Salesforce API Integration: A Developer's Practical Guide

Salesforce has one of the most comprehensive API surfaces of any CRM, along with some of the most complex authentication, data model, and governor limit constraints you will encounter in enterprise software integration. That combination makes Salesforce integrations highly capable, and also capable of failing in ways that stay hidden until production load.

This guide covers the practical integration patterns — authentication, the right API for each use case, SOQL query optimization, and the data model assumptions that most documentation does not explain.

Understanding Salesforce’s API Landscape

Salesforce exposes multiple APIs that serve different integration purposes. Choosing the wrong one produces integrations that work but hit limits or are unnecessarily slow.

REST API. The primary API for individual record operations — create, read, update, delete, query. Uses standard HTTP methods. Supports SOQL queries via the /query endpoint. The right choice for transactional integrations: creating leads from form submissions, updating contact records, querying for specific records by criteria.
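To make the REST API's record operations concrete, here is a minimal sketch using only the standard library. It builds, without sending, the requests for a create (POST to /sobjects/{object}), an update by record Id (PATCH to /sobjects/{object}/{id}), and an upsert by external ID (PATCH to /sobjects/{object}/{field}/{value}). The function name, the v58.0 version string, and the tuple form of `external_id` are illustrative assumptions, not part of any Salesforce SDK.

```python
import json
import urllib.parse
import urllib.request

API_VERSION = "v58.0"  # assumption: use a version your org supports


def build_sobject_request(instance_url, token, sobject, fields,
                          record_id=None, external_id=None):
    """Build (but do not send) a REST API request for a single record.

    - neither id given:            POST  /sobjects/{sobject}                 (create)
    - record_id given:             PATCH /sobjects/{sobject}/{id}            (update)
    - external_id=(field, value):  PATCH /sobjects/{sobject}/{field}/{value} (upsert)
    """
    base = f"{instance_url}/services/data/{API_VERSION}/sobjects/{sobject}"
    if record_id is not None:
        url, method = f"{base}/{record_id}", "PATCH"
    elif external_id is not None:
        field, value = external_id
        # External ID values may contain spaces or slashes, so percent-encode them
        url = f"{base}/{field}/{urllib.parse.quote(str(value), safe='')}"
        method = "PATCH"
    else:
        url, method = base, "POST"
    return urllib.request.Request(
        url,
        data=json.dumps(fields).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method=method,
    )
```

Sending is a separate step, for example `urllib.request.urlopen(req, timeout=30)`. A successful create returns HTTP 201 with the new record's Id in the JSON body; a successful update by Id returns HTTP 204 with no body.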

SOAP API. The older API that predates REST. Still fully supported and required for some operations (certain metadata operations, complex batch operations that need synchronous processing). Most new integrations should use the REST API; SOAP is for cases where a capability exists only in SOAP.

Bulk API 2.0. Designed for large data operations: ingesting or querying millions of records. Uses an asynchronous job model: create a job, upload the CSV data (one upload per ingest job; Salesforce splits it into internal batches), close the job, and poll for completion. Limits are expressed mainly in records processed per rolling 24 hours rather than in API calls. The right choice for initial data loads, historical imports, and daily full syncs.

Streaming API (PushTopic and Change Data Capture). Provides real-time event streaming when Salesforce records change. Change Data Capture (CDC) publishes events for every create, update, delete, and undelete on the objects selected for it in Setup. The right choice for low-latency integration where you need to react to CRM changes in near-real-time.

Metadata API. For managing the structure of a Salesforce org — custom fields, objects, workflow rules, profiles. Used in CI/CD pipelines and org configuration management, not in data integration.

Authentication: Connected Apps and OAuth

Salesforce authentication for server-to-server integration uses OAuth 2.0, but the flow options matter:

OAuth 2.0 Username-Password Flow (avoid in production). Sends the user’s username, password, and security token directly. Simple to implement but uses a user’s personal credentials — the integration breaks if the user changes their password, is deactivated, or has their security token reset. Many Salesforce orgs have policy restrictions that block this flow.

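OAuth 2.0 JWT Bearer Flow (recommended for server-to-server). Your server signs a JWT assertion with a private key; Salesforce verifies it with the matching certificate uploaded to a Connected App. There is no password to rotate and no dependency on a user's interactive credentials, and the flow works with Salesforce's IP restriction policies. The integration user must be pre-authorized for the Connected App (admin-approved users, granted through a profile or permission set), otherwise the token request is rejected: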

import jwt  # PyJWT
import time
import requests
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.backends import default_backend

class SalesforceAuth:
    def __init__(self, client_id, username, private_key_path, sandbox=False):
        self.client_id = client_id
        self.username = username
        self.sandbox = sandbox
        self.token_url = (
            'https://test.salesforce.com/services/oauth2/token' if sandbox
            else 'https://login.salesforce.com/services/oauth2/token'
        )
        
        # Private key matching the certificate uploaded to the Connected App
        with open(private_key_path, 'rb') as f:
            self.private_key = serialization.load_pem_private_key(
                f.read(), password=None, backend=default_backend()
            )
        
        self._access_token = None
        self._instance_url = None
        self._token_expiry = 0
    
    def _get_jwt_assertion(self):
        now = int(time.time())
        payload = {
            'iss': self.client_id,  # Connected App consumer key
            'sub': self.username,   # integration user's username
            'aud': self.token_url.replace('/services/oauth2/token', ''),
            'exp': now + 180,  # Salesforce documents a 3-minute maximum for exp
        }
        return jwt.encode(payload, self.private_key, algorithm='RS256')
    
    def get_token(self):
        # Reuse the cached token until its estimated expiry
        if self._access_token and time.time() < self._token_expiry:
            return self._access_token, self._instance_url
        
        assertion = self._get_jwt_assertion()
        response = requests.post(self.token_url, data={
            'grant_type': 'urn:ietf:params:oauth:grant-type:jwt-bearer',
            'assertion': assertion
        }, timeout=30)
        response.raise_for_status()
        
        data = response.json()
        self._access_token = data['access_token']
        self._instance_url = data['instance_url']
        # The response has no expires_in; the real lifetime follows the org's
        # session timeout setting (2 hours by default), so this is an estimate
        self._token_expiry = time.time() + 7000
        
        return self._access_token, self._instance_url
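Because the JWT bearer response carries no expiry and the real session lifetime follows the org's timeout policy, the cached expiry in `SalesforceAuth` is only an estimate. A common safeguard, sketched here under assumptions, is to retry a call once with a fresh token when Salesforce answers 401. `InvalidSession` and `call_with_reauth` are hypothetical names; the sketch assumes the `send` callable raises `InvalidSession` on a 401 and that the auth object caches its token in `_access_token` and `_token_expiry`, as the class above does.

```python
class InvalidSession(Exception):
    """Hypothetical: raised by a send function when Salesforce returns HTTP 401."""


def call_with_reauth(auth, send):
    """Call send(token, instance_url); on InvalidSession, re-authenticate and retry once."""
    token, instance_url = auth.get_token()
    try:
        return send(token, instance_url)
    except InvalidSession:
        # Drop the cached token so get_token() performs a fresh JWT exchange
        auth._access_token = None
        auth._token_expiry = 0
        token, instance_url = auth.get_token()
        return send(token, instance_url)
```

With requests, `send` might issue `requests.get(...)` and raise `InvalidSession()` when `response.status_code == 401`. Retrying only once keeps a misconfigured Connected App from looping on token requests.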

SOQL: Writing Queries That Don’t Hit Governor Limits

Salesforce Object Query Language (SOQL) is SQL-like but has specific constraints that differ from standard SQL. Understanding them prevents the common “query returned too many rows” and timeout failures.

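Row limits and pagination. The often-quoted 50,000-row cap is an Apex governor limit: a single Apex transaction can retrieve at most 50,000 rows across all of its SOQL queries. Through the REST API, a query returns its results in batches (up to 2,000 records each by default), and you follow nextRecordsUrl until done is true. Do not paginate with OFFSET, which is capped at 2,000 rows. For very large datasets, use a Bulk API 2.0 query job.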
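When you need restartable paging that does not depend on a server-side cursor, one alternative to OFFSET is keyset pagination: order by Id and resume after the last Id seen. The helper below is a hypothetical sketch that only builds the SOQL text. It relies on SOQL supporting range comparisons on Id, and it inlines the last Id only after checking that it is alphanumeric.

```python
def keyset_page_query(fields, sobject, last_id=None, page_size=2000, where=None):
    """Build one page of an Id-ordered (keyset) SOQL query.

    Each page resumes after the last Id seen instead of using OFFSET.
    """
    if "Id" not in fields:
        fields = ["Id", *fields]  # the Id column is needed to resume
    clauses = [where] if where else []
    if last_id is not None:
        if not last_id.isalnum():  # Salesforce Ids are 15 or 18 alphanumeric chars
            raise ValueError(f"unexpected Id format: {last_id!r}")
        clauses.append(f"Id > '{last_id}'")
    soql = f"SELECT {', '.join(fields)} FROM {sobject}"
    if clauses:
        soql += " WHERE " + " AND ".join(f"({c})" for c in clauses)
    return f"{soql} ORDER BY Id LIMIT {page_size}"
```

Run each query through the REST /query endpoint, take the Id of the last record returned, and stop when a page comes back with fewer than `page_size` rows.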

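COUNT() versus COUNT(fieldName). SELECT COUNT() FROM Contact WHERE ... returns only the number of matching rows (over REST, in totalSize with an empty records list). SELECT COUNT(Id) FROM Contact WHERE ... returns the count inside an AggregateResult row, and it is the form you need with GROUP BY. Without GROUP BY, both forms count as a single row toward the query-row limit; with GROUP BY, each group counts as one row.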
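The two COUNT forms come back in different shapes over the REST API, which is easy to trip over when parsing. A small sketch that handles both, assuming the ungrouped response bodies shown in its docstring (GROUP BY results have one row per group and need different handling); the function name is hypothetical.

```python
def count_from_query_response(body):
    """Return the count from a REST /query response for either COUNT form.

    SELECT COUNT() ...   -> {"totalSize": N, "done": true, "records": []}
    SELECT COUNT(Id) ... -> one AggregateResult row, value under "expr0"
                            (or under your alias, e.g. COUNT(Id) total)
    """
    records = body.get("records", [])
    if not records:
        return body["totalSize"]  # COUNT() form
    values = [v for k, v in records[0].items() if k != "attributes"]
    if len(records) != 1 or len(values) != 1:
        raise ValueError("expected a single ungrouped aggregate row")
    return values[0]  # COUNT(fieldName) form
```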

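Indexed fields for performance. Filters on standard indexed fields (Id, Name, OwnerId, CreatedDate, SystemModstamp, RecordTypeId, lookup and master-detail fields, Email on Contact and Lead, and external ID or unique fields) are selective and fast. LastModifiedDate is not in that standard index list, while SystemModstamp is, which makes SystemModstamp the better filter for incremental queries on large objects. Filters on non-indexed custom fields can force full scans and are slow in large orgs. When you need to filter on a custom field frequently, ask the Salesforce admin to mark it as an External ID (which indexes it) or to request a custom index from Salesforce Support.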

SOQL with relationships. Salesforce supports two types of relationship traversal in SOQL:

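# SOQL sent through the REST API has no bind variables (":since" only works in Apex),
# so datetime literals are inlined, unquoted and in UTC; this value is an example
since = '2024-01-01T00:00:00Z'

# Child-to-Parent: Traverse from Contact UP to Account
soql_child_to_parent = f"""
    SELECT Id, FirstName, LastName, Email, Account.Name, Account.Industry
    FROM Contact
    WHERE Account.Industry = 'Technology'
    AND LastModifiedDate > {since}
"""

# Parent-to-Child: Query Account WITH related Contacts subquery
# (the subquery names the child relationship, Contacts, not the object Contact)
soql_parent_to_child = f"""
    SELECT Id, Name, Industry,
           (SELECT Id, FirstName, LastName, Email FROM Contacts
            WHERE IsDeleted = false)
    FROM Account
    WHERE LastModifiedDate > {since}
"""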

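Relationship queries return nested JSON over the REST API: a parent lookup arrives as a nested object (or null when the lookup is empty), and a child subquery arrives as its own small result set with totalSize, done, and records (or null when there are no children). The helper below is a hypothetical sketch that flattens parent fields into dotted keys such as Account.Name and keeps child rows as a list. It does not follow a child result set's nextRecordsUrl, so treat done being false on a subquery as a signal to fetch the remaining children separately.

```python
def flatten_record(record, prefix=""):
    """Flatten a REST query record: drop 'attributes', dot parent fields, list children."""
    flat = {}
    for key, value in record.items():
        if key == "attributes":
            continue  # per-record metadata (type, url), not data
        name = f"{prefix}{key}"
        if isinstance(value, dict) and "records" in value:
            # Parent-to-child subquery result; may be partial if value["done"] is False
            flat[name] = [flatten_record(child) for child in value["records"]]
        elif isinstance(value, dict):
            # Child-to-parent traversal (compound fields like addresses land here too)
            flat.update(flatten_record(value, prefix=f"{name}."))
        else:
            flat[name] = value  # scalars, plus None for empty lookups or no children
    return flat
```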
Querying for Changed Records: The LastModifiedDate Pattern

For polling-based integrations, querying for records changed since the last sync is the standard pattern:

import requests
from datetime import timezone

class SalesforceClient:
    def __init__(self, auth):
        self.auth = auth
    
    def query(self, soql):
        token, instance_url = self.auth.get_token()
        headers = {'Authorization': f'Bearer {token}'}
        url = f"{instance_url}/services/data/v58.0/query"
        params = {'q': soql}  # only the first request carries the SOQL
        
        all_records = []
        while url:
            response = requests.get(url, params=params, headers=headers, timeout=30)
            response.raise_for_status()
            data = response.json()
            all_records.extend(data['records'])
            
            # Handle pagination: nextRecordsUrl is a path to the next batch
            # and is absent once done is true
            next_records_url = data.get('nextRecordsUrl')
            url = f"{instance_url}{next_records_url}" if next_records_url else None
            params = None
        
        return all_records
    
    def get_updated_contacts(self, since_datetime):
        # SOQL datetime literals are unquoted; convert aware datetimes to UTC
        # (a naive datetime is assumed to already be UTC)
        if since_datetime.tzinfo is not None:
            since_datetime = since_datetime.astimezone(timezone.utc)
        soql = f"""
            SELECT Id, FirstName, LastName, Email, Phone,
                   Account.Name, Account.Id,
                   CreatedDate, LastModifiedDate
            FROM Contact
            WHERE LastModifiedDate > {since_datetime.strftime('%Y-%m-%dT%H:%M:%SZ')}
            AND IsDeleted = false
            ORDER BY LastModifiedDate ASC
        """
        return self.query(soql)
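Two details make the LastModifiedDate pattern reliable: the literal must be in UTC, and the next watermark should come from the timestamps Salesforce actually returned rather than from the local clock. A minimal sketch of both, with hypothetical helper names; it parses the REST API's datetime format (for example 2024-05-01T12:34:56.000+0000) and treats naive datetimes as UTC, which is an assumption to adjust if yours are local.

```python
from datetime import datetime, timezone


def soql_datetime(dt):
    """Format a datetime as an unquoted SOQL datetime literal in UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive means UTC
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def next_watermark(records, previous):
    """Advance the watermark to the newest LastModifiedDate actually returned.

    Using the data's own timestamps avoids gaps when the client clock and
    Salesforce's clock disagree. `previous` must be timezone-aware.
    """
    newest = previous
    for record in records:
        ts = datetime.strptime(record["LastModifiedDate"], "%Y-%m-%dT%H:%M:%S.%f%z")
        if ts > newest:
            newest = ts
    return newest
```

Because the query uses a strict `>`, a record committed later with exactly the watermark timestamp would be skipped; querying with `>=` and de-duplicating by Id is a more cautious variant.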

Bulk API 2.0 for Large Data Operations

For inserting or updating more than a few thousand records, the Bulk API 2.0 is significantly more efficient than REST API individual calls:

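import csv
import io
import time
import requests

class SalesforceBulkClient:
    # Bulk API 2.0 accepts a single data upload (PUT) per ingest job, so large
    # record sets are split across several jobs. Salesforce advises keeping each
    # upload under about 100 MB of CSV; 100,000 rows is an assumption to tune.
    RECORDS_PER_JOB = 100000
    
    def __init__(self, auth):
        self.auth = auth
    
    def bulk_upsert(self, object_name, records, external_id_field='External_Id__c'):
        # Returns the final job status for each chunk
        return [
            self._upsert_job(object_name, records[i:i + self.RECORDS_PER_JOB],
                             external_id_field)
            for i in range(0, len(records), self.RECORDS_PER_JOB)
        ]
    
    def _upsert_job(self, object_name, records, external_id_field):
        token, instance_url = self.auth.get_token()
        headers = {
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        }
        jobs_url = f"{instance_url}/services/data/v58.0/jobs/ingest"
        
        # 1. Create the job
        job_response = requests.post(
            jobs_url,
            json={
                'operation': 'upsert',
                'object': object_name,
                'externalIdFieldName': external_id_field,
                'contentType': 'CSV',
                'lineEnding': 'LF'
            },
            headers=headers, timeout=30
        )
        job_response.raise_for_status()
        job_id = job_response.json()['id']
        
        # 2. Upload this chunk's data (exactly one PUT per job)
        upload = requests.put(
            f"{jobs_url}/{job_id}/batches",
            data=self._records_to_csv(records).encode('utf-8'),
            headers={**headers, 'Content-Type': 'text/csv'}, timeout=300
        )
        upload.raise_for_status()
        
        # 3. Close the job (start processing)
        close = requests.patch(
            f"{jobs_url}/{job_id}",
            json={'state': 'UploadComplete'},
            headers=headers, timeout=30
        )
        close.raise_for_status()
        
        # 4. Poll for completion. JobComplete can still include row-level
        #    failures: check numberRecordsFailed and the job's /failedResults
        while True:
            status_response = requests.get(f"{jobs_url}/{job_id}",
                                           headers=headers, timeout=30)
            status_response.raise_for_status()
            status = status_response.json()
            if status['state'] in ('JobComplete', 'Failed', 'Aborted'):
                return status
            time.sleep(10)
    
    def _records_to_csv(self, records):
        if not records:
            return ''
        output = io.StringIO()
        # lineterminator must match the job's lineEnding ('LF'); csv defaults to CRLF
        writer = csv.DictWriter(output, fieldnames=records[0].keys(),
                                lineterminator='\n')
        writer.writeheader()
        writer.writerows(records)
        return output.getvalue()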
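A row count is only a proxy for upload size. If records vary in width, a sketch like the one below splits them by encoded size instead, producing one CSV payload per Bulk API 2.0 ingest job (a job accepts a single data upload, and Salesforce advises keeping each upload to about 100 MB before base64 encoding). `csv_chunks` is a hypothetical helper; it writes LF line endings to match `'lineEnding': 'LF'`.

```python
import csv
import io


def _csv_text(fieldnames, rows, header=False):
    """Render rows as CSV with LF line endings (matching lineEnding='LF')."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    if header:
        writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def csv_chunks(records, max_bytes=100_000_000):
    """Yield CSV payloads, each with a header row and at most max_bytes of UTF-8.

    Each payload is intended for its own Bulk API 2.0 ingest job.
    """
    if not records:
        return
    fieldnames = list(records[0].keys())
    header = _csv_text(fieldnames, [], header=True)
    header_size = len(header.encode("utf-8"))
    parts, size = [header], header_size
    for record in records:
        line = _csv_text(fieldnames, [record])
        line_size = len(line.encode("utf-8"))
        if header_size + line_size > max_bytes:
            raise ValueError("a single row does not fit within max_bytes")
        if size + line_size > max_bytes:
            # Current payload is full: emit it and start a new one
            yield "".join(parts)
            parts, size = [header], header_size
        parts.append(line)
        size += line_size
    yield "".join(parts)
```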

Change Data Capture for Real-Time Integration

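Salesforce delivers Change Data Capture events either over the Streaming API, as a CometD (Bayeux protocol) subscription, or over the newer gRPC-based Pub/Sub API. simple-salesforce is a REST client and does not subscribe to event channels, so you need a separate CometD or Pub/Sub API client. The handler below processes one change event; the subscription lines at the end stand in for whichever client you adopt: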

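# Handles the "data" portion of a CDC message as delivered by a CometD client;
# sync_contact_to_downstream and handle_contact_deletion are your own functions
def handle_contact_change(event):
    """Process a CDC event for a Contact change."""
    header = event['payload']['ChangeEventHeader']
    change_type = header['changeType']
    record_ids = header['recordIds']
    
    if change_type in ('CREATE', 'UPDATE', 'UNDELETE'):
        # UPDATE payloads carry only the changed fields (also listed in
        # header['changedFields']); CREATE and UNDELETE carry populated fields
        changed_fields = {
            k: v for k, v in event['payload'].items()
            if k != 'ChangeEventHeader'
        }
        for record_id in record_ids:
            sync_contact_to_downstream(record_id, changed_fields)
    
    elif change_type == 'DELETE':
        for record_id in record_ids:
            handle_contact_deletion(record_id)
    
    # GAP_* change types (e.g. GAP_OVERFLOW) carry no field data; reconcile
    # those records by re-reading them through the REST API instead

# Subscribe to the Contact CDC channel. SalesforceStreaming is a placeholder for
# your CometD or Pub/Sub API client; it is not part of simple-salesforce.
streaming = SalesforceStreaming(sf_instance)
streaming.subscribe('/data/ContactChangeEvent', handle_contact_change)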

Frequently Asked Questions

What is the difference between API limits and governor limits in Salesforce?

API limits control how many requests your integration can make against the Salesforce APIs in a rolling 24-hour period; for Enterprise Edition the allocation starts at 100,000 requests plus an amount per user license. Governor limits control what a single Apex transaction can do, including the Apex triggers your API writes fire: for example, 100 SOQL queries per synchronous transaction, 50,000 total records retrieved by SOQL, and a 6 MB heap (12 MB for asynchronous Apex). API limits affect your integration's throughput; governor limits affect what any individual transaction can do. Both require planning.
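Salesforce reports current API consumption on REST responses in the Sforce-Limit-Info header (for example `api-usage=25/15000`), so an integration can watch its daily allocation without extra calls. A minimal parsing sketch; the function names and the 80% threshold are assumptions, not Salesforce defaults.

```python
def parse_limit_info(header_value):
    """Parse a Sforce-Limit-Info header such as 'api-usage=25/15000'.

    Returns (used, allowed) for the org's rolling 24-hour API allocation.
    Other comma-separated entries (e.g. per-app-api-usage) are ignored.
    """
    for part in header_value.split(","):
        key, _, value = part.strip().partition("=")
        if key == "api-usage":
            used, _, allowed = value.partition("/")
            return int(used), int(allowed)
    raise ValueError(f"no api-usage entry in {header_value!r}")


def should_throttle(header_value, threshold=0.8):
    """True once usage reaches `threshold` of the daily allocation (hypothetical policy)."""
    used, allowed = parse_limit_info(header_value)
    return used >= threshold * allowed
```

The REST /limits resource returns the same DailyApiRequests figures, along with other allocations, if you prefer to check on a schedule.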

How should we handle Apex triggers that fire during our API operations?

When your integration creates or updates records via the API, any Apex triggers on those objects fire. This is expected behavior: triggers run regardless of whether the change came from the UI or the API. Problems arise when triggers have side effects your integration doesn't expect (sending emails, creating related records, calling external services). Review the relevant triggers with the Salesforce admin before going to production. Salesforce has no built-in trigger bypass switch; if you need one, the common pattern is a custom permission or hierarchy custom setting that the trigger code checks, assigned to a dedicated integration user.

Can we use Salesforce’s Platform Events instead of Change Data Capture?

Platform Events and CDC serve different purposes. CDC publishes events automatically when records change on the standard or custom objects you select on the Change Data Capture setup page; no code is required. Platform Events are custom event types that Apex, Flow, or API clients explicitly publish, which makes them useful for business events that are not tied directly to record changes. For integration use cases where you want to react to CRM data changes, CDC is typically the right tool.

How do we handle the Salesforce sandbox vs. production environment in our integration?

Maintain separate Connected App configurations and credentials for sandbox and production. A sandbox has the same API surface as production but its own API limits and data. Use environment variables or a config service to switch between sandbox and production endpoints; never hardcode the login URL. Schema changes such as new required fields or modified validation rules can break integration writes, so run your integration tests in a sandbox before those changes reach production.
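One way to keep the login URL out of the code is to derive the environment from configuration. A sketch, assuming hypothetical variable names (SF_ENVIRONMENT, SF_CLIENT_ID, SF_USERNAME, SF_PRIVATE_KEY_PATH) and defaulting to sandbox so that a missing setting never points at production:

```python
import os

VALID_ENVIRONMENTS = ("production", "sandbox")


def load_salesforce_config(env=None):
    """Build SalesforceAuth arguments from environment variables (hypothetical names)."""
    env = os.environ if env is None else env
    target = env.get("SF_ENVIRONMENT", "sandbox")  # default to the safer target
    if target not in VALID_ENVIRONMENTS:
        raise ValueError(f"SF_ENVIRONMENT must be one of {VALID_ENVIRONMENTS}, got {target!r}")
    return {
        "client_id": env["SF_CLIENT_ID"],
        "username": env["SF_USERNAME"],
        "private_key_path": env["SF_PRIVATE_KEY_PATH"],
        # SalesforceAuth maps this flag to test.salesforce.com or login.salesforce.com
        "sandbox": target == "sandbox",
    }
```

The returned dict matches the `SalesforceAuth` constructor from the authentication section, so `SalesforceAuth(**load_salesforce_config())` builds an authenticator for whichever environment is configured.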

What is the best way to do a one-time initial load of all Salesforce contacts into our data warehouse?

Use a Bulk API 2.0 query job rather than an ingest operation such as upsert. Create the job with operation set to query and a SOQL query that lists the Contact fields you need (SOQL has no SELECT *), wait for the job to reach JobComplete, then download the results page by page. The Bulk API can return millions of records efficiently. After the initial load, switch to incremental polling via LastModifiedDate or a CDC subscription for ongoing sync.
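Once the query job reaches JobComplete, results are downloaded in pages from /jobs/query/{jobId}/results; each response carries a Sforce-Locator header pointing at the next page, and that header's value is the string "null" on the last page. A sketch of the paging loop, with the HTTP call injected as a hypothetical `fetch_page(locator)` function that returns the CSV body and that header:

```python
import csv
import io


def iter_bulk_query_rows(fetch_page):
    """Yield rows (as dicts) from a completed Bulk API 2.0 query job.

    fetch_page(locator) is assumed to GET .../jobs/query/{job_id}/results,
    passing ?locator=... when locator is not None, and to return
    (csv_text, sforce_locator_header). Each page includes its own header row.
    """
    locator = None
    while True:
        csv_text, locator = fetch_page(locator)
        yield from csv.DictReader(io.StringIO(csv_text))
        if not locator or locator == "null":
            return  # Salesforce sends "null" when no pages remain
```

With requests, `fetch_page` would GET the results URL with `params={'locator': locator}` when a locator is present (optionally adding `maxRecords`) and return `(response.text, response.headers.get('Sforce-Locator'))`.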