Salesforce has the most comprehensive API surface of any CRM on the market and some of the most complex authentication, data model, and governor limit constraints you will encounter in enterprise software integration. The combination makes Salesforce integrations both highly capable and highly capable of going wrong in ways that are non-obvious until production load.
This guide covers the practical integration patterns — authentication, the right API for each use case, SOQL query optimization, and the data model assumptions that most documentation does not explain.
Understanding Salesforce’s API Landscape
Salesforce exposes multiple APIs that serve different integration purposes. Choosing the wrong one produces integrations that work but hit limits or are unnecessarily slow.
REST API. The primary API for individual record operations — create, read, update, delete, query. Uses standard HTTP methods. Supports SOQL queries via the /query endpoint. The right choice for transactional integrations: creating leads from form submissions, updating contact records, querying for specific records by criteria.
SOAP API. The older API that predates REST. Still fully supported and required for some operations (certain metadata operations, complex batch operations that need synchronous processing). Most new integrations should use the REST API; SOAP is for cases where a capability exists only in SOAP.
Bulk API 2.0. Designed for large data operations such as ingesting or querying millions of records. Uses an asynchronous job model: create a job, upload the CSV data, close the job, poll for completion. Salesforce splits the uploaded data into batches internally, so the client does not manage batches itself. Rate limits are measured in records processed rather than API calls. The right choice for initial data loads, historical imports, and daily full syncs.
Streaming API (PushTopic and Change Data Capture). Provides real-time event streaming when Salesforce records change. Change Data Capture (CDC) publishes events for every create, update, delete, and undelete on subscribed objects. The right choice for low-latency integration where you need to react to CRM changes in near-real-time.
Metadata API. For managing the structure of a Salesforce org — custom fields, objects, workflow rules, profiles. Used in CI/CD pipelines and org configuration management, not in data integration.
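The selection logic above can be condensed into a small helper. This is only a sketch of the decision order described in this section; the function name, its parameters, and the default threshold of 2,000 records are illustrative assumptions, not Salesforce guidance.

```python
from enum import Enum


class SalesforceApi(Enum):
    REST = "REST API"
    BULK = "Bulk API 2.0"
    STREAMING = "Streaming API / Change Data Capture"
    METADATA = "Metadata API"


def choose_api(record_count=1, needs_realtime_events=False,
               changes_org_structure=False, bulk_threshold=2000):
    """Pick an API family for an integration task.

    bulk_threshold is an assumed cut-over point; tune it to your org's limits.
    """
    if changes_org_structure:
        return SalesforceApi.METADATA   # fields, objects, profiles
    if needs_realtime_events:
        return SalesforceApi.STREAMING  # react to record changes as they happen
    if record_count > bulk_threshold:
        return SalesforceApi.BULK       # large loads and full syncs
    return SalesforceApi.REST           # transactional record operations
```

For example, `choose_api(record_count=500_000)` selects the Bulk API, while a single lead created from a form submission stays on REST.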
Authentication: Connected Apps and OAuth
Salesforce authentication for server-to-server integration uses OAuth 2.0, but the flow options matter:
OAuth 2.0 Username-Password Flow (avoid in production). Sends the user’s username, password, and security token directly. Simple to implement but uses a user’s personal credentials — the integration breaks if the user changes their password, is deactivated, or has their security token reset. Many Salesforce orgs have policy restrictions that block this flow.
OAuth 2.0 JWT Bearer Flow (recommended for server-to-server). Your server signs a JWT assertion with a private key; Salesforce verifies with the corresponding public key registered in a Connected App. No password to rotate, no user credential dependency, works with Salesforce’s IP restriction policies:
import time

import jwt  # PyJWT
import requests
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization


class SalesforceAuth:
    def __init__(self, client_id, username, private_key_path, sandbox=False):
        self.client_id = client_id  # Connected App consumer key
        self.username = username    # user the integration authenticates as
        self.sandbox = sandbox
        self.token_url = (
            'https://test.salesforce.com/services/oauth2/token' if sandbox
            else 'https://login.salesforce.com/services/oauth2/token'
        )
        with open(private_key_path, 'rb') as f:
            self.private_key = serialization.load_pem_private_key(
                f.read(), password=None, backend=default_backend()
            )
        self._access_token = None
        self._instance_url = None
        self._token_expiry = 0

    def _get_jwt_assertion(self):
        now = int(time.time())
        payload = {
            'iss': self.client_id,
            'sub': self.username,
            # Audience is the login host, without the token path
            'aud': self.token_url.replace('/services/oauth2/token', ''),
            'exp': now + 300,  # 5-minute expiry
        }
        return jwt.encode(payload, self.private_key, algorithm='RS256')

    def get_token(self):
        # Reuse the cached token until the assumed expiry passes
        if self._access_token and time.time() < self._token_expiry:
            return self._access_token, self._instance_url
        assertion = self._get_jwt_assertion()
        response = requests.post(self.token_url, data={
            'grant_type': 'urn:ietf:params:oauth:grant-type:jwt-bearer',
            'assertion': assertion
        }, timeout=30)
        response.raise_for_status()
        data = response.json()
        self._access_token = data['access_token']
        self._instance_url = data['instance_url']
        # The token response carries no expires_in field. 7000 seconds assumes
        # the default 2-hour session timeout; lower it if the org's is shorter.
        self._token_expiry = time.time() + 7000
        return self._access_token, self._instance_url
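Because the cached expiry is only an estimate, a token can be rejected before the cache considers it stale. One way to cope, sketched below, is to retry once after a 401 with a fresh token. The helper name and its three callables are hypothetical; the transport is injected so the logic is independent of any HTTP library.

```python
def call_with_reauth(send, get_token, invalidate_token):
    """Run send(token) and retry once with a new token after a 401.

    send(token) must return (status_code, body). get_token() returns the
    current token, and invalidate_token() forces the next get_token() call
    to fetch a new one.
    """
    token = get_token()
    status, body = send(token)
    if status == 401:
        # The cached token was revoked or expired early: refresh and retry once.
        invalidate_token()
        token = get_token()
        status, body = send(token)
    return status, body
```

With a token cache like the one above, `invalidate_token` could be as small as a function that resets the cached expiry to 0, so the next `get_token()` call requests a new assertion.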
SOQL: Writing Queries That Don’t Hit Governor Limits
Salesforce Object Query Language (SOQL) is SQL-like but has specific constraints that differ from standard SQL. Understanding them prevents the common “query returned too many rows” and timeout failures.
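The 50,000 row limit. A single Apex transaction can retrieve at most 50,000 rows across all of its SOQL queries, which is where the "too many query rows" error comes from. Queries sent through the REST API are not cut off at 50,000; results come back in pages of up to 2,000 records, each with a nextRecordsUrl to follow until the result set is exhausted. Avoid OFFSET-based pagination for large result sets, because OFFSET is capped at 2,000 rows. For very large datasets, use the Bulk API.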
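When you need to walk a large object in fixed-size pages without OFFSET, one common alternative is keyset pagination on Id: order by Id and ask for rows after the last Id already seen. The sketch below builds such queries; the function names are hypothetical, and `run_query` stands in for whatever executes SOQL and returns a list of record dicts.

```python
def keyset_page_query(fields, sobject, last_id=None, page_size=2000):
    """Build one page of a keyset-paginated SOQL query ordered by Id."""
    where = f" WHERE Id > '{last_id}'" if last_id else ""
    return (f"SELECT {', '.join(fields)} FROM {sobject}"
            f"{where} ORDER BY Id LIMIT {page_size}")


def iter_records(run_query, fields, sobject, page_size=2000):
    """Yield every record, one page at a time. fields must include 'Id'."""
    last_id = None
    while True:
        page = run_query(keyset_page_query(fields, sobject, last_id, page_size))
        if not page:
            return
        yield from page
        if len(page) < page_size:
            return  # a short page means the object is exhausted
        last_id = page[-1]["Id"]
```

Each page is an independent query, so a failed run can resume from the last Id it processed.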
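The COUNT() limitation. SELECT COUNT() FROM Contact WHERE... is the efficient form for record counts and does not count against the row return limit; over the REST API the count arrives in the response's totalSize field with no records attached. SELECT COUNT(Id) FROM Contact WHERE... returns the count as a field on an aggregate result row instead, which needs extra handling and can be slower.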
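The two forms come back in different shapes over the REST API, which matters when parsing. A minimal sketch, assuming an unaliased COUNT(Id) (Salesforce then names the field `expr0`); the helper name is hypothetical.

```python
def extract_count(response_json):
    """Return the record count from a REST query response.

    COUNT() puts the count in totalSize and returns no records.
    COUNT(Id) returns a single AggregateResult row holding the count in expr0.
    """
    records = response_json.get("records") or []
    if records and "expr0" in records[0]:
        return records[0]["expr0"]
    return response_json["totalSize"]
```

If the query aliases the aggregate, for example `SELECT COUNT(Id) total`, the count appears under that alias instead of `expr0`.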
Indexed fields for performance. SOQL queries that filter on indexed fields (Id, Name, CreatedDate, LastModifiedDate, owner fields, external ID fields) are fast. Queries that filter on non-indexed custom fields trigger full-table scans and are slow in large orgs. When you need to filter on a custom field frequently, ask the Salesforce admin to mark it as an External ID or unique field, or to request a custom index from Salesforce Support.
SOQL with relationships. Salesforce supports two types of relationship traversal in SOQL:
# Child-to-Parent: Traverse from Contact UP to Account.
# {since} is filled in with str.format() using an unquoted SOQL datetime
# literal such as 2024-01-01T00:00:00Z. Apex-style bind variables (:since)
# are not available when the query is sent over the REST API.
soql_child_to_parent = """
    SELECT Id, FirstName, LastName, Email, Account.Name, Account.Industry
    FROM Contact
    WHERE Account.Industry = 'Technology'
    AND LastModifiedDate > {since}
"""

# Parent-to-Child: Query Account WITH related Contacts subquery
soql_parent_to_child = """
    SELECT Id, Name, Industry,
        (SELECT Id, FirstName, LastName, Email FROM Contacts
         WHERE IsDeleted = false)
    FROM Account
    WHERE LastModifiedDate > {since}
"""
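The two traversal styles also return differently shaped JSON over the REST API: a child-to-parent query nests the parent as an object (or null when there is no parent), while a parent-to-child subquery nests a small result set with its own records list. The helpers below are a sketch of reading both shapes; their names and output keys are illustrative.

```python
def flatten_contact(record):
    """Flatten a Contact row that selected Account.Name and Account.Industry."""
    account = record.get("Account") or {}  # null when the Contact has no Account
    return {
        "id": record["Id"],
        "email": record.get("Email"),
        "account_name": account.get("Name"),
        "account_industry": account.get("Industry"),
    }


def child_contacts(account_record):
    """Return the Contact rows nested under an Account by a subquery."""
    subquery = account_record.get("Contacts") or {}  # null when no children
    return subquery.get("records", [])
```

A subquery with many children can be paginated separately from the outer query, so check the nested result's `done` flag before assuming the list is complete.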
Querying for Changed Records: The LastModifiedDate Pattern
For polling-based integrations, querying for records changed since the last sync is the standard pattern:
import requests


class SalesforceClient:
    def __init__(self, auth):
        self.auth = auth

    def query(self, soql):
        token, instance_url = self.auth.get_token()
        headers = {'Authorization': f'Bearer {token}'}
        url = f"{instance_url}/services/data/v58.0/query"
        params = {'q': soql}  # only the first request carries the SOQL
        all_records = []
        while url:
            response = requests.get(url, params=params, headers=headers,
                                    timeout=30)
            response.raise_for_status()
            data = response.json()
            all_records.extend(data['records'])
            # Handle pagination: nextRecordsUrl is a relative path and is
            # absent on the last page
            next_records_url = data.get('nextRecordsUrl')
            url = f"{instance_url}{next_records_url}" if next_records_url else None
            params = None
        return all_records

    def get_updated_contacts(self, since_datetime):
        # since_datetime must be in UTC: the literal below is formatted with
        # a Z suffix and no timezone conversion is applied.
        soql = f"""
            SELECT Id, FirstName, LastName, Email, Phone,
                   Account.Name, Account.Id,
                   CreatedDate, LastModifiedDate
            FROM Contact
            WHERE LastModifiedDate > {since_datetime.strftime('%Y-%m-%dT%H:%M:%S.000Z')}
            AND IsDeleted = false
            ORDER BY LastModifiedDate ASC
        """
        return self.query(soql)
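The polling pattern also needs a rule for advancing the "since" value between runs. One option, sketched here, is to derive the next watermark from the newest LastModifiedDate actually returned and step back by a small overlap so that records committed around the boundary are not missed. The function name and the 5-second overlap are assumptions; the overlap means some records are fetched twice, so downstream writes should be idempotent.

```python
from datetime import datetime, timedelta, timezone

# REST responses format datetimes like 2024-01-15T10:30:00.000+0000
SF_DATETIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%f%z"


def next_watermark(records, previous, overlap=timedelta(seconds=5)):
    """Return the 'since' value to use for the next polling run.

    previous must be a timezone-aware datetime. The watermark never
    moves backwards, and stays put when a run returns no records.
    """
    if not records:
        return previous
    latest = max(
        datetime.strptime(r["LastModifiedDate"], SF_DATETIME_FORMAT)
        for r in records
    )
    return max(previous, latest - overlap)
```

Persist the returned value only after the batch has been written downstream, so a crash mid-run repeats the window instead of skipping it.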
Bulk API 2.0 for Large Data Operations
For inserting or updating more than a few thousand records, the Bulk API 2.0 is significantly more efficient than REST API individual calls:
import csv
import io
import time

import requests


class SalesforceBulkClient:
    def __init__(self, auth):
        self.auth = auth

    def bulk_upsert(self, object_name, records, external_id_field='External_Id__c'):
        token, instance_url = self.auth.get_token()
        headers = {
            'Authorization': f'Bearer {token}',
            'Content-Type': 'application/json'
        }
        api_base = f"{instance_url}/services/data/v58.0"

        # 1. Create the job
        job_response = requests.post(
            f"{api_base}/jobs/ingest",
            json={
                'operation': 'upsert',
                'object': object_name,
                'externalIdFieldName': external_id_field,
                'contentType': 'CSV',
                'lineEnding': 'LF'
            },
            headers=headers,
            timeout=30
        )
        job_response.raise_for_status()
        job_id = job_response.json()['id']

        # 2. Upload the CSV. Bulk API 2.0 accepts a single upload per job and
        #    splits it into batches internally; a second PUT to the same job
        #    is rejected. Data above roughly 100 MB needs one job per chunk.
        upload_response = requests.put(
            f"{api_base}/jobs/ingest/{job_id}/batches",
            data=self._records_to_csv(records).encode('utf-8'),
            headers={**headers, 'Content-Type': 'text/csv'},
            timeout=300
        )
        upload_response.raise_for_status()

        # 3. Close the job (start processing)
        requests.patch(
            f"{api_base}/jobs/ingest/{job_id}",
            json={'state': 'UploadComplete'},
            headers=headers,
            timeout=30
        ).raise_for_status()

        # 4. Poll for completion
        while True:
            status_response = requests.get(
                f"{api_base}/jobs/ingest/{job_id}",
                headers=headers,
                timeout=30
            )
            status_response.raise_for_status()
            state = status_response.json()['state']
            if state in ('JobComplete', 'Failed', 'Aborted'):
                break
            time.sleep(10)
        return status_response.json()

    def _records_to_csv(self, records):
        if not records:
            return ''
        output = io.StringIO()
        # lineterminator must match the job's lineEnding ('LF'); the csv
        # module writes CRLF by default.
        writer = csv.DictWriter(output, fieldnames=records[0].keys(),
                                lineterminator='\n')
        writer.writeheader()
        writer.writerows(records)
        return output.getvalue()
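A job that ends in JobComplete can still contain rows that failed. Bulk API 2.0 exposes them as CSV at `/jobs/ingest/{job_id}/failedResults/`, where each row repeats the submitted fields and adds `sf__Id` and `sf__Error` columns. The sketch below parses that CSV once it has been downloaded; the function name and the shape of the returned dicts are illustrative.

```python
import csv
import io


def parse_failed_results(csv_text):
    """Turn a failedResults CSV into a list of {id, error, record} dicts."""
    failures = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        failures.append({
            "id": row.get("sf__Id") or None,  # empty when the insert failed
            "error": row.get("sf__Error"),
            # Everything without the sf__ prefix is the data that was submitted
            "record": {k: v for k, v in row.items() if not k.startswith("sf__")},
        })
    return failures
```

Logging these rows, or re-queueing the ones with transient errors, keeps partial failures from going unnoticed.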
Change Data Capture for Real-Time Integration
Salesforce's Change Data Capture events can be consumed through a streaming subscription based on CometD (an implementation of the Bayeux protocol). In Python this means using a CometD-capable client library. Treat the streaming client in the snippet below as a placeholder and check what your chosen library actually provides, including how it wraps the event payload:
from simple_salesforce import Salesforce, SalesforceLogin
from salesforce_streams import SalesforceStreaming  # placeholder streaming client


def handle_contact_change(event):
    """Process a CDC event for a Contact change."""
    header = event['payload']['ChangeEventHeader']
    change_type = header['changeType']
    record_ids = header['recordIds']
    if change_type in ('CREATE', 'UPDATE', 'UNDELETE'):
        # Process the changed contact. For UPDATE events the payload holds
        # only the fields that changed.
        changed_fields = {
            k: v for k, v in event['payload'].items()
            if k != 'ChangeEventHeader'
        }
        for record_id in record_ids:
            # sync_contact_to_downstream is your own handler (not shown)
            sync_contact_to_downstream(record_id, changed_fields)
    elif change_type == 'DELETE':
        for record_id in record_ids:
            # handle_contact_deletion is your own handler (not shown)
            handle_contact_deletion(record_id)


# Subscribe to Contact CDC channel.
# sf_instance is an authenticated Salesforce session created elsewhere.
streaming = SalesforceStreaming(sf_instance)
streaming.subscribe('/data/ContactChangeEvent', handle_contact_change)
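Besides CREATE, UPDATE, DELETE, and UNDELETE, the changeType field can carry gap values (prefixed `GAP_`), which signal that a change happened but its field data was not included in the event. A handler that ignores them silently drifts out of sync. The sketch below maps each change type to an action; the function name and the action labels are hypothetical.

```python
def classify_change(change_type):
    """Map a CDC changeType to the action an integration should take."""
    if change_type == "GAP_OVERFLOW":
        # Too many changes in one transaction: per-record events were dropped
        return "resync_object"
    if change_type.startswith("GAP_"):
        # Field data is missing from the event: fetch the record over REST
        return "refetch_record"
    if change_type in ("CREATE", "UPDATE", "UNDELETE"):
        return "upsert"
    if change_type == "DELETE":
        return "delete"
    raise ValueError(f"Unrecognized changeType: {change_type}")
```

Routing on the returned label keeps the gap-handling path explicit instead of letting unknown types fall through.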
Frequently Asked Questions
What is the difference between API limits and governor limits in Salesforce?
API limits cap how many requests your integration can make against the Salesforce API in a rolling 24-hour window. The allocation depends on edition and license count; Enterprise Edition starts at 100,000 requests per 24 hours and grows with each user license. Governor limits control what a single Apex transaction can do: at most 100 SOQL queries in a synchronous transaction, 50,000 rows retrieved across all of its queries, and a 6 MB heap (12 MB for asynchronous Apex). API limits affect your integration's throughput. Governor limits affect what any individual request can do. Both require planning.
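REST API responses include a `Sforce-Limit-Info` header of the form `api-usage=18/5000`, which lets an integration watch its daily consumption without extra calls. A minimal parsing sketch; the function name is hypothetical.

```python
import re

# The lookbehind skips the per-app-api-usage entry that can share the header
_API_USAGE = re.compile(r"(?<![\w-])api-usage=(\d+)/(\d+)")


def parse_api_usage(header_value):
    """Return (used, limit) from a Sforce-Limit-Info header, or None."""
    match = _API_USAGE.search(header_value or "")
    if not match:
        return None
    return int(match.group(1)), int(match.group(2))
```

An integration might log a warning or pause non-urgent polling once `used / limit` crosses a threshold it chooses.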
How should we handle APEX triggers that fire during our API operations?
When your integration creates or updates records via the API, any Apex triggers configured in the Salesforce org fire. This is expected behavior: the trigger does not know whether the change came from the UI or the API. Problems arise when triggers have side effects your integration does not expect (sending emails, creating related records, calling external services). Review relevant triggers with the Salesforce admin before production. If necessary, run the integration as a dedicated integration user and have the triggers check a bypass flag for that user, such as a custom permission or a custom setting. Salesforce has no built-in switch for this, so the org's developers must build the check into the trigger code.
Can we use Salesforce’s Platform Events instead of Change Data Capture?
Platform Events and CDC serve different purposes. CDC publishes events automatically when records of standard or custom objects change; the only setup is selecting the objects for Change Data Capture in Setup. Platform Events are custom event types that Apex code, Flow, or Process Builder explicitly publishes, which makes them useful for custom business logic events that are not directly tied to record changes. For integration use cases where you want to react to CRM data changes, CDC is typically the right tool.
How do we handle the Salesforce sandbox vs. production environment in our integration?
Maintain separate Connected App configurations for sandbox and production, with separate credentials. Sandbox has the same API surface as production but separate API limits and data. Use environment variables or a config service to switch between sandbox and production endpoints — never hardcode the login URL. Test all schema changes in sandbox before production deployment; Salesforce schema changes (new fields, modified validation rules) require integration testing in sandbox.
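One way to keep the environment switch out of the code is to read it from environment variables. The sketch below is illustrative: the variable names (`SF_ENV`, `SF_CLIENT_ID`, `SF_USERNAME`, `SF_PRIVATE_KEY_PATH`) are assumptions, and it defaults to sandbox so a missing setting never points at production by accident.

```python
import os

LOGIN_URLS = {
    "production": "https://login.salesforce.com",
    "sandbox": "https://test.salesforce.com",
}


def load_sf_config(env=None):
    """Build connection settings from environment variables."""
    env = os.environ if env is None else env
    target = env.get("SF_ENV", "sandbox")
    if target not in LOGIN_URLS:
        raise ValueError(f"SF_ENV must be one of {sorted(LOGIN_URLS)}")
    return {
        "environment": target,
        "login_url": LOGIN_URLS[target],
        # Each environment has its own Connected App and key pair
        "client_id": env["SF_CLIENT_ID"],
        "username": env["SF_USERNAME"],
        "private_key_path": env["SF_PRIVATE_KEY_PATH"],
    }
```

Passing a plain dict as `env` makes the function easy to exercise in tests without touching the real process environment.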
What is the best way to do a one-time initial load of all Salesforce contacts into our data warehouse?
Use the Bulk API 2.0 Query operation (not Upsert). Create a job with operation: query, provide a SOQL query for all Contact fields, and process the results in batches. The Bulk API can return millions of records efficiently. After the initial load, switch to incremental polling via LastModifiedDate or CDC subscription for ongoing sync.
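In outline, a query job is created with a POST to `/jobs/query`, polled until its state is JobComplete, and then read from `/jobs/query/{job_id}/results`. Results are paged: each response carries a `Sforce-Locator` header whose value is passed as the `locator` parameter of the next request, and the literal string `null` marks the last page. The sketch below covers the request body and the paging loop; the function names are hypothetical, and `fetch_page` stands in for the HTTP call.

```python
def bulk_query_job_body(soql):
    """Request body for creating a Bulk API 2.0 query job."""
    return {"operation": "query", "query": soql}


def iter_bulk_query_pages(fetch_page):
    """Yield each CSV page of a completed query job.

    fetch_page(locator) performs the GET against the results endpoint and
    returns (csv_text, sforce_locator_header). It is called with None for
    the first page.
    """
    locator = None
    while True:
        csv_text, next_locator = fetch_page(locator)
        yield csv_text
        if not next_locator or next_locator == "null":
            return  # the last page has been read
        locator = next_locator
```

Writing each yielded page straight to the warehouse staging area avoids holding millions of rows in memory at once.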
Further Reading from Authoritative Sources
- MDN Web Docs — HTTP Authentication: Background on the Bearer token authentication scheme used in Salesforce REST API calls, including header format and security considerations.
- IETF RFC 7523 — JWT Profile for OAuth 2.0: The IETF standard defining the JWT Bearer grant type used in Salesforce’s server-to-server OAuth flow — understanding this RFC clarifies why the JWT assertion format is structured as it is.


