Customer Data Platform procurement is one of the most consequential MarTech decisions a company makes, and one of the most frequently made on an insufficient technical basis. The sales cycle focuses on use cases and dashboards. The implementation reality focuses on data model fit, SDK quality, destination mapping limitations, and the identity resolution edge cases that the demo never shows.
This guide is for developers who need to evaluate CDPs with technical rigor — not for marketers evaluating ease of use, but for engineering and data teams evaluating whether the CDP can actually do what the sales team says it can.
What a CDP Actually Does (and What It Doesn’t)
A Customer Data Platform collects behavioral and transactional data from multiple sources, resolves that data to unified customer profiles, and makes those profiles available for segmentation and activation in downstream marketing tools.
The three core functions:
Collection. SDKs for web, mobile, and server-side data collection. Event routing from your application to the CDP. Source connections for importing CRM, billing, and other data.
Identity resolution. Linking multiple identifiers (anonymous cookie IDs, authenticated user IDs, email addresses, device IDs) to a single customer profile. This is where CDP implementations most commonly fail.
Activation. Destination connections that sync profile data and segments to marketing tools (email platforms, ad platforms, CRMs, analytics tools).
What a CDP is not:
A CDP is not a substitute for a data warehouse. It is not designed for complex analytical queries, historical point-in-time analysis, or the data modeling that a BI team needs. The warehouse-first architecture (using a data warehouse with reverse ETL for activation) can replace a CDP for companies where real-time activation is not required. As discussed in the MarTech data pipeline architecture guide, the right choice between CDP and warehouse-first depends on your activation latency requirements.
The Technical Evaluation Framework
1. SDK Quality and Platform Coverage
The CDP’s SDKs determine the implementation quality and maintenance burden on your engineering team. Evaluate:
JavaScript SDK:
- Does it support both client-side and server-side event sending?
- What is the minified bundle size? (Large SDKs negatively affect Core Web Vitals)
- How does it handle single-page application routing (does `page()` fire automatically on route change, or does it require manual calls)?
- Does it support consent management (delaying initialization until consent is granted)?
- Is the source code open and auditable?
Mobile SDKs (iOS and Android):
- Does the iOS SDK correctly handle IDFA availability post-ATT (iOS 14.5+)?
- What is the battery and network impact of background event queuing?
- Does it support offline event queuing with replay when connectivity is restored?
- How does it handle app backgrounding and foregrounding for session continuity?
Server-side SDK:
- What languages are officially supported?
- How does the server-side SDK handle batching and retry for failed events?
- Can you send events without using the SDK (raw HTTP API)?
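The batching and retry question is easier to evaluate with a concrete reference behavior in mind. The following is a minimal sketch of what a server-side sender typically does, not any vendor's implementation: `BatchingSender`, its parameters, and the `transport` callable (which stands in for the SDK's HTTP call or the raw HTTP API) are all hypothetical names.

```python
import time
from typing import Callable, Dict, List

Event = Dict[str, object]


class BatchingSender:
    """Buffers events and flushes them in batches with bounded retries."""

    def __init__(self, transport: Callable[[List[Event]], bool],
                 batch_size: int = 20, max_retries: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.transport = transport      # returns True when the batch was accepted
        self.batch_size = batch_size
        self.max_retries = max_retries
        self.base_delay = base_delay
        self.sleep = sleep              # injectable so tests do not really wait
        self.buffer: List[Event] = []
        self.dead_letter: List[Event] = []  # events that exhausted all retries

    def track(self, event: Event) -> None:
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        batch, self.buffer = self.buffer, []
        if not batch:
            return
        for attempt in range(self.max_retries + 1):
            if self.transport(batch):
                return
            if attempt < self.max_retries:
                # Exponential backoff: base_delay, then 2x, 4x, ...
                self.sleep(self.base_delay * (2 ** attempt))
        # Keep failed events visible instead of silently dropping them.
        self.dead_letter.extend(batch)
```

When reading a vendor's SDK source, the things to compare against this sketch are whether retries are bounded, whether backoff exists, and what happens to events once retries are exhausted.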
Test quality: Review the SDK’s GitHub repository. Look at the test coverage, the frequency of releases, the age of open issues, and how quickly the maintainers respond. A CDP with a poorly maintained SDK is a technical debt risk regardless of the platform capabilities.
2. Identity Resolution Architecture
Identity resolution is the function where CDPs most commonly oversell and underdeliver. The questions to ask:
What identifiers does the identity graph support? At minimum: anonymous cookie/device IDs, authenticated user IDs, email addresses. Additionally: phone numbers, hashed emails (for ad platform matching), loyalty IDs.
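Hashed-email matching only works if both sides normalize the address the same way before hashing. A minimal sketch, assuming the common convention of trimming whitespace, lowercasing, and hashing with SHA-256; the exact normalization rules vary by ad platform and should be checked against each platform's documentation. `normalized_email_hash` is a hypothetical helper name.

```python
import hashlib


def normalized_email_hash(email: str) -> str:
    """Trim, lowercase, then SHA-256 the address and return the hex digest."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

A useful evaluation question is where this normalization happens: in the SDK, in the CDP pipeline, or in the destination connector.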
How does anonymous-to-authenticated stitching work? When a user authenticates, their pre-authentication anonymous events should be linked to their identity. Ask specifically: does the CDP retroactively attribute pre-authentication events to the identified user profile? Some CDPs attribute only events that arrive after the identify call to the identified profile; anonymous events recorded before authentication remain unlinked.
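The difference between retroactive and forward-only stitching can be made concrete with a toy model. This is a sketch under simplifying assumptions (a single in-memory event log, one anonymous ID per browser), and `ProfileStore` and its methods are hypothetical names rather than any CDP's API.

```python
from typing import Dict, List, Tuple


class ProfileStore:
    """Toy event log that links anonymous IDs to user IDs on identify."""

    def __init__(self) -> None:
        self.log: List[Tuple[str, str]] = []          # (anonymous_id, event name)
        self.links: Dict[str, Tuple[str, int]] = {}   # anonymous_id -> (user_id, log position at link time)

    def track(self, anonymous_id: str, name: str) -> None:
        self.log.append((anonymous_id, name))

    def identify(self, anonymous_id: str, user_id: str) -> None:
        # Deterministic link: this anonymous ID just authenticated as user_id.
        self.links[anonymous_id] = (user_id, len(self.log))

    def profile_events(self, user_id: str, retroactive: bool) -> List[str]:
        """Events attributed to user_id under the chosen stitching policy."""
        attributed = []
        for position, (anonymous_id, name) in enumerate(self.log):
            link = self.links.get(anonymous_id)
            if link is None or link[0] != user_id:
                continue
            linked_at = link[1]
            # Forward-only stitching ignores everything before the identify call.
            if retroactive or position >= linked_at:
                attributed.append(name)
        return attributed
```

With two anonymous events followed by an identify call and one more event, `profile_events(..., retroactive=True)` returns all three events while `retroactive=False` returns only the last one. That gap is exactly what the vendor question is probing.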
How does cross-device identity work? Ask the vendor to explain the specific mechanism for cross-device matching. Deterministic matching (same email address across sessions) is reliable. Probabilistic matching (inferring the same person from behavioral signals) is inherently approximate: it can merge distinct people into one profile or miss genuine matches. What percentage of your user base would have deterministic links vs. probabilistic? If most of your users never share an identifier across devices, cross-device matching will not work well regardless of what the sales deck says.
What happens to identity on deletion (GDPR right to erasure)? Can you trigger a deletion that propagates through the identity graph and removes all events associated with a user from both the CDP and destination systems? Get a documented deletion flow, not a verbal assurance.
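A documented deletion flow should cover three things: finding every identifier linked to the person, removing their events, and confirming deletion per destination. The sketch below illustrates that shape under stated assumptions: the identity graph is a symmetric adjacency map, events carry a single `identifier` field, and each destination is represented by a hypothetical callable that reports whether it confirmed the deletion.

```python
from typing import Callable, Dict, List, Set


def linked_identifiers(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Return every identifier reachable from `start` in the identity graph."""
    seen, stack = {start}, [start]
    while stack:
        for neighbor in graph.get(stack.pop(), set()):
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def erase_user(graph: Dict[str, Set[str]], events: List[dict],
               destinations: Dict[str, Callable[[Set[str]], bool]],
               start: str) -> dict:
    """Delete a person's events and identifiers, and report per-destination results."""
    ids = linked_identifiers(graph, start)
    before = len(events)
    events[:] = [e for e in events if e["identifier"] not in ids]
    for identifier in ids:
        graph.pop(identifier, None)  # the whole connected component is removed
    return {
        "identifiers": sorted(ids),
        "events_removed": before - len(events),
        # Each callable returns True only if the destination confirmed deletion.
        "destinations": {name: delete(ids) for name, delete in destinations.items()},
    }
```

The returned report is the kind of audit artifact to ask the vendor for: a destination that reports `False` is a deletion that still has to be handled manually.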
Identity graph limits: Some CDPs have limits on the number of identifiers per profile or the time window for identity stitching. These limits become relevant at scale — ask for documentation, not marketing copy.
3. Destination Ecosystem and Mapping Fidelity
The CDP’s value is proportional to its destination ecosystem. Evaluate:
Critical destinations. Does the CDP have native (not custom-built) connections to your most important destinations — your CRM, email platform, ad platforms, and data warehouse? “Native integration” should mean the connection is maintained by the CDP vendor, not a Zapier-style community connector.
Destination mapping limitations. This is where most CDPs have hidden constraints. Ask: for each destination, what event and property names are supported? Some destination connections only support a fixed schema — you cannot send custom properties. Others support custom properties but with length or character limits.
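One way to surface these constraints during evaluation is to encode each destination's documented limits and run your real event payloads against them. This is a sketch only: `DestinationSpec` and `mapping_problems` are hypothetical names, and the limits have to come from the vendor's documentation for each destination.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set


@dataclass
class DestinationSpec:
    """Constraints documented by the vendor for one destination (hypothetical)."""
    name: str
    allowed_properties: Optional[Set[str]] = None  # None: custom properties accepted
    max_value_length: Optional[int] = None         # None: no documented length limit


def mapping_problems(event: Dict[str, Any], spec: DestinationSpec) -> List[str]:
    """List the properties of `event` that would not survive delivery to `spec`."""
    problems: List[str] = []
    for key, value in event.get("properties", {}).items():
        if spec.allowed_properties is not None and key not in spec.allowed_properties:
            problems.append(f"{spec.name}: '{key}' is outside the fixed schema")
        elif spec.max_value_length is not None and len(str(value)) > spec.max_value_length:
            problems.append(f"{spec.name}: '{key}' exceeds {spec.max_value_length} characters")
    return problems
```

Running this over a sample of production events per destination gives a concrete list of properties that would be dropped or truncated, which is a better basis for vendor questions than the integration catalog page.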
Real-time vs. batch destination sync. What is the delivery latency for each destination? “Real-time” for some CDPs means minutes, not seconds. For triggered email automation where you want the welcome email to arrive within seconds of signup, the delivery latency matters.
Custom destination support. If you need to send data to a destination the CDP doesn’t natively support, what is the mechanism? The usual options are destination functions (custom JavaScript that the CDP executes), webhooks (forwarding raw event data to your endpoint), or nothing, in which case you need a different solution.
4. Data Governance and Compliance Capabilities
Consent management integration. The CDP must respect user consent at the collection and activation levels. Specifically: can you configure the CDP to not collect events from users who have not consented? Can you configure per-destination consent gating (user consented to analytics but not advertising)? Is consent state auditable?
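Per-destination consent gating amounts to a routing decision plus an audit record. The following sketch assumes a simple model in which each destination requires exactly one consent category; `DESTINATION_CATEGORY`, the category names, and `route` are hypothetical and only illustrate the behavior to look for.

```python
from typing import Dict, List, Set

# Hypothetical mapping from destination to the consent category it requires.
DESTINATION_CATEGORY: Dict[str, str] = {
    "warehouse": "analytics",
    "product_analytics": "analytics",
    "ads_platform": "advertising",
}


def route(event: Dict[str, str], consent: Set[str], audit_log: List[dict]) -> List[str]:
    """Return the destinations this event may be sent to, and record the decision."""
    if not consent:
        targets: List[str] = []  # no consent at all: the event is not forwarded anywhere
    else:
        targets = sorted(
            dest for dest, category in DESTINATION_CATEGORY.items()
            if category in consent
        )
    # Every decision is logged so the consent state is auditable later.
    audit_log.append({
        "event": event["event"],
        "consent": sorted(consent),
        "destinations": targets,
    })
    return targets
```

A user who consented to analytics but not advertising should produce events that reach the warehouse and analytics tools and never the ad platform, and the audit log should be able to show why.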
Data residency. For companies with EU users, where is CDP data stored? Does the vendor offer EU data residency? What sub-processors handle the data?
Data retention policies. How long does the CDP retain raw events? Profile data? Can retention periods be configured? For GDPR compliance, you need to be able to configure retention to match your data retention policy.
Access controls. Can you restrict which team members can see raw user data vs. aggregate reports? Can you configure field-level data masking for PII?
5. Query and Segmentation Capabilities
Real-time segmentation. Can segments update in real-time as events occur, or are they batch-computed on a schedule? For real-time personalization and triggered automation, real-time segment membership is required.
Behavioral query language. How do you express “users who have done X but not Y in the last 30 days”? This is the core segmentation use case. Ask the vendor to demonstrate building this query in the product. Look for intuitive query construction and SQL-accessible behavioral data.
SQL access. Can your data team query the CDP’s underlying event and profile data in SQL? Or is all analysis limited to the CDP’s UI? SQL access is a major differentiator for technical teams.
CDP Comparison: Key Technical Differentiators
| Dimension | Segment | mParticle | RudderStack | Amplitude CDP |
|---|---|---|---|---|
| Primary architecture | Event routing + profiles | Event routing + profiles | Event routing (warehouse-native) | Product analytics + CDP |
| Identity resolution | Good deterministic, limited probabilistic | Strong cross-device | Warehouse-based | Strong within-product |
| Open source | No | No | Yes (self-hosted option) | No |
| Server-side SDK quality | Excellent | Excellent | Excellent | Good |
| Real-time segmentation | Yes (Engage, formerly Personas) | Yes | Via warehouse | Yes |
| Data warehouse native | Connections only | Connections only | Core architecture | Connections |
RudderStack’s open-source option deserves specific mention: it allows self-hosted deployment with full data control and no per-event pricing, which changes the cost model significantly at high event volumes.
The POC Evaluation
A proof of concept before committing to a CDP contract is not optional — it is the only reliable way to verify that the platform handles your specific data model and use cases.
POC scope: Pick the three most important use cases for your CDP implementation (e.g., “track user actions in-product,” “sync subscription status to HubSpot,” “create re-engagement audiences in Google Ads”). Implement these end-to-end in the POC using real data volumes (or production-representative synthetic data).
Identity test cases to run:
- Anonymous user converts to identified — are pre-authentication events stitched?
- Same user on mobile and web with the same email — are profiles merged?
- User changes email address — how does the CDP handle the identifier change?
- Delete a user — is their data removed from all destinations?
Performance test: For high-event-volume applications, test whether the SDK sustains your expected throughput without dropping events. At 100 events/second sustained, do events arrive in the CDP with accurate timestamps and no data loss?
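Verifying "no data loss" requires reconciling what you sent against what the CDP reports receiving. A minimal sketch, assuming each event carries a unique `message_id` and a numeric `timestamp` in seconds, and that you can export received events from the CDP (for example via its warehouse sync); `reconcile` and the field names are hypothetical.

```python
from typing import Dict, List


def reconcile(sent: List[dict], received: List[dict],
              max_skew_seconds: float = 1.0) -> Dict[str, object]:
    """Compare sent and received events by message_id and timestamp."""
    sent_by_id = {e["message_id"]: e for e in sent}
    received_ids = [e["message_id"] for e in received]
    unique_received = set(received_ids)
    missing = sorted(set(sent_by_id) - unique_received)
    # Events whose stored timestamp drifted from the timestamp that was sent.
    skewed = sorted({
        e["message_id"] for e in received
        if e["message_id"] in sent_by_id
        and abs(e["timestamp"] - sent_by_id[e["message_id"]]["timestamp"]) > max_skew_seconds
    })
    return {
        "missing": missing,
        "duplicates": len(received_ids) - len(unique_received),
        "skewed": skewed,
        "loss_rate": len(missing) / len(sent_by_id) if sent_by_id else 0.0,
    }
```

Duplicates matter as much as losses here: retries that are not deduplicated by the CDP inflate event counts and distort any segment that depends on frequency.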
Frequently Asked Questions
When should we use a CDP instead of building our own event routing infrastructure?
A CDP is justified when: your team lacks the engineering bandwidth to build and maintain a custom event routing and identity resolution system; you need real-time activation in 10+ destinations and the native integration library saves months of development; or you are in a regulated industry where the CDP provides compliance features (consent management, GDPR deletion) that would be expensive to build. For teams with strong data engineering capability and a warehouse-first orientation, reverse ETL tools plus a data warehouse often provide better capabilities at lower cost.
What is the pricing model difference between CDP vendors and how does it affect architecture decisions?
Most CDPs price on monthly tracked users (MTU) or events per month. At scale, CDP costs can be significant — Segment’s pricing has historically been $10–15 per 1,000 MTU/month at growth tiers. This creates an economic case for the warehouse-first alternative at high user volumes. RudderStack Cloud prices on events rather than users, which is more favorable for high-event-per-user products. Self-hosted RudderStack (open-source) eliminates per-event costs.
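The pricing comparison reduces to simple arithmetic. The sketch below uses the $10 to $15 per 1,000 MTU range quoted above; the 500,000 MTU volume and the 200 events per user are illustrative assumptions, and the function names are hypothetical. Rather than assuming any vendor's event price, it computes the break-even price per million events.

```python
def mtu_monthly_cost(mtu: int, price_per_thousand_mtu: float) -> float:
    """Monthly cost under MTU-based pricing."""
    return mtu / 1000 * price_per_thousand_mtu


def break_even_price_per_million_events(mtu: int, events_per_user: int,
                                        price_per_thousand_mtu: float) -> float:
    """Event price (per million) at which event-based pricing costs the same."""
    monthly_events = mtu * events_per_user
    return mtu_monthly_cost(mtu, price_per_thousand_mtu) / (monthly_events / 1_000_000)


low = mtu_monthly_cost(500_000, 10.0)    # 5000.0
high = mtu_monthly_cost(500_000, 15.0)   # 7500.0
break_even = break_even_price_per_million_events(500_000, 200, 10.0)  # 50.0
```

Under these assumptions, 500,000 MTU costs $5,000 to $7,500 per month, and event-based pricing beats the $10 tier only if it comes in below $50 per million events. Substituting your own volumes and quoted prices shows which model your usage pattern favors.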
How do we migrate from one CDP to another without losing historical data?
CDP migrations are expensive. Historical event data that lives only in the CDP is typically not exportable in a standard format. Before selecting a CDP, ensure you also have raw event data in a data warehouse — if you do, the CDP migration only requires re-connecting sources and destinations, not exporting years of behavioral data. This is one of the most compelling architectural arguments for always having a warehouse alongside a CDP, rather than treating the CDP as the primary data store.
How do we evaluate a CDP’s identity resolution quality without a lengthy pilot?
Request a technical deep-dive on the identity graph implementation — not a marketing demo but a technical conversation with a solutions engineer who can answer: which algorithms are used, what the deterministic match rate is on a realistic dataset, and how identity graph conflicts are resolved. Ask for customer references who can speak to identity resolution accuracy in production.
What should we include in a CDP contract to protect against vendor lock-in?
Include: data export provisions (right to export all raw events and profile data in a standard format with 30 days’ notice), termination data portability (what happens to data after the contract ends), and source code escrow for proprietary SDKs if self-hosting is not available. The most effective lock-in protection is architectural: storing a complete copy of all events in a warehouse you own means the CDP is replaceable.