Profiles Sync Sample Queries
Unify requires a Business tier account and is included with Engage.
See the available plans, or contact Support.
On this page, you’ll find queries that you can run with Profiles Sync to address common use cases.
The examples in this guide are based on a Snowflake installation. If you’re using another warehouse, you may need to adjust the syntax.
About example schemas
The queries on this page use two example schemas:
ps_segment
, a schema where Segment lands dataps_ materialize
, a schema with your produced materializations
These schema names may not match your own.
Monitor and diagnose identity graphs
These queries let you view and manage identity graphs, which give you insight into unified customer profiles generated by identity resolution.
Show how many profiles Segment creates and merges per hour
This example queries the id_graph_udpates
table to measure the rate at which Segment creates and merges profiles, as well as the type of event that triggered the profile change:
SELECT
DATE_TRUNC('hour',timestamp) as hr,
CASE
WHEN canonical_segment_id=segment_id
THEN 'profile creation' ELSE 'profile merge'
END as profile_event,
triggering_event_type,
COUNT(DISTINCT triggering_event_id) as event_count
FROM ps_segment.id_graph_updates
GROUP BY 1,2,3
Isolate profiles that have reached an identifier’s maximum configured value
Segment’s configurable identifier limits let you set maximum values for identifiers like email. These maximum configured values help prevent two separate users from being merged into a single Profile.
The following query lets you view Profiles that have reached a configured limit for the email identifier:
WITH agg AS (
SELECT
canonical_segment_id,
COUNT(LOWER(TRIM(external_id_value))) as value_count,
LISTAGG(external_id_value,', ') as external_id_values
FROM ps_materialize.external_id_mapping
WHERE external_id_type='email'
GROUP BY 1
)
SELECT
canonical_segment_id,
external_id_values,
value_count
FROM agg
WHERE value_count > 5 -- set to your configured limit
Reconstruct a profile’s traits
Identify the source that generated the value for a particular trait for a canonical profile as well as its child profiles
When a merge occurs, Segment selects and associates a single trait value with a profile. This logic depends on how you materialize the profile_traits
table.
You can break out a profile, though, to see the trait versions that existed before the merge. As a result, you can identify a particular trait’s origin.
The following example inspects a particular profile, use_XX
, and trait, trait_1
. The query reports the profile’s last observed trait, its source ID, and any profiles Segment has since merged into the profile:
SELECT * FROM (
SELECT
ids.canonical_segment_id,
ident.segment_id,
ident.event_source_id,
ident.trait_1,
row_number() OVER(PARTITION BY ident.segment_id ORDER BY ident.timestamp DESC) as rn
FROM ps_segment.identifies as ident
INNER JOIN ps_materialize.id_graph as ids
ON ids.segment_id = ident.segment_id
AND ids.canonical_segment_id = 'use_XXX'
AND ident.trait_1 IS NOT NULL
) WHERE rn=1
Measure and model your customer base
Pull a complete list of your customers, along with their merges, external identifiers, or traits
The following three snippets will provide a full list of your customers, along with:
- The profile IDs merged into that customer:
SELECT
canonical_segment_id,
LISTAGG(segment_id, ', ') as associated_segment_ids
FROM ps_materialize.id_graph
GROUP BY 1
- The external IDs associated with that customer:
SELECT
canonical_segment_id,
LISTAGG(external_id_value || '(' || external_id_type || ')', ', ') as associated_segment_ids
FROM ps_materialize.external_id_mapping
GROUP BY 1
- The customer’s traits:
SELECT * FROM ps_materialize.profile_traits WHERE merged_to IS NULL
Pull the latest subscription status set for every profile identifier in the space
Provides the latest subscription status set for all identifiers in the space. This query will not include identifiers that have no subscription status ever set.
SELECT evt1.user_id, evt1.channel, evt1._id id, evt1.status, evt1.received_at
FROM ps_segment.CHANNEL_SUBSCRIPTION_UPDATED evt1
JOIN (
SELECT _id, MAX(received_at) AS max_received_at
FROM ps_segment.CHANNEL_SUBSCRIPTION_UPDATED
GROUP BY _id
) evt2
ON evt1._id = evt2._id AND evt1.received_at = evt2.max_received_at
ORDER BY 1
Show all pages visited by a user
To get complete user histories, join event tables to the identity graph and aggregate or filter with id_graph.canonical_segment_id
:
SELECT
id_graph.canonical_segment_id,
pages.*
FROM ps_segment.pages
LEFT JOIN ps_materialize.id_graph
ON id_graph.segment_id = pages.segment_id
WHERE canonical_segment_id = ‘use_XX..’
Show the complete history of a trait or audience membership associated with a customer
Suppose you want to track a user’s entrances and exits of the audience aud_1
. Running the following query would return all qualifying entrance and exits:
SELECT
id_graph.canonical_segment_id,
identifies.aud_1,
identifies.timestamp
FROM ps_segment.identifies
INNER JOIN ps_materialize.id_graph
ON id_graph.segment_id = identifies.segment_id
AND identifies.aud_1 IS NOT NULL
This query works with any Trait or Audience membership, whether computed in Engage or instrumented upstream.
FAQs
Can I view Engage Audience membership and Computed Trait values in my Warehouse?
Yes. Engage sends updates to Audience membership (as a boolean) and computed trait value updates as traits on an Identify call that Segment forwards to your data warehouse.
The column name corresponds to the Audience or Trait key shown on the settings page:
Surface these values the same way as any other trait value:
- The Trait’s complete history will be in
identifies
- The Trait’s current state for each customer will be in
profile_traits
What is the relationship between segment_id
and canonical_segment_id
? Are they unique?
Identity merges change Segment’s understanding of who performed historical events.
For example, if profile_b
completed a “Product Purchased” event but Segment understands that profile_b
should be merged into profile_a
, Segment deduces that profile_a
performed that initial “Product Purchased” event.
With that in mind, here’s how to differentiate between segment_id
and canonical_segment_id
:
segment_id
is a unique identifier representing Segment’s understanding of who performed an action at the time the action happened.canonical_segment_id
is a unique identifier representing Segment’s current understanding of who performed that action.
The mapping between these two identifiers materializes in your id_graph
table. If a profile has not been merged away, then segment_id
is equivalent to canonical_segment_id
. If a profile has been merged away, id_graph
reflects that state.
As a result, you can retrieve a customer’s complete event history by joining an event table, like product_purchased
to id_graph
.
For more information, view the Profiles Sync tables guide.
Does Profiles Sync data ever differ from Unify data?
Profiles Sync mimics the materialization performed by Segment Unify. A user’s merges, external IDs, and traits should be expected whether they’re queried in the warehouse, Profile API, or viewed in the UI.
The following edge cases might drive slight (<0.01%) variation:
- Data processed by Unify hasn’t yet landed in Profiles Sync.
- If you rebuild or use non-incremental materialization for
profile_traits
, Profiles Sync will fully calculate traits against a user. As a result, Profiles Sync would ensure that all traits reflect the most recently observed value for fully-merged users.
By contrast, Segment Unify and incrementally-built Profiles Sync materializations won’t combine already-computed traits across two merged profiles at the moment of merge. Instead, one profile’s traits will be chosen across the board.
What hash function is used for the external_id_hash field by Profiles Sync?
The external_id_hash
is a hash of the external_id_type
and external_id_value
using SHA-1. This field corresponds to the primary_key
for the table: hash (external_id_type and external_id_value)
.
For example, in BigQuery the logic is: TO_HEX(SHA1(concat(external_id_type, external_id_value))) as seg_hash
.
This page was last modified: 04 Jul 2024
Need support?
Questions? Problems? Need more info? Contact Segment Support for assistance!