Mapping the Path to Conversion Across DSP and Sponsored Ads

The Slack message came in on a Tuesday: "Can you map our actual customer journey across DSP, Sponsored Products, Sponsored Brands, and Sponsored Display before they convert? I want to see a Sankey, not a table."

This sounds like a one-query ask. In practice it's the kind of question that eats a week. The data lives across three different AMC tables that have to be joined on user_id inside an attribution window. The query also has to bucket dozens of campaign IDs into a handful of meaningful groups, or the aggregation thresholds will swallow most of the path rows and return NULLs. And the version of the playbook that ships in most blog posts and Stack Overflow answers is the deprecated one. The current one looks similar and produces materially different output.

So I asked our agent. The agent has Amazon Agent Atlas behind it: a curated, semantically-indexed corpus of AMC playbooks, instructional queries, and audience patterns. Here's what came back.

What a model without Atlas gets wrong

I ran the same prompt against a frontier model with no retrieval. The response looked authoritative. It wasn't.

Warning — four failure modes in a single un-grounded response:
It used "Path to Conversion by Campaign" as the playbook name. That IQ was superseded on August 31, 2022 by Path to Conversion by Campaign Groups. The old name is still in older blog posts and SDK examples, so it surfaces in training data more than the replacement does. The deprecated version doesn't have the grouping logic; the output collapses on any account with more than a handful of campaigns.
It listed two tables (dsp_impressions and amazon_attributed_events) and missed sponsored_ads_traffic. The current IQ joins all three on user_id to assemble each user's pre-conversion exposure sequence. Without sponsored_ads_traffic, every SP, SB, and SD touchpoint disappears from the Sankey.
It didn't mention the 14-day attribution wait. Attribution events are still landing in amazon_attributed_events_by_traffic_time for two weeks past the query end date. Running the IQ before that window closes produces a path distribution that quietly under-counts late conversions.
It didn't mention campaign grouping. AMC enforces aggregation thresholds: any path row with fewer than the minimum unique users gets nulled out. With twenty raw campaign_id values fanning out across DSP, SP, SB, and SD, most of your path rows fall under the threshold. The grouping step ("DSP-Awareness", "SP-Branded", and so on) is the difference between a populated Sankey and a Sankey full of NULLs.

None of this is exotic AMC knowledge. It's in the playbook. It just isn't in the un-grounded model's working set.

What Atlas retrieves

When the agent gets the question, it does a semantic search across the amazon_ads collection and pulls four chunks before it writes a line of SQL:

The Customer Journey Analytics playbook, which is the parent document for this entire workflow. It defines what customer journey analytics is, names the funnel phases (awareness, consideration, retargeting), and ends with the Sankey-shaped output the operator was asking for.
The Path to Conversion by Campaign Groups instructional query, including the comments block that explains how to populate the grouping section. This is the post-Aug-2022 version. The agent flags the deprecated "Path to Conversion by Campaign" IQ and avoids it.
The Tables used chunk, which names the three required tables: dsp_impressions, sponsored_ads_traffic, and amazon_attributed_events_by_traffic_time. The agent surfaces this explicitly so the operator can audit the join shape.
The Sankey diagram transform chunk, which describes the Python step for shaping the AMC output into the source/target/value triples that a Sankey visualizer expects, plus the CSV export and the interpretation pattern ("loyalty and consideration overlap; awareness alone underperforms").

Atlas doesn't generate the SQL. It surfaces the playbook, and the agent adapts it.

The agent's output

Here's the query the agent produced. The grouping section is filled in for a brand with DSP awareness and consideration tactics plus the usual Sponsored Ads mix. The IQ template uses a CTE block named /* Campaign Group Start */ to make this easy to audit:

-- Path to Conversion by Campaign Groups
-- AMC Instructional Query, post-Aug-31-2022 version.
-- Joins dsp_impressions, sponsored_ads_traffic, and
-- amazon_attributed_events_by_traffic_time on user_id within the attribution window.

/* Campaign Group Start */
WITH campaign_group_mapping AS (
  -- DSP campaigns grouped by funnel intent
  SELECT campaign_id, 'DSP-Awareness'      AS campaign_group FROM (VALUES (101), (102)) AS t(campaign_id)
  UNION ALL
  SELECT campaign_id, 'DSP-Consideration'  AS campaign_group FROM (VALUES (201), (202)) AS t(campaign_id)
  UNION ALL
  -- Sponsored Products grouped by branded vs non-branded
  SELECT campaign_id, 'SP-Branded'         AS campaign_group FROM (VALUES (301), (302)) AS t(campaign_id)
  UNION ALL
  SELECT campaign_id, 'SP-Non-Branded'     AS campaign_group FROM (VALUES (401), (402)) AS t(campaign_id)
  UNION ALL
  -- Sponsored Brands & Display, single bucket each
  SELECT campaign_id, 'SB-All'             AS campaign_group FROM (VALUES (501)) AS t(campaign_id)
  UNION ALL
  SELECT campaign_id, 'SD-All'             AS campaign_group FROM (VALUES (601)) AS t(campaign_id)
),
/* Campaign Group End */

dsp_exposures AS (
  SELECT i.user_id,
         COALESCE(g.campaign_group, 'DSP-Others') AS campaign_group,
         i.impression_dt_utc                       AS event_dt_utc
  FROM dsp_impressions i
  LEFT JOIN campaign_group_mapping g
    ON i.campaign_id = g.campaign_id
),
sa_exposures AS (
  SELECT s.user_id,
         COALESCE(g.campaign_group,
           CASE
             WHEN s.ad_product_type = 'sponsored_products' THEN 'SP-Others'
             WHEN s.ad_product_type = 'sponsored_brands'   THEN 'SB-Others'
             WHEN s.ad_product_type = 'sponsored_display'  THEN 'SD-Others'
           END
         )                                         AS campaign_group,
         s.event_dt_utc                            AS event_dt_utc
  FROM sponsored_ads_traffic s
  LEFT JOIN campaign_group_mapping g
    ON s.campaign_id = g.campaign_id
),
all_exposures AS (
  SELECT user_id, campaign_group, event_dt_utc FROM dsp_exposures
  UNION ALL
  SELECT user_id, campaign_group, event_dt_utc FROM sa_exposures
),

paths AS (
  SELECT
    e.user_id,
    array_join(array_distinct(array_agg(e.campaign_group ORDER BY e.event_dt_utc)), ' > ') AS journey
  FROM all_exposures e
  JOIN amazon_attributed_events_by_traffic_time a
    ON e.user_id = a.user_id
   AND e.event_dt_utc <= a.event_dt_utc
  WHERE a.conversion_event_subtype = 'order'
  GROUP BY e.user_id
)

SELECT
  journey,
  COUNT(DISTINCT user_id) AS converters,
  SUM(1)                  AS conversions
FROM paths
GROUP BY 1
ORDER BY converters DESC
LIMIT 20;

Three things are worth noticing about what the agent chose to do. First, the grouping CTE is populated, not skipped. AMC's default tagging (DSP-Others, SP-Others, SD-Others, SB-Others) is a safety net, not a strategy. Leaving every campaign in its default bucket produces a Sankey that says "DSP-Others touched everyone" and tells the operator nothing. Naming groups by funnel intent (Awareness, Consideration, Branded, Non-Branded) is what turns the visualization into a decision tool.

The COALESCE against the default tags is the rule that prevents aggregation-threshold nulls. AMC drops any path row whose unique-user count is below the threshold. With dozens of raw campaign_id values, most rows fall under that floor. Collapsing the long tail into named groups lifts most of those rows above the threshold and back into the output.

The outer join uses event_dt_utc <= a.event_dt_utc to ensure exposures only count if they happened before the conversion. Reversing the inequality (or omitting it) inflates path counts with exposures that happened after the order, which doesn't make sense as a customer journey.

The footnotes the agent surfaced unprompted

Things Atlas surfaced that the operator didn't ask for:
Wait 14 days past the query end date. Attribution events keep landing in amazon_attributed_events_by_traffic_time for two weeks after a conversion. If you run the IQ for last week, you're reading an under-counted picture. Push the query window back fourteen days, or accept that the Sankey will shift after the next refresh.
Use the post-2022 IQ, not the original. The deprecated "Path to Conversion by Campaign" still appears in older docs and SDK samples. It doesn't support grouping, and its output structure differs enough that downstream Sankey transforms will silently break on it. The agent named the current IQ explicitly: Path to Conversion by Campaign Groups.
The three-table join is non-negotiable. Skipping sponsored_ads_traffic (a common mistake because amazon_attributed_events_by_traffic_time already has some sponsored-ads data) loses the SB/SD touchpoints, which is the entire point of mapping cross-channel paths.
Aggregation thresholds are silent. A row whose unique-user count is below the AMC threshold is simply absent from the output, not flagged with a "suppressed" marker. If your Sankey looks sparse, the first hypothesis is grouping is too granular, not that the data is missing.
Sankey rendering is a separate step. AMC returns rows of (journey, converters, conversions). A Sankey visualizer expects (source, target, value) triples. The transform is in the Customer Journey Analytics playbook as a Python notebook step; it exports to CSV for whatever visualization tool the team uses.

What happens next

The Sankey output is the artifact, but it's not the decision. The decision layer asks: where in the journey is the converter overlap highest, and which campaign groups are reaching users who never reach the next step?

For most accounts the answer is the same shape. Loyalty and consideration campaigns share many converters with each other and with sponsored branded traffic. Awareness campaigns appear in journeys but rarely overlap with later phases, which tells the operator that the upper-funnel investment is reaching a different population than the mid-funnel work. That's a budget question, not a measurement question.

Once the Sankey is in hand, the next post in this series picks up what amazon_rules says to do about it.

Why this matters

If your agents are guessing at the customer journey, the wrong query name is the cheapest mistake they'll make.

Next: how the agent uses the amazon_rules collection to turn a populated Sankey into a campaign-group reallocation plan.