The Slack ping landed on a Thursday from our CRM lead: "Who are our actual best customers, and can we build a DSP audience that targets people who look like them?"
It sounds like a five-minute question. Ask ChatGPT and you get a single "high value" SQL block, almost always a total-spend filter, usually against the wrong table variant, almost never inside the seed-size band AMC actually requires. Search the AMC console's Instructional Query library and you find the answer, except it lives in two separate playbooks you have to stitch together yourself. By the time you've reconciled them, the campaign launch has slipped a day.
So I asked our agent. It runs on Amazon Agent Atlas, a curated corpus of AMC playbooks, DSP activation guides, and event-subtype references indexed for semantic retrieval. The answer came back in the shape the IQ library actually recommends: three seed audiences, tested separately, each sized into the 1,000 to 450,000 buffer before activation.
What a model without Atlas gets wrong
I ran the same prompt through a frontier model with no retrieval first. The SQL it produced looked competent. It would have failed at audience creation in at least five distinct ways.
Warning: five failure modes I saw in a single ungrounded response:
- It selected `user_id` from `conversions_all`. Audience-build queries that return user IDs must run against the `_for_audiences` variant, in this case `conversions_all_for_audiences`. The sizing wrapper does the opposite: it counts user IDs against `conversions_all`. The Atlas chunk is explicit: "change `user_id` to `count(user_id)`, remove the suffix of `_for_audiences` in the table name."
- It returned one blended seed that combined SnS, multi-purchase, and a spend threshold joined with `AND`. The result was an overly specific cohort of around 300 users. The IQ guidance is the opposite: "we recommend testing these audiences separately to avoid overly specific seeds."
- It hallucinated the event filter as `event_subtype = 'SnS'`. The real values are `event_subtype IN ('firstSnSOrder', 'repeatSnSOrder')`. The `repeatSnSOrder` value was added to Flexible Shopping Insights on 02/05/2024, so any model with a training cutoff before that date does not know it exists and will quietly miss every recurring SnS shipment.
- It said nothing about the 500 to 500,000 hard sizing bound. Audience refresh fails outside that range, silently for the operator until the DSP line item runs empty.
- It left a comment as the final line of the SQL. AMC's audience pusher rejects any query whose last line is a comment, with an error message that does not tell you which line.
Any one of these would have cost the launch day. The combination would have looked like a working audience right up until the DSP line item failed to spend.
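These failure modes are mechanical enough to lint for before a query ever reaches AMC. Here is a minimal pre-flight check, sketched in Python; the rules and function names are mine, drawn from the failure list above, not any official AMC validator:

```python
import re

# Assumed subset of the failure modes above, expressed as lint rules.
def lint_audience_query(sql: str) -> list[str]:
    problems = []
    # 1. Audience-build queries must target the _for_audiences table variant.
    if "conversions_all" in sql and "conversions_all_for_audiences" not in sql:
        problems.append("targets conversions_all, not conversions_all_for_audiences")
    # 2. 'SnS' is not a real event_subtype value.
    if re.search(r"event_subtype\s*=\s*'SnS'", sql):
        problems.append("event_subtype 'SnS' does not exist; use firstSnSOrder / repeatSnSOrder")
    # 3. The audience pusher rejects queries whose last line is a comment.
    last_line = sql.rstrip().splitlines()[-1].strip()
    if last_line.startswith("--"):
        problems.append("last line is a comment; the audience push will fail")
    return problems

bad_sql = """SELECT user_id
FROM conversions_all
WHERE event_subtype = 'SnS'
-- trailing comment"""
print(lint_audience_query(bad_sql))  # flags all three rules
```

A check like this catches the cheap failures locally; the sizing and Sandbox caveats still require running the real queries.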
What Atlas retrieves
When the agent gets the question, it does a semantic search across the amazon_ads collection and pulls six chunks before writing a single line of SQL:
- The Introduction to AMC Lookalike Audiences playbook, section 3.2, which enumerates the three high-value seed strategies (Subscribe and Save, Multiple Purchases, Total Purchase Value) and gives the verbatim recommendation to test them separately to avoid overly specific seeds.
- A second chunk from the same Introduction to AMC Lookalike Audiences playbook, covering the companion measurement query template, the 500 to 500,000 hard bound, and the recommended 1,000 to 450,000 seed-size buffer.
- The Understanding Amazon's Subscribe & Save Repeat Purchases chunk, which documents the `firstSnSOrder` and `repeatSnSOrder` event subtypes, the 02/05/2024 enhancement date when repeat signals were added, and the Sandbox restriction on SnS data.
- The Flexible Shopping Insights Trial Guide chunk, which carries the working `conversions_all_for_audiences` template for the SnS seed and the audience-vs-sizing table-name swap pattern.
- The Identifying High Value Customer Segments chunk, which frames Total Spend seeds through percentile rank (top 71-100, top 96-100) rather than a flat dollar threshold, and endorses these segments specifically as lookalike seeds.
- The AMC Lookalike Audiences for Promotional Events chunk, which explains the gap lookalikes fill (the high-value new-to-brand and impulse buyers that DSP rule-based audiences miss) and warns against using lookalikes in the lead-out phase of a promotion.
Atlas does not generate the SQL. It surfaces the playbooks with the right caveats attached, and the agent adapts them.
The agent's output
The agent returned four SQL blocks: three seed audience queries and one companion sizing wrapper. The seed queries go into the AMC Audiences query editor, the sizing wrapper runs in the main query editor against the non-_for_audiences table.
```sql
-- =========================================================================
-- Companion measurement query (sizing): runs in the MAIN AMC query editor
-- against conversions_all (not _for_audiences). Use this BEFORE pushing each
-- seed to audience creation. Hard fail: <500 or >500,000.
-- Recommended buffer: 1,000 to 450,000.
-- =========================================================================
SELECT
  COUNT(user_id) AS user_count
FROM (
  {UPDATE: paste one of the three seed queries below here, but swap
   conversions_all_for_audiences -> conversions_all and
   SELECT DISTINCT user_id -> SELECT user_id}
)
GROUP BY 1

-- =========================================================================
-- SEED 1: Subscribe & Save subscribers
-- Run in the AMC AUDIENCES query editor (NOT the main editor) because it
-- selects individual user_id values.
-- STRIP ALL COMMENT LINES before pushing to audience creation. The trailing
-- GROUP BY 1 at the very end is what makes the push succeed.
-- =========================================================================
WITH sns_users AS (
  SELECT
    user_id,
    COUNT(DISTINCT conversion_id) AS sns_purchase_times
  FROM conversions_all_for_audiences
  WHERE event_subtype IN ('firstSnSOrder', 'repeatSnSOrder')
    -- Optional: scope to your ASIN(s). Remove this AND clause to capture
    -- all SnS purchasers across the brand.
    AND tracked_asin IN ('{ASIN_1}', '{ASIN_2}')  -- {UPDATE} or remove
  GROUP BY user_id
)
SELECT DISTINCT user_id
FROM sns_users
WHERE sns_purchase_times >= 1
GROUP BY 1

-- =========================================================================
-- SEED 2: Multi-Purchase (users with multiple distinct ASIN orders)
-- =========================================================================
WITH multi_asin_users AS (
  SELECT
    user_id,
    COUNT(DISTINCT tracked_asin) AS distinct_asin_count
  FROM conversions_all_for_audiences
  WHERE event_subtype = 'order'
    AND tracked_asin IN ('{ASIN_1}', '{ASIN_2}', '{ASIN_3}')  -- {UPDATE}
  GROUP BY user_id
)
SELECT DISTINCT user_id
FROM multi_asin_users
WHERE distinct_asin_count >= 2  -- {UPDATE: raise to 3+ if seed > 450k}
GROUP BY 1

-- =========================================================================
-- SEED 3: Total Spend threshold
-- Threshold is configurable. Atlas recommends a percentile-rank approach
-- (e.g., top 71-100 percentile) rather than a hardcoded dollar value.
-- A flat dollar threshold is shown here for simplicity.
-- =========================================================================
WITH spend_by_user AS (
  SELECT
    user_id,
    SUM(purchase_amount) AS total_spend
  FROM conversions_all_for_audiences
  WHERE event_subtype = 'order'
    AND tracked_asin IN ('{ASIN_1}', '{ASIN_2}', '{ASIN_3}')  -- {UPDATE}
  GROUP BY user_id
)
SELECT DISTINCT user_id
FROM spend_by_user
WHERE total_spend >= {SPEND_THRESHOLD}  -- {UPDATE: configurable, e.g., 200}
GROUP BY 1
```

Three queries, not one. The IQ library explicitly recommends testing these audiences separately to avoid overly specific seeds, and the math agrees: combining filters with `AND` collapses the overlap into a cohort that often falls under the 500-user hard bound. The operator runs all three through the companion sizing query first, confirms each lands in the 1,000 to 450,000 buffer, and only then optionally unions them. The agent does not start by unioning.

The table name swaps between the seed and the sizing wrapper: `conversions_all_for_audiences` is required for the seed queries because AMC's audience builder only permits `SELECT user_id` against the `_for_audiences` table variants, while the sizing wrapper runs in the main query editor against the regular `conversions_all` table. The agent stages the swap inside the wrapper so the operator cannot paste the wrong table name into the wrong editor.
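The sizing gate itself is simple enough to encode in whatever script collects the companion-query counts. A sketch of the bounds logic, assuming the counts come back as integers; the function and labels are illustrative, not any AMC API:

```python
# The 500 / 500,000 values are AMC's hard refresh limits per the playbook;
# 1,000-450,000 is the recommended buffer.
HARD_MIN, HARD_MAX = 500, 500_000
BUFFER_MIN, BUFFER_MAX = 1_000, 450_000

def check_seed_size(user_count: int) -> str:
    if user_count < HARD_MIN or user_count > HARD_MAX:
        return "fail"      # audience refresh will fail outright
    if BUFFER_MIN <= user_count <= BUFFER_MAX:
        return "ok"        # safely inside the recommended buffer
    return "at-risk"       # legal today, but cohort drift can break the refresh

for size in (300, 800, 120_000, 480_000):
    print(size, check_seed_size(size))
# 300 fail / 800 at-risk / 120000 ok / 480000 at-risk
```

The "at-risk" band is the point of the buffer: a seed at 480,000 passes today and fails the first refresh after the cohort grows.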
The SnS filter includes both firstSnSOrder and repeatSnSOrder. Limiting to repeatSnSOrder alone would exclude users on their first scheduled subscription order, which is exactly the cohort you want to lookalike against. Both values together capture the full SnS-engaged audience that became available on 02/05/2024 when Flexible Shopping Insights was enhanced to include repeat purchase signals. The Multi-Purchase and Total Spend queries are not verbatim Atlas snippets. The agent adapted the SnS template pattern, swapping the WHERE filter to event_subtype = 'order' and the aggregation to either COUNT(DISTINCT tracked_asin) or SUM(purchase_amount) per user.
The Total Spend threshold is a placeholder. The right number is corpus-specific. Atlas points at a percentile-rank approach (top 71-100 percentile, or the tighter top 96-100 for premium-spend seeds) rather than a hardcoded dollar value. The {SPEND_THRESHOLD} token in the SQL is operator-configurable, and the parenthetical "e.g., 200" is illustrative only, not a recommendation.
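If you want the percentile cut rather than the flat placeholder, the arithmetic can be rehearsed offline before a dollar value gets hardcoded. A sketch using the nearest-rank method; the spend figures are made-up sample data, not from any AMC instance:

```python
def percentile_threshold(spends: list[float], pct: float) -> float:
    """Return the spend value at the given percentile (nearest-rank method)."""
    ranked = sorted(spends)
    idx = max(0, int(round(pct / 100 * len(ranked))) - 1)
    return ranked[idx]

spends = [25, 40, 55, 80, 120, 150, 210, 300, 450, 900]

# Top 71-100 seed: everyone at or above the 71st-percentile spend.
cutoff = percentile_threshold(spends, 71)
seed = [s for s in spends if s >= cutoff]
print(cutoff, len(seed))  # 210 4
```

The tighter top 96-100 cut on the same data keeps only the single highest spender, which is why the premium-spend variant needs a much larger purchaser base before it clears the 500-user floor.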
The footnotes the agent surfaced unprompted
This is the part that separates retrieval-grounded responses from fluent guesses. Without being asked, the agent attached a short list of caveats to the artifact.
What Atlas surfaced that the operator didn't ask for:
- Hard sizing bounds: 500 and 500,000. Audience refresh fails if the seed size falls below 500 or above 500,000. Aim for the 1,000 to 450,000 buffer to keep refreshes alive as the cohort drifts.
- Sandbox gap. The `sns_subscription_id` field and the repeat SnS purchase signals are not available in AMC Sandbox. You cannot dry-run the SnS seed there. Run against production, or you will see zero rows and assume the seed is broken.
- Version cliff on 02/05/2024. Repeat SnS purchase signals were added to Flexible Shopping Insights on that date. Any model whose training data predates it will produce an SnS seed that returns zero rows. This is one of the cleanest examples of what Atlas catches that an ungrounded model cannot.
- Comment-line trap. The IQ playbook recommends removing all comment lines from the query before pushing to audience creation. The trailing `GROUP BY 1` works as a safe terminator because the audience pusher will not accept a query whose last line is a comment, but only if the comments above it are also stripped.
- Category eligibility. Only certain product categories (Beauty, Grocery, and a handful of others) are eligible for Subscribe & Save. If your catalog sits outside those categories, the SnS seed will undersize regardless of how broad you make the ASIN filter.
- Not for the lead-out phase. Lookalike audiences are not appropriate for the lead-out phase of a promotional campaign, where the NTB mix shifts. For lead-out, switch to AMC rule-based audiences instead.
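The comment-stripping step is worth automating rather than doing by hand before every push. A naive sketch that drops full-line and inline `--` comments; it assumes no `--` sequences inside string literals (a real pass would need to tokenize):

```python
def strip_sql_comments(sql: str) -> str:
    """Drop '--' comments so the final line is the GROUP BY 1 terminator."""
    cleaned = []
    for line in sql.splitlines():
        code = line.split("--", 1)[0].rstrip()
        if code:                     # skip lines that were pure comments
            cleaned.append(code)
    return "\n".join(cleaned)

sql = """-- SEED 1: Subscribe & Save subscribers
SELECT DISTINCT user_id
FROM sns_users  -- inline note
GROUP BY 1"""
print(strip_sql_comments(sql).splitlines()[-1])  # GROUP BY 1
```

Run it as the last step before pasting into the AMC Audiences editor, so the pusher never sees a comment on the final line.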
Each of these would have cost the operator at least an hour to discover by hitting the failure first.
What happens next
Once each seed lands inside the 1,000 to 450,000 buffer, the operator pushes it to AMC Audiences from the Audiences query editor. AMC compiles the audience and activates it to Amazon DSP. The standard DSP activation lag is around 48 hours before the audience materializes and becomes targetable in line items, so the push has to land at least two days before the campaign launch, not on the day of.
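The timing arithmetic is trivial, but it belongs in whatever scheduler kicks off the push rather than in someone's head. A sketch with a hypothetical launch date; the 48-hour figure is the standard activation lag cited above:

```python
from datetime import datetime, timedelta

DSP_ACTIVATION_LAG = timedelta(hours=48)  # standard lag before the audience is targetable

def latest_push(launch: datetime) -> datetime:
    """Latest moment the seed can be pushed and still be live at launch."""
    return launch - DSP_ACTIVATION_LAG

launch = datetime(2025, 7, 8, 9, 0)   # hypothetical campaign launch
print(latest_push(launch))            # 2025-07-06 09:00:00
```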
The agent also recommends setting each seed query as a recurring AMC workflow so the lookalike model rebuilds on a cadence. Weekly is typical for high-value cohorts, because the SnS seed in particular grows as repeat purchase signals accumulate week over week. Static lookalikes age out fast.
For the DSP build, the agent targets each lookalike in a separate line item rather than stacking them in one. Three line items, one per seed, gives a clean read on NTB rate and ROAS by seed strategy after the first 14 days. The Total Spend lookalike usually wins on order value, the Multi-Purchase lookalike on order frequency, and the SnS lookalike on retention. Knowing which is which informs the next round of seeds.
Why this matters
Three seeds, three sizes, one DSP activation: the difference between a good guess and a working audience. The seed strategies are not novel, but the constraints around them are, and most of those constraints are not in any single playbook. The agent's value is not that it wrote better SQL than a model could write from memory. The agent's value is that it pulled the right two playbooks, applied the right table swap, surfaced a six-month-old event_subtype that most models still do not know about, and attached the activation timing the operator needed before they hit it.
If your agents are guessing at AMC high-value seeds, they don't have to be.
Part of an ongoing series on how agents grounded in Amazon Agent Atlas approach real AMC workflows. Next: lookalike audiences for Prime Day, taking these three seeds through the promotional-event activation playbook, including the lead-in versus lead-out split that determines whether a lookalike belongs in the line item at all.