Netflix Threw Its Customer Segments in the Garbage
For years Netflix sorted its viewers the way almost every company sorts customers: by who they are. Age, gender, geography. A 38-year-old woman in Ohio went in one bucket, a 22-year-old man in Texas in another. It is the most natural cut in the world, and Netflix had more of that data than almost anyone alive.
Then Netflix threw it out. In 2016 its VP of product, Todd Yellin, said it plainly: "Geography, age, and gender? We put that in the garbage heap." (Fortune, 2016.) The demographics described the viewer perfectly and predicted almost nothing about what they'd actually watch. There are 19-year-old men who love Dance Moms and 73-year-old women deep into Breaking Bad. The variation inside any age-and-gender box turned out to be wider than the gap between boxes.
What replaced it was a cut by behavior — clusters of people who watch alike, whoever they are. Someone in New Orleans and someone in New Delhi can land in the same cluster. And that cut decided something the demographic one never could: which slice of the catalog to put on your home screen tonight. One cut named the person. The other changed the product.
I've spent the better part of a decade watching a smaller version of Netflix's mistake play out in room after room — and almost no one throws it in the garbage. I've sat through hundreds of segmentations cut by demographics and firmographics that named people beautifully and moved not one decision. The last chapter was about why those die. This one is about the thing underneath that nobody could hand me for years: the test that tells you, before you burn a quarter on it, whether a segment is real or just a label.
The maddening part is that teams keep making the dead kind anyway. They commission a fresh one every year, knowing the last one never changed anything, because everyone makes them. A deck cut by company size and industry looks exactly like every other company's deck, so it feels safe, and a thing that feels normal is very hard to stop doing.
If You Can Describe Them, You Think You've Got a Segment
If you can describe a group of customers, you think you have a segment. Enterprise. Premium. Parents. Churned. High-intent. Uses a competitor. Each one names a real set of people you can point at, count, and put on a slide — and the slide feels like a strategy. Most of the time it's a fresh coat of paint over the same confusion the team walked in with.
A Causal Criterion Changes One of Three Answers
A criterion is real — causal — only when knowing it about the segment changes your answer to one of three questions about that segment's Jobs:
- Do we know how to create value for those Jobs?
- Do we know how to earn our target margin per unit, competing for those Jobs?
- Do we know how to create demand, competing for those Jobs?
The criterion has to move at least one of those from a shrug to a real answer. If it moves none, it's a fake — a label that names a slice and decides nothing.
The examples are everywhere once you have the test. Netflix's taste cluster answers the first: knowing it, Netflix knows what value it can create — which slice of the catalog to surface tonight. "They run 40 field technicians, and every missed appointment turns into paid overtime" answers the second: the budget and the cost of failure are right there, so you can see the margin closing. "A head of revenue, three weeks into a board mandate to cut acquisition cost" answers the third: the demand is live, and the mandate tells you exactly what will trigger the call. Each one reads like a situation, because a real segment is a situation.
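The test is simple enough to write down as a rule. Below is a minimal Python sketch, assuming a team records one yes/no judgment per question for each candidate criterion. `CriterionCheck` and `is_causal` are hypothetical names, and the three booleans are that team's judgment, not something the code can discover on its own. The rows reuse the examples above, plus the bare "enterprise" label.

```python
from dataclasses import dataclass


@dataclass
class CriterionCheck:
    """One candidate criterion and the team's judgment on the three questions.

    Each flag answers: does knowing this criterion turn a shrug into a real
    answer for that question? (Hypothetical structure, for illustration.)
    """
    name: str
    decides_value: bool   # Do we know how to create value for those Jobs?
    decides_margin: bool  # Do we know how to earn our target margin per unit?
    decides_demand: bool  # Do we know how to create demand?


def is_causal(check: CriterionCheck) -> bool:
    """A criterion is causal if it moves at least one of the three questions."""
    return check.decides_value or check.decides_margin or check.decides_demand


criteria = [
    CriterionCheck("enterprise (big logos)", False, False, False),
    CriterionCheck("Netflix taste cluster", True, False, False),
    CriterionCheck("40 field technicians, missed appointments cost overtime", False, True, False),
    CriterionCheck("head of revenue, 3 weeks into a board mandate to cut CAC", False, False, True),
]

for c in criteria:
    print(f"{c.name}: {'causal' if is_causal(c) else 'fake'}")
```

The rule is an `or`, not an `and`: a criterion only has to move one of the three questions to earn its place, which is why the taste cluster passes on value alone.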
Notice what that first question rests on: the segment's Jobs. Name the work someone is trying to get done, and you're already most of the way to knowing how to create value for it, because the Job points at what would help. The other causal criteria mostly do one thing on top of that. They sharpen the Job, usually by sharpening the context it runs in, and a sharper context tells you more exactly what to build. "Files a tax return" barely points at a product. "Files a return, runs an LLC with K-1 income, and got burned by a missed deduction last year" sharpens the same Job until the thing to build is obvious. The extra criteria didn't sit beside the Job. They refined it.
Fake criteria are expensive precisely because they feel like progress. The team names the segment, charts it, builds a dashboard around it, and walks out believing it understands its market. It has bought the feeling of understanding and nothing else. It still can't say what to ship, why the unit economics close, or how acquisition is supposed to work.
Run the test across the usual roster and most of it fails. "Enterprise," "premium," "parents," "churned," "high-intent," "uses a competitor" — each is useful for something, and each is blank on all three questions until you make it concrete. Enterprise starts deciding things the moment it stops meaning "big logos" and starts meaning SSO, SAML, audit logs, a procurement cycle, a security review, and a budget big enough to pay for all of it. Now it tells you what to build (the security and admin layer Notion and Slack spent years on), why the margin closes (a contract that carries the cost), and how to create demand (land the champion, then survive the security questionnaire). The word never changed. What it pointed at did. The cause was never the label on the box. It was always the concrete condition inside it.
Intent Is a Signal, Not a Segment
The most seductive fake of all is intent, because it smells like a buying signal — and an entire industry sells it to you as a segment. 6sense and Bombora will score your accounts, surface "high-intent" visitors, tell you who is "in-market right now." A 38-person Series A team pointed two quarters of work at exactly that label: a lead-scoring model, a retargeting budget, a sales alert that pinged a rep the moment a high-intent visitor opened the pricing page. The machine was clever. It could count the segment, chart it, and alert on it. It could never say what to build for it, why the margin closed, or who to actually call.
Intent tells you someone is moving. It doesn't tell you what to build, why the money works, or who they are. It's a demand signal that sits on top of a real segment, not the segment itself. Strip "high-intent" off that team's dashboard and ask what was underneath, and the real segment was concrete: a head of revenue at a 50-to-200-person SaaS company, three weeks into a board mandate to cut acquisition cost by 30%. That sentence routes the sale (call the VP, not the analyst who downloaded a whitepaper), justifies the price (a quarter of saved ad spend dwarfs the seat cost), and writes the pitch in one line. Intent told them when. The situation tells them what, why, and for how much.
And here's the part the dashboard hides. A downloaded whitepaper tells you someone is curious. It does not tell you whether there's a real, budgeted Job underneath (something this person has actually spent money or effort on before) or whether they're a tire-kicker who was never going to act. The intent score looks identical either way. You only find out which one you've got by looking at what they've already done, not what they just clicked.
A Criterion Is a Cause, Not a Symptom
The most common counterfeit is a symptom wearing a criterion's clothes. "They'll save $2,000 a year" is the outcome you hope to deliver, dressed up as a description of a person. The real criterion answers a harder question: what is going on in this customer's situation that turns $2,000 in savings into a reason to buy?
Symptoms are sneaky because they describe your good customers accurately. "Spent over $1,000 in their first six months," "placed more than two orders," "gave us a 9 on the survey" — every one of those is true of your best accounts, and not one can route a brand-new prospect, because a new prospect has zero past orders and zero survey answers. A symptom describes the outcome after the fact. A cause is the thing in the customer's world that produced it. "They run 40 field technicians and the dispatcher is already drowning" is a cause. "They deploy twenty times a day and every failed release blocks the whole engineering team" is a cause. You can act on a cause before the customer has done anything at all.
The Criterion Has to Survive Contact With a Stranger
A causal criterion earns its place by becoming observable before you spend real money on the customer. If a property explains your value, your margin, or your demand, then sales, marketing, or the product has to be able to detect it — a form field, a usage event, a firmographic proxy, or one question a rep can ask a stranger on a first call.
This is where the test turns into daily operations. An interior-design studio that knows its real criteria turns them into three inbound questions: "Is this your first renovation? What's your budget per thousand square feet? Does time matter more to you, or money?" The answers route the lead into the right group in two minutes. A B2B tool sets a cutoff at the funnel's mouth: "Under 5,000 SKUs we're not your fit; over 5,000, here's a rep." The qualification is the criterion, made operational. A segment you can't turn into a question is a segment you can't actually choose — you can only hope it walks in.
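The two qualifiers above can be sketched as routing functions. This is a minimal Python illustration, not the studio's or the tool's actual system: the function names, the group labels, and the `budget_threshold` parameter are hypothetical, and sending a prospect with exactly 5,000 SKUs to a rep is an assumption, since the cutoff as quoted leaves that boundary open.

```python
from dataclasses import dataclass

# B2B example from the text: under 5,000 SKUs is not a fit.
# Routing exactly 5,000 to a rep is an assumption; the quoted cutoff leaves it open.
SKU_CUTOFF = 5_000


def route_by_sku_count(sku_count: int) -> str:
    """Turn the SKU-count criterion into a routing decision at the funnel's mouth."""
    return "not a fit" if sku_count < SKU_CUTOFF else "assign to rep"


@dataclass
class DesignLeadAnswers:
    """Answers to the studio's three inbound questions."""
    first_renovation: bool        # "Is this your first renovation?"
    budget_per_1000_sqft: float   # "What's your budget per thousand square feet?"
    time_over_money: bool         # True if time matters more than money


def route_design_lead(answers: DesignLeadAnswers, budget_threshold: float) -> str:
    """Build a group label from the three answers.

    budget_threshold and the label wording are hypothetical; a real studio
    would set them from what its own customer groups actually look like.
    """
    experience = "first-timer" if answers.first_renovation else "repeat client"
    budget = "high budget" if answers.budget_per_1000_sqft >= budget_threshold else "lean budget"
    priority = "time-first" if answers.time_over_money else "money-first"
    return f"{experience} / {budget} / {priority}"


print(route_by_sku_count(1_200))
print(route_by_sku_count(18_000))
print(route_design_lead(DesignLeadAnswers(True, 60_000.0, True), budget_threshold=50_000.0))
```

Each function takes only answers a stranger can give on a first call, which is the point: the criterion is usable before the customer has done anything.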
The best place to hunt for these criteria is among the customers who already pay you the most and complain the least. The cause usually hides in their situation, not their job title, and a later chapter lays out a whole method for finding exactly those customers.
Even a Demographic Can Be Real
None of this bans demographics. The test isn't "Jobs good, demographics bad." A demographic or firmographic becomes a causal criterion the moment it answers one of the three questions. "Companies with 200+ engineers" is a fake label — until it means "big enough to trigger a mandatory security review," which decides what to build, why the contract pays for it, and how the demand gets made. "Parents" is a demographic until "children under five" changes the success criteria, the frequency, the channel, and the willingness to pay. The label was never the problem. Being blank on all three questions was.
This is not a vocabulary complaint. A fake segment never announces itself — it feels like understanding, which is exactly why it survives. You name it, chart it, staff it, point a quarter of work at it. The bill arrives two or three quarters later, when the value doesn't land and nobody on the team traces it back to the afternoon the label went up on the wall.
Wes Says Half the Marketing Org Is Useless
The Test Can't Tell You Where to Start
The test tells you which criteria are real. It doesn't tell you which one to reach for first — and the order you cut the market in can throw away your best segment before you've ever met it.
Source · anchor
segmentation.md §6 (the four questions a segment must answer) · §7 (causal vs fake criteria; cause-not-symptom) · §8 (causal criteria become lead-qualification questions) · §2 (demographics become causal only when they change value/margin/demand) · ajtbd-key-theses.md §12 · job-types-and-properties.md §8 (intent without past action may sit on a Fake Job) · abcdx-segmentation-key-theses.md §8 (symptom-vs-cause criteria; qualifying questions) · Netflix: Fortune, 2016