RFM versus Predictive Modeling
First published 2/21/02
This article was written after a piece ran in DM News describing how "predictive modeling techniques outperformed Recency-Frequency-Monetary value
(RFM) targeting in a back-to-school
campaign." I received a ton of e-mail asking for an
explanation of this confusing claim.
For those of you not well versed in what behavioral modeling
is all about, this article provides a look inside and addresses some
very Frequently Asked Questions on modeling.
For those looking for some resolution on the issues brought up in the DM
News article, I decided to just write this
response and point all the queries to it (saves much typing!). Thanks to all the
fellow Drillers out there who thought there was something a bit off
in that article.
Introduction
Let me make it clear upfront that I don't know either company
involved and am not making any judgments on the way this promotion
was designed or executed. I do however have a problem with the
presentation of the article, especially the opening paragraph -
"predictive modeling outperformed RFM" - which at best is
very misleading based on the facts provided, and at worst is an
intentional obscuring of the facts to push a particular agenda.
The following is my best guess as to what is going on here and why
the results ended up as they did based on the facts
provided.
RFM as Straw Man?
Think about this campaign: it was a back-to-school
promotion. It's held at a fixed point in time and happens every year.
The people running the campaign seem to have a lot of experience using
RFM, both on the agency and client side.
One thing they should know given the type of promotion and
experience of the players is this: RFM is not a valid scoring
approach for at least one segment of the population - heavy cyclical
buyers. These are the folks who are primarily promotional
buyers, not "regular customers." Given back-to-school
is the first major promotion in the retail calendar, it may have been
quite some time since these promotional buyers had made a purchase in
the promoted categories - perhaps since the after-holiday blow-out
sale.
Knowing all this, they would certainly be aware RFM scoring would
demote these promotional buyers because they are not
"Recent." So a sub-optimal scenario is set up for RFM scoring
from the start. RFM has at least one hand tied
behind its back on this promotion, because some (perhaps many,
high-volume) known heavy buyers are intentionally excluded.
Under these conditions, it's not surprising that just about any model,
including "let's mail to heavy buyers who bought last year,"
would beat RFM if you were in fact mailing the entire population in
a controlled test.
So let's look at some possible scenarios to explain the results
claimed in this case.
They're smart, but awful case writers, or
the case was edited and many of the key facts people would want
to know were excluded
There is no mention of methodology in this case, not even the
phrase "controlled test" and there are no ROI
comparisons. To make the statement about "beating RFM"
one would expect some shred of evidence besides the top line "spent
2.5 times more per direct mail piece than those chosen through
RFM." OK, but what was the profit comparison? How
much did the model cost? Was there discounting, and if so, what
about subsidy costs? Were control groups used to measure subsidy
costs? And on and on. You get what I mean. If this
group included heavy cyclical buyers, my first question is this: how
many of them would have bought anyway without mailing
them? If you don't know the answer to this question, any claims
become suspect.
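For readers new to this idea, here is a minimal sketch of how a randomly held-out control group answers the "would they have bought anyway" question. Every number and variable name below is hypothetical and purely illustrative; none of it comes from the article.

```python
# Minimal sketch (hypothetical numbers only) of measuring incremental
# response with a randomly held-out control group that is NOT mailed.
mailed_response_rate = 0.060    # hypothetical: 6.0% of the mailed group purchased
control_response_rate = 0.045   # hypothetical: 4.5% of the un-mailed holdout purchased anyway
avg_order_margin = 40.00        # hypothetical gross margin per order
cost_per_piece = 0.75           # hypothetical cost to produce and mail one piece
pieces_mailed = 40_000

# Only the response above the holdout baseline is truly caused by the mailing
incremental_rate = mailed_response_rate - control_response_rate
incremental_orders = incremental_rate * pieces_mailed
incremental_margin = incremental_orders * avg_order_margin
mail_cost = pieces_mailed * cost_per_piece

print(f"Incremental orders: {incremental_orders:,.0f}")
print(f"Incremental margin: ${incremental_margin:,.0f} versus mail cost: ${mail_cost:,.0f}")
```

With these made-up numbers the campaign looks great on a gross basis but loses money on an incremental basis - exactly the kind of thing a top-line "spent 2.5 times more" stat will never reveal.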
They're smart, but not terribly honest
There were 40,000 customers chosen with the model and they "would not have mailed to any of the 40,000
using RFM." Yet they mailed 60,000 customers using
RFM. Why? If the model was so much better at selecting
targets, why use RFM at all, and in such a big way? Clearly,
they mailed a lot more people than you need to execute a controlled
test.
This is not normally how one would execute this promotion - unless
one knew they were working with different populations (one Recent,
one not) and used different scoring approaches for each. If this
is the way they did it, that's smart. But in no way does it
support the statement "predictive modeling outperformed
RFM"; different groups were scored differently, and the gig was
rigged. Any claims under this scenario could be assumed to be
intentionally designed to mislead a reader, or represent a significant
lack of experience on the part of people making this kind of claim.
They're not as smart as they seem
Serendipity is a wonderful thing and my favorite part of direct
marketing. Yeah, it's all pretty scientific, but sometimes you
just get results that you didn't expect or plan for - one way or the
other. What if they simply said, "Hey, let's run a model on
everyone we didn't pick with RFM, and see what happens if we mail
them?" Essentially a model test, but with a huge percentage
of the population, which is a bit strange if you don't already have
a "gut feel."
In this case, they were not thinking of the heavy cyclical buyers
at all, and not thinking of the obvious impact of using RFM scoring on
a population "rigged" to fail - they would simply run a
model and follow the output. And it worked very well, because
the model teased out a pretty obvious mailing strategy from the
customer base (as models frequently do). They simply had not
thought through the implications underlying the
results, and so made an inappropriate comparison.
In fact, look at the parameters of this model they provided us
with:
- customer purchase behavior, such as
the average number of months between purchases and the amount
spent
Well folks, that's a Latency
model if I've ever heard one, and certainly implies this
group had a Recency problem, at the very least. RFM would be
rigged to fail under this scenario.
- only half a percent of the customers the model selected were previous junior
apparel buyers or previous children's apparel purchasers
Hard to tell what this means without knowing the full story, but
here's one thought - product history didn't matter a bit. These
were just buyers who bought whatever, whenever prompted at the right
time with the right offer - the classic sign of a discount-prone,
highly subsidized promotional buyer.
In this scenario, the players are innocent of any intentional
malice - but still cannot make any claims about modeling versus
RFM. They intentionally created two populations and scored them
differently, and got rewarded for trying something new. Hey,
that's great!
OK, now that we've gone through these examples, let me address some
issues on RFM and custom modeling in general. Hopefully, this
information will be of value to people when they are faced with
interpreting data and making decisions in the analytics area.
You say Tomato, I say Celery
Let's talk briefly about populations and target selection.
Those of you who know RFM and response
models in general know they are ranking systems. They rank the
likelihood of people to respond to the promotion, from highest
likelihood to lowest likelihood. People at the "top"
of the ranking are the very most likely to respond; people at the
bottom of the ranking are the very least likely to respond.
Offline, the top 20% of the ranking usually has a response rate from 5
to 40 times higher than the bottom 20% of the ranking.
Online, the difference is even greater.
On any scored
population, RFM or custom model, I can select how far down into
the ranking to
mail. Do I want to mail the top 10% most likely to respond, or the top
20%?
As you include more and more people, the average likelihood to respond
drops rapidly.
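As a concrete, entirely hypothetical illustration of RFM as a ranking system, here is a minimal sketch in Python: each customer gets a 1-5 quintile score on Recency, Frequency, and Monetary value, the scores are combined into a ranking, and you choose how deep into that ranking to mail. The data and the simple additive scheme are made up for illustration and are not the scoring used in the article.

```python
# Minimal RFM-as-ranking sketch with made-up data and field names.
import pandas as pd

customers = pd.DataFrame({
    "id": range(1, 11),
    "days_since_purchase": [5, 30, 90, 400, 12, 200, 45, 720, 60, 15],
    "purchases_last_year": [12, 4, 2, 1, 8, 1, 3, 1, 2, 9],
    "total_spend": [900, 250, 120, 40, 600, 35, 180, 20, 90, 700],
})

# Quintile scores, 5 = best: fewer days since purchase means better Recency
customers["R"] = pd.qcut(customers["days_since_purchase"], 5, labels=[5, 4, 3, 2, 1]).astype(int)
customers["F"] = pd.qcut(customers["purchases_last_year"].rank(method="first"), 5, labels=[1, 2, 3, 4, 5]).astype(int)
customers["M"] = pd.qcut(customers["total_spend"], 5, labels=[1, 2, 3, 4, 5]).astype(int)
customers["rfm_score"] = customers["R"] + customers["F"] + customers["M"]

# Rank the file, then choose how deep to mail - here the top 20%
ranked = customers.sort_values("rfm_score", ascending=False)
mail_depth = 0.20
mail_file = ranked.head(int(len(ranked) * mail_depth))
print(mail_file[["id", "R", "F", "M", "rfm_score"]])
```

With a real customer file you would set the mailing depth by looking at response curves from past campaigns, not by picking an arbitrary 20%.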
If you mailed deeply into
an RFM-scored population, let's say covering the top 50% of the
rankings, and
did a very shallow mailing to the custom model population, say
covering the top 10% of the rankings, then I have no doubt in my mind you could get
the per-mailer results and comparative stats mentioned:
"Names selected using predictive modeling had a four times
higher average monthly spending rate... a three times higher
purchase rate... spent 2.5 times more per direct mail piece than those
chosen through RFM."
"Selected" is the operative word here. If only the
best and most likely to respond were selected using the model, but on
the RFM side you mailed much more deeply into the scores, including
lots of people with a lower likelihood to respond, you end up with a
completely self-fulfilling prophecy, not a "predictive model that
beats RFM." Not even close.
I'm not saying this happened in the article we just looked
at. I'm saying a statement along the lines of "the top 20% most
likely to respond groups in both the RFM and custom model
populations were selected" is something you always, always look
for when you are in this space. If you have people pitching you
any kind of analytics, make sure you are dealing with fair
comparisons. You can make anything look fantastic by fooling
with the knobs and levers in the background.
Kissin' Cousins
Folks, RFM is a predictive
model. It
predicts behavior based on past activity; RFM is no different in that
respect than a "predictive model" you paid some modeler
$50,000 for. So to make the statement "predictive modeling
beat RFM" is just a bit circular in the first place, and one wonders what the intent
of making a statement like that could be. If you said "A Latency
model beat a Recency model in a
Seasonal Promotion" then I'd have no problem with that at all,
but would wonder why it's a news item. As explained above, it's
pretty much common sense.
Latency is nothing more than Recency with a twist; instead of
counting "days since" using today, you count "days
since" using a fixed point in time. Latency can work much
better than Recency when there are external cyclical factors involved
- like seasonal promotions.
For example, if you have not filed a tax return Recently, it does
not mean you are less likely to file one in the future. All it
means is there is an external cyclical event (April 15th in the US)
controlling your behavior. If you had not filed one in 18 months
(18 months Latent), then I would start to question your likelihood to file.
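Here is one rough way to express the distinction in code. The dates, the anchor point, and the interpretation are all made up for illustration; this is just one way to operationalize the idea described above.

```python
# Minimal sketch (hypothetical dates) of Recency vs. Latency.
# Recency: days since the last purchase, measured from today.
# Latency: days since the last purchase, measured from a fixed point
# in the cycle (here, the end of last year's back-to-school season).
from datetime import date

last_purchase = date(2001, 8, 20)      # hypothetical: bought during last year's sale
today = date(2002, 7, 15)              # hypothetical: building this year's mail file
cycle_anchor = date(2001, 9, 1)        # hypothetical fixed point in the seasonal cycle

recency_days = (today - last_purchase).days          # ~11 months: looks "lapsed" by Recency
latency_days = (cycle_anchor - last_purchase).days   # 12 days: right on schedule for the cycle

print(f"Recency: {recency_days} days since last purchase")
print(f"Latency: {latency_days} days from last purchase to the cycle anchor")
```

Same customer, same purchase history: Recency says "lapsed," the cyclical view says "right on time."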
The optimum solution is often to use RFM (Recency Frequency
Monetary) and LFM (Latency Frequency Monetary) in tandem, targeting the
appropriate population with each, as was (apparently) done in this
promotion. Smart.
Crop dusting with the SST
If you are not doing any data modeling at all, the ROI of
implementing an advanced model can be substantial. But the real
question is this: will the improvement gained by using an advanced
predictive model be enough to cover the cost of it relative to
the improvement gained by using a simple model?
Given that most
advanced "response models" like the one in the article use
Recency or Latency and Frequency as the primary driving variables, it's a
valid question to ask. Here's a dirty little modeling secret:
most, if not every, "response model" built includes Recency /
Latency and Frequency as primary variables, whether created "top down" by a human or
"bottom up" by a machine (so-called data mining). The
primary difference is this: they add 3rd, 4th, 5th, etc. variables
which incrementally improve ROI - all else being equal.
In other words, RFM is the low-hanging fruit, often buying you a 10x
or 20x response rate improvement. You want the next
10%? Get a custom model, and make sure the price you
will pay is worth the diminishing returns.
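To make the diminishing-returns question concrete, here is a minimal sketch of the arithmetic with entirely hypothetical numbers; the response rates, margin, and mail quantity are invented, and only the $50,000 modeling fee echoes the figure used above.

```python
# Hypothetical numbers only: does the custom model's incremental lift over
# simple RFM cover the cost of building the custom model?
pieces_mailed = 60_000
margin_per_order = 35.00          # hypothetical gross margin per order
custom_model_cost = 50_000.00     # hypothetical one-time modeling fee

rfm_response_rate = 0.050         # hypothetical response with simple RFM selection
custom_response_rate = 0.055      # hypothetical response with the custom model (10% relative lift)

incremental_orders = pieces_mailed * (custom_response_rate - rfm_response_rate)
incremental_margin = incremental_orders * margin_per_order
net_after_model_cost = incremental_margin - custom_model_cost

print(f"Incremental orders from the custom model: {incremental_orders:,.0f}")
print(f"Incremental margin: ${incremental_margin:,.0f}")
print(f"Net after paying for the model: ${net_after_model_cost:,.0f}")
# With these made-up numbers: 300 extra orders and $10,500 in extra margin,
# which does not cover a $50,000 modeling fee.
```

Change the assumptions and the answer flips, which is exactly the point: the decision is driven by the incremental ROI, not by whether the fancier model "wins."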
Because RFM is a simple, easy-to-implement, standardized
predictive model, people pick on it. They want you to pay
through the nose for a "good model" because, my simple friend,
you could not possibly do any modeling
yourself. Now, am I
saying that RFM is better than a model created by a roomful of
modelers? Of course not. The question, as always, is this:
will it improve your performance enough to cover the modeling cost;
what is the ROI?
RFM Slandered - again?
Speaking of picking on RFM, I was wondering what's up with this
statement in the article:
"When working with
RFM, you are really only looking at three elements, and you never get
to see the rest of the prospects in a database that have other
characteristics that could lead them to become buyers in a given
area." Well, that may be the way they use RFM, but
it's certainly not the only way.
There is no reason you can't load up on any variable you
want with RFM scoring. Those who have read my book know this approach is
fundamental to the Drilling
Down method. RFM is the Swiss Army knife of behavioral
models, and can be used in many different ways.
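As one hypothetical example of "loading up" the scoring with another variable, here is a short sketch that adds a discount-sensitivity dimension to the usual R, F, and M quintiles. The field names, the quintile scheme, and the extra variable are all invented for illustration; this is not the original RFM definition or the Drilling Down method itself.

```python
# Hypothetical sketch: an RFM-style ranking extended with a fourth variable
# (share of spend bought on markdown), to flag promotion-driven buyers.
import pandas as pd

def quintile(series: pd.Series, best_is_high: bool = True) -> pd.Series:
    """Score a column 1-5 by quintile, where 5 is the 'best' quintile."""
    labels = [1, 2, 3, 4, 5] if best_is_high else [5, 4, 3, 2, 1]
    return pd.qcut(series.rank(method="first"), 5, labels=labels).astype(int)

def rfmd_score(df: pd.DataFrame) -> pd.Series:
    """RFM plus a Discount dimension; column names are hypothetical."""
    r = quintile(df["days_since_purchase"], best_is_high=False)  # more recent = better
    f = quintile(df["purchases_last_year"])
    m = quintile(df["total_spend"])
    d = quintile(df["discount_share"])  # heavy markdown buyers score high here
    return r + f + m + d
```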
Choosing to use the original, pre-computer, late-1950s version of
RFM is simply that - a choice. Or you could choose to use a
totally bastardized
version from who knows where. Like any tool, you need to
really know how to use it to get the most out of it.
I'm a solution, I'm a problem
To conclude, I have nothing against custom
models. I use them when appropriate. I have nothing
against the design or execution of the promotion. I have a big
problem with the way the article was presented, which resulted in
a claim that appears to lack sufficient backup.
Those of us in the modeling space need to help people understand
how behavioral modeling works by presenting clear and clean
examples. Fast and loose "cheerleading" is what
got the CRM folks into the mess they are in, and we don't want
Business Intelligence or Customer Analytics or whatever
"space" we are in this month to experience the same
fate.
If anyone, including the original retail and agency players
in the article above, has comments on my analysis or on
this topic in general, I'd be glad to post them. Heck, if the players have
"the whole" case study available for review, I'll provide
the download link right here. It was a sweet promotion,
really.
But what I want to know is exactly what happened with all the
glorious details - so I can learn something from it, or use the stats
to confirm what I already know. And we owe those folks just
beginning to get a grip on behavioral modeling the same courtesy.
That's why we're all here. To learn.