In 2003, an angry man stormed into a Minneapolis-area Target store clutching a handful of advertisement mailers with the store’s distinctive red bullseye logo on them. He demanded to see the manager. When they sat down to talk, the man was livid. The mailers contained coupons for discounts on baby clothes, nursery furniture, and other newborn-related products. They were also addressed to the man’s teenage daughter. He was upset that Target was apparently encouraging high schoolers to get pregnant.
The manager apologized profusely, though he wasn’t sure what had happened to cause the coupons to be sent. Nonetheless, he followed up with the man by phone a week later to again apologize for the mix-up.
Only it turned out not to have been a mistake at all, the man sheepishly acknowledged. Unbeknownst to him, his daughter had gotten pregnant. And in the meticulously tracked purchasing history kept by the Target corporation, algorithms designed to predict that status from cues mined from the aggregated buying habits of millions of shoppers had sussed out her secret and sent her coupons to welcome her new little bundle of joy.
What was invisible to her family and friends was plain as day to Target’s algorithms, which were derived from statistical analysis of historical purchasing patterns and used to predict likely interests from a shopper’s track record. Such algorithms are increasingly used by nearly every retailer, and by many other American businesses, as part of the big data revolution.
Big Data Analysis Is Shining a Light Into Behavioral Patterns
Today, the Target story has entered data science folklore as both a testament to the predictive power of big data and a cautionary tale about how that power is used.
But, as the company was quick to stress, Target had broken no laws. And its system had, in fact, performed exactly as designed: it correctly identified a customer likely to need baby clothes and nursery furniture in the near future, and offered her discounted pricing on those items. The information used to make that prediction was not secret or even personal in any traditional sense. Dozens of people probably observed the girl making her purchases. There was no effort made to hide them and no reason to do so; the products were perfectly normal things to buy.
It’s clear to most observers that the situation feels uncomfortable, even if we can’t quite articulate why. Pregnancy is a sensitive subject, with social rituals surrounding its announcement and outcomes. Target’s algorithm, for all its accuracy, couldn’t possibly take those social aspects into account. And unlike the other shoppers who might have seen the girl making her purchases, it peered a little more keenly into what her motives might have been.
Computers care little for human mores and sensitivities. Although in Target’s case a data scientist had been explicitly asked to identify likely pregnancies among customers and built an algorithm for that exact purpose, a machine learning system could have arrived at the same result without ever knowing what pregnancy was or what the products involved were. Simply correlating SKU purchases over time could have identified the pattern.
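To make that concrete, here is a minimal sketch of the idea in Python, using scikit-learn and entirely synthetic data. Nothing below reflects Target’s actual model; the SKU indices, purchase rates, and training label are all invented for illustration. The point is simply that a generic classifier, given only anonymous purchase histories, will surface the predictive SKUs on its own:

```python
# A minimal sketch, not Target's actual model: a classifier trained on
# anonymous SKU purchase histories can flag a hidden trait without any
# notion of what the SKUs actually are. All data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_customers, n_skus = 5000, 200

# Binary purchase matrix: did customer i ever buy SKU j?
X = (rng.random((n_customers, n_skus)) < 0.05).astype(float)

# Hypothetical "tell" SKUs (think unscented lotion, supplements):
# customers with the hidden trait buy them far more often.
trait = rng.random(n_customers) < 0.1
for j in (7, 42, 113):
    X[trait, j] = (rng.random(trait.sum()) < 0.6).astype(float)

model = LogisticRegression(max_iter=1000).fit(X, trait)

# The model recovers the tell SKUs as its strongest signals, never
# having been told what any SKU represents.
top = np.argsort(model.coef_[0])[::-1][:5]
print("Most predictive SKUs:", top)
```

Run it and the three planted SKUs come out on top, even though the code never encodes what they mean; the label could just as easily be pregnancy, a move to a new house, or anything else that leaves a trace in purchase data.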
PredPol’s predictive policing tool is one real-world example of this kind of meaning-blind pattern detection. PredPol avoids using any personal data or information about individuals in its analyses. Working only from historical crime data and geographic information, the company identifies areas within a city that are likely to experience increased crime activity at particular times.
The company then builds on that by continuing to monitor incoming crime data, identifying propagation patterns that both coexist with and result from the increased enforcement operations police conduct in response to the initial predictions. The resulting predictions account for adaptive behavior by both criminals and police… without ever knowing who those criminals or police are.
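PredPol’s actual model is proprietary (and reportedly descended from self-exciting point-process models developed for earthquake aftershocks), but the core idea can be sketched as a toy grid score: each past incident raises near-term risk in nearby cells, with influence that decays over time and distance. Everything here, from the decay constants to the synthetic incident history, is invented for illustration:

```python
# A toy hotspot model, not PredPol's algorithm: each past incident
# contributes a risk score to nearby grid cells, decaying with time
# and distance. Inputs are only timestamps and locations; no person
# is identified anywhere. Incident data below is synthetic.
import numpy as np

GRID = 20            # city divided into a GRID x GRID lattice
TIME_DECAY = 0.8     # per-day decay of an incident's influence
SPACE_SCALE = 1.5    # how far (in cells) influence spreads

def hotspot_scores(incidents, today):
    """incidents: iterable of (day, row, col); returns GRID x GRID scores."""
    rows, cols = np.mgrid[0:GRID, 0:GRID]
    scores = np.zeros((GRID, GRID))
    for day, r, c in incidents:
        age = today - day
        if age < 0:
            continue  # ignore incidents "from the future"
        dist2 = (rows - r) ** 2 + (cols - c) ** 2
        scores += (TIME_DECAY ** age) * np.exp(-dist2 / (2 * SPACE_SCALE ** 2))
    return scores

rng = np.random.default_rng(1)
# Synthetic history: a persistent cluster near cell (5, 5) plus noise.
history = [(d, 5 + rng.integers(-1, 2), 5 + rng.integers(-1, 2))
           for d in range(30)]
history += [(int(rng.integers(0, 30)), int(rng.integers(0, GRID)),
             int(rng.integers(0, GRID))) for _ in range(20)]

scores = hotspot_scores(history, today=30)
print("Highest-risk cell:", np.unravel_index(np.argmax(scores), scores.shape))
```

Note what the sketch does not contain: names, demographics, or any individual-level data at all. That absence is precisely what makes the predictions feel uncanny.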
Nonetheless, the tools have drawn privacy objections and concerns about how law enforcement uses the information. The idea that a computer might predict our behavior before we have even settled on a course of action is inherently unsettling, even when the prediction is used to protect society.
Customers Can Be Uncomfortable With Having Their Secrets Exposed
When the invisible is made visible, even if it was never explicitly a secret, people can become uncomfortable. Although the discrete items of data used to make these connections are individually harmless and unobjectionable, the power of analytics to tie them together and reveal less public aspects of our lives gives many consumers pause.
This is an aspect of big data where data scientists may be able to have the biggest impact, although it is not yet a focus at many companies. Machine learning systems can be built to surface non-obvious insights from patterns, but data scientists have the human capacity to judge whether those insights are too sensitive to reveal.
This requires psychology as well as analytical skills. There appears to be a certain threshold where data insights become creepy rather than helpful, and that threshold isn’t driven by numbers.
When Amazon notes that you have recently purchased a new computer and shows you deals on monitors, mice, and other common accessories, that is viewed as a helpful application of data analysis. When Target does something similar, using cues from purchases of unscented lotion and vitamin supplements, items not tied to pregnancy in the popular consciousness, people get queasy about the very same process.
Target’s solution to that queasy feeling of being spied upon wasn’t to stop spying; it was to obscure it. The company kept sending coupons for baby products to women it identified as likely to be or become pregnant, but mixed in coupons for completely unrelated products, like lawnmowers and hardware, so the targeting would be less obvious.
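The mechanics of that camouflage are almost trivially simple to sketch. The product names below are invented for illustration, not drawn from Target’s data:

```python
# A hypothetical sketch of coupon "camouflage": pad targeted offers
# with random decoys so no single mailer reads as a pregnancy flyer.
import random

targeted = ["crib discount", "baby clothes 20% off", "diaper bundle"]
decoys = ["lawnmower sale", "wine glass set", "garden hose",
          "grill tools", "power drill"]

def build_mailer(targeted_offers, decoy_pool, n_decoys=3, seed=None):
    """Interleave targeted coupons with randomly chosen decoys."""
    rng = random.Random(seed)
    mailer = targeted_offers + rng.sample(decoy_pool, n_decoys)
    rng.shuffle(mailer)  # so targeted items don't cluster at the top
    return mailer

print(build_mailer(targeted, decoys, seed=7))
```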
It worked. Target’s Mom and Baby sales exploded, and to date there have been no further complaints about the data science behind it.