Cambridge Analytica was a little-known British political consulting firm when it catapulted to prominence in 2016, credited with being part of the digital strategies that resulted in the United Kingdom voting to leave the European Union… and Donald Trump being elected President here in the U.S.
Both events were surprise upsets, and the company basked in the limelight of the success of its data-driven voter targeting algorithms.
But when news broke earlier this month that the company had based that targeting on personal information harvested from 50 million Facebook users under misleading pretenses, and possibly with Facebook’s complicity, the backlash unleashed a tidal wave through the data science community.
Uncorking the shady aspects of the Cambridge Analytica story has also turned up numerous other sketchy practices on the part of Facebook and other big data collectors, including Google and Amazon. Collecting data only tangentially related to purported permissions… deceptive and difficult opt-out practices… and loose control over collected data paint a clear picture of the trends surfacing in recent days.
Legal consequences are already rolling, with the British High Court granting a search warrant to the UK’s Information Commissioner to search and seize evidence from Cambridge Analytica, and the Federal Trade Commission opening an investigation into Facebook’s privacy practices.
And the waves don’t stop at Facebook’s shores… President Donald Trump has taken to tweeting about Amazon’s practices, sending the stock nearly 5 percent lower on concerns over similarly targeted investigations, before refocusing on red herring issues related to the US Postal Service incurring losses from Amazon deliveries.
The stock bounced, but the social and business consequences may be playing out over a longer time span. And data scientists throughout the field are doing some soul searching over their role and their future on the playing field that the Cambridge Analytica scandal has uncovered.
Privacy Breaches: A Chain of Events Wholly Owned by Data Science
It’s a scandal that is wholly owned by data science. The implications of the field are such that these events were inevitable. The fact that Cambridge Analytica was funded by Robert Mercer, a prominent data scientist who helped develop the Brown clustering model for natural language processing, only ties it more strongly to the data science world.
A lot of prominent executives at Facebook, such as founder Mark Zuckerberg, find themselves trying to argue it both ways: oh, sure, data science is a powerful, world-altering force that can make great changes in society and human behavior… but, oh, no, our modest little social network couldn’t possibly have swayed a presidential election.
The position is ridiculous, particularly coming from an industry that was quick to seize on accolades for enabling the series of revolutions that ousted a series of despotic Middle Eastern leaders and led to reforms in many other regional governments. It also undermines the ad sales business that propels the company.
But these are the promises data science, and data scientists, have always made. The real shock should be that achieving them isn’t anyone’s idea of a real accomplishment.
Revelations of What Big Data Really Consists of Rock Both Worlds
It’s a moment when the mask has been ripped off. Many data scientists are amazed that average Facebook users aren’t already aware of what data is being collected about them and how it is being used. In the industry, these assumptions are commonplace.
It may be only now that data scientists are being confronted with the fact that not everyone in their social graphs is comfortable with the way the information about them is being used.
It may also be a moment where at least some of the public is confronted with the new reality data science has ushered in, a reality that data scientists have had time to become comfortable with already: massive data collection, and the implied uses of those large data sets, are baked into the modern world. Absent a return to pre-internet consumer society, the genie isn’t going back in the bottle.
So when Facebook issued a public statement claiming that it had been given permission to collect call and SMS data, it was read as tone deaf largely because it was coming from within a world where such collection and use is simply recognized as an accomplished fact.
Using the word “breach” to describe what happened—as many major media outlets have done—is a serious misunderstanding of both the incidents and the consequences.
In fact, as acknowledged, many of the actions taken were both entirely within the scope of the terms of service and explicitly envisioned as uses of the data involved. Indeed, Facebook’s business model depends on demand for that data and the targeting capabilities it offers.
The Future of Data Science and Modern Society Is at Stake
Facebook has argued that these collection activities—and the subsequent uses of that data, whatever they turn out to be—have been countenanced by click-wrap agreements made by users when they signed up for the service. Despite being utterly tone-deaf, this is largely correct, but it’s also only a small slice of the argument.
If the trade-off for exposing personal data to Facebook is being able to exchange funny cat pictures with your cousin in Dubuque, there’s a non-trivial number of users who will say no to that. But that’s not really the trade-off. Facebook is only the face of the issue. The same massive data collection and algorithmic processing that take place there power plenty of other services and advances that consumers wouldn’t be so sanguine about giving up.
Amazon, for instance, has revolutionized the way the world shops… and they’ve done it on the back of big data analysis. Google dives even deeper, silently watching and analyzing what you look for, everywhere you go on the web. What you get for that is instant access to information of every sort: a snapshot of your location and surroundings, directions to the nearest restaurants and activities, anywhere in the world.
Everything from credit card processing to mobile mapping and navigation rests on the premise that detailed information will be collected, processed, and used in ways very similar to how Amazon and Facebook operate. Big data does significantly influence the modern world, and many of the ways that it does so are popular. Giving up cat videos isn’t the question; giving up the modern way of life is.
Solutions Aren’t Clear, But The Debate Isn’t Over
Privacy regulation, a solution to abuses that both Zuckerberg and Apple’s Tim Cook suggest, could at this point simply serve to squash possible competition and entrench the current major players as the de facto providers in the industry. Nor is it clear that government has the agility or the expertise to adapt to such a fast-moving industry.
It’s going to be up to data scientists to manage both the fallout and future practices in a way that threads the needle between causing further damage to credibility and becoming wholly ineffective.
There are good reasons to think it will turn out better than you might expect from the current hyperbolic headlines. Although Mercer, a prominent data science researcher turned investor, can be seen as one of the villains in the scandal, it’s also true that the heroes in the story are data scientists. Christopher Wylie, the key source behind many of the 2018 revelations, has been vocal about his change of heart over the uses and perceived abuses of the data.
Data scientists everywhere are going to be confronted with a choice: be Mercer, or be Wylie. Use data for underhanded purposes to manipulate people against their will, or be aboveboard and forthright about how data could be used? Confront the realities and choices involved in big data collection, or try to sweep them under the rug as Facebook has done?
Ethics courses are an increasingly important part of the curriculum in data science master’s programs, and you can be sure there will be some new discussions taking place at universities in the coming years thanks to the Facebook/Cambridge Analytica scandal.