Today Attorney General Jeff Sessions will meet with states’ attorneys general to discuss his theory of bias among social media sites and what, if anything, should be done about it. Ostensibly at issue is whether social media companies like Facebook use their proprietary algorithms to suppress conservative viewpoints or boost liberal ones. It is virtually certain names like Alex Jones and Diamond and Silk will feature in the conversation.
The importance of the conversation is difficult to overstate. Although Mark Zuckerberg himself famously underestimated social media’s power to put its finger on the political scale both here and abroad (a position he has since walked back), it’s clear from academic research, from Robert Mueller’s indictments, and certainly from my own experience at the Obama campaign and beyond that social media is not only key to elections but is also aggravating political polarization in the United States.
Before even digging into the merits of Sessions’ conspiracy theory, it’s easy to call BS on Republicans who have routinely argued that corporations, in keeping with Citizens United, have First Amendment rights to political speech and, therefore, spending. Facebook and Twitter are both corporations superficially much like those whose rights have been tested before. Social media companies could argue that maybe they’re biased, maybe they’re not, but they have a right to be any way they want.
But as a technologist who is neck deep in the media industry and who has done time in both the political world and Silicon Valley, I would argue that this superficial analysis should be challenged. I am not necessarily arguing that government regulation or prosecutorial discretion is the right way to handle the situation, but those are tools that may need to come into play, both so the giants can justify potentially unprofitable changes to their boards and investors and to get them to reexamine some of their entrenched positions.
At issue are the algorithms that determine what social media users actually see and how it gets prioritized. Feeds have become what the Internet portals of yesteryear once were: along with search, they are the jumping-off point for virtually all of everyone’s online activity, from news to purchases to their original purpose of keeping up with friends. They are no longer simple compilations where everything posted by people you follow is arranged chronologically and interspersed with occasional ads. Instead, content is prioritized using a myriad of behavioral data (likes, shares, history of engagement with similar content, etc.) to achieve specific outcomes. The outcomes can be seen as an improved user experience (cutting through the clutter) or an optimization of the social media sites’ own KPIs (e.g. visits, opportunities to display ads). In Sessions’ view, they can also be seen as something more subversive: deliberate attempts to advance a political agenda.
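To make the difference between a chronological feed and a prioritized one concrete, here is a minimal sketch. Everything in it is an assumption made for illustration: the `Post` fields, the `affinity` signal, and the weights are invented, and nothing here reflects how any real social network actually scores content.

```python
from dataclasses import dataclass


@dataclass
class Post:
    author: str
    timestamp: int   # larger means newer
    likes: int
    shares: int
    affinity: float  # hypothetical 0..1 measure of how often the viewer engages with this author


# Hypothetical weights, chosen only for illustration.
WEIGHTS = {"likes": 1.0, "shares": 3.0, "affinity": 50.0}


def chronological(posts):
    """The old model: newest first, nothing else considered."""
    return sorted(posts, key=lambda p: p.timestamp, reverse=True)


def engagement_score(post):
    """A toy behavioral score: a weighted sum of engagement signals."""
    return (WEIGHTS["likes"] * post.likes
            + WEIGHTS["shares"] * post.shares
            + WEIGHTS["affinity"] * post.affinity)


def prioritized(posts):
    """The newer model: highest score first, regardless of recency."""
    return sorted(posts, key=engagement_score, reverse=True)


posts = [
    Post("friend", 300, likes=2, shares=0, affinity=0.9),
    Post("news_page", 200, likes=40, shares=10, affinity=0.2),
    Post("viral_page", 100, likes=500, shares=200, affinity=0.0),
]

chrono_order = [p.author for p in chronological(posts)]
ranked_order = [p.author for p in prioritized(posts)]
print(chrono_order)
print(ranked_order)
```

With these made-up numbers the two orderings are exact opposites: the friend’s recent post leads the chronological feed, while the heavily shared post from a page the viewer never engages with leads the prioritized one. Whoever picks the weights picks the outcome, which is why what the weights optimize for matters.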
Personally, I do not believe in deliberate liberal or progressive bias in these algorithms. Despite the perceived liberal leanings of a majority of people working in the tech industry, the reality is that Valley politics, especially at the billionaire executive level, continues to swing more and more libertarian. Is it possible that line engineering and technical staff have implemented some kind of tweaks and changes without the blessing of corporate overlords, a sort of Silicon Valley deep state? It’s very unlikely since the prioritization of content is absolutely key to these companies’ revenue models and is under relentless internal scrutiny.
But is it possible that, in spite of this, bias still exists? Yes, for two reasons. The first is the manipulation of these systems by outside actors. Certainly marketers around the world are obsessed with how to optimize their messages for Facebook, and so too are political operatives. Both Mueller and Cambridge Analytica whistleblower Chris Wylie have described ways that this can be done that go well beyond marketing. Since many campaigns and organizations in politics seek to discredit each other or sow discord as much as to promote their own candidate, manipulation is actually significantly easier in politics than in marketing.
The second reason has to do with technology. We don’t know much about the specifics of the algorithms these companies use, but we can look at the foundational technologies they are doubtless based on and make some inferences. For example, the algorithms could be purely rules-based, but that’s very unlikely given the number of variables involved, the quantity and variety of data in play, and especially Facebook’s own touting of the importance of behavioral targeting. It’s more likely that they are some combination of rules, collaborative filtering, and machine learning, the last of which is a self-proclaimed area of expertise for Facebook.
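For readers unfamiliar with collaborative filtering, the idea is to recommend what similar users engaged with. The sketch below is a toy user-based version using cosine similarity. The `engagement` data and the user and post names are invented, and this is not a claim about what any of these companies actually run.

```python
from math import sqrt

# Made-up engagement history: user -> {post: 1 if the user engaged with it}.
engagement = {
    "alice": {"post_a": 1, "post_b": 1},
    "bob": {"post_a": 1, "post_b": 1, "post_c": 1},
    "carol": {"post_d": 1},
}


def cosine(u, v):
    """Cosine similarity between two sparse engagement vectors."""
    shared = set(u) & set(v)
    dot = sum(u[item] * v[item] for item in shared)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def recommend(user):
    """Rank posts the user has not seen by how similar their fans are to the user."""
    seen = engagement[user]
    scores = {}
    for other, items in engagement.items():
        if other == user:
            continue
        similarity = cosine(seen, items)
        for item, value in items.items():
            if item not in seen:
                scores[item] = scores.get(item, 0.0) + similarity * value
    return sorted(scores, key=scores.get, reverse=True)


recommendations = recommend("alice")
print(recommendations)
```

Here alice is shown bob’s extra post first because their histories overlap, and carol’s post last because they share nothing. No rule about the content itself is involved, only who behaved like whom.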
While ML is certainly a good tool for solving the problem of creating customized experiences for billions of users, it is itself subject to hidden biases. ML models are trained using large sets of real-world data, and the output they create is subject to the influences and selection biases of the input data. For example, you could create a machine learning model to help screen great candidates from a pool of job applications, using the resumes of your past employees scored by performance as input. In theory, this should result in the model figuring out which applicants are more likely to be successful at your company. But it could also perpetuate hidden biases. Even if it is never told the applicants’ gender, it might continue to pick mostly male candidates based on related signals like name (e.g. “John”), sports, or even the biased makeup of other companies you’ve successfully poached from.
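The hiring example can be reproduced in a few lines. The sketch below uses entirely synthetic numbers and a deliberately crude frequency-based "model". The `plays_football` feature stands in for any signal correlated with gender, and the 0.5 cutoff is an arbitrary assumption.

```python
from collections import defaultdict

# Synthetic history: (gender, plays_football, rated_high_performer).
# Gender is kept only so we can audit the result; the model never sees it.
history = (
    [("M", True, True)] * 40 + [("M", True, False)] * 10
    + [("M", False, True)] * 10 + [("M", False, False)] * 10
    + [("F", True, True)] * 2
    + [("F", False, True)] * 8 + [("F", False, False)] * 20
)


def train(rows):
    """Learn P(high performer | plays_football) from past employees."""
    counts = defaultdict(lambda: [0, 0])  # feature value -> [high performers, total]
    for _gender, football, high in rows:
        counts[football][0] += int(high)
        counts[football][1] += 1
    return {feature: high / total for feature, (high, total) in counts.items()}


model = train(history)

# A new, made-up applicant pool: 50 men and 50 women.
applicants = (
    [("M", True)] * 30 + [("M", False)] * 20
    + [("F", True)] * 5 + [("F", False)] * 45
)


def selection_rate(gender):
    """Share of one gender's applicants the model would advance."""
    pool = [football for g, football in applicants if g == gender]
    picked = [football for football in pool if model[football] >= 0.5]
    return len(picked) / len(pool)


print(selection_rate("M"), selection_rate("F"))
```

In this toy setup the model advances 60 percent of the men and 10 percent of the women, even though gender was never an input. It simply learned that a proxy correlated with gender predicted past ratings, and it carried the skew in those ratings forward.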
ML researchers have shown that it is crucially important to understand not just how successful algorithms are at their jobs but also how they work. For example, one team demonstrated the ability to fool the recognition system of a prototype autonomous car into thinking a stop sign was a 45 mile per hour speed limit sign, using what are now called adversarial images. If they could manipulate the image directly, the changes weren’t even perceptible to the human eye. The point was that the car’s recognition system was accurate at detecting stop signs, but it wasn’t just looking for red octagons. It had learned something else that was working for it.
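A toy version shows why small, spread-out changes can flip a classifier. The sketch below is not the stop-sign team’s method. It assumes a made-up linear classifier over 100 "pixels" and nudges each pixel slightly against the sign of its weight, in the spirit of the fast gradient sign method. The weights, bias, and `EPSILON` are all invented.

```python
N = 100

# A made-up linear classifier: a positive score means "stop sign".
weights = [0.1 if i % 2 == 0 else -0.1 for i in range(N)]
bias = 0.5


def score(pixels):
    """Weighted sum of pixel values plus a bias term."""
    return sum(w * p for w, p in zip(weights, pixels)) + bias


def label(pixels):
    return "stop sign" if score(pixels) > 0 else "speed limit"


def sign(value):
    """Return 1, 0, or -1 according to the sign of value."""
    return (value > 0) - (value < 0)


# Original image: every pixel at mid-gray on a 0..1 scale.
image = [0.5] * N

# Nudge every pixel by a small amount in the direction that lowers the score.
EPSILON = 0.06
adversarial = [p - EPSILON * sign(w) for p, w in zip(image, weights)]

largest_change = max(abs(a - p) for a, p in zip(adversarial, image))
print(label(image), "->", label(adversarial), "largest pixel change:", largest_change)
```

No single pixel moves by more than 0.06 on a 0 to 1 scale, yet the label flips because a hundred tiny nudges all push the score the same way. The classifier was right about the original image without relying on anything a person would recognize as the shape of a stop sign.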
It is likely that if Facebook and the others are using machine learning in their prioritization algorithms, they could well be subject to either hidden bias or direct manipulation. While the companies would certainly do their best to look for these things, outside scrutiny could also be extremely helpful and would also let us know what the companies are actually optimizing for.
While the companies might claim that these algorithms represent some kind of secret sauce and that their exposure would be very damaging, that doesn’t completely stand up to scrutiny. Facebook is a monopoly or near monopoly because of network effects, not the power of its prioritization algorithm. In fact, most people do not have a positive perception of the product, as shown by its net promoter score (NPS) of -21. (Twitter’s NPS of 3 is a little better but still terrible.) No competing company would emulate the algorithms with the intent of creating a customer experience like that.
The more damaging exposure would likely be in how the company promotes ads, but ads could be taken out of the picture for this purpose. Just a public understanding of how the natural news feed is prioritized would be an enormous win for transparency.
The last significant pushback would be that opening the algorithms would enable manipulators like Cambridge Analytica. That is almost plausible, but relies on the dubious principle of security through obscurity. The key to securing these systems lies in trust models, identifying bad actors, and in the scrutiny of trending content. It does not lie in keeping the algorithms secret.
Social media companies have resisted classification as media companies or public utilities, although they actually have a lot in common with both. Opening their algorithms for prioritizing even just non-commercial content to public scrutiny could do a huge amount to restore trust, might actually improve their products, and would be one of the least invasive ways of doing it. I’m not minimizing how hard it is to do. This isn’t just code: large samples of user data would also need to be anonymized, sanitized, and made available. But I think it’s the best way forward. If there’s nothing to hide, it would defang Jeff Sessions and the attorneys general. If there is something hidden (perhaps even from the companies themselves), it would help bring that to light.