
Sacrificing accuracy and efficiency for fairness

At 53:17 or so of the video, in response to a question, Cathy O’Neil says: “Yes, absolutely. That’s probably the hardest thing that you guys are going to hear from me today. But I definitely think we need to sacrifice accuracy for fairness”. This is in a Talk at Google, available on YouTube [note 1: Cathy O’Neil, Weapons of Math Destruction, Talk at Google: https://youtu.be/TQHs8SA1qpk], where she presents her book Weapons of Math Destruction and answers questions from the audience. It is very instructive and inspiring, and I prescribe it to students in my courses. I even had the occasion to attend a similar talk by her when she was in my home town [note 2: 21 November 2017, in Tivoli Vredenburg].

I have taken the liberty of adding efficiency. In my view, this is a crucial and to-the-point observation. It says it all! Let me explain. We are talking about algorithmic decision making, or AI: using computer programs to build profiles of people, to sort them, to assess them, and to take decisions affecting them, both in vertical (government – citizen) and in horizontal (mostly companies – consumers) contexts.

There is a lot of data about all of us out there. Nobody knows exactly which data and nobody knows exactly where “out there” is. But it is clear that not only the so-called big tech companies (Google, Facebook, Amazon, Apple – hmmm, they’re all American) but also our governments know a lot about each of us. And they use it. The companies use it to enhance our user experience, to serve us better, to tailor search results and recommendations to our needs, to keep us glued to the screen, to serve us targeted ads, etc. The government uses it to take decisions about us, to allocate investigation capacity efficiently, to keep an eye on some of us, etc. They do that by building profiles, on the basis of perceived correlations in the data. Typically: someone like this particular person likes this, hates that, and is at an x risk of doing such-and-such, so you had better treat her in this way; it is probably safe (with certainty y) to offer that option to her (but rather not something else), etc. It is by now well known that this may come down to discrimination (like not inviting women for interviews for high-profile jobs), thereby creating harmful self-fulfilling prophecies (see: no women in high-profile jobs!). Thus existing inequalities are replicated, deepened, legitimized, and institutionalized.

It is sometimes said that this happens because the data that the algorithm is trained on is “biased” [note 3: See art. 10(2)(f) of the EU Proposal for a Regulation on AI]. But what exactly is meant by bias is not always clear. Obviously, the good things in life that are scarce and that people compete for (wealth, jobs, prestige, and such) are not evenly distributed. However, there is no consensus on what a fair way to distribute these would look like – distributive justice has been on the debating agenda ever since Aristotle [note 4: See for example Samuel Fleischacker, A short history of distributive justice. Harvard University Press, 2009].

So what is biased then? Uneven: more men than women in high-paid jobs. A skewed distribution. But that is just the way it is. Do we then correct the data in the sense that we want the same number of men and women (and how about non-binary people?) on every job level in our dataset? How about the other characteristics relevant for discrimination? The Dutch constitution opens with the following provision: “All persons in the Netherlands shall be treated equally in equal circumstances. Discrimination on the grounds of religion, belief, political opinion, race or sex or on any other grounds whatsoever shall not be permitted.” [note 5: Art. 1 Constitution, to be found at https://www.rechtspraak.nl/SiteCollectionDocuments/Constitution-NL.pdf] So do we correct the dataset for every unacceptable discriminatory ground? There will not be much data left!

The word bias is used not only in relation to data, but also in relation to people. All of us are biased [note 6: Daniel Kahneman, Thinking, Fast and Slow. Macmillan, 2011. Interestingly, Kahneman is very positive and hopeful regarding the future possibilities of AI (especially compared to very limited human possibilities). See his comments on the talk by Colin F Camerer on AI and behavioral economics, part of the book The Economics of Artificial Intelligence: An Agenda edited by Ajay Agrawal, Joshua Gans & Avi Goldfarb. Both can be downloaded from https://www.nber.org/books-and-chapters/economics-artificial-intelligence-agenda]. I think that bias in people is fundamentally different from bias in data and that it is unfortunate that the same word is used here. Bias in people is, or so we hope, more easily corrected, disproven, updated. Like: you think by the looks of someone that they are probably not that smart, but when you talk to them you update your prior belief (or bias) – or not, when they are really stupid. It also depends on how deeply rooted one’s biases are or how open-minded someone is.

However, the point I want to make in this blog is a different one. Algorithms can find correlations in data. Some are what we call “spurious”, nonsensical, meaningless: nobody in their right mind would propose budget cuts to science, space, and technology in order to bring down suicides by hanging, strangulation, and suffocation [note 7: See this website for some hilarious and very clear examples: https://www.tylervigen.com/spurious-correlations]. Other correlations are not spurious: clearly, as a rule, someone’s gender is a good predictor of their chances of succeeding in a high-profile job (and thus their chance of being given such a job, and thus … and so on). However, non-discrimination law and our sense of fairness and justice demand that we do not act on this correlation in our decision-making – and if we do, we can rightfully be accused of discrimination. By ignoring certain non-spurious correlations, we try to make the world a better, fairer place, by breaking out of the mechanism of harmful self-fulfilling prophecies.
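To make the trade-off concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the twelve historical records are invented, the two rules (`group_aware` and `group_blind`) are hypothetical, and “accuracy” simply means agreement with the historical hiring outcomes in those records. It is not a description of any real system.

```python
from fractions import Fraction

# Hypothetical historical records: (group, strong_candidate, was_hired).
# The numbers are invented; they encode a past in which strong women
# were rarely hired.
RECORDS = (
    [("m", True, True)] * 4
    + [("m", False, False)] * 2
    + [("f", True, True)] * 1
    + [("f", True, False)] * 3
    + [("f", False, False)] * 2
)


def group_aware(group, strong):
    """Acts on the correlation: invite only strong men."""
    return strong and group == "m"


def group_blind(group, strong):
    """Ignores the correlation: invite every strong candidate."""
    return strong


def accuracy(rule):
    """Share of records where the rule agrees with the historical outcome."""
    hits = sum(rule(g, s) == hired for g, s, hired in RECORDS)
    return Fraction(hits, len(RECORDS))


def selection_rate(rule, group):
    """Share of a group that the rule would invite."""
    members = [(g, s) for g, s, _ in RECORDS if g == group]
    return Fraction(sum(rule(g, s) for g, s in members), len(members))


def selection_gap(rule):
    """Difference in invitation rates between men and women."""
    return selection_rate(rule, "m") - selection_rate(rule, "f")


if __name__ == "__main__":
    for rule in (group_aware, group_blind):
        print(rule.__name__, accuracy(rule), selection_gap(rule))
    # group_aware 11/12 2/3
    # group_blind 3/4 0
```

On these invented numbers, the rule that acts on the correlation reproduces the past better (11 out of 12 against 9 out of 12), while the rule that ignores it invites men and women at the same rate. The lost “accuracy” is accuracy in predicting a history that was itself skewed.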

The same is true for the allocation of enforcement capacity. If more police capacity is dedicated to a certain neighborhood or a certain population, of course, more crime is found. Just as the exercise of discretionary power by officials is limited by non-discrimination law (like: the police may not stop-and-search people just on the basis of skin color) [note 8: See Rb. ’s-Gravenhage 22 September 2021, ECLI:NL:RBDHA:2021:10283, at 8.18], so should the allocation of resources by an algorithm be limited. Even if we can understand that for reasons of efficiency some enforcement agency would like to dedicate its scarce capacity to those fields where it assumes (on the basis of data analysis) that violations of the relevant rules are most likely to occur – if that comes down to selectively enforcing against groups that are defined by the categories of anti-discrimination law, then that is not okay! Even if it is, indeed, efficient [note 9: Assuming (as I think we safely can) that the vast majority of “victims” will not be aware and will therefore not challenge this practice]. And even if the correlations are non-spurious. In the exercise of government power, and also in offering (or not offering) services and opportunities, we need to sacrifice accuracy and efficiency for fairness. That is called: justice.
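The point that more capacity finds more crime can be illustrated with a small simulation. This is only a sketch under stated assumptions: two hypothetical neighborhoods with exactly the same true violation rate, a fixed number of checks per round, and an allocation rule (invented here) that divides the checks in proportion to the violations recorded so far. The function name `simulate` and all numbers are illustrative.

```python
def simulate(initial_share_a, true_rate_a, true_rate_b,
             rounds=10, capacity=100.0):
    """Reallocate checks each round in proportion to recorded violations.

    Returns (recorded_a, recorded_b, final_share_a). Recorded violations
    are expected values (checks times rate), so the run is deterministic.
    """
    share_a = initial_share_a
    found_a = found_b = 0.0
    for _ in range(rounds):
        checks_a = capacity * share_a
        checks_b = capacity * (1.0 - share_a)
        # More checks in a neighborhood means more violations recorded,
        # even when the underlying rates are identical.
        found_a += checks_a * true_rate_a
        found_b += checks_b * true_rate_b
        total = found_a + found_b
        if total > 0:
            # The data-driven step: follow the recorded violations.
            share_a = found_a / total
    return found_a, found_b, share_a


if __name__ == "__main__":
    found_a, found_b, share_a = simulate(0.6, 0.1, 0.1)
    print(round(found_a), round(found_b), round(share_a, 2))
```

With an initial 60/40 split and identical rates of 0.1, the records end up at roughly 60 against 40 violations, and the allocation stays at 60/40. The data appears to confirm the initial choice, although the two neighborhoods do not differ at all.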

Tina van der Linden

Amsterdam Law & Technology Institute
VU Faculty of Law
De Boelelaan 1077 Amsterdam