Even if all the requirements for developing high-risk AI systems and placing them on the market are complied with, these systems can still produce results that are discriminatory or otherwise unfair. The proposed AI Regulation relies on a human-in-the-loop to function as a safety valve. Is that a realistic option?
AI is “hot”. By now it is acknowledged that using AI to make decisions about people carries certain risks. Cathy O’Neil coined the term Weapons of Math Destruction (WMD), and discussed some compelling and recognizable examples of how the use of AI can lead to unwanted, discriminatory and extremely unfair outcomes, leaving the AI’s victims essentially powerless to challenge those outcomes. We know by now that we in the Netherlands have our own WMDs: the systems that caused the childcare benefits scandal, the SyRI system struck down by the court, the systems used by the immigration services, and possibly more.
At the same time, it is clear that the use of AI has much to offer (efficiency gains, but also potentially more objective decisions), so we do not want to throw the baby out with the bathwater by completely banning it. At the end of the day, humans are far from objective and infallible, so who knows: on balance, the use of AI may well lead to better and fairer decisions.
For this ALTI forum contribution, what I mean by AI is: any software yielding outputs that are unexplainable for their users. Incomprehensible, computer-says-no, so be it, no idea why. This is a rather broad, pragmatic definition. I also only have AI use by government agencies in mind.
Bias is one of the fundamental problems of AI use. If a self-learning system is trained with a dataset that is biased, the resulting system will obviously have the same bias and perpetuate it. Perhaps the most well-known example is Amazon using an algorithm to select the most promising candidates who applied for a well-paid job. Guess who was selected. Another well-known example is the algorithm used in the US to estimate recidivism risk. Obviously, the risk for certain people is estimated to be higher than for others of a different ethnic background.
So yes, the dataset used to train the algorithm is biased, but this dataset is correct and the correlations the algorithm finds are correct. Correlations between gender and career prospects and between ethnic background and recidivism-risk are not so-called “spurious correlations”, but do have predictive value. We need to acknowledge that the current state-of-affairs is biased, at least on the grounds mentioned in the first article of our constitution: religion, belief, political preferences, race and gender. This means that the dataset used for training cannot be both accurate and unbiased at the same time. As Cathy O’Neil puts it: “We have to sacrifice accuracy for fairness”.
Obviously, not all suspect attributes (religion, belief, political preferences, race and gender) need to be included in the dataset. Various proxies, such as zip-code, shopping behaviour, hobbies, and places visited frequently allow algorithms to distinguish people along the lines of suspect attributes. Neither the system itself, nor its makers, trainers or users necessarily have the intention of treating people differently according to these characteristics. It just happens. Therefore, in order to prevent indirect discrimination, it is not helpful to exclude the attributes themselves. In any case, they most likely may not even be part of the dataset according to data protection legislation.
So what we need to do is be extremely critical of AI’s outcomes. Are the candidates selected by the AI for an interview all male by any chance? Hmmmm. Do the people who were assigned a high recidivism-risk score all share the same dark skin colour? That is amazing. Do the people on the blacklist of the tax authorities all happen to have dual nationality? Do people indicated as potential fraudsters all live in the same kind of neighbourhoods? These examples are unfortunately well-known by now, but I am convinced that a lot more indirect discrimination takes place: inadvertent, unknown and not yet uncovered. To loosely quote Johan Cruijff: you only see it once you understand how it works.
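To make this kind of critical look a bit more concrete, here is a minimal sketch of how one might compare outcomes across groups. It is an illustration only, not a method prescribed by the proposed Regulation or by any of the authors mentioned. The function names `selection_rates` and `disparity_ratio` and the numbers are made up for this example, and the sketch assumes that a group label is available for checking purposes, which in practice is often not the case.

```python
from collections import defaultdict


def selection_rates(records):
    """Return the share of positive outcomes per group.

    records: iterable of (group, selected) pairs, where selected is a bool.
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, selected in records:
        totals[group] += 1
        if selected:
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


def disparity_ratio(rates):
    """Lowest selection rate divided by the highest (1.0 means equal rates)."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected in any group, so the rates are equal
    return min(rates.values()) / highest


# Hypothetical, made-up outcomes: (group label, invited for an interview?)
outcomes = (
    [("men", True)] * 40 + [("men", False)] * 60
    + [("women", True)] * 10 + [("women", False)] * 90
)

rates = selection_rates(outcomes)
ratio = disparity_ratio(rates)
print(rates)
print(f"disparity ratio: {ratio:.2f}")
```

A ratio far below 1.0 does not prove discrimination by itself, but it is exactly the kind of signal that should make someone ask why.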
So who is going to look in such a critical way at AI’s output? Just one candidate, a human being. One who is well-informed, critical, up-to-speed about these mechanisms, with an in-depth knowledge of the inner workings of the systems she is supposed to evaluate, and who has the power, guts and the right incentives to say both in general and in specific cases: okay, whatever, it ain’t gonna happen. Too bad: bureaucracy, efficiency targets, election promises, their own career prospects, this human says no!
Let me introduce you to the hero in this story: HITL, the Human-In-The-Loop. No longer the insignificant and subservient cog in the Moloch Machine from the old film Metropolis (brought back to life by Pink Floyd’s Welcome to the Machine). Rather: the crucial safety valve with the formidable task of saving humanity from the evils of the very same Moloch.
Indeed, even the proposed AI Regulation itself acknowledges in so many words (article 14) that the risks for health, safety and fundamental rights (that may emerge when a high-risk AI system is used in accordance with its intended purpose or under conditions of reasonably foreseeable misuse) may persist notwithstanding the application of other requirements set out in the second chapter on high risk AI. In the HITL we trust. She does not have an easy task; she should at least be capable of the following things (formulated a bit more formally than I did above, but basically coming down to the same things):
(a) to fully understand the capacities and limitations of the high-risk AI system and be able to duly monitor its operation, so that signs of anomalies, dysfunctions and unexpected performance can be detected and addressed as soon as possible;
(b) to remain aware of the possible tendency of automatically relying or over-relying on the output produced by a high-risk AI system (‘automation bias’), in particular for high-risk AI systems used to provide information or recommendations for decisions to be taken by natural persons;
(c) to be able to correctly interpret the high-risk AI system’s output, particularly taking into account the characteristics of the system and the interpretation tools and methods available;
(d) to be able to decide, in any particular situation, not to use the high-risk AI system or otherwise disregard, override or reverse the output of the high-risk AI system;
(e) to be able to intervene in the operation of the high-risk AI system or interrupt the system through a “stop” button or a similar procedure. (art. 14(2) proposed AI Regulation)
One can only wonder whether our HITL will be up to this task. Are the expectations not too high? Is the HITL not just of symbolic value, a cover-up, a fig leaf, a rubber stamp to legitimize the AI Moloch’s output?
From the management literature dating back to the eighties of the previous century, we know the phenomenon of “street-level bureaucracy” and the problems inherent to this type of organisation. A street-level bureaucracy is a government agency that communicates directly with citizens. The officials working there are called street-level bureaucrats (SLBs), and well-known examples of SLBs include police-officers, border-control officers, public servants of social welfare agencies, etc. Teachers are street-level bureaucrats, so I am one as well, to the extent that I take binding decisions over students. SLBs are the bridges between the rule-makers and the people to whom these rules are applied. SLBs typically have some freedom, in this context called discretion, in applying general rules to concrete cases, and this discretion is necessary to do their work properly. At the same time, with discretion come the inevitable risks of unequal treatment, nepotism and abuse of power.
Street-level bureaucracies face fundamental problems, such as a structural lack of resources (in terms of staff, time and funding). Efficiency demands that as many cases as possible are dealt with as fast as possible. It is, after all, taxpayers’ money being spent here. All standard, normal, clear cases are treated as such and dealt with automatically, at least in theory under the responsibility of an SLB. And then if every once in a while there is a hard case that demands use of discretion, we might take a proper look at it. Not too often though. Honestly, we lack both the time and the motivation to really look into such cases, due to a lack of organisational incentives to do so.
As an SLB, how would you know that you are dealing with a hard case anyway? The AI system is not going to tell you. It does not know and just processes it. The citizen concerned? Does she have a voice loud enough to raise the alarm and catch the attention of an SLB, who may then really look into her case? Or will the SLB herself, as the proposed Regulation assumes, monitor the output of the AI, looking for hard cases in which the AI took the wrong decision?
In everyday practice, all organisational incentives are aimed at deviating in as few cases as possible from the outcome suggested by the system. This output, even if officially disguised as “advice”, will function as the de facto standard. Sure, as an SLB you can deviate, but you had better be sure of what you are doing. You have to give reasons, and you get yourself into a lot of work and trouble. That was the case in the Netherlands twenty years ago at social services and in deciding upon appeals against traffic fines.
And what if, even worse, it is not only about an individual case, but the system seems to structurally and consistently generate output that is indirectly discriminatory? A system in which a lot of money was invested, that took a long time to develop, and that does have the required (art. 16-29 proposed AI Regulation) CE-conformity mark? Certified AI says no! Good luck challenging that!
In sum, I am not convinced at all. Of course I fully support all the ideals of Human Centric AI, ethical AI and methods to design ethical AI systems. However, in order to be able to rely on a human-in-the-loop as a safety net, fundamental organisational reforms in street-level bureaucracies are required. As law professor Van den Herik famously said (already) in 1991: “After three months of flawless advice, the computer will be the judge, whatever either of them thinks about this”. Who is and can be held responsible?
The human-in-the-loop should be judged by the number of cases that she manages to save from the claws of the AI Moloch, and be treated as a hero for unmasking a WMD. Heroes are what we need!
Citation: Tina van der Linden, AI: We Need A Hero!, ALTI Forum, January 9, 2023
Photo by mahdis mousavi