
Online Magazine

Selective training data for an ethical AI

How do we make AI systems more ethical? AI ethicist Thilo Hagendorff from the University of Tübingen argues for training machine learning systems on pre-selected data: Small Data instead of Big Data, and an intentional bias. In his view, that is the only way to steer AI in the direction we want it to go.

Eliane Eisenring spoke with Thilo Hagendorff

AI has by now taken over almost every area of life and its reach is growing. How urgent is the debate on the ethical direction of AI?
The fact that a single technology is penetrating so many areas of life at once does indeed lend the debate a certain urgency. AI systems are increasingly being used in so-called high-stakes areas, for example in medicine, road traffic or policing. If we want to leave decision-making in such areas to computers, it is important that these systems are reviewed appropriately. This is happening in the EU right now, which is certainly a good thing.

Many voices express concerns when it comes to AI and ethics. In your paper "Linking Human And Machine Behavior", instead of just denouncing abuses, you actually propose ways to make AI more ethical.
I argue that AI ethics should not just be seen as a discipline that draws red lines or issues moral prohibitions. Ethics can also develop positive visions and make concrete, implementable suggestions for how to arrive at good technology development.

Your approach is based on training machines to serve the common good, right?
Yes, it is about making technology more socially acceptable. Machine behaviour is very much the product of the data with which the systems are trained. It's similar with us humans: We get stimuli from outside and they determine to a certain extent how we behave. It is the same with AI systems that fall into the area of supervised machine learning.

Or to put it another way, the performance of an AI depends on the quality of the data it is given. How is data quality defined nowadays? And how should it be defined in your opinion?
Currently, data is mainly selected to serve a specific business purpose and the quality criteria are technical – how up-to-date is the data, how many errors are there in the data set, how readable is it, and so on. Moreover, the focus lies on using enormous amounts of data for training – in theory, all the data that you can somehow collect from a certain area.
I argue that, beyond technical considerations, we should abandon this "bigger is better" approach and start making more qualitative selections. Where behavioural data is concerned, this means selecting it according to ethical criteria and using only data from certain subpopulations, namely from people whose behaviour is desirable from an ethical perspective.

Can you give an example?
Let's take sustainable consumer behaviour: On an online shopping platform, I have many machine learning systems that rank or recommend products and may adjust prices dynamically. Now I have two options: Either I train these algorithms on the shopping behaviour of all customers, or I identify a specific customer segment that, as I know from tracking, consumes more sustainably, and use only the behavioural data of this subpopulation to train my ranking algorithm. In the end, this means that more sustainable products are ranked higher or are more likely to be recommended.
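To make the mechanism concrete, here is a minimal sketch in Python. It assumes a toy purchase log and a hypothetical, pre-identified "sustainable" customer segment, and it uses a simple purchase-count ranker as a stand-in for a real machine-learned ranking model. All names and data are illustrative; the only point is that the same training procedure yields a different ranking depending on whose behavioural data it sees.

```python
from collections import Counter

# Hypothetical purchase log: (customer_id, product_id) pairs. Illustrative only.
PURCHASES = [
    ("alice", "bamboo_toothbrush"),
    ("alice", "refill_detergent"),
    ("bob", "fast_fashion_tshirt"),
    ("bob", "fast_fashion_tshirt"),
    ("bob", "plastic_bottles"),
    ("carol", "refill_detergent"),
    ("carol", "bamboo_toothbrush"),
    ("dave", "plastic_bottles"),
    ("dave", "fast_fashion_tshirt"),
]

# Hypothetical segment assumed (e.g. from tracking) to consume sustainably.
SUSTAINABLE_SEGMENT = {"alice", "carol"}


def train_popularity_ranker(purchases, segment=None):
    """Rank products by purchase count, optionally using only one customer segment.

    A simple stand-in for a learned ranking model: it "trains" by counting
    purchases and ranks products from most to least purchased.
    """
    counts = Counter(
        product
        for customer, product in purchases
        if segment is None or customer in segment
    )
    # Descending count; ties broken alphabetically so the result is deterministic.
    return sorted(counts, key=lambda product: (-counts[product], product))


# Trained on everyone: the ranking mirrors the majority's habits.
all_customers = train_popularity_ranker(PURCHASES)
# Trained only on the selected subpopulation: sustainable products rise to the top.
selected_only = train_popularity_ranker(PURCHASES, SUSTAINABLE_SEGMENT)
```

In this toy setup, products the selected segment never buys drop out of the ranking entirely; a production system would more likely reweight or blend data sources than discard them outright.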

This selection means training on Small Data instead of Big Data. Isn't Big Data indispensable for machine learning?
Not necessarily. There are already many technical approaches and improvements that allow systems to learn from less data.
Big Data also has disadvantages: Many of the data traces collected are simply by-products, and often the data was not collected for the purpose for which it is later used. One simply collects and collects and collects, and then draws conclusions from data that was gathered incidentally from apps or wherever.

One could object that selecting data, as you suggest, introduces a bias. Isn't that problematic?
Within the AI field, bias is something bad, something to avoid because it leads to algorithmic discrimination. However, I think we should see biases as something ambivalent and not as something purely negative. We can agree on an intentional bias, which also leads to discrimination, but one that is desirable from an ethical perspective.

So biases are only critical if they arise without our wanting them to?
That's right. At the moment we take behavioural data from wherever we can get it, which leads to biases in the data, namely those that are already prevalent in society. My suggestion is: let's preselect this training data so that it reflects biases we actually want, for example the bias that text data contains as few insults as possible, or the bias that sustainable products are preferred in online shops.
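As an illustration of such an intentional bias, here is a minimal Python sketch of preselecting text training data so that it contains as few insults as possible. The tiny word list, the tokenisation and the threshold are all hypothetical assumptions for the sake of the example; a real pipeline would more likely rely on a trained toxicity classifier than on a fixed list.

```python
import re

# Hypothetical, deliberately tiny list of insult words (illustrative only).
INSULTS = {"idiot", "moron", "stupid"}


def insult_count(text):
    """Count tokens in the text that appear in INSULTS (case-insensitive)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(1 for token in tokens if token in INSULTS)


def preselect(corpus, max_insults=0):
    """Keep only documents with at most max_insults insult tokens."""
    return [doc for doc in corpus if insult_count(doc) <= max_insults]


# Hypothetical raw corpus, e.g. scraped comments.
corpus = [
    "Thanks for the detailed explanation!",
    "What a stupid idea, you idiot.",
    "I disagree, but I see your point.",
]

# Only the civil comments survive and would be used for training.
training_texts = preselect(corpus)
```

The `max_insults` parameter makes the desired bias explicit and adjustable, which is the point of the approach: the selection criterion is a deliberate, inspectable choice rather than an accident of whatever data happened to be available.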

Who determines what a desirable bias is?
This is a legitimate question, and one that is often raised critically, especially with regard to extreme dilemma cases: whom should the car run over, the child or the elderly woman? For the most part, however, it's about applications where we already have a cultural consensus. For example, we agree that we want autonomous cars to be as safe as possible, or that sustainability is important. Accordingly, we can train AIs on values like safety or sustainability.

Your approach presupposes that companies, like e-commerce platforms, put moral interests above monetary ones. Do you think that is realistic?
Ethical actions often contradict economic imperatives. The question then is which logic to follow. The fact that prioritising ethical guidelines is unrealistic in some cases should not stop us from making these demands.

The topic of social responsibility, which you mention here, is no longer entirely new in the business world ...
Yes, by now, it has arrived there too. At the end of the day, we have to ask ourselves what the absolute overarching values are – peace, security, sustainability, for example. If we only follow economic standards, we compromise these other values that make human life worth living.

In your paper, you mention Facebook, where critical voices regarding AI training were simply overruled or ignored.
There was a proposal to exclude the behavioural data of so-called superusers from the calculations so that it would no longer flow into the training of the AI, and Facebook apparently rejected it. The maxim that still applies there is: a lot of interaction with the feed means a lot of advertising value. However, especially in the case of Facebook, it is essential that we finally become aware of the enormous damage this platform is causing to society. With great power comes great responsibility. Money is no longer everything.

Ethicists warn against ethics being used as a purely cosmetic label that companies can hide behind.
There are indeed many cases in which AI ethics has a kind of fig leaf function. An ethics committee or policy is presented to the public in order to calm critical voices. In fact, however, it is "business as usual".
But some researchers go even further in their criticism: the constant invocation of ethics guidelines is actually a defence mechanism against legal norms. The aim is to prevent such norms by pointing to internal governance mechanisms and thereby dismissing binding AI laws as superfluous.

You mentioned the EU at the beginning: it is currently working on an Artificial Intelligence Act. What do you think about that?
This is a pretty big step, because the Act will bring binding regulation for AI systems, at least in Europe, including bans on applications with unacceptable risk, such as real-time biometric facial recognition. I can imagine that this will have a global impact, similar to the General Data Protection Regulation, with which Europe has taken on a certain pioneering role. Large companies want to optimise their products for different markets, so it makes sense for them to adapt to the highest legal standards from the outset.

To get back to your work on ethical machine learning: Are you currently involved in projects that train systems using the approach you describe?
I am planning something along these lines, but I haven't got around to it yet. I do know, however, that it is already being done in the car industry: Behavioural data is selected so that only data from drivers whose behaviour in manual mode can be described as safe is used.

A philosophical question to end on: You write that machines are only as moral as the data they get and as the people who generate that data. Is the ethics discussion around AI therefore actually a discussion about the need to expand ethical behaviour in general?
One could certainly see it that way. Interestingly, hardly anyone is currently aware of how often he or she contributes to training an AI. If I leave a comment full of swear words somewhere, it is very likely to become part of some training data set and thus help shape the behaviour of some application. If we are aware that we are all the "teachers" of AI systems, one could say that everyone bears a very small, but nonetheless real, share of responsibility for how these systems ultimately behave. Taking this responsibility seriously would mean interacting with computers in a way that stands up to a certain ethical scrutiny.

About Thilo Hagendorff

Thilo Hagendorff, 34, has been researching machine learning and AI ethics at the University of Tübingen since 2014. He is the author of several non-fiction books and numerous specialist articles on artificial intelligence and ethics: in 2021 alone, he (co-)authored ten scientific publications. In his spare time, the Baden-Württemberg native also advocates ethical behaviour, in his case towards animals: Hagendorff has been vegan for 14 years. He also competes in bike races, preferably mountain bike races or ultra-marathons.