Every year, the city of Rotterdam in the Netherlands gives some 30,000 people welfare benefits to help them make rent, buy food, and pay essential bills. And every year, thousands of those people are investigated under suspicion of committing benefits fraud. But in recent years, the way that people have been flagged as suspicious has changed.
In 2017, the city deployed a machine learning algorithm built by consulting firm Accenture. The algorithm, which generates a risk score for everyone on welfare, was trained to catch lawbreakers using data about individuals who had been investigated for fraud in the past. This risk score is dictated by attributes such as age, gender, and Dutch language ability. And rather than use this data to work out how much welfare aid people should receive, the city used it to work out who should be investigated for fraud.
When the Rotterdam system was deployed, Accenture hailed its “sophisticated data-driven approach” as an example to other cities. Rotterdam took over development of the algorithm in 2018. But in 2021, the city suspended use of the system after it received a critical external ethical review commissioned by the Dutch government, although Rotterdam continues to develop an alternative.
Lighthouse Reports and WIRED obtained Rotterdam’s welfare fraud algorithm and the data used to train it, giving unprecedented insight into how such systems work. This level of access, negotiated under freedom-of-information laws, enabled us to examine the personal data fed into the algorithm, the inner workings of the data processing, and the scores it generates. By reconstructing the system and testing how it works, we found that it discriminates based on ethnicity and gender. Our analysis also revealed evidence of fundamental flaws that made the system both inaccurate and unfair.
Rotterdam’s algorithm is best thought of as a suspicion machine. It judges people on many characteristics they cannot control (like gender and ethnicity). What might appear to a caseworker to be a vulnerability, such as a person showing signs of low self-esteem, is treated by the machine as grounds for suspicion when the caseworker enters a comment into the system. The data fed into the algorithm ranges from invasive (the length of someone’s last romantic relationship) and subjective (someone’s ability to convince and influence others) to banal (how many times someone has emailed the city) and seemingly irrelevant (whether someone plays sports). Despite the scale of data used to calculate risk scores, the algorithm performs little better than random selection.
Machine learning algorithms like Rotterdam’s are being used to make more and more decisions about people’s lives, including what schools their children attend, who gets interviewed for jobs, and which family gets a loan. Millions of people are being scored and ranked as they go about their daily lives, with profound implications. The spread of risk-scoring models is presented as progress, promising mathematical objectivity and fairness. Yet citizens have no real way to understand or question the decisions such systems make.