Catherine D’Ignazio: ‘Data is never a raw, truthful input – and it is never neutral’

Our ability to collect and record information in digital form has exploded, as has our adoption of AI systems, which use data to make decisions. But data isn’t neutral, and sexism, racism and other forms of discrimination are showing up in our data products. Catherine D’Ignazio, an assistant professor of urban science and planning at the Massachusetts Institute of Technology (MIT), argues we need to do better. Along with Lauren Klein, who directs the Digital Humanities Lab at Emory University, she is the co-author of the new book Data Feminism, which charts a course for a more equitable data science. D’Ignazio also directs MIT’s new Data and Feminism lab, which seeks to use data and computation to counter oppression.

What is data feminism and why do we need it?
It is data science with an intersectional feminist lens. It takes inequality into account at every stage of the data-processing pipeline, looking not only at gender discrimination but also at other intersecting forms of discrimination such as racism, classism and ableism. And the reason we need it is to stop producing harmful racist and sexist data products.

We shouldn’t be surprised about the sexist results coming out of these algorithms with the flawed data we are feeding in

When you look at data and AI this way, what kind of problems do you find?
We find some people are winning and some people are losing. The benefits and harms are not being equally distributed. And those who are losing are disproportionately women, people of colour, and other marginalised groups. One way they are losing is that data most of us would think is important isn’t being collected. We have detailed datasets on things like the length of guinea pig teeth and items lost on the New York City subway. But, in the US, missing data includes maternal mortality data, which has only started being collected recently, and sexual harassment data. And so much of our health and medical understanding is based on research that has been done exclusively on the male body.

How do people with less privilege show up in datasets?
We’ve had facial analysis software that is much less accurate for dark-skinned women and algorithms that disadvantage female applicants. We’ve also had child abuse prediction software that over-targets poor families and predictive policing software like PredPol that disproportionately targets neighbourhoods of colour. The former pulls data from state health and welfare services, which poor people are more likely to access, while the latter is based on historical crime data, and US policing practices have always disproportionately surveilled and patrolled neighbourhoods of colour. We shouldn’t be surprised about the racist and sexist results coming out of these algorithms with the deeply flawed data we are feeding in.

If our data and algorithms are all so flawed, how do we change things to make them better?
First we need to be tuning in to the ways that oppressive forces might be insidiously inserting themselves into the data pipeline. More understanding is particularly needed among the technical folks who are making these systems. It is rarely the case that the discrimination in products is intentional; it’s just that nobody has ever taught them that it is a problem or emphasised that it is important. University data-science courses should include more than just a single ethics class.

Then we have to actually use data and computation to challenge inequality. We have to collect counter-data. Take, for example, the comprehensive dataset on Mexico’s femicides – gender-related killings of women and girls – that has been compiled over the past five years from media reports by María Salguero, a citizen activist in that country. She is filling a vacuum because the Mexican government is not collecting the data. Now of course data alone is never enough. But if the data is used in concert with organising, lobbying and building political will, it can be very effective. In the US, we do have organisations working to call out injustice and produce their own counter-data, including Data for Black Lives, the Algorithmic Justice League and The Markup. We need to fund more of this kind of work.

Is there such a thing as neutral data?
There is a naive assumption that if you see numbers in a spreadsheet, they are real somehow. But data is never this raw, truthful input, and it is never neutral. It is information that has been collected in certain ways by certain actors and institutions for certain reasons. For example, there is a comprehensive database at the US federal level of sexual assaults on college campuses – colleges are required to report it. But whether students come forward to make those reports will depend on whether the college has a climate that will support survivors. Most colleges are not doing enough, and so we have vast underreporting of those crimes. It is not that data is evil or never useful, but the numbers should never be allowed to “speak for themselves” because they don’t tell the whole story when there are power imbalances in the collection environment.

Would data science’s bias problems be solved if there were simply more data scientists, coders and computer programmers who were women or from minority backgrounds?
More diversity is an important part of the solution. As a group, data scientists are more likely to be male, white and highly educated. They have never experienced sexism, racism or classism, so it is hard for them to see it. We call this “the privilege hazard” in the book, and diversity can mitigate it.

But including more women or people of colour alone is not going to solve everything. We need to bring the communities who will be impacted by these information systems into the process of making them, because inevitably designers and programmers are going to be building systems for life experiences that they haven’t had. Requiring everyone who builds a welfare application to have lived on welfare would be a high bar, and just because I am a woman doesn’t mean I’m going to understand how to build an application for domestic workers. But there are participation strategies from other fields like urban planning, and how we incorporate those into data science is an area ripe for exploration.

In the book you talk about “Big Dick Data”. What is it and should we just reject it outright?
We coined it to denote big data projects that are characterised by masculine fantasies of world domination. Big Dick Data projects fetishise size, prioritising it and speed over quality; they ignore context and inflate their technical capabilities. They also tend to give little consideration to inequalities or inclusion in the process. Mark Zuckerberg aiming to supersede human senses with AI might be considered one such project, along with software company Palantir’s claims about massive-scale datasets. Big Dick Data projects aren’t necessarily wholly invalid, but they suck up resources that could be given to smaller, more inclusive projects.

What would you most like people to think about or ask themselves when they encounter data or a graph in the media?
A good general strategy, and a feminist practice, is to ask what we call “who” questions. Who made this? Who collected the data? Whose lives are embodied in the data? Who is it serving? Who is potentially harmed? Asking these questions allows us to start to see how privilege is baked in.

How is privilege being baked into the coronavirus data we are collecting?
The US government’s response to coronavirus has been a case of missing data. There has been community spread, but the numbers are completely unreliable because test kits are in short supply and people are having a hard time getting tested. And poor people, many of whom are from immigrant backgrounds, will be less likely to seek tests because of lack of insurance, inability to afford insurance co-pays and lack of paid sick time if they test positive.

• Data Feminism by Catherine D’Ignazio and Lauren F Klein is published by MIT Press (£25)
