Unethical Analysis of Collected Human Data

My personal values and ethical viewpoints clash deeply with the collection of data about humans that fuels so many data science and analytics projects. I hate to hop on a “moral high horse”, but hear me out.

It is unethical to reduce humans and their activities down to numbers and categories – anonymized or not.

The only way to record humans and their activities in a dataset is to reduce those humans and their actions down to numerical and categorical variables, which is inherently unethical, because it literally strips them of their humanity. Humans have intrinsic value beyond the handful of qualities or characteristics that can be recorded about them in a table, and humans exist in a multifaceted world whose nuances cannot be properly represented in a database. Humans and their actions are therefore inherently misrepresented in datasets, and the models built on such data may negatively impact their lives in the future for no justifiable reason – the models themselves are built from incomplete information.

Missing context and human perspective in collected information

This incomplete information completely misses the motives, purposes, and reasons why certain things happen. Analyzing the husk of myself – one so shallow that it can be represented by a single data table – is a crime against humanity. It whittles the human aspects of life down into easily definable chunks for the computer, and in doing so it strips away the things that make me human. Databases treat me like a number and not a human, like a product with a barcode, SKU, and item characteristics, and it is wildly unethical to analyze that data and make decisions that impact my life as if this shallow representation of me were even close to the human being I am. Models are built to approve a loan application or set an insurance price from nothing but a husk of who the applicant is.

Take all the data the world has on me and try to build a Daniel. You won’t even get close. Digital information systems are too rigid to effectively reflect what happens in the world around us or to provide anything more than a shallow interpretation of a person.

Humans are complex and deserve to be treated as “innocent until proven guilty”

Humans deserve to be treated with the dignity that comes with being a living human being. It is “cheap”, naïve, and inhuman to toss people’s entire lives into boxes and judge them for it. People have an entire spectrum of motives for the actions they take, and sometimes they have no choice about being put into a certain category. Automatically treating someone as “guilty” because of a data point, say a black mark on their credit history, doesn’t give them any option to explain or appeal that decision.

In a dataset, you are nothing beyond the set of characteristics recorded about you. And you can’t add context, change it, or in many cases, even view it! No human will hear your story about the context behind that late-night credit card purchase at the cookie shop and wrangle the machine into going easier on you – from the data’s point of view, you are now a late-night sweets muncher prone to a hundred medical problems, so your life insurance policy will cost more, and there is nothing more to say. There is no human reasoning to provide checks and balances in a digital system that cannot properly represent human lives, personalities, motives, and actions. This is not an ethical way to treat human beings.

Unverified and false data

Datasets are chock-full of data points that were never verified as true by the very people whose information was recorded in them! The data might not only be false, but also not known to be false, which creates a particularly insidious kind of danger during modeling.

For example, years ago I downloaded my Instagram content when I decided to delete my main account. Alongside the pictures I had uploaded, the download contained a variety of other files and data they had about me, including my interests. Seriously, Instagram had just guessed at what my interests were based on the things I liked, and there was a list of dozens of interests and interest groups, many of which I had zero real interest in. One that comes to mind was “magazines”. Where did this even come from, and why am I tied to it inside a database somewhere? If they had asked me about it, I would have laughed. I’ve read maybe one issue of a physical magazine in the past 10 years.

Anonymizing is not a solution

Even anonymizing the data, as GDPR requires, provides no ethical justification for reducing people down to numbers and categories. What good is it if I change your name to Bob but still reduce you to numbers and categories as before? Anonymization is more of an attempt to solve the privacy concern and doesn’t address my core argument here. Frankly, the fact that GDPR ruins data science projects is a good thing – it means people are being protected.

Yes, ethical analytics exists

Disclaimer: Not all data science projects deal with data that records humans and their activities, or use that information as heuristics against people in harmful ways. Plenty of major data science efforts are in meteorology, physics, and other fields. The pop-culture view of data science is “the algorithm” that tracks you around and shows you ads, and that doesn’t represent the entirety of the field.

Is analytics just a sly way to judge people in shallow ways?

Most of my argument is essentially this: all we’re doing is using a computer to make the kind of shallow judgments about people that we were specifically taught NOT to make, and it should be easy to see that this is unethical. If a human were to act as judgmental and arrogant as I’ve described the algorithm acting (reducing people down to data points and using those against them), we would scoff at their childishness and (hopefully) they wouldn’t be given much power in the world. Putting this arrogance and coldheartedness into a computer only makes it feel justified to data scientists because nobody feels bad about it. In the abstract world of datasets, it is easy to shirk the responsibility of treating human beings as humans because it doesn’t feel real – the people in the data are already numbered and categorized on the screen in front of you. The weight of the fact that these people have been reduced to this inhuman state is forgotten.

The whole thing almost reminds me of the Milgram experiment (and how the fact that the participants couldn’t see the person being hurt was potentially a deciding factor in their willingness to press the button).

Sorry to be on some moral high horse – I’m just being honest. I am not the face of moral perfection; I am human.

David Foster Wallace – “This Is Water”

Daniel