The Human Side of Data Science Can No Longer Be an Afterthought | Opinion

MAR 04, 2022

Data science increasingly shapes our world, from the news we read to who is selected for a job interview to how long someone is sentenced to prison. As the amount of data we collect and analyze grows exponentially, we are beginning to see concerning consequences: algorithms that reinforce social biases, threats to our privacy, the spread of fake news and even the weaponization of disinformation for use in cyberwarfare.

Our approach to data science so far has treated social and ethical concerns as an afterthought. The "move fast and break things" mindset encourages people to tackle technical challenges first and then focus on the human impact and unintended consequences later (if at all). In recent years, tech companies have started hiring ethicists and social scientists, but they are often siloed from the technical teams and their warnings go unheeded. Facebook whistleblower Frances Haugen revealed that the company failed to act on internal research identifying numerous harms caused by its algorithms, from promoting hate speech around the world to increasing body image issues among teens.

Instead, we must integrate human perspectives throughout data science. People touch every step in the data science process, from what data is captured, to how it is categorized, labeled and manipulated, to what it is used to do. No part of data science is value neutral. This is why my colleagues and I are building human-centered data science, a new interdisciplinary field that combines computer science, social science and human-computer interaction.

To make data science more human-centered, we need to train and promote data scientists who are π-shaped. In higher education, we talk about T-shaped scientists who have depth in one field alongside breadth of understanding in several other fields, including social sciences and humanities. This is seen as an improvement over I-shaped people, who have only a very narrow knowledge base.

But being T-shaped is not enough. We need data scientists who are π-shaped, with a deep understanding of both the technical and human sides of their work. Just as a doctor cannot do their job effectively if they don't know how to interact with patients, data scientists must have a thorough understanding of the social and ethical implications of what they do. This doesn't negate the value of having dedicated ethicists and social scientists on a team, but they don't replace the need for data scientists themselves to have a human-centered perspective.

There are many ways π-shaped people can make data science more human-centered. One of my co-authors on the new book Human-Centered Data Science, Shion Guha, has used this approach in his work with a large urban police department to identify and overcome biases that affect crime maps. By combining information science and statistics and looking deeper into the human side of the crime mapping tool, not just the technical components, Guha noticed that the model used an outdated legal definition of what constitutes sexual assault. This produced inaccurate maps of sexual offenses in the city, which influenced how officers responded to sexual assault complaints and how complaints were recorded in police databases. With the error identified, the police were able to get a more accurate picture of sexual assault in their city and respond accordingly.

Detail of a portable computer unit showing "Power ISR (Intelligence, Surveillance and Reconnaissance)" technology. IN PICTURES LTD./CORBIS VIA GETTY IMAGES

While some critics portray the humanistic side as soft or imprecise, Guha's work—along with that of many others—shows that these perspectives actually make data science more rigorous and accurate. The idea that a human perspective is the opposite of technical rigor is a false dichotomy (something I'll be talking about more as the keynote speaker at the Women in Data Science (WiDS) Worldwide Conference on March 7). In fact, human-centeredness strengthens our capacity to accurately represent the world around us through data.

Human-centeredness can make data science more rigorous by bringing users and other stakeholders into the process of designing technical tools. Understanding how users think can help us find innovative approaches to presenting data in ways people can more easily understand and use. At the Human Centered Data Science Lab I lead at the University of Washington, we launched the Traffigram project to create more intuitive maps. Rather than showing you how far away a place is in miles, these maps show how long it will take to get there given your location, current traffic and public transit options.

Human-centered data science often integrates quantitative and qualitative methods, drawing on both computer science and social science approaches. At my lab, we've developed several tools to help social scientists analyze qualitative data like text chats and social media posts. Traditional methods are too time-intensive for organizing and analyzing large amounts of text, but our applications speed up the process using visualization, one of the most effective ways for humans to absorb large amounts of information. These hybrid tools for analyzing conversations have been combined with insights from psychology to study, through chat logs, how people collaborate and interact when working remotely.

We urgently need to reimagine what being a good data scientist means and recognize that it's not just about technical skills. Every company that uses data science, which now includes almost all of them, needs to prioritize hiring, promoting and supporting π-shaped people with training in both technical and social science fields, all the way up to the C-suite. Higher education institutions need to incorporate ethical perspectives and social science training throughout their data science and artificial intelligence curricula. It is increasingly irresponsible, to both students and society at large, not to train the next generation of data scientists to have a nuanced understanding of how their algorithms can impact society.

As data science exerts an ever-increasing influence on our lives, the societal consequences will only become more complicated and the stakes higher. Pressure from the public and policymakers to address issues like algorithmic bias and misinformation will continue to grow. Any forward-thinking institution that wants to be competitive in five to 10 years needs to make human-centered data science a priority now.

Dr. Cecilia Aragon is a professor in the College of Engineering at the University of Washington and director of the university's Human-Centered Data Science Lab. She will deliver the keynote address, "The Rigorous and Human Life of Data," at the Women in Data Science (WiDS) Worldwide Conference happening March 7 at Stanford University and online. Her latest book is Human-Centered Data Science, from MIT Press.

The views expressed in this article are the writer's own.