How data can map and make racial inequality more visible (if done responsibly)

This article was originally published on Medium by The GovLab on June 8, 2021. For any reactions, concerns, suggestions, and recommendations: contact Stefaan G. Verhulst, Co-Founder of The GovLab at sverhulst @ thegovlab.org.

The piece is supplemented by a crowdsourced listing of Data-Driven Efforts to Address Racial Inequality.

NOTES

The GovLab developed this living reflection document with diverse input from our network to help identify the opportunities, risks, challenges, and lessons about the use of data to make racial inequalities more visible and the ways they may be systematically and collaboratively countered.
The document also serves as our contribution to New York City’s Racial Inclusion & Equity Task Force Data/Research Subcommittee. We hope that it can provide value to the Subcommittee’s deliberations and agenda setting.
We share this document not as a finalized list of recommended priorities or practices but as a tool for deliberation on and assessment of data’s role in racial justice.
We have additionally assembled a list of data-driven organizations working on racial inequality-related issues here.

Introduction

Racism is a systemic issue that pervades every aspect of life in the United States and around the world. In recent months, its corrosive influence, especially on Black people, has been made starkly visible. Many people are hurting. Their rage and suffering stem from centuries of exclusion and from being subject to repeated bias and violence. Across the country, there have been protests decrying racial injustice. Activists have called upon the government to condemn bigotry and racism, to act against injustice, and to address systemic and growing inequality.

Institutions need to take meaningful action to address such demands. Though racism is not experienced in the same way by all communities of color, policymakers must respond to the anxieties and apprehensions of Black people as well as those of communities of color more generally. This work will require institutions and individuals to reflect on how they may be complicit in perpetuating structural and systemic inequalities and harm, and to ask better questions about the inequities that exist in society. These inequities have been laid bare both in recent acts of violence and in racial disparities in health outcomes during the ongoing COVID-19 crisis. This work is necessary but unlikely to be easy. As Rashida Richardson, Director of Policy Research at the AI Now Institute at NYU, notes:

“Social and political stratifications also persist and worsen because they are embedded into our social and legal systems and structures. Thus, it is difficult for most people to see and understand how bias and inequalities have been automated or operationalized over time.”

We believe progress can be made, at least in part, through responsible data access and analysis, including increased availability of (disaggregated) data through data collaboration. Of course, data is only one part of the overall picture, and we make no claims that data alone can solve such deeply entrenched problems. Nonetheless, data can have an impact by making inequalities resulting from racism more quantifiable and inaction less excusable.

In seeking to reflect upon ways that data can make a difference, The GovLab used a rapid-research methodology to compile the list of topic areas below. In these areas, data and data analysis could help illustrate where racial inequality exists in the United States and support evidence-based efforts to promote equity. Given recent events, the list focuses mainly on how racism harms Black communities.

Resulting projects might use data to improve existing policies, identifying those that are reductive or unable to address systemic failures. Those committed to anti-racism can use data in several ways:

- to improve situational awareness of a problem;
- to identify causes and effects in racist incidents;
- to predict outcomes or assess policy impact.

In doing so, they can develop better solutions to the challenges Black communities face every day.

Needless to say, this rapid topic map is simply a scan of the issues and a basic overview of the situation. It is far from comprehensive. We realize racism is an enormous, odious, and deeply entrenched problem that has persisted in the United States since its founding. We also recognize that many in The GovLab operate from a position of power and privilege. That position requires us to listen to those who do not share it and to amplify their views. Both communities of color and white allies can take action to advance racial justice.

Prioritizing any of these topics will also require increased community engagement and participatory agenda setting. Likewise, we are deeply conscious that data can have a negative as well as positive impact and that technology can perpetuate racism when designed and implemented without the input and participation of minority communities and organizations. While our report here focuses on the promise of data, we need to remain aware of the potential to weaponize data against vulnerable and already disenfranchised communities. In addition, (hidden) biases in data collected and used in AI algorithms, as well as in a host of other areas across the data life cycle, will only exacerbate racial inequalities if not addressed.

Topics where data could make racial injustices more visible, advance understanding of their depth and causes, and reveal ways through which they can be systematically addressed

With these caveats in mind, we find that the following areas might benefit most from improved use of data to develop racially equitable solutions and approaches:

Criminal Justice Inequalities

Resource Misallocation: Communities of color have less financial and institutional support for justice-related activities than their white counterparts. Fewer crimes are solved. Victims of crime receive less support. There are fewer programs that offer alternatives to incarceration. These inequalities are not new. They are the result of institutional divestment and policy choices driven by multiple factors, including racialized cultural pathologies. Data offers no easy solution to a problem driven by bad actors. However, it can give activists and the media tools to expose those who facilitate racism. It can also provide an evidentiary basis for groups to demand change in their cities, states, and country. Indeed, the Washington Post has compiled an original database of homicide arrest data from the 50 largest cities in the United States. The database demonstrates how little many cities invest in solving homicides with minority victims and the consequences of those decisions.
Mass Incarceration and Criminalization: Countless studies indicate that Black people suffer from unequal arrest rates, plea deals, and sentencing, alongside other forms of discrimination in the criminal justice system. However, the United States still lacks a comprehensive national racial demography of arrests and criminal records. It also lacks nationwide data on the basic nature of the prison experience. This gap makes it difficult to evaluate the success of programs intended to address mass incarceration or poor prison conditions. In 2014, the Manhattan District Attorney’s Office worked with the Vera Institute and identified significant racial disparities in which defendants were more likely to be prosecuted. As part of the cooperation, the office agreed to pursue strategies that would reduce racial and ethnic inequalities.
Police Violence: In recent years, the press and everyday people have recorded images of police using excessive force, often against Black persons. Yet, there is no official, reliable collection of civilian deaths and injuries caused by law enforcement. A partial database released by the FBI is considered by experts to be misleading. This lack of information makes it difficult for the public to exercise oversight over police and understand the full scope of police violence. Only with independent databases such as Fatal Encounters, Mapping Police Violence, and The Washington Post’s Fatal Force project have these episodes begun to be counted in a systematic fashion.

Economic Inequalities

Income Inequality: There are major racial disparities in family wealth resulting from the legacies of slavery and modern-day segregation, redlining, and other forms of discrimination. Many people know of these policies, which deprive families of color of equality of opportunity, but a lack of data can hide or exacerbate their effects. Long and persistent undercounting of certain Black populations has led to bias and inaccuracies in how government funding is distributed, limiting the resources provided to communities of color. Efforts such as the Black Census Project and Data for Black Lives have attempted to improve data collection and address these inequalities.
Educational and Training Achievement Gap: Systemic oppression has also led to significant gaps in educational outcomes. School districts serving majority-minority populations receive significantly less funding than their majority-white counterparts, and Black and Hispanic students receive lower test scores. Race is accepted as a defining factor in who graduates from schools in the United States, yet data on how it manifests remains incomplete. Some studies have attempted to use available data in innovative ways to address this issue. For example, the Center for the Analysis of Postsecondary Readiness used data analytics to create a multiple-measure placement algorithm that resulted in higher course completion levels for marginalized students.
Access to Infrastructure: People of color, including Black workers, are less likely to own cars or to have access to other means of transportation. In some metropolitan areas, research shows that nearly half of Americans without internet access are people of color. Using data, public institutions can better identify where these areas are and deploy resources to address long-perpetuated inequalities in access to critical infrastructure. The nonprofit EducationSuperHighway uses publicly available information from the federal government to publish information about available bandwidth in schools and identify districts they can support.

Inequalities in Health

Reduced Quality of Care: In a 2005 report, the National Academy of Medicine noted that Black patients were less likely than white patients to be given appropriate care for certain conditions due to implicit and explicit bias. Such disparities contribute to the fact that Black women are three to four times more likely than white women to die in childbirth. A related concern is the increasing role of algorithms and the possibility that biases in them might undermine the care of minority patients. In 2019, a study published in Science reviewed a commercial healthcare algorithm used by doctors to recommend treatment. The study found that the algorithm exhibited significant racial bias, failing to recognize the complex health needs of Black patients relative to white patients. The study’s authors reported the findings to the company responsible. They are working with it, without compensation, to improve the algorithm.
Exposure to Environmental Contaminants: High exposure to particulate matter, unclean water, and other pollutants can have serious health consequences, increasing the incidence of cancer, low birth weights, high blood pressure, asthma, and other health conditions. There are numerous anecdotal reports of increased health threats in communities of color, but these stories are often ignored by policymakers until they reach a crisis point. Data can allow residents to prove their case and seek restitution. In the City of Zanesville, Ohio, Black residents used open data assets to demonstrate that African American homes were connected to contaminated water sources while white homes suffered none of the same issues. The analysis contributed to the residents’ victory in a court case and a $10.9 million settlement.
Distrust and Historical Trauma: One consequence of a legacy of discrimination, exploitation, and mistreatment is that many Black people do not trust the public health (and other) establishments. This distrust often leads people not to seek out the help and services they need. New data-driven initiatives are required to understand the impact of distrust on behavior and outcomes. They are also needed to learn how to redesign public health services to increase trust and access.
Mental Health: Significant racial disparities exist in mental health care, stemming from differences in access, quality, and cost between white patients and patients of color. Data could help address these gaps, whether through the creation of new services or better identification of needs. In one recent study for the AMA Journal of Ethics, researchers explored whether data-driven artificial intelligence could help mental health care practitioners better identify those in need of support.

Social Justice and Rights Inequalities

Access to Housing: Inequities in housing between white people and people of color are perpetuated through both laws on land use and more informal systems of discrimination. Questions remain regarding the optimal policy and social responses to these formal and informal barriers to equitable access to housing. Some organizations are experimenting with data-driven ways to understand and visualize phenomena like gentrification and segregation — including the MIT Media Lab’s Atlas of Inequality and those initiatives from Los Angeles and other cities compiled by Harvard.
Hate Crimes and Hate Speech: People of color, especially Black persons, are at greater risk of being targeted by hate crimes than non-Hispanic whites. The FBI’s 2018 Hate Crime Statistics report notes that 59.6% of reported violent hate crimes were motivated by race or ethnicity. These figures suggest a significant problem. However, sources such as the FBI are notorious for undercounting and underrepresenting hate crimes. They also omit many instances of hate speech, which anecdotal reporting indicates is pervasive in online and offline settings. Large social media and technology companies are experimenting with AI-based detection systems to identify and remove hate speech on their platforms, though many such methods remain unproven.
Pandemic Surveillance and Privacy Rights: Community and advocacy groups are seeking ways to use data to reduce the impact of COVID-19 in communities of color. They also want to keep that data from being weaponized through unchecked surveillance and inadequate data access controls. However, there is little transparency about how institutions are using the data, which means communities have little input into what constitutes inappropriate use. Revealing these relationships could expose racial biases, as has become evident in uses of AI and facial recognition technologies. It could also help develop toolkits to guard against tech-enabled racial inequities.
Voting Rights and Representation: Many members of minority communities face barriers to participating in the political process due to photo ID laws, literacy tests, and other requirements. Groups such as the Black Census Project and Data for Black Lives are pursuing data-driven strategies to improve data collection and avoid the under-representation of marginalized communities. Organizations such as the Brennan Center have also used data to reveal the flaws in state attempts to purge their voter rolls and the inequalities these purges produce.

Moving Forward

The above topics point to major manifestations of bias and injustice against Black people in the United States. The existence of racism in many of these areas may not be news. Still, many of these problems can be addressed with policies informed by data or data analysis. With increased access to data, it is possible to advance understanding of the depth and causes of these inequalities and identify ways through which they can be systematically addressed. Data and analysis can also help decision-makers identify their own biases and prejudices and understand how these have reproduced inequities.

In this conclusion, we summarize some of these avenues. We describe how four levers can help policymakers and others chip away at the entrenched racism and bias evident in our society:

- new data methods;
- increased access to data;
- improved data responsibility;
- better hiring decisions.

Revealing Hidden Inequalities: Sometimes racism is starkly apparent, but often it is more subtle and insidious. Data analysis can help policymakers make patterns and trends visible and take steps toward addressing them. Recent uses of data in metropolitan areas show that, despite efforts to promote integration and combat discrimination, many US cities remain deeply segregated. Experts such as Dayna Bowen Matthew at the Brookings Institution have sought to identify factors that contribute to this fact and recommend policies to address them.
Making the Data Life Cycle Less Racist: As described, finding hidden patterns within our society and economy is one important step toward addressing racism. Increasingly, though, policymakers also need to search for patterns of racism within the data itself. Issues of algorithmic bias are often discussed in the context of a growing reliance on artificial intelligence. Yet bias may exist across the data life cycle, from collection to analysis to reporting and dissemination. As our discussion of healthcare and sentencing inequities above suggests, such problems are indeed prevalent. We need an end-to-end data life cycle approach to ensure that data is used responsibly and ethically and does not exclude any part of the public. Already, Actionable Intelligence for Social Policy at the University of Pennsylvania has published a toolkit to help policymakers center racial equity in their use of data.
Outreach to Disenfranchised and Excluded Communities: Just as data can be used to identify the gaps that indicate racism, minority and historically excluded communities can also use it more constructively. In other words, we can use data to address patterns of racism, discrimination, and exclusion. One example is participatory budgeting, which engages marginalized groups and allows them to be directly involved in policymaking. Researchers can solicit the public’s input on the questions, issues, and indicators they care about, as The GovLab’s Data Assembly project does. This helps ensure that data use reflects public concerns, including the concerns of vulnerable and marginalized individuals. In the United Kingdom, the Citizens’ Biometric Council has centered the perspectives of those traditionally marginalized in its discussions on biometric technology governance. Its aim is to ensure the technology does not perpetuate and amplify existing injustices.
Increasing Access to Disaggregated Data: As decision-making becomes increasingly data-led, so equality of access to data becomes a core issue. Policymakers need to take steps to ensure that minority groups have access to data sets and their resulting insights, for example by expanding the types of information available on open data platforms and ensuring police departments comply with requests for data. Importantly, in order for this access to be meaningful, minority groups also need training and skill-building from which they are often excluded.
Trusted Intermediary: Many data projects suffer from a lack of clarity about which entity is positioned to act on data-driven insights. Without a clear demand, valuable or transformative insights can go unused. Communities of color might also distrust actors that could represent the demand for data (e.g., police departments and the data used in over-policing). An independent body could be empowered to help steer the responsible and participatory use of data to help address issues of race. As a first step, stakeholders could create an international body to build an evidence base and governance model for such a trusted intermediary. It could potentially be modeled on Data2X, the gender data institution housed at the UN Foundation.
Hiring Decisions: Finally, while not strictly a data solution, it is important to emphasize the key role that hiring decisions for data (and related) roles can play in addressing racism and prejudice. Placing minority and disenfranchised candidates in data roles can have a dramatic effect on reducing the extent of racism and discrimination embedded in datasets.
Supporting Organizations Leading this Work: Many of the issues discussed in this piece are the focus of various organizations. The GovLab, seeking to amplify and generate support for those voices, has compiled a listing of some of those organizations here.

The first iteration of this piece was developed by The GovLab at New York University Tandon School of Engineering with contributions from: Stefaan Verhulst, Andrew J. Zahuranec, Andrew Young, Danuta Egle, Mary Ann Badavi, Nadiya Safonova, Rashida Richardson, Beth Simone Noveck, Charlton McIlwain, Mona Sloane, Juliet McMurren, and Amen Ra Mashariki.