How data is adding to the unfolding crisis in Afghanistan

This post originally appeared as part of the Data Values Digest, a weekly update with thought-provoking reflections on current events. Subscribe to it here.

Like you, I’ve seen the haunting scenes at the Kabul airport and heard the warnings from Afghan human rights activists, politicians, tech advocates, and journalists. Amidst the chaos, the Taliban has seized U.S. military biometric devices, and many Afghans are scrambling to erase their digital footprints. It’s clear that data pose an enormous threat to personal safety in Afghanistan under the Taliban.

As development agencies and others rush to reduce the potential damage of their data footprint in Afghanistan, the global data community needs to heed this wake-up call and finally confront the tradeoffs and risks inherent in collecting, storing, and sharing data in sensitive contexts. These events should lead us to question our assumptions about what data should be collected and stored in the first place and what protocols should exist to govern and protect those data, including against worst-case scenarios of breach or seizure.

This shouldn’t be a surprise: The U.S. government began scrubbing personally identifying information from websites and social media accounts on Wednesday but clearly had not given this sufficient priority in advance. Yet the risk was foreseeable: it is not even the first time this year that data collected for ostensibly benevolent purposes have fallen into the hands of regimes with the power to harm vulnerable people.

Disaggregated data in the wrong hands can exacerbate harm: Last month, Taliban officials ordered religious leaders to turn over lists of girls older than 15 and widows under 45 for marriage. This is exactly the kind of worst-case scenario we need to guard against, because disaggregating data by age and sex is best practice in monitoring sustainable development.

Risk is contextual. In a large city, anonymizing data may be sufficient to protect individual privacy, but in a 200-person village even aggregated data may be enough to put targeted groups at risk. In vulnerable contexts, our aim should be both to preserve individual privacy and to protect against group-level harms, yet most conversations on data disaggregation assume stable trajectories, good actors, and benevolent use cases. What should we do differently when these are not safe assumptions?

Data can be a double-edged sword: Afghans who applied for jobs or worked for the U.S. government may be eligible for resettlement abroad, but that same employment data could put them at risk of reprisal from the Taliban. Conversations about personal data rights often assume that people possess a level of agency and that the mechanics of gathering consent are sufficient. It’s not enough to consider data rights alone when the mere existence of data can put people in a vulnerable position. As a result, development agencies and NGOs need to exercise more discretion before collecting data in the first place. (On this point, I highly recommend Zara Rahman’s recent article in The New Humanitarian, “The UN’s refugee data shame.”)

Ultimately, this raises questions about what we value in data for development: Do aid agencies care more about demonstrating impact or about preserving the safety of vulnerable program participants? What trade-offs are we accepting when collecting and storing information in vulnerable contexts? That these stories involve so many types of data prompts a further question: Who should be responsible, and held accountable, for protecting people’s information and safety?

While there is no easy way to unring the bell of unsafe data practices in Afghanistan, one purpose of the Data Values Project is to prompt these discussions and move the community forward to help prevent future data disasters.

Until next week,

Josh