Last fall, I entered a competition called Visualize 2030, hosted by Google Cloud in collaboration with the World Bank, the UN, and the Global Partnership for Sustainable Development Data. They called it a data storytelling contest, and the objective was to create a data-driven narrative based on the 2030 Agenda.
The 2030 Agenda is an initiative launched at a United Nations summit in 2015. It defines a framework of what the UN calls Sustainable Development Goals (SDGs), intended to make the world a better place by 2030. The goals cover a wide range of social and environmental improvements, with aims that include ending poverty, improving people's lives, and protecting the environment.
The narrative was supposed to provide insight into how two or more of the SDGs influence each other. The contest required competitors to use two tools: public data sets from the UN Statistics Division and the World Bank for the story's content, and Google Data Studio as the platform for the story. Beyond that, there were no requirements for, or even suggestions about, length, format, organization, or content. It was wide open.
From the start, I was very interested in SDG 5, Gender Equality. The more research I did on gender equality as it relates to sustainable development, the more intrigued I became by the idea that gender equality can't exist as an abstract concept: it must be embodied in our core social structures to have an impact on the everyday lives of women. So I decided to explore other development goals that also rely on these structures, to see whether there was a relationship. In particular, I suspected that gains in health, education, and employment (SDGs 3, 4, and 8) might dramatically improve gender equality, and that gains in gender equality might in turn drive improvements in those areas. Out of these ideas came my project, which I called "The Building Blocks of Gender Equality," and out of my project came some useful lessons.
1. How to Wrangle Big Data
I chose to work with the UN Sustainable Development Indicators (SDIs) data set, which included data for 100 different SDIs. SDIs are measurable statistics used to gauge progress toward the Sustainable Development Goals. The data set contained over 1 million records, gathered from over 300 countries and regions. Much of the data was multidimensional and organized in complex structures.
To tame this huge data set, I used the Explorer tool included in Google Data Studio. It's an early-stage "Labs" feature designed for data exploration, allowing quick, on-the-fly manipulation and visualization. I thought of it as a kind of digital sketchbook for my ideas about the data. Perhaps its best feature is that once you are ready to commit, you can export the results of your explorations to a Data Studio report, so you don't have to recreate anything or duplicate effort.
2. How to Create Interactive Data Viz
D3.js, used by the New York Times, The Pudding, and many other high-quality data visualization outlets, is the state of the art for interactive data viz. However, by most accounts, D3 has a very steep learning curve; even people with a strong computer science background say it is a challenging skill to acquire. By contrast, Google Data Studio, which was originally intended for people in marketing, is very accessible. It's quick to learn and quick to use, and it requires no coding and no special hardware. It allowed me to produce interactive visualizations of my ideas in a very short time.
The important takeaway from these first two lessons is not "Hey, look at these cool new tools!" but rather the democratization of data literacy. I am investing a lot of resources into becoming a professional data scientist: I have learned Python and R, I plan to learn D3, and I intend to use these tools in my career. As much fun as it was to learn Data Studio, I don't see it changing how I will practice data science. However, I think Data Studio, and other tools like it, may change how the world does data science. By the time my 4-year-old son and his friends are running things, I believe basic data manipulation and interpretation will be a necessary life skill, like writing and arithmetic. Easy, accessible tools like Data Studio will help make that transformation possible.
3. The Potential of Feminist Data Science
Feminist data science is not a term I've heard used much (or really at all). However, because I have been studying data science in the midst of the #MeToo movement, surrounded by tech-industry debates ranging from toxic masculinity in Silicon Valley to algorithmic fairness, it's something I've thought about a lot.
These are the questions I’ve been asking myself:
- How can we use data science to quantify and communicate inequalities?
- How can we use data science to replicate successes in diversity and inclusion?
- How can we promote social good through our practice of data science?
I don't know the answers to these questions yet, but a long career in a male-dominated industry has taught me that the playing field often truly appears level to those standing on the high side of it. It's a kind of optical illusion. I suspect that, for these people, numbers that demonstrate inequality are more persuasive than accounts of how inequality feels. This applies not only to gender issues but to all kinds of socioeconomic issues.
As data scientists, we are uniquely qualified to engage in this kind of persuasion, if we choose to. I did not enter this field with any particular ambition to use my new skills to help others, but now I am beginning to see how possible, and how powerful, that could be.
4. To Observe Is To Be Changed
Most people have probably heard of the Observer Effect, the scientific principle that the act of observing influences the phenomenon being observed. But I think there is another observer effect that is particularly relevant for data scientists: the act of observing changes the observer.
I learned an enormous amount about the world through working on the analysis for this project. I already had a strong sense that being a woman in Scandinavia is a lot easier than being a woman in Africa. But there is so much more to that story.
I learned heartbreaking things about countries like Eswatini and Lesotho, tiny sub-Saharan African nations where women routinely face unimaginable hardship and danger. I learned that in Yemen, less than 5% of girls are in school, and in the State of Palestine, more than 40% of women are unemployed.
But I also learned about Cuba, where, in spite of extremely high poverty, virtually every girl learns to read. I learned about Bolivia, where female representation in parliament has increased 460% since 2000, to over 50% in 2018. Perhaps the most extraordinary country I learned about is Rwanda, whose leaders came together after the horrific 1994 genocide to face a hard fact: with so many of the country's young men dead, empowering women was critical to its survival. This led to a 30% quota for female representation in parliament. Today, women make up over 60% of the Rwandan parliament, the largest proportion in the world.
I travel, I read the news, and I like to think I know what is happening in the world, but this project showed me how much I didn't know. It's a privilege, if you think about it, to have access to so much information about people and their lives, and as data scientists we need to recognize that privilege. The data are just numbers, but the insight we extract from them can make us more thoughtful, more compassionate, and more ethical people, and I think we should welcome the opportunity to change in that way.
5. Just Try!
I did not enter the Visualize 2030 competition thinking there was any possibility I could win. I had no prior experience with the data. I had no prior experience with the Data Studio platform. I had no special knowledge of the global development domain. I had never done a data viz project. I didn't think my idea was innovative enough. Worst of all, the contest began on July 24, but I didn't learn about it until November 9, and submissions were due on November 16. I literally had one week to work on it.
So the odds really did not seem to be in my favor. But I thought it would be interesting and a good learning experience, so I did it anyway. About six weeks after submitting, I learned that I was one of five winners! Google gave me some lovely exposure, including the chance to present my project at the World Economic Forum in Davos in January, which brought lots of accolades. More important than that was the validation that I could do something good, something meaningful, in this space. Even more fundamental was the reinforcement of an idea so important that it underpins every other lesson I've written about here, and many more: you have to just try.
As a lifelong perfectionist, I have always found it hard to risk being less than perfect. As someone who is well established professionally, I found it scary to contemplate starting over as a new data scientist. But when I decided to apply to graduate school just over a year ago, I made myself a promise: I would measure my own success by the quality of the effort I put into the experience. If I hadn't won this competition, that would have been fine; I still would have walked away having learned the things I've written about here. The critical thing is to put our fears aside and just try. It is amazing what we can accomplish when we do.