Building a predictive model of deforestation to inform conservation

Abdoul Diallo has a graduate degree in Mathematics with a specialization in big data from the African Institute for Mathematical Sciences. As part of the fellowship program, he was embedded at the Agricultural and Rural Prospective Initiative (IPAR), in partnership with the National Agency for Statistics and Demography (ANSD) and the Planning and Environmental Watch Department (DPVE) of the Ministry of Environment and Sustainable Development, Senegal.
Illustration by Maryjane Uzodinma/Global Partnership for Sustainable Development Data.

What drew you to taking part in this fellowship program and how have you found the overall experience?

What attracted me most about this program is the fact that it focuses on using data for social good, which is something I am passionate about. My goal is to strengthen my data science abilities and gain more experience in solving problems related to machine learning. As a recent graduate, I saw this as a unique opportunity to apply my skills towards data-driven solutions in the professional world.

Can you give an overview of the project and its objectives?

The goal of this project was to analyze the current rate of deforestation in the country as a consequence of agricultural activities and urbanization in order to predict how it will progress, and thus inform national policies aimed at curbing desertification, deforestation, and forest and soil degradation. I had two main tasks to perform: (1) to analyze how deforestation is related to agriculture and urbanization, and (2) to develop an ML model to predict forest cover change in a given agro-ecological zone.

How can this data be used to support decision making for sustainable development?

The loss of trees and other vegetation can result in climate change, desertification, soil erosion, flooding, and loss of habitats for up to 70% of plants and animals. Forests also contribute to the livelihoods of low-income communities. The role of predictive modeling is to facilitate decision-making by allowing authorities to easily detect areas highly threatened by deforestation, so that they can take quick and effective action against its causes in those areas. The model can also identify areas at risk of biodiversity loss and potential future carbon dioxide emissions. The data produced are intended for IPAR and for partner institutions like DPVE to use in their advocacy and policy work. These data will also be publicly available for use by anyone working on the issue of deforestation, particularly on modeling its extent. This means the public will also be empowered and informed, which is important so that citizens are able to act or demand the necessary government action.

Which aspects of data did you work with and which data science techniques did you apply?

I worked with three key datasets. The first was an urbanization dataset that contains the rates of urbanization in each of Senegal’s 14 regions over a period of 48 years, from 1970 to 2018. This dataset was taken from the Open Data for Africa website. The second covered the areas of deforestation in each of Senegal’s 14 regions, from 2000 to 2020. This is satellite data taken from the Earth Map website. The third contained the total areas of agricultural land in Senegal, from 2000 to 2020. This dataset came from IPAR’s own analyses. For data cleaning and exploration, I used the Python packages pandas, NumPy and Matplotlib to clean up and consolidate the datasets into a single dataset. At this stage, I used all the variables in each dataset to get a better understanding of the data. For the modeling part, I used the ensemble methods of the scikit-learn package, especially the random forest algorithm. At this stage, I used only the variables that are most correlated with the target variable (risk of deforestation). To deploy the models, I used the Streamlit framework, which allows you to easily deploy your model and share it with others.
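As a rough illustration of this kind of workflow, here is a minimal sketch in Python. It is not the project's actual code: the data are synthetic, the column names (`urbanization_rate`, `agricultural_area_ha`, `deforested_area_ha`), the correlation threshold of 0.3, and the choice of a regressor rather than a classifier are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-ins for the three sources (all values are made up).
regions = [f"region_{i}" for i in range(14)]
years = range(2000, 2021)
index = pd.MultiIndex.from_product([regions, years], names=["region", "year"])
keys = index.to_frame(index=False)

urban = keys.assign(urbanization_rate=rng.uniform(20, 80, len(keys)))
agri = keys.assign(agricultural_area_ha=rng.uniform(1e4, 1e5, len(keys)))
forest = keys.assign(
    deforested_area_ha=0.02 * agri["agricultural_area_ha"]
    + 20 * urban["urbanization_rate"]
    + rng.normal(0, 100, len(keys))
)

# Consolidate the three tables into one, keyed by region and year.
data = urban.merge(agri, on=["region", "year"]).merge(forest, on=["region", "year"])

# Keep only the numeric variables whose absolute correlation with the
# target exceeds a (hypothetical) threshold.
target = "deforested_area_ha"
correlations = data.drop(columns=["region"]).corr()[target].drop(target)
selected = correlations[correlations.abs() > 0.3].index.tolist()

# Fit a random forest on the selected variables and score it on held-out rows.
X_train, X_test, y_train, y_test = train_test_split(
    data[selected], data[target], test_size=0.25, random_state=0
)
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)  # coefficient of determination on test data
```

A fitted model like this could then be wrapped in a small Streamlit script for sharing. That step is left out here because it depends on how the app is hosted.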

What are some of the challenges you encountered during the project, and how did you overcome them?

The main challenge I faced during this project was getting access to secured data, since it required me to submit a form and meet certain conditions, which was time-consuming. However, I understand that this is inevitable as part of ethical data handling and sharing. I also secured the outputs I generated from this data before submitting them to IPAR and ANSD. Some of the data came from public data sites, so I was able to work with it while waiting for approval to access the secured data. The other challenge was labeling, because the available data were not labeled. In machine learning, labeled data means that each record is annotated with the target, which is the answer you want your machine learning model to predict. It was necessary for me to label the data before I started the modeling process. This is also a highly time-consuming but necessary step.
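To make the idea of labeling concrete, here is a small sketch in Python. The labeling rule is purely hypothetical and is not the one used in the project: it marks a zone-year as high loss when forest cover fell by at least 2% compared with the previous year. The zone names, the figures, and the `LOSS_THRESHOLD_PCT` value are invented for the example.

```python
import pandas as pd

# Made-up forest cover figures for two hypothetical zones.
cover = pd.DataFrame({
    "zone": ["A", "A", "A", "B", "B", "B"],
    "year": [2018, 2019, 2020] * 2,
    "forest_cover_ha": [1000.0, 990.0, 930.0, 500.0, 499.0, 498.0],
})

# Year-over-year percentage change in cover, computed within each zone.
cover = cover.sort_values(["zone", "year"])
cover["pct_change"] = cover.groupby("zone")["forest_cover_ha"].pct_change() * 100

# Hypothetical rule: a drop of 2% or more counts as high loss (label 1).
LOSS_THRESHOLD_PCT = -2.0
labeled = cover.dropna(subset=["pct_change"]).copy()
labeled["high_loss"] = (labeled["pct_change"] <= LOSS_THRESHOLD_PCT).astype(int)
```

The first year of each zone has no previous year to compare against, so it receives no label and is dropped. The resulting `high_loss` column is the kind of target a model can then be trained to predict.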

What did the capacity transfer process entail and how have you ensured sustainability of this project after your fellowship ends?

Regarding capacity transfer, it was a ‘Trainer of Trainers’ approach where I mentored one colleague, Ndèye Fatou Mboup, who is leading the data science team at IPAR. I explained certain concepts of ML related to the use of data and the establishment and deployment of a model. Apart from that, I led a workshop with Ndèye Fatou at IPAR on AI technologies. Fourteen people participated in this workshop, including eight men and six women. This workshop built the foundation for the team at IPAR to pursue further work using AI, as well as to sustain the functionality of the model I developed. This is helped by the fact that the team is managed by Cheikh Faye and Ndèye Fatou, who are both skilled in data science, having taken part in the training component that preceded my fellowship.