Deep Learning for Crop Yield Prediction in Senegal

As part of the COVID-19: Data for a Resilient Africa initiative with the UN Economic Commission for Africa, the Global Partnership brokered a collaboration between Omdena, one of our technical partners, and stakeholders in Senegal to support the use of AI in addressing data gaps on food security. The Global Partnership contributed to shaping the initiative’s objectives, while Omdena worked with technical teams in Senegal to support the development of data-driven tools.

This resource was originally published here by Omdena.

Problem statement

Senegal has had strong and stable economic growth in recent years. However, more than one-third of the population still lives below the poverty line, and 75 percent of families continue to struggle financially.

Agricultural success is essential to fight poverty and malnutrition, but with 70 percent of the crops in Senegal being rain-fed, the increase of droughts caused by climate change is a threat to essential crops and has a direct impact on the availability and prices of food.

AI can be used to predict crop yields across the country and so help ensure food security: by guiding farmers, by informing the planning of food storage and transport, and by helping policymakers focus on the most vulnerable communities. Yield prediction in developing countries can help prevent famine, support the local economy, and improve sustainable agricultural practices.

Therefore, in this project, we studied a low-cost approach that uses satellite imagery to predict crop yield. The main challenge we faced was the lack of ground truth data, that is, surveys reporting the yields in Senegal.

Our approach

Identify literature about using Deep Learning (DL) to predict crop yield

The first step was to find research papers that could help us start the project efficiently, given that we only had two months to implement a solution. The most useful papers we found and used are: County-Level Soybean Yield Prediction Using Deep CNN-LSTM Model, Deep Gaussian Process for Crop Yield Prediction Based on Remote Sensing Data, and Deep Transfer Learning for Crop Yield Prediction with Remote Sensing Data.

Data

The first question these papers helped us answer was: What data do we need to train a Deep Learning model that predicts crop yield?

The authors used two types of raw data:

Remote sensing data downloaded with Google Earth Engine (GEE)
Ground truth crop yield data: we had yield data collected by IPAR for the production of maize, rice, and millet in 2014

We therefore downloaded the datasets MOD09A1.006 Terra Surface Reflectance 8-Day Global 500m and MYD11A2.006 Aqua Land Surface Temperature and Emissivity 8-Day Global 1km for the regions and departments of Senegal using Shapefiles. The first dataset has 7 bands of surface spectral reflectance that can be used to calculate the Normalized Difference Vegetation Index (NDVI), an indicator of vegetation health. The NDVI compares how much red light (which vegetation absorbs) and near-infrared light (which vegetation strongly reflects) a surface reflects.

Illustration comparing ratios of near infrared and visible light to two different outcomes of trees — *Source: NASA Illustration by Robert Simmon*
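As a minimal sketch of the NDVI calculation described above, assuming the red and near-infrared reflectances are already available as arrays (in MOD09A1 these would be bands 1 and 2), the index can be computed as (NIR - red) / (NIR + red). The function name and the toy values below are ours for illustration and are not project data.

```python
import numpy as np

def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    total = nir + red
    # Leave NDVI at 0 where both bands are 0 (e.g. empty or masked pixels)
    # instead of dividing by zero.
    out = np.zeros_like(total)
    np.divide(nir - red, total, out=out, where=total != 0)
    return out

# Toy reflectances: dense vegetation, sparse vegetation, empty pixel.
red = np.array([0.08, 0.30, 0.0])
nir = np.array([0.50, 0.40, 0.0])
values = ndvi(red, nir)
```

Healthy vegetation reflects much more near-infrared than red light, so its NDVI is close to 1, while bare or sparsely vegetated surfaces give values close to 0.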

The second dataset has two bands: temperature during the day and temperature during the night.

Which Crop Land Cover should we use? What is the Crop Land Cover used for?

The Crop Land Cover dataset is used as a crop cover mask. This means that all pixels of the reflectance and temperature images that are not classified as cropland pixels will be removed from the images so that the model will only be exposed to data from the crops, and not from cities for example.
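The masking step can be sketched as follows. This is an illustration under assumptions rather than the project's code: it supposes that the land cover layer has already been resampled to the same pixel grid as the image, and it uses 40 as the cropland class code, which should be checked against the legend of the land cover dataset actually used.

```python
import numpy as np

# Assumed cropland class code; check the legend of the land cover dataset.
CROPLAND_CLASS = 40

def apply_crop_mask(image, land_cover, cropland_class=CROPLAND_CLASS):
    """Zero out every pixel that is not classified as cropland.

    image:      array of shape (height, width, bands)
    land_cover: array of shape (height, width) holding class codes
    """
    mask = land_cover == cropland_class
    # Broadcast the 2-D mask over the band axis.
    return image * mask[:, :, np.newaxis]

# Toy 2x2 image with 3 bands; only the left column is cropland.
image = np.ones((2, 2, 3))
land_cover = np.array([[40, 50],
                       [40, 20]])
masked = apply_crop_mask(image, land_cover)
```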

In the end, we used a different Land Cover dataset than the papers above. After comparing the MCD12Q1.006 MODIS Land Cover Type Yearly Global 500m that these papers used with the Copernicus Global Land Cover Layers, we decided to use the latter. This decision was taken after comparing the datasets with cropland maps of Senegal:

Cropland map of Senegal — *Locations of crops from the IPAR study (black=millet, blue=rice, purple=rice irrigated, yellow=maize). Source: Omdena*

Here, we see that the crops are mostly located in the South-West and the Northern regions of Senegal. However, in the MCD12Q1.006 MODIS dataset, where cropland is labeled in brown, a lot of the cropland is missing (the Northern crops and most of the rice crops next to the Casamance River). This is not the case with the Copernicus dataset, where cropland is labeled in pink. We therefore concluded that the Copernicus dataset was the more accurate of the two for Senegal.

Copernicus land cover maps of Senegal — *Copernicus Land Cover 2015 VS MCD12Q1.006 MODIS Land Cover 2014 in Senegal . Source: Google Earth Engine*

The only problem with the Copernicus dataset is its time range, 2015–2020, whereas our ground truth yield data is from 2014. However, we assumed that the land cover in 2014 was close enough to that of 2015, and still better than the 2014 MCD12Q1.006 MODIS dataset, so we used the 2015 Copernicus land cover as a crop mask for images from 2014.

At the end of this step, we had collected data for the three GEE datasets presented above, for the entire country, the regions, departments, and GPS locations from the IPAR study.

Three different Copernicus maps — Left to right: MYD11A2.006 Aqua Land Surface Temperature and Emissivity 8-Day Global 1km Senegal, MOD09A1.006 Terra Surface Reflectance 8-Day Global 500m Senegal, Copernicus Global Land Cover Layers Senegal . Source: Google Earth Engine

Preprocessing of the Data

According to the papers cited previously, using 3-D pixel-count histograms instead of raw satellite images to predict yield helps prevent the model from overfitting (that is, fitting too closely to a limited set of data points).

First band of the masked images of reflectance (right) and temperature (left) and their corresponding 32-bin histograms (only showing the first band) for the department of Foundiougne in the Fatick region of Senegal, year 2015. Source: Omdena.
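To make the histogram idea concrete, here is a minimal sketch of how a stack of masked images could be turned into a 3-D array of pixel-count histograms with shape (bins, time steps, bands). The function name, the value range, and the normalization by the number of cropland pixels are our assumptions for illustration; only the use of 32 bins comes from the figure above.

```python
import numpy as np

def image_stack_to_histograms(images, crop_mask, value_range, bins=32):
    """Summarize each (time step, band) image by a histogram of its cropland pixels.

    images:    array of shape (time, height, width, bands)
    crop_mask: boolean array of shape (height, width), True on cropland
    Returns an array of shape (bins, time, bands).
    """
    n_time, _, _, n_bands = images.shape
    hist = np.zeros((bins, n_time, n_bands))
    for t in range(n_time):
        for b in range(n_bands):
            pixels = images[t, :, :, b][crop_mask]  # keep cropland pixels only
            counts, _ = np.histogram(pixels, bins=bins, range=value_range)
            total = counts.sum()
            if total > 0:
                # Normalizing makes regions of different sizes comparable (assumption).
                hist[:, t, b] = counts / total
    return hist

# Toy data: 12 time steps, 8x8 pixels, 2 bands, values in [0, 1).
rng = np.random.default_rng(0)
images = rng.uniform(0.0, 1.0, size=(12, 8, 8, 2))
crop_mask = np.zeros((8, 8), dtype=bool)
crop_mask[2:6, 2:6] = True
hist = image_stack_to_histograms(images, crop_mask, value_range=(0.0, 1.0))
```

The histogram discards the position of each pixel and keeps only the distribution of values, which gives the model far fewer inputs to memorize than the raw images.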

Which weeks of the year do we use?

Crop calendar in Senegal (French) — *Crop calendar of Senegal . Source: FAO*

In order to focus on the growing season, we only used the images from the weeks between planting and harvest, which is, for instance, weeks 19–30 for maize.
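As a small illustration, a range of week numbers can be converted into calendar dates for filtering an image collection. This sketch assumes that the week numbers are ISO calendar weeks, which the crop calendar does not state explicitly, and the function name is hypothetical.

```python
from datetime import date, timedelta

def growing_season_dates(year, first_week, last_week):
    """Return (start, end_exclusive) dates covering the given ISO weeks."""
    start = date.fromisocalendar(year, first_week, 1)  # Monday of the first week
    # Sunday of the last week, plus one day so that the end is exclusive.
    end_exclusive = date.fromisocalendar(year, last_week, 7) + timedelta(days=1)
    return start, end_exclusive

# Maize growing season (weeks 19 to 30) for the 2014 survey year.
start, end_exclusive = growing_season_dates(2014, 19, 30)
```

An exclusive end date is used here because many date-range filters include the start date and exclude the end date.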

Deep Learning Model

We decided to use the Deep Learning architecture from this paper, which is a CNN-LSTM:

“CNN can learn the relevant features from an image at different levels similar to a human brain. An LSTM has the capability of bridging long time lags between inputs over arbitrary time intervals. The use of LSTM improves the efficiency of depicting temporal patterns at various frequencies, which is a desirable feature in the analysis of crop growing cycles with different lengths.”

*Architecture of the CNN-LSTM model used. Source: https://www.mdpi.com/1424-8220/19/20/4363/htm*

We also tried the CNN architecture proposed in this paper, but the results were not satisfactory: the model settled on a single, seemingly arbitrary value for all data points that minimized the loss, and then did not learn or improve afterward, even with different hyperparameters. Since we were getting better results with the CNN-LSTM, we decided to focus only on the latter.

*Example of CNN stuck at ~1.2 T/ha. Source: Omdena*
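A model that outputs the same value everywhere can be recognized by comparing it with a constant baseline. For the mean squared error, the best possible constant prediction is the mean of the observed yields, so a model whose error is no better than this baseline has effectively learned nothing from its inputs. The sketch below illustrates this check with made-up yields; the helper names and the tolerance are hypothetical, and the numbers are not project results.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted yields."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def constant_baseline_mse(y_true):
    """MSE obtained by always predicting the mean yield (the best constant)."""
    y_true = np.asarray(y_true, dtype=float)
    return mse(y_true, np.full_like(y_true, y_true.mean()))

def looks_collapsed(y_pred, tol=1e-3):
    """True if the predictions are (almost) identical for every data point."""
    return bool(np.std(np.asarray(y_pred, dtype=float)) < tol)

# Toy yields in T/ha, for illustration only.
yields = np.array([0.8, 1.0, 1.2, 1.4, 1.6])
baseline = constant_baseline_mse(yields)
stuck_predictions = np.full(5, 1.2)
other_constant_mse = mse(yields, np.full(5, 1.0))
```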

Transfer Learning

We used transfer learning to improve the maize model. We had some yield data from South Sudan and Ethiopia (Source: deep-transfer-learning-crop-prediction), which we used to train the model first; we then fine-tuned it using the yield data from Senegal.
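The two-stage idea can be illustrated in miniature. The sketch below is not the CNN-LSTM: it swaps in a plain linear model trained by gradient descent on synthetic data, purely to show the pattern of pre-training on a larger source dataset and then fine-tuning on a small target dataset, with a smaller learning rate and fewer steps. All names, sizes, and learning rates are assumptions chosen for illustration.

```python
import numpy as np

def train(weights, features, targets, learning_rate, steps):
    """Gradient descent on the mean squared error of a linear model."""
    w = weights.copy()
    for _ in range(steps):
        residual = features @ w - targets
        gradient = 2.0 * features.T @ residual / len(targets)
        w -= learning_rate * gradient
    return w

def loss(weights, features, targets):
    return float(np.mean((features @ weights - targets) ** 2))

rng = np.random.default_rng(0)
true_source = np.array([1.0, -0.5, 0.3])
# The target task is related to the source task but slightly shifted.
true_target = true_source + np.array([0.1, 0.0, -0.1])

x_source = rng.normal(size=(500, 3))  # large source dataset
y_source = x_source @ true_source
x_target = rng.normal(size=(10, 3))   # small target dataset
y_target = x_target @ true_target

# Stage 1: pre-train from scratch on the source data.
pretrained = train(np.zeros(3), x_source, y_source, learning_rate=0.05, steps=500)
# Stage 2: fine-tune on the target data, starting from the pre-trained weights.
fine_tuned = train(pretrained, x_target, y_target, learning_rate=0.01, steps=50)
```

Starting the second stage from the pre-trained weights means that the small target dataset only has to correct the difference between the two tasks, rather than teach the model everything from scratch.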

Data augmentation

We also tried data augmentation on the IPAR dataset by taking sliding windows around each point of origin (lat/lon) and assuming that the yield of the crops in these windows was the same as at the point itself.

Figure 1 shows all the points (lat/lon) from the IPAR dataset for the maize crops, and the other three figures show the sliding windows around one of the points (illustrating the data augmentation explained above). Source: Omdena
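One way to sketch this augmentation is to generate a small grid of window centres around each survey point and to copy the point's yield to every window. The step size, the grid radius, and the coordinates below are hypothetical values chosen for illustration and are not those used in the project.

```python
def sliding_window_samples(lat, lon, yield_t_ha, step_deg=0.005, radius=1):
    """Return (lat, lon, yield) samples on a grid centred on a survey point.

    Every window inherits the yield measured at the original point, which is
    the assumption behind this augmentation.
    """
    samples = []
    for i in range(-radius, radius + 1):
        for j in range(-radius, radius + 1):
            samples.append((lat + i * step_deg, lon + j * step_deg, yield_t_ha))
    return samples

# Hypothetical survey point with a yield of 1.5 T/ha.
samples = sliding_window_samples(14.0, -16.0, 1.5)
```

With a radius of 1, each survey point yields 9 training samples (the original location plus 8 neighbours). The assumption that neighbouring windows share the same yield is only reasonable when the windows stay within similar fields.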

This method did not improve the maize and millet results but did improve the rice model.

Results

We ran several training runs with the different approaches explained previously and collected the resulting metrics:

*Metrics of the prediction for several trainings and crop types (MSE in T/ha). Source: Omdena*

We can see that transfer learning did improve the MSE (mean squared error) of the maize model, which gave us our best MSE. In comparison, the millet model did not do as well as the maize model, but we did not have any other data with which to perform transfer learning. Finally, the rice model could be improved using data from other countries from the same GitHub repository where we found the maize data. Note that the MSE is higher for the rice model because rice yields are higher than maize and millet yields in the first place (up to 14 T/ha).
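The effect of the yield scale on the MSE can be seen with a toy calculation. In the sketch below, both sets of predictions are off by exactly 10 percent, yet the MSE of the higher-yield crop is much larger. Dividing the root of the MSE by the mean yield is one possible way (our suggestion, not a metric reported above) to compare crops on an equal footing. The yields are made up for illustration and are not project results.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between observed and predicted yields."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))

def relative_rmse(y_true, y_pred):
    """Root mean squared error divided by the mean observed yield."""
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(mse(y_true, y_pred)) / y_true.mean())

# Toy yields in T/ha; every prediction overestimates by 10 percent.
maize_true = np.array([1.0, 2.0, 3.0])
rice_true = np.array([4.0, 8.0, 12.0])
maize_pred = 1.1 * maize_true
rice_pred = 1.1 * rice_true

mse_maize = mse(maize_true, maize_pred)
mse_rice = mse(rice_true, rice_pred)
rel_maize = relative_rmse(maize_true, maize_pred)
rel_rice = relative_rmse(rice_true, rice_pred)
```

Here the rice yields are four times the maize yields, so the rice MSE is sixteen times larger even though the relative error is identical.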

Here are some visualizations of the maize results:

*CNN-LSTM results (Maize). Source: Omdena*

We also ran the predictions on every department of Senegal over 4 years (2015–2018) for maize, rice and millet:

*Yield prediction for three different types of crop over four years in each Senegalese department. Source: Omdena*

Final product

We created an interactive notebook where the user can select the region for which they want to predict the yield. The user can also choose the year and crop type. This notebook has several application areas. For instance, it can be used as a tool for policymakers to decide what food to import and export in order to maintain food security in the country. The tool can also help farmers make management and financial decisions.

After the user selects a region on an interactive map, the notebook downloads the images for the region of interest, generates the 3D histograms, and uses them as input to the pre-trained CNN-LSTM model to predict the yield of the selected crop type. Source: Omdena

We also implemented another notebook that will take as input the GPS latitude and longitude instead of a selected area:

Yield prediction with GPS latitude/longitude as input. Source: Omdena

Conclusion

To conclude, in these two months, we were able to implement a Deep Learning model that predicts crop yield in Senegal following this schema:

*Summary of project’s structure. Source: Omdena*

As mentioned in this article, the lack of ground truth data was an issue that made the models less accurate for Senegal than they could be. A straightforward improvement would be to obtain ground truth data like the 2014 IPAR dataset for several years, so that the model could see year-to-year fluctuations, learn from them, and become more robust to variations in the data.