Alpha Soko holds a Doctor of Philosophy in Mathematics (Computing option) from the Pan African University Institute for Basic Sciences, Technology and Innovation. He is also an alumnus of the African Institute for Mathematical Sciences Tanzania. As part of the fellowship, Alpha was embedded at the Malawi Ministry of Gender, Children, Disability and Social Welfare. Illustration by Maryjane Uzodinma/Global Partnership for Sustainable Development Data.


What drew you to taking part in this fellowship program, and how would you sum up your experience?

I was drawn to the fellowship mainly because it was an opportunity to apply my academic training and my experience as an adjunct lecturer and secondary school mathematics teacher to building capacity at a higher level, this time with professionals as my students. I am also highly interested in applying data science to assess and address global problems; my PhD research, for example, focused on water pollution. Although my dissertation centers on a single topic, my broader research goal is to develop improved computational models and methods for a wide range of sustainable development indicators. Knowing that government ministries play a leading role in policy development, monitoring, and evaluation, I was excited to contribute to this cause for my country by working with the Ministry of Gender, Children, Disability and Social Welfare. I would sum it up as a truly unique learning experience from many angles: from working with large volumes of descriptive data to training on data science tools, there were a lot of firsts. What I will remember most is what I learned about the role of data for development in advancing equity and inclusion, specifically with regard to women and children.

What were the objectives of your fellowship?

Gender-based violence (GBV) and child protection remain serious development challenges in Malawi. For example, 42% of all females aged 15–19 are currently married or in a union. Similarly, orphaned children are more vulnerable than other children to violence, abuse, exploitation, and neglect. The Ministry of Local Government has developed a web-based integrated information management system composed of GBV and Child Protection Information Management Systems (CPIMS). The system generates knowledge on children's vulnerability and aggregates data at the community, district, and national levels. With support from UNICEF, the CPIMS was piloted in 12 of Malawi's 30 districts in 2014. While the CPIMS is still operational, resource and capacity constraints have made it difficult to maintain and scale. In 2020, together with the national statistics office, the Ministry identified strengthening the CPIMS as a means of ensuring the success of its coordinated efforts to end child marriage, teen pregnancy, and abuse against children. The objectives of the fellowship were therefore to: (1) build the capacity of monitoring and evaluation officers and statisticians in the Ministry in data analysis and visualization, (2) revamp the Ministry's integrated information system through monthly production of statistical briefs on child protection, and (3) improve the CPIMS and incorporate child marriage indicators into its dashboard.

What are some of the capacity and financial constraints you identified during the project, and how did you overcome them?

There were a number of constraints across the data lifecycle, from the quality of the data collected to the quality of the data produced by the CPIMS. For example, not all districts received tablets for uploading data onto the system, so data were collected and processed on paper, and the system saw limited use over time. Consequently, there were data quality issues arising from human error. At the start of my fellowship, I was tasked with cleaning this paper-based data collected from the different districts. The tools used to collect GBV-related data had not been harmonized or standardized across districts, and there was a lot of invalid, inconsistent, and incomplete data.

This suggested that most of the social welfare officers responsible for data management, including data entry, may not have had the skills or capacity to do so properly. However, I was able to clean the data using the Python library pandas, and I also trained 11 officers at the Ministry in this process. I also designed a standard data collection tool so that GBV data can be collected uniformly across the country, taking into account critical inclusion elements such as disability status, gender disaggregation, and child marriage indicators. Further, I recommended that the Ministry conduct refresher training for social welfare officers, along with data quality assurance (validating the data, examining results, making adjustments as necessary, and identifying successes and failures), in all 12 districts so that the data is cleaner and more valid in the future.
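To make the cleaning step concrete, here is a minimal pandas sketch of the kinds of fixes described above: harmonizing inconsistent labels, flagging invalid values, and dropping incomplete records. The column names (`district`, `sex`, `age`, `disability`), the sample values, and the 0–17 age rule are hypothetical assumptions for illustration, not the Ministry's actual schema or validation rules.

```python
import pandas as pd

# Hypothetical raw GBV records transcribed from paper forms; the columns and
# values are illustrative assumptions, not the Ministry's real data or schema.
raw = pd.DataFrame({
    "district": [" Zomba", "zomba", "Lilongwe", "LILONGWE ", None, "Zomba"],
    "sex": ["F", "female", "M", "Male", "F", "f"],
    "age": ["14", "16", "abc", "15", "12", "-3"],
    "disability": ["No", "yes", "NO", "Y", "no", "n"],
})


def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Harmonize free-text district names: trim whitespace, use title case.
    out["district"] = out["district"].str.strip().str.title()
    # Map the different spellings of sex found on paper forms to two codes;
    # anything unrecognized becomes NaN.
    sex_map = {"f": "F", "female": "F", "m": "M", "male": "M"}
    out["sex"] = out["sex"].str.strip().str.lower().map(sex_map)
    # Coerce ages to numbers; non-numeric entries such as "abc" become NaN.
    out["age"] = pd.to_numeric(out["age"], errors="coerce")
    # Assumed rule: only ages 0-17 are valid for a child protection record.
    out["age"] = out["age"].where(out["age"].between(0, 17))
    # Standardize yes/no answers for disability status to booleans.
    yn_map = {"yes": True, "y": True, "no": False, "n": False}
    out["disability"] = out["disability"].str.strip().str.lower().map(yn_map)
    # Drop records still missing a required field, then exact duplicates.
    out = out.dropna(subset=["district", "sex", "age"])
    return out.drop_duplicates().reset_index(drop=True)


clean = clean_records(raw)
print(clean)
```

In this sample, three of the six records survive: one is dropped for a non-numeric age, one for a missing district, and one for an out-of-range age. In practice, dropped rows would be logged and sent back to the district for correction rather than silently discarded.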

I also observed bugs in the system that affected how even properly entered and uploaded data was presented, resulting in the omission of vital information. Because of internet connectivity issues, the system was often down, which further forced officers to transfer data on paper. However, during my fellowship the Ministry recruited an officer from the national statistical office with the skills to manually transfer incoming data into the system. This meant that field officers with tablets and access to the CPIMS could resume entering data into the system, or could send it by email to the statistical officer. One challenge I was not able to overcome was that some bugs remained in the system, which led to incomplete data analyses even after improved and complete data had been uploaded. The government plans to implement my recommendations and to consult, recruit, or train IT officers to troubleshoot the CPIMS and roll out the system to all 30 of the country's districts.

Can you give a brief description of the monthly statistical briefs you provided the Ministry on child protection indicators, and how the Ministry used this data?

The report included a section on key statistics summarizing the total number of registered offenses and the demographics of the victims (such as gender, parental status, and access to education). It also contained district-level summaries, such as the number of abused children in each district, disaggregated by gender. I also created summary statistics on orphans, who are among the most vulnerable to violence and abuse, showing how many of the abused children were orphans, by gender. Finally, the report summarized abuse by type: economic, emotional, labor, marriage, negligence, and physical abuse.
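The tables in such a brief reduce to a few grouped counts. The sketch below shows one way to produce them with pandas, assuming a cleaned case table with hypothetical columns `district`, `sex`, `orphan`, and `abuse_type`; the sample rows are invented for illustration and are not Ministry data.

```python
import pandas as pd

# Hypothetical cleaned case records; column names and values are assumptions.
cases = pd.DataFrame({
    "district": ["Zomba", "Zomba", "Lilongwe", "Lilongwe", "Lilongwe", "Mzimba"],
    "sex": ["F", "M", "F", "F", "M", "F"],
    "orphan": [True, False, True, False, False, True],
    "abuse_type": ["physical", "labor", "marriage", "emotional", "physical", "marriage"],
})


def monthly_brief(df: pd.DataFrame):
    # Key statistic: total number of registered cases this month.
    total = len(df)
    # District-level summary: abused children per district, disaggregated by
    # sex, with row and column totals added by margins=True.
    by_district = pd.crosstab(
        df["district"], df["sex"], margins=True, margins_name="Total"
    )
    # Orphans among abused children, by sex (0 where no cases were recorded).
    orphans_by_sex = (
        df.loc[df["orphan"], "sex"].value_counts().reindex(["F", "M"], fill_value=0)
    )
    # Cases by type of abuse, sorted alphabetically.
    by_type = df["abuse_type"].value_counts().sort_index()
    return total, by_district, orphans_by_sex, by_type


total, by_district, orphans_by_sex, by_type = monthly_brief(cases)
print(total)
print(by_district)
print(orphans_by_sex)
print(by_type)
```

Keeping the computation in one function means the same code can be rerun each month on the newly cleaned data, which is what makes a recurring brief practical to maintain.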

The Ministry uses this information to conduct case follow-ups at the district level. The data collected also includes victims' details, such as their phone number, village, and parents' contact details, so that district social welfare officers can investigate cases after they are reported and provide health care and psychosocial support, security and legal assistance, and justice for the victims. A highlight of my tenure was joining a field visit by Ministry personnel, during which I demonstrated the Ministry's case follow-up methods and records to officers on the ground, including how many cases have been taken to court and handled by the justice system. It is the Ministry's mandate to ensure that every case is closed, so in-person follow-ups are essential. I also learned that the Ministry's Commissioner is consulting with the districts on other ways to use this data beyond regional case follow-ups.

What did training of the Ministry's data science staff on management of data and the CPIMS entail, and how did you ensure that it is sustainable?

I trained a total of 15 officers at the Ministry of Gender, plus one from the national statistical office who joined remotely from Zomba, all of whom are responsible for managing data in the system. The training covered Python for data science and followed a customized syllabus that I developed, covering Python basics, functions and modules, and some statistical and mathematical functions. The syllabus also serves as a manual for future use and reference.

Specifically, I introduced the trainees to several Python libraries, including NumPy, SciPy, Matplotlib, pandas, and Seaborn. A (software) library is a collection of files (called modules) containing functions that other programs, in this case the CPIMS, can use for tasks such as data cleaning and transformation, manipulation, analysis, aggregation, visualization, and scientific computing. The final output I developed was a Jupyter notebook that brings all these functions together. Jupyter Notebook is an open-source web application for creating and sharing documents that combine live code, summary statistics, visualizations, and narrative text; the trainees can rerun the notebook on new (cleaned) data to produce updated results.

I also believe that the standard data collection tool I developed, which incorporates child marriage variables, will be very beneficial for strengthening the CPIMS and ensuring its sustainability. In-person training will be needed for the social welfare officers who enter incident data, so that they can use the tools appropriately and understand the importance of valid and consistent data entry. There also remains a need to distribute integrated tablets across the districts for data collection.

What would you change about the fellowship, if anything?

I would recommend that future fellowships include funding not only for the fellow, as was the case this time, but also for the host institution and for extracurricular activities. For example, with sufficient resources to go into the field, I could have facilitated quality assurance and training across the participating districts to make sure that valid values are entered. I would also have liked to convene various stakeholders to strengthen the project and share the results achieved so far. Another recommendation would be to make the fellowships longer, for example a year rather than four months. I feel I could have achieved a lot more with more time, since there is much work to be done to scale the CPIMS to the whole country.