🌐 Examples
Several countries offer instructive examples of data pipeline setup:
- In Estonia, an authorized data processor, which acts as a technical provider for data storage, cleaning, and processing, is selected through an open competition. Until 2027, this provider is Positium, a private company. The statistical authority and the operator have agreed to share the data through an API as encrypted .csv files in which all subscriber identifiers have been hashed (pseudonymized).
During the COVID-19 pandemic, Estonia implemented an emergency response model for using mobile network data. Because of time constraints, a simplified approach was adopted instead of direct data sharing. Statistics Estonia, acting as an intermediary, asked operators to perform basic calculations on aggregated mobility data, following a methodology developed by Positium, and to submit the results. Statistics Estonia’s role as an intermediary (between the mobile network operators and the project team) was crucial because the operators did not want to reveal the regional distribution of their customer base to each other.
- In Ghana, Telecel Ghana (previously Vodafone Ghana) provided access to pseudonymized telecommunications data free of charge. Using its open-source processing software, Flowminder aggregated and analyzed the data on behalf of Ghana Statistical Service. The technical details of data pipeline setup are outlined in the data sharing agreement.
- In Indonesia, the data remains with the mobile network operator, which invests in (and is compensated for) staff time, storage, and technology. Statistics Indonesia has access to sample data, which it uses to develop and improve methodology that the operator then implements. Companies like Positium and Indonesian public research organizations assist with methodological developments, with or without access to data.
- In the Gambia, the telecommunications regulator PURA took on the role of data storage and processing party, with help from the University of Tokyo as a technical provider. A set of mobility statistics is produced from CDR data on PURA’s premises and updated as new data arrive. The CDR data are de-identified by the respective mobile network operators so that no personally identifiable information is included in the data PURA uses to produce statistics. PURA manages all data access. The NSO receives aggregates for the production and publication of statistics.
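Several of the setups above rely on two steps: identifiers are hashed before data leaves the operator, and aggregates are shared instead of individual records. The following is a minimal sketch of both steps, not a description of any country's actual pipeline. It assumes a keyed hash (HMAC-SHA256) with a secret held only by the operator, and a hypothetical suppression threshold of 10 subscribers. The key, threshold, field layout, and function names are illustrative assumptions and do not come from the agreements described above.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret held only by the operator; never shared with the processor.
SECRET_KEY = b"operator-held-secret"
# Hypothetical minimum group size below which a count is withheld.
MIN_COUNT = 10


def pseudonymize(subscriber_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed hash of the identifier (HMAC-SHA256, hex encoded)."""
    return hmac.new(key, subscriber_id.encode("utf-8"), hashlib.sha256).hexdigest()


def aggregate(records, min_count: int = MIN_COUNT) -> dict:
    """Count distinct pseudonymized subscribers per (region, date).

    `records` is an iterable of (subscriber_id, region, date) tuples.
    Groups with fewer than `min_count` subscribers are dropped.
    """
    groups = defaultdict(set)
    for subscriber_id, region, date in records:
        groups[(region, date)].add(pseudonymize(subscriber_id))
    return {k: len(v) for k, v in groups.items() if len(v) >= min_count}
```

The sketch uses a keyed hash because the set of valid phone numbers is small enough that plain, unkeyed hashes can be reversed by trying every candidate. Whether a given operator uses a keyed hash, and what threshold applies, is something to settle in the data sharing agreement.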
❗Tips
✅ Investigate the data sharing practices of other countries, through resources online, through international working groups or through direct discussions with the countries that have implemented operator data pipelines.
✅ At this point, it is best to engage a technical provider or consultant to help with technical discussions on data pipeline setup. Look for an expert with a relevant technical background and experience in two or more countries.
✅ During technical discussions, make sure that the final approach for data sharing enables the production of quality statistics. For that, it is important to ensure fitness for purpose, data minimization, data quality checking, validated methodologies, and transparency of processing.
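Two of the points above, data minimization and data quality checking, can be made concrete during technical discussions. The sketch below is illustrative only: the field names (`subscriber_hash`, `cell_id`, `timestamp`) and the choice of checks are assumptions, and the real list should follow from the statistics to be produced.

```python
# Hypothetical set of fields the statistics actually need (data minimization).
REQUIRED_FIELDS = ("subscriber_hash", "cell_id", "timestamp")


def minimize(record: dict) -> dict:
    """Keep only the required fields; raises KeyError if one is missing."""
    return {field: record[field] for field in REQUIRED_FIELDS}


def quality_report(records: list) -> dict:
    """Return simple counts used to judge whether a delivery is usable."""
    total = len(records)
    # A record is incomplete if any required field is absent or empty.
    incomplete = sum(
        1 for r in records
        if any(r.get(f) in (None, "") for f in REQUIRED_FIELDS)
    )
    # A record is a duplicate if its required fields repeat an earlier record.
    seen = set()
    duplicates = 0
    for r in records:
        key = tuple(r.get(f) for f in REQUIRED_FIELDS)
        if key in seen:
            duplicates += 1
        seen.add(key)
    return {"total": total, "incomplete": incomplete, "duplicates": duplicates}
```

A report like this, produced for every delivery and shared with all parties, is one simple way to support transparency of processing.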
📖 Resources
See a description of the project setup and principles upheld in the projects of Estonia, Ghana, and the Gambia in Guiding principles to maintain public trust in the use of mobile operator data for policy purposes, Data for Policy, 2021.
See various data access options described in Chapter 3 of the UN Handbook on the use of Mobile Phone data for Official Statistics (2019).
⏩ Next Actions
- With the technical committee, schedule working meetings with the goal of agreeing on the data sharing principles and pipeline setup.
- Set a regular meeting cadence to finalize all the data sharing terms.
- Send the data sharing terms for approval to the steering committee.