Data sharing partnerships, by their nature, open up access to information. Doing so raises security concerns, which can be particularly serious for personal data, defined as any information that relates to an identified or identifiable living individual. Even though most successful data sharing initiatives in the development sector use only anonymized, aggregated, or non-personal data, data providers remain concerned about the fallibility of anonymization methods and the risk of re-identification.
Security and storage policies aim to address these concerns and are the primary tools to ensure technical safeguards are in place.
Data sharing initiatives can ensure data security by putting in place the people, processes, and tools needed to protect data confidentiality and integrity against malicious attacks and accidents throughout the data life cycle.
Creating controlled data environments to secure data privacy
Since 2014, the LinkedIn Economic Graph challenge has invited research and analytic partnerships in which LinkedIn data can be leveraged to identify labor market and macroeconomic trends.
To ensure the security of its members’ data, the challenge provides extensive security training to each participating team and mandates that work be performed only on LinkedIn-issued laptops on the LinkedIn network within a monitored sandbox environment. Downloading data outside this network is heavily restricted, and a LinkedIn employee collaborator supervises all access to and use of data. Data use is restricted to the specific goal identified in the research partnership. In addition, an internal review board evaluates all research products created through the partnership.
Recent research carried out on LinkedIn data, for instance, shows that green skills are increasingly demanded on the job market, with at least 10% of job postings from the last year requiring them. It also suggests that more workers are green-skilling and transitioning into green and greening jobs, driving positive net transitions into these jobs.
Approaches to securely storing data can be centralized, federated, or distributed, each using different means to keep the data safe and secure. Most initiatives analyzed rely on centralized storage systems, but there is no one-size-fits-all approach. In the development sector and beyond, a growing trend toward decentralized storage is linked to increasing concerns over power imbalances and data hoarding.
Shifting from centralized to decentralized data storage
The Implementation Network for Sharing Population Information from Research Entities (INSPIRE) is a partnership to share data from Health and Demographic Surveillance Sites in five countries in East Africa. Initially, INSPIRE was established to create a large health data repository, and the data were stored centrally in a cloud-based facility, with the initiative, led by the African Population and Health Research Center, as the custodian of the data.
However, in line with its strategy of increasing data providers’ capacity and strengthening the skills of the research institutes involved in the initiative, the INSPIRE team intends to transition from being a repository of data to serving as a platform for services. This entails a shift toward a federated storage system in which the data providers remain custodians of their own data. INSPIRE’s platform would then mine the data remotely for particular use cases.
As one of the key initiative stakeholders put it, the objective of this transition is to explore an approach in which no single partner holds all the data.
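To make the federated arrangement described above more concrete, the following is a minimal sketch in Python, not a description of INSPIRE's actual platform. It assumes each provider runs queries locally and returns only aggregate counts, so no raw records reach the central platform. The names `ProviderNode`, `run_count`, and `federated_share`, as well as the toy records, are hypothetical and invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class ProviderNode:
    """A hypothetical data provider that keeps custody of its own records."""
    name: str
    _records: List[Record]

    def run_count(self, predicate: Callable[[Record], bool]) -> Dict[str, int]:
        """Run a counting query locally and return only aggregate totals."""
        matching = sum(1 for record in self._records if predicate(record))
        return {"matching": matching, "total": len(self._records)}


def federated_share(nodes: List[ProviderNode],
                    predicate: Callable[[Record], bool]) -> float:
    """Combine per-site aggregates; raw records never leave a provider."""
    results = [node.run_count(predicate) for node in nodes]
    matching = sum(result["matching"] for result in results)
    total = sum(result["total"] for result in results)
    return matching / total if total else 0.0


# Toy, invented records standing in for two surveillance sites.
site_a = ProviderNode("site_a", [
    {"age": 34, "vaccinated": True},
    {"age": 61, "vaccinated": False},
])
site_b = ProviderNode("site_b", [
    {"age": 45, "vaccinated": True},
    {"age": 29, "vaccinated": True},
])

share = federated_share([site_a, site_b], lambda r: bool(r["vaccinated"]))
print(f"Share of matching records across sites: {share:.2f}")
```

In this sketch the central function sees only two integers per site. A real deployment would likely add further safeguards, such as access control and suppression of very small counts, which are outside the scope of this illustration.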