Sharing data among different partners and aggregating different data sources on common platforms require significant investment in interoperability and data formats. Data are interoperable when they follow commonly agreed-upon technical standards and have sufficient metadata for an analyst to identify provenance, concepts, methods, and usage guidance.
Both standards and metadata are necessary for interoperability. Without standards, even well-described data (i.e., data with metadata) are not user-ready, and without metadata, it is extremely time-consuming to understand data or put them to use.
Even within single initiatives, different types of data can require different approaches to standardization and interoperability. For example, the California Data Collaborative (CaDC) receives two kinds of data: billing data and program participation data. The former require little standardization, as the billing format is uniform across agencies. The formats of the latter, however, can vary. The CaDC has therefore developed software that partner agencies can use to standardize participation data before sharing them.
While there is often a strong emphasis on standards, it is important to acknowledge that successful initiatives also invest heavily in producing high-quality metadata. Good metadata can smooth the process of transferring and using data. To guide data providers, the Humanitarian Data Exchange (HDX) has published clear instructions on the required resource and metadata fields, and it supports new partners in applying them.
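A check for required metadata fields can be sketched in a few lines. The example below is illustrative only: the field names in REQUIRED_FIELDS are hypothetical, chosen to mirror the kinds of information named earlier (provenance, concepts, methods, and usage guidance), and they are not the actual field list of any platform.

```python
# Hypothetical required fields (an assumption for illustration); a real
# platform publishes its own list of mandatory metadata fields.
REQUIRED_FIELDS = ("provenance", "concepts", "methods", "usage_guidance")


def missing_metadata(metadata):
    """Return the required fields that are absent or blank in `metadata`."""
    return [
        field
        for field in REQUIRED_FIELDS
        if not str(metadata.get(field) or "").strip()
    ]


# Illustrative record: "methods" is blank and "usage_guidance" is absent.
example = {
    "provenance": "Collected by a partner agency (illustrative value)",
    "concepts": "Households enrolled in a rebate program",
    "methods": "",
}

gaps = missing_metadata(example)
print(gaps)
```

A data provider could run a check of this kind before uploading, so that gaps are caught by the contributor rather than by the analyst who later tries to use the data.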
Ensuring interoperability among disparate datasets
Any humanitarian organization can upload its data to the Humanitarian Data Exchange. To ensure data sharing on the platform is seamless, HDX supports contributions in any standard data format. However, from the beginning, HDX has recognized that some standardization was necessary to reduce the friction of sharing data.
Its original approach to data standardization required time and technical expertise from contributing humanitarian organizations. In 2014, however, HDX rebooted this approach and developed the Humanitarian Exchange Language (HXL). As most data on HDX were stored in spreadsheets, the new approach merely required each column title to include an identifier (a short tag beginning with a hash symbol), making it possible to merge the same fields across multiple datasets. While challenges remain in the adoption of HXL, studies carried out by the data sharing initiative show that it has greatly improved data use and saved data processing time.
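The merging idea can be illustrated with a minimal sketch. It assumes that the hashtags occupy their own row directly beneath the human-readable column titles, which is how HXL-tagged spreadsheets are commonly laid out. The two sample files, their column titles, and their figures are invented for illustration, and the helper names read_tagged and merge_tagged are hypothetical rather than part of any HXL tool.

```python
import csv
import io

# Invented sample data: the column titles differ between the two
# contributors, but the hashtag row beneath them is the same.
AGENCY_A = """Province,People affected
#adm1,#affected
North,1200
South,800
"""

AGENCY_B = """Region name,Num. affected
#adm1,#affected
East,450
"""


def read_tagged(text):
    """Parse CSV text and key each record by hashtag, not by column title."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    tags = [cell.strip().lower() for cell in rows[1]]  # second row holds tags
    return [
        {tag: value for tag, value in zip(tags, row) if tag.startswith("#")}
        for row in rows[2:]
    ]


def merge_tagged(*texts):
    """Concatenate records from several tagged files into one list."""
    merged = []
    for text in texts:
        merged.extend(read_tagged(text))
    return merged


combined = merge_tagged(AGENCY_A, AGENCY_B)
total_affected = sum(int(record["#affected"]) for record in combined)
print(len(combined), total_affected)
```

Because the records are keyed by tag, neither contributor has to rename columns or adopt a shared template; adding the tag row is the only change required of them.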