Big data and new technological innovations have the potential to address health inequalities and improve health outcomes for patients. These new tools and methods are able to provide a stronger evidence base for more efficient, resilient, inclusive, and sustainable healthcare delivery. Their potential lies in the additional provision of relevant and timely data to individually produced patient and hospital records. For example, in the U.S. the analysis of streaming patient data has reduced mortality by 20 percent. Other examples include:

  • Mobile data analysis: In response to the Ebola virus disease epidemic, call data records (CDRs) from mobile network operators have been used to map people’s mobility and project the path of the disease. CDRs were a powerful proxy to identify risks, design information campaigns, and show the impact of actions.

  • Patient monitoring through self-tracking via sensors, gadgets, and apps: In 2015, there were more than 100,000 health apps available for smartphones. In the U.S., 34% of all Americans who tracked their health habits stated that self-tracking had affected a health decision they had taken.

Nonetheless, the ability to combine multiple sources of data is essential to effectively harness big data in health care. Likewise, the ever-expanding volume of data with complex patterns may extend beyond the physician’s ability to use traditional data processing techniques for interpretation. A data revolution for improved health outcomes will require setting the right incentives to support coordination between different stakeholders within health care systems. Harnessing this potential will also require new partnerships to link data producers with data users and data analysts. Ultimately, recognizing the value of big data and the will to act on its insights demands a fundamental shift in mindset.

Read the full text of the research paper in which Sabrina Juran, Ph.D., Paul I. Heidekrueger, M.D., and P. Niclas Broer, M.D., Ph.D., explore how big data can be leveraged to improve health outcomes and present several examples to seize this opportunity.


Sabrina Juran, Ph.D. (1), Paul I. Heidekrueger, M.D. (2), P. Niclas Broer, M.D., Ph.D. (2)

In recent years, there has been growing enthusiasm for leveraging technological innovations and harnessing alternative big data sources for effective evidence-based medicine. New information has become available to guide treatment and optimize outcomes in medical care. Big data and new technological innovations, if harnessed and utilized effectively, have the potential to address health inequalities and improve health outcomes for patients. Big data will be able to provide a much stronger evidence base for more efficient, resilient, inclusive and sustainable medical interventions. Big data’s potential lies in the additional provision of relevant and timely statistics to individually produced patient and hospital records.[1] Triangulation and the ability to combine multiple types of data, such as patient and practitioner data, are essential to effectively harness big data in health care. However, the ever-expanding volume of data with complex patterns may extend beyond the physician’s ability to use traditional data processing techniques for interpretation.[2]

However, while big data has been successfully used in areas including astronomy, retail, online behaviour, and politics, its utilization in health care could be strengthened.[3] In health care, “the introduction of digital epidemiology for disease surveillance, tracking and controlling outbreaks such as H1N1 influenza, Severe Acute Respiratory Syndrome (SARS), and more recently, H7N9 influenza and the Middle East Respiratory Syndrome Coronavirus (MERS-CoV), has become more real-time, and thus, more effective for identifying cases and implementing appropriate control measures.”[4] In the United States of America (US), the analysis of streaming patient data has reduced mortality by 20 percent.[5]

Advances in information technology include electronic health records (EHRs). In the US, “in 2014, 3 out of 4 (76%) hospitals had adopted at least a basic EHR system. This represents an increase of 27% from 2013 and an eight-fold increase since 2008. Nearly all reported hospitals (97%) possessed a certified EHR technology in 2014, increasing by 35% since 2011.”[6] Increasing the use of such data and integrating new information sources with traditional ones can be powerful for health care delivery and efficiency as well as for patients’ safety and treatment. Medical passports containing diagnostic, pharmaceutical and follow-up information, data from global positioning systems (GPS), mobile phone applications and wearable health devices allow people to track their progress towards a healthier lifestyle. Data from such devices and applications provide more observation points, higher frequency and greater granularity. This detailed capture of data about patients provides an increased understanding of the intersection of lifestyle and diseases as well as potential treatment options, while avoiding the biases of patient self-reporting.[7][8] “To generate valuable knowledge, big data must come from high-quality individual clinical data. […] Big data will not achieve its full potential if it is not used to improve health outcomes for the individual patients from whom the data were generated.”[9]

Medical records should support the personalization of medical care and contribute to the engagement of patients in research and care.[9] Information on health status can further be derived from sources as diverse as cell phones, remote sensors and Internet use. Analysis of cell phone data usage patterns can allow inferences to be made about users' sex, age, socioeconomic status, mobility patterns and financial activities. For instance, in response to the Ebola virus disease epidemic, call data records (CDRs) from mobile network operators have been used to map people’s mobility and project the path of the disease. CDRs were a powerful proxy to identify risks, design information campaigns, and show the impact of actions. Other technological advances allow the medical industry to better understand diseases and which treatment to apply, and to translate personalized medicine into clinical practice. For example, Internet web search logs have proven to “provide valuable signals to predict the later appearance of first-person queries in disease management that are strongly suggestive of a professional diagnosis of pancreatic carcinoma. Performance of the risk stratification holds many weeks in advance and improves when conditioned on the presence of specific symptoms or risk factors found in people’s search histories.”[10]

Remote sensing reveals epidemiological trends of concern and provides information on physical access to clinics and other essential health services. While these approaches need a strong evidence base for calibrating big data, innovations and new technologies need to be embraced and integrated into the world’s growing health systems. To achieve these aims, strong systems are required, connecting the entire plethora of data producers and users, together with institutional capacity to use and integrate diverse types and sources of data. “The cost of answering many clinical questions prospectively, and even retrospectively, by collecting structured data is prohibitive. Analysing the unstructured data contained [for instance] within EHRs using computational techniques […] permits finer data acquisition in an automated fashion.”[3] However, physicians and clinicians, trained in traditional statistical methods, might not have the capacity to analyse these new forms and amounts of data.
Through machine learning, statistical learning techniques can be applied automatically to data sets to identify patterns in the data and to make highly accurate predictions. One example of an accessible machine learning method is a decision tree,[11] which can find the most suitable predictors of health risks and improvements. This methodology examines one data feature at a time and then looks for that feature in new data files.[12] The decision tree uses if-then statements to define patterns in the data and to compute, for example, health risk probabilities from the established tree. Machine learning helps avoid the problem of sample duplication, because the data file is randomly split into two samples.[13]
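
The decision tree idea described above can be illustrated with a minimal sketch. It is not the method of the cited sources: the patient data are synthetic, the feature names (age, systolic_bp), the labelling rule and all function names are hypothetical, and the Gini impurity criterion is one common choice assumed here for picking splits. The sketch randomly splits the data file into two samples, grows a small tree of if-then rules on the first, and reads risk probabilities from the leaves for the second.

```python
import random

def gini(rows):
    """Gini impurity of (features, label) rows with 0/1 labels."""
    p = sum(label for _, label in rows) / len(rows)
    return 2 * p * (1 - p)

def best_split(rows):
    """Examine one feature at a time; return the (feature, threshold)
    that lowers the weighted impurity the most, or None."""
    best, best_score = None, gini(rows)
    for feature in rows[0][0]:
        for threshold in sorted({f[feature] for f, _ in rows}):
            left = [r for r in rows if r[0][feature] <= threshold]
            right = [r for r in rows if r[0][feature] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if score < best_score:
                best, best_score = (feature, threshold), score
    return best

def build_tree(rows, depth=0, max_depth=3):
    """Internal nodes are (feature, threshold, left, right) tuples;
    leaves are floats holding the share of high-risk patients."""
    split = best_split(rows) if depth < max_depth else None
    if split is None:
        return sum(label for _, label in rows) / len(rows)
    feature, threshold = split
    left = [r for r in rows if r[0][feature] <= threshold]
    right = [r for r in rows if r[0][feature] > threshold]
    return (feature, threshold,
            build_tree(left, depth + 1, max_depth),
            build_tree(right, depth + 1, max_depth))

def predict_risk(tree, features):
    """Follow the if-then rules down to a leaf probability."""
    while isinstance(tree, tuple):
        feature, threshold, left, right = tree
        tree = left if features[feature] <= threshold else right
    return tree

def make_synthetic_patients(n, seed=0):
    """Hypothetical data: risk is defined by an invented rule, not clinical evidence."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        features = {"age": rng.randint(20, 90), "systolic_bp": rng.randint(100, 180)}
        label = 1 if features["age"] > 60 and features["systolic_bp"] > 140 else 0
        rows.append((features, label))
    return rows

patients = make_synthetic_patients(200)
random.Random(1).shuffle(patients)               # random split into two samples
train, test = patients[:100], patients[100:]

tree = build_tree(train)                         # fit on the first sample only
accuracy = sum((predict_risk(tree, f) >= 0.5) == bool(y) for f, y in test) / len(test)
print("held-out accuracy:", accuracy)
print("risk for a new patient:", predict_risk(tree, {"age": 72, "systolic_bp": 165}))
```

Because the tree is evaluated only on patients it never saw during fitting, the reported accuracy is not inflated by reusing the same records for both steps.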

Some devices can take patient monitoring through self-tracking to a new level, by having patients enter data actively or by collecting it passively via sensors, gadgets and apps. In 2015, there were more than 100,000 health apps available for smartphones. In the US, 69% of all Americans tracked their health statistics either through technology or in their heads, and 60% logged their dietary habits. Thirty-four per cent of all Americans who tracked their health habits stated that self-tracking had affected a health decision they had taken.[5]

Big data has the power to transform health care and empower the individual patient by bringing the medical information directly to the patients. “Big data offers the chance to improve the medical record by linking traditional health-related data (eg, medication list and family history) to other personal data found on other sites (eg, income, education, neighborhood, military service, diet habits, exercise regimens, and forms of entertainment) […]. By doing so, big data offers a chance to integrate the traditional medical model with the social determinants of health in a patient-directed fashion. Public health initiatives […] could perhaps be delivered more efficiently in this way by targeting their messages to the most appropriate people based on their social media profiles.”[3]

Further, the health sector is becoming increasingly computerized, which leads to a rise in the collection and exchange of invaluable patient data. “Unifying that data - and combining it with patient-collected data from smart devices - is the industry’s next big hurdle to overcome. Healthcare providers are already focusing on digitizing patient records and ensuring access to one set of records across the healthcare system.” For instance, “the Pittsburgh Health Data Alliance aims to compile data from various sources (including medical and insurance records, wearable sensors, genetic data and even social media use) to draw a comprehensive picture of the patient as an individual, and then offer a tailored healthcare package.”[14]

Data science techniques and data models are able to examine and extract knowledge from large, partially unstructured databases in an automated fashion. While many diseases are well characterized and understood, in the future, new technologies with high precision, such as wearables, could detect diseases earlier and take into account the individual patient.[14] Soon, “adequate health and health care will, however, […] be impossible without proper data supervision from modern machine learning methodologies like cluster models, neural networks, and other data mining methodologies.”[12]

With respect to follow-up, big data can inform physicians on the conditions and behaviour of patients who are likely to follow medical advice. Using such information for tailored treatment and follow-up options can reduce readmission rates of more vulnerable patient populations. “Apps are being developed that can track when a patient takes his or her medication - like GPS enabled inhalers for asthmatics. Others record information about calls, texts, physical location, movement and sleep patterns that can help alert doctors or family members if the patient is likely feeling unwell (poor sleep, lack of movement) or even in danger of an anxiety or other psychological attack.”[14]

Further, mobile technology can be used for more efficient disbursement of reimbursement and compensation for the patient. Since data for a single patient may come from various health care providers, hospitals, laboratories, and physician offices, new technologies allow data to be linked better and at a lower cost. Beyond the individual patient, marginal investments can also save money for the health care system as a whole, not only by improving profits and cutting down on wasted overhead, but also, for example, by predicting the cost of managing specific illnesses across certain geographic regions and demographic groups.[5] In order to provide disincentives for overutilization of health services, there has been a shift from fee-for-service compensation to risk-sharing arrangements that emphasize prevention, chronic disease management, and measurable outcomes. Under such schemes, medical stakeholders have greater incentives to compile and exchange information. For example, careful study of data can give an insight into who is likely to get sick, thereby enabling preventive treatment. Under such mechanisms, patients have incentives to comply with clinical follow-up and follow recommendations.
Further, “by using and analyzing big data tools such as predictive monitoring and algorithms with clinical trial data, disease patterns, and genomic data sets to inform the emerging field of personalized medicine, the healthcare sector could reduce expenditures by approximately $25 billion.”[4]

However, the integration of big data in the health care sector needs to rely on proper data privacy and data protection frameworks and mechanisms to ensure that responsible patient data practices are implemented from the start. An important responsibility will be the capture, management and storage of individual patient data. While attention to data privacy and data protection is growing,[15] there are still many challenges, some of them due to a fragmented regulatory landscape and a lack of privacy-enhancing methodologies and tools to ensure that the data can be used.

Realizing a medical data revolution will require setting the right incentives to support coordination between different stakeholders within health care systems. Recognizing the value of big data and being willing to act on its insights will require a fundamental shift in mind-set from patients and physicians alike.

The key challenge is leveraging and utilizing big data to formulate better decisions in health care. Aggregating individual data sets into big-data algorithms often provides the most robust evidence, since nuances in subpopulations may be so rare that they are not readily apparent in small samples. Knowing when, where and to what extent conditions are changing in ways that either hinder or advance desirable health outcomes for patients is potentially invaluable, because it allows health care providers to make mid-course corrections, i.e., necessary and effective changes or adjustments to treatment.

The understanding of such new opportunities must be strengthened to improve health outcomes for the patient. Despite these potential benefits, historical, technological, legal, as well as cultural reasons may impede the use of new technologies and non-traditional data sources in health care. To date, medical professionals have been rather reluctant to harness big data and use machine learning in their field.

Moreover, national health systems and health care providers often lack sufficient capacity and funding to harness big data. Health care institutions seldom possess adequate infrastructure and resources to produce, process and leverage big data. Smaller institutions have little access to modern technologies that enable big data, including supercomputing, data centres, broadband and ubiquitous Internet access. Those deficiencies extend to data producers and users, academia, civil society and the private sector. 

Medical data analytics needs to make sense of big data with the right tools. For this, data producers need to work more closely with data users and data analysts. Medical stakeholders need to go beyond medicine to explore what can be learned and added from other areas, such as development and biology.
New pathways will open up as new data become available, which will foster a feedback loop. For instance, treatment could change if new data suggest that the standard protocol for a particular disease for a certain patient does not produce the optimal results.

With the world’s population increasing and people living longer, models of treatment delivery are rapidly changing, and many of the decisions behind those changes should be driven by data. Digital technologies have lowered the costs of producing and publishing data; they have eased the distribution and visualization of data and have hence democratized access to data and created new use cases for it.[16]

Going forward, due attention must be paid to the potential of big data for health care. Harnessing this potential will require new partnerships and new commitments. In order to ensure that all people have access to quality medical, and in particular surgical, treatment, health systems need to be linked up around the patient. Big data could transform the health-care sector, but the industry must undergo fundamental changes before stakeholders can capture its full value. There is a need, and an opportunity, to mine the data generated in the health care field.

(1) United Nations Population Fund, Technical Division, Population and Development Branch, New York, NY, USA
(2) StKM - Klinikum Bogenhausen, Academic Teaching Hospital, Technical University Munich, Department for Plastic, Reconstructive, Hand, and Burn Surgery, Germany

This contribution reflects exclusively the personal opinion of the authors and not necessarily that of their employers.


[1] United Nations Economic Commission for Europe, (2014): Big data and modernization of statistical systems.
[2] Kanevsky, J., Jason Corban, Richard Gaster, Ari Kanevsky, Samuel Lin, and Mirko Gilardino. Big Data and Machine Learning in Plastic Surgery: A New Frontier in Surgical Innovation. Plastic & Reconstructive Surgery: May 2016 - Volume 137 - Issue 5 - p 890e–897e
[3] Murdoch, Travis B. and Allan S. Detsky, (2013): The Inevitable Application of Big Data to Health Care. JAMA. 2013;309 (13):1351-1352. doi:10.1001/jama.2013.393.
[4] Health Capital (2013). Application of “Big Data” in Today’s Healthcare Environment. Volume 6, Issue 8, Page 1 Last retrieved: 6 June 2016
[6] Charles, Dustin, Meghan Gabriel, Talisha Searcy (2015) Adoption of Electronic Health Record Systems among U.S. NonFederal Acute Care Hospitals: 2008-2014. The Office of the National Coordinator for Health Information Technology. Retrieved: June 6, 2016. 
[7] Broer, P.N., and Sabrina Juran (2014): Informing a Medical Data Revolution. Blog Post: Available at: Last retrieved: 24 May 2016
[8] Marr, Bernard (2016): Big Data: A Game Changer in Healthcare. Forbes Magazine 24 May 2016. 
[9] Sacristán, José A and Tatiana Dilla Pharm (2015): No big data without small data: learning health care systems begin and end with the individual patient. Journal of Evaluation in Clinical Practice 21 (2015) 1014–1017: 1016
[10] Paparrizos, John, Ryen W. White and Eric Horvitz (2016): Screening for Pancreatic Adenocarcinoma Using Signals from Web Search Logs: Feasibility Study and Results. American Society of Clinical Oncology. Published online ahead of print.
[11] R2D3 Software Package (2016): A Visual Introduction to Machine Learning. Available at:
[12] Cleophas, Ton J., and Aeilko H. Zwinderman (2015). Machine Learning in Medicine – a Complete Overview. Springer Heidelberg Germany.
[13] Cleophas, Ton J., and Aeilko H. Zwinderman (2012). Artificial Intelligence, Multilayer Perceptron Modeling. In: Machine Learning in Medicine. Springer Heidelberg Germany: pp 145–156,
[14] Marr, Bernard (2016): How Big Data is Changing Healthcare. Forbes Magazine 21 April 2015. 
[15] General Assembly resolution 69/166 of 18 December 2014 addressed the right to privacy in the digital age. In its resolution 28/16, the Human Rights Council appointed a Special Rapporteur on the Right to Privacy in July 2015. Both actions reaffirmed the escalating need to address data and privacy rights globally.
[16] Data Pop Alliance (2016): Opportunities and Requirements for Leveraging Big Data for Official Statistics and the Sustainable Development Goals in Latin America.
