Archive

Posts Tagged ‘Big Data’

Using Big Data to Learn from Positive Outliers

29 October 2018 Leave a comment

Why do a few individuals, communities or organisations achieve significantly better results than their peers?  The positive deviance approach tries to answer this question.

The story began in 1990, the Vietnamese government invited Save the Children (SCF) to help overcome the problem of child malnutrition.  Jerry Sternin, the SCF Programme Director, was asked to demonstrate impact within six months and decided to try the idea of positive deviance.  Building on past work[1]he undertook a village survey of child height and weight, looking for positive deviants: children from poor families, living among high malnutrition rates, who were nonetheless well-nourished.

In the pilot survey, he found six such families and began to study them intensively (see Figure 1).  By observing the food preparation, cooking and serving behaviours of these families, he found three consistent yet rare behaviours. Mothers of positive deviants:

  1. washed their children’s hands every time they came in contact with anything unclean;
  2. added to their children’s diet tiny shrimps from the rice paddies, and the greens from sweet potato tops; and
  3. fed their children less per meal but more often: four to five times per day compared to two times in non-positive deviant families.

Sternin and his team then scaled out those simple, affordable, community-inspired practices and, within two years, this had reduced malnutrition by 80% in 250 communities, rehabilitating an estimated 50,000 malnourished children[2].

Figure 1: Jerry Sternin speaking to mothers in a village in Vietnam

The simple power of the positive deviance (PD) approach has led to its successful application in more than 60 countries across the globe[3].  Yet PD still faces a number of challenges to its diffusion and implementation.  As a result, we decided to investigate whether big data might help address those challenges, via a systematic review, published in the Electronic Journal of Information Systems in Developing Countries.

A priori, big data provides opportunities in relation to two main PD challenges.

1. Time, Cost and Sample Size. Relying on in-depth primary data collection, the PD approach is time- and labour-intensive with costs proportional to sample size[4]. As a result, PD sample sizes are traditionally small.  Statistically and practically, this can make it hard to identify positive deviants, given their relative rarity (see Figure 2)[5].  By contrast, cost of gathering big data tends to be very low since it often makes use of already existing “data exhaust” from digital processes.  With big data thus covering large – often very large – sample sizes, greater numbers of PDs can be identified, and generalisation to even-larger populations is easier.

Figure 2: Positive deviants in a normal distribution

2. Domain and Geographic Scope. To date, most applications of PD have been highly concentrated. In a recent systematic literature review[6], 89% of applications in developing countries were in public health, 83% were in rural communities, and just four countries had hosted roughly half of all PD implementations.  A simultaneous review of big data in developing countries, on the other hand, showed datasets and demonstrated value across a much wider set of domains and locations.  As a result, big data could help positive deviance to break from its current path dependency.

To assess these and other benefits that big data may bring to the PD approach – relating to behaviour identification, methodological risk, and scalability – a “big data-based positive deviance” research project has been designed and is underway.  The project is currently identifying positive deviants from large-scale datasets in the education and agriculture domains, with results planned to emerge in 2019.

For further details on the challenges of positive deviance and the opportunities offered by big data, please refer to the review article.

REFERENCES

[1]Wishik, S. M. & Van Der Vynckt, S. (1976) The use of nutritional “positive deviants” to identify approaches for modification of dietary practices, American Journal of Public Health, 66(1), 38–42. Zeitlin, M. F. et al.(1990) Positive Deviance in Child Nutrition: With Emphasis on Psychosocial and Behavioural Aspects and Amplications for Development. Tokyo: United Nations University.
[2]Sternin, J. (2002) Positive deviance: a new paradigm for addressing today’s problems today, The Journal of Corporate Citizenship, 57–63.
[3]Felt, L. J. (2011) Present Promise, Future Potential: Positive Deviance and Complementary Theory.  Lapping, K. et al.(2002) The positive deviance approach: challenges and opportunities for the future., Food and Nutrition Bulletin, 23(4 Suppl), 130–7.  Marsh, D. R., Schroeder, D. G., Dearden, K. A., Sternin, J. & Sternin, M. (2004) The power of positive deviance, BMJ, 329(7475), 1177–1179.
[4]Marsh et al. (ibid.).
[5]Springer, A., Nielsen, C. & Johansen, I. (2016) Positive Deviance by the NumbersPositive Deviance Initiative. Available at: https://positivedeviance.org/background/.
[6]Albanna, B. & Heeks, R. (2018) Positive deviance, big data and development: a systematic literature review, Electronic Journal of Information Systems in Developing Countries.
Advertisements

Big Data and Healthcare in the Global South

The global healthcare landscape is changing. Healthcare services are becoming ever more digitised with the adoption of new technologies and electronic health records. This development typically generates enormous amounts of data which, if utilised effectively, have the potential to improve healthcare services and reduce costs.

The potential of big data in healthcare

Decision making in medicine relies heavily on data from different sources, such as research and clinical data, rather than only based on individuals’ training and professional knowledge. Historically, healthcare organisations have often based their understanding of information on an incomplete grasp of reality on the ground, which could lead to poor health outcomes. This issue has recently become more manageable with the advent of big data technologies.

Big data comprises unstructured and structured data from clinical, financial and operational systems, and data from public health records and social media that goes beyond the health organisations’ walls. Big data, therefore, can support more insightful analysis and enable evidence-based medicine by making data transparent and usable at much broader verities, much larger volumes and higher velocities than was ever available to healthcare organisations [1].

Using big data, healthcare providers can, for example, manage population health by identifying patients at high-risk during disease outbreaks and then take preventive actions. In one case, Google used data from user search histories to track the spread of influenza around the world in near real time (see figure below).

Google Flu Trends correlated with influenza outbreak[2]

Big data can also be used for identifying procedures and treatments that are costly or delivering insignificant benefits. For example, one healthcare centre in the USA has been using clinical data to bring to light costly procedures and other treatments. This helped it to reduce and identify unnecessary procedures and duplicate tests. In essence, big data not only helped to improve high standards of patient care but also helped to reduce the costs of healthcare [3].

Medical big data in the global south

The potential healthcare benefits of big data are exciting. However, it can offer the most significant potential rewards for developing countries. While global healthcare is facing challenges to improve health outcomes and to reduce costs, these issues can be severe in developing countries.

Lack of sufficient resources, poor use of existing funds, poverty, and lack of managerial and related capabilities are the main differences between developing and developed countries. This means health inequality is more pronounced in the global south. Equally, mortality and birth rates are relatively high in developing countries as compared to developed countries, which have better-resourced facilities [4].

Given improvements in the quality and quantity of clinical data, the quality of care can be improved. In the global south in particular, where health is more a question of access to primary healthcare than a question of individual lifestyle, big data can play a prominent role in improving the use of scarce resources.

How is medical big data utilised in the global south?

To investigate this key question, I analysed the introduction of Electronic Health Records (EHR), known as SEPAS, in Iranian hospitals. SEPAS is a large-scale project which aims to build a nationally integrated system of EHR for Iranian citizens. Over the last decade, Iran has progressed from having no EHR to 82% EHR coverage for its citizens [5].

EHR is one of the most widespread applications of medical big data in healthcare. In effect, SEPAS is built with the aim to harness data and extract value from it and to make real-time and patient-centred information available to authorised users.

However, the analysis of SEPAS revealed that medical big data is not utilised to its full potential in the Iranian healthcare industry. If the big data system is to be successful, the harnessed data should inform decision-making processes and drive actionable results.

Currently, data is gathered effectively in Iranian public hospitals, meaning that the raw and unstructured data is mined and classified to create a clean set of data ready for analysis. This data is also transferred into summarised and digestible information and reports, confirming that real potential value can be extracted from the data.

In spite of this, the benefit of big data is not yet realised in guiding clinical decisions and actions in Iranian healthcare. SEPAS is only being used in hospitals by IT staff and health information managers who work with data and see the reports from the system. However, the reports and insights are not often sent to clinicians and little effort is made by management to extract lessons from some potentially important streams of big data.

Limited utilisation of medical big data in developing countries has also been reported in other studies. For example, a recent study in Saudi Arabia [6] reported the low number of e-health initiatives. This suggests the utilisation of big data faces more challenges in these countries.

Although this study cannot claim to have given a complete picture of the utilisation of medical big data in the global south, some light has been shed on the topic. While there is no doubt that medical big data could have a significant impact on the improvement of healthcare in the global south, there is still much work to be done. Healthcare policymakers in developing countries, and in Iran in particular, need to reinforce the importance of medical big data in hospitals and ensure that it is embedded in practice. To do this, the barriers to effective datafication should be first investigated in this context.

References

[1] Kuo, M.H., Sahama, T., Kushniruk, A.W., Borycki, E.M. and Grunwell, D.K. (2014). Health big data analytics: current perspectives, challenges and potential solutions. International Journal of Big Data Intelligence, 1(1-2), 114-126.

[2] Dugas, A.F., Hsieh, Y.H., Levin, S.R., Pines, J.M., Mareiniss, D.P., Mohareb, A., Gaydos, C.A., Perl, T.M. and Rothman, R.E. (2012). Google Flu Trends: correlation with emergency department influenza rates and crowding metrics. Clinical infectious diseases, 54(4), 463-469.

[3] Allouche G. (2013). Can Big Data Save Health Care? Available at: https://www.techopedia.com/2/29792/trends/big-data/can-big-data-save-health-care (Accessed: August 2018).

[4] Shah A. (2011). Healthcare around the World. Global Issues. Available at: http://www.globalissues.org/article/774/health-care-around-the-world (Accessed: August 2018).

[5] Financial Tribune (2017). E-Health File for 66m Iranians. Available at: https://financialtribune.com/articles/people/64502/e-health-files-for-66m-iranians (Accessed: August 2018).

[6] Alsulame K, Khalifa M, Househ M. (2016). E-Health Status in Saudi Arabia: A Review of Current Literature. Health Policy and Technology, 5(2), 204-210.

Measuring the Big Data Knowledge Divide Using Wikipedia

Big data is of increasing importance; yet – like all digital technologies – it is affected by a digital divide of multiple dimensions. We set out to understand one dimension: the big data ‘knowledge divide’; meaning the way in which different groups have different levels of knowledge about big data [1,2].

To do this, we analysed Wikipedia – as a global repository of knowledge – and asked: how does people’s knowledge of big data differ by language?

An exploratory analysis of Wikipedia to understand the knowledge divide looked at differences across ten languages in production and consumption of the specific Wikipedia article entitled ‘Big Data’ in each of the languages. The figure below shows initial results:

  • The Knowledge-Awareness Indicator (KAI) measures the total number of views of the ‘Big Data’ article divided by total number of views of all articles for each language (multiplied by 100,000 to produce an easier-to-grasp number). This relates specifically to the time period 1 February – 30 April 2018.
  • ‘Total Articles’ measures the overall number of articles on all topics that were available for each language at the end of April 2018, to give a sense of the volume of language-specific material available on Wikipedia.

‘Big Data’ article knowledge-awareness, top-ten languages*

ko=Korean; zh=Chinese; fr=French; pt=Portuguese; es=Spanish; de=German; it=Italian; ru=Russian; en=English; ja=Japanese.
Note: Data analysed for 46 languages, 1 February to 30 April 2018.
* Figure shows the top-ten languages with the most views of the ‘Big Data’ article in this period.
Source: Author using data from the Wikimedia Toolforge team [3]

 

Production. Considering that Wikipedia is built as a collaborative project, the production of content and its evolution can be used as a proxy for knowledge. A divide relating to the creation of content for the ‘Big Data’ article can be measured using two indicators. First, article size in bytes: longer articles would tend to represent the curation of more knowledge. Second, number of edits: seen as representing the pace at which knowledge is changing. Larger article size and higher number of edits may allow readers to have greater and more current knowledge about big data. On this basis, we see English far ahead of other languages: articles are significantly longer and significantly more edited.

Consumption. The KAI provides a measure of the level of relative interest in accessing the ‘Big Data’ article which will also relate to level of awareness of big data. Where English was the production outlier, Korean and to a lesser extent Chinese are the consumption outliers: there appears to be significantly more relative accessing of the article on ‘Big Data’ in those languages than in others. This suggests a greater interest in and awareness of big data among readers using those languages. Assuming that accessed articles are read and understood, the KAI might also be a proxy for the readers’ level of knowledge about big data.

We can draw two types of conclusion from this work.

First, and addressing the specific research question, we see important differences between language groups; reflecting an important knowledge divide around big data. On the production side, much more is being written and updated in English about big data than in other languages; potentially hampering non-English speakers from engaging with big data; at least in relative terms. This suggests value in encouraging not just more non-English Wikipedia writing on big data, but also non-English research (and/or translation of English research) given research feeds Wikipedia writing. This value may be especially notable in relation to East Asian languages given that, on the consumption side, we found much greater relative interest and awareness of big data among Wikipedia readers.

Second, and methodologically, we can see the value of using Wikipedia to analyse knowledge divide questions. It provides a reliable source of openly-accessible, large-scale data that can be used to generate indicators that are replicable and stable over time.

This research project will continue exploring the use of Wikipedia at the country level to measure and understand the digital divide in the production and consumption of knowledge, focusing specifically on materials in Spanish.

References

[1] Andrejevic, M. (2014). ‘Big Data, Big Questions |The Big Data Divide.’ International Journal of Communication, 8.

[2] Michael, M., & Lupton, D. (2015). ‘Toward a Manifesto for the “Public Understanding of Big Data”.’ Public Understanding of Science, 25(1), 104–116. doi: 10.1177/0963662515609005

[3] Wikimedia Toolforge (2018). Available at: https://tools.wmflabs.org/

%d bloggers like this: