Global South researchers succeeding against the odds: how are they different?

Understanding the Context

How are some global South researchers able to overcome contextual constraints and become highly cited?

There is a clear research divide between the global South and the global North[1] in terms of research investment and capabilities. The average national expenditure on research and development in Southern countries is 0.38% compared to 1.44% in Northern countries[2]. The number of researchers per million population in 2017 was 713 in the global South and 4,351 in the global North[3]. This has implications for the volume and impact of scientific outputs produced by the global South in comparison to the global North. Excluding China and India, in 2018 global North countries produced an average of more than 35,000 scientific and technical journal articles per country while global South countries produced an average of 4,000 journal articles per country, of which fewer than 2% made it into the top 1% most cited articles globally. This can be partially explained by lower levels of investment and English proficiency, relatively smaller populations of researchers, institutional exclusion factors and/or biases against Southern researchers when their papers are considered for top-tier journals or when grants are awarded.

Despite these challenges, a few Southern researchers achieve better outcomes than their peers. Such researchers could provide valuable insights and lessons that might help to better understand and even mitigate the current North–South divide in research outputs and citation. This blog post highlights some of the insights emerging from our recently published study, which sought to uncover the publication-level and individual-level factors underlying the outperformance of information systems researchers in Egypt.

The Method

 This study employed the “data-powered positive deviance” (DPPD) methodology that uses digital datasets to identify positive deviants (those performing unexpectedly well in a specific outcome measure that is digitally recorded, mediated or observed) and potentially also to understand the characteristics and practices of those positive deviants (PDs) if digitally recorded.

Three main steps were conducted to identify and characterise PDs, as shown in Figure 1:

  • In the Define step, we defined our study population and the performance indicators used to assign a score to each researcher. The study population comprised 203 information systems researchers in Egyptian public universities. Six well-known citation metrics (h-index, g-index, hc-index, hi-index, aw-index and m-quotient) were calculated for each researcher using Publish or Perish and Google Scholar bibliometrics. Several citation metrics were used to avoid putting certain groups at a disadvantage due to factors such as the length of their research career, the size of their research departments, the age of their papers or their publication strategies.
  • The Determine step aims at identifying the PDs based on the scores calculated in the previous step. In this study, PDs or outliers were defined as researchers who significantly outperformed their peers in at least one of the six citation metrics. The interquartile range (IQR) method was used to identify those outliers: a researcher was flagged if their score lay above the third quartile plus 1.5 times the IQR in at least one of the six citation metrics.
  • The third step, Discover, consists of three main stages. In Stage 1, primary data were collected through in-depth interviews with a sample of PDs to explore practices, attitudes and attributes that might distinguish them from non-PDs. During Stage 2, the key findings from Stage 1 plus other predictors of research performance drawn from the literature were used to design a survey tool. That survey then targeted the whole population and tested whether the proposed differentiators differed significantly between the two groups. Finally, in Stage 3, the Scopus database was used as the basis for analysis of researcher publications, extending and validating some of the findings identified in the previous stages.
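
The outlier rule in the Determine step can be sketched in a few lines of Python. This is a minimal illustration rather than the study's actual code: the function names, the toy scores and the quartile interpolation (`method="inclusive"`) are all assumptions, and only two of the six metrics are shown. The study itself obtained its metrics from Publish or Perish rather than computing them by hand.

```python
from statistics import quantiles


def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations.

    Assumption: g is capped at the number of papers (some variants do not cap it).
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def upper_fence(values):
    """Third quartile plus 1.5 times the interquartile range."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def flag_positive_deviants(scores):
    """Map each outperforming researcher to the metrics on which they exceed the fence.

    `scores` is {researcher: {metric: value}}; one metric above the fence is enough.
    """
    metrics = list(next(iter(scores.values())))
    flagged = {}
    for metric in metrics:
        fence = upper_fence([s[metric] for s in scores.values()])
        for name, s in scores.items():
            if s[metric] > fence:
                flagged.setdefault(name, []).append(metric)
    return flagged


# Toy scores for ten hypothetical researchers (not data from the study).
h_values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
g_values = [2, 3, 4, 5, 6, 7, 8, 9, 40, 10]
scores = {
    f"r{i + 1}": {"h-index": h, "g-index": g}
    for i, (h, g) in enumerate(zip(h_values, g_values))
}

print(flag_positive_deviants(scores))
```

In this toy data, one researcher stands out on the h-index and a different one on the g-index. Both are flagged, which reflects the "at least one of the six metrics" rule described above.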

Figure 1: Summary of the applied DPPD method

 What we found

A combination of data sources (interviews, surveys, publications) and analytical techniques (PLS regression, topic modelling) was used to identify significant predictors of positively-deviant information systems researchers. One of the key findings was that PDs contributed to the creation of roughly half (48%) of the publications and achieved nearly double (1.7 times) the total number of citations of non-PDs, despite representing roughly one-eighth (13%) of the study population. While there were significant predictors of outperformance that are structural (e.g. gender, academic rank and role, workplace perceptions), our focus in this post is on highlighting factors that are transferable, i.e. practices and strategies that are to some extent within the control of individual researchers. Table 1 provides a summary of such factors.

Individual-Level Predictors

| Predictor | Positive Deviants | Non-Positive Deviants |
| --- | --- | --- |
| Travelling abroad to obtain their PhD degree | More PDs got their PhDs from global North countries | Fewer non-PDs got their PhDs from global North countries |
| International research collaborations | Frequently part of multi-country research teams | Seldom part of multi-country research teams |
| Co-authorship | Published more papers with foreign reputable authors | Published fewer papers with foreign reputable authors |
| Securing research grants and travel funds | Secured more grants and travel funds | Secured fewer grants and travel funds |
| Research approach | Less inclined to do radical research | More inclined to do radical research |
| Student supervisions | Supervised a larger number of postgraduate students | Supervised a smaller number of postgraduate students |
| Capacity development | More PDs took scientific writing and English writing courses | Fewer non-PDs took scientific writing and English writing courses |

Publication-Level Predictors

| Predictor | Positive Deviants | Non-Positive Deviants |
| --- | --- | --- |
| Length of paper | Longer papers | Shorter papers |
| Length of abstract | Longer abstracts | Shorter abstracts |
| Length of title | Longer titles | Shorter titles |
| Number of authors and affiliations | More authors and affiliations | Fewer authors and affiliations |
| Number of references | More references | Fewer references |
| Publication type | More journal articles and fewer conference papers | More conference papers and fewer journal articles |
| Quality of journals | Higher SJR journals | Lower SJR journals |
| Publishers | Published more in Elsevier Journals | Published less in Elsevier Journals |
| Topics | PDs publish fewer papers covering business process management and neural networks and published more papers in wireless sensor networks and hardware systems | Non-PDs publish more papers covering business process management and neural networks and published fewer papers in wireless sensor networks and hardware systems |

 Table 1: Significant transferable predictors of outperformance

The analysis also included a visualization of topic prevalence over time for the PD corpus and the non-PD corpus, as presented in Figure 2. It shows topics, such as Classification Models, where PDs were early movers and were later followed by non-PDs. Expert Systems and GIS-related topics are more prevalent in the PD corpus than in the non-PD corpus. Conversely, Neural Networks and Business Process Management & Process Mining are less prevalent in the PD corpus. There are also topics that had very similar proportions over time for both groups, such as Social Network Mining.
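
To make the idea of topic prevalence over time concrete, the sketch below computes yearly topic shares for a corpus and finds the first year in which a topic reaches a given share. It is illustrative only: it assumes each paper has already been assigned a single dominant topic (a real topic model yields a mixture of topics per paper), and the function names, the threshold and the toy records are hypothetical rather than taken from the study.

```python
from collections import Counter, defaultdict


def topic_proportions_by_year(papers):
    """Turn (year, topic) pairs into {year: {topic: share of that year's papers}}."""
    counts = defaultdict(Counter)
    for year, topic in papers:
        counts[year][topic] += 1
    return {
        year: {topic: n / sum(c.values()) for topic, n in c.items()}
        for year, c in sorted(counts.items())
    }


def first_year_reaching(proportions, topic, threshold):
    """First year in which `topic` makes up at least `threshold` of the corpus, else None."""
    for year, shares in proportions.items():
        if shares.get(topic, 0.0) >= threshold:
            return year
    return None


# Toy corpora (hypothetical records, not the study's data).
pd_papers = [
    (2010, "Classification Models"), (2010, "Expert Systems"),
    (2011, "Classification Models"), (2011, "Classification Models"),
]
non_pd_papers = [
    (2010, "Neural Networks"), (2010, "Neural Networks"),
    (2011, "Classification Models"), (2011, "Neural Networks"),
]

pd_props = topic_proportions_by_year(pd_papers)
non_pd_props = topic_proportions_by_year(non_pd_papers)

# An "early mover" pattern shows up as an earlier year for the PD corpus.
print(first_year_reaching(pd_props, "Classification Models", 0.5))
print(first_year_reaching(non_pd_props, "Classification Models", 0.5))
```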

Figure 2: Topic proportions of PD corpus (left) and non-PD corpus (right) over time

 Implications for practice and policy

This analysis cannot, of course, guarantee that applying these factors more broadly would lead to the same outcomes achieved by PDs. Nonetheless, there would be value in individual Southern researchers reflecting on the research- and paper-related behaviours that have been shown to be associated with positively-deviant research profiles. For instance, Southern researchers work in contexts of resource limitation, so research grants and travel funds are of the utmost importance. Including partners from Northern universities (as PDs do) increases the chances of securing funds, as those partners are more familiar with grant procurement processes and more experienced in writing proposals. Studying abroad also seems to put Southern researchers at an advantage: it not only equips them with the technical know-how and the degree needed to pursue their academic careers, but also helps them establish channels of collaboration with their supervisors and their PhD-granting universities that last long after they return to their home countries. Those long-standing relationships provide further access to research grants, either directly or via joint grant applications.

In terms of paper-related strategies, Southern researchers could avoid low-visibility local conferences and select journals instead, as these are more likely to deliver citations. Publishing with more authors (domestic and international) could also help pay for journal publication fees, with fees split across more authors or paid from overseas sources. Publishing with foreign authors could also help Southern researchers overcome the institutional biases[4] among editors, reviewers in single-blind or open review systems, and readers. PDs’ preference for working on established research areas rather than on radical research topics may also help in relation to institutional barriers, since research that builds incrementally on existing ideas and literature is more likely to be accepted for publication by referees and cited by others working in the established area. Hence, Southern researchers seeking more citations could consider contributing to mainstream topics that build on existing work. Along the same lines, having multiple authors and affiliations increases the likelihood of citations, as each author has their own network and bringing those networks together can increase readership. Similarly, publishing papers with a larger number of references increases paper visibility through citation-based search in databases that allow it, such as Google Scholar, and through the “tit-for-tat” hypothesis, i.e. authors tend to cite those who cite them.[5]

Higher education institutions and higher education policy makers may also reflect on the findings, and consider strategic implications for training, resource provision, collaborations, etc. For example, English and scientific/formal writing courses were associated with PD performance; such courses could be prerequisites for starting PhD research. There could be more academic training designed around research grant writing, along with guidance on the funding bodies that researchers can apply to. International research collaborations appeared as an important predictor of PD status, so university senior managers and policy makers can explore ways to reduce barriers and increase opportunities for overseas PhD study, post-PhD return, and ongoing joint research projects with global North universities.

Citation rates are, of course, not the “be all and end all” of research: there are and should be other motivations and indicators of research. However, we hope the findings presented here can provide valuable “food for thought” for global South researchers.

 ________ 

[1] The terms “South” and “Southern” will be used to refer to countries classified as upper-middle income, lower-middle income, and low income. Accordingly, the terms “North” and “Northern” will be used to refer to countries that are members of the OECD (Organisation for Economic Co-operation and Development) or are classified as high-income economies by the World Bank based on estimates of gross national income per capita.

[2] Blicharska, M., Smithers, R. J., Kuchler, M., Agrawal, G. K., Gutiérrez, J. M., Hassanali, A., Huq, S., Koller, S. H., Marjit, S., Mshinda, H. M., & Masjuki, H. (2017). Steps to overcome the North-South divide in research relevant to climate change policy and practice. Nature Climate Change, 7(1), 21–27.

[3] World Bank. (2020). Science & Technology Indicators. World Bank.

[4] Karlsson, S., Srebotnjak, T., & Gonzales, P. (2007). Understanding the North-South knowledge divide and its implications for policy: A quantitative analysis of the generation of scientific knowledge in the environmental sciences. Environmental Science and Policy, 10(7–8), 668–684.; Gibbs, W. W. (1995). Lost science in the third world. Scientific American, 273(2), 92–99.; Leimu, R., & Koricheva, J. (2005). What determines the citation frequency of ecological papers? Trends in Ecology & Evolution, 20(1), 28–32.

[5] Webster, G. D., Jonason, P. K., & Schember, T. O. (2009). Hot topics and popular papers in evolutionary psychology: Analyses of title words and citation counts in evolution and human behavior, 1979–2008. Evolutionary Psychology, 7(3), 147470490900700300.

 

Positive Deviance: A Data-Powered Approach to the Covid-19 Response

Nations around the world are struggling with their response to the Covid-19 pandemic.  In particular, they seek guidance on what works best in terms of preventive measures, treatments, and public health, economic and other policies.  Can we use the novel approach of data-powered positive deviance to improve the guidance being offered?

Positive Deviance and Covid-19

Positive deviants are those in a population that significantly outperform their peers.  While the terminology of positive deviance is absent from public discourse on Covid-19, the concept is implicitly present at least at the level of nations.  In an evolving list, countries like New Zealand, Australia, Taiwan, South Korea and Germany regularly appear among those seen as most “successful” in terms of their relative infection or death rates so far.

Here we argue first that the ideas and techniques of positive deviance could usefully be called on more directly; second that application of PD is probably more useful at levels other than the nation-state.  In the table below, we summarise four levels at which PD could be applied, giving potential examples and also potential explanators: the factors that underpin the outperformance of positive deviants.

Level: Nation[i]
Potential positive deviants: Countries with very low relative infection or death rates
Potential PD explanators:
  • Early lockdown
  • Extensive testing
  • Use of contact-tracing incl. apps
  • Cultural acceptance of mask-wearing
  • Prior mandatory TB vaccination
  • Quality of leadership

Level: Locality (Regions, Cities)[ii]
Potential positive deviants: Cities and regions with significantly slower spread of Covid-19 infection than peers
Potential PD explanators:
  • Extensive or innovative community education campaigns
  • Testing well in excess of national levels
  • Earlier-than-national lockdown
  • Extensive sanitisation of public transport
  • Quality and breadth of local healthcare
  • Quality of leadership

Level: Facility (Hospitals, Health Centres)[iii]
Potential positive deviants: Health facilities with significantly higher recovery rates than peers
Potential PD explanators:
  • Innovative use of existing (scarce) healthcare technologies / materials
  • Innovative use of new healthcare technologies: AI, new treatments
  • Level of medical staff expertise and Covid-19-specific training
Potential positive deviants: Health facilities with significantly lower staff infection rates than peers
Potential PD explanators:
  • Provision of high-quality personal protective equipment in sufficient quantity
  • Strict adherence to infection monitoring and control measures
  • Strict adherence to high-quality disinfection procedures
  • Innovative use of contact-free healthcare technologies: chat bots, robots, interactive voice response, etc

Level: Individual[iv]
Potential positive deviants: Individuals in vulnerable groups who contract full-blown Covid-19 and survive
Potential PD explanators:
  • Psychological resilience
  • Physical fitness
  • Absence of underlying health conditions
  • Effective therapies
  • Genetics

 

At present, items in the table are hypothetical and/or illustrative but they show the significant value that could be derived from identification of positive deviants and their explanators.  Those explanators that are under social control – such as use of technological solutions or policy/managerial measures – can be rapidly scaled across populations.  Those explanators such as genetics or pre-existing levels of healthcare capacity which are not under social control can be built into policy responses; for example in customising responses to particular groups or locations.

Evidence from positive deviance analysis can help currently in designing policies and specific interventions to help stem infection and death rates.  Soon it will be able to help design more-effective lockdown exit strategies as these start to show differential results, and as post-lockdown positive deviants start to appear.

However, positive deviance consists of two elements: not just outperformance but outperformance of peers. It is the “peers” element that confounds the value of positive deviance at the nation-state level.

Public discourse has focused mainly on supposedly outperforming nations [v]; yet countries are complex systems that make meaningful comparisons very difficult[vi]: dataset definitions are different (e.g. how countries count deaths); dataset accuracy is different (with some countries suspected of artificially suppressing death rates from Covid-19); population profiles and densities are different (countries with young, rural populations differing from those with old, urban populations); climates are different (which may or may not have an impact); health service capacities are different; pre-existing health condition profiles are different; testing methods are different; and so on.  Within all this, there is a great danger of apophenia: the mistaken identification of “patterns” in the data that are either not actually present or which are just random.

More valid and hence more useful will be application of positive deviance at lower levels.  Indeed, the lower the level, the more feasible it becomes to identify and control for dimensions of difference and to then cluster data into true peer groups within which positive deviants – and perhaps also some of their explanators – can then be identified.

Data-Powered Positive Deviance and Covid-19

The traditional approach to identifying positive deviants has been the field survey: going out into human populations (positive deviants have historically been understood only as individuals or families) and asking questions of hundreds or thousands of respondents. Not only is this time-consuming and costly, but it also becomes riskier, more difficult or even impractical during a pandemic.

Much better, then, is to look at analysis of large-scale datasets which may be big data[vii] and/or open data, since this offers many potential benefits compared to the traditional approach[viii].  Many such datasets already exist online[ix], while others may be accessed as they are created by national statistical or public health authorities.

Analytical techniques, such as those being developed by the Data-Powered Positive Deviance project, can then be applied: clustering the data into peer groups, defining the level of outperformance needed to be classified as a positive deviant, identifying the positive deviants, then interrogating the dataset further to see if any PD explanators can be extracted from it.
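
The sequence just described (form peer groups, set an outperformance threshold, identify the positive deviants) can be sketched as follows. This is a simplified illustration, not the project's method: peer groups are formed here by hypothetical population-density bands rather than by a clustering algorithm, outperformance is defined by an assumed z-score cutoff, and the field names and district records are invented for the example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def peer_group(density):
    """Assign a peer group from population density (hypothetical bands)."""
    if density < 150:
        return "rural"
    if density < 1000:
        return "intermediate"
    return "urban"


def find_positive_deviants(districts, z_cutoff=-2.0):
    """Flag districts whose case rate is far below the mean of their own peer group.

    Lower case rates are better, so a positive deviant has a strongly negative z-score.
    """
    groups = defaultdict(list)
    for d in districts:
        groups[peer_group(d["density"])].append(d)

    deviants = []
    for label, members in groups.items():
        rates = [m["cases_per_100k"] for m in members]
        mu, sigma = mean(rates), pstdev(rates)
        if sigma == 0:
            continue  # no variation within the group, so nothing can stand out
        for m in members:
            z = (m["cases_per_100k"] - mu) / sigma
            if z <= z_cutoff:
                deviants.append((m["name"], label, round(z, 2)))
    return deviants


# Invented records for illustration only.
districts = [
    {"name": "D1", "density": 90, "cases_per_100k": 100},
    {"name": "D2", "density": 110, "cases_per_100k": 110},
    {"name": "D3", "density": 80, "cases_per_100k": 105},
    {"name": "D4", "density": 120, "cases_per_100k": 95},
    {"name": "D5", "density": 100, "cases_per_100k": 40},
    {"name": "D6", "density": 95, "cases_per_100k": 102},
    {"name": "D7", "density": 130, "cases_per_100k": 98},
    {"name": "U1", "density": 2500, "cases_per_100k": 300},
    {"name": "U2", "density": 3000, "cases_per_100k": 310},
    {"name": "U3", "density": 2800, "cases_per_100k": 290},
    {"name": "U4", "density": 2600, "cases_per_100k": 305},
    {"name": "U5", "density": 2700, "cases_per_100k": 295},
]

print(find_positive_deviants(districts))
```

Comparing each district only with its own group is what keeps the "peers" element intact: a dense urban district is never judged against a sparsely populated rural one.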

An example already underway is clustering the 368 districts in Germany based on data from the country’s Landatlas dataset and identifying those which are outperforming in terms of spread of the virus.  Retrospective regression analysis is already suggesting structural factors that may be of importance in positive deviant districts: extent and nature of health infrastructure including family doctors and pharmacies, population density, and levels of higher education and of unemployment.

This can then be complemented in two directions – diving deeper into the data via machine learning to try to predict future spread of the disease; and complementing this large-scale open data with “thick data” using online survey and other methods to identify the non-structural factors that may underlie outperformance.  The latter particularly will look for factors under socio-political control such as policies on lockdown, testing, etc.

Of course, great care must be taken here. Even setting aside deliberate under-reporting, accuracy of the most basic measures – cases of, and deaths from, Covid-19 – has some inherent uncertainties[x]. Beyond accuracy are the broader issues of “data justice”[xi] as it applies to Covid-19-related analysis[xii], including:

  • Representation: the issue of who is and is not represented in datasets. Poorer countries, poorer populations and ethnic minority populations are often under-represented. If this is not accounted for, data analysis may be not only inaccurate but also unjust.
  • Privacy: arguments about the benefits of analysing data are being used to push out the boundaries of what is seen as acceptable data privacy; opening the possibility of greater state surveillance of populations. As Privacy International notes, any boundary-pushing “must be temporary, necessary, and proportionate”[xiii].
  • Access and Ownership: best practice would seem to be datasets that are publicly-owned and open-access with analysis that is transparently explained. The danger is that private interests seek to sequester the value of Covid-19-related data or its analysis.
  • Inequality: the key systems of relevance to any Covid-19 response are the economic and public health systems. These contain structural inequalities that benefit some more than others.  Unless data-driven responses take this into account, those responses may further exacerbate existing social fracture lines.

However, if these challenges can be navigated, then the potential of data-powered positive deviance can be effectively harnessed in the fight against Covid-19.  By identifying Covid-19 positive deviants, we can spotlight the places, institutions and people who are dealing best with the pandemic.  By identifying PD explanators, we can understand what constitutes best practice in terms of prevention and treatment; from public health to direct healthcare.  By scaling out those PD explanators within peer groups, we can ensure a much-broader application of best practice which should reduce infections and save lives.  And using the power of digital datasets and data analytics, we can do this in a cost- and time-effective manner.

The “Data-Powered Positive Deviance” project will be working on this over coming months.  We welcome collaborations with colleagues around the world on this exciting initiative and encourage you to contact the GIZ Data Lab or the Centre for Digital Development (University of Manchester).

This blogpost was co-authored by Richard Heeks and Basma Albanna and was originally published on the Data-Powered Positive Deviance blog.

 

 

[i] https://interestingengineering.com/7-countries-keeping-covid-19-cases-in-check-so-far; https://www.forbes.com/sites/avivahwittenbergcox/2020/04/13/what-do-countries-with-the-best-coronavirus-reponses-have-in-common-women-leaders; https://www.maskssavelives.org/; https://www.bloomberg.com/news/articles/2020-04-02/fewer-coronavirus-deaths-seen-in-countries-that-mandate-tb-vaccine

[ii] https://www.weforum.org/agenda/2020/03/how-should-cities-prepare-for-coronavirus-pandemics/; https://www.wri.org/blog/2020/03/covid-19-could-affect-cities-years-here-are-4-ways-theyre-coping-now; https://www.fox9.com/news/experts-explain-why-minnesota-has-the-nations-lowest-per-capita-covid-19-infection-rate; https://www.bbc.co.uk/news/world-asia-52269607

[iii] https://hbr.org/2020/04/how-hospitals-are-using-ai-to-battle-covid-19; https://www.cuimc.columbia.edu/news/columbia-develops-ventilator-sharing-protocol-covid-19-patients; https://www.esht.nhs.uk/2020/04/02/innovation-and-change-to-manage-covid-19-at-esht/; https://www.med-technews.com/topics/covid-19/; https://www.innovationsinhealthcare.org/covid-19-innovations-in-healthcare-responds/; https://www.cnbc.com/2020/03/23/video-hospital-in-china-where-covid-19-patients-treated-by-robots.html; https://www.researchprofessionalnews.com/rr-news-new-zealand-2020-4-high-quality-ppe-crucial-for-at-risk-healthcare-workers/; https://www.ecdc.europa.eu/sites/default/files/documents/Environmental-persistence-of-SARS_CoV_2-virus-Options-for-cleaning2020-03-26_0.pdf

[iv] https://www.sacbee.com/news/coronavirus/article241687336.html; https://www.thelocal.it/20200327/italian-101-year-old-leaves-hospital-after-recovering-from-coronavirus; https://www.vox.com/science-and-health/2020/4/8/21207269/covid-19-coronavirus-risk-factors; https://www.medrxiv.org/content/10.1101/2020.04.22.20072124v2; https://www.bloomberg.com/news/articles/2020-04-16/your-risk-of-getting-sick-from-covid-19-may-lie-in-your-genes

[v] Specifically, this refers to the positive discourse.  There is a significant “negative deviant” discourse (albeit, again, not using this specific terminology) that looks especially at countries and individuals which are under-performing the norm.

[vi] https://www.bbc.co.uk/news/52311014; https://www.theguardian.com/world/2020/apr/24/is-comparing-covid-19-death-rates-across-europe-helpful-

[vii] https://www.forbes.com/sites/ciocentral/2020/03/30/big-data-in-the-time-of-coronavirus-covid-19; https://healthitanalytics.com/news/understanding-the-covid-19-pandemic-as-a-big-data-analytics-issue

[viii] https://doi.org/10.1002/isd2.12063

[ix] E.g. via https://datasetsearch.research.google.com/search?query=coronavirus%20covid-19

[x] https://www.medicalnewstoday.com/articles/why-are-covid-19-death-rates-so-hard-to-calculate-experts-weigh-in; https://www.newsletter.co.uk/health/coronavirus/coronavirus-world-health-organisation-accepts-difficulties-teasing-out-true-death-rates-covid-19-2527689

[xi] https://doi.org/10.1080/1369118X.2019.1599039

[xii] https://www.opendemocracy.net/en/openmovements/widening-data-divide-covid-19-and-global-south/; https://www.wired.com/story/big-data-could-undermine-the-covid-19-response/; https://www.thenewhumanitarian.org/opinion/2020/03/30/coronavirus-apps-technology; https://botpopuli.net/covid19-coronavirus-technology-rights

[xiii] https://privacyinternational.org/examples/tracking-global-response-covid-19; see also https://globalprivacyassembly.org/covid19/