
Measuring the Big Data Knowledge Divide Using Wikipedia

Big data is of increasing importance yet, like all digital technologies, it is affected by a digital divide of multiple dimensions. We set out to understand one dimension: the big data ‘knowledge divide’, meaning the way in which different groups have different levels of knowledge about big data [1,2].

To do this, we analysed Wikipedia – as a global repository of knowledge – and asked: how does people’s knowledge of big data differ by language?

Our exploratory analysis looked at differences across ten languages in the production and consumption of the specific Wikipedia article entitled ‘Big Data’ in each language. The figure below shows initial results:

  • The Knowledge-Awareness Indicator (KAI) is the total number of views of the ‘Big Data’ article divided by the total number of views of all articles in each language, multiplied by 100,000 to produce an easier-to-grasp number. It covers the period 1 February – 30 April 2018.
  • ‘Total Articles’ measures the overall number of articles on all topics that were available for each language at the end of April 2018, to give a sense of the volume of language-specific material available on Wikipedia.

‘Big Data’ article knowledge-awareness, top-ten languages*

ko=Korean; zh=Chinese; fr=French; pt=Portuguese; es=Spanish; de=German; it=Italian; ru=Russian; en=English; ja=Japanese.
Note: Data analysed for 46 languages, 1 February to 30 April 2018.
* Figure shows the top-ten languages with the most views of the ‘Big Data’ article in this period.
Source: Author using data from the Wikimedia Toolforge team [3]
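
As a minimal sketch of the KAI calculation described above, the Python below divides article views by total views and scales the result by 100,000. The function name, the language codes and the view counts are illustrative assumptions, not the study's code or data.

```python
def knowledge_awareness_indicator(article_views, total_views, scale=100_000):
    """Views of one article per `scale` views of all articles in a language."""
    if total_views <= 0:
        raise ValueError("total_views must be positive")
    # Multiply before dividing so the result is an easy-to-read number.
    return article_views * scale / total_views


# Hypothetical view counts per language edition: (article views, all views).
# These numbers are invented for illustration only.
sample_views = {
    "xx": (1_200, 40_000_000),
    "yy": (900, 90_000_000),
}

kai_by_language = {
    lang: knowledge_awareness_indicator(article, total)
    for lang, (article, total) in sample_views.items()
}

# Rank languages from highest to lowest relative interest.
ranking = sorted(kai_by_language, key=kai_by_language.get, reverse=True)
print(kai_by_language, ranking)
```

Because the indicator is a share rather than a raw count, a language edition with fewer total views can still rank higher, as the invented ‘xx’ edition does here.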

 

Production. Because Wikipedia is built as a collaborative project, the production of content and its evolution can be used as a proxy for knowledge. A divide in the creation of content for the ‘Big Data’ article can be measured using two indicators. First, article size in bytes: longer articles would tend to represent the curation of more knowledge. Second, number of edits: seen as representing the pace at which knowledge is changing. Larger article size and a higher number of edits may allow readers to have greater and more current knowledge about big data. On this basis, we see English far ahead of other languages: the English article is significantly longer and significantly more edited.

Consumption. The KAI measures the level of relative interest in accessing the ‘Big Data’ article, which will also relate to the level of awareness of big data. Where English was the production outlier, Korean and, to a lesser extent, Chinese are the consumption outliers: the ‘Big Data’ article accounts for a significantly larger share of views in those languages than in others. This suggests a greater interest in and awareness of big data among readers using those languages. Assuming that accessed articles are read and understood, the KAI might also be a proxy for readers’ level of knowledge about big data.

We can draw two types of conclusion from this work.

First, and addressing the specific research question, we see important differences between language groups, reflecting an important knowledge divide around big data. On the production side, much more is being written and updated in English about big data than in other languages, potentially hampering non-English speakers from engaging with big data, at least in relative terms. This suggests value in encouraging not just more non-English Wikipedia writing on big data, but also non-English research (and/or translation of English research), given that research feeds Wikipedia writing. This value may be especially notable in relation to East Asian languages given that, on the consumption side, we found much greater relative interest in and awareness of big data among Wikipedia readers using those languages.

Second, and methodologically, we can see the value of using Wikipedia to analyse knowledge divide questions. It provides a reliable source of openly-accessible, large-scale data that can be used to generate indicators that are replicable and stable over time.

This research project will continue exploring the use of Wikipedia at the country level to measure and understand the digital divide in the production and consumption of knowledge, focusing specifically on materials in Spanish.

References

[1] Andrejevic, M. (2014). ‘Big Data, Big Questions | The Big Data Divide.’ International Journal of Communication, 8.

[2] Michael, M., & Lupton, D. (2015). ‘Toward a Manifesto for the “Public Understanding of Big Data”.’ Public Understanding of Science, 25(1), 104–116. doi: 10.1177/0963662515609005

[3] Wikimedia Toolforge (2018). Available at: https://tools.wmflabs.org/


From Digital Divide to Digital Provide: Spillover Benefits to ICT4D Non-Users

31 August 2011

ICTs bring benefits to those who have them and not to those who don’t. They therefore increase inequality.  Right?  Well . . . let’s see.

First question: what do you mean by “those who don’t have ICTs”?

We need something a bit more nuanced than a simple, binary digital divide, and can use instead a digital divide stack of four categories (see figure below):

Non-Users: those who have no access to either ICTs or ICT-based information and services.

Indirect Users: those who do not get hands-on themselves, but gain access to digital information and services via those who are direct users.

Shared Users: those who do not own the technology, but who directly use ICT owned by someone else (a friend, workplace, ICT business, community, etc).

Owner-Users: those who own and use the technology.

Of course, we would need to make transverse slices through the figure, potentially one slice for each type of ICT. In particular, many people in developing countries would sit in a different category for mobiles than for the Internet.

 

Second question: what’s the evidence on inequality?

It is relatively limited and often bad at differentiating which digital divide categories it’s talking about.  However, we can find three types of evidence.

The Rich Get Richer; The Poor Get Poorer: situations in which some category of user gains a benefit from ICT while non-users suffer a disbenefit.  For example, micro-producers of cloth in Nigeria who owned or had use of a mobile phone found they were gaining orders and income; micro-producers without mobile phone access found they were losing orders and income (to those who had phones). (See also work on growing costs of network exclusion.)

Development vs. Stasis: situations in which some category of user gains a benefit from ICT while non-users do not gain that benefit. For example, farmers in rural Peru who used a local telecentre were able to introduce improved agricultural practices and new crops, which increased their incomes.  Those who did not use the telecentre just continued farming in the same way as previously.

Spillover Benefits: situations in which some category of user gains a benefit from ICT while non-users also gain a (lesser) benefit.  One rather less-publicised outcome from the case of Keralan fishermen using mobile phones to check market prices is an example.  Those fishermen without mobile phones saw their profit rise by an average Rs.97 (c.US$2) per day as a result of the general improvements in market efficiency and reduced wastage which phones introduced.  This was about half the profit increase seen by phone owners and meant, even allowing for the additional costs, that returns to phone ownership were greater than those for non-ownership.  However, it was a spillover benefit to non-ICT-users.

ICT4D research specifically on spillovers to non-users has been rare. The main interest in non-users has been to understand why they are non-users, and most spillover work has been done between sectors or enterprises and/or has focused on spillovers that encourage ICT adoption rather than on more immediate benefits.

This does seem to be changing, perhaps because of the growth of mobile, and it relates to earlier work on the externalities to non-users of the arrival of rural telecommunications. Rob Jensen’s Kerala study found a second digital spillover: while fishermen’s revenues rose, the price per kg fell due to the increase in supply arising from less waste. Fish consumers (many likely non-users) now paid less than previously thanks to the mobile-induced efficiency gains. More directly, a study of M-PESA’s community effects in Kenya found its use providing positive financial, employment, security and capital accumulation externalities that affected both users and non-users within the community.

We also have a little evidence of spillover benefits from owner-users to indirect users:

– Follow-up work with Keralan fishermen found fish workers who will only get into a boat with a mobile phone-owner due to safety concerns, with these indirect users able to benefit from the owner should the boat get into difficulties. That paper’s author (personal email) also gives the example of an indirect user citing as a benefit being informed of – and able to curtail – his daughter’s illicit elopement via his boat owner’s phone.

– Research on farmers in Northern Ghana[1] found those who did not themselves own or use mobiles benefitting from information passed on from phone owners, including more frequent meetings with agricultural extension officers; meetings that were coordinated by phone owners.

In all these cases, owner-users are benefitting more than the lower-category users to whom benefits spill over. That means – if you’ll forgive the pun – that in these cases ICTs are causing all boats to rise but the ICT-using boats to rise somewhat faster. Inequality may still grow, though perhaps in absolute rather than relative terms.
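
To make the absolute-versus-relative distinction concrete, here is a small sketch in Python. The daily incomes and the function name are invented for illustration; they are assumptions, not data from the studies cited.

```python
def inequality_gaps(user_income, non_user_income):
    """Return (absolute gap, income ratio) between a user and a non-user."""
    if non_user_income <= 0:
        raise ValueError("non_user_income must be positive")
    return user_income - non_user_income, user_income / non_user_income


# Hypothetical daily incomes before ICT arrives: user 200, non-user 100.
before = inequality_gaps(200, 100)

# Hypothetical incomes afterwards: the user gains 100, and the non-user
# gains a smaller spillover benefit of 50.
after = inequality_gaps(300, 150)

print("before:", before)
print("after:", after)
```

In this invented case the user gains twice as much as the non-user, so the absolute gap widens from 100 to 150 while the income ratio stays at 2. With different starting incomes the ratio could rise or fall, which is why the relative outcome is uncertain.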

I look forward to what appears to be forthcoming work by the Global Impact Study on non-user spillovers. However, this remains a poorly-understood and little-researched issue, one that needs a greater focus since it is central to understanding the digital divide and digital inequalities. It also has implications for practice, suggesting ICT4D projects should promote non-user spillovers as much as they promote ICT usage. As ever, your pointers to spillover research and practice are welcome.


[1] Smith, M. (2010) A Technology of Poverty Reduction for Non-Commercial Farmers? Mobile Phones in Rural North Ghana, BA dissertation, unpublished, University of Oxford, UK
