Repo

Reference Phabricator tickets

Current update

  • Complete datasets as of 2021/09/30.

Feedback.

  • Feedback should be sent to goran.milovanovic_ext@wikimedia.de.

Summary

In this Report we consider the data on (a) the number of Wikidata active editors (separately in the items namespace and all namespaces), and (b) the number of Wikidata edits (also taken separately in the items namespace and all namespaces). We analyse the data from 2020/09 to 2021/08.

We contrast the data for the countries belonging to the Global North and Global South, following the Brandt Line as the operational definition of the regions.

The most important findings are:

  • On the level of the Global North vs Global South comparison, the contributions to Wikidata from the countries of the Global North overall surpass those made by editors from the countries of the Global South.

  • In the fine-grained, per-country analyses, however, we show that the Global South has more positive dynamics in terms of its contributions to Wikidata: a higher percent of countries in the Global South are expanding the extent of their contributions, while a higher percent of countries in the Global North are narrowing the extent of their contributions to Wikidata. This finding was replicated for the number of active editors as well as for the number of edits made from 2020/09 to 2021/08, across all namespaces as well as in the items namespace (NS:0) separately. The analysis of the dynamics of the Wikidata contributions from the Global North and Global South shows that the Global South was more agile in the time span of this analysis (2020/09 - 2021/08).

  • In our analysis we were able to identify a number of countries whose extent of contribution to Wikidata could be considered critical, in the sense of showing negative dynamics over the time span (2020/09 - 2021/08) considered in this Report; those countries are singled out in the respective tables.

0. Data Acquisition

NOTE A. The Data Acquisition code is not fully reproducible from this Report. The data are collected by running the WD_GlobalSouth_202109.R script from WMF Analytics Client(s), collecting the data as .csv files from the WMF Data Lake.

NOTE B. The countries from the WMF Country Protection List were removed from the datasets.

1. Active Editors

1.1 Active Editors Dataset

The Active Editors dataset presents data on the number of active editors per country. The dataset encompasses the following fields:

  • id - the observation ID
  • country_code - the country ISO 3166 alpha-2 code
  • month - the observation year and month in the YYYY-MM format
  • active_editors - number of active editors expressed in intervals: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, > 100
  • active_editors_ns0 - number of active editors in the items namespace (NS:0) expressed in intervals: 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, > 100

In the next step, we enrich the Active Editors dataset by adding (a) country names, and (b) including information on Global North/Global South classification (see List of countries by regional classification). The Brandt Line was used to establish the Global North vs Global South dichotomy in cases not immediately evident from the data.
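The enrichment step amounts to a lookup join on the country code. The following is a minimal Python sketch and not the code used for this Report (which relies on an R script that is not reproduced here); the lookup tables `COUNTRY_NAMES` and `REGION`, the function `enrich`, and the sample observations are hypothetical and only illustrate the shape of the data.

```python
# Hypothetical lookup tables; in the Report these come from an external
# list of countries by regional classification.
COUNTRY_NAMES = {"DE": "Germany", "BR": "Brazil"}
REGION = {"DE": "Global North", "BR": "Global South"}


def enrich(rows):
    """Add country name and region to each observation.

    Country codes missing from the lookup tables get None, so they can be
    reviewed and classified by hand (e.g. by following the Brandt Line).
    """
    enriched = []
    for row in rows:
        code = row["country_code"]
        enriched.append({
            **row,
            "country": COUNTRY_NAMES.get(code),
            "region": REGION.get(code),
        })
    return enriched


# Made-up observations with the same fields as the Active Editors dataset.
sample = [
    {"id": 1, "country_code": "DE", "month": "2020-09",
     "active_editors": "> 100", "active_editors_ns0": "> 100"},
    {"id": 2, "country_code": "BR", "month": "2020-09",
     "active_editors": "21-30", "active_editors_ns0": "11-20"},
]
enriched_sample = enrich(sample)
```

Returning None for unmatched codes, rather than dropping the rows, keeps the cases that need a manual classification visible.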

1.2 Active Editors Analysis

We find a total of 184 countries in the analysis, of which 62 belong to the Global North and 122 to the Global South.

1.2.1 Active Editor Classes

The following chart shows the percent of countries in the Global North and Global South, in each month from September 2020 to August 2021, which were found in the respective categories of user activity:

The contrast is obvious: consistently over time, most of the countries in the Global North maintain > 100 active editors monthly, while most of the countries in the Global South maintain 1 - 10. As can be seen from the left panel of the chart, the runner-up class of active editors in the Global North consists of the countries that maintain 1 - 10 editors monthly.

The following chart shows the same data, except that we now consider only editors active in the items namespace (NS:0).
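The percentages behind such a chart can be obtained by counting countries per class within each region and month. A minimal Python sketch, assuming enriched observations with the hypothetical field names `region`, `month`, and `active_editors`; the function name `class_shares` and the toy rows are illustrative only and are not taken from the Report's data.

```python
from collections import Counter, defaultdict


def class_shares(rows):
    """Percent of countries per (region, month) found in each editor class."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[(row["region"], row["month"])][row["active_editors"]] += 1
    shares = {}
    for key, counter in counts.items():
        total = sum(counter.values())
        shares[key] = {cls: 100 * n / total for cls, n in counter.items()}
    return shares


# Toy observations (made-up): three countries of one region in one month.
toy_rows = [
    {"region": "Global South", "month": "2020-09", "active_editors": "1-10"},
    {"region": "Global South", "month": "2020-09", "active_editors": "1-10"},
    {"region": "Global South", "month": "2020-09", "active_editors": "> 100"},
]
toy_shares = class_shares(toy_rows)
```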

As we can see, the result is (qualitatively, and almost fully) replicated when looking into the items namespace only.

1.2.2 Average Active Editors Rank

The active editors classes - 1-10, 11-20, 21-30, 31-40, 41-50, 51-60, 61-70, 71-80, 81-90, 91-100, > 100 - can be described as active editors ranks by assigning the rank of 11 to > 100, 10 to 91-100, 9 to 81-90, ..., 1 to 1-10. The following chart depicts the change of the average active editors rank with time for the countries of the Global North and Global South.

We can see that the countries of the Global South have a consistently lower average active editors rank over time. In the following chart we replicate the finding considering only editors active in the items namespace (NS:0).
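The class-to-rank mapping described above can be written down directly. This is a minimal Python sketch for illustration; the names `EDITOR_CLASSES`, `EDITOR_RANK`, and `average_rank` are hypothetical and not part of the Report's code.

```python
# Classes in ascending order; the position in the list (starting at 1)
# is the rank, so "1-10" has rank 1 and "> 100" has rank 11.
EDITOR_CLASSES = [
    "1-10", "11-20", "21-30", "31-40", "41-50", "51-60",
    "61-70", "71-80", "81-90", "91-100", "> 100",
]
EDITOR_RANK = {label: rank for rank, label in enumerate(EDITOR_CLASSES, start=1)}


def average_rank(labels):
    """Mean rank of a non-empty group of countries, given their class labels."""
    return sum(EDITOR_RANK[label] for label in labels) / len(labels)
```

For example, `average_rank(["1-10", "> 100"])` averages the ranks 1 and 11.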

1.2.2 Editor Activity Monthly Change Index

1.2.2.1 In all namespaces

Looking at each country over time, we can track the change in its editor activity in the following way. For example, a country that was in the > 100 active editors category in September 2020 and in the 91 - 100 active editors category in October 2020 has changed its active editors rank by -1. Vice versa, a country that moved from 91 - 100 to > 100 has changed by +1; a country that moved from 1 - 10 (rank 1) to 41 - 50 (rank 5) has changed by +4, etc. In the following chart we take the average monthly change - as described - per month, and contrast Global North vs Global South over time.

There is no obvious pattern in the average monthly change of the active editors rank for either Global North or Global South countries. Both groups show similar oscillations over time, and thus it cannot be said from this data alone that the Global South countries exhibit a pattern of positive change.
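The monthly change described above is the difference between the ranks of two consecutive months. A minimal Python sketch under stated assumptions: the function names and the placeholder country codes `AA` and `BB` are hypothetical, the class labels are made up, and every country is assumed to have one observation for each month.

```python
def editors_rank(label):
    """Rank 1..11 for the classes '1-10', ..., '91-100', '> 100'."""
    label = label.replace(" ", "")
    if label.startswith(">"):
        return 11
    upper_bound = int(label.split("-")[1])
    return upper_bound // 10


def monthly_changes(labels_by_month):
    """Rank differences between consecutive months for one country."""
    ranks = [editors_rank(label) for label in labels_by_month]
    return [later - earlier for earlier, later in zip(ranks, ranks[1:])]


def average_change_per_month(series_by_country):
    """Average the month-to-month changes across a group of countries.

    series_by_country maps a country code to its class labels ordered by
    month; all series are assumed to cover the same months.
    """
    changes = [monthly_changes(series) for series in series_by_country.values()]
    return [sum(step) / len(step) for step in zip(*changes)]


# Placeholder countries with made-up class labels for three months.
toy_series = {
    "AA": ["> 100", "91-100", "> 100"],   # changes: -1, +1
    "BB": ["1-10", "1-10", "11-20"],      # changes:  0, +1
}
toy_average = average_change_per_month(toy_series)
```

Computing the averages separately for the countries of each region gives the two series that are contrasted in the chart.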

However, we have noticed large variations across countries in their change in the average rank of active editors class over time. A per-country, finer-grain analysis shows the following results:

    1. Some countries of the Global South exhibited the highest (positive) change in average rank of active editors class in the previous 12 months (from 2020/09 to 2021/08, the time span of this analysis);
    2. Also, some other countries of the Global South are positioned at the very bottom of the list in that respect; finally,
    3. Most of the countries encompassed by this analysis tend to have a zero change of average rank of active editors classes.

The per-country data are given in the following table:

We split the countries into (a) those with a positive change in average rank of active editors class, (b) those with a negative change, and (c) those with a zero change, and cross-tabulate that information with Global North vs Global South. We find that 16.95% of countries in the Global North and 13.54% of countries in the Global South had an overall negative change in the previous twelve months, while 5.09% of countries in the Global North and 23.96% of countries in the Global South had an overall positive change in the previous twelve months.

We can conclude that a large percent (23.96%) of the countries in the Global South were expanding the extent of their contributions to Wikidata, but also that a non-negligible percent of them (13.54%) were contributing less and less. Taking the perspective of the Global South, the contributions from the following countries seem to be critical; all of them had a negative change in the average rank of active editors class from 2020/09 to 2021/08:
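The cross-tabulation can be sketched as follows. This is an illustrative Python example, not the Report's code: the function names are hypothetical, the overall change per country is assumed to be already computed, and the toy values are made up and do not reproduce the percentages reported above.

```python
from collections import Counter


def change_sign(change):
    """Classify an overall change as 'positive', 'negative' or 'zero'."""
    if change > 0:
        return "positive"
    if change < 0:
        return "negative"
    return "zero"


def crosstab_percent(countries):
    """Percent of countries per region falling into each change category.

    countries is a list of (region, overall_change) pairs.
    """
    countries = list(countries)
    totals = Counter(region for region, _ in countries)
    counts = Counter((region, change_sign(change)) for region, change in countries)
    return {
        (region, sign): 100 * n / totals[region]
        for (region, sign), n in counts.items()
    }


# Made-up overall changes for four countries of one region.
toy_countries = [
    ("Global South", 1.0),
    ("Global South", 0.0),
    ("Global South", -0.5),
    ("Global South", 2.0),
]
toy_table = crosstab_percent(toy_countries)
```

Percentages are computed within each region, so the three categories of a region sum to 100%.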

We now repeat the whole Editor Activity Monthly Change Index analysis for editor activity in the items namespace (NS:0).

1.2.2.2 In items (NS:0) namespace only

In the following chart we take the average monthly change - as described - per month, and contrast Global North vs Global South over time, for active editors in the items (NS:0) namespace only.

Again, no obvious pattern emerges from the analysis of the aggregated data. Now we take a dive into the per-country analysis.

The per-country data are given in the following table:

Again, we split the countries into (a) those with a positive change in average rank of active editors class, (b) those with a negative change in average rank of active editors class, and (c) those with a zero change in average rank of active editors class, and cross-tabulate that information with Global North vs Global South:

The results in the items namespace are quite similar to those from all namespaces. The list of countries in the Global South with the highest negative change in the average rank of active editors class from 2020/09 to 2021/08 in the items (NS:0) namespace follows.

2. Edits

2.1 Edits Dataset

The Edits dataset presents data on the number of revisions made per country. The dataset encompasses the following fields:

  • id - the observation ID
  • country_code - the country ISO 3166 alpha-2 code
  • month - the observation year and month in the YYYY-MM format
  • edits - number of edits expressed in intervals: 1-100, 101-200, 201-300, 301-400, 401-500, 501-600, 601-700, 701-800, 801-900, 901-1000, > 1000
  • edits_ns0 - number of edits in the items namespace (NS:0) expressed in intervals: 1-100, 101-200, 201-300, 301-400, 401-500, 501-600, 601-700, 701-800, 801-900, 901-1000, > 1000

In the next step, we enrich the Edits dataset by adding (a) country names, and (b) including information on Global North/Global South classification (see List of countries by regional classification). The Brandt Line was used to establish the Global North vs Global South dichotomy in cases not immediately evident from the data.

2.2 Edits Analysis

We find a total of 184 countries in the analysis, of which 62 belong to the Global North and 122 to the Global South.

2.2.1 Edit Classes

The following chart shows the percent of countries in the Global North and Global South, in each month from September 2020 to August 2021, which were found in the respective edit categories:

The result qualitatively replicates the finding obtained from the respective analysis of active editor classes.

The following chart shows the same data, except that we now consider only edits made in the items namespace (NS:0).

As we can see, the result is (qualitatively, and almost fully) replicated when looking into the items namespace only.

2.2.2 Average Edits Rank

The analysis follows the same logic as explained in section 1.2.2 except for the difference in the intervals used for the edit classes.
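Only the interval width changes when ranking edit classes. A minimal Python sketch, assuming the edit classes listed in section 2.1; the function name `edits_rank` is hypothetical.

```python
def edits_rank(label):
    """Rank 1..11 for the classes '1-100', ..., '901-1000', '> 1000'."""
    label = label.replace(" ", "")
    if label.startswith(">"):
        return 11
    upper_bound = int(label.split("-")[1])
    return upper_bound // 100


# Ranks of the lowest, an intermediate, and the highest edit class.
example_ranks = [edits_rank(label) for label in ("1-100", "401-500", "> 1000")]
```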

In the following chart we replicate the finding considering edits in the items namespace (NS:0) only.

2.2.2 Edits Monthly Change Index

2.2.2.1 In all namespaces

Again, the analysis follows the same logic as exemplified in the respective analysis of the active editors data.

Again, no obvious pattern is present in the analysis of aggregate data. We take a look at per-country data now:

Now we split the countries into (a) those with a positive change in average rank of edit class, (b) those with a negative change in average rank of edit class, and (c) those with a zero change in average rank of edit class, and cross-tabulate that information with Global North vs Global South:

Taking the perspective of the Global South, the contributions from the following countries seem to be critical in a way; all of them had a negative change in the average rank of edit class from 2020/09 to 2021/08:

We now repeat the whole Edits Monthly Change Index analysis for edits in the items namespace (NS:0).

2.2.2.2 In items (NS:0) namespace only

In the following chart we take the average monthly change - as described - per month, and contrast Global North vs Global South over time, for edits in the items (NS:0) namespace only.

Again, no obvious pattern emerges from the analysis of the aggregated data. The per-country data are given in the following table:

Again, we split the countries into (a) those with a positive change in average rank of edit class, (b) those with a negative change in average rank of edit class, and (c) those with a zero change in average rank of edit class, and cross-tabulate that information with Global North vs Global South:

The results in the items namespace are quite similar to those from all namespaces. The list of countries in the Global South with the highest negative change in the average rank of edit class from 2020/09 to 2021/08 in the items (NS:0) namespace follows.

