Blavatnik Index of Public Administration

Any analytical exercise needs to consider the quality of its data carefully, and this is particularly important when combining data from different sources. This section explores the quality of country data, in particular data availability, to determine which countries are included in the Index’s calculations.

Of the 216 countries and territories that have data for at least one of the 86 metrics used for the global index, only one (Mexico) has data for all 86 metrics. An approach is therefore needed to assess a country’s data coverage when determining its inclusion in the calculation of the Index.

A simple approach for determining coverage would be to take the total number of metrics a country has data for (\(m_c\)) as a proportion of the total number of possible metrics (\(M\)).

\[ p_m = \frac{m_c}{M} \]

However, this approach does not consider how a country’s data is spread across the structure of our conceptual framework and data model: a country with a higher proportion of metrics concentrated in fewer themes would do better on this simple measure than a country with a lower proportion of metrics spread across more themes. As described in section 6, missing data is not subject to imputation. Good coverage of the data model is therefore an important consideration, to ensure that the implicit estimate of missing data comes from “sibling” components in the data model.
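The limitation of the simple proportion can be illustrated with a minimal Python sketch. The data model here (three themes of four metrics each) and the two example countries are hypothetical, chosen only to show that \(p_m\) cannot distinguish concentrated data from spread data.

```python
def metric_proportion(metrics_by_theme: dict[str, int], total_metrics: int) -> float:
    """Return p_m = m_c / M, where m_c is the country's total metric count."""
    m_c = sum(metrics_by_theme.values())
    return m_c / total_metrics


def themes_with_data(metrics_by_theme: dict[str, int]) -> int:
    """Count the themes for which the country has at least one metric."""
    return sum(1 for count in metrics_by_theme.values() if count > 0)


# Hypothetical data model: 3 themes with 4 metrics each, so M = 12.
TOTAL_METRICS = 12

# Hypothetical countries: same number of metrics, different spread.
concentrated = {"theme_a": 4, "theme_b": 4, "theme_c": 0}
spread = {"theme_a": 3, "theme_b": 3, "theme_c": 2}

p_concentrated = metric_proportion(concentrated, TOTAL_METRICS)
p_spread = metric_proportion(spread, TOTAL_METRICS)
```

Both countries have the same \(p_m\), although the first has no data at all for one theme. A coverage measure that also reflects spread is needed to tell them apart.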

Based on the chi-square test, we have developed an algorithm for assessing a country’s data availability. This algorithm assesses both the volume of metrics a country has and the spread of that data over the data model. For each country-theme combination, an information quotient is calculated as the square of the difference between the total number of metrics a country has for that theme (\(m_{t,c}\)) and the number of indicators a country has for that theme (\(i_{t,c}\)), divided by the overall number of indicators in that theme (\(i_t\)). The overall data coverage score for a country (\(Q_c\)) is calculated as the sum of these country-theme information quotients, adjusted for overall coverage across the themes, which is calculated as the number of themes a country has data for (\(t_c\)) divided by the overall number of themes (\(T\)).

\[ Q_c= \displaystyle\sum_{t=1}^{t_c} \frac{(m_{t,c}-i_{t,c}+1)^2}{i_t} \]
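As a rough illustration, the displayed formula can be sketched in Python. This is not the Index's own implementation: the `ThemeCoverage` type, the function name and the example numbers are hypothetical. The sum follows the formula exactly as displayed, including the \(+1\) term. Because the displayed formula does not show the \(t_c / T\) adjustment described in the text, the sketch offers it only as an optional multiplier, and treating it as a multiplication is an assumption.

```python
from typing import NamedTuple


class ThemeCoverage(NamedTuple):
    """Counts for one country-theme combination (hypothetical container)."""
    metrics_with_data: int      # m_{t,c}
    indicators_with_data: int   # i_{t,c}
    indicators_in_theme: int    # i_t


def coverage_score(themes: list[ThemeCoverage],
                   adjust_for_theme_coverage: bool = False) -> float:
    """Sum the information quotients over the themes a country has data for.

    Each quotient is (m_tc - i_tc + 1)^2 / i_t, as in the displayed formula.
    """
    covered = [t for t in themes if t.metrics_with_data > 0]
    total = sum(
        (t.metrics_with_data - t.indicators_with_data + 1) ** 2
        / t.indicators_in_theme
        for t in covered
    )
    if adjust_for_theme_coverage and themes:
        # ASSUMPTION: the "adjustment" is a multiplication by t_c / T.
        total *= len(covered) / len(themes)
    return total


# Hypothetical country with data in two of three themes.
example = [
    ThemeCoverage(metrics_with_data=5, indicators_with_data=2, indicators_in_theme=3),
    ThemeCoverage(metrics_with_data=3, indicators_with_data=3, indicators_in_theme=4),
    ThemeCoverage(metrics_with_data=0, indicators_with_data=0, indicators_in_theme=2),
]
q_unadjusted = coverage_score(example)
q_adjusted = coverage_score(example, adjust_for_theme_coverage=True)
```

Squaring the per-theme term rewards themes where a country holds many metrics, while summing over themes means that a theme with no data contributes nothing to the score.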

Having calculated the data coverage score, we can set criteria to determine inclusion in the calculation of the Index and its components. Two principal criteria were selected:

  • The overall data coverage score (\(Q_c\)) is greater than or equal to half its theoretical maximum value (i.e. the score a country would achieve if it had data for all metrics). The maximum data coverage score is 161.08; the threshold value is therefore 80.54.
  • While the overall score does take the number of metrics into consideration, the overall percentage of metrics a country has (\(p_m\)) is also retained as a secondary threshold. It was decided to include only countries and territories with at least two-thirds of overall metrics.

These criteria give rise to a selection of 120 countries and territories.[1]
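The two thresholds can be expressed as a short check. This is a minimal sketch: the constant and function names are hypothetical, and it covers only the two numerical criteria stated above, not any further editorial decisions about which entities are included.

```python
# Maximum data coverage score, as stated in the text.
Q_MAX = 161.08

# First criterion: at least half the theoretical maximum score.
Q_THRESHOLD = Q_MAX / 2

# Second criterion: at least two-thirds of overall metrics.
P_THRESHOLD = 2 / 3


def meets_inclusion_criteria(q_c: float, p_m: float) -> bool:
    """Return True if both numerical thresholds are met.

    q_c is the data coverage score; p_m is the proportion of metrics
    with data, expressed as a fraction between 0 and 1.
    """
    return q_c >= Q_THRESHOLD and p_m >= P_THRESHOLD
```

For example, `meets_inclusion_criteria(98.9, 0.744)` is true, while a country with a high score but only 60% of metrics would fail the second criterion.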

The inclusion criteria provide a binary state of whether a country is included or not. However, the same measures can also be used to devise a grading scheme that gives a better sense of an individual country’s data coverage. For countries and territories above the thresholds for inclusion in the Index, the lower quartile, median and upper quartile of their data coverage score and percentage of metrics are used to define the boundaries for the first four grades (A-D). For countries and territories below the thresholds for inclusion in the Index, the median of their data coverage score and percentage of metrics is used to define the boundary between the fifth and sixth grades (E-F).
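The boundary calculation for a single measure can be sketched as follows. Several details are assumptions rather than the Index's documented method: the function names are hypothetical, the quantile interpolation method (`"inclusive"`) is a guess, and the sketch grades one measure at a time because the way the two measures are combined into a single grade is not specified here.

```python
from statistics import median, quantiles


def grade_boundaries(included_values: list[float],
                     excluded_values: list[float]) -> dict[str, float]:
    """Return the lower bounds of grades A, B, C and E for one measure.

    included_values: the measure for countries above the inclusion thresholds.
    excluded_values: the measure for countries below the thresholds.
    """
    # ASSUMPTION: the interpolation method is not stated in the source.
    lower_q, mid, upper_q = quantiles(included_values, n=4, method="inclusive")
    return {"A": upper_q, "B": mid, "C": lower_q, "E": median(excluded_values)}


def grade_measure(value: float, included: bool,
                  bounds: dict[str, float]) -> str:
    """Grade one measure for one country against the given boundaries."""
    if value == 0:
        return "Ungraded"
    if included:
        for letter in ("A", "B", "C"):
            if value >= bounds[letter]:
                return letter
        return "D"
    return "E" if value >= bounds["E"] else "F"


# Hypothetical scores, for illustration only.
bounds = grade_boundaries([1, 2, 3, 4, 5], [10, 20, 30])
```

With these hypothetical inputs the quartiles of the included group fall at 2, 3 and 4, and the median of the excluded group at 20, so an included country scoring 3.5 would be graded B on this measure.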

Summary of data coverage metrics by grade

| Data coverage grade | Data coverage score | Percent of metrics | Number of countries |
|---|---|---|---|
| Countries included in the Index | | | |
| A | \(Q_c\) ≥ 127.50 | \(p_m\) ≥ 90.24% | 30 |
| B | 132.00 > \(Q_c\) ≥ 116.17 | 90.24% > \(p_m\) ≥ 85.37% | 27 |
| C | 123.83 > \(Q_c\) ≥ 103.83 | 86.59% > \(p_m\) ≥ 80.49% | 28 |
| D | 108.33 > \(Q_c\) ≥ 80.54 | 82.93% > \(p_m\) ≥ 66.67% | 35 |
| Countries not included in the Index | | | |
| E | \(Q_c\) ≥ 53.25 | \(p_m\) ≥ 48.78% | 45 |
| F | 53.25 > \(Q_c\) > 0 | 48.78% > \(p_m\) > 0% | 46 |
| Ungraded (no data) | \(Q_c\) = 0 | \(p_m\) = 0% | 43 |

Summary of country coverage by region and income groups

| | Countries included in the index | UN members not included in the index | Other entities not included in the index |
|---|---|---|---|
| All countries and territories | 120 | 74 | 61 |
| Geographic region | | | |
| Americas | 22 | 13 | 22 |
| Asia and Pacific | 18 | 19 | 18 |
| Eastern Europe | 20 | 2 | 1 |
| Middle East, North Africa & Central Asia | 15 | 14 | 4 |
| Sub-Saharan Africa | 29 | 19 | 7 |
| Western Europe | 16 | 7 | 8 |
| Income groups | | | |
| High income | 39 | 21 | 24 |
| Low income | 13 | 13 | 0 |
| Lower middle income | 33 | 21 | 0 |
| Upper middle income | 35 | 18 | 1 |
| No World Bank income classification | 0 | 1 | 36 |

Use the table below to find the data coverage results for individual countries.


  1. Hong Kong appears separately in many of the data sources used to calculate the Blavatnik Index. While it was above the thresholds for inclusion, with a data coverage score of 98.9 and 74.4% of total metrics, it has not been included as it is a territory of China rather than a separate national entity.