Country Similarity Index 2.0

The Country Similarity Index has been revised to reflect feedback received on this project. The goal is to align more closely with people’s intuitions. Some aspects of the index have been given more weight, some have been added, and others have been removed. Still, in most cases these changes should not greatly affect which countries come out as most and least similar to other countries. The maps and articles will be updated to the new version periodically.

Demographics: More weight has been given to race, language family, and religious origin, which are more obvious differences. Since differences in height, weight, and gender ratio between countries are less perceptible than other aspects, they have been given less weight. Household size was removed since it correlates highly with the number of children in a family. Population above 65 years old took its place, to represent the opposite end of the age range from children.

Culture: This section had the most changes. The language portion of culture was given a much greater weight than in the previous version, since this was consistently the biggest complaint in the feedback received. Before, the native language of the people was not accounted for in the culture section, only in demographics, which was a big mistake. In addition, the official language of the country is given a greater weight, but the language family of the official language is no longer considered. Previously this had made countries like Bolivia and Congo appear more similar than they really are, since the fact that their official languages belong to the same language family has less to do with culture than whether their native languages belong to the same family. This change also helps countries like Canada be more like other countries that have French as their official language, even though a relatively low percentage of the people there speak it. Marriage and divorce rates were moved from the demographics section to the culture section, although they could have been placed in either. The sports section was reworked to give less weight to the sporting success of countries and more weight to the sports they are interested in, since sporting success is highly correlated with economics. Diet is also highly correlated with economics. The previous version compared amounts of food, which is not necessarily cultural. More emphasis is now placed on the kinds of meat, fruits, vegetables, and staple foods eaten, not the amount. Finally, the writing script was given a slightly greater weight. Charitable activity, coffee consumption, and tea consumption were removed to make way for the greater emphasis on language.

Politics: This section did not change a lot, but there were a few minor improvements. The level of democracy was given a slightly greater weight. The head of state type was integrated into the executive type, to give the form of government less weight. Gambling, paid leave laws, and the right of abode were added. Immigration rate, which does not have much to do with politics, was removed.

Technology: This section also did not change a lot, but there were a few minor improvements. Container port traffic was added to create a greater difference between landlocked countries and countries with large seaports. The data on police officers per capita is not necessarily comparable between countries, so it was removed in favor of military personnel per capita, which seems like a better choice anyway. Statistics on radio broadcasting were removed since there is not a large difference between countries. Favorite websites were also reluctantly removed. A slightly greater weight was given to countries using the same power grid with the same frequency.

Geography: Previously, two countries with vastly different geographies could look more similar than they really are if neither had much agricultural land, for instance Solomon Islands and Saudi Arabia. The same goes for countries that do not have much forested land, like Bangladesh and Argentina. They are quite different, but if you only look at this one aspect, they are similar. Instead of looking at farmland percentage and forested percentage individually, it is better to look at the overall land cover mix. A shared absence of something should not by itself imply that two countries are similar. In addition, urbanization, air pollution, and light pollution were removed, while the weighting on population density was increased.
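
To make the land cover point concrete, here is a minimal sketch in Python. The index's actual formula is not given in this post, so both scoring functions below are assumptions chosen for illustration, and the two sets of shares are made-up placeholders rather than real statistics for any country.

```python
def per_feature_similarity(a, b, features):
    """Average of 1 - |difference| over the selected share features (0..1)."""
    return sum(1 - abs(a[f] - b[f]) for f in features) / len(features)


def mix_similarity(a, b):
    """Overlap of two land cover compositions whose shares each sum to 1.

    The overlap is 1.0 for identical mixes and 0.0 for mixes with no
    category in common.
    """
    categories = set(a) | set(b)
    return sum(min(a.get(c, 0.0), b.get(c, 0.0)) for c in categories)


# Hypothetical shares, for illustration only (not real data).
forested_island = {"farmland": 0.05, "forest": 0.80, "desert": 0.00, "other": 0.15}
desert_country = {"farmland": 0.05, "forest": 0.00, "desert": 0.90, "other": 0.05}

# Looking at farmland alone, the two places look identical.
farmland_only = per_feature_similarity(forested_island, desert_country, ["farmland"])

# Looking at the whole mix, they barely overlap.
mix = mix_similarity(forested_island, desert_country)

print(farmland_only, mix)
```

With a single-feature comparison, a shared lack of farmland counts as full agreement. The mix-based score only rewards land cover that both places actually have in common.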

Do you agree with the changes?
Please leave any thoughts in the comments section.


  1. So, will you now republish all countries from the beginning? I hope you won’t delete the old calculations; they are still interesting and deserve attention. What will the URLs look like? Will they show both the old calculation and the new calculation?

  2. I have used the Country Similarity Index 1.0 data to draw a schematic map of the world where similar countries are connected. I started with the most similar pair of countries (i.e. Netherlands and Belgium) and then I listed less similar pairs only if one of the countries hadn’t been included in the map, or if it was necessary to link two clusters that had been separate. (This was to prevent the map from becoming a confusing mess of lines where one country is linked to dozens of others.)
    The map:
    The source code: (the Neato program was used to draw the map from it)
    Frankly, I did it in some haste and I am afraid I might have made mistakes. I hope I will have time to do a revision.
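
The linking rule described in this comment (take pairs from most to least similar, and keep a pair only if it brings in a new country or joins two separate clusters) can be sketched as follows. This is not the commenter's actual source code, which is not reproduced here. The function names, the country labels A to D, and the scores are all hypothetical placeholders. The rule as described amounts to building a maximum spanning tree with a union-find structure.

```python
def link_similar_countries(pairs):
    """Keep a pair only if its two countries are not yet in the same cluster.

    pairs: iterable of (country_a, country_b, similarity) tuples.
    Returns the kept pairs, from most to least similar.
    """
    parent = {}

    def find(x):
        # A country not seen before starts as its own one-member cluster.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    kept = []
    for a, b, score in sorted(pairs, key=lambda p: p[2], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b  # merge the two clusters
            kept.append((a, b, score))
    return kept


def to_dot(edges):
    """Render the kept pairs as an undirected graph in DOT syntax."""
    lines = ["graph similarity {"]
    for a, b, _score in edges:
        lines.append(f'  "{a}" -- "{b}";')
    lines.append("}")
    return "\n".join(lines)


# Placeholder scores for illustration only (not values from the index).
sample = [
    ("A", "B", 90.0),
    ("B", "C", 85.0),
    ("A", "C", 84.0),  # skipped: A and C are already linked through B
    ("C", "D", 80.0),
]
edges = link_similar_countries(sample)
dot_text = to_dot(edges)
print(dot_text)
```

The printed text can be saved to a file and passed to Graphviz's neato, for example `neato -Tpng map.dot -o map.png`, to draw the map.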

    1. Very nice work. I was planning on doing something like this at the end of the process, but you beat me to it. My methodology might be slightly different: attaching each country to its two most similar countries, or to its most similar country plus the next most similar country that it is closer to than the first country is (e.g. USA is not more similar to Australia than Canada is, so they wouldn’t be attached). I am also thinking about doing a hierarchical clustering analysis using the average linkage method.
      Do you mind if I repost your work? Thanks!

      1. Sorry, I was impatient. 🙂 I will be glad if you repost my scheme and I am looking forward to yours at the end of the process.

      2. I made a few other charts:

        Each of the 127 countries you published is connected with its most similar country: This creates a lot of isolated groups.

        Each of the 127 countries you published is connected with its two most similar countries: Some of the groups got connected.

        The 358 most similar pairs of countries are connected (i.e. all pairs whose degree of similarity is 79.9 or higher): Now it is a bit messy.
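
The three charts above correspond to three simple rules for picking which pairs to draw. A minimal sketch of those rules follows. The function names, the country labels A to D, and the scores are hypothetical placeholders, not values from the index or the commenter's code.

```python
from collections import defaultdict


def top_k_edges(pairs, k):
    """Connect every country to its k most similar countries."""
    neighbors = defaultdict(list)
    for a, b, score in pairs:
        neighbors[a].append((score, b))
        neighbors[b].append((score, a))

    edges = set()
    for country, ranked in neighbors.items():
        for _score, other in sorted(ranked, reverse=True)[:k]:
            # Store each pair in sorted order so A-B and B-A count once.
            edges.add(tuple(sorted((country, other))))
    return edges


def threshold_edges(pairs, minimum):
    """Connect every pair whose similarity is at least `minimum`."""
    return {tuple(sorted((a, b))) for a, b, score in pairs if score >= minimum}


# Placeholder scores for illustration only (not values from the index).
sample = [
    ("A", "B", 90.0),
    ("B", "C", 85.0),
    ("A", "C", 84.0),
    ("C", "D", 80.0),
    ("B", "D", 70.0),
    ("A", "D", 60.0),
]

most_similar = top_k_edges(sample, 1)       # first chart
two_most_similar = top_k_edges(sample, 2)   # second chart
above_cutoff = threshold_edges(sample, 79.9)  # third chart

print(sorted(most_similar))
print(sorted(two_most_similar))
print(sorted(above_cutoff))
```

Raising k or lowering the cutoff adds edges, which is why the later charts connect more groups but also look messier.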

        The source codes:

      3. Nice, very interesting! 2.0 is coming out soon, and from what I’ve seen it is an improvement. It doesn’t change the order drastically, though.
