The Country Similarity Index attempts to quantify how similar countries are to each other relative to other countries. It is a statistically based measure that gives equal weight to five major aspects of countries: their demographics, culture, politics, infrastructure, and geography. The methodology is exactly the same for each country.
The data from the Country Similarity Index was used to cluster countries into different regions by average linkage. This resulted in 9 distinct macro-regions:
- Western World
- Central & South America
- Middle East & North Africa
- Sub-Saharan Africa
- Central Asia
- South Asia
- East Asia
- Southeast Asia
- South Pacific
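As a rough illustration of how average linkage clustering forms regions, here is a minimal Python sketch. The country names "A" to "D", the similarity scores, and the function names are all hypothetical placeholders, not values or code from the Index. At each step, the two clusters with the highest mean pairwise similarity are merged until the requested number of clusters remains.

```python
from itertools import combinations

# Hypothetical pairwise similarity scores (0-100); not values from the Index.
SIMILARITY = {
    ("A", "B"): 80.0,
    ("A", "C"): 40.0,
    ("A", "D"): 35.0,
    ("B", "C"): 45.0,
    ("B", "D"): 30.0,
    ("C", "D"): 75.0,
}


def similarity(a, b):
    """Look up the symmetric similarity between two countries."""
    if (a, b) in SIMILARITY:
        return SIMILARITY[(a, b)]
    return SIMILARITY[(b, a)]


def average_linkage(cluster_1, cluster_2):
    """Mean similarity over every cross-cluster pair of countries."""
    pairs = [(a, b) for a in cluster_1 for b in cluster_2]
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def cluster(countries, n_clusters):
    """Merge the two most similar clusters until n_clusters remain."""
    clusters = [frozenset([c]) for c in countries]
    while len(clusters) > n_clusters:
        best = max(combinations(clusters, 2),
                   key=lambda pair: average_linkage(*pair))
        clusters = [c for c in clusters if c not in best]
        clusters.append(best[0] | best[1])
    return clusters


regions = sorted(sorted(r) for r in cluster(["A", "B", "C", "D"], n_clusters=2))
print(regions)
```

Because a country is merged based on the clusters that exist at the moment of merging, it can end up in a region that is not the one it is most similar to on average.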
Most countries neatly fit into one of these nine regions. However, there are a few countries that could easily be categorized into two different regions. Average linkage clustering does not do a good job of showing this. Averaging the amount of similarity a country has to each overall region gives a fuller picture. The following table shows the 10 most difficult countries to classify.
|Rank||Country||Region 1||Similarity to Region 1||Region 2||Similarity to Region 2||Difference|
|1||YEMEN||Sub-Saharan Africa||52.9||Middle East||66.3||-13.4|
|2||MAURITANIA||Sub-Saharan Africa||59.9||Middle East||64.1||-4.2|
|3||ARMENIA||Western World||61.0||Central Asia||63.6||-2.6|
|4||AFGHANISTAN||Middle East||58.0||Central Asia||59.7||-1.7|
|5||SUDAN||Sub-Saharan Africa||60.0||Middle East||61.5||-1.5|
|6||PAKISTAN||Middle East||59.3||South Asia||60.7||-1.4|
|7||PHILIPPINES||Southeast Asia||61.4||South Pacific||61.6||-0.2|
|8||HAITI||Sub-Saharan Africa||60.7||C & S America||60.9||-0.2|
|9||SINGAPORE||Southeast Asia||56.6||East Asia||56.5||0.1|
|10||DJIBOUTI||Sub-Saharan Africa||60.1||Middle East||59.8||0.3|
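The regional averages in a table like this can be computed along the following lines. This is only a sketch under stated assumptions: the country "X", the other country names, the scores, and the region labels are hypothetical placeholders, and the article does not say whether a country is excluded from the average of its own region (here "X" is simply not a member of either region).

```python
# Hypothetical similarity scores (0-100) from country "X" to other countries;
# the values and region labels are placeholders, not data from the Index.
SIMILARITY_TO_X = {"A": 70.0, "B": 60.0, "C": 50.0, "D": 66.0, "E": 62.0}
REGION_OF = {
    "A": "Region 1", "B": "Region 1", "C": "Region 1",
    "D": "Region 2", "E": "Region 2",
}


def regional_averages(similarity_to_country, region_of):
    """Average a country's similarity to the members of each region."""
    totals, counts = {}, {}
    for other, score in similarity_to_country.items():
        region = region_of[other]
        totals[region] = totals.get(region, 0.0) + score
        counts[region] = counts.get(region, 0) + 1
    return {region: totals[region] / counts[region] for region in totals}


averages = regional_averages(SIMILARITY_TO_X, REGION_OF)
# A negative difference means "X" is closer, on average, to Region 2.
difference = round(averages["Region 1"] - averages["Region 2"], 1)
print(averages, difference)
```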
As noted in an article comparing the Country Similarity Index map with the vast majority of world regional maps created by academics, average linkage clustering of the Country Similarity Index seemed to misclassify Yemen. Sure enough, when the data was analyzed, Yemen was by far the biggest aberration. The Index shows it has much more in common with the average Middle Eastern country (66.3) than the average Sub-Saharan African country (52.9). This is likely a fault of the average linkage clustering method: since Yemen has a lot in common with Mauritania and Sudan, it was grouped with the Sub-Saharan African countries along with them.
Mauritania is classified as a Middle Eastern country in some world regional maps and as a Sub-Saharan African country in others. Although Mauritania clustered with Sub-Saharan Africa, the Index suggests it has slightly more in common with the average Middle Eastern country. On one hand, it is a mostly Muslim country located in the Sahara Desert that has Arabic as its official language. On the other hand, as in much of Sub-Saharan Africa, its infrastructure is quite poor and the quality of life there is low. Its people have a mix of Sub-Saharan African and Caucasian ancestry.
Average linkage clustering grouped Armenia with its most similar country, Georgia, along with the rest of the Western World. However, the Index suggests it has slightly more in common with the average Central Asian country. Like many Central Asian countries, it is a landlocked country located between Europe and Asia that was once part of the Soviet Union. However, unlike most Central Asian countries, its people speak an Indo-European language and are mostly Christian. Armenia also has some traits that do not fit into either region, since it uses a unique alphabet and most of its people belong to the rare Oriental Orthodox Christian denomination.
Afghanistan has commonly been grouped into three different regions of the world in maps created by academics: the Middle East, Central Asia, and even South Asia. Average linkage clustering grouped Afghanistan with the Middle East. However, the Index suggests it has ever so slightly more in common with the average Central Asian country. Afghanistan is more religious and has far more conservative laws than most Central Asian countries. It also uses the Arabic script, while Central Asian countries use the Cyrillic or Latin alphabet. Still, its geography more closely resembles Central Asia than the Middle East, since it is landlocked and has a significantly cooler climate. Furthermore, its most similar country, Tajikistan, is classified as Central Asian.
Like Mauritania, Sudan is classified as a Middle Eastern country in some world regional maps and as a Sub-Saharan African country in others. Although Sudan clustered with Sub-Saharan Africa, the Index suggests it has slightly more in common with the average Middle Eastern country. On one hand, it is a mostly Muslim country located in the Sahara Desert that has Arabic as its official language. On the other hand, as in much of Sub-Saharan Africa, its infrastructure is not well developed and the quality of life there is poor. Its people have a mix of Sub-Saharan African and Caucasian ancestry.
Pakistan was clustered with the Middle East, despite India being its most similar country. The Index suggests that Pakistan is only slightly more similar to the average South Asian country than the average Middle Eastern country. This makes sense, as Pakistan is a transitional zone between these two regions. On one hand, like most Middle Eastern countries, Pakistan is mostly Muslim, uses the Arabic script, and is mostly desert. On the other hand, like most countries in South Asia, most of its people speak Indic languages. Its infrastructure is also similar to South Asia's: people drive on the left side of the road and the country uses Type D electrical outlets. Pakistanis, like other South Asians, are also more interested in cricket than soccer.
The Philippines is almost always classified as a Southeast Asian country on world regional maps created by academics. Average linkage clustering also grouped the country with Southeast Asia. However, the Philippines has a lot in common with countries in the South Pacific as well. In fact, its average similarity to Southeast Asian countries (61.4) is almost the same as its average similarity to South Pacific countries (61.6). As in many countries in the South Pacific, English is one of its official languages. Furthermore, its people are mostly Christian, unlike those of most other countries in Southeast Asia. On the other hand, like most Southeast Asian countries, its people have ancestry that originates in East Asia. Its agriculture is also similar, since countries in this region tend to grow and eat a lot of rice.
Haiti has traits of both Sub-Saharan Africa and Latin America. The Index shows it has about the same amount in common with the average Sub-Saharan African country as the average country in Central and South America. Haiti is located in the Americas and its people natively speak Romance languages, like countries in Latin America. Still, as in Sub-Saharan Africa, its people have mostly African ancestry. Furthermore, the country’s infrastructure is not well developed and the quality of life is quite low there. Many Sub-Saharan African countries were colonized by France and, like Haiti, have French as an official language.
Singapore is surrounded by Southeast Asian countries, but it also has a lot of characteristics in common with East Asian countries. The data from the Index shows that the average Southeast Asian country and the average East Asian country have nearly the same amount of similarity to Singapore. Most of Singapore’s population is ethnically Chinese, as in China and Taiwan. Furthermore, its economy is much more prosperous and its infrastructure is more developed than those of most Southeast Asian countries. On the other hand, like most Southeast Asian countries, it has a tropical climate and its average temperature is far warmer than that of East Asian countries.
Djibouti is similar to Mauritania and Sudan, since it is on the border between the Middle East and Sub-Saharan Africa. It is almost always classified as a Sub-Saharan African country in world regional maps. The data suggests it has a lot in common with the average Middle Eastern country as well. On one hand, it is a mostly Muslim country that has Arabic as one of its official languages, and it is located in a desert. Its people natively speak Afro-Asiatic languages, although, like most of Sub-Saharan Africa, they use the Latin alphabet. On the other hand, as in much of Sub-Saharan Africa, its infrastructure is quite poor and the quality of life there is low. Its people have a mix of Sub-Saharan African and Caucasian ancestry.
Revised and Simplified Map
Based on the new findings comparing average linkage clustering and regional average similarity, a new map was created showing the nine macro-regions of the world and no subregions. Countries where there was a difference between average linkage clustering and regional average similarity are displayed as being split between two different regions.
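The split rule described above can be sketched in a few lines of Python using the scores from the table of hardest-to-classify countries. This is an illustration, not the procedure actually used to draw the map: it assumes that Region 1 in the table is always the region assigned by average linkage clustering (the article confirms this for most, but not all, of the ten countries), and the function and variable names are hypothetical.

```python
# (country, Region 1, similarity to Region 1, Region 2, similarity to Region 2),
# copied from the table of the 10 most difficult countries to classify.
ROWS = [
    ("Yemen", "Sub-Saharan Africa", 52.9, "Middle East", 66.3),
    ("Mauritania", "Sub-Saharan Africa", 59.9, "Middle East", 64.1),
    ("Armenia", "Western World", 61.0, "Central Asia", 63.6),
    ("Afghanistan", "Middle East", 58.0, "Central Asia", 59.7),
    ("Sudan", "Sub-Saharan Africa", 60.0, "Middle East", 61.5),
    ("Pakistan", "Middle East", 59.3, "South Asia", 60.7),
    ("Philippines", "Southeast Asia", 61.4, "South Pacific", 61.6),
    ("Haiti", "Sub-Saharan Africa", 60.7, "C & S America", 60.9),
    ("Singapore", "Southeast Asia", 56.6, "East Asia", 56.5),
    ("Djibouti", "Sub-Saharan Africa", 60.1, "Middle East", 59.8),
]


def is_split(row):
    """True when the regional average favors Region 2 over Region 1.

    Assumption: Region 1 is the region from average linkage clustering.
    """
    _, _, score_1, _, score_2 = row
    return score_2 > score_1


split_countries = [row[0] for row in ROWS if is_split(row)]
differences = {row[0]: round(row[2] - row[4], 1) for row in ROWS}
print(split_countries)
```

Under that assumption, the eight countries with a negative difference are the ones where the two methods disagree, while Singapore and Djibouti stay in their clustered region.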
Regions of the World by Average Linkage Clustering
Regions of the World by Regional Average Similarity
Do you agree with these regions of the world?
Please leave any thoughts in the comments section.