A Tale of Two Industries: How Programming Languages Differ Between Wealthy and Developing Countries

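Technologies correlated with GDP per capita
In a recent post, we saw that Android's share of traffic (as a percentage of a country's Stack Overflow visits) tends to be negatively correlated with the country's GDP per capita. This might lead us to wonder whether the same is true of any other tags.
When we explore the major programming languages and platforms, a few others stand out besides Android, including PHP, Python, and R.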


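The amount of Android and PHP traffic is negatively correlated with a country's income, while Python and R traffic is positively correlated. In each case we can see exceptions (South Korea visits Android more than we would expect, and China visits Python more), but in general the correlations are strong. (Each has an R² of about 0.5 to 0.6, with p-values below 10^-6 after adjusting for multiple testing.)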
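As a rough illustration of the kind of calculation behind figures like these, the sketch below computes an R² between a country's GDP per capita and a tag's share of traffic, and applies a simple multiple-testing adjustment. The numbers are made up for illustration, taking the logarithm of GDP is an assumption, and the post does not say which adjustment method was used, so Bonferroni here is also an assumption.

```python
import math


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical data: one entry per country (NOT real Stack Overflow figures).
gdp_per_capita = [1500, 4000, 9000, 20000, 45000, 60000]
python_share = [4.0, 5.0, 6.5, 8.0, 10.0, 11.0]  # % of the country's visits

# Assumption: income is compared on a log scale.
log_gdp = [math.log10(g) for g in gdp_per_capita]
print("R^2:", round(r_squared(log_gdp, python_share), 3))

# Hypothetical raw p-values for four tags tested at once.
print("adjusted p-values:", bonferroni([1e-8, 2e-7, 0.03, 0.4]))
```

Bonferroni is the most conservative common choice; a less strict method would give smaller adjusted p-values for the same inputs.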


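We would emphasize that we are not suggesting any causal relationship here. We are certainly not saying that programming language choice affects a country's average income, but we are also not saying that a country's wealth directly affects its use of technology. We suspect the drivers are a mix of economic and social factors (education levels, the age of the software industry, levels of outsourcing) that are generally correlated with a country's wealth.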


How can we divide the software development industry into two?


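When we examine trends, it is useful to talk about two groups of countries (high-income and non-high-income) rather than to consider a pile of correlations. As a useful preexisting classification, we can use the World Bank's income classification, which is based on gross national income (GNI) per capita (see here for a discussion of this classification).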



There are 78 high-income economies, largely made up of the US and Canada, Western Europe, parts of the Middle East and East Asia, and Australia/New Zealand. I’ve done some analyses of the fundamental drivers of the between-country variation (such as principal component analysis) that suggest this is a reasonable division, and that it’s more meaningful than other ways we could divide them, such as Eastern vs Western Hemisphere. (For instance, Australia is generally more similar to the US and Europe in terms of visited technologies than it is to China or Indonesia).


The division splits Stack Overflow traffic into groups of about two-thirds and one-third: 63.7% of Stack Overflow’s traffic comes from high-income countries. (This is likely due to a combination of a greater proportion of software development, more widespread internet access, and a disproportionate share of English speakers.) Much of the traffic from non-high-income countries comes from India, followed by Brazil, Russia, and China.
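A split like this comes down to summing visits within each group and dividing by the total. The following is a minimal sketch of that arithmetic; the country names, visit counts, and the membership of the high-income set are placeholders, not the real traffic data or the actual World Bank list.

```python
def traffic_share_by_group(visits, high_income):
    """Return the percentage of total visits from high-income and other countries.

    visits: dict mapping country name -> number of visits
    high_income: set of country names classified as high-income
    """
    total = sum(visits.values())
    high = sum(n for country, n in visits.items() if country in high_income)
    return {
        "high_income": 100.0 * high / total,
        "non_high_income": 100.0 * (total - high) / total,
    }


# Hypothetical visit counts (NOT real Stack Overflow traffic).
visits = {"United States": 500, "Germany": 150, "India": 250, "Brazil": 100}
# Hypothetical classification; in practice this would come from the World Bank list.
high_income = {"United States", "Germany"}

print(traffic_share_by_group(visits, high_income))
```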

How do high-income countries differ in the technologies they use?

We’ve now divided the software development world into two segments. How do high-income and non-high-income countries differ in terms of the technologies they use?



We can extract several interesting insights:

  • Difference in data science technologies: As we saw earlier, Python and R are associated with a country’s income. Python is visited about twice as often in high-income countries as in the rest of the world, and R about three times as much. We might also notice that among the smaller tags, many of the greatest shifts are in scientific Python and R packages such as pandas, numpy, matplotlib, and ggplot2. This suggests that part of the income gap in these two languages may be due to their role in science and academic research. It makes sense these would be more common in wealthier industrialized nations, where scientific research makes up a larger portion of the economy and programmers are more likely to have advanced degrees.

  • C/C++: C and C++ are two other notable languages that tend to be visited more from high-income countries. One hypothesis is that this may have to do with education: as we saw in a previous post, C and C++ are among the languages most disproportionately visited from American universities. It could also be related to the geographic distribution of the electronics and manufacturing industries.

  • PHP and Android: We explored Android development around the world in a previous post, but PHP is another technology that’s notably associated with lower-income countries. It’s interesting to see that CodeIgniter, a PHP open source framework, is the tag that’s singularly most disproportionately visited from lower-income countries, by a large margin. Further examination shows it is especially heavily visited in South/Southeast Asia (particularly India, Indonesia, Pakistan and the Philippines) while it has very little traffic from the US and Europe. It’s possible that CodeIgniter is a common choice for outsourcing firms building websites.
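Comparisons such as "about twice as often" can be read as the ratio of a tag's share of traffic in one group to its share in the other. Here is a small sketch of that calculation, assuming each group's traffic is available as a count of visits per tag; the counts are invented and the function names are hypothetical.

```python
def tag_share(tag_visits):
    """Convert raw visit counts per tag into percentages of the group's total."""
    total = sum(tag_visits.values())
    return {tag: 100.0 * n / total for tag, n in tag_visits.items()}


def income_ratio(high_visits, other_visits):
    """Ratio of each tag's share in high-income countries to its share elsewhere.

    Values above 1 mean the tag is relatively more visited from high-income
    countries; values below 1 mean the opposite. Only tags with traffic in
    both groups are included.
    """
    high = tag_share(high_visits)
    other = tag_share(other_visits)
    return {
        tag: high[tag] / other[tag]
        for tag in high
        if tag in other and other[tag] > 0
    }


# Hypothetical visit counts per tag (NOT real Stack Overflow traffic).
high_visits = {"python": 200, "r": 60, "php": 140, "android": 100}
other_visits = {"python": 100, "r": 20, "php": 220, "android": 160}

ratios = income_ratio(high_visits, other_visits)
for tag in sorted(ratios, key=ratios.get, reverse=True):
    print(tag, round(ratios[tag], 2))
```

Working with shares rather than raw counts matters here, because the two groups contribute very different total amounts of traffic.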

Conclusion: why does this matter?

I was certainly interested in these results as a fun fact about the programming language ecosystem. But it also has implications for other data explorations we’ll be publishing in the near future.

When we ask questions about the software development industry, it’s important to know that we’re really answering two separate questions that have been “blended” together, and that separating them can sometimes give us more informative answers.

For example, we’re often interested in understanding which technologies drive the most traffic, such as examining technologies like Flash that are shrinking over time. If we were to create a list of the most visited programming technologies, it would be different for high-income and non-high-income countries.
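A ranking of that kind can be sketched by sorting each group's tags by visit count separately. The counts below are invented purely to show the mechanics, and `top_tags` is a hypothetical helper name.

```python
def top_tags(tag_visits, n=3):
    """Return the n most visited tags, most visited first (ties broken alphabetically)."""
    return sorted(tag_visits, key=lambda tag: (-tag_visits[tag], tag))[:n]


# Hypothetical visit counts per tag for each group (NOT real data).
by_group = {
    "high_income": {"javascript": 900, "python": 700, "java": 650, "php": 400},
    "non_high_income": {"javascript": 800, "java": 750, "php": 600, "python": 350},
}

for group, tag_visits in by_group.items():
    print(group, top_tags(tag_visits))
```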



