Open Source Ecosystem Growth
I've been trying to quantify the growth of various open source ecosystems. The other day I published a story to Perspective about it. If you have an iOS device, you should watch the Perspective story, since it lets you interact with the data in a way you can't with the static charts I'll be showing in this article.
Using data from modulecounts.com, we can look at the size of different package repositories on a quarter-by-quarter basis for the last few years.
This is how people tend to visualize this data. In fact, this chart is very similar to the way the data is shown on modulecounts.com, which is good for comparing the absolute number of packages per repository. But this doesn't illustrate the growth of each repository very well. Rather than comparing absolute size, we can look at the number of new packages added each quarter.
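Turning repository sizes into new packages per quarter is a difference between consecutive quarters. Here is a minimal Python sketch. It assumes `counts` (a hypothetical name) is a list of end-of-quarter package totals, oldest first. The numbers below are made up for illustration and are not modulecounts.com data.

```python
def new_packages_per_quarter(counts: list[int]) -> list[int]:
    """Return the number of packages added in each quarter.

    Assumes `counts` holds end-of-quarter package totals, oldest first.
    The result has one fewer entry than the input, because the first
    quarter has no previous total to subtract.
    """
    return [later - earlier for earlier, later in zip(counts, counts[1:])]


# Hypothetical end-of-quarter totals for a single repository.
sizes = [1000, 1300, 1700, 2200]
print(new_packages_per_quarter(sizes))  # [300, 400, 500]
```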
This already shows us a lot we couldn't see before. In one quarter, CPAN added a huge number of modules, as many as Ruby, but this was nearly invisible in the previous graph. We also see that Ruby and Python are a little closer in growth than the previous chart would have suggested, with Ruby adding about double what Python does. It also shows that in the first full quarter of data we have for node, it is already adding more packages than Python, that it overtakes Ruby by the last quarter of 2012, and that it takes a big lead by the first quarter of this year.
The biggest criticism you can make of the previous charts is that they treat every package in every repository as equal. People tend to dismiss any package system data as an indicator of growth or popularity on the grounds that one community publishes "smaller" or "less significant" packages than another. One way to overcome this is to display the growth of each repository relative to its own size in the previous quarter. Rather than comparing the size of a repository or the number of new packages added, we compare the percentage growth quarter over quarter. This makes the size of individual packages irrelevant, because growth is measured as a fraction of the repository itself.
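The quarter-over-quarter percentage is each quarter's additions divided by the repository's size at the end of the previous quarter. Below is a minimal sketch using the same assumptions: hypothetical `counts` lists of end-of-quarter totals, with made-up numbers. It shows a large and a small repository that add very different numbers of packages but register the same growth rate.

```python
def quarterly_growth_pct(counts: list[int]) -> list[float]:
    """Percentage growth of each quarter relative to the previous quarter's size.

    Assumes `counts` holds end-of-quarter package totals, oldest first,
    and that no total is zero.
    """
    return [
        100.0 * (later - earlier) / earlier
        for earlier, later in zip(counts, counts[1:])
    ]


# Hypothetical repositories: one adds 1,000+ packages a quarter, the other ~100.
large_repo = [10_000, 11_000, 12_100]
small_repo = [1_000, 1_100, 1_210]

print(quarterly_growth_pct(large_repo))  # [10.0, 10.0]
print(quarterly_growth_pct(small_repo))  # [10.0, 10.0]
```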
To me, this is the most interesting data. Many claims were made last year that some of these platforms were losing developers and that their ecosystems were dying, but what we actually see is that the growth rate of all these platforms is relatively stable. If these platforms were losing developers, if developers were becoming less engaged, or even if these platforms were failing to attract new developers, we'd see a bigger decline in growth. Every quarter the repository gets bigger, so each of these platforms has to add more packages than the quarter before just to keep its growth percentage stable. Some might claim that node's incredible growth has resulted in divestment from existing platforms, but this data would seem to indicate otherwise: the available growth of open source platforms is not finite, each of these platforms is attracting new people, and they are less in competition with each other than people claim.
My instinct would be to write off node's first-quarter growth as easy, given how small the repository was at the time. But node maintains that growth. By the end of the first quarter of this year its repository is nearly the same size as Perl's and Python's, and its growth rate has actually increased. That makes node's ~25% growth rate look about as stable as those of the other platforms we're tracking.
I've heard people claim that developers are "leaving Ruby for node.js." That may be anecdotally true, but Ruby is attracting enough new developers to maintain a healthy growth rate. Node.js's explosive growth (100% year over year) can't be explained by people "switching" from other platforms to node.js. This data would seem to indicate that node.js is attracting a huge following of people who are not already invested in the platforms listed above.
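For reference, "100% year over year" means a repository's package count doubles over four quarters. Here is a minimal sketch of that calculation. The helper name and the numbers are hypothetical, and it again assumes a list of end-of-quarter totals, oldest first.

```python
def year_over_year_pct(counts: list[int], quarters_per_year: int = 4) -> list[float]:
    """Percentage growth of each quarter's total versus the same quarter a year earlier.

    Assumes `counts` holds end-of-quarter package totals, oldest first.
    The first `quarters_per_year` entries have no year-earlier value to compare to.
    """
    return [
        100.0 * (counts[i] - counts[i - quarters_per_year]) / counts[i - quarters_per_year]
        for i in range(quarters_per_year, len(counts))
    ]


# Hypothetical repository that doubles in size every four quarters.
sizes = [100, 120, 140, 170, 200, 240]
print(year_over_year_pct(sizes))  # [100.0, 100.0]
```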
What is also surprising is how close Ruby and Python are in this data. Relative to their existing size, both are growing at rates within a few percentage points of each other.
Now that we're pretty sure that all these platforms are healthy and that this year they grew at about the same rate as last year, let's take each platform's average growth rate for last year and use it to predict absolute growth for the next few years.
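Here is a minimal sketch of that kind of projection. It assumes "average" means the arithmetic mean of last year's quarterly growth rates, and that the rate is compounded once per quarter. The function names and numbers are hypothetical; this is not the method or data behind the chart.

```python
def average_growth_rate(counts: list[int]) -> float:
    """Arithmetic mean of the quarter-over-quarter growth rates, as a fraction.

    Assumes `counts` holds at least two end-of-quarter totals, oldest first.
    """
    rates = [(later - earlier) / earlier for earlier, later in zip(counts, counts[1:])]
    return sum(rates) / len(rates)


def project_sizes(current_size: float, rate: float, quarters: int) -> list[float]:
    """Compound `current_size` by `rate` once per future quarter."""
    projected = []
    size = current_size
    for _ in range(quarters):
        size *= 1 + rate
        projected.append(size)
    return projected


# Hypothetical end-of-quarter totals covering last year (made-up numbers).
last_year = [1_000, 1_100, 1_210, 1_331, 1_464]
rate = average_growth_rate(last_year)  # roughly 0.1, i.e. ~10% per quarter
future = project_sizes(last_year[-1], rate, quarters=8)

# Absolute growth is the number of packages added in each projected quarter.
added = [b - a for a, b in zip([last_year[-1]] + future, future)]
print(round(rate, 3), [round(n) for n in added])
```

A geometric mean (the constant compound rate) is a reasonable alternative to the arithmetic mean and gives slightly different results when quarterly rates vary. Compounding the same percentage from a larger base also yields more packages per quarter. As a result, absolute projections spread apart even when growth rates are similar.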
If this were the current growth chart, it would lead to a lot of divestment in platforms below the middle line, which would be foolish and irrational. This chart assumes that every platform maintains its current, rather healthy, growth rate. But because it compares absolute values, it gives the impression that there is divestment below a certain line. This is how we usually look at this data, and I hope this illustrates how many false impressions it gives.