On Usage Figures

Among the more eye-popping numbers associated with LinkedIn’s recent initial public offering is the 100,000,000 members it claims. What do those hundred million people do with their LinkedIn accounts? If they’re like me, they quietly ignore the endless spam but never quite get motivated to unsubscribe. Or maybe they occasionally click through a link returned by a Google search, only to discover the limp résumé of some sad sack looking to escape the Enterprise Rent-A-Car counter, not the super cool and attractive “Sean Takats” they went to high school with and are now stalking.

I’ve been thinking a lot about these kinds of numbers as the Zotero team prepares for a major summit this summer. In our first few years, we measured Zotero’s growth in terms of downloads, but we stopped doing so well over a year ago, when that number was north of four million, having doubled from two million just a few months earlier. We stopped because downloads are never a very accurate measure of adoption, and they are especially problematic for Zotero, which is available from a variety of repositories. Most users get our software from either zotero.org or addons.mozilla.org, but Zotero has also popped up elsewhere, mainly because we don’t restrict its distribution in any way. In the absence of any other metric, however, downloads are better than nothing, and Mendeley, for example, still uses downloads to arrive at its figure of 900K+ “people,” according to Ian Mulvany’s recent code4lib talk. And when it comes to commercial products like EndNote, we of course have no idea at all.

A second way to measure usage would be to tally user account registrations. Currently zotero.org hosts 620,000 accounts. Note that I say “accounts” and not “users.” Indeed, there’s no reason to think this figure is more than slightly more reliable than downloads. Zotero was around for years before we even had server accounts, and we have never aggressively pushed Zotero users to register by confronting them with a sign-up form before offering the download. We think server accounts provide incredibly valuable functionality, but we also feel that it’s a little sleazy to try to co-opt people into signing up for something they don’t want. So the “real” number could be much higher! Among that mass of accounts, there are hundreds of thousands of real, active researchers but also, inevitably, countless spammers waiting to be weeded out and dormant accounts sitting idle. Or maybe it’s much lower! But even if we were to pretend that all 620,000 accounts were tended by highly motivated scholars, we would still be faced with a drop of nearly an order of magnitude compared to downloads. A quick look at Mendeley’s people directory reveals a similar discrepancy: it lists fewer than 70,000 user accounts, which is nothing to sneeze at but of course well south of the download figure. How many accounts does RefWorks have? Again, we can’t know.

A final way would be to count how many people are running Zotero each day. Because Zotero automatically checks for updated translator code on a daily basis, we know that at least 275,000 instances of Zotero ran today. But wait a minute, what’s with this “instances” and “at least” business? Well, maybe some people are running more than one copy of Zotero on a single machine. We could count unique IP addresses instead, which moves the number down slightly, but then we would undercount multiple instances of Zotero sharing a single public IP address. And of course, this figure only accounts for copies of Zotero that have automatic updates turned on and that managed to connect to the internet. Other software vendors could presumably track sync activity or other metrics to arrive at analogous figures.
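To make the trade-off concrete, here is a minimal sketch of the two daily counts, assuming a hypothetical log that records only the public IP address behind each translator-update check. The log format and the `daily_estimates` helper are illustrative inventions, not Zotero’s actual server code.

```python
# Hypothetical log: one entry per daily translator-update check,
# recorded only as the requesting public IP address.
checks = [
    "203.0.113.5",   # a lone laptop
    "198.51.100.7",  # two Zotero instances on one machine...
    "198.51.100.7",
    "192.0.2.44",    # ...or three campus machines behind one shared address
    "192.0.2.44",
    "192.0.2.44",
]

def daily_estimates(ips):
    """Return (instance_count, unique_ip_count) for one day's checks.

    Neither count sees instances with automatic updates turned off
    or machines that never reached the internet that day.
    """
    instances = len(ips)        # every check counts, so duplicate copies on one machine inflate it
    unique_ips = len(set(ips))  # collapses duplicates, but also merges distinct users sharing an IP
    return instances, unique_ips

print(daily_estimates(checks))  # (6, 3)
```

The same six checks could represent anywhere from three to six real people, and the log alone can’t tell which.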

The basic moral of the story, if you haven’t already guessed, is that these numbers are all pure shit, though some are clearly worse than others. All we can do is provide an honest explanation of how they’re derived.