November 16th, 2011 § § permalink
N.B. For best results, try to get Destiny’s Child’s “Survivor” going in your head before proceeding.
Two recent blog posts by Larry Cebula and Holger Syme highlight the deep divide that separates the pessimists from the optimists in academia. Cebula explains why he steers his students away from pursuing a career as a professor, essentially arguing that the odds are simply too stacked against them even under the best of conditions. Syme, in contrast, suggests that Cebula’s dream-crushing advice is short-sighted and ultimately dangerous to the long-term viability of the profession. While Cebula’s reasoning will be familiar to many — he’s working the same rich vein as has William Pannapacker under his nom de doom Thomas H. Benton in the Chronicle of Higher Education — I suspect he’s also still greatly outnumbered among the greater population of academics, thanks in no small part to survivorship bias and an unwillingness to grapple with the unforgiving calculus of opportunity cost.
“Survivorship bias,” you ask? Let’s roll Wikipedia:
Survivorship bias can lead to overly optimistic beliefs because failures are ignored […] It can also lead to the false belief that the successes in a group have some special property, rather than being just lucky.
Survivorship bias can lead to all sorts of hilarious situations, like your grandparents advising you not to bother wearing a seatbelt because they never did, and look, they’re still around! Unfortunately it also can drive you to imagine that there’s a reason for your success. You certainly deserved to win, and others who deserve it will succeed, too!
In contrast, Cebula, Pannapacker, et al. appear at least partially motivated by survivorship bias’s evil twin, survivor’s guilt. They just know that they didn’t do anything particularly special, and they’re mortified by the thought that others have failed and are failing while they haven’t. They blame themselves for the gross imbalance in the job market and believe that by discouraging students from beginning the process, they can mitigate some of its deleterious effects. I’ll admit that I find this position rather appealing, in part because it’s self-deprecating, but mostly because in my case it’s actually quite easy to calculate what’s at stake.
The two core elements of the argument against pursuing a career in academia are the poor prospects for employment at the end of the road and the opportunity cost along the way. Few professors would deny the existence of the first challenge: we’ve all run the gauntlet of the job market, often multiple times, and we’ve seen our friends and colleagues and rivals do the same. Periodic reports on the state of the job market confirm what we instinctively know: there are too many candidates chasing too few jobs. And even among those available positions, fewer still conform to the ideal of a 2–2 tenure-track post.
The point about opportunity cost is far more contentious because it’s less easily quantifiable (and historians at least are of course explicitly trained to avoid indulging in such hypotheticals), and possibly because so many of us are bad at math. Cebula was (rightly) called out by my colleague Zach Schrag for advancing the figure of “a million dollars” as an indication of the income forgone by students pursuing graduate study as opposed to some more remunerative career (and here everyone seems oddly obsessed with Hooters). But the opportunity costs can still be enormous, as my own experience shows.
For three years between undergraduate and graduate study, I worked at IBM, first as a contractor and then as a direct-hire employee. When I left in August 1999 for the University of Michigan, my salary was $72,000, excluding bonuses, awards, etc., which were frequent and not particularly difficult to obtain. I then spent the next six years as a graduate student, eking out a modest existence on teaching fellowships and research grants. So what did I give up?
At IBM, ten percent raises were the norm for reasonably competent individuals, and I had in the previous year received a 13% increase and far larger increases in the years before that. To be very conservative, let’s assume 10% per year and no promotions, bonuses, or awards.
| Year | IBM | Michigan |
| --- | --- | --- |
| 2000 | $72,000 | $14,000 |
| 2001 | $79,200 | $14,000 |
| 2002 | $87,120 | $13,000 |
| 2003 | $95,832 | $27,000 |
| 2004 | $105,415 | $21,000 |
| 2005 | $115,957 | $6,000 |
| Total | $555,524 | $95,000 |
In contrast, during my six years of graduate study, I earned a total of around $95,000 in stipends and research grants, making for a net opportunity cost approaching half a million dollars. Not Cebula’s full million, but still nothing to sneeze at. Had I remained in graduate school longer than six years, the gulf would of course have widened even more rapidly, since I would have traded the compounding effect of raises for the rapidly diminishing handouts granted to lingering ABDs.
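For anyone who wants to check the arithmetic, here is a minimal sketch of the calculation in Python. It assumes exactly what the post assumes (a $72,000 starting salary, a flat 10% raise each year, no bonuses or promotions) and uses the yearly stipend figures from the table; the variable and function names are my own and purely illustrative.

```python
# Assumptions taken from the post: $72,000 starting salary, a flat 10% raise
# every year, and no promotions, bonuses, or awards.
STARTING_SALARY = 72_000
RAISE_RATE = 0.10

# Yearly graduate income (stipends and grants) as listed in the table.
STIPENDS = {
    2000: 14_000,
    2001: 14_000,
    2002: 13_000,
    2003: 27_000,
    2004: 21_000,
    2005: 6_000,
}


def projected_salaries(start, rate, years):
    """Return one salary per year, compounding the raise annually."""
    return [start * (1 + rate) ** i for i in range(years)]


salaries = projected_salaries(STARTING_SALARY, RAISE_RATE, len(STIPENDS))
ibm_total = round(sum(salaries))
stipend_total = sum(STIPENDS.values())
opportunity_cost = ibm_total - stipend_total

for (year, stipend), salary in zip(STIPENDS.items(), salaries):
    print(f"{year}: IBM ${salary:,.0f} vs. Michigan ${stipend:,}")
print(f"Forgone over six years: ${opportunity_cost:,}")
```

Under these assumptions the gap comes to $460,524, which is the "approaching half a million dollars" figure. Raising `RAISE_RATE` or adding years to `STIPENDS` shows how quickly the compounding side pulls away.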
What critics often fail to remember is that opportunity cost doesn’t magically vanish even if one is lucky enough to land a good tenure-track position like I did. Even with the relatively minor inflation we’ve experienced over the last decade, 1999’s $72,000 is equivalent to just over $98,000 today. Not exactly the average assistant professor salary in the humanities, is it? Indeed, the opportunity cost associated with academia effectively continues to mount indefinitely.
When we tell our students that earning a PhD is essentially a very costly way to purchase a lottery ticket (albeit with less astronomical odds but also a vastly diminished payout), we’re already making a good case for what might not be there at the end of the road. But we also need to explain to them what absolutely won’t be there: opportunity cost’s forgone earnings. Syme oddly claims that Cebula is cynical to address these financial concerns, continuing:
Has anyone ever been under the illusion that working as an academic in the humanities was a quick way to wealth, homeownership, and a stable nuclear family existence?
Well no, probably not. But unless I’m mistaken, what animates Cebula’s argument isn’t regret that working as an academic in the humanities is not a quick path to these goals — which, aside from “wealth,” are hardly grandiose aspirations — it’s the understanding that working as an academic in the humanities more often than not rules out these goals entirely.
May 21st, 2011 § § permalink
Among the more eye-popping numbers associated with LinkedIn’s recent initial public offering is the 100,000,000 members it claims. What do those hundred million people do with their LinkedIn accounts? If they’re like me, they quietly ignore the endless spam but never quite motivate to unsubscribe. Or maybe they occasionally click through a link returned by a Google search, only to discover the limp résumé of some sad sack looking to escape the Enterprise rent-a-car counter, not the super cool and attractive “Sean Takats” that they went to high school with and are stalking.
I’ve been thinking a lot about these kinds of numbers as the Zotero team prepares for a major summit this summer. In our first few years, we used to measure Zotero’s growth in terms of downloads, but we quit doing so well over a year ago, when that number was north of four million, having doubled from two million just a few months earlier. We stopped because downloads are never a very accurate measurement of adoption, and they are especially problematic for Zotero, which is available from a variety of repositories. Most users get our software from either zotero.org or addons.mozilla.org, but Zotero has also popped up elsewhere, mainly because we don’t restrict its distribution in any way. In the absence of any other metric, however, downloads are better than nothing, and Mendeley for example still uses downloads to arrive at its figure of 900K+ “people,” according to Ian Mulvany’s recent code4lib talk. And when it comes to commercial products like EndNote, we of course have no idea at all.
A second way to measure usage would be to tally user account registrations. Currently zotero.org hosts 620,000 accounts. Note that I say “accounts” and not “users.” Indeed there’s no reason to think that this figure is anything more than very slightly more reliable than downloads. Zotero was around for years before we even had server accounts, and we have never aggressively pushed users of Zotero to register accounts by confronting them with a sign-up form before offering the download. We think server accounts provide incredibly valuable functionality, but we also feel that it’s a little sleazy to try to co-opt people into signing up for something they don’t want. So the “real” number could be much higher! Among that mass of accounts, there are hundreds of thousands of real, active researchers but also, inevitably, countless spammers waiting to be weeded and dormant accounts sitting idle. Or maybe it’s much lower! But even if we were to pretend that all 620,000 accounts were tended to by highly motivated scholars, we would still be faced with an order of magnitude drop when compared to downloads. A quick look at Mendeley’s people directory reveals a similar discrepancy: it lists fewer than 70,000 user accounts, which is nothing to sneeze at but of course well south of the download figure. How many accounts does RefWorks have? Again, we can’t know.
A final way would be to count how many people are running Zotero each day. Because Zotero automatically checks for updated translator code on a daily basis, we know that at least 275,000 instances of Zotero ran today. But wait a minute, what’s with this “instances” and “at least” business? Well, maybe some people are running more than one copy of Zotero on a single machine. We could account for unique IP addresses, which moves the number down slightly, but then we would ignore multiple instances of Zotero sharing a single public IP address. And of course, this figure only accounts for copies of Zotero that have automatic updates active, and that managed to connect to the internet. Other software vendors could presumably track sync activity or other metrics to arrive at analogous figures.
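The ambiguity between “instances,” IP addresses, and actual people is easy to see in a toy example. The sketch below is hypothetical throughout: the record format, the field names, and the sample data are invented for illustration and say nothing about how Zotero’s servers actually log update checks. In particular, the `person` label is something no server could know; it is included only to show what the two measurable counts miss.

```python
from collections import namedtuple

# Hypothetical record of one daily update check. The "person" field is an
# illustration-only label that a real server would never have.
UpdateCheck = namedtuple("UpdateCheck", ["ip", "person"])


def summarize(checks):
    """Compare the two measurable counts with the unknowable head count."""
    return {
        "instances": len(checks),                   # one check per running copy
        "unique_ips": len({c.ip for c in checks}),  # deduplicated by public IP
        "people": len({c.person for c in checks}),  # not observable in practice
    }


sample = [
    UpdateCheck("203.0.113.5", "p1"),   # one person running two copies
    UpdateCheck("203.0.113.5", "p1"),
    UpdateCheck("198.51.100.7", "p2"),  # two people behind one shared public IP
    UpdateCheck("198.51.100.7", "p3"),
    UpdateCheck("192.0.2.44", "p4"),
]

result = summarize(sample)
print(result)
```

Here counting instances overshoots the number of people and counting unique IPs undershoots it, and from the server’s side the first two addresses look identical: two checks from a single IP each.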
The basic moral of the story, if you haven’t already guessed, is that these numbers are all pure shit, though some are clearly worse than others. All we can do is provide an honest explanation of how they’re derived.
May 6th, 2011 § § permalink
Brian Croxall recently lit up the comment feed at the Chronicle with his ProfHacker comparison of “Zotero vs. Endnote,” where the debate centered mostly around issues of citation fidelity. As Fred Gibbs notes, however, “while citation formatting is one major reason to use bibliographic software, it isn’t necessarily the only or even primary reason, especially in the humanities.” Zotero’s citation functionality was always imagined merely as bait: by providing this labor-saving functionality, Zotero would encourage each user to move her research into what amounted to a fully searchable and shareable relational database that could be subjected to text mining and other analysis. Here researchers could begin to do truly remarkable and new things with their evidence.
A few commenters, as well as Fred, tried to shift the discussion toward the issue of cost and openness, and in particular to Zotero’s status as free/libre open source software (FLOSS). Many of Zotero’s most dedicated users have championed the software in the name of FLOSS, but this line of argument frequently falls on deaf ears, or even ears that are conditioned to reject FLOSS as somehow anti-market or anti-capitalist. From my perspective, FLOSS in and of itself is a fairly unpersuasive argument for using Zotero, akin to knee-jerk calls to “Buy American!” in the 1980s, when the USA still did some manufacturing.
Buying American and using FLOSS might make one feel some sense of moral superiority, but at the end of the day can those feelings still paper over our sense of existential dread when faced with driving to work in our crumbling K-cars or cobbling together a dissertation with shitty research software?
Just as Zotero’s citation management functionality is a means to an end, so is licensing and developing the software as FLOSS. Far from just ideology, FLOSS has allowed Zotero to leverage relatively limited financial resources to outperform vastly larger and better funded competitors, old and new. Zotero’s annual operating overhead is only in the low six figures. This amount covers in-house development, outreach, and infrastructure costs. In comparison, EndNote and Mendeley each have operating costs that are an order of magnitude greater (or even more). And of course there’s an even higher, hidden cost for these platforms: the expectation of substantial profit, which necessarily impinges on sustainability.
Why should any researcher care about these issues? Defenders of Zotero have often voiced concerns about “lock-in” with proprietary, for-profit software. Users might find themselves unable to migrate their data out of one of these commercial solutions at some later date. But even if this worry were valid — and I don’t know that it is — lock-in in and of itself isn’t necessarily a bad thing. Who would complain about being locked-in to the very best solution, particularly if that solution also didn’t cost any money?
Unfortunately, the closed, for-profit software option has never been the very best solution, and there’s no sign that that situation is changing. This isn’t ideology speaking; it’s history. EndNote has been derided for well over a decade for its 1990s interface and predatory “upgrade” cycles. New features come late or never, and the software has yet to embrace online research and collaboration. Mendeley, while far newer and theoretically nimbler, has likewise only slowly moved to provide the basic, core functionality that active, publishing researchers require. It’s entirely likely that “features” like journal abbreviations, citation page numbers, and subcollections will eventually make their way into Mendeley, or that EndNote will one day discover the internet, but the mere fact that these things haven’t yet transpired speaks volumes about the priorities of their parent corporations.
Because it’s FLOSS, Zotero has been able to add and refine features thanks to the contributions of hundreds of volunteer developers and the feedback of hundreds of thousands of users. The technological success of this model is undeniable: Zotero’s open-source citation engine, entirely rewritten by Zotero user Frank Bennett, and the thousands of user-contributed style files the engine uses have already been adopted by Mendeley and Papers, and a representative from XXXXX has expressed interest in doing the same. (Update: The individual who wrote regarding XXXXX and CSL clarifies that the communication was made in a personal capacity, not as a representative, and so I’ve removed the software’s name.)

Wikimedia Commons Credit: Stan Zurek
And of course, there is no reason to think that any of these parties is acting out of ideological motives. Indeed, if we look at how they publicly address FLOSS, we find ambivalence and disdain. Mendeley only admitted its use of Zotero code when confronted, and avoided any mention of the provenance of its citation styles for years. Frank’s citation processing engine, despite saving countless hours of development and support, earns faint praise. Papers likewise initially only confessed its planned use of the citation styles when probed on Twitter.
Liberating researchers from the constraints of commercial software development has been good for research, not for ideological reasons but for technical ones. It has also been extremely good for commercial competitors, who recognize the value in openly developed software. What’s not at all clear is whether attempting to put the genie back into this particular proprietary software bottle will sustain any of the remarkable momentum gained over the past few years, or whether innovation will continue to be stunted or stifled in pursuit of illusory financial gain.

Wikimedia Commons Credit: Finn Rindahl
As commenters on Brian’s post noted, there is a real cost associated with moving between research software, and it’s inevitably in the interest of for-profit entities to keep those costs as high as possible. Right now the market won’t bear very high costs, but that’s largely thanks to Zotero, not because it’s free but because it’s FLOSS. EndNote, Mendeley, and the rest simply aren’t equivalent players, because the market that they’re squabbling over is checked in growth and in all likelihood doomed to decline so long as there is a strong FLOSS competitor.
April 20th, 2011 § § permalink
About ten thousand years ago, we were introduced to the phrase “time shifting” by a decade-long lawsuit over the right to use VCRs to tape TV shows for later viewing. Today’s DVR has of course made this process far easier and probably more widespread, but the idea remains the same: rather than watch something right now, with no snack breaks, we instead put it off until some later time. Other than the occasionally self-serving gripe about having “a lot of TiVo to catch up on,” time shifting is a settled and dead issue, a non-story. Or it would be, if it were not for the troubling case of historical research.
In a recent post I fretted about how shifting research practices might affect the significance and allure of historical fields. Here I want to examine those shifting practices in a bit more detail. The benefits of compressing a research agenda or of greatly expanding the amount of materials that can be gathered (or both) have encouraged a wholesale transformation in the way that researchers now use archives. The point of visiting the archives hasn’t changed — people still go there to gather evidence — but before the widespread use of digital photography the collection of evidence was limited by what could be read, and then summarized in notes or transcribed. All of this activity necessarily had to occur on-site, during the limited hours and days of operation, further constrained of course by strikes, holidays, and hangovers.
With digital photography, a far greater number of documents can now be processed in a much shorter period. This isn’t really news to anyone who has visited an archive in the last five years. And here, Robert Darnton’s recent defense of the analog rings especially hollow. In dispelling “5 Myths About the ‘Information Age’”, Darnton claims,
“All information is now available online.” The absurdity of this claim is obvious to anyone who has ever done research in archives. Only a tiny fraction of archival material has ever been read, much less digitized.
This is certainly an accurate statement, so long as we only look backward. What Darnton is ignoring of course is that essentially all archival material consulted today is being digitized, whether in the form of transcription or photography. What’s missing is the ability to access and mine these innumerable rich individual silos of data. Zotero is one step toward realizing this vast meta-archive, but however outrageously ambitious such a project might seem, it is trivial when compared to the massive amount of labor that has already been deployed to digitize at the individual, cottage-industry level.
What’s especially interesting, I think, is how this new practice might qualitatively affect research. In particular, I wonder how the creation and population of individual research queues, time-shifted for later consultation, will influence how scholars approach the gathering and analysis of evidence. Take, for example, the remarkable transformation in the area of pre-1923 printed materials. Whenever I encounter any reference to any printed source, the first thing that I now do is to consult Google Books or Gallica to see whether there is a digitized edition available. If I find a digital version — and most times I do — I add a copy to my Zotero library to be read later (and if it’s small format, octavo or duodecimo, usually on my Kindle).
This workflow has dramatically reduced the amount of time that I spend on-site in research libraries like the Bibliothèque nationale de France, which is increasingly becoming just another nice quiet place to do work, a purpose much like that served by my neighborhood branch library when I was in grade school, only with RFID cards and a smoking lounge. But it has also hugely increased the amount of time that I now spend reading and “doing research” at home, at night, and on weekends. Moreover, it’s incredibly easy to amass a massive queue of digitized documents and feel like one has “performed research” even though a good percentage (most?) of those materials might prove useless. So in a sense, we’re not just talking about time shifting an amount of research equivalent to, say, 1998 levels, but rather that we’re simultaneously escalating the evidentiary basis for any research project.
Mike O’Malley and I have written about the changing landscape of historical research in the face of abundant evidence. We agree that finding, as part of the research process, will inevitably decline as a valued skill as associated costs continue to fall. In contrast, synthesis and contextualization, always valuable, will become even more important differentiating qualities. Yet I wonder whether time shifting, and the risks it necessarily introduces, won’t so overburden researchers that they fail to advance to the stage of the research cycle where they can begin to perform meaningful analysis. How is time shifting affecting your research?
April 8th, 2011 § § permalink
As a historian I like to think that I’m comfortable with the idea of fields dying. Maybe they’re reborn and live to fight another day (like diplomatic history, or so we all keep hearing), and maybe they never really die at all (like quantitative history, safely entrenched at Paris 1-Sorbonne).
But what about the old workhorse geographic fields like, say, French history? Unless you are Natalie Davis, being a French historian over the last five decades or more probably meant that you, well, went to France to do your research. And why did you go? You told your department chair and your grant funders and your colleagues that you had to get to the archives and the libraries, of course. But let’s be honest here: French historians don’t go to France to get to the documents. Instead we wed ourselves to documents that just so happen to be found only in France, “forcing” us to go there, usually when the weather there is super great, or at least super crappy at home. That’s why we became French historians in the first place. There are a few exceptions to this rule: the handful of self-loathing sad sacks we all know who hate wine and cheese and cigarettes, but this is no time to be cruel.
March 27th, 2011 § § permalink
This past week’s THATCamp Firenze was a huge success, offering plenty of opportunities to learn about new projects and the various national and international flavors of digital humanities that are flourishing in Europe. But seriously, on a blog with this title, how can I ignore the most spectacular part of the trip?
- So much pork fat. Note my plate’s single sad leaf of radicchio (it was actually delicious).
- Serge shows Sharon, Amanda, and me how it’s done.
March 4th, 2011 § § permalink
Zotero’s server infrastructure has evolved in countless ways since the project’s 2006 launch, but most of those changes are super boring and not worth remembering. Over the past two months, however, we moved the bulk of Zotero’s back end to Amazon Web Services, a step that I believe is uniquely noteworthy in the context of digital humanities projects and their long-term sustainability. In this post I describe the recent changes to Zotero’s architecture. In the next post I’ll discuss why these changes are important for the digital humanities. This story is long, but it has a moral, and also a van.
February 15th, 2011 § § permalink
Because I live under a rock (Vietnam), I only recently discovered the Google Charts API. When I saw that it supported maps, I thought it might be fun to plot the sales data for Zotero File Storage provided by the nonprofit corporation I started along with a bunch of other academics. Bear in mind that these maps only reflect the billing addresses associated with purchasers of Zotero storage. Zotero’s general user base is even more globally distributed and several orders of magnitude larger than the subset depicted here. Nonetheless the results are stunning, I think, and something that pleasantly reminds me of the last throes of a game of Risk. We have work to do in Africa and the Middle East. Click the thumbnails below for full-size, detailed images.
November 16th, 2010 § § permalink
I met up with my French colleague Marin Dacos today while he was in the middle of giving an exam on digital publishing to his humanities master’s students. While many U.S. graduate students (and professors) would balk at such a “factual” exam, I suspect that they would have a very tough time getting through it unscathed.
How would you or your students fare with a test like this? I’ve translated the questions into English. You have ninety minutes. Go!
Basic questions and definitions (13 points)
- In Wikipedia, what do “Diff” and “Edit war” mean?
- Name the XML formats useful for electronic publishing and specify their particularities.
- How does PageRank work? What are the advantages of this system? What are its drawbacks?
- Who is Tim Berners-Lee? What is the W3C?
- What is metadata? What is Dublin Core?
- What is DRM? What are its advantages and disadvantages?
- What is single source publishing?
- What is the difference between the PDF and EPUB formats?
- What is Zotero? What does it do?
- What is a DOI? What is name resolution?
- What is interoperability? What is OAI-PMH? What are the main verbs of OAI-PMH and what do they do?
- What is the attention economy?
- What is the Creative Commons License? What is its purpose?
Synthesis (7 points)
- Electronic publishing falls into three categories. For each type of electronic publishing, provide a definition, at least one representative example, its main technical characteristics, its principal qualities, and its major faults.
- The publishing industry is searching for an economic model of electronic publication. Present the different strategies currently under development (name, basic description, example, advantages, disadvantages)
November 12th, 2010 § § permalink
Before the excitement surrounding last month’s Open Access Week fades completely — too late — I thought it might be appropriate to describe how and where Zotero intersects with OA. When people talk about Open Access, they typically mean free access to published, usually scholarly, content. It’s a concept that’s ideologically easy for most researchers to get behind because few of us reap any direct financial benefit from the majority of our publications, and we’re all very familiar with the annoying frictions introduced by gating access to content. Championing Open Access is kind of like advocating not clubbing baby seals: you’re unlikely to encounter much opposition.