Survivorship Bias, Survivor Guilt, and Opportunity Cost, Oh My!

November 16th, 2011

N.B. For best results, try to get Destiny’s Child’s “Sur­vivor” going in your head before proceeding.

Two recent blog posts by Larry Cebula and Hol­ger Syme high­light the deep divide that sep­a­rates the pes­simists from the opti­mists in acad­e­mia. Cebula explains why he steers his stu­dents away from pur­su­ing a career as a pro­fes­sor, essen­tially argu­ing that the odds are sim­ply too stacked against them even under the best of con­di­tions. Syme, in con­trast, sug­gests that Cebula’s dream-crushing advice is short-sighted and ulti­mately dan­ger­ous to the long-term via­bil­ity of the pro­fes­sion. While Cebula’s rea­son­ing will be famil­iar to many — he’s work­ing the same rich vein as has William Pan­na­packer under his nom de doom Thomas H. Ben­ton in the Chron­i­cle of Higher Edu­ca­tion — I sus­pect he’s also still greatly out­num­bered among the greater pop­u­la­tion of aca­d­e­mics, thanks in no small part to sur­vivor­ship bias and an unwill­ing­ness to grap­ple with the unfor­giv­ing cal­cu­lus of oppor­tu­nity cost.

“Survivorship bias,” you ask? Let’s roll Wikipedia:

Sur­vivor­ship bias can lead to overly opti­mistic beliefs because fail­ures are ignored […] It can also lead to the false belief that the suc­cesses in a group have some spe­cial prop­erty, rather than being just lucky.

Sur­vivor­ship bias can lead to all sorts of hilar­i­ous sit­u­a­tions, like your grand­par­ents advis­ing you not to bother wear­ing a seat­belt because they never did, and look, they’re still around! Unfor­tu­nately it also can drive you to imag­ine that there’s a rea­son for your suc­cess. You cer­tainly deserved to win, and oth­ers who deserve it will suc­ceed, too!

In con­trast, Cebula, Pan­na­packer, et al. appear at least par­tially moti­vated by sur­vivor­ship bias’s evil twin, survivor’s guilt. They just know that they didn’t do any­thing par­tic­u­larly spe­cial, and they’re mor­ti­fied by the thought that oth­ers have failed and are fail­ing while they haven’t. They blame them­selves for the gross imbal­ance in the job mar­ket and believe that by dis­cour­ag­ing stu­dents from begin­ning the process, they can mit­i­gate some of its dele­te­ri­ous effects. I’ll admit that I find this posi­tion rather appeal­ing, in part because it’s self-deprecating, but mostly because in my case it’s actu­ally quite easy to cal­cu­late what’s at stake.

The two core ele­ments of the argu­ment against pur­su­ing a career in acad­e­mia are the poor prospects for employ­ment at the end of the road and the oppor­tu­nity cost along the way. Few pro­fes­sors would deny the exis­tence of the first chal­lenge: we’ve all run the gaunt­let of the job mar­ket, often mul­ti­ple times, and we’ve seen our friends and col­leagues and rivals do the same. Peri­odic reports on the state of the job mar­ket con­firm what we instinc­tively know: there are too many can­di­dates chas­ing too few jobs. And even among those avail­able posi­tions, fewer still con­form to the ideal of a 2–2 tenure-track post.

The point about oppor­tu­nity cost is far more con­tentious because it’s less eas­ily quan­tifi­able (and his­to­ri­ans at least are of course explic­itly trained to avoid indulging in such hypo­thet­i­cals), and pos­si­bly because so many of us are bad at math. Cebula was (rightly) called out by my col­league Zach Schrag for advanc­ing the fig­ure of “a mil­lion dol­lars” as an indi­ca­tion of the income for­gone by stu­dents pur­su­ing grad­u­ate study as opposed to some more remu­ner­a­tive career (and here every­one seems oddly obsessed with Hoot­ers). But the oppor­tu­nity costs can still be enor­mous, as my own expe­ri­ence shows.

For three years between undergraduate and graduate study, I worked at IBM, first as a contractor and then as a direct-hire employee. When I left in August 1999 for the University of Michigan, my salary was $72,000, excluding bonuses, awards, etc., which were frequent and not particularly difficult to obtain. I then spent the next six years as a graduate student, eking out a modest existence on teaching fellowships and research grants. So what did I give up?

At IBM, ten per­cent raises were the norm for rea­son­ably com­pe­tent indi­vid­u­als, and I had in the pre­vi­ous year received a 13% increase and far larger increases in pre­vi­ous years. To be very con­ser­v­a­tive, let’s assume 10% per year and no pro­mo­tions, bonuses, or awards.

Year     IBM         Michigan
2000     $72,000     $14,000
2001     $79,200     $14,000
2002     $87,120     $13,000
2003     $95,832     $27,000
2004     $105,415    $21,000
2005     $115,957    $6,000
Total    $555,524    $95,000

In con­trast, dur­ing my six years of grad­u­ate study, I earned a total of around $95,000 in stipends and research grants, mak­ing for a net oppor­tu­nity cost approach­ing half a mil­lion dol­lars. Not Cebula’s full mil­lion, but still noth­ing to sneeze at. Had I remained in grad­u­ate school longer than six years, the gulf would of course have widened even more rapidly, since I would have traded the com­pound­ing effect of raises for the rapidly dimin­ish­ing hand­outs granted to lin­ger­ing ABDs.
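For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes only the figures given above: a $72,000 starting salary, a flat 10% annual raise, and the six yearly stipend amounts. The function and variable names are illustrative, not part of any original calculation.

```python
# Hypothetical helper name; the dollar figures come from the table above.
def projected_salaries(start, raise_rate, years):
    """Return one salary per year, compounding a flat annual raise."""
    return [round(start * (1 + raise_rate) ** n) for n in range(years)]

# Six years at IBM, assuming 10% raises and no promotions or bonuses.
ibm = projected_salaries(72_000, 0.10, 6)

# Stipends and grants actually received during graduate study.
michigan = [14_000, 14_000, 13_000, 27_000, 21_000, 6_000]

# Forgone earnings: what IBM would have paid minus what Michigan did pay.
opportunity_cost = sum(ibm) - sum(michigan)
print(sum(ibm), sum(michigan), opportunity_cost)
```

Run as written, this prints 555524 95000 460524, which is the "approaching half a million" gap. Raising the `years` argument (and extending the stipend list) shows how quickly the gap widens with each additional year.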

What crit­ics often fail to remem­ber is that oppor­tu­nity cost doesn’t mag­i­cally van­ish even if one is lucky enough to land a good tenure-track posi­tion like I did. Even with the rel­a­tively minor infla­tion we’ve expe­ri­enced over the last decade, 1999’s $72,000 is equiv­a­lent to just over $98,000 today. Not exactly the aver­age assis­tant pro­fes­sor salary in the human­i­ties, is it? Indeed, the oppor­tu­nity cost asso­ci­ated with acad­e­mia effec­tively con­tin­ues to mount indefinitely.

When we tell our stu­dents that earn­ing a PhD is essen­tially a very costly way to pur­chase a lot­tery ticket (albeit with less astro­nom­i­cal odds but also a vastly dimin­ished pay­out), we’re already mak­ing a good case for what might not be there at the end of the road. But we also need to explain to them what absolutely won’t be there: oppor­tu­nity cost’s for­gone earn­ings. Syme oddly claims that Cebula is cyn­i­cal to address these finan­cial con­cerns, continuing:

Has any­one ever been under the illu­sion that work­ing as an aca­d­e­mic in the human­i­ties was a quick way to wealth, home­own­er­ship, and a sta­ble nuclear fam­ily existence?

Well no, prob­a­bly not. But unless I’m mis­taken, what ani­mates Cebula’s argu­ment isn’t regret that work­ing as an aca­d­e­mic in the human­i­ties is not a quick path to these goals — which, aside from “wealth,” are hardly grandiose aspi­ra­tions — it’s the under­stand­ing that work­ing as an aca­d­e­mic in the human­i­ties more often than not rules out these goals entirely.

On Usage Figures

May 21st, 2011

Among the more eye-popping num­bers asso­ci­ated with LinkedIn’s recent ini­tial pub­lic offer­ing is the 100,000,000 mem­bers it claims. What do those hun­dred mil­lion peo­ple do with their LinkedIn accounts? If they’re like me, they qui­etly ignore the end­less spam but never quite moti­vate to unsub­scribe. Or maybe they occa­sion­ally click through a link returned by a Google search, only to dis­cover the limp résumé of some sad sack look­ing to escape the Enter­prise rent-a-car counter, not the super cool and attrac­tive “Sean Takats” that they went to high school with and are stalking.

I’ve been think­ing a lot about these kinds of num­bers as the Zotero team pre­pares for a major sum­mit this sum­mer. In our first few years, we used to mea­sure Zotero’s growth in terms of down­loads, but we quit doing so well over a year ago, when that num­ber was north of four mil­lion, hav­ing dou­bled from two mil­lion just a few months ear­lier. We stopped because down­loads are never a very accu­rate mea­sure­ment of adop­tion, and they are espe­cially prob­lem­atic for Zotero, which is avail­able from a vari­ety of repos­i­to­ries. Most users get our soft­ware from either zotero.org or addons.mozilla.org, but Zotero has also popped up else­where, mainly because we don’t restrict its dis­tri­b­u­tion in any way. In the absence of any other met­ric, how­ever, down­loads are bet­ter than noth­ing, and Mende­ley for exam­ple still uses down­loads to arrive at its fig­ure of 900K+ “peo­ple,” accord­ing to Ian Mulvaney’s recent code4lib talk. And when it comes to com­mer­cial prod­ucts like End­Note, we of course have no idea at all.

A second way to measure usage would be to tally user account registrations. Currently zotero.org hosts 620,000 accounts. Note that I say “accounts” and not “users.” Indeed there’s no reason to think that this figure is more than very slightly more reliable than downloads. Zotero was around for years before we even had server accounts, and we have never aggressively pushed users of Zotero to register accounts by confronting them with a sign-up form before offering the download. We think server accounts provide incredibly valuable functionality, but we also feel that it’s a little sleazy to try to co-opt people into signing up for something they don’t want. So the “real” number could be much higher! Among that mass of accounts, there are hundreds of thousands of real, active researchers but also, inevitably, countless spammers waiting to be weeded out and dormant accounts sitting idle. Or maybe it’s much lower! But even if we were to pretend that all 620,000 accounts were tended to by highly motivated scholars, we would still be faced with an order of magnitude drop when compared to downloads. A quick look at Mendeley’s people directory reveals a similar discrepancy: it lists fewer than 70,000 user accounts, which is nothing to sneeze at but of course well south of the download figure. How many accounts does RefWorks have? Again, we can’t know.

A final way would be to count how many peo­ple are run­ning Zotero each day. Because Zotero auto­mat­i­cally checks for updated trans­la­tor code on a daily basis, we know that at least 275,000 instances of Zotero ran today. But wait a minute, what’s with this “instances” and “at least” busi­ness? Well, maybe some peo­ple are run­ning more than one copy of Zotero on a sin­gle machine. We could account for unique IP addresses, which moves the num­ber down slightly, but then we would ignore mul­ti­ple instances of Zotero shar­ing a sin­gle pub­lic IP address. And of course, this fig­ure only accounts for copies of Zotero that have auto­matic updates active, and that man­aged to con­nect to the inter­net. Other soft­ware ven­dors could pre­sum­ably track sync activ­ity or other met­rics to arrive at anal­o­gous figures.
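To see why the different ways of counting diverge, here is a toy sketch in Python. It assumes a purely hypothetical log of update checks in which each request carries a public IP address and some per-install token. Both the log format and the tokens are invented for illustration; nothing here reflects how Zotero's servers actually record requests.

```python
# Hypothetical update-check log: (public_ip, install_token) pairs.
# The log format and the tokens are invented for illustration only.
checks = [
    ("203.0.113.5", "a"),   # one copy of the software
    ("203.0.113.5", "b"),   # a second copy behind the same public IP
    ("198.51.100.7", "c"),
    ("198.51.100.7", "c"),  # the same copy checking twice in one day
    ("192.0.2.44", "d"),
]

requests = len(checks)                                  # raw request count
unique_installs = len({token for _, token in checks})   # distinct copies
unique_ips = len({ip for ip, _ in checks})              # distinct addresses

print(requests, unique_installs, unique_ips)
```

In this toy log the raw request count (5) overstates the four installs, while the unique-IP count (3) understates them, which is the shared-address problem described above. Neither number is "the" user count.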

The basic moral of the story, if you haven’t already guessed, is that these num­bers are all pure shit, though some are clearly worse than oth­ers. All we can do is pro­vide an hon­est expla­na­tion of how they’re derived.

Zotero Versus

May 6th, 2011

Brian Crox­all recently lit up the com­ment feed at the Chron­i­cle with his ProfHacker com­par­i­son of “Zotero vs. End­note,” where the debate cen­tered mostly around issues of cita­tion fidelity. As Fred Gibbs notes, how­ever, “while cita­tion for­mat­ting is one major rea­son to use bib­li­o­graphic soft­ware, it isn’t nec­es­sar­ily the only or even pri­mary rea­son, espe­cially in the human­i­ties.” Zotero’s cita­tion func­tion­al­ity was always imag­ined merely as bait: by pro­vid­ing this labor-saving func­tion­al­ity, Zotero would encour­age each user to move her research into what amounted to a fully search­able and share­able rela­tional data­base that could be sub­jected to text min­ing and other analy­sis. Here researchers could begin to do truly remark­able and new things with their evidence.

A few com­menters, as well as Fred, tried to shift the dis­cus­sion toward the issue of cost and open­ness, and in par­tic­u­lar to Zotero’s sta­tus as free/libre open source soft­ware (FLOSS). Many of Zotero’s most ded­i­cated users have cham­pi­oned the soft­ware in the name of FLOSS, but this line of argu­ment fre­quently falls on deaf ears, or even ears that are con­di­tioned to reject FLOSS as some­how anti-market or anti-capitalist. From my per­spec­tive, FLOSS in and of itself is a fairly unper­sua­sive argu­ment for using Zotero, akin to knee-jerk calls to “Buy Amer­i­can!” in the 1980s, when the USA still did some man­u­fac­tur­ing. Buy­ing Amer­i­can and using FLOSS might make one feel some sense of moral supe­ri­or­ity, but at the end of the day can those feel­ings still paper over our sense of exis­ten­tial dread when faced with dri­ving to work in our crum­bling K-cars or cob­bling together a dis­ser­ta­tion with shitty research software?

Just as Zotero’s cita­tion man­age­ment func­tion­al­ity is a means to an end, so is licens­ing and devel­op­ing the soft­ware as FLOSS. Far from just ide­ol­ogy, FLOSS has allowed Zotero to lever­age rel­a­tively lim­ited finan­cial resources to out­per­form vastly larger and bet­ter funded com­peti­tors, old and new. Zotero’s annual oper­at­ing over­head is only in the low six fig­ures. This amount cov­ers in-house devel­op­ment, out­reach, and infra­struc­ture costs. In com­par­i­son, End­Note and Mende­ley each have oper­at­ing costs that are an order of mag­ni­tude greater (or even more). And of course there’s an even higher, hid­den cost for these plat­forms: the expec­ta­tion of sub­stan­tial profit, which nec­es­sar­ily impinges on sustainability.

Why should any researcher care about these issues? Defend­ers of Zotero have often voiced con­cerns about “lock-in” with pro­pri­etary, for-profit soft­ware. Users might find them­selves unable to migrate their data out of one of these com­mer­cial solu­tions at some later date. But even if this worry were valid — and I don’t know that it is — lock-in in and of itself isn’t nec­es­sar­ily a bad thing. Who would com­plain about being locked-in to the very best solu­tion, par­tic­u­larly if that solu­tion also didn’t cost any money?

Unfor­tu­nately, the closed, for-profit soft­ware option has never been the very best solu­tion, and there’s no sign that that sit­u­a­tion is chang­ing. This isn’t ide­ol­ogy speak­ing; it’s his­tory. End­Note has been derided for well over a decade for its 1990s inter­face and preda­tory “upgrade” cycles. New fea­tures come late or never, and the soft­ware has yet to embrace online research and col­lab­o­ra­tion. Mende­ley, while far newer and the­o­ret­i­cally nim­bler, has like­wise only slowly moved to pro­vide the basic, core func­tion­al­ity that active, pub­lish­ing researchers require. It’s entirely likely that “fea­tures” like jour­nal abbre­vi­a­tions, cita­tion page num­bers, and sub­col­lec­tions will even­tu­ally make their way into Mende­ley, or that End­Note will one day dis­cover the inter­net, but the mere fact that these things haven’t yet tran­spired speaks vol­umes about the pri­or­i­ties of their par­ent corporations.

Because it’s FLOSS, Zotero has been able to add and refine fea­tures thanks to the con­tri­bu­tions of hun­dreds of vol­un­teer devel­op­ers and the feed­back of hun­dreds of thou­sands of users. The tech­no­log­i­cal suc­cess of this model is unde­ni­able: Zotero’s open-source cita­tion engine, entirely rewrit­ten by Zotero user Frank Ben­nett, and the thou­sands of user-contributed style files the engine uses have already been adopted by Mende­ley and Papers, and a rep­re­sen­ta­tive from XXXXX has expressed inter­est in doing the same.1 (Update: The indi­vid­ual who wrote regard­ing XXXXX and CSL clar­i­fies that the com­mu­ni­ca­tion was made in a per­sonal capac­ity, not as a rep­re­sen­ta­tive, and so I’ve removed the software’s name.)

Wiki­me­dia Com­mons Credit: Stan Zurek

And of course, there is no reason to think that any of these parties is acting out of ideological motives. Indeed, if we look at how they publicly address FLOSS, we find ambivalence and disdain. Mendeley only admitted its use of Zotero code when confronted, and avoided any mention of the provenance of its citation styles for years. Frank’s citation processing engine, despite saving countless hours of development and support, earns faint praise. Papers likewise initially only confessed its planned use of the citation styles when probed on Twitter.

Lib­er­at­ing researchers from the con­straints of com­mer­cial soft­ware devel­op­ment has been good for research, not for ide­o­log­i­cal rea­sons but for tech­ni­cal ones. It has also been extremely good for com­mer­cial com­peti­tors, who rec­og­nize the value in openly devel­oped soft­ware. What’s not at all clear is that attempt­ing to put the genie back into this par­tic­u­lar pro­pri­etary soft­ware bot­tle will sus­tain any of the remark­able momen­tum gained over the past few years, or whether inno­va­tion will con­tinue to be stunted or sti­fled in pur­suit of illu­sory finan­cial gain.

Wiki­me­dia Com­mons Credit: Finn Rindahl

As com­menters on Brian’s post noted, there is a real cost asso­ci­ated with mov­ing between research soft­ware, and it’s inevitably in the inter­est of for-profit enti­ties to keep those costs as high as pos­si­ble. Right now the mar­ket won’t bear very high costs, but that’s largely thanks to Zotero, not because it’s free but because it’s FLOSS. End­Note, Mende­ley, and the rest sim­ply aren’t equiv­a­lent play­ers, because the mar­ket that they’re squab­bling over is checked in growth and in all like­li­hood doomed to decline so long as there is a strong FLOSS competitor.

  1. To my knowledge, not a single publication style file has ever been contributed by a non-Zotero user.

Time Shifting and Historical Research

April 20th, 2011

About ten thou­sand years ago, we were intro­duced to the phrase “time shift­ing” by a decade-long law­suit over the right to use VCRs to tape TV shows for later view­ing. Today’s DVR has of course made this process far eas­ier and prob­a­bly more wide­spread, but the idea remains the same: rather than watch some­thing right now, with no snack breaks, we instead put it off until some later time. Other than the occa­sion­ally self-serving gripe about hav­ing “a lot of TiVo to catch up on,” time shift­ing is a set­tled and dead issue, a non-story. Or it would be, if it were not for the trou­bling case of his­tor­i­cal research.

In a recent post I fretted about how shifting research practices might affect the significance and allure of historical fields. Here I want to examine those shifting practices in a bit more detail. The benefits of compressing a research agenda or of greatly expanding the amount of materials that can be gathered (or both) have encouraged a wholesale transformation in the way that researchers now use archives. The point of visiting the archives hasn’t changed, since people still go there to gather evidence, but before the widespread use of digital photography the collection of evidence was limited by what could be read, and then summarized in notes or transcribed. All of this activity necessarily had to occur on-site, during the limited hours and days of operation, further constrained of course by strikes, holidays, and hangovers.

With digital photography, a far greater number of documents can now be processed in a much shorter period. This isn’t really news to anyone who has visited an archive in the last five years. And here, Robert Darnton’s recent defense of the analog rings especially hollow.1 In dispelling “5 Myths About the ‘Information Age’”, Darnton claims,

“All information is now available online.” The absurdity of this claim is obvious to anyone who has ever done research in archives. Only a tiny fraction of archival material has ever been read, much less digitized.

This is certainly an accurate statement, so long as we only look backward. What Darnton is ignoring of course is that essentially all archival material consulted today is being digitized, whether in the form of transcription or photography. What’s missing is the ability to access and mine these innumerable rich individual silos of data. Zotero is one step toward realizing this vast meta-archive, but however outrageously ambitious such a project might seem, it is trivial when compared to the massive amount of labor that has already been deployed to digitize at the individual, cottage-industry level.

What’s espe­cially inter­est­ing, I think, is how this new prac­tice might qual­i­ta­tively affect research. In par­tic­u­lar, I won­der how the cre­ation and pop­u­la­tion of indi­vid­ual research queues, time-shifted for later con­sul­ta­tion, will influ­ence how schol­ars approach the gath­er­ing and analy­sis of evi­dence. Take, for exam­ple, the remark­able trans­for­ma­tion in the area of pre-1923 printed mate­ri­als. When­ever I encounter any ref­er­ence to any printed source, the first thing that I now do is to con­sult Google Books or Gal­lica to see whether there is a dig­i­tized edi­tion avail­able. If I find a dig­i­tal ver­sion — and most times I do — I add a copy to my Zotero library to be read later (and if it’s small for­mat, octavo or duodec­imo, usu­ally on my Kindle).

This work­flow has dra­mat­i­cally reduced the amount of time that I spend on-site in research libraries like the Bib­lio­thèque nationale de France, which is increas­ingly becom­ing just another nice quiet place to do work, a pur­pose much like that served by my neigh­bor­hood branch library when I was in grade school, only with RFID cards and a smok­ing lounge. But it has also hugely increased the amount of time that I now spend read­ing and “doing research” at home, at night, and on week­ends. More­over, it’s incred­i­bly easy to amass a mas­sive queue of dig­i­tized doc­u­ments and feel like one has “per­formed research” even though a good per­cent­age (most?) of those mate­ri­als might prove use­less. So in a sense, we’re not just talk­ing about time shift­ing an amount of research equiv­a­lent to say, 1998 lev­els, but rather that we’re simul­ta­ne­ously esca­lat­ing the evi­den­tiary basis for any research project.

Mike O’Malley and I have writ­ten about the chang­ing land­scape of his­tor­i­cal research in the face of abun­dant evi­dence.2 We agree that find­ing, as part of the research process, will inevitably decline as a val­ued skill as asso­ci­ated costs con­tinue to fall. In con­trast, syn­the­sis and con­tex­tu­al­iza­tion, always valu­able, will become even more impor­tant dif­fer­en­ti­at­ing qual­i­ties. Yet I won­der whether time shift­ing, and the risks it nec­es­sar­ily intro­duces, won’t so over­bur­den researchers that they fail to advance to the stage of the research cycle where they can begin to per­form mean­ing­ful analy­sis. How is time shift­ing affect­ing your research?

  1. Robert Darnton, “5 Myths About the ‘Information Age’,” The Chronicle of Higher Education, April 17, 2011, sec. The Chronicle Review, http://chronicle.com/article/5-Myths-About-the-Information/127105/.
  2. Michael O’Malley, “Evidence and Scarcity,” The Aporetic, October 2, 2010, http://theaporetic.com/?p=176 and Sean Takats, “Evidence and Abundance,” The Quintessence of Ham, October 18, 2010, http://quintessenceofham.org/2010/10/18/evidence-and-abundance/.

The End of (French) History

April 8th, 2011

As a his­to­rian I like to think that I’m com­fort­able with the idea of fields dying. Maybe they’re reborn and live to fight another day (like diplo­matic his­tory, or so we all keep hear­ing), and maybe they never really die at all (like quan­ti­ta­tive his­tory, safely entrenched at Paris 1-Sorbonne).

But what about the old workhorse geographic fields like, say, French history?1 Unless you are Natalie Davis, being a French historian over the last five decades or more probably meant that you, well, went to France to do your research. And why did you go? You told your department chair and your grant funders and your colleagues that you had to get to the archives and the libraries, of course. But let’s be honest here: French historians don’t go to France to get to the documents. Instead we wed ourselves to documents that just so happen to be found only in France, “forcing” us to go there, usually when the weather there is super great, or at least super crappy at home. That’s why we became French historians in the first place. There are a few exceptions to this rule: the handful of self-loathing sad sacks we all know who hate wine and cheese and cigarettes, but this is no time to be cruel.

  1. I’m focusing on France, of course, because it’s what I know. That said, I don’t see any compelling reason why the argument that follows couldn’t apply to any number of national or transnational areas of inquiry.

FATCamp Florence

March 27th, 2011

This past week’s THAT­Camp Firenze was a huge suc­cess, offer­ing plenty of oppor­tu­ni­ties to learn about new projects and the var­i­ous national and inter­na­tional fla­vors of dig­i­tal human­i­ties that are flour­ish­ing in Europe. But seri­ously, on a blog with this title, how can I ignore the most spec­tac­u­lar part of the trip?

Zotero and AWS, or How We Learned to Stop Worrying and Love the Cloud (Part 1 of 2)

March 4th, 2011

Zotero’s server infrastructure has evolved in countless ways since the project’s 2006 launch, but most of those changes are super boring and not worth remembering. Over the past two months, however, we moved the bulk of Zotero’s back end to Amazon Web Services, a step that I believe is uniquely noteworthy in the context of digital humanities projects and their long-term sustainability. In this post I describe the recent changes to Zotero’s architecture. In the next post I’ll discuss why these changes are important for the digital humanities. This story is long, but it has a moral, and also a van.

Zotero Storage Goes Global

February 15th, 2011

Because I live under a rock (Viet­nam), I only recently dis­cov­ered the Google Charts API. When I saw that it sup­ported maps, I thought it might be fun to plot the sales data for Zotero File Stor­age pro­vided by the non­profit cor­po­ra­tion I started along with a bunch of other aca­d­e­mics. Bear in mind that these maps only reflect the billing addresses asso­ci­ated with pur­chasers of Zotero stor­age. Zotero’s gen­eral user base is even more glob­ally dis­trib­uted and sev­eral orders of mag­ni­tude larger than the sub­set depicted here. Nonethe­less the results are stun­ning, I think, and some­thing that pleas­antly reminds me of the last throes of a game of Risk. We have work to do in Africa and the Mid­dle East. Click the thumb­nails below for full-size, detailed images.

Test Your Digital Humanities Knowledge

November 16th, 2010

I met up with my French col­league Marin Dacos today while he was in the mid­dle of giv­ing an exam on dig­i­tal pub­lish­ing to his human­i­ties master’s stu­dents. While many U.S. grad­u­ate stu­dents (and pro­fes­sors) would balk at such a “fac­tual” exam, I sus­pect that they would have a very tough time get­ting through it unscathed.1

How would you or your stu­dents fare with a test like this? I’ve trans­lated the ques­tions into Eng­lish. You have ninety min­utes. Go!

Basic ques­tions and def­i­n­i­tions (13 points)

  1. In Wikipedia, what do “Diff” and “Edit war” mean?
  2. Name the XML for­mats use­ful for elec­tronic pub­lish­ing and spec­ify their particularities.
  3. How does PageR­ank work? What are the advan­tages of this sys­tem? What are its drawbacks?
  4. Who is Tim Berners-Lee? What is the W3C?
  5. What is meta­data? What is Dublin Core?
  6. What is DRM? What are its advan­tages and disadvantages?
  7. What is sin­gle source publishing?
  8. What is the dif­fer­ence between the PDF and EPUB formats?
  9. What is Zotero? What does it do?
  10. What is a DOI? What is name resolution?
  11. What is inter­op­er­abil­ity? What is OAI-PMH? What are the main verbs of OAI-PMH and what do they do?
  12. What is the atten­tion economy?
  13. What is the Cre­ative Com­mons License? What is its purpose?

Syn­the­sis (7 points)

  1. Elec­tronic pub­lish­ing falls into three cat­e­gories. For each type of elec­tronic pub­lish­ing, pro­vide a def­i­n­i­tion, at least one rep­re­sen­ta­tive exam­ple, its main tech­ni­cal char­ac­ter­is­tics, its prin­ci­pal qual­i­ties, and its major faults.
  2. The publishing industry is searching for an economic model of electronic publication. Present the different strategies currently under development (name, basic description, example, advantages, disadvantages).

  1. Except for #9. Everyone can answer that one.

Zotero and Open Access

November 12th, 2010

Before the excitement surrounding last month’s Open Access Week fades completely (too late), I thought it might be appropriate to describe how and where Zotero intersects with OA. When people talk about Open Access, they typically mean free access to published, usually scholarly, content. It’s a concept that’s ideologically easy for most researchers to get behind because few of us reap any direct financial benefit from the majority of our publications, and we’re all very familiar with the annoying frictions introduced by gating access to content. Championing Open Access is kind of like advocating not clubbing baby seals: you’re unlikely to encounter much opposition.