But What Does it Mean? The Challenge of Culturomics

Intervention

December 17, 2010

Today's announcement in the New York Times of a forthcoming article in the journal Science, in which psychologist Jean-Baptiste Michel and mathematician Erez Lieberman Aiden describe the creation of a digital storehouse, got me thinking. The two Harvard researchers built the database by culling information from nearly 5.2 million books digitized by the internet giant Google. Truly, the creation of such a database is a momentous and exciting development for our field. Depending on what humanities researchers and others do with it, it may even offer what the NYT trumpets as a “new window on culture.” But much depends on what we do with the opportunity we have been given. Without turning our backs on the possibilities opened up by the ability to quantify what has previously not been counted, we humanities scholars must continue to exercise our ability to look beyond, behind, and around, as well as inside, the database.

The ability to quantify literary texts does not, as some humanities scholars may fear, eviscerate our traditional contributions to knowledge, but rather makes them more urgent. What we do best, examining the discursive framings of the questions being asked and contextualizing historically and ideologically the texts being produced, remains crucially important. The effect of quantitative data, when it is presented in neat tables and visually attractive graphs, can indeed be stunning. The problem with the “stunning” quality of data, especially as it operates on those not accustomed to working with it, is that it can shut down inquiry. After all, the meaning of a graph appears so obvious: “Oh! Jimmy Carter was more important than Marilyn Monroe after all!” But whose idea was it to compare Marilyn Monroe to Jimmy Carter in the first place? And what does it mean? More specifically, what does it mean when, in what context, and to whom?

I wish to make a second point about this database. As comprehensive as it may appear to be, surely the words that appear in 5.2 million books are not all that can be counted as “culture.” What about that which has not been captured by that treasure trove of data? What books, or newsletters, or chapbooks were judged too ephemeral, or just too unworthy, to be scanned? What kinds of sayings, oral stories, or urban legends barely or never make it into print, and so are by necessity excluded or undercounted? What about habits of interaction and ways of being in the world that are gestural, or visual?

So much has been written about some canonical authors that scholars now argue about whether an author or his or her editor should be credited with the elegance of the resulting prose. There will come a time, I have no doubt, when scholars will compare the number of semicolons a canonical author uses in one text with the number he or she uses in another. Overkill? Maybe, or maybe not. But it is the kind of project that could be facilitated by the easy availability of quantitative data. Meanwhile, the kinds of scholarly projects that examine underground knowledges, insurgent cultural formations, or non-text-based literatures are at risk of becoming even more marginalized than they already are, or even of being reduced to the status of “not-knowledge.”

I look forward to meeting the challenges presented to us by “culturomics.” I hope we meet them well.