I want to begin with the premise that literature is the data of literary studies. The OED tells us that the term “data,” from classical Latin, refers to “an item of information” and to “related items of (chiefly numerical) information considered collectively, typically obtained by scientific work and used for reference, analysis or calculation.” “Data” is the plural of “datum,” deriving from the Latin “dare,” which means “to give.” Hence, “datum” and “data” refer not only to information but also (and more generally) to “something given or granted; something known or assumed as fact, and made the basis of reasoning.”
I want to begin by arguing that the current state of affairs with respect to “data” and “literature,” itself a mirror of the entire structure that organizes the cultural relationship between the digital humanities and literary criticism, is bad for proponents on both sides. I mean this in the most general possible way, but here I want to focus especially on the antagonism between data-based analysis of literary texts, which has been called “distant reading,” and the more historically traditional reading practice of focusing on small units of meaning, which we call, pretty loosely, “close reading.”
In my time today, I would like to offer a set of provocations that, I hope, will allow us to expand our understanding of the nature of data, its uses, and its implications for literary study. These provocations number three, and they are each derived from that walking provocation, Thomas Jefferson.
When I began thinking about this, I had to ask, “What isn’t data in literary studies?” Everything is data, in some sense; what counts depends on the position of the analyst and the nature of the project. So I want to narrow the question by situating it: what is data to whom, and for what? In this talk, “data” is that which can serve as input for computer analysis by someone working with texts, using the type of natural-language machine learning I’ve worked with to isolate significant word clusters: topic modeling.
Addressing the question “what is data in literary studies?” offers the chance to enlarge our interpretational procedures to include new methods and materials, but also to apply existing methods of analysis to new materials and questions. Quantitative approaches to archives and texts developed by digital humanists have offered one such expansion. These approaches often treat literature as a data mine. In response, I propose that literature is a heuristic for managing and conceptualizing data.