Data: Three Provocations

Copy Press. Thomas Jefferson Foundation, Inc.

In my time today, I would like to offer a set of provocations that, I hope, will allow us to expand our understanding of the nature of data, its uses, and its implications for literary study. These provocations number three, and they are each derived from that walking provocation, Thomas Jefferson.

The first emerges from Jefferson’s process of data collection, through which I want to call attention to the materiality of data—of his data, and all data—that is too often overlooked in our digital age. Jefferson was famously aware of his own historical legacy, and took active steps to influence that legacy by amassing an immense personal archive. He sought to acquire “one of those copying Machines” in 1783, almost as soon as he learned of their existence, and in 1804, he would purchase one of the first polygraph devices, which represented the next generation of copying technology.

As a result, the 18,000 documents that Jefferson himself composed and then copied, as well as a significant portion of the 25,000 additional documents that he received and subsequently archived, are now available in digital form. But we no longer have access to, for instance, the texture of the porous copying paper that Jefferson so diligently imported from London; the wetness (and, likely, the malodor) of the iron-gall ink that he used in his press; or the sound of the advancing roller, which forced the ink through the copying paper, resulting in a facsimile of the original document that, once dry, could be turned over and read from the back.

Attending to the materiality of this process provides a way to understand how the act of data collection influences the data itself. We might follow this pathway, for example, from the technologies of data collection, to the hands that operated the machines—in Jefferson’s case, not only his own, but those of the series of workers, whose names we may or may not know, who cleaned the ink from the machine’s rollers and then filed the archival copies. This form of “lossy” data is no more apparent to us today—think only of the ghostly hands that occasionally appear in the images provided by Google Books—but by attending to data’s materiality, we can begin to tell new stories about its collection, its implications, and its use.

The second provocation I want to present relates to the organization of data, and it emerges from another aspect of Jefferson’s archive—in particular, from the classification scheme that accompanied his 1783 catalog of books. This collection was the one offered for sale to the federal government after the War of 1812, and provided the seed collection for today’s Library of Congress.

You can see here an illustration of the scheme’s main attributes: three basic divisions—History, Philosophy, and Fine Arts—which were “applied retrospectively” to the main categories identified by Francis Bacon in his Advancement of Learning (you can see those noted above). Then you see the forty-four so-called “chapters” into which Jefferson sub-divided his collection—a departure from Bacon—visually represented here as a hierarchy.

This idea for a hierarchical system of classification came from Carl Linnaeus, with whom Jefferson sided against the Comte de Buffon and his ideas about the negative environmental influence of North America. This is the same theory of degeneration that Jefferson famously aimed to refute in textual form in Notes on the State of Virginia. And in fact, the seemingly utilitarian organization of this library data is undergirded by the same theoretical stance. So attending to these contexts, as literary critics are trained to do, can reveal the argument that underlies the structure of this—or any—ostensibly anodyne list.

The final provocation I will offer relates to the content of data—that is to say, to what (or whom) the data represents. Here you see presented, in a form not unlike the library scheme, the data of Jefferson’s Farm Book, the small leather-bound volume in which Jefferson recorded the names, birthdates (when known), present locations, and countries of origin of the men, women, and children he enslaved.

In contrast to the organization of Jefferson’s library, this presentation of people exposes a different system of order and control—one that enabled Jefferson to erroneously believe that the men and women on his plantation might become objects of empirical knowledge, not only controlled, but also understood, through quantifiable, visualizable facts.

This data, and its visual display, thus illustrate the imperative of examining any process that reduces persons to objects, and stories to names. To be clear: I do not wish to draw a comparison between chattel slavery and anything else. Rather, I want to suggest that our role, as literary critics in the data age, is to attend to the epistemological gap made manifest by Jefferson’s Farm Book, and to consider what new knowledge we might work towards, and what new stories we might tell, through an expanded sense of the uses—and meanings—of data today.

Lauren Klein is an assistant professor in the School of Literature, Media, and Communication at Georgia Tech. Her research interests include early American literature and culture, food studies, media studies, and the digital humanities. Her writing has appeared, most recently, in Early American Literature, American Literature, and American Quarterly. She has taught at Brooklyn College and at Macaulay Honors College, both branches of the City University of New York. Between 2007 and 2008, she worked as an educational technology consultant for One Laptop per Child, a non-profit aimed at bringing low-cost laptops to children in the developing world.