I’ll offer a provisional definition of data as evidence, and then consider the Bechdel test’s data as an example whose political assumptions I find provocative and counterintuitive.
I want to think about the commonalities between the different kinds of data we use in literary studies, and I’ll risk, for the sake of argument, reducing “data” to “evidence.” Daniel Rosenberg suggests in “Raw Data” Is an Oxymoron that data is that which functions rhetorically as evidence: it’s presented as given, and it backs up and in part generates our arguments. The various forms of literary data, within and outside of literary texts, fulfill the same argumentative functions for bibliographic scholars, sociologists of literature, distant readers, or even Freudian readers: as that which is given, as the evidence we marshal forth. For each of these forms of literary scholarly data, there’s a concomitant form of abstraction, a decision about what counts as data: this becomes a distinction between figure and ground, or signal and noise. Literary data emerges through a decision to engage in a Wittgensteinian seeing-as. As Natalia Cecire puts it, data is a chosen “abstraction we use to make certain kinds of inquiry possible.”
That leveling-out of “data,” I hope, might help to demystify some digital humanities uses of literary data. Literary data that’s algorithmically collected, I’d like to suggest, is aligned less with the instrumental reason of bureaucracy, surveillance, and discipline than it is with cybernetic reason, which Andrew Pickering characterizes as a thoroughgoing pragmatism, an approach characterized by “black boxes,” or the bracketing of interiors and particulars, which introduces a tolerance for error in favor of function and scalability. Algorithmically (or formulaically) derived sets of data, whether machine- or hand-collected (like Franco Moretti’s analysis of the lengths of novel titles), often aggregate extremely thin slices of data about individual texts to enable scalability. What I find most interesting, and politically consequential, about literary data in general are the assumptions inherent in acts of selecting what counts as data, the ways of seeing-as that ground the collection of data.
I’ll turn then to the Bechdel Test, which originates neither from DH nor from academic literary criticism, but from websites and blogs. The test rates films on a set of essentially algorithmic criteria, offering a fresh way of seeing the data of film. Several sites and a YouTube channel run by non-academics collect ratings of films based on the test: “one, it has to have at least two women in it, who, two, talk to each other about, three, something other than a man.” The characters in Alison Bechdel’s 1985 comic strip “The Rule” use Ridley Scott’s Alien as a not-so-innocent example. Bechdel’s test sets aside conventional criteria such as strong female characters, individual choices, and interiority, which might qualify Alien as a feminist film. Instead, Bechdel testers ignore most of a film’s content in order to create what is essentially a character-network within each text, where each character is a node, and lines of shared dialogue constitute edges (or lines) between them. The Bechdel Test looks for female community, in both the conventional sense of the word and somewhere near its more specialized sense in network theory. Bechdel jettisons conventional thinking about agency in literary texts in order to describe it as a network effect: that is, agency in our thoroughly connected world might be described as the potential reach of our ideas within a network. The data of this character network discard most of the literary data we’re used to paying attention to in favor of identifying character networks that leave room for female agency to develop as it will. Significantly for its historical moment (the pre-queer-theory 1980s), the original test avoids prescriptions about the content of female agency by focusing on the social structures through which a resolutely female agency might emerge.
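To make the “essentially algorithmic” character of the test concrete, here is a minimal sketch in Python of the three criteria applied to a character network. Everything about the data model is an assumption made for illustration: the `Character` and `Conversation` types, the hand-coded `gender` and `about_a_man` labels, and the toy cast are hypothetical, and this is not how any of the sites mentioned here actually record their ratings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """A node in the character network (hypothetical data model)."""
    name: str
    gender: str  # hand-coded label, e.g. "F" or "M"


@dataclass(frozen=True)
class Conversation:
    """An edge: a stretch of dialogue shared by two or more characters."""
    speakers: frozenset  # names of the characters talking to each other
    about_a_man: bool    # hand-coded judgment about the topic


def passes_bechdel(characters, conversations):
    """Return True if the network satisfies all three criteria."""
    women = {c.name for c in characters if c.gender == "F"}
    # Criterion one: at least two women.
    if len(women) < 2:
        return False
    for conv in conversations:
        # Criterion two: at least two women share this conversation.
        # Criterion three: the conversation is about something other than a man.
        if len(conv.speakers & women) >= 2 and not conv.about_a_man:
            return True
    return False


# Toy, invented cast: not data from any real film.
cast = [Character("A", "F"), Character("B", "F"), Character("C", "M")]
talk = [
    Conversation(frozenset({"A", "C"}), about_a_man=False),
    Conversation(frozenset({"A", "B"}), about_a_man=True),
    Conversation(frozenset({"A", "B"}), about_a_man=False),
]
result = passes_bechdel(cast, talk)
```

The sketch keeps only who speaks with whom and a single yes-or-no judgment about topic; plot, characterization, and interiority never enter the function, which is the thinness of data the test trades on.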
With innovations in the kinds of data we use, then, come new sorts of arguments to be made about literature. If there’s a critical value that I’d most like to emerge here, it’s that we should consider seriously literary data of all kinds, especially as we cultivate the ability to navigate between different forms and scales of data.
 Rosenberg: “The term ‘data’ serves a different rhetorical and conceptual function than do sister terms such as ‘facts’ and ‘evidence.’ To put it more precisely, in contrast to these other terms, the semantic function of data is specifically rhetorical” (18). Daniel Rosenberg, “Data Before the Fact,” in “Raw Data” Is an Oxymoron, ed. Lisa Gitelman (Cambridge, Mass: MIT P, 2013), 15–40.
 Natalia Cecire (ncecire), “Almost nothing in the world IS data. ‘Data’ is an abstraction we use to make certain kinds of inquiry possible.” 28 October 2012. Tweet. For context, see Scott Selisker, “The Digital Inhumanities?: Responses to Stephen Marche,” Los Angeles Review of Books, 5 Nov 2012. https://lareviewofbooks.org/essay/in-defense-of-data-responses-to-stephen-marches-literature-is-not-data.
 Andrew Pickering, The Cybernetic Brain: Sketches of Another Future (Chicago: U of Chicago P, 2011), 18–23.
 Franco Moretti, “Style, Inc. Reflections on Seven Thousand Titles (British Novels, 1740–1850),” Critical Inquiry 36.1 (Autumn 2009), 134–158.
 Speaking of data: the user-edited database at http://bechdeltest.com lists 209 movies for 2013, and on January 5, 2014, it had 4683 ratings total on the site. Anita Sarkeesian’s YouTube.com channel, “Feminist Frequency,” has featured a series on “The Oscars and the Bechdel Test” (http://www.youtube.com/watch?v=PH8JuizIXw8), the latest of which has been viewed 457,000 times. Daniel Mariani, a biology student in Brazil, has done the most thorough and Moretti-esque set of visualizations of the genres, directors, writers, etc., that pass or do not pass the Bechdel Test: http://tenchocolatesundaes.blogspot.com/2013/06/visualizing-bechdel-test.html.