COMM552/Trace ethnography response paper
Trace ethnography is a method for studying distributed cognition in sociotechnical networks (Geiger & Ribes, 3). The "traces" it considers are the publicly accessible artifacts of online social activity. Examples of such traces include the view count and ratings on a YouTube video, the metadata attached to a Twitter feed, or the timestamps on eBay bids. These data may be gathered by hand or with the assistance of software tools. Analysis of the collected traces proceeds according to an "ethnographically-derived understanding" of the particular norms and practices of the site under investigation (3). For researchers observing phenomena dispersed across multiple online spaces, this approach may reveal connections among simultaneous events that would otherwise be difficult to capture.
Researching Wikipedia's "vandal-fighting" population
Trace ethnography was introduced as a novel method in a 2010 paper about anti-vandalism practices on Wikipedia by Geiger & Ribes. Research conducted in 2006 concluded that vandalism was corrected primarily through the heroic efforts of a small group of dedicated volunteers (1). Geiger, an active Wikipedia editor, observed that a significant strategic change had taken place in the years since that data was collected. He conducted a preliminary study that indicated nearly 80% of all anti-vandalism activity could be attributed to fully- and semi-automated vandal-fighting software "bots" rather than human editors using the conventional web interface (3). Such "helperbots" simply did not exist when the earlier research was conducted.
This observation left the researchers curious about the "social roles of software tools in Wikipedia" (1). Many of the tools are "unofficial" technologies, requiring no sanction from the Wikipedia governance structure to operate. Yet these non-human actors enable a variety of decentralized collective intelligence activities that are crucial to the sustainability and scalability of the Wikipedia project. These activities include correcting syntax errors, adjusting article formatting, and identifying possible instances of vandalism. Previous research focused exclusively on the work of human actors, but Geiger & Ribes' preliminary observations suggested a need to assess the editorial contributions of non-human actors.
To apprehend the process by which this dispersed collection of human and non-human actors identifies, documents, and bans a vandal, Geiger & Ribes needed a way to observe activities occurring simultaneously across Wikipedia. By default, MediaWiki, the open-source software on which Wikipedia is built, records the editorial activities of its users. Therefore, every malicious edit by a vandal and response by a vandal-fighter resulted in the creation of a unique record in the Wikipedia database. By exploiting this feature, Geiger & Ribes were able to assemble a coherent timeline of events beginning with an initial malicious edit and concluding with the temporary banning of an anonymous vandal.
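The reconstruction described above amounts to collecting timestamped records and sorting them into a single timeline. A minimal sketch follows; the record format, actor names, and actions are hypothetical stand-ins, not the actual structure of the MediaWiki database:

```python
# Hypothetical sketch: reassembling a timeline of a vandalism incident
# from scattered, timestamped trace records. The field names and sample
# data are illustrative assumptions, not MediaWiki's actual schema.

from datetime import datetime

records = [
    {"time": "2009-11-12T02:13:09", "actor": "HelperBot",  "action": "revert"},
    {"time": "2009-11-12T02:12:45", "actor": "10.0.0.1",   "action": "vandalize"},
    {"time": "2009-11-12T02:14:30", "actor": "AdminUser",  "action": "block"},
]

# Sorting by timestamp recovers the sequence of events, even though the
# records were gathered from different pages and logs.
timeline = sorted(records, key=lambda r: datetime.fromisoformat(r["time"]))
```

Because every actor's edits carry timestamps, the same sort applies no matter how many pages or logs the traces were scattered across.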
The banning of a vandal
The MediaWiki platform automatically publishes the latest changes to Wikipedia as a single real-time stream. A typical semi-autonomous vandal-fighting "helperbot" collects and evaluates each of these edits according to a set of tests defined by its developers. Example criteria might be, "Is the author logged in or editing Wikipedia anonymously?" and "Does this edit include sensitive keywords or profanity?" The bot "flags" any suspicious new edits and queues them for review by a human editor (1). Should the editor determine that the edit is indeed vandalism, they "revert" (or correct) the changes and post a public warning to the author's "Talk" page.
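The flagging step can be sketched in code. The criteria, field names, and keyword list below are illustrative assumptions in the spirit of the examples above, not the actual rules of Huggle or any real bot:

```python
# Illustrative sketch of a semi-automated flagging loop: evaluate each
# incoming edit against simple tests and queue the suspicious ones for
# human review. All criteria and field names here are hypothetical.

import re

SENSITIVE = re.compile(r"\b(stupid|fake|lol)\b", re.IGNORECASE)

def is_suspicious(edit):
    """Apply simple tests like those described above."""
    if edit["anonymous"]:                # author not logged in
        return True
    if SENSITIVE.search(edit["text"]):   # sensitive keywords or profanity
        return True
    return False

def flag_edits(edit_stream):
    """Return the edits a human editor should review."""
    return [e for e in edit_stream if is_suspicious(e)]

edits = [
    {"author": "10.0.0.1",  "anonymous": True,  "text": "lol this is fake"},
    {"author": "GoodEditor", "anonymous": False, "text": "Fixed a citation."},
]
review_queue = flag_edits(edits)
```

The human editor then works only from the much smaller review queue, which is what makes the division of labor between bot and editor efficient.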
Geiger & Ribes' research indicates that this process can take place in less than one minute and that a single human vandal-fighter might revert as many as 180 edits in a single session. Should the vandal continue to "deface" other articles on the site, they may be caught multiple times by different software tools and different editors. Each bot/editor team builds on the previous work and, from the point of view of the vandal, their actions appear consciously coordinated. Of course, none of the vandal-fighters need know about the work of any of the others. Instead, they rely on vandal-fighting conventions, the assumption of shared values, and knowledge that can be quickly ascertained from MediaWiki's traces.
In a recent blog post, Jimmy Wales, the founder of Wikipedia, explained that the project's editorial policies exist to encourage "constructive contributions" while discouraging those who want to "cause trouble" (2009). Geiger & Ribes' research underscores the ambiguity in each of these categories. By recreating the events that led to a vandal being banned, they highlight the centrality of predetermined algorithms and autonomous non-human actors in the "epistemic process" of policing user contributions (2). Although the vandal-fighters are not usually in direct communication with one another and each individual behaves idiosyncratically, their collected activities enforce a single epistemology that renders some contributions "constructive" and others "trouble" (2).
The growing number of edits made directly by bots or with the help of bots effectively "reshapes" the way in which both readers and editors engage with Wikipedia and its content (2). Calling upon actor-network theory, Geiger & Ribes consider the impact of "bots" as non-human agents in the vandal-fighting network. The decision to revert a contribution is "inherently moral", and thus the reliance on autonomous software represents a "redistribution of moral agency" away from humans to non-human actors (7). In some cases, questions like "Who is left out?" and "What is erased?" simply do not arise because filtering algorithms have the authority to revert edits without human oversight or approval (9).
Why are non-human actors absent from previous research?
Geiger & Ribes make a compelling argument for the importance of bots in maintaining Wikipedia. Why haven't previous researchers been able to identify this symbiosis? One possible reason is embedded in Wikipedia's software architecture. Unlike the Twitter platform, which identifies the client used to post each tweet, MediaWiki treats all edits the same, whether they are made from its web interface or posted from third-party software. This design makes it difficult to differentiate between the contributions of human and non-human actors. Geiger & Ribes found that bot developers had voluntarily begun to fill this gap by automatically adding small signatures to the edits made by their software. Thus, a watchful researcher observing a page's history might notice that the comments on some edits begin with "(HG)", the signature of Huggle, a popular vandal-fighting tool (3). The trace ethnographer is then obliged to seek out Huggle, learn how it is used, and how its use is discussed within the editorial community. Without the attention to detail characteristic of ethnography, a purely computational approach might have missed the signature's subtle cue.
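Once a signature like "(HG)" has been learned in the field, checking for it computationally is trivial. The sketch below assumes a simple prefix convention; only the Huggle signature comes from the paper, and the parsing logic itself is an assumption:

```python
# Hypothetical sketch: inferring the assisting tool from an edit summary.
# The "(HG)" signature for Huggle is reported by Geiger & Ribes; the
# table-and-prefix scheme here is an illustrative assumption.

TOOL_SIGNATURES = {
    "(HG)": "Huggle",
    # further signatures would be added as they are learned ethnographically
}

def identify_tool(edit_summary):
    """Return the tool name if a known signature prefixes the summary."""
    for signature, tool in TOOL_SIGNATURES.items():
        if edit_summary.startswith(signature):
            return tool
    return None  # no known signature: possibly a plain web-interface edit
```

The point is that the lookup table can only be populated by the kind of on-the-ground familiarity with the community that ethnography provides.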
Reflections on Geiger & Ribes' methodology
Historically, large-scale studies of Wikipedia have relied on the data structures provided by the MediaWiki software. As noted above, this structure notably lacks specificity regarding the origin of a given edit. Unfortunately, even long-term ethnographic explorations of large-scale distributed phenomena will necessarily be limited by the researcher's inability to ever grasp the totality of the scene. Trace ethnography may be a productive bridge between the scale of computational analyses and the detail of ethnographic approaches. By alternating between local ethnographic inquiry and large-scale data collection, each approach enriches the other: the former benefits from the context and trends suggested by enormous volumes of data, and the latter is iteratively improved through consistent on-the-ground observation.
Trace ethnography, as it is described by Geiger & Ribes, presents three intriguing advantages over either computational or ethnographic approaches alone. First, trace ethnography is responsive to the characteristics of its subject matter. By paying close attention to the proliferation of digital "traces", the method exploits a characteristic unique to online social interaction. Second, it takes into consideration users' embodied experiences. Acknowledging a multiplicity of interfaces forces the researcher to account for the wildly diverse apparatuses available to online participants. For example, must a study account for the differences among users of laptops, desktops, and mobile phones? Finally, trace ethnography presents a scalable alternative to multi-sited ethnography for studies of radically distributed phenomena. By leveraging the trace's potential for recreating a scene, a single ethnographer may be more judicious and strategic with how she spends her time in the field.
How does "trace ethnography" differ from other ethnographic methods?
Geiger & Ribes' analysis relied on an intimate understanding of the Wikipedia community developed through traditionally ethnographic engagement. They participated in mailing lists, interacted with community members, and learned to use the vandal-fighting tools and helperbots they planned to observe. As increasing numbers of scholars grapple with the challenges and opportunities of online research, "trace ethnography" may cease to be a distinct category. Geiger & Ribes make a convincing case for the utility of digital "traces" but, sheer quantity aside, it is not clear that these artifacts are significantly different from the artifacts conventionally considered by ethnographers: broadcast media, films, periodicals, etc. Furthermore, ethnographers' growing use of computers to better understand the "traces" they collect will further challenge the distinction between "qualitative" and "quantitative" methodologies.
Automating trace ethnography
In this research, Geiger & Ribes collected all of their "traces" by hand. The success of vandal-fighting software suggests that they might have been able to automate this process by either adapting an existing tool or developing their own. In addition to shifting some of the workload, developing or adapting local tools would enrich the researchers' ethnographic understanding of their site. The software development process would reveal discourses that are typically not audible among software users. Whether through writing code oneself or working in collaboration with programmers, the researchers may be able to better understand the degree to which the dominant software architectures enable and constrain the social interaction they observe.
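MediaWiki does expose the same real-time stream of changes through its public web API, so automated trace collection of this kind is plausible. A minimal sketch follows, using the API's "recentchanges" list; error handling, paging, and rate limiting are deliberately omitted, and this should be read as a starting point rather than a finished tool:

```python
# Sketch of automated trace collection from MediaWiki's public API.
# Parameter names follow the Action API's "recentchanges" list; the
# surrounding structure is a simplified illustration.

import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"

def build_query(limit=50):
    """Build a URL requesting the most recent edits, including the
    comment field where tools like Huggle leave their signatures."""
    params = {
        "action": "query",
        "list": "recentchanges",
        "rcprop": "title|timestamp|user|comment",
        "rclimit": str(limit),
        "format": "json",
    }
    return API + "?" + urllib.parse.urlencode(params)

def fetch_recent_changes(limit=50):
    """Fetch and decode one batch of recent-change traces."""
    with urllib.request.urlopen(build_query(limit)) as resp:
        data = json.load(resp)
    return data["query"]["recentchanges"]
```

Even a script this small would confront the researcher with the architecture's constraints, such as what the API does and does not record about each edit.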
The vast quantity of data available to the trace ethnographer may result in a false sense of certainty. For this reason, it is important that trace ethnography be embedded within a broader program of research employing a diversity of methods. As suggested earlier, the collection of traces may be most productively paired with large-scale statistical analysis that provides the ethnographer with working assumptions and broad context for close-up observation. Furthermore, user experience is likely to vary widely, to a degree that is difficult to ascertain without engaging users directly via interview, survey, or focus group.
Geiger & Ribes' research suggests several potential advantages of working from the various "traces" that linger behind online social interactions. They engage in one such project by reconstructing the collective response of numerous human and non-human actors to vandalism on Wikipedia. Most compelling, however, may be their description of user-generated editorial software that leverages the digital "trace" toward the proliferation of new interfaces to web-based phenomena. Although the research described above was conducted primarily "by hand", the practices they describe should serve as inspiration for the design of future ethnographic research in online spaces.
- Geiger, R. S. and Ribes, D. (2010) The Work of Sustaining Order in Wikipedia: The Banning of a Vandal. In Proceedings of the 2010 ACM Conference on Computer Supported Cooperative Work (CSCW). ACM, New York.
- Wales, J. (2009) “What the MSM Gets Wrong About Wikipedia -- and Why.” The Huffington Post, September 21. Retrieved from: http://www.huffingtonpost.com/jimmy-wales/what-the-msm-gets-wrong-a_b_292809.html