Lauren Klein – DH Lab
The Digital Humanities Lab at Georgia Tech
https://dhlab.lmc.gatech.edu

Floor Chart Topper
https://dhlab.lmc.gatech.edu/floorchart/floor-chart-topper/
Tue, 19 Mar 2019

Before we forget to post, some photos of the floor chart topper! (Designed and sewn by Sarah Schoemann.)

Vectors of Freedom
https://dhlab.lmc.gatech.edu/tome/vectors-of-freedom/
Wed, 01 Aug 2018

For the past several years, the DH Lab has been working on a project, TOME, aimed at visualizing the themes in a corpus of nineteenth-century newspapers. In designing this tool, our central motivation was to be able to more clearly trace the various and often conflicting conversations about slavery and its abolition that were taking place in these papers, which spanned multiple audiences and communities. (More info here).

Around the same time, a team at the University of Delaware launched the Colored Conventions Project (CCP), aimed at recovering the advocacy work performed at the Colored Conventions: organizing meetings in which Black Americans, fugitive and free, strategized about how to achieve legal, labor, and educational justice. Among the CCP's key interventions is its emphasis on how, in the nineteenth century, organizing work took place in person as much as on the page, and how this work was performed by collectives as much as by individuals.

Taking this scholarship into account, we realized that the story told in the corpus of newspapers that we’d assembled for the TOME project was, in all likelihood, a very different one from the story told through the Colored Conventions. We thought we could learn more about both conversations by looking at them together. We could ask questions like: “How did themes travel from the conventions into print, or the other way around?” “Were there people or groups who played prominent roles in one venue, or the other, or both?” “What are the key differences between the conversations that took place in person vs. those that took place in print?” And, crucially: “Who are the people or groups who have not yet been recognized for their contributions, but should be?”

In Summer 2018, Arshiya Singh (BS CS ’18), advised by Dr. Klein, began to lay the groundwork for some of the models that will help to answer these questions. What follows are a series of blog posts that document our progress.

NB: Our work employs the CCP Corpus in addition to our own. In making use of that corpus, we honor the CCP’s commitment to a use of data that humanizes and acknowledges the Black people whose collective organizational histories are assembled there. Although the subjects of datasets are often reduced to abstract data points, we affirm and adhere to the CCP’s commitment to contextualizing and narrating the conditions of the people who appear as “data” and to name them when possible. 

 

 

Repairing William Playfair at the MLA
https://dhlab.lmc.gatech.edu/uncategorized/repairing-william-playfair-at-the-mla/
Thu, 25 Feb 2016

What follows is the text of a talk about the Recreating Playfair project, delivered on a panel about care and repair at the 2016 MLA Annual Convention. Several extended publications are in the works, but in the interim, feel free to email me with any questions.

This talk describes a project that began from a seemingly simple idea: to digitally recreate some of the iconic charts of William Playfair, the eighteenth-century data visualization pioneer. My premise was that, by remaking these charts, I’d come to see Playfair’s designs, as well as their underlying concepts, in a new light. So this was a reparative project of the kind theorized by Steve Jackson, whose work helped title this session; and also of a kind with the historical fabrication projects that Jentery here, as well as Devin Elliot, Robert MacDougal, and William Turkel, have pursued to such generative ends. But there was an additional dimension: by remaking a historical image with digital tools, I also hoped to better understand some of the tools that I use in my own DH work—and in particular, a visualization tool that I and many other DHers frequently employ: the JavaScript library D3.

So I recruited two Georgia Tech graduate students, Caroline Foster and Erica Pramer, to the project, and we set to work. But then what happens is what always happens: a story enters in.

So what I’ll talk about today is how this project of repairing William Playfair allows us to see some of the main concepts associated with D3, and with visualization more generally, in a sharper conceptual light—the concepts of data and image, primarily, as you can see on the screen, but also ideas about error, iteration, and interaction (although I won’t have time to discuss those as much).

But I also want to try to articulate some of the features of this project that make it a digital humanities project, and more importantly, what can be gained by conceiving of it as such. Because this is the kind of project that could have been undertaken by practitioners in any number of disciplines, not to mention by any number of nerds on the internet—which, as I found out over the course of the process, turned out to be the case. But as the fields of digital humanities and design increasingly converge, it will be important to demonstrate what, precisely, our humanistic training contributes to this confluence: an attention to history, to how the past informs the present, and to how individual stories—and individual lives—bring out additional dimensions in our objects of study and methods of inquiry.

So, to begin.

[Slide 2]
William Playfair, as I mentioned before, is widely recognized as one of the founders of modern data visualization. He’s credited with perfecting the time-series chart, and with inventing the bar chart and the pie chart outright. Before Playfair, there had been mathematical line graphs, as well as astronomical charts. But as far as we know—with the standard caveat about the limits of the archive—Playfair was the first to abstract non-mathematical data, like the fluctuating price of wheat over time, which is what you see on the right, or the balance of trade between England and America, which is on the left, into visual form.
[Slide 3]
Here’s a larger view of the latter, a time-series chart of imports and exports between England and America for the period between 1700 and 1800. And you can easily perceive—and Playfair liked that you could easily perceive—the massive reversal, and then what economists today would call the “correction,” in the balance of trade between England and America for the decade leading up to, and the decade after, the Revolutionary War.

But remaking this image in D3, or, for that matter, remaking the image in any digital form, requires a different, more atomizing view. You look for the axes and their units. You look for the grid lines—some major (those are the bolded ones), and some minor. You make note of the labels. You think about how you might replicate the font. You observe the colors of the data lines, and of the fill between them.
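
To make that atomizing view concrete, here is a minimal sketch of the kind of scaffolding such a remake requires, written against the current D3 API (v4+). The data values are invented placeholders; this is not Playfair’s data, nor Foster’s actual implementation.

```js
// A minimal sketch (not the project's actual code) of a Playfair-style remake in D3.
const data = [
  { year: 1700, imports: 70, exports: 35 },   // invented figures, not Playfair's
  { year: 1750, imports: 110, exports: 65 },
  { year: 1800, imports: 160, exports: 190 },
];

const width = 960, height = 500, margin = { top: 20, right: 30, bottom: 30, left: 40 };

// The axes and their units come first: year along the x axis, value along the y axis.
const x = d3.scaleLinear().domain([1700, 1800]).range([margin.left, width - margin.right]);
const y = d3.scaleLinear().domain([0, 200]).range([height - margin.bottom, margin.top]);

const svg = d3.select('body').append('svg')
  .attr('width', width)
  .attr('height', height);

// Then the grid: major and minor lines are just axis ticks, drawn and styled explicitly.
svg.append('g')
  .attr('transform', `translate(0,${height - margin.bottom})`)
  .call(d3.axisBottom(x).ticks(10).tickFormat(d3.format('d')));
svg.append('g')
  .attr('transform', `translate(${margin.left},0)`)
  .call(d3.axisLeft(y).ticks(10));

// Finally the data lines themselves, one per series, colored roughly as in the original.
const line = key => d3.line().x(d => x(d.year)).y(d => y(d[key]));
svg.append('path').datum(data)
  .attr('d', line('imports'))
  .attr('fill', 'none').attr('stroke', 'goldenrod');
svg.append('path').datum(data)
  .attr('d', line('exports'))
  .attr('fill', 'none').attr('stroke', 'crimson');
```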

One thing you don’t pay too much attention to these days, however, is the actual shape of the graph. As anyone who’s used even Microsoft Excel knows well, you plug in the data, and the software takes care of the rest. In fact, when working with visualization software, you need the data before you can do anything else.

[Slide 4]
But Playfair didn’t. He engraved the data lines of his chart freehand. It wasn’t until James Watt, of steam engine fame, suggested to Playfair that he might include some data tables alongside his chart, that he did so. “Your charts,” Watt explained, “now seem to rest on your own authority, and it will naturally be enquired from whence you have derived your intelligence.” So Playfair capitulated, at least for the first two editions of the book, published in 1786 and again in 1787.

But after he gained in confidence, he removed them for the third edition, published in 1801–the largest (and final) print run for the book. He understood the function of his charts as quite distinct from tables, or “figures” as he termed them, and in the introduction to this edition, he explains, “The advantage proposed by this method, is not that of giving a more accurate statement than by figures, but it is to give a more simple and permanent idea.” In fact, two contemporary scholars of statistics, Howard Wainer and Ian Spence, have created updated renderings of several of Playfair’s charts in order to show how the data tables he provided do not sufficiently account for the numerous reversals in the balance of trade that Playfair depicts.

But for Playfair, this lack of data was beside the point. His goal was to model a new mode of “painting to the eye,” one that—following Locke and the dominant Enlightenment view—could be easily perceived by the senses, and subsequently processed by the mind. In other words, Playfair’s understanding of the use and significance of the image is on an equal plane with, but—crucially—not connected to, the use and significance of the data. The former provides a “big picture” view, one that produces knowledge through sense perception; the latter provides discrete information, the material from which knowledge is first constructed and then disseminated to a receiving and perceiving audience.

The separation between Playfair’s data and the charts he designed is no longer operational, however. D3 promotes itself as a “JavaScript library for manipulating documents based on data” (2015). Its innovation lies not in any new mode of graphical representation, but instead in the ease and efficiency with which a dataset can be rendered visible, on the web, according to any conceivable visual form. There’s a lot more to be said about how D3’s reliance on the Document Object Model, or DOM, which provides the organizing structure of the web, helps show how the shift from print to digital entails a more complex transformation than merely shifting from page to screen. But I don’t want to lose sight of my principal focus here: the relation between data and image.

[Slide 6]
So what you see here is one version of Playfair’s chart in D3, skillfully remade by Caroline Foster, a graduate student in Human-Computer Interaction at Georgia Tech. The middle part, as you can see, replicates some of the color fills of Playfair’s original chart. But the grey areas are intended to indicate her interpolation. Reconstituting the original dataset from the tables Playfair provided in earlier volumes, we realized that the data existed in two separate tables: yearly data in one table, and decade-by-decade data—but only for the years that you see—in another. More than merely a practical issue, this structural and procedural dependency on data points to an evolving—and culturally situated—understanding of the validity of data, and of the (presumed) role of visualization in making its significance clear. Contra Lisa Gitelman, among others, who has worked to draw our attention to the inherent contradiction that the notion of “raw data” entails, this structural reliance on data enforces an understanding of data as a stable foundation from which knowledge can be drawn.

This reliance on data is not unique to D3. Almost all contemporary visualization tools and platforms require data before an image can be drawn. But in D3, the dependency goes deeper—into the structure of D3 itself. One of the tag-lines of D3 is “transformation, not representation.” So in contrast to a language like Processing, where you conceive of a virtual canvas onto which you then “draw” graphical elements through a set of standard functions, like “triangle” or “ellipse,” in D3 the designer must conceive of the ultimate image (or possible interactions with the image) as emerging from the data themselves, transforming from numerical to visual form without recourse to any virtual canvas or page. In other words, by binding the dataset to the nodes of the document object model, D3 bypasses the intermediary representational layer—the layer that creates paragraphs and tables and such that we associate with HTML. Instead, it operates at the level of what is known as the scenegraph, the data structure that contains the information about the logical—as distinct from the visual—arrangement of elements on any particular web page. The result is a chart that looks like the original, but one that masks the significant shifts, technical as much as conceptual, between the eighteenth century and today.
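
A minimal, hypothetical illustration of that data join may help. The dataset is invented, the Processing-style line in the comment is pseudocode, and the example uses the current D3 enter/append pattern rather than any code from the project itself.

```js
// In a canvas-style language you would loop over the data and draw each mark yourself:
//   for each d in dataset: ellipse(xFor(d), yFor(d), 5, 5)   // Processing-like pseudocode
//
// In D3, the marks "emerge from the data": each datum is bound to a node in the
// scenegraph, and every visual attribute is written as a function of that datum.
const dataset = [12, 5, 27, 9];      // invented values

d3.select('svg')
  .selectAll('circle')
  .data(dataset)                     // bind the data to (as yet nonexistent) circle nodes
  .enter()                           // the "enter" selection: one placeholder per unmatched datum
  .append('circle')                  // materialize a node for each datum
  .attr('cx', (d, i) => 40 + i * 60)
  .attr('cy', 100)
  .attr('r', d => d);                // the radius is a transformation of the datum itself
```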

Thus far, I have been focused on the conceptual shifts that a focus on the tools of DH making brings to light. But there is an additional set of shifts that emerges when considering the technologies that Playfair himself employed.

[Slide 9]
So take a look at the original image one last time. And in particular, take note of the black line, on the far right, that trails off the page. What you’re seeing is an engraving error, almost certainly made by Playfair himself. Although he’s since entered the pantheon of visualization demigods, Playfair was, by his own admission, “long anxious” to be acknowledged as an innovator. More to the point, he was almost always nearly broke. So while he chose to commission one of the most skilled engravers in all of London, Samuel John Neele, to produce the plates for his Atlas, he likely requested that Neele work at speed, so as to minimize the costly detailing and other flourishes for which he was known. It’s believed that Neele engraved the charts’ decoration, framing, titles, and other lettering, leaving Playfair to engrave the lines of imports and exports by himself.

And just a bit about engraving in order to understand how this error came to be introduced: to produce a plate like the one used to print this chart, a thin copper disc is first coated with a ground, usually a layer of wax, varnish, chalk, or soot. Using a stylus, the engraver traces an outline of the design in mirror image into the ground. The temporary layer is then removed, and the engraver employs the faint impression that remains to guide the subsequent inscription process. With a metal tool called a burin–and here’s where it gets quite hard–the engraver carves the image directly into the copper plate—a process that requires significant physical strength. Playfair’s error was a common one—a slip of a tired hand—but its frequent occurrence would not have made it any more tolerable. Playfair’s result, an image inscribed into copper, when considered in the context of the time and money invested—not to mention the personal and intellectual stakes—might as well have been set in the proverbial stone.

With visualizations like the one Foster produced, as with most digital artifacts released only in final form, we don’t see the errors of their making. The trial and error happens in the implementation process, and there’s little incentive—and for that matter, not much of a method or even a language—for exploring what these moments of breakdown might reveal. So in closing, I’ll say only that what began as a simple project has, to me, come to represent the beginning of what a richer intersection of DH and design, as both critical and creative, as both theoretical and applied, could soon become. This is a study that connects past to present, and employs the tools and techniques of the humanities–close reading, historical synthesis, and attention to the archival documents that serve as entry-points into the stories of actual lives–as a way to demonstrate how making, the conditions of making, and their consequences are forever intertwined.

Talk at Digital Humanities 2014
https://dhlab.lmc.gatech.edu/talks/talk-at-digital-humanities-2014/
Thu, 24 Jul 2014

What follows is the text of a talk about the TOME project delivered at the Digital Humanities 2014 conference in Lausanne, Switzerland. We’re in the process of writing up a longer version with more technical details, but in the interim, feel free to email me with any questions.

NB: For display purposes, I’ve removed several of the less-essential slides, but you can view the complete slidedeck here.

Just over a hundred years ago, in 1898, Henry Gannett published the second of what would become three illustrated Statistical Atlases of the United States. Based on the results of the Census of 1890– and I note, if only to make myself feel a little better about the slow pace of academic publishing today, eight years after the census was first compiled– Gannett, working with what he openly acknowledged as a team of “many men and many minds,” developed an array of new visual forms to convey the results of the eleventh census to the US public.

[Slide 4]
The first Statistical Atlas, published a decade prior, was conceived in large part to mark the centennial anniversary of the nation’s founding. That volume was designed to show the nation’s territorial expansion, its economic development, its cultural advancement, and social progress. But Gannett, with the centennial receding from view, understood the goal of the second atlas in more disciplinary terms: to “fulfill its mission in popularizing and extending the study of statistics.”

It’s not too much of a stretch, I think, to say that we’re at a similar place in the field of DH today. We’ve moved through the first phase of the field’s development– the shift from humanities computing to digital humanities– and we’ve addressed a number of public challenges about its function and position in the academy. We also now routinely encounter deep and nuanced DH scholarship that is concerned with digital methods and tools.

And yet, for various reasons, these tools and methods are rarely used by non-digitally-inclined scholars. The project I’m presenting today, on behalf of a project team that also includes Jacob Eisenstein and Iris Sun, was conceived in large part to address this gap in the research pipeline. We wanted to help humanities scholars with sophisticated, field-specific research questions employ equally sophisticated digital tools in their research. Just as we can now use search engines like Google or Apache Solr without needing to know anything about how search works, our team wondered if we could develop a tool that would allow non-technical scholars to employ another digital method– topic modeling– without needing to know how it worked. (And I should note here that we’re not the first to make this observation about search; Ben Schmidt and Ted Underwood published remarks to this end as early as 2010).

[Slide 5]
Given this methodological objective, we also wanted to identify a set of humanities research questions that would inform our tool’s development. To this end, we chose a set of nineteenth-century antislavery newspapers, significant not only because they provide the primary record of slavery’s abolition, but also because they were one of the first places in the United States where men and women, and African Americans and whites, were published together, on the same page. We wanted to discover if, and if so how, these groups of people framed similar ideas in different ways.

For instance, William Lloyd Garrison, probably the most famous newspaper editor of that time (he who began the first issue of The Liberator, in 1831, with the lines, “I will not equivocate — I will not excuse — I will not retreat a single inch — AND I WILL BE HEARD”) decided to hire a woman, Lydia Maria Child, to edit the National Anti-Slavery Standard, the official newspaper of the American Anti-Slavery Society. Child was a fairly famous novelist by that point, but she also wrote stories for children, and published a cookbook, so Garrison thought she could “impart useful hints to the government as well as to the family circle.” But did she? And if so, how effective– or how widely adopted– was this change in topic or tone?

[Slide 7]
The promise of topic modeling for the humanities is that it might help us answer questions like these. (I don’t have time to give a background on topic modeling today, but if you have questions, you can ask later). The salient feature, for our project, is that these models are able to identify sets of words (or “topics”) that tend to appear in the same documents, as well as the extent to which each topic is present in each document. When you run a topic model, as we did using MALLET, the output typically takes the form of lists of words and percentages, which may suggest some deep insight — grouping, for example, woman, rights, and husband — but rarely offer a clear sense of where to go next. Recently, Andrew Goldstone released an interface for browsing a topic model. But if topic modeling is to be taken up by non-technical scholars, interfaces such as this must be able to do more than facilitate browsing; they must enable scholars to recombine such preliminary analysis to test theories and develop arguments.
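
For readers unfamiliar with what that output looks like once parsed, here is a hedged sketch. The topic numbers, words, weights, and document names are invented, and the structures assume the model output has already been read into memory; this is not MALLET’s raw file format or TOME’s actual data model.

```js
// Invented, already-parsed topic-model output; not MALLET's raw file format.
const topics = {
  12: { topWords: ['woman', 'rights', 'husband', 'home', 'duty'] },
};

// For each document: the proportion of each topic that the model assigns to it.
const docTopics = [
  { doc: 'standard_1841-07-01_p2', weights: { 12: 0.31, 40: 0.12 } },
  { doc: 'liberator_1831-01-01_p1', weights: { 12: 0.04, 40: 0.27 } },
];

// One simple step beyond browsing: which documents best encapsulate a given topic?
function topDocuments(topicId, n = 10) {
  return docTopics
    .map(d => ({ doc: d.doc, weight: d.weights[topicId] || 0 }))
    .sort((a, b) => b.weight - a.weight)
    .slice(0, n);
}

console.log(topics[12].topWords.join(', '));   // "woman, rights, husband, home, duty"
console.log(topDocuments(12, 5));              // documents ranked by topic 12's weight
```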

[Slide 8]
In fact, the goal of integrating preliminary analytics with interactive research is not new; exploratory data analysis (or EDA, as it’s commonly known) has played a fundamental role in quantitative research since at least the 1970s, when it was described by John Tukey. In comparison to formal hypothesis testing, EDA is more, well, exploratory; it’s meant to help the researcher develop a general sense of the properties of his or her dataset before embarking on more specific inquiries. Typically, EDA combines visualizations such as scatterplots and histograms with lightweight quantitative analysis, serving to check basic assumptions, reveal errors in the data-processing pipeline, identify relationships between variables, and suggest preliminary models. This idea has since been adapted for use in DH– for instance, the WordSeer project, out of Berkeley, frames its work in terms of exploratory text analysis. In keeping with the current thinking about EDA, WordSeer interweaves exploratory text analysis with more formal statistical modeling, facilitating an iterative process of discovery driven by scholarly insight.

[Slide 10]
EDA tends to focus on the visual representation of data, since it’s generally thought that visualizations enhance, or otherwise amplify, cognition. In truth, the most successful visual forms are perceived pre-cognitively; their ability to guide users through the underlying information is experienced intuitively; and the assumptions made by the designers are so aligned with the features of their particular dataset, and the questions that dataset might begin to address, that they become invisible to the end-user.

 

[Slide 11]
So in the remainder of my time today, I want to talk through the design decisions that have influenced the development of our tool as we sought to adapt ideas about visualization and EDA for use with topic modeling scholarly archives. In doing so, my goal is also to take up the call, as recently voiced by Johanna Drucker, to resist the “intellectual Trojan horse” of humanities-oriented visualizations, which “conceal their epistemological biases under a guise of familiarity.” What I’ll talk through today should, I hope, seem at once familiar and new. For our visual design decisions involved serious thinking about time and space, concepts central to the humanities, as well as about the process of conducting humanities research, generally conceived. So in the remainder of my talk, I’ll present two prototype interface designs, and explain the technical and theoretical ideas that underlie each, before sketching the path of our future work.

[Slide 12]
Understanding the evolution of ideas– about abolition, or ideology more generally– requires attending to change over time. Our starting point was a sense that whatever visualization we created needed to highlight, for the end-user, how specific topics–such as those describing civil rights and the Mexican-American War, to name two that Lydia Maria Child wrote about– might become more or less prominent at various points in time. For some topics, such as the Mexican-American War, history tells us that there should be a clear starting point. But for other topics, such as the one that seems to describe civil rights, their prevalence may wax and wane over time. Did Child employ the language of the home to advocate for equal rights, as Garrison hoped she would? Or did she merely adopt the more direct line of argument that other (male) editors employed?

To begin to answer these questions, our interface needed to support nuanced scholarly inquiry. More specifically, we wanted the user to be able to identify significant topics over time for a selected subset of documents– not just in the entire dataset. This subset of documents, we thought, might be chosen by specific metadata, such as newspaper title; this would allow you to see how Child’s writing about civil rights compared to other editors’ work on the subject. Alternately, you might, through a keyword search, choose to see all the documents that dealt with issues of rights. So in this way, you could compare the conversation around civil rights with the one that framed the discussion about women’s rights. (It’s believed that the debates about the two issues developed in parallel, although often with different ideological underpinnings).
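
As a sketch of that "select a subset, then trace topics over time" logic: the document fields (newspaper, year, text, weights) and the use of the current d3-array helpers are my assumptions, not the project’s actual schema or code.

```js
// Filter the corpus by metadata (e.g. newspaper title) and/or a keyword search.
// The field names here are assumptions, not the project's actual schema.
function selectSubset(docs, { newspaper, keyword } = {}) {
  return docs.filter(d =>
    (!newspaper || d.newspaper === newspaper) &&
    (!keyword || d.text.toLowerCase().includes(keyword.toLowerCase())));
}

// Mean weight of one topic per year, computed over the selected subset only.
// d3.rollup and d3.mean are part of the current d3-array module.
function topicOverTime(docs, topicId) {
  const byYear = d3.rollup(
    docs,
    issues => d3.mean(issues, d => d.weights[topicId] || 0),
    d => d.year);
  return Array.from(byYear, ([year, weight]) => ({ year, weight }))
    .sort((a, b) => a.year - b.year);
}

// e.g. Child's paper versus the full corpus, for a hypothetical civil-rights topic 40:
// topicOverTime(selectSubset(docs, { newspaper: 'National Anti-Slavery Standard' }), 40);
// topicOverTime(docs, 40);
```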

At this point, it’s probably also important to note that in contrast to earlier, clustering-based techniques for identifying themes in documents, topic modeling can identify multiple topics in a single document. This is especially useful when dealing with historical newspaper data, which tends to be segmented by page and not article. So you could ask: Did Child begin by writing about civil rights overtly, with minimal reference to domestic issues? Or did Child always frame the issue of civil rights in the context of the home?

[Slide 14]
Our first design was based on exploring these changes in topical composition. In this design, we built on the concept of a dust-and-magnets visualization. Think of that toy where you could use a little magnetized wand to draw a mustache on a man; this model treats each topic as a magnet, which exerts force on multiple specks of dust (the individual documents). (At left is an image from an actual dust-and-magnets paper.)
In our adaptation of this model, we represented each newspaper as a trail of dust, with each speck– or point– corresponding to a single issue of the newspaper. The position of each point, on an x/y axis, is determined by its topical composition, with respect to each topic displayed in the field. That is to say– the force exerted on each newspaper issue by a particular topic corresponds to the strength of that topic in the issue. In the slide below, you can see highlighted the dust trail of the Anti-Slavery Bugle as it relates to five topics, including the civil rights and women’s rights topics previously mentioned. (They have different numbers here). I also should note that for the dust trails to be spatially coherent, we had to apply some smoothing. We also used color to convey additional metadata. Here, for instance, each color in a newspaper trail corresponds to a different editor. So by comparing multiple dust-trails, and by looking at individual trails, you can see the thematic differences between (or within) publications.
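
A rough sketch of those mechanics follows, with invented topic names, coordinates, and weights rather than the prototype’s actual code: each issue is pulled toward every magnet in proportion to the strength of that topic in the issue, and a simple moving average keeps consecutive issues in a coherent trail. The guard clause is also where the "remove a magnet" interaction discussed below shows up.

```js
// Each topic "magnet" has a fixed position in the plane; names and coordinates are invented.
const magnets = {
  civilRights:  { x: 100, y: 300 },
  womensRights: { x: 400, y: 100 },
  mexicanWar:   { x: 350, y: 450 },
};

// Place one newspaper issue at the weighted average of the magnet positions,
// weighted by how strongly each topic is present in that issue.
function placeIssue(topicWeights) {           // e.g. { civilRights: 0.4, mexicanWar: 0.1 }
  let x = 0, y = 0, total = 0;
  for (const [topic, w] of Object.entries(topicWeights)) {
    if (!(topic in magnets)) continue;        // a removed magnet no longer exerts any force
    x += w * magnets[topic].x;
    y += w * magnets[topic].y;
    total += w;
  }
  return total > 0 ? { x: x / total, y: y / total } : null;
}

// A simple moving average, so that consecutive issues form a spatially coherent trail.
function smoothTrail(points, windowSize = 3) {
  return points.map((p, i) => {
    const window = points.slice(Math.max(0, i - windowSize + 1), i + 1);
    return { x: d3.mean(window, q => q.x), y: d3.mean(window, q => q.y) };
  });
}
```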

[Slide 15]
Another issue addressed by this design is the fact that documents are almost always composed of more than two topics. In other words, for the topics’ force to be represented most accurately, they must be arranged in an n-dimensional space. We can’t do that in the real world, obviously, where we perceive things in three dimensions; let alone on a screen, where we perceive things in two. But while multidimensional information is lost, it’s possible to expose some of this information through interaction. So in this prototype, by adjusting the position of each topic, you can move through a variety of spatializations. Taken together, these alternate views allow the user to develop an understanding of the overall topical distribution.

This mode also nicely lends itself to our goal of helping users to “drill down” to a key subset of topics and documents: if the user determines a particular topic to be irrelevant to the question at hand, she can simply remove its magnet from the visualization, and the dust-trails will adjust.

This visualization also has some substantial disadvantages, as we came to see after exploring additional usage scenarios. For one, the topical distributions computed for each newspaper are not guaranteed to vary with any consistency. For instance, some topics appear and disappear; others increase and decrease repeatedly. In these cases, the resultant “trails” are not spatially coherent unless smoothing is applied after the fact. This diminishes the accuracy of the representation, and raises the question of how much smoothing is enough.

Another disadvantage is that while the visualization facilitates the comparison of the overall thematic trajectories of two newspapers, it is not easy to align these trajectories– for instance, to determine the thematic composition of two newspapers at the same point in time. We considered interactive solutions to this problem, like adding a clickable timeline that would highlight the relevant point on each dust trail. However, these interactive solutions moved us further from a visualization that was immediately intuitive.

At this point, we took a step back, returning to the initial goal of our project: facilitating humanities research through technically-sophisticated means. This required more complex thinking about the research process. There is a difference, we came to realize, between a scholar who is new to a dataset, and therefore primarily interested in understanding the overall landscape of ideas; and someone who already has a general sense of the data, and instead, has a specific research question in mind. This is a difference between the kind of exploration theorized by Tukey, and a different process we might call investigation. More specifically, while exploration is guided by popularity—what topics are most prominent at any given time—investigation is guided by relevance: what topics are most germane to a particular interest. We wanted to facilitate both forms of research in a single interface.

[Slide 16]
With this design, at left, it’s time that provides the structure for the interface, anchoring each research mode– exploration and investigation– in a single view. Here, you see the topics represented in “timeline” form. (The timeline-based visualization also includes smooth zooming and panning, using D3’s built-in zoom functionality.) The user begins by entering a search term, as in a traditional keyword search. So here you see the results for a search on “rights,” with each topic that contains the word “rights” listed in order of relevance. This is like the output of a standard search engine, like Google, so each topic is clickable– like a link.
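
That zoom-and-pan behavior boils down to a few lines. This is a generic d3.zoom sketch, written against the v6+ API (where the handler receives the event directly), with a hypothetical container id; it is not the interface’s actual code.

```js
const svg = d3.select('#timeline');      // hypothetical container id
const content = svg.append('g');         // everything that should pan and zoom lives in this group

const zoom = d3.zoom()
  .scaleExtent([1, 12])                  // how far the user may zoom out and in
  .on('zoom', (event) => {
    content.attr('transform', event.transform);   // pan and rescale the timeline as one unit
  });

svg.call(zoom);                          // attach the wheel, drag, and touch handlers
```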

Rather than take you to a web page, however, clicking on a topic gets you more information about that topic: its keywords, its overall distribution in the dataset, its geographical distribution, and, eventually, the documents in the dataset that best encapsulate its use. (There will also be a standalone keyword-in-context view).

Another feature under development, in view of our interest in balancing exploration and investigation, is that the height– or thickness– of any individual block will indicate its overall popularity. (We actually have this implemented, although it hasn’t yet been integrated into the interface you see). For example, given the query “rights,” topic 59, centered on women’s rights, represented in blue at the top right, may be most relevant– with “rights” as the most statistically significant keyword. But it is also relatively rare in the entire dataset. Topic 40, on the other hand, which deals with more general civil and political issues, has “rights” as a much less meaningful keyword, yet is extremely common in the dataset. Each of these topics holds significance for the scholar, but in different ways. Our aim is to showcase both.
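
Here is a hedged sketch of the two orderings at work, with invented word weights and prevalence figures standing in for the real model output.

```js
// Each topic carries a per-word weight (its word distribution) and an overall
// prevalence in the corpus; all numbers here are invented for illustration.
const topicList = [
  { id: 59, words: { rights: 0.09, woman: 0.07, husband: 0.03 }, prevalence: 0.004 },
  { id: 40, words: { rights: 0.01, government: 0.06, law: 0.05 }, prevalence: 0.031 },
];

// Investigation: order topics by how important the query word is *within* each topic.
function byRelevance(query) {
  return [...topicList].sort((a, b) => (b.words[query] || 0) - (a.words[query] || 0));
}

// Exploration: order topics by how prominent they are in the corpus as a whole.
// In the interface, this is the quantity encoded as the height (thickness) of each block.
function byPopularity() {
  return [...topicList].sort((a, b) => b.prevalence - a.prevalence);
}

// For the query "rights", topic 59 tops the relevance ranking while topic 40 tops the
// popularity ranking -- the two views of significance the interface tries to hold together.
```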

[Slide 17]
Another feature to demonstrate is a spatial layout of topic keywords. In the course of the project’s development, we came to realize that while the range of connotations of individual words in a topic presents one kind of interpretive challenge, the topics themselves can at times present another– more specifically, when a topic includes words associated with seemingly divergent themes. So for instance, in T56, the scholar might observe a (seemingly) obvious connection, for the nineteenth century, between words that describe Native Americans and those that describe nature. However, unlike the words “antelope” or “hawk,” the words “tiger” and “hyena,” also included in the topic, do not describe animals that are native to North America. Just looking at the word list, it’s impossible to tell whether the explanation lies in a new figurative vocabulary for describing Native Americans, or whether this set of words is merely an accident of statistical analysis.

[Slide 18]
So here, on the left, you see a spatial visualization of the topic’s keywords using multidimensional scaling, in which each keyword is positioned according to its contextual similarity. Here, the terms “indian”, “indians”, and “tribes” are located apart from “hyena”, “tiger”, and “tigers”, which are themselves closely associated. The spatial layout suggests a relatively weak connection between these groups of terms. For comparison, at right is a spatial visualization for a topic relating to the Mexican-American War, in which terms related to the conduct of the war are spatially distinguished from those related to its outcome.
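
For the curious, the layout step can be approximated very simply. The sketch below is a naive stress-minimizing loop over a precomputed distance matrix (for instance, one minus the cosine similarity of the keywords’ co-occurrence vectors); it illustrates the general idea of placing similar keywords near one another, not the multidimensional scaling routine the project actually used.

```js
// Given dist[i][j], a pairwise distance between keywords i and j, place each keyword
// in two dimensions so that on-screen distances approximate the given distances.
function layoutKeywords(dist, iterations = 500, learningRate = 0.01) {
  const n = dist.length;
  const pos = Array.from({ length: n }, () => [Math.random(), Math.random()]);  // random start
  for (let it = 0; it < iterations; it++) {
    for (let i = 0; i < n; i++) {
      for (let j = 0; j < n; j++) {
        if (i === j) continue;
        const dx = pos[i][0] - pos[j][0];
        const dy = pos[i][1] - pos[j][1];
        const d = Math.hypot(dx, dy) || 1e-9;        // current on-screen distance
        const error = (d - dist[i][j]) / d;          // positive if the pair sits too far apart
        pos[i][0] -= learningRate * error * dx;      // pull point i toward (or away from) point j
        pos[i][1] -= learningRate * error * dy;
      }
    }
  }
  return pos;  // e.g. positions for ["indian", "tribes", "hyena", "tiger", ...]
}
```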

[Slide 19]
But returning, for a minute, to the overall view, I’ll just note that there are limitations to this interface as well, owing to the fact that it translates textual and temporal data into a spatial view. Through our design process, though, we came to realize that the goal should not be to produce an accurate spatial representation of what is, after all, fundamentally non-spatial data. Rather, our challenge was to create a spatial transformation, one that conveyed a high density of information while at the same time allowing the scholar to quickly and easily reverse course, moving from space back to the original, textual representation.

Our project is far from concluded, and we have several specific steps we plan to accomplish. In addition to implementing the information about specific topics, our most pressing concern, given our interest in moving from text to space and back to text again, is to implement the KWIC view. We also plan to write up our findings about the newspapers themselves, since we believe this tool can yield new insights into the story of slavery’s abolition.

But I want to end with a more theoretical question that I think our visualization can help to address– in fact, one that our interface has helped to illuminate without our even trying.

[Slide 20]
I began this presentation by showing you some images from Henry Gannett’s Statistical Atlas of the United States. You’ll notice that one of these images bears a striking similarity to the interface we designed. Believe it or not, this was unintentional! We passed through several intermediary designs before arriving at the one you see, and several of its visual features– the hexagon shape of each block, and the grey lines that connect them– were the result of working within the constraints of D3. But the similarities between these two designs can also tell us something, if we think harder about the shared context in which both were made.

[Slide 21]
So, what do we have in common with Henry Gannett, the nineteenth-century government statistician? Well, we’re both coming at our data from a methodological perspective. Gannett, if you recall, wanted to elevate statistics in the public view. By integrating EDA into our topic model exploration scheme, our team also aims to promote a statistical mode of encountering data. But that I refer to our abolitionist newspaper data as “data” is, I think, quite significant, because it helps to expose our relation to it. For antislavery advocates at the time– and even more so for the individuals whose liberty was discussed in their pages– this was not data, it was life. So when we are called upon, not just as visualization designers, but as digital humanities visualization designers, to “expose the constructedness of data”—that’s Johanna Drucker again, whom I mentioned at the outset—or, to put it slightly differently, to illuminate the subjective position of the viewer with respect to the data’s display, we might think of these different sets of data, and their similar representations—which owe as much to technical issues as to theoretical concerns–and ask what about the data is exposed, and what remains obscured from view. That is to say, what questions and what stories still remain for computer scientists, and for humanities scholars, working together, to begin to tell?

 
