Caroline Foster – DH LAB: The Digital Humanities Lab at Georgia Tech

TOME Discussion
Thu, 26 Jan 2017

Following the previous two blog posts, where I looked at a few topic modeling interfaces, I’ll return to the lab’s own work here and write about TOME. These observations are based on image documentation of the project, not on actual interaction with the program.

Strengths:

Multiple views: One of the first things I noticed was that it combines different views onto one page. This strikes a balance between the two interfaces I discussed previously: one had too many disparate views, and the other had a single view but was limited in other representations as a result. Here, the main visualization sits at the top and is clearly the most important, since it is the largest. Below it (cut off in the image) are other visualizations, including prevalence, related topics, and geographic distribution. Reaching them might require scrolling, but that is certainly better than moving to a completely different page.

Sorting and filtering: There appear to be many different ways to sort and filter the display, including by year, relevance, popularity, and within a certain group of documents. For exploratory purposes, having more options is a good thing, so long as it doesn’t become overwhelming.
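I have only seen images of TOME, so the following is a hypothetical sketch of such a control and not TOME’s code. It assumes every document record carries a year, a relevance score, a popularity count, and a group label (all assumed field names). The display is first filtered to one group and then sorted by the chosen field.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Doc:
    # Hypothetical fields; TOME's real data model may differ.
    title: str
    year: int
    relevance: float   # e.g. weight of the selected topic in the document
    popularity: int    # e.g. a reader or citation count
    group: str         # e.g. the collection the document belongs to

SORT_KEYS = ("year", "relevance", "popularity")

def sort_and_filter(docs: List[Doc], key: str = "year",
                    group: Optional[str] = None, descending: bool = False) -> List[Doc]:
    """Optionally keep one group of documents, then sort by a single field."""
    if key not in SORT_KEYS:
        raise ValueError(f"unsupported sort key: {key!r}")
    pool = [d for d in docs if group is None or d.group == group]
    return sorted(pool, key=lambda d: getattr(d, key), reverse=descending)

docs = [
    Doc("A", 1850, 0.2, 10, "x"),
    Doc("B", 1840, 0.7, 3, "y"),
    Doc("C", 1860, 0.5, 7, "x"),
]
print([d.title for d in sort_and_filter(docs, key="relevance", descending=True)])
```

Technically, each additional option is just another key or filter. The harder design problem is keeping the options from becoming overwhelming.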

Limitations:

Scrolling: It always depends on how many topics there are, but ideally all of them could be shown without scrolling.

Searching: This is less a limitation than a note that users’ needs will vary with how much they already know about the topic and/or the set of documents. The standard search bar is good if you already know what you’re looking for; otherwise, it is not very helpful and might cause frustration.

InPhO Topic Explorer
Thu, 26 Jan 2017

This interface does the job of visualizing a topic model and understanding context. However, less effort has gone into its visual and interaction design than into that of the last interface discussed, which makes for a steeper learning curve.

Starting Interface

The interface begins as a drop-down menu within the homepage, which also includes documentation on the code and how to download it. I already know that in our design, we’re focusing attention on the interface and on exploration of the topic model results. We will probably want the documentation to be separate from the interface, as in the interface from the previous blog post.

Strengths and Limitations

Two inputs: The simplicity of the starting interface is helpful. It’s clear that I need to choose a corpus of text, so I chose “Letters of Thomas Jefferson.” At first glance, the “Type to match document titles…” bar appears to require some prior knowledge of the documents. What if I don’t know any? Clicking the “random” button, which uses the same icon as the “shuffle” button in music apps like Spotify, seems to do nothing at first. It actually works; it just takes a little too long to load. It selects the letter “To Mr. Dumas, July 13, 1790.” I can also click the Visualize button without any document in the bar, although the interface does not make this clear.

Visualize #: The Visualize button allows selection of 20, 40, 60, or 80 topics. I have no idea how this will affect anything, so I choose 20. This variation might be good to have, but users unfamiliar with topic models might need more indication of how it will change the results.

Main Interface

Overall, it’s nice that the primary interface is more or less on one page, so there is less need to move between separate views that can feel disjointed. However, there are fewer ways to view the topic model. The main view is a horizontal bar chart, with each bar representing a document and each section of the bar representing a topic.

Strengths and Limitations

Color: I’m not sure if this was just an unlucky color assignment, but the two topics that appear most in the focal document were assigned the same color, making it nearly impossible to distinguish between them. Typically, when visualizing categorical data (here, the categories are the topics), people can only distinguish about 8 colors; beyond that, it becomes much more difficult. Here, 9 colors are used for the 20 topics, so several colors are assigned to multiple topics, which defeats the purpose of distinguishing by color.
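I don’t know how the Topic Explorer actually assigns colors. The sketch below assumes the simplest scheme, cycling through a fixed palette by topic number, and that alone is enough to reproduce the problem. The palette entries are placeholders.

```python
from collections import defaultdict

def assign_colors(n_topics, palette):
    """Assumed scheme: topic t gets palette[t % len(palette)], so colors repeat."""
    return {t: palette[t % len(palette)] for t in range(n_topics)}

def color_collisions(assignment):
    """Map each color to the topics sharing it, keeping only reused colors."""
    by_color = defaultdict(list)
    for topic, color in assignment.items():
        by_color[color].append(topic)
    return {color: topics for color, topics in by_color.items() if len(topics) > 1}

palette = [f"color{i}" for i in range(9)]  # placeholders for 9 hex values
colors = assign_colors(20, palette)
clashes = color_collisions(colors)
print(clashes["color0"])  # topics 0, 9, and 18 are all drawn in the same color
```

With 20 topics and 9 colors, every color ends up shared. Two of a document’s most prominent topics can therefore collide whenever their numbers differ by a multiple of 9.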

Connections: The interaction of hovering over the topics in the document and showing the name of the topic in the key is helpful.

Scale: Someone familiar with topic modeling might understand the “Similarity to” scale at the top, but those who are not might want a quick note on what it means.

Checkbox features: “Normalize topic bars” sizes each part of a document’s bar in proportion to the collection as a whole, rather than to the individual document. This is definitely a useful feature for context, and a checkbox makes it easy to use. Similarly, the “Alphabetical sort” option is a useful and simple feature.
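I don’t know the Topic Explorer’s exact formula for this option. The sketch below is only one plausible reading of “in proportion to the collection as a whole”: divide each topic’s share of the document by its share of the whole collection, then rescale the segments so they sum to 1. The function names and inputs are hypothetical.

```python
def document_bar(doc_weights):
    """Default bar: each segment is the topic's share of this one document."""
    total = sum(doc_weights.values())
    return {t: w / total for t, w in doc_weights.items()}

def normalized_bar(doc_weights, corpus_weights):
    """Assumed 'normalized' bar: each topic's share of the document divided by
    its share of the whole collection, rescaled so the segments sum to 1.
    This is a guess at the idea, not the Topic Explorer's documented formula."""
    doc_share = document_bar(doc_weights)
    corpus_total = sum(corpus_weights.values())
    ratios = {t: s / (corpus_weights[t] / corpus_total) for t, s in doc_share.items()}
    ratio_total = sum(ratios.values())
    return {t: r / ratio_total for t, r in ratios.items()}

# Topic "a" fills 80% of the collection, topic "b" only 20%.
print(normalized_bar({"a": 1, "b": 1}, {"a": 80, "b": 20}))
```

Under this reading, a topic that is common everywhere shrinks in the normalized bar. A topic that is rare in the collection but present in this document grows.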

Topic model #: Changing the number of topics with the small dropdown menu next to the focal-document dropdown is helpful. Loading is quick, a loading status bar shows that it is working, and an animation keeps the transition from being jarring. However, there is also a bar on the far left where you can click the same numbers (20, 40, 60, 80) to change the topic model, but then it transitions to a blank slate. The bar on the left likely acts as a “home” or “reset,” which is why this happens, but I’m not sure what it adds or what its use cases would be.

Reordering: Clicking on a segment sorts the documents by “Top Documents for Topic #.” This is useful for exploring context. However, the focal document then becomes lost in the reordered list. It was listed at the top and is what we started with, but there is no highlight or other visual cue drawing attention to it. Such a cue would probably be useful for tracing a document through explorations of various topics.
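The sketch below shows how the reordered list could keep track of the focal document. It assumes the interface has each document’s topic proportions. The proportions are made up, and only the Jefferson letter title comes from my exploration.

```python
def top_documents(doc_topics, topic, focal=None):
    """Rank documents by their proportion of `topic`, flagging the focal one.

    doc_topics maps a document title to {topic number: proportion}.
    Returns the ranked rows (title, proportion, is_focal) and the focal
    document's position in them (None if it is absent).
    """
    ranked = sorted(doc_topics.items(), key=lambda item: item[1].get(topic, 0.0),
                    reverse=True)
    rows = [(title, weights.get(topic, 0.0), title == focal) for title, weights in ranked]
    focal_rank = next((i for i, row in enumerate(rows) if row[2]), None)
    return rows, focal_rank

# Made-up proportions for illustration only.
letters = {
    "To Mr. Dumas, July 13, 1790": {3: 0.10, 7: 0.60},
    "Letter B": {3: 0.50},
    "Letter C": {3: 0.30, 7: 0.20},
}
rows, rank = top_documents(letters, topic=3, focal="To Mr. Dumas, July 13, 1790")
```

The interface could then outline the row where `is_focal` is true, or scroll to `focal_rank`, so that the starting document stays visible after every reordering.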

Randomizing: Randomizing the document brings up a new title, but you still have to press Enter to display the new data. A random button allows for playful discovery, so it’s nice to have.

Undo: The browser’s back button doesn’t always return you to your exact previous place in the model. An “undo” button of sorts would help users recover from mistakes and would also provide additional navigation.

Topics in PMLA interface
Wed, 25 Jan 2017

To start thinking about how we might design an interface for interacting with topic modeling results, it’s worth looking at existing interfaces and what they do well or poorly.

This interface by Andrew Goldstone allows for browsing topic models of articles from PMLA, the journal of the Modern Language Association of America. The model and code can be used for other sets of text – this is just an example.

The interface has multiple views and ways to explore the model – something can be learned from each of these. The site as a whole, and the navigation between the views, is slightly confusing and conceptually basic, but since it is described as an alpha version, I won’t dwell on those issues. The bulk of the content is in the “Overview” section.

Overview: Grid

A simple display of each topic as a circle, with each word sized according to its weight within the topic. The topic bubbles are arranged in order of topic number. Clicking a bubble leads to the corresponding topic page, which is not part of the “Overview” section.
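One way such a bubble could size its words is to keep the topic’s six heaviest words and scale their weights linearly into a font-size range. This is my own sketch, not code from the browser. The pixel range and the linear scale are assumptions, and the words and weights are made up.

```python
def bubble_font_sizes(word_weights, n=6, min_px=10, max_px=32):
    """Pick a topic's n heaviest words and scale their weights to font sizes.

    word_weights maps word -> weight within the topic. The pixel range is an
    arbitrary illustrative choice, and linear scaling is only one option.
    """
    top = sorted(word_weights.items(), key=lambda item: item[1], reverse=True)[:n]
    lo, hi = top[-1][1], top[0][1]
    span = (hi - lo) or 1.0  # all weights equal: avoid dividing by zero
    return {word: min_px + (w - lo) / span * (max_px - min_px) for word, w in top}

weights = {"poetry": 0.05, "poem": 0.03, "verse": 0.02, "poet": 0.01,
           "lyric": 0.01, "form": 0.005, "line": 0.001}
sizes = bubble_font_sizes(weights)
```

The seventh word, “line,” is dropped entirely. This is the trade-off noted in the limitations below: the bubble hides everything past its first six words.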

Strengths:

  • It is easily apparent that each bubble represents a topic
  • Movement from overview of topics to one specific topic is clear

Limitations:

  • Perhaps a technical flaw: the thickness of the border on each bubble varies but doesn’t seem to represent anything.
  • Six words are in each bubble; however, there are many more “top words” within the topic model. This may or may not be a “problem,” but it is something to be aware of.
  • Besides word size within each bubble, there are few other visual cues to guide the experience… which is why there are more views…

Overview: Scaled

The topics are spatially arranged by similarity. This uses principal coordinates analysis, which I mentioned in the previous blog post.
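dfr-browser computes these coordinates itself. The sketch below is my own minimal version of classical principal coordinates analysis (classical multidimensional scaling) in NumPy. It assumes Jensen-Shannon distance between topic-word distributions as the measure of dissimilarity; the browser’s actual distance measure may differ. The three topics are made up.

```python
import numpy as np

def js_distance(p, q):
    """Jensen-Shannon distance (base 2) between two probability vectors."""
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return np.sqrt(max(0.0, 0.5 * kl(p, m) + 0.5 * kl(q, m)))

def principal_coordinates(topic_word, k=2):
    """Classical MDS / principal coordinates analysis of topic distances.

    topic_word: array of shape (n_topics, vocab_size); each row sums to 1.
    Returns an (n_topics, k) array of coordinates.
    """
    n = topic_word.shape[0]
    d = np.array([[js_distance(topic_word[i], topic_word[j]) for j in range(n)]
                  for i in range(n)])
    center = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    b = -0.5 * center @ (d ** 2) @ center       # double-centered squared distances
    vals, vecs = np.linalg.eigh(b)              # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]            # indices of the k largest
    return vecs[:, top] * np.sqrt(np.maximum(vals[top], 0.0))

topics = np.array([
    [0.7, 0.2, 0.1, 0.0],   # made-up topic-word distributions
    [0.6, 0.3, 0.1, 0.0],
    [0.0, 0.1, 0.2, 0.7],
])
coords = principal_coordinates(topics)
```

Similar topics land near each other, but the axes themselves have no fixed meaning. This may be why the arrangement is hard for a novice to read.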

Strengths:

  • Provides additional information – the similarity
  • Provides interaction to discern overlapping topic bubbles
  • Useful for discovery

Limitations:

  • How it is arranged is not immediately clear – at least to a novice user, one less familiar with topic modeling
  • Requires additional interaction to zoom in to areas with overlapping topic bubbles
  • Difficult to locate a specific topic within the scaled view

Overview: List

A table format allows for comparison of more information – adding a small “over time” illustration and proportion of words in the corpus assigned to the topic.

Strengths:

  • More information viewable at once

Limitations:

  • The author mentions in the documentation that the bar for the proportion of words can be misleading because “the highest-proportion topics are often the least interesting parts of the model — agglomerations of very common words without a clear thematic content.”
  • The y-axes of the mini “over time” bar charts are not all on the same scale
  • Requires scrolling, so it can’t all be viewed at once

Overview: Stacked

This view evokes D3 the most: each topic, distinguished by color, is stacked on top of the others, showing how its appearance increases or decreases over time.
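The core of a stacked layout like D3’s `d3.stack` is a running sum: each topic’s band starts where the previous topic’s band ended. Below is a plain-Python sketch of that idea. It assumes yearly proportions per topic, a zero baseline, and topics stacked in insertion order; the actual view may order or offset the layers differently.

```python
def stack_layers(series_by_topic):
    """Compute (baseline, top) pairs for a stacked chart, one layer per topic.

    series_by_topic: dict topic -> list of yearly proportions, all the same
    length. Layers are stacked in the dict's insertion order, starting from 0.
    """
    lengths = {len(s) for s in series_by_topic.values()}
    if len(lengths) != 1:
        raise ValueError("every topic needs a value for every year")
    baseline = [0.0] * lengths.pop()
    layers = {}
    for topic, values in series_by_topic.items():
        top = [b + v for b, v in zip(baseline, values)]
        layers[topic] = list(zip(baseline, top))
        baseline = top
    return layers

layers = stack_layers({"topic 18": [0.1, 0.2], "topic 3": [0.3, 0.1]})
```

Because every band sits on the ones below it, a thin band high in the stack has a wavy baseline. This is part of why topics with few appearances are hard to read without interaction.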

Strengths:

  • Shows trends of every topic over time, all at once

Limitations:

  • Topics with fewer appearances are harder to distinguish without interaction

Topic

Each topic has its own view, including top words and their weight, proportion over time, and top documents.

Strengths:

  • Clicking on a bar in the timeline limits the documents shown to that period. Great use of drilling down. However, it would be nice to be able to select multiple bars at a time.
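The multiple selection I wish for could be as simple as filtering the top documents against a set of selected years, rather than a single year. Below is a hypothetical sketch. It assumes the timeline bars are years and that each top document is a (title, year, weight) tuple.

```python
def documents_for_years(top_docs, selected_years):
    """Keep a topic's top documents from any of the selected years.

    top_docs: list of (title, year, weight) tuples, already sorted by weight.
    selected_years: a set of years; an empty set means no filter (show all).
    """
    if not selected_years:
        return list(top_docs)
    return [doc for doc in top_docs if doc[1] in selected_years]

top_docs = [("A", 1990, 0.4), ("B", 1985, 0.3), ("C", 1990, 0.2), ("D", 2001, 0.1)]
print([title for title, _, _ in documents_for_years(top_docs, {1990, 2001})])
```

Clicking a bar would then toggle that year in or out of the set, instead of replacing the current selection.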

Limitations:

  • Clicking on a document or a word opens the document or word view, leaving the context of the topic itself behind. Expanding the details within the topic view is an alternative.

Document

View shows the title and the topics by proportion. To reach a document, you must click on it from the bibliography page or from a topic page. If topics are the priority, this makes sense; otherwise, it might be interesting to also have a view of the documents themselves (beyond the standard bibliography).

Word

View shows the topics in which the word occurs. As with documents, it must be reached from a list of all words (for documents, the bibliography) or from a topic page. If the word appears in only one topic, it’s surprising when the view changes to show a different word spread among many topics. The animation helps with these transitions.
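At its core, the word view is an inverted lookup from a word to the topics whose top words contain it. Below is a sketch with made-up topics and weights. It assumes each topic stores only its top words.

```python
def topics_containing(word, topic_top_words):
    """Return (topic, weight) pairs for topics whose top words include `word`.

    topic_top_words: dict topic -> dict word -> weight (top words only).
    Results are sorted by the word's weight in each topic, highest first.
    """
    hits = [(topic, words[word]) for topic, words in topic_top_words.items()
            if word in words]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)

# Made-up topics for illustration.
top_words = {
    18: {"poetry": 0.04, "poem": 0.02},
    5: {"poem": 0.05, "lyric": 0.01},
    9: {"novel": 0.03},
}
print(topics_containing("poem", top_words))
```

Here “novel” belongs to a single topic, while “poem” spans two. Moving between words like these produces the jump from a one-topic view to a many-topic view.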

General considerations

  • Time to load – with a large dataset, lag time can be frustrating for the user and even make interaction impossible
  • Difficult to follow a topic throughout the different views: as the screenshots I took show, I looked at Topic 18 throughout my exploration. When on a topic page, there is no way to see the topic simultaneously in any of the Overview views. For example, I would like to return to the Overview and see Topic 18 highlighted in the Scaled or List view. Still, navigating between words and documents, and discovering other topics in a document, is likely helpful to a humanities scholar.

UX Questions 

From exploring this interface, I’ve come up with a list of questions to consider in future designs. Many of them are classic information visualization questions, with no one answer.

  1. How many words of a topic should be displayed at once to best indicate the topic? Generally, how much information should be shown at once?
  2. How do we design for exploration and discovery of information, rather than for a search like a database query?
  3. When is it appropriate to change the presentation of data, e.g. axis variables?
  4. When drilling down to a detail, how much context should be shown?
  5. How much visualization beyond the topic itself is necessary, e.g. do we also want to visualize all of the documents where each document is a variable?
  6. What should be sacrificed for quicker load times?


There are a lot of things to be learned from these different views. The author provides great documentation of the thought process and explanations of design decisions here: http://agoldst.github.io/dfr-browser/

Topic Modeling and Digital Humanities: Overview (1)
Fri, 20 Jan 2017

In this post:

  • What is a topic model?
  • UX considerations
  • Existing techniques

This will be the first in a series of posts as we begin a new project exploring topic modeling for the digital humanities, following the previous work of TOME.

A topic model is a model of how often words occur together in a group of texts. Many other posts have been written about the definition of “topic model” in detail, in addition to detailing various algorithms.

http://journalofdigitalhumanities.org/2-1/topic-modeling-and-digital-humanities-by-david-m-blei/

http://journalofdigitalhumanities.org/2-1/what-can-topic-models-of-pmla-teach-us-by-ted-underwood-and-andrew-goldstone/

http://programminghistorian.org/lessons/topic-modeling-and-mallet

https://tedunderwood.com/2012/04/07/topic-modeling-made-just-simple-enough/
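To make the definition concrete, below is a minimal sketch of latent Dirichlet allocation (LDA), the model Blei’s article describes, fit with collapsed Gibbs sampling in plain Python. It is a toy for illustration, not what any of the tools in these posts run. The hyperparameters `alpha` and `beta`, the iteration count, and the seed are arbitrary choices. The sampler repeatedly reassigns each word token to a topic. The probability of each topic depends on how much the token’s document already uses that topic and how much that topic already uses that word, which is what “words occurring together” amounts to in this model.

```python
import random

def lda_gibbs(docs, k, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of token lists. Returns (topic_word, doc_topic): one word ->
    probability dict per topic, and one list of topic proportions per document.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    v = len(vocab)
    n_dk = [[0] * k for _ in docs]      # tokens in document d assigned to topic t
    n_kw = [[0] * v for _ in range(k)]  # times word w is assigned to topic t
    n_k = [0] * k                       # total tokens assigned to topic t

    def update(d, wi, t, delta):
        n_dk[d][t] += delta
        n_kw[t][wi] += delta
        n_k[t] += delta

    # Start from a random topic for every token.
    z = [[rng.randrange(k) for _ in doc] for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            update(d, index[w], z[d][i], 1)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wi = index[w]
                update(d, wi, z[d][i], -1)  # remove this token's current assignment
                weights = [(n_dk[d][s] + alpha) * (n_kw[s][wi] + beta) / (n_k[s] + v * beta)
                           for s in range(k)]
                z[d][i] = rng.choices(range(k), weights=weights)[0]
                update(d, wi, z[d][i], 1)

    topic_word = [{w: (n_kw[t][index[w]] + beta) / (n_k[t] + v * beta) for w in vocab}
                  for t in range(k)]
    doc_topic = [[(n_dk[d][t] + alpha) / (len(doc) + k * alpha) for t in range(k)]
                 for d, doc in enumerate(docs)]
    return topic_word, doc_topic

docs = [["quilt", "led", "thread", "quilt"], ["topic", "model", "word", "topic"],
        ["led", "thread", "fabric"], ["model", "word", "corpus"]]
topic_word, doc_topic = lda_gibbs(docs, k=2, iterations=50)
```

The `k` argument is the user-specified number of topics. Rerunning with a smaller `k` forces themes to share topics, and a larger `k` lets them split apart.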

Here, I am going to highlight some challenges of viewing, exploring, and learning from topic modeling results.

UX considerations

How much information to show at once? Topic models typically have many topics, which leads to information overload. How can you browse a model in a way that teaches you something without being overwhelmed?

How can you understand different views of the model? The results change based on the number of topics the user specifies. A small number means separate topics will merge; a larger number means combined topics will split. Both are correct, but each obscures some information.

Can you design for both a user familiar with the contents of the texts and a user who is unfamiliar? These are likely very different use cases. One will have questions in mind and one will be attempting to gain an initial understanding.

How do you design for trust? The model may or may not be misleading, or it may be misleading in some respects and not in others. It’s not inherently bad if it is misleading, so long as the user recognizes that it is, and design can aid with this understanding.

What capabilities can metadata add? Many topic models disregard metadata and just use content. But if we use metadata, which might include things like author’s gender, race, regional location, and year, how else might one be able to explore the model?

How do you make the topic model a sustainable addition to existing workflows? Topic models would presumably be more useful if they were integrated into the ways people already work. This especially applies to people who may be less familiar with technical fields like computer science.

Existing techniques (some of them)

Dendrograms: A type of tree diagram emphasizing hierarchical clustering

More on the definition: https://en.wikipedia.org/wiki/Dendrogram

Use: http://blog.rolffredheim.com/2013/11/visualising-structure-in-topic-models.html

Pro: It maintains more complexity

Con: Its linear structure restricts links between topics
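As a rough illustration of what such a diagram encodes, here is a sketch of average-linkage agglomerative clustering over topics in plain Python. The distance (total variation between word distributions) and the linkage rule are my assumptions, and the linked post may use different ones. The merge history this code returns is what a dendrogram draws. It also shows the limitation: every topic joins exactly one branch, so links between branches cannot be shown.

```python
from itertools import combinations

def total_variation(p, q):
    """Half the L1 distance between two word distributions (dicts word -> prob)."""
    words = set(p) | set(q)
    return 0.5 * sum(abs(p.get(w, 0.0) - q.get(w, 0.0)) for w in words)

def agglomerate(topics, distance=total_variation):
    """Average-linkage agglomerative clustering; returns the merge history.

    topics: dict name -> word distribution. Each merge is (cluster_a,
    cluster_b, height), where clusters are tuples of topic names.
    """
    clusters = [(name,) for name in topics]
    merges = []

    def linkage(a, b):
        pairs = [(x, y) for x in a for y in b]
        return sum(distance(topics[x], topics[y]) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda ab: linkage(*ab))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Made-up topics for illustration.
topics = {"t1": {"war": 0.6, "army": 0.4},
          "t2": {"war": 0.5, "army": 0.5},
          "t3": {"poem": 0.7, "verse": 0.3}}
merges = agglomerate(topics)
```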

Network visualization

Use: https://tedunderwood.com/2012/11/11/visualizing-topic-models/

Pro: Shows some connections, nodes can be sized based on number of occurrences

Con: Topic models aren’t actually networks

PCA (Principal Component Analysis)

Projects the model into two dimensions

e.g. https://tedunderwood.files.wordpress.com/2012/11/prettierpca.jpg

Pro: Solves the issue of false network diagrams

Con: Words overlap
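Below is a minimal sketch of the projection in NumPy, using the singular value decomposition of a mean-centered matrix. The rows could be topics (topic-word vectors) or words (word-topic vectors), depending on what one wants to plot. I’m not certain which the linked figure uses, so the input here is assumed.

```python
import numpy as np

def pca_2d(matrix):
    """Project the rows of `matrix` onto its first two principal components."""
    x = np.asarray(matrix, dtype=float)
    centered = x - x.mean(axis=0)   # PCA works on mean-centered columns
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T      # coordinates along PC1 and PC2

points = pca_2d([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]])
```

Every row becomes a point in the plane, so with thousands of words the labels inevitably pile on top of each other.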

There are many other interfaces, particularly ones specific to a given dataset. These will be the subjects of the next few blog posts.

Example projects, supportive materials, and square-circuit interaction
Wed, 04 May 2016

Two example projects

Finding example projects that might influence our design and help us make technical decisions was quite difficult. However, here are two projects that might be useful to refer to:

  1. LED Matrix Quilt – using conductive thread, 64 individually-sewn LEDs, resistors at the end of each row, snaps for Arduino LilyPad attachment, inner fabric piece for hardware, outer fabric for aesthetics/design. Some key takeaways:
    1. If you don’t know anything about fabric and technology, you might think that the fabric itself has to be translucent. This isn’t the case: LEDs can shine through opaque fabric! So buy a bunch of swatches and test.
    2. Individually sewn LEDs are a viable option for a final version and likely lend more flexibility.
  2. Child’s Interactive Quilt – multiple circuits with a combination of LilyPads, LEDs, conductive thread, and backing fabric. Some key takeaways:
    1. Conductive fabric can be used as a pressure sensor (as Erica mentioned in the previous post). This project layered the fabric on a small piece of foam before attaching it to the quilt.
    2. Multiple circuits can exist on the same quilt (obvious, but useful to note).


Supportive materials – an embroidery hoop and a quilting frame

Based on these example projects and on discussions with knowledgeable friends, some supportive materials for the making might be helpful. The LED Matrix Quilt used an embroidery hoop to help with sewing the individual LEDs. If we decide to sew individual LEDs, or to work with small squares of fabric individually to start, then this would be a good purchase. For a bigger piece of work, a quilting frame (like a drafting table without a top) might be helpful.


Circuit Design

What follows discusses what an ideal design would be, not necessarily what we are first going to prototype. Each square – one of nine squares representing the nine events in a year – should have two circuits. The first should be an on/off switch for the LED, and the second should change the color that represents the country. This could also potentially be one circuit, coded as one bigger cycle.
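The “one bigger cycle” option can be sketched as a small state machine: each press advances from off through each country color and back to off. This is written in Python only to show the logic. On the quilt it would run as Arduino code on the LilyPad, and the color names are placeholders, since the post does not specify them.

```python
# Placeholder colors; the actual country colors are not specified.
COUNTRY_COLORS = ("red", "green", "blue")

class Square:
    """One quilt square as a single cycle: off, then each country color."""

    def __init__(self, colors=COUNTRY_COLORS):
        self.states = [None] + list(colors)  # None means the LED is off
        self.index = 0

    def press(self):
        """Advance to the next state, wrapping back to off after the last color."""
        self.index = (self.index + 1) % len(self.states)
        return self.states[self.index]

square = Square()
sequence = [square.press() for _ in range(5)]
```

The two-circuit design separates these states: one input only toggles on/off, and the other only steps through the colors.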

Next steps: sketching out the circuit design, taking into account constraints of conductive fabric and thread.

Lighting up the NeoPixels
Mon, 04 Apr 2016

There is extensive documentation on NeoPixels, including a very handy “überguide.” It assumes experience with Arduino and electronics. I would recommend following it and using this post for clarifications.

While I’ve had a reasonable amount of experience, I would still classify myself as a “beginner.” The biggest problem we encountered in setting up the NeoPixel strip can be attributed to not recognizing that small screws can fasten things together…now it seems obvious.

For expert users, I might be oversimplifying this, but it should help beginners. The NeoPixel strip has 4 connections: two for the additional power supply and two for data. For power, the red cable is connected to the positive terminal of the adapter, and the black cable is connected to the negative terminal. In addition, a capacitor is attached across the same terminals, matching its positive and negative sides. This might not be strictly needed, but it’s probably a good idea. To attach all of these together, you can either solder them or place all the cable ends in the little holes of the green part of the adapter and tighten the little screws to hold everything in place.

The other black cable goes to ground (GND) on the Arduino board, and the white cable goes to Digital Pin 6 on the Arduino board. (It can actually go to just about any digital pin, but the example code uses 6.) It is also recommended that you put a 470 ohm resistor between the white cable on the NeoPixels and the digital pin. To do so, use an additional jumper cable and a breadboard.

Finally, triple check your connections…otherwise something might short and become unusable 🙂
