DH LAB
The Digital Humanities Lab at Georgia Tech
https://dhlab.lmc.gatech.edu

Data by Design: Automating the Chapter Timeline
https://dhlab.lmc.gatech.edu/databydesign/data-by-design-automating-the-chapter-timeline/
Tue, 02 Mar 2021 14:31:28 +0000

Each chapter of Data by Design has a chapter timeline, which places images and visualizations on a vertical minimap, allowing the user to get a sense of the chapter and their progress within it at a glance. Additionally, all user highlights will be visualized on the timeline as they’re made. The map will allow the user to click objects within it and be taken to the corresponding spot in the chapter. Embracing the notion of a “meta-visualization,” each chapter’s timeline is styled after the main visualization in that chapter.

The Peabody timeline at time of writing

 

Initially, the chapter timeline rendered manually entered metadata, a process that was hard to maintain and required an additional source of truth—whenever the structure of a chapter changed, we’d have to manually update the timeline data. Instead, I built an automated system that reads the chapter content as it is and builds the metadata that is given to the timeline. In this post, I’d like to give an overview of how that works.

Compile-time or Runtime?

All of the chapter content is written and stored before it ever appears in your web browser. So should our chapter metadata system analyze the chapter content files or the rendered chapter content as it appears in your web browser? The former would happen at “compile-time”—during development, metadata would be generated by some sort of script, and the output of that script would be saved and uploaded along with the rest of the project to be referenced by the chapter timeline. The latter would involve an ad-hoc analysis of the chapter after it is rendered by the browser; metadata would be generated on the fly.

Both approaches have their advantages—the compile-time system would be more verifiable, more easily allow for manual tweaks, and would result in a slightly faster loading time for the user—but I went with a runtime approach. This is for two main reasons. First, I wanted a component’s exact vertical placement in the page to be part of this analysis, so that we could send the user to the right spot in the chapter when a square is clicked in the timeline. This exact rendering information is only available after all of the components have been rendered in the browser. Second, I wanted to be able to account for changes in the chapter structure that happened at runtime. For example, let’s say we had a chapter that dynamically loads an image only if the user clicks a certain button; I would want that new image to show up in the timeline then and only then.

Sections 

In the screenshot above, the start of a new section is represented by a big square, and the length of the line to follow corresponds with the “length” of the section, with spots reserved for paragraphs, images, and visualizations (“subsections”). So the first order of business when building this timeline is identifying the sections in the chapter. My first try at this was to analyze the DOM directly and find the header elements, but we ended up building a reusable Section component so that the style of all the section headers could be changed at once. Once that component was built out, it made more sense to have the Section component register itself rather than for the chapter timeline to go looking for it. But what was to receive and manage that registration?

The Source of Truth

We need a place to keep track of what we’ve found once we’ve analyzed the chapter. Our project uses Vuex, a Vue-integrated state management system, to manage the data needed by chapter visualizations, so I created a Vuex store to track the necessary information for the chapter timeline. When a Section component is mounted onto the page, it tells this store that it’s been created and passes in the id of the section element. Then, the store calls a function that analyzes the children of the section element to determine how much space to reserve for subsections. Let’s check out an example: the first section of the Peabody chapter. When it first gets registered, the store analyzes its DOM children, and finds four elements:

This is why you see four slots for squares on the timeline for this section. The next step is to take a look at these four elements and see what we know. One of the children—at index 3—has the special predefined “IMG” class, so we can go ahead and register an image at that subsection, which will show up as a green square in the timeline. We don’t know anything inherently about the div at index 1 – is it a visualization, a scrollytell, or something that shouldn’t be put on the timeline? For elements like these, we again rely on the component to register itself with the store so that we can be confident about its identity. The div at index 1 in this example is the Map Scroller visualization. As I mentioned in my last post, visualizations all have a mounted hook that registers them when they get loaded into the page:
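To make the pattern concrete, here is a simplified, framework-free sketch of the registration logic described above. The names (`timelineStore`, `registerSection`, `registerVisualization`) and the plain-object store are illustrative stand-ins; the real project does this inside a Vuex store module with actual DOM elements.

```javascript
// A hypothetical sketch of the timeline store's registration pattern.
const timelineStore = {
  sections: [],               // one entry per registered section
  pendingVisualizations: [],  // visualizations that mounted before their section

  registerVisualization(elementId) {
    this.pendingVisualizations.push(elementId);
  },

  registerSection(sectionId, children) {
    // Classify each child element into a subsection slot.
    const slots = children.map((child) => {
      if (child.classes.includes("IMG")) return "image";
      if (this.pendingVisualizations.includes(child.id)) return "visualization";
      if (child.tag === "P") return "paragraph";
      return "unknown";
    });
    this.sections.push({ id: sectionId, slots });
  },
};

// In Vue, children mount before their parents, so the visualization
// registers itself first, then the section's analysis can recognize it:
timelineStore.registerVisualization("map-scroller");
timelineStore.registerSection("peabody-section-1", [
  { tag: "P", id: "p1", classes: [] },
  { tag: "DIV", id: "map-scroller", classes: [] },
  { tag: "P", id: "p2", classes: [] },
  { tag: "DIV", id: "img1", classes: ["IMG"] },
]);
```

With this input, the section ends up with the slots `["paragraph", "visualization", "paragraph", "image"]`, matching the four squares described above.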

This actually happens before the section itself is registered, because in Vue, subcomponents’ mounted hooks are called before their parents. So by the time we do our section-child analysis, we’ve already been told to expect a visualization that looks like this element. So we’ll go ahead and save that information – a visualization square (orange) should be rendered at the second subsection slot of the first section.

The store allows us to have all the data required for the timeline in one trackable place, decoupled from the timeline rendering itself. Thanks to Vuex’s debugging tools, which log every mutation to the store, it was easy to find bugs during development. And the store is decoupled in the sense that it doesn’t need to know any rendering details like the size or color of squares. It simply maintains the body of data needed by the timeline, which is later passed to the Navline component for rendering.

Paragraphs and Highlights

Paragraphs don’t themselves get a square on the timeline: instead, they’re represented by whitespace. However, when the user highlights something in that paragraph (and drags it into their notebook), an indicator shows up next to that paragraph slot:

Similarly, when the user highlights a caption of an image or text inside a visualization or scrollytell, a highlight indicator shows up beside that:

As you might guess, this follows the same registration pattern I’ve described: when a highlight is created, it registers itself with the store, which tracks down its parent section and figures out which subsection slot it should be in.

You might notice that the Peabody chapter has two paragraphs before the Map Scroller (at least at time of writing), but the timeline only shows one slot. In fact, a highlight indicator will show up in that slot regardless of whether you highlight in the first paragraph or the second. This was a deliberate design decision to combine adjacent paragraphs into one paragraph slot, keeping the timeline from getting too long.
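The "combine adjacent paragraphs" rule can be sketched as a small pass over the slot list. The function name here is hypothetical, not the project's actual helper:

```javascript
// Collapse runs of consecutive paragraph slots into a single slot,
// keeping the timeline from getting too long.
function collapseParagraphs(slots) {
  const out = [];
  for (const slot of slots) {
    // Skip a paragraph if the previous kept slot is also a paragraph.
    if (slot === "paragraph" && out[out.length - 1] === "paragraph") continue;
    out.push(slot);
  }
  return out;
}
```

So the Peabody chapter's two opening paragraphs plus the Map Scroller would reduce from `["paragraph", "paragraph", "visualization"]` to `["paragraph", "visualization"]`, which is why a highlight in either paragraph lands in the same slot.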

Takeaways

There’s always a tradeoff when automating something: you want to make sure that the time and effort spent in the automation process really does save you time in the long run. In the case of this chapter timeline, this automation process drastically speeds up future chapter development: we only have to worry about following certain registration patterns—most of which are codified in reusable components and mixins—and the timeline will be functional right off the bat. It also makes maintenance far easier: updating the timeline is the same process as updating the chapter itself. Lastly, it guarantees that the timeline always works the same way in each chapter; if we made them by hand, we’d have to consciously make sure we were consistent.

All in all, this feature was incredibly satisfying to build. I love constructing these process-improving systems, and there are a few more to talk about in future posts, but for now—thanks for reading!

Data by Design: Static Visualizations for Playfair’s Chapter
https://dhlab.lmc.gatech.edu/databydesign/data-by-design-static-visualizations-for-playfairs-chapter/
Sat, 13 Feb 2021 22:40:55 +0000

Besides the “Export and Import to and from all North-America” scrollytelling visualization, I also created several static visualizations using d3 to showcase Playfair’s method with modern technologies.

Stacked Bar and Coxcomb charts

The first two visualizations use the same data as Playfair’s import-export chart.

The stacked bar graph on the left uses axes and labels similar to those of the import-export chart. Each bar has a constant width, represents one year in the CSV, and is positioned chronologically. From 1770 to 1782, where we have detailed data points, the bars overlap each other.

The Coxcomb graph on the right uses a circular axis, where each sector represents a decade. A sector’s width (its inner angle) depends on the duration of time its data point represents, so the sectors from 1770 to 1782 are thinner than the others. I also chose to let the sectors overflow the circular axis instead of using a larger scale, because I believe this dramatic effect will attract readers’ attention to the later decades of the chart and emphasize its visual impact.
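The angle math behind that rule can be sketched as follows. This is illustrative, not the chapter's actual d3 code: each sector's angular width is made proportional to the span of years it covers, so one-year sectors come out thinner than decade-wide ones.

```javascript
// Compute start/end angles for coxcomb sectors whose angular width is
// proportional to the number of years each data point spans (a sketch).
function sectorAngles(spans) {
  const totalYears = spans.reduce((sum, s) => sum + (s.end - s.start), 0);
  let angle = 0;
  return spans.map((s) => {
    const width = ((s.end - s.start) / totalYears) * 2 * Math.PI;
    const sector = { ...s, startAngle: angle, endAngle: angle + width };
    angle += width;
    return sector;
  });
}
```

A ten-year span and a one-year span in the same chart would get angles in a 10:1 ratio, which is exactly why the detailed 1770–1782 sectors look thin next to the decades.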

Covid-19 Death Comparison


Women Representatives Comparison


Average Income Comparison


These three charts use the same format as Playfair’s import-export chart. The d3 code is also similar: I only needed to use a different dataset and modify the axis scale and title positions. The visualizations showcase an interesting comparison between the U.S. and the U.K. on some relevant topics.

Data by Design: Code Reuse and Visualizations
https://dhlab.lmc.gatech.edu/process/data-by-design-code-reuse-and-visualizations/
Tue, 09 Feb 2021 02:08:06 +0000

When I joined, the team had done a considerable amount of research, design, and requirements exploration in addition to our small prototype. One of the ideas continually emphasized was that the book would be heavily data-driven. This can be a bit of a buzzword and a swiss-army-knife of a term – the first two D’s in the popular web visualization tool D3 are “data-driven,” for a technical example, while data-driven can characterize approaches in other disciplines, like data journalism.

Data by Design is situated in a broad meaning of the category – we’re data-driven in the sense that the scope of the book is driven by contours in the history of data, and the research follows data and the people that wield it. But in addition to researching data and using data for research, we make our argument by presenting data, rendering it visually throughout the book. These include recreations of historical data visualizations and new takes on them: new visualizations of old data and historical visualizations swapped out with new data. So we’re also data-driven in the technical sense: the book is filled with interactive web components that render and depend upon collections of data.

That characterizes a common task throughout development: in every chapter, we need to build components that take some dataset and present it visually. Software engineering is all about abstraction and generalization—if there’s a repeated task, find a way not to repeat yourself—and while this might not sound like a lot of repetition (after all, there’s a big difference between the visualizations of Peabody and Playfair), from the development side, the visualizations have much in common. They each do all or many of the following:

  • Require a data source to be loaded
  • Reformat that data in some way
  • Contain subcomponents that also need to see the data
  • Respond to events (think of these as messages) from subcomponents and send events back out to the chapter
  • Mutate the data (after user interaction)
  • Allow the user to drag it into the notebook
  • Save the mutated data to the notebook server when dragged into the notebook
  • Show up in the chapter timeline

This adds up to a sizeable chunk of functionality that can be shared across visualization components so we don’t have to reimplement these features in every new visualization.

In object-oriented programming, there are two common ways of sharing groups of functionality among objects: composition and inheritance. With composition, the objects each have a copy of an object that does the desired functionality; with inheritance, the objects declare themselves as versions of the object that does the desired functionality, thereby “inheriting” it, thanks to language features like subclassing. In this example, if we use composition, the visualization components would each create an instance of some helper object that has the features implemented, and would call the helper’s methods when necessary.

A diagram of the Peabody Chart and Playfair Graph related to a VisualizationHelper class through composition

If we use inheritance, we’d declare each visualization component as being a Visualization, which would be thought of as a “parent” object, and then our visualization components would automatically contain all the functionality that was in the parent. (In many OOP languages these are called “superclasses,” or “abstract superclasses” when the parent is simply a template that depends on the child to flesh it out.)

A diagram of the Peabody Chart and Playfair Graph related to a Visualization class through inheritance
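The two diagrams above can be condensed into a minimal sketch. The class names and the `loadData` behavior are illustrative only, standing in for the shared functionality listed earlier:

```javascript
// Composition: each chart *has a* helper object and delegates to it.
class VisualizationHelper {
  loadData(name) {
    return `loaded:${name}`; // stand-in for fetching and reformatting data
  }
}

class PeabodyChart {
  constructor() {
    this.helper = new VisualizationHelper();
  }
  render() {
    return this.helper.loadData("peabody");
  }
}

// Inheritance: each chart *is a* Visualization and inherits shared behavior.
class Visualization {
  loadData(name) {
    return `loaded:${name}`;
  }
}

class PlayfairGraph extends Visualization {
  render() {
    return this.loadData("playfair"); // inherited from the parent
  }
}
```

In the composition version the chart explicitly holds and calls its helper; in the inheritance version the shared method simply appears on the chart, which is the property the mixin approach below tries to replicate.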

Ever since the initial prototypes, Data by Design has used the JavaScript framework Vue.js. When I say “component,” I refer to the building blocks of a Vue application (somewhat analogous to “class” which is the building block of many OOP languages). I won’t go into all of the elements in a component, but they include:

  • data, which are reactive properties that can be referenced and updated throughout the component and its UI. For example, the interactive Peabody grid uses a data field to keep track of and respond to the current pixel that the user is hovering over.
  • props, which are like data, but they’re passed from the parent component to the child and can’t be changed by the child. That grid takes a “century” prop so it knows what year the first square of the grid represents.
  • methods, pieces of functionality that it can call and reuse. The Peabody grid calls one of its methods every time the user hovers their mouse over it.
  • lifecycle hooks, which set up the component to trigger pieces of functionality when something happens in the program. For example, a component might use the mounted lifecycle hook, which triggers when the component is added to the page, to register itself in the chapter timeline when it first becomes visible.

The newest release, Vue 3, ships with a Composition API to make it as easy as possible to share code across components using composition. OOP design patterns tend to favor composition over inheritance: composition can be easier to maintain, it avoids the rigid, predetermined taxonomy required by inheritance, it always allows for multiple helper objects (inheritance frowns upon multiple parent objects), and it even allows for techniques that swap out the helper object for another at runtime.

The Composition API brings a whole suite of code-sharing features to Vue that I’ve enjoyed using in newer projects, but most current Vue projects, Data by Design included, are built on Vue 2. A primary way to share functionality among components in Vue 2 is its mixin feature, which is similar to inheritance. Instead of building a “parent” object, you build a mixin object. This is written like any other Vue component and can have all the features that a component can, but it can’t be used directly. Instead, a component can declare that it is using that mixin (or any number of mixins), in which case all the data, methods, and hooks in the mixin are merged with (or “mixed into”) the component.

This lets us do something that composition often can’t: completely move functionality out of the way of the child components. When our team builds a new component, I don’t want us to have to think about registering it in the timeline or making it draggable or figuring out how to send data to subcomponents: if it isn’t unique to the component, I want it to be done automatically. With composition, you’d still have to explicitly configure the appropriate hooks, even if that configuration is a single call to a helper object. This is preferred in many applications: with composition, you can involve functionality as needed, without having to include things that aren’t needed. But in this case, aside from prioritizing an easy developer experience, we do want to enforce all of the functionality. In other words, we want to make sure that every Visualization does certain things and has certain capabilities. It’s a case where inheritance is indeed desired over composition.

The mixin is imported and added to a component using the mixins option in Vue:
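A sketch of what that looks like, with a stand-in mixin defined inline (the real Visualization mixin lives in its own module and is imported; the component name here is illustrative):

```javascript
// A hypothetical mixin carrying shared behavior.
const Visualization = {
  mounted() {
    // register with the chapter timeline, set up the drag handle, ...
  },
};

// A component opts in via the `mixins` option; Vue merges the mixin's
// data, methods, and lifecycle hooks into the component.
const PlayfairGraph = {
  name: "PlayfairGraph",
  mixins: [Visualization],
};
```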

And then the visualization automatically gains all of the functionality in the mixin. Any props that are specified in the Visualization mixin are now accepted by the child component, any data and methods created in the mixin are accessible, and its lifecycle hooks are registered.

So for the Visualization mixin:

  • The props allow the component to take the name of a static dataset (which is to be grabbed from the server) and/or a mutable dataset (which is to later be sent to the server as part of the notebook). There’s also a width prop, which allows the component to base the size of visualizations on a maximum width that’s determined by the chapter.
  • The hooks make sure that when the component is created, the specified datasets are downloaded or registered, and when the component is mounted (i.e., when it appears in the page) it creates a draggable icon in the corner for dragging the visualization into the notebook and lets the chapter timeline know.
  • The data (really, computed properties) allow the child to view the dataset and various details about how it is registered.
  • The various helper methods allow the component to transform the data, easily register events, and create lengths based off of the passed-in width.
  • Additionally, many of the properties and methods are set up to be injected further down the component tree. That means that they aren’t just accessible by the component that’s been directly “mixed” with the mixin; any component contained by it can request them.
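Put together, the mixin's options might look roughly like this. This is a hedged sketch: every option name other than `width` is invented for illustration, and the real bodies coordinate with Vuex rather than doing the work inline.

```javascript
// A hypothetical shape for the Visualization mixin (names are illustrative).
const VisualizationMixin = {
  props: {
    staticDataset: String,  // name of a read-only dataset to fetch
    mutableDataset: String, // name of a dataset the user can change
    width: Number,          // maximum width, determined by the chapter
  },
  created() {
    // download or register the specified datasets here
  },
  mounted() {
    // create the draggable icon and notify the chapter timeline here
  },
  computed: {
    // simplified: the real mixin pulls these from the Vuex store
    maxWidth() {
      return this.width;
    },
  },
  methods: {
    // helper for creating lengths based off of the passed-in width
    scaled(fraction) {
      return fraction * this.width;
    },
  },
};
```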

Many of the Visualization mixin’s roles rely on its communication with various Vuex state modules. In other words, the Visualization mixin doesn’t keep track of all the visualizations and all the data itself; its job is to coordinate with those systems. I’ll talk about how these modules work in a future post, as this one’s getting long enough!

You can view the full code here.

Data by Design: Playfair Visualization, Step by Step
https://dhlab.lmc.gatech.edu/databydesign/data-by-design-playfair-visualization-step-by-step/
Mon, 08 Feb 2021 01:12:22 +0000

In Playfair’s chapter, we remade his original “Imports and Exports to and from North-America” graph with d3. We also break down his process of making visualizations with engraving and compare it to the modern way of using software tools.

Playfair’s original visualization

D3 recreation of Playfair’s visualization

The recreation integrates the scrollytelling feature we built for this project, breaking down Playfair’s engraving process and transforming it into today’s data-plotting method with d3. As the reader scrolls through the web page, components of the visualization appear in their order of creation, along with explanatory text.

 

The visualization starts with the borders, axes, and labels. The first version of the import and export lines corresponds to Playfair’s first draft.

Playfair's First Draft


Next, the lines transform into Playfair’s final version and then into today’s recreation. The lines of Playfair’s versions were crafted by tracing the original visualizations in Adobe Illustrator and exporting them in SVG format. After carefully positioning the SVG lines in the d3 visualization, I then had to manually match each line with the data points in the CSV used for the recreation in order to implement the transformation.

Line Transformation GIF


The visualization then displays the title, along with the data points used to form the curves in the recreation. We can see that the recreated curve is generated directly from those data points, so there is a perfect match between them, whereas Playfair’s original versions show some error.

Data by Design: Front Page Timeline
https://dhlab.lmc.gatech.edu/databydesign/data-by-design-front-page-timeline/
Mon, 08 Feb 2021 00:53:49 +0000

On the front page, we decided to implement a timeline below the main title to give our readers an overview of all the visualizations we will be covering in the book. The timeline consists of three main parts: a gray horizontal bar representing time from the year 1786 to the year 1900, multiple blue vertical bars at each time mark representing the number of visualizations we collected for that specific year, and image frames that showcase the thumbnails of those visualizations.

The image information is stored in the points array in the data function of the Vue component. The structure of each data point is shown in the screenshot below.

Code snippet: the structure of a data point

Each data point has a unique id, the year it belongs to, its order within that year, a link to the location of the stored image, and the image’s width and height. This information is used to calculate the position and size of the image frame. The structure should also be optimized and stored as JSON or CSV in the future.

The timeline is implemented in the Picline Vue component, which takes in the points array from the home page. The timeline is drawn purely with SVG. We first have a function that groups the data points by year and determines the height of each blue bar, then we draw the blue bars and their corresponding text, with the text position alternating between odd and even years. We then iterate through all the data points to draw the image frames. The position of each frame is determined by the position of its year plus an offset based on the point’s order within that year.
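The grouping step can be sketched as a simple tally (the function name is hypothetical, not the component's actual code): each year's count then determines the height of that year's blue bar.

```javascript
// Count how many visualizations fall in each year (a sketch of the
// grouping function used to size the blue bars).
function groupByYear(points) {
  const counts = {};
  for (const p of points) {
    counts[p.year] = (counts[p.year] || 0) + 1;
  }
  return counts;
}
```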

We added @mouseover and @mouseleave handlers to observe users’ interaction. When hovering over a frame, the frame and the picture are enlarged to twice their size and displayed on the top layer, with a blue arrow pointing to the frame’s position on the gray line.

The entire timeline works as shown in the GIF below.

 

 

The Data by Design Notebook: Drag and Drop
https://dhlab.lmc.gatech.edu/databydesign/the-data-by-design-notebook-drag-and-drop/
Thu, 16 Jul 2020 02:39:40 +0000

In my last post, I introduced Data by Design’s notebook feature, and spoke a bit about the design and implementation of the chapter highlighting capability. Highlighting, though, is only the first step in this user story, and we wanted a drag-and-drop interface that felt simple and natural for moving highlights to the notebook. In this post, I want to outline how I integrated drag-and-drop throughout DxD’s UI.

Why Drag and Drop?

Although we all intuitively felt that we wanted drag-and-drop for this interface, I want to parse that here. Why does drag and drop feel natural and appropriate? When does it not? How do we make the experience as intuitive as possible?

At a basic level, drag-and-drop is intuitive because it maximizes the correlation between interaction and representation. At each moment in the drag, the user is given the immediate feedback of what their completed action would mean. Drag-and-drop interfaces are everywhere, so let’s illustrate with an example.

The Chrome tab system allows the user to freely reposition tabs using drag-and-drop. As the user drags, the view is instantly updated to reflect the new tab position, whether or not the user ultimately chooses that position. This reduces cognitive load, as the user doesn’t have to imagine anything: What You See is What You Get. This complete visual correlation between interaction and result also allows the user to make use of the representation to help make their decision. They can drag a tab back and forth and the tab flow updates automatically, allowing the user to test out different positionings.

To use terms from Don Norman’s design vocabulary, great drag-and-drop controls maximize mapping: the interface creates an effective correlation between interaction and result, reducing the need to learn or remember anything new. It’s important, then, to make clear exactly what will happen when the user completes the drag, to give them feedback along the way: in the browser tab example, feedback takes the form of the surrounding tabs’ movement.

Drag-and-drop interfaces don’t rely on on-screen controls, like buttons, drop-downs or textboxes. This is their strength, as it allows drag-and-drop to be as intuitive as point-and-click and turn-to-steer. But it can also present a problem: how does the user know that something is draggable? Chrome tabs change color on hover, but this signifier is rather abstract: it isn’t obvious that “brighter color” means “draggable,” and a user might not ever try. And so the first test for whether drag-and-drop is appropriate for a given interface is the question, “would I assume I might be able to drag it?”

It’s these things that I think about as I design and build the drag-and-drop notebook: creating an obvious mapping, giving appropriate feedback during the drag, and signifying that the drag is an option. It’s certainly a work-in-progress, as any initial, pre-user-research design must be, but it afforded my first foray into drag-and-drop on the web.

The Drag and Drop API

Before implementing anything, I had to learn the Web API for drag-and-drop. This essentially amounted to some articles on MDN, but it really isn’t the most intuitive API. I want to summarize the basic process in hopes that a step-by-step breakdown might be of use to someone, not the least to my future self!

Step 1: The draggable attribute

On your draggable element, you must set draggable="true". If you don’t, it won’t register your drag events. (By the way, you can set draggable="false" to turn off the browser’s default dragging functionality for that element, which normally allows the user to drag text, images, and links around and into other programs.)

Step 2: The dragstart event

When the user begins to drag the element, it will trigger any registered dragstart event. Within that event, you set the parameters of the drag, which include:

  • the drag image (the ghosted image that shows up to represent the element during the drag). You set this with event.dataTransfer.setDragImage(someElement, xOffset, yOffset).
  • the drag data, which is a series of labeled pieces of data. You attach these pieces of data to the event using event.dataTransfer.setData(key, data). This data then gets passed through to the event.dataTransfer object on the drop event—we’ll get to that.
    • In the examples on the docs, key is always a kind of data type, like “text/plain”. In practice, it’s simply a key for the data, and you can choose anything.
    • You can only send strings as that data. If you do need to send a full object, you can stringify it or send each property as its own setData call.
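A dragstart handler following these steps might look like the sketch below. The key names and the `highlight` payload are illustrative; the only API facts relied on are that `dataTransfer.setData(key, data)` takes string data, which is why the object is run through `JSON.stringify`.

```javascript
// A sketch of a dragstart handler: attach the highlight's text plus a
// JSON-stringified copy of the full object (dataTransfer only carries strings).
function onDragStart(event, highlight) {
  event.dataTransfer.setData("text/plain", highlight.text);
  event.dataTransfer.setData("application/json", JSON.stringify(highlight));
}
```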

Step 3: The dragenter and dragover events

Yep, there are two of these that you need to keep straight. Registering dragenter on an element will trigger that event when a drag enters the bounds of the element, while dragover triggers as long as the user is dragging within that element, and it will trigger repeatedly during that time. (If you want to trigger an event as long as the user drags a draggable element anywhere, you can use the drag event on the draggable element. These latter two events sounded pretty useless to me, but perhaps they can come in handy if you want reactions on precise locations within your drop elements.)

Here’s where the docs get confusing. According to this, you need to register and cancel both the dragenter and dragover events to designate an element as a drop target; according to this, you only need dragover (along with the drop event). I’ve been able to make things work using just the dragover event, but I add and cancel dragenter just to make sure: maybe it’s a documentation mistake, or maybe it’s a platform-specific gotcha.

What do I mean by “cancel”? In order for the drop to register, you have to cancel the event by calling event.preventDefault() within the event listeners. In our case, I fulfill this by having a bunch of mostly-empty event listeners that just call event.preventDefault(), but you could use control statements to only call event.preventDefault() when a condition on the drag element or the drag data (both of which you can access through the listener’s event argument) are met.
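The "cancel to accept" pattern described above can be sketched like this, with the conditional variant included; the key name is the same illustrative one used earlier, and `dataTransfer.types` and `preventDefault()` are the standard API surface:

```javascript
// A sketch of a dragover listener whose only job is to cancel the event,
// optionally behind a condition on the drag data, marking the element as
// a valid drop target.
function onDragOver(event) {
  if (event.dataTransfer.types.includes("application/json")) {
    event.preventDefault(); // accept only drags carrying our payload
  }
}
```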

You can also set the event’s drop effect within these listeners, which is supposed to give visual feedback by changing the cursor based on three available types of drag actions, but frankly I haven’t gotten it to work, and I find it easier to change the cursor by manipulating CSS. For now, I’m happy with the default:

Step 4: The drop event

They saved the most straightforward for last. Register a drop event listener on the target element and it will be called when the user has dropped the drag element on the target element. Within this listener you can access the data we set earlier by calling event.dataTransfer.getData(key). You can get an array of all the set keys with event.dataTransfer.types.

The docs tell you to call event.preventDefault() at the end of the event so it won’t call the browser’s default drop handler; I haven’t had an issue with this, but I can only assume it’s a good idea. I also call event.stopPropagation() on a parent so as not to trigger the drop event on its children.
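Putting the drop step together, a handler might look like this sketch, which reads back the data set at dragstart (the key name is the illustrative one from earlier):

```javascript
// A sketch of a drop handler: cancel default handling, stop the event from
// also firing ancestors' drop listeners, then parse the payload back out.
function onDrop(event) {
  event.preventDefault();
  event.stopPropagation();
  return JSON.parse(event.dataTransfer.getData("application/json"));
}
```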

There’s also a dragend event called on the dragging element, but I’d suggest keeping all of your drop handling in one place.

Signifiers and Feedback: It shouldn’t be a drag to drag

Terrible pun aside, let’s talk about the basic visual feedback in this initial design, and how I applied the API with design principles in mind.

To signify that a highlight is draggable, I’ve set the cursor to grab:

For highlights that span more than one paragraph, I dynamically generate a drag image that contains all of the dragging elements, rather than just the one the user clicked on, to accurately represent what data is being transferred.

 

As for visualizations, in this design they all have this little dragger in the corner, which serves both as an indicator and as a handle for the drag. (We don’t want to make visualizations draggable from anywhere, as many of them have internal mouse events.)

As you can see in this gif, the entire notebook is registered as a drop handler; by default, it puts what you dragged at the bottom of the notebook. But there are also individual drop handlers before and after each node in the notebook. As you drag over one, it changes color, indicating your target position:

As I’m writing this, I’m still not sure whether these indicators are tall enough, and it might be better not to necessitate dragging on to them at all, and have a “snap” feature when you’re close enough. It’s a continual process of tweaking, based on my own experience using the design and that of my teammates and, eventually, our target users. For me, it isn’t tedious: it’s part of what makes web development dynamic and enjoyable.

The Data by Design Notebook: Highlighting https://dhlab.lmc.gatech.edu/databydesign/the-data-by-design-notebook-highlighting/ https://dhlab.lmc.gatech.edu/databydesign/the-data-by-design-notebook-highlighting/#comments Sun, 07 Jun 2020 16:23:10 +0000 https://dhlab.lmc.gatech.edu/?p=1062 From the start, we knew that Data by Design was going to have a powerful dynamic notebook feature that would not only allow the reader to take notes on a chapter, but also to add any part of that chapter into their notebook. Although it may sound like a self-contained feature, the notebook functionality involves multiple subsystems:

  • A highlight system (to allow the user to select and save text, and to serialize the location of that highlight for later)
  • A drag-and-drop system (to allow the user to drag their highlights and other visualizations from the chapter to the notebook, and then to rearrange their notes)
  • A visualization state system (to keep track of the relationships between the visualizations and to manage the “copying” of the visualization when dragged into the notebook)
  • A user authentication system (to allow the user to create an account and to store and retrieve their notebook to and from the server)
  • The notebook user interface (to allow the user to add text notes and edit their notebook)

In this blog series, I’ll be describing a bit of the development process for each of those subsystems, starting in this post with the highlight system.

Note: The screen captures in this post use an early development version of both the chapter text and the notebook feature.

As mentioned, the goal of the highlight system is to allow the user to make visual highlights in the text. Those highlights can then be dragged into the notebook. Along with the content of the highlights, the precise location of the highlight is stored so that the user can click on the highlight in the notebook to go to its origin. Additionally, enough data must be stored about the highlight to repopulate it later, because we want to put all the highlights back in place when a user logs in.

This highlight feature was one of the most challenging and rewarding front-end features I’ve ever built. Let’s dive into some of the challenges that came up along the way!

Challenge #1: It isn’t enough to just change the color

It isn’t enough to just change the color of the highlighted text; we have to treat a highlight as its own object. Imagine the following scenario: a user drags from point A to point B to create a highlight to drag that text into the notebook. Later, they drag from point B to point C. The user would expect to be able to drag this new highlight independently, but if we only kept track of the visual indication of the highlights, we wouldn’t be able to recognize it as its own highlight. The chief purpose of this highlighting functionality is to enable the user to select text and then drag that text to add it to the notebook, and it’s often the case to want to highlight two consecutive lines but comment on them separately.

While we must maintain the distinction between separate but juxtaposed highlights, we do want to allow the user to “edit” their highlight: to naturally expand a highlight by clicking and dragging from outside an existing highlight to inside it to select more text, or to subsume a highlight with a bigger one. In other words, we must differentiate between selections that represent a new highlight and those that represent a change to an old one.
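That decision can be sketched with highlights reduced to character offsets (a simplification I’m making here for illustration; the real system works with DOM selections): a selection that strictly overlaps an existing highlight extends it, while one that merely touches a boundary becomes a new highlight.

```javascript
// Sketch: decide whether a selection edits an existing highlight or
// creates a new one. Highlights and selections are {start, end} offsets.
function mergeOrCreate(highlights, sel) {
  for (const h of highlights) {
    if (sel.start < h.end && sel.end > h.start) { // strict overlap: edit it
      h.start = Math.min(h.start, sel.start);
      h.end = Math.max(h.end, sel.end);
      return h;
    }
  }
  const fresh = { start: sel.start, end: sel.end }; // merely adjacent: new one
  highlights.push(fresh);
  return fresh;
}
```

This is why the A–B then B–C scenario yields two highlights: the second selection only touches the first at point B, so it never strictly overlaps it.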

Challenge #2: We dynamically create, remove, and consolidate elements

In web development, there is no easy way to change the styling and functionality of part of an element; generally, you create a new element that contains that part, and then manipulate that new element. In our case, when the user highlights part of a paragraph, we need to create a new span that contains what they highlight.

This can get tricky. What happens when the user selects from one paragraph into another? We can’t create a child element across two parents! Instead, we analyze the user’s selection, and if it bleeds into another block element, we create another span to place there. But since both spans belong to the highlight, and we want them to be dragged together, we must keep track of their relationship. Additionally, when the user starts in one paragraph, drags over another and into a third, we have to turn the entire middle paragraph into a span, and connect it to both the span in the previous element and the span that comes after.

Another challenge: what if the user highlights text that starts in a non-styled part of the paragraph, but ends in a styled or interactive one, like a link? We must make sure that the new highlight span keeps the styles and attributes of the correct portion of the highlighted text.

Luckily, the browser provides the Range API, which helps extract the contents of a user’s highlight, even when it starts and stops in incongruous positions. By analyzing, manipulating, and reinjecting those elements, I was able to create a consistent solution.

Challenge #3: Serialization

When the user logs in, we want to populate the page with their previous highlights. To enable this, I had to find a way to represent a highlight as a string that we could save and load from the server. Once I built the procedure to turn a selection range into a highlight, I could use that to repopulate the highlights as long as I could save that selection range—a start position and end position on the page—to the server. What this essentially amounts to is taking an element and generating a unique selector string that can be saved and used to find that element.

It turns out that generating concise selector strings is not a trivial problem, with dozens of solutions and libraries existing online. Since I joined the team, one of our goals for Data by Design has been to keep the number of dependencies low, and most of these libraries would be various degrees of overkill: while these selector-generation libraries create efficient general-use CSS selection strings compatible with built-in functions, I didn’t care about the format of the strings, since I was going to use my own deserialization function.

Ultimately, I opted for a simple solution that stores the path from the closest ancestor with an id—given that our element ids are unique—as well as the offset within that element. This solution might struggle if we expect the chapters to change after publication, but we can make the system more resilient to structural changes by adding ids to every low-level paragraph. (A system resilient to textual changes could save the actual HTML of the highlight and do a best fit when repopulating.)
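The idea can be sketched like this; the property names follow the DOM, but the string format and function name are illustrative, since we also own the deserialization function:

```javascript
// Sketch: serialize a node's location as the id of its nearest
// identified ancestor plus the child index taken at each step down.
function serializeNode(node) {
  const path = [];
  while (node && !node.id) {
    const siblings = node.parentElement.children;
    path.unshift(Array.prototype.indexOf.call(siblings, node));
    node = node.parentElement;
  }
  return node ? node.id + ':' + path.join('/') : null;
}
// The character offsets of the highlight's start and end within the
// resolved element would be stored alongside this string.
```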

Challenge #4: Grouping and Dragging

As mentioned earlier, multiple highlight span elements may make up a given highlight. That means that the user should be able to start dragging from any of them and expect that the whole group will come along. We thus need to store the relationship between the elements. (Recall the AB, BC example: we can’t simply check the DOM to see if there’s a highlight span directly after the dragged one, as that span might be part of a different highlight sequence.)

One way to do this is to make the Highlightable mixin (the object that handles all of this highlight creation logic) stateful: storing a reference to each highlight span, keeping track of their relationships, and providing lookup functionality to grab a highlight span’s buddies. I felt that the simpler solution was to put a little bit of state on the span elements themselves; namely, to add “overflow-next” and “overflow-prev” classes to signify when an element comes with a friend. Then, no matter the starting point, the drag event handler traverses and collects the elements. More on the drag-and-drop sequence in the next blog post!
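A simplified sketch of that traversal, assuming the spans are direct siblings (the real code also hops across paragraph boundaries, and the function name is illustrative):

```javascript
// Sketch: collect every span in a highlight group by following the
// overflow-prev / overflow-next class markers from any starting span.
function collectHighlightGroup(span) {
  const group = [span];
  let cur = span;
  while (cur.classList.contains('overflow-prev')) {
    cur = cur.previousElementSibling; // simplified: real code crosses paragraphs
    group.unshift(cur);
  }
  cur = span;
  while (cur.classList.contains('overflow-next')) {
    cur = cur.nextElementSibling;
    group.push(cur);
  }
  return group;
}
```

Whichever span the drag starts from, the handler ends up with the same ordered group.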

Challenge #5: Non-highlightable elements

I saved the hardest for last. The multimedia chapters of Data by Design have more than just text; embedded images and interactive components can be dragged into the notebook, but it doesn’t make sense to make them highlightable in the same way that text is. Given that, what should the highlighting experience be like when the user encounters these elements? To maintain a non-obtrusive experience, I allow the user to select over these elements, but when they let go of the drag, any element that is not highlightable will not be highlighted.

This involves an array of allowed “highlightable” block element types (which can be configured by each chapter), as well as functionality to examine an element’s closest block parent. One of my first attempts involved removing the entire selection from the DOM and returning each of its block children to the DOM as either unchanged or as a highlight span element. However, my best solution involved figuring out the highlightable portions and only extracting and modifying those.

After the user finishes selecting, we break the range down into an array of smaller, highlightable subranges. That array is then iterated over, creating the highlight span elements; the non-highlightable components aren’t added to the array and are left untouched.
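The splitting step can be sketched as follows, with the selection reduced to the list of block elements it touches (the allow-list contents and function name are illustrative; the real configuration is per-chapter):

```javascript
// Sketch: keep only the runs of allowed blocks; each run becomes its
// own highlightable subrange, and anything else splits the selection.
const HIGHLIGHTABLE = ['P', 'BLOCKQUOTE', 'LI']; // illustrative allow-list

function highlightableRuns(blocks) {
  const runs = [];
  let current = null;
  for (const block of blocks) {
    if (HIGHLIGHTABLE.includes(block.tagName)) {
      if (!current) runs.push(current = []);
      current.push(block);
    } else {
      current = null; // a non-highlightable block splits the selection
    }
  }
  return runs;
}
```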


That’s it for the highlight feature! While it’s tailor-made for Data by Design, it provides a feature that I think a lot of digital humanities projects could benefit from. I wrote (and rewrote and rewrote) more code for this feature than for any other part of this project, and I learned a lot about the ins-and-outs of the DOM and some of its less well-known APIs. As an undergrad, it felt amazing to be entrusted with a problem as complex is this one, and I’m proud of what I came up with.

Data by Design Chapter Navline Implementation https://dhlab.lmc.gatech.edu/databydesign/data-by-design-chapter-navline-implementation/ https://dhlab.lmc.gatech.edu/databydesign/data-by-design-chapter-navline-implementation/#respond Fri, 29 May 2020 18:45:34 +0000 https://dhlab.lmc.gatech.edu/?p=1031 I’ve been working on the implementation of the chapter navigation line.

Here are its main features:

  • Blocks/nodes with different colors represent different types of visualization; their position on the navline is determined by their position in the chapter.
  • As the user scrolls through the chapter, their reading progress is represented by the colored portion of the navline.
  • The user’s current location is indicated by a separate block/node on the navline.
  • The user’s highlights (linked to the notebook) are represented by a separate column of blocks/nodes (Peabody and Du Bois) or by the background (Playfair). The transparency of the blocks/nodes/background color is determined by the number of highlights attached to the subsection.
  • The user can hover over any block/node and jump to the corresponding visualization in the chapter by clicking.

 

Peabody Chapter

Du Bois Chapter

Playfair Chapter

 

Floor Chart Topper https://dhlab.lmc.gatech.edu/floorchart/floor-chart-topper/ https://dhlab.lmc.gatech.edu/floorchart/floor-chart-topper/#respond Tue, 19 Mar 2019 13:19:33 +0000 https://dhlab.lmc.gatech.edu/?p=904 Before we forget to post, some photos of the floor chart topper! (Designed and sewn by Sarah Schoemann).

The code that runs behind FloorChart https://dhlab.lmc.gatech.edu/floorchart/the-code-that-runs-behind-floorchart/ https://dhlab.lmc.gatech.edu/floorchart/the-code-that-runs-behind-floorchart/#respond Thu, 07 Mar 2019 22:50:57 +0000 https://dhlab.lmc.gatech.edu/?p=894 Floor Chart Code

The Floor Chart is a 30×30 grid of buttons and a matching 30×30 grid of LEDs, with each button underneath an LED. At a high level, the floor chart code takes input from the buttons and translates it into color changes in the LEDs. The ultimate goal is that pressing a particular button cycles the corresponding LED through a set of colors.

The 30×30 grid of buttons is made of 30 vertical metal strips overlaid on 30 horizontal strips. Pressing a particular spot on the grid connects one vertical strip with one horizontal strip, completing a circuit. These connections can be read as inputs on the Arduino’s digital pins; likewise, the LEDs can be controlled through digital pins, turning them on and off and changing their colors. Each Arduino has only about 50 digital pins, so a single board cannot hook up all 60 metal strips plus the 30 LED strips. For this reason, the 60 metal strips are split in half: one Arduino reads 30 horizontal strips and 15 vertical strips, covering 30 × 15 = 450 of the 900 total buttons. So there are two Arduinos reading input from the buttons.

Once these two Arduinos have the button input, they need a way to relay that information to the corresponding LEDs. They do this by sending the information to a third Arduino, which relays it to the matching LED on one of the 30 LED strips.

The Details

The code that does what was described above is broken down into three parts: the code that controls the color of the LEDs (the Master code), the code that reads button presses on two separate Arduinos (the Minion code), and the code that supports reading button presses by detecting changes in the current through the strips.

master.ino

The Master code runs on the Arduino that changes the color of the LEDs. It is called Master because it is responsible for all communication between the Arduinos as well as for changing the color of the appropriate LEDs. In order to change the color of specific LEDs, the Master must first get information about button presses. It does so by setting up an I2C connection with the two Minion Arduinos (the Arduinos that read button input) and polling each Minion at intervals to see whether it has received any button input. If a Minion has observed button input, it tells the Master by sending a payload over I2C containing the Minion’s address, the row and column of the metal strips that changed state, and the type of state change, such as pressed or released (discussed further in the Minion section). If there is no button input, the Minion sends an empty payload.

The row and column of the button, along with the type of state change, can then be used to update the colors of the LEDs. To update the LEDs, we must keep track of the state of every pixel. The state of every LED is stored in an array with a slot for each pixel. Each element of the array is an unsigned 32-bit integer: the most significant 24 bits store the time of the last interaction with the LED (which supports long presses and other time-sensitive button inputs), while the least significant 8 bits store the state of the pixel.
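The packing works the same in any C-like language; here is a JavaScript sketch of the layout (the actual Arduino code does this on a uint32_t, and these helper names are illustrative):

```javascript
// Sketch of the pixel-state packing: top 24 bits hold the timestamp of
// the last interaction, bottom 8 bits hold the pixel's current state.
function packPixel(timeMs, state) {
  return (((timeMs & 0xffffff) << 8) | (state & 0xff)) >>> 0; // >>> 0 keeps it unsigned
}
function pixelTime(packed) {
  return packed >>> 8;      // drop the 8 state bits
}
function pixelState(packed) {
  return packed & 0xff;     // mask off the 24 time bits
}
```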

When it receives a button input, the Master looks up the current state of that pixel and takes the appropriate action based on the current state and the input. For example, if the pixel is currently green (say, state 2) and its button gets pressed, it might be the case that the pixel should change to red (say, state 3). The Master would look up the pixel’s state, find 2, and change the pixel to red based on that state and the button press. Once the pixel is changed physically, its state is updated in the pixel state array.

This process of polling the Minions for button input and then updating the state of the LEDs, both physically and in the state array, is repeated on an interval, in this case every 100ms.

masterTest.ino

This code is used to test that all of the LEDs are connected correctly. This code will light up each of the LEDs one at a time, first red then blue and repeat.

minion1.ino/minion2.ino

As stated above, the Minion code is responsible for reading button changes and sending them to the Master. Each Minion takes on half of the button grid. A Minion reads information from its digital pins by creating a Keypad (a set of code that represents the vertical and horizontal strips of metal making up the button grid and allows the state of each button to be read) and by repeatedly asking the Keypad to update its state.

When starting, each Minion joins the I2C bus with the Master and registers a function to be called when the Master wants information about button state changes. This function grabs a button change from the Keypad, containing the row, column, and type of change. If there has been no button change since the Minion last checked, the Keypad returns a button change with all zeros for values. The button change is then packed into a payload with the address of the Minion (used by the Master to adjust rows and columns depending on which half of the inputs the Minion reads), the row and column of the change, and the type of button state change (press or release).
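As an illustrative stand-in for that payload (the real byte layout lives in minion1.ino/minion2.ino; this JavaScript sketch just shows the shape of what crosses the I2C bus):

```javascript
// Sketch: the Minion's payload packed into four bytes:
// its bus address, the row and column of the change, and the change type.
function packPayload(address, row, col, change) {
  return Uint8Array.of(address, row, col, change === 'pressed' ? 1 : 0);
}

function unpackPayload(bytes) {
  const [address, row, col, change] = bytes;
  return { address, row, col, change: change ? 'pressed' : 'released' };
}
```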

The Minion continually requests that the Keypad update the state of the input grid and only stops to send button state changes when requested by the Master.

Keypad.cpp/Keypad.h

Keypad is a data structure that supports the Minions in reading the state of the input grid (the buttons). It stores the row and column digital pin assignments as well as the overall size of the grid, and it provides a method, getKeys(), for the Minions to call that updates the state of each button.

The update method can be called at most every 10ms. If at least 10ms have passed since the last call, it scans the grid to check for any vertical strips touching any horizontal strips. It does this by sending a current through each vertical strip, one at a time, and reading the state of each horizontal strip to see whether it is connected to the energized vertical strip. This yields the current state of the grid.

Once the current state of the grid has been updated, the code compares it to the previous state, and any change is added to a buffer. The buffer is needed because the Master may request button updates more slowly than the Minion can read button inputs: the Minion might find two button changes in the time it takes the Master to request one, and without a buffer the Master would only get one of them. Once the button change is added to the buffer, the previous state is set to the current state and the process of updating the state of the buttons can start again.
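The diff-and-buffer step can be sketched like this (a JavaScript stand-in for what the Keypad code does on the Arduino; the grids are 0/1 matrices and the function name is illustrative):

```javascript
// Sketch: compare the freshly scanned grid to the previous one, buffer
// every press/release, and make the new scan the baseline for next time.
function diffIntoBuffer(prev, current, buffer) {
  for (let row = 0; row < current.length; row++) {
    for (let col = 0; col < current[row].length; col++) {
      if (prev[row][col] !== current[row][col]) {
        buffer.push({ row, col, type: current[row][col] ? 'pressed' : 'released' });
      }
      prev[row][col] = current[row][col]; // new state becomes the baseline
    }
  }
}
```

Because every change lands in the buffer, nothing is lost even if the Master polls less often than the Minion scans.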

When the Minion requests a button change event, Keypad simply pops the first one off the buffer it has. This change event is the same one that the Minion then sends to the Master.

Overall

Overall, the Minions keep updating the current state of the grid and finding button changes. The Master communicates with the Minions to find out about those changes, and once it has one, it updates the state of the LEDs as appropriate. This process can be seen as drawn below.
