Using Tag Clouds to Visualize Revision

You probably know this: for folks in many humanities departments at large research schools like UIUC, a published academic monograph is a key part of getting tenure. That book is often derived in part from the dissertation, but there's a culture in place where, no matter how great the dissertation is, one generally has to demonstrate how much the book differs from it. This demonstration is usually done in prose.

Enter the tag cloud. Dump the text from both documents into an online clouder like tagcrowd.com, and compare results.

Here's the cloud from my dissertation, completed in 2005:

[Tag cloud image: dissertation, created at TagCrowd.com]

And then here's the cloud of the book manuscript, circa June 2009:

[Tag cloud image: book manuscript, created at TagCrowd.com]

As you can see, some themes have stayed the same; others have disappeared or been added. For me, knowing these two documents so well, the tag clouds are detailed maps of the writing and rewriting. Each term is a pathway into so many decisions, dead ends, and productive avenues.
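If you'd rather see that comparison as lists than eyeball two images, here is a minimal Python sketch of the same idea. It is only an approximation of what a clouder does, not TagCrowd's actual method: the stopword list, the function names `top_terms` and `compare_clouds`, and the cutoff of 50 terms are all my own assumptions for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; a real clouder uses a longer one.
STOPWORDS = {
    "the", "a", "an", "and", "of", "to", "in", "is", "that",
    "for", "on", "with", "as", "it", "this", "are", "was",
}


def top_terms(text, n=50):
    """Return the set of the n most frequent terms, ignoring stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return {term for term, _ in counts.most_common(n)}


def compare_clouds(old_text, new_text, n=50):
    """Sort the top terms of two drafts into kept, dropped, and added."""
    old_terms = top_terms(old_text, n)
    new_terms = top_terms(new_text, n)
    return {
        "kept": sorted(old_terms & new_terms),
        "dropped": sorted(old_terms - new_terms),
        "added": sorted(new_terms - old_terms),
    }


if __name__ == "__main__":
    # Placeholder strings; in practice, read the two drafts from files.
    old_draft = "literacy literacy school school reading"
    new_draft = "literacy literacy online online reading"
    for label, terms in compare_clouds(old_draft, new_draft).items():
        print(label, terms)
```

Like the clouds themselves, this only counts words, so it shows which terms carried over between drafts and says nothing about how they are used.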

Next time I teach revision, I think I'll have students cloud their drafts.


  1. Glad to see this. I've gained a lot of insight (learned much, that is) from clouding practices similar to the ones you've described here, and I've tried in my dissertation to articulate some of the ways tag clouds might make a formative contribution to "network sense." During the most recent hiring cycle, one of my job talks was titled "Cloud Composing" (on cloud-making practices for research and teaching, basically), and in a post-talk Q&A the conversation turned toward how nifty it would be to use the tag cloud as an abstract writing prompt of sorts. Something along the lines of, "Write the document that would render into this cloud." Something like this hints at the usefulness of clouds for invention, too.

  2. total genius. i hope you're planning to include this as part of your dossier.

  3. Very cool, Derek. Since posting this, I've been thinking a bit about what tag clouds speak to, in terms of composition and language, and what they miss.

    Of course, what they reveal is lexical prevalence ... or content-word prevalence. Because of this, how much do they leave out?

    Clearly, aspects of form, structure, and organization are unclouded. Grammatical and non-grammatical writing tag identically, and important features such as code switching are invisible to the clouder.

    But I still love what the clouds reveal. In the cloud of my book, for instance, "http" shows up, indicating a move to many more online sources. "York" appears in both clouds, because I included my bibliographies in the data pools. New York, we know from the clouds, is still such a hub of publishing.