The first 100,000 available bytes of data from each website’s database are extracted and turned into a bubble visualization, with the number of occurrences of each word (and theme) driving the size of its bubble. The final bubble maps show a garble of letters and occasionally words; they may seem nonsensical, but small patterns emerge. You can tell that, beyond the daily grind of promoting the news site, top headlines included the Zimmerman trial, the Egyptian revolution, Snowden, and the Boeing 777 crash. The words stored in the databases have no spatial tag, but comparing the “maps” of data shows the varying ways in which news sites structure their content. One is driven by time stamps; a few are driven by self-branding, with the names of their sites appearing most often; and only a few have news-relevant words catalogued within the set pulled. Content structures look very different from site to site; compare the NY Times to Salon to Fox News.
It would be difficult to review this data without a human reader, since so many of the letters and word fragments made sense to me only because of the context.
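For anyone who wants to try the counting step themselves, here is a minimal sketch in Python. It is not the algorithm I used, just one plausible way to go from raw bytes to bubble sizes. The function name `bubble_sizes`, the rule of keeping only runs of three or more letters, and the choice to make bubble area (rather than radius) proportional to the word count are all my assumptions for illustration.

```python
import math
import re
from collections import Counter

BYTE_LIMIT = 100_000  # cutoff from the post: first 100,000 bytes


def bubble_sizes(raw: bytes, limit: int = BYTE_LIMIT, top: int = 20):
    """Return (word, count, radius) tuples for the most frequent words.

    Hypothetical helper: the tokenizing rule and the radius scaling
    are assumptions, not the method behind the original bubble maps.
    """
    # Keep only the first `limit` bytes and skip anything undecodable.
    text = raw[:limit].decode("utf-8", errors="ignore").lower()
    # Assumed tokenizer: runs of three or more ASCII letters.
    words = re.findall(r"[a-z]{3,}", text)
    counts = Counter(words)
    # Area proportional to count, so radius grows with sqrt(count).
    return [(w, c, math.sqrt(c)) for w, c in counts.most_common(top)]


sample = b"Snowden news Snowden trial news Snowden"
for word, count, radius in bubble_sizes(sample):
    print(word, count, round(radius, 2))
```

Scaling the radius by the square root of the count keeps a word that appears four times from looking sixteen times bigger than a word that appears once.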
Websites reviewed on July 6, 2013:
- google news –> server blocked
- nytimes.com –> server blocked; http://www.nytimes.com/pages/national/index.html used instead
Resulting bubble maps:
I thought this was a neat way of using a kind of self-mapped data to compare things that would normally be difficult to compare. The next step would be to work with the algorithm, tweaking how things are grouped, labelled and colored, and to investigate further how data is stored on similar websites.