Under the Hood of Data-Driven News Sites

Here’s a fun visualization I made for a cartography course at Johns Hopkins (my MS-GIS alma mater). I “mapped” data flowing from different news sites using bubble visualizations generated by InfoCaptor and built on D3, a JavaScript library for data-driven documents. Visualizing the data that emerges from the landing page of a modern news site could give a peek under the hood of the website and at the goals of its information architecture. Being an SEO and info architecture nerd myself, I was curious to see what I’d find, especially on high-powered, fast-moving news sites.

The first (available) 100,000 bytes of data from each website’s database are extracted and turned into a bubble visualization, with the number of occurrences of different words (and themes) driving the size of the bubbles. The final bubble maps show a garble of letters and occasionally words; it may seem nonsensical, but small patterns emerge. You can tell that, beyond the daily grind of promoting the news site, top headlines included the Zimmerman trial, the Egyptian revolution, Snowden, and the Boeing 777 crash. The words stored in the databases have no spatial tag, but comparing the “maps” of data shows the varying ways in which news sites structure their sites. One is driven by time stamps; a few are driven by self-branding, with the names of their sites appearing most often; and only a few have news-relevant words catalogued within the set pulled. The content structures of the sites look very different; compare the NY Times to Salon to Fox News.
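To make the idea concrete, here is a minimal Python sketch of the counting step described above. It is not the tool's actual algorithm: it assumes the raw bytes have already been fetched, that a "word" is any run of three or more letters, and that bubble area (not radius) should scale with the count. The function name `bubble_sizes` and its parameters are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

BYTE_LIMIT = 100_000  # the "first 100,000 bytes" mentioned in the post


def bubble_sizes(raw: bytes, top_n: int = 50, limit: int = BYTE_LIMIT):
    """Count words in the first `limit` bytes and derive a bubble radius.

    Assumptions (not from the original tool): words are runs of 3+ ASCII
    letters, and the radius is sqrt(count) so bubble *area* tracks the count.
    """
    # Keep only the leading slice; ignore bytes that do not decode cleanly.
    text = raw[:limit].decode("utf-8", errors="ignore").lower()
    words = re.findall(r"[a-z]{3,}", text)
    counts = Counter(words)
    return [
        {"word": word, "count": count, "radius": math.sqrt(count)}
        for word, count in counts.most_common(top_n)
    ]


# Usage with a tiny made-up sample instead of a real site dump.
sample = b"news news news trial trial egypt"
for bubble in bubble_sizes(sample, top_n=3):
    print(bubble)
```

Run against a real page dump, a list like this would show right away whether a site's leading bytes are dominated by its own name, by time stamps, or by actual news terms.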

It would be difficult to review this data without a human, since so many of the letter strings and word fragments made sense to me only because I knew the context.

Websites reviewed on July 6, 2013:

Resulting bubble maps:


I thought this was a neat way of using a kind of self-mapped data to compare things that would normally be difficult to compare. The next step would be to work with the algorithm to tweak how things are grouped, labelled and colored, and to investigate further how data is stored on similar websites.