Data visualization is, itself, data

I was reminded tonight how important it is to communicate data in a form that supports its function. Koop and I were poking around some WordPress.com stats over coffee and I pulled up our data for pageviews from iPads, which led to the following.

A note about UTC time, which we use for internal stats: peak hours in the US, with which our data tends to be highly correlated, are roughly 13:00-02:00 UTC.

First I looked at the daily chart of pageviews from iPads.

I noticed that it didn’t look like most pageview charts, which typically show long peaks on weekdays and short valleys on weekends, like our aggregate pageview data for WordPress.com.

If you look at the main pageview stat by hour, you see that there are spikes basically when people are at work — during the day in the US, Monday through Friday. This isn’t really news, and it’s very, very common across most websites.

iPad pageviews, on the other hand, look totally different on an hourly basis.

There are two important things to notice here:

  1. Weekends spike, not weekdays
  2. Intra-week differences disappear almost entirely after 21:00 (1pm PST/4pm EST)

The explanation for this is actually quite simple — iPads are primarily used outside of work, which is where people tend to be on the weekends and at night. If you were to translate the last chart into a story, it would basically be this:

On weekends, people wake up and use their iPads throughout the day, well into the night, but on weekdays the iPads are stuck at home alone while their owners are at work, and thus dormant[1].

I don’t think this is a particularly important revelation (maybe we should promote iPad stuff on the weekends?) but I do think it’s a cool example of how showing the same data in a different form (line chart vs hourly grid) tells a different and much more useful story. Also interesting is that the use of the hourly grid here is probably not what most people assume it’s good for, which is seeing data on a really granular level. It’s actually the near-exact opposite: it’s the best way to view this data on an aggregate level.
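To make the two views concrete, here is a minimal sketch of how the same hourly counts can be rolled up either way, assuming they are available as (timestamp, count) pairs with timezone-aware timestamps. The function names and the input shape are hypothetical, chosen for illustration; this is not the code behind the charts above.

```python
from collections import defaultdict
from datetime import timezone


def hourly_grid(samples):
    """Sum (timestamp, count) samples into a 7 x 24 grid.

    Rows are days of the week (0 = Monday), columns are hours of the
    day in UTC. Every Saturday 22:00 sample lands in the same cell,
    so the grid is an aggregate view, not a granular one.
    """
    grid = [[0] * 24 for _ in range(7)]
    for ts, count in samples:
        ts = ts.astimezone(timezone.utc)  # assumes timezone-aware timestamps
        grid[ts.weekday()][ts.hour] += count
    return grid


def daily_totals(samples):
    """Collapse the same samples to one total per UTC calendar day,
    which is what a daily line chart plots."""
    totals = defaultdict(int)
    for ts, count in samples:
        totals[ts.astimezone(timezone.utc).date()] += count
    return dict(totals)
```

The daily totals keep every individual day but throw away the hour, while the grid keeps the hour and folds all the weeks together, which is why the weekday versus weekend pattern only shows up in the second form.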

[1] Though the modifier is dangling, I meant the iPads were dormant, not the people — but I suppose from our overall pageview stats, that may not be completely true.

8 thoughts on “Data visualization is, itself, data”

  1. Egill R. Erlendsson

    Great post, it is a revelation for someone like me that doesn’t own an iPad. I wonder if mobile devices show a different usage pattern?

    By the way, merging those two heatmaps together leaves us with a gap from 05-11 that we need to fill :) Maybe we should start working on WordPress powered dreams, using Akismet as a nightmare filter!

  2. Pingback: iPad Stats Visualization « Barry on WordPress

  3. Nikolay

    The goal of graphs and charts isn’t to lead to revelations, but to make a particular revelation both beautiful and easier to grasp for the uneducated viewer.

    Revelations are most of the time either statistical deviations or statistical regularities. It is both easier and more consistent to find those revelations through statistical methods than by looking at charts.

  4. Egill R. Erlendsson

    @Nikolay, graphs can somewhat lead to a revelation – when they help you see something you weren’t actually looking for to begin with :)

    Perhaps eye-opening might be a better term for it?

  5. evan Post author

    Nikolay, I disagree that the purpose of a chart (or any visualization) ought to skew toward the uninformed or uneducated reader. I think you’re missing the ability for the presentation of data to add context to it, and make it more useful. Comments and style guidelines make it easier for the uneducated viewer to understand code, but for the educated reader they are even more valuable.

    Callum, it’s an internal tool at Automattic called MC. I don’t think we’ve open sourced any of it yet, but I hope we will one day. The stats come from an internal API which is pretty simple: basically it just increments a counter for a category/value (actually two counters: one for hourly data, which is rotated to keep one month of data, and one for daily data, which is saved forever). We can call it from anywhere in the codebase to measure arbitrary data points, like pageviews from a specific device, posts from a specific A/B test, etc.

  6. Nikolay

    “Nikolay, I disagree that the purpose of a chart (or any visualization) ought to skew toward the uninformed or uneducated reader. I think you’re missing the ability for the presentation of data to add context to it, and make it more useful.”

    Evan, I agree a chart can add more context and make the data more useful. This falls under “beautiful” in my statement. Don’t get me wrong, I love visualizations. There is rarely a better way to communicate data and its context, either to no-less-educated peers or to a pack of lemmings.

    “@Nikolay, graphs can somewhat lead to a revelation – when they help you see something you weren’t actually looking for to begin with :)”

    Egill, they can, but it shouldn’t be their goal. It’s much faster, easier and more precise to use the power of numbers to find statistical oddities. If you had a revelation because of a chart, you didn’t do your statistics right.

  7. Pingback: Tools for Growth at WordPress.com | Pete's blog

Comments are closed.