
Practical tools for exploratory visualization

Carson Sievert

May 2nd, 2017

Slides: https://bit.ly/plotcon17-talk

Twitter: @cpsievert
GitHub: @cpsievert
Email: cpsievert1@gmail.com
Web: http://cpsievert.github.io/

1 / 22

Data Science Workflow

2 / 22

I love this diagram from the R for Data Science book.

Concisely captures the main components.

Expository vis

plotly.js is awesome for expository/scientific vis!

3 / 22

The web has become the preferred medium for communicating results.

Once you know what you want to show, plotly.js is a great choice!!

Exploratory vis

  • Data scientists have to juggle many technologies (R, Python, JavaScript)
4 / 22

JavaScript lacks tools for iteration (necessary for exploration/discovery!)

It is all too easy for statistical thinking to be swamped by programming tasks.

Quote from Brian D. Ripley

5 / 22

So, this is me, in my 2nd year of grad school, deciding to learn D3 & JavaScript.

It took me 6+ months to implement a single interactive visualization.

And let me tell you, you guys, no joke, believe me, I arose from the swamp, and decided I alone will...

☝ 🍊

6 / 22

My mission

A single (R) interface that:

  1. Doesn't require knowledge of web technologies.
  2. Works seamlessly with other "tidy" tools in R.
  3. Makes it easy[1] to declare interactive techniques that support common data analysis tasks[2].

[1]: 80% of tasks should be easy (i.e., require no extra knowledge), but the remaining 20% should be possible.
[2]: Analysts usually have different needs from the end audience.
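
A minimal sketch of points 1 and 2 (the dataset and aesthetics here are illustrative, not from the talk): an ordinary ggplot2 plot becomes interactive with one extra call, no HTML or JavaScript required.

library(ggplot2)
library(plotly)

# an ordinary ggplot2 scatterplot...
p <- ggplot(mpg, aes(displ, hwy, color = class)) +
  geom_point()

# ...becomes an interactive htmlwidget with one extra call
ggplotly(p)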

7 / 22

Interactivity augments data exploration!

8 / 22
  • Let's not forget -- statisticians have been thinking about the problem for 50 years!
  • Easy to get lost in a sea of techniques -- it's easier to navigate them if you motivate via data analysis tasks.
  • Not everyone has a need to diagnose models, but everyone has a need to get stuff done

Interactivity augments data exploration!

9 / 22

This is especially true as data becomes more accessible: fewer formal mathematical models testing exact questions, more flexible tools for posing graphical queries about the data.

No matter how complex and polished the individual operations are, it is often the quality of the glue that most directly determines the power of the system.

Quote from Hal Abelson -- part of the tidyverse manifesto

10 / 22

Generating faster insights requires good glue. This comes in two parts:

  • Works seamlessly with other programming interfaces (fast iteration!)
  • Works seamlessly with other graphical interfaces (i.e., can link components from independent systems).

11 / 22
library(tidyverse)
library(plotly)

# read the two 1km-grid population files, derive lat/lng from the grid ID,
# then aggregate population on a 0.1-degree grid
d <- read_csv('GEOSTAT_grid_POP_1K_2011_V2_0_1.csv') %>%
  rbind(read_csv('JRC-GHSL_AIT-grid-POP_1K_2011.csv') %>%
          mutate(TOT_P_CON_DT = '')) %>%
  mutate(
    lat = as.numeric(gsub('.*N([0-9]+)[EW].*', '\\1', GRD_ID)) / 100,
    lng = as.numeric(gsub('.*[EW]([0-9]+)', '\\1', GRD_ID)) *
      ifelse(gsub('.*([EW]).*', '\\1', GRD_ID) == 'W', -1, 1) / 100
  ) %>%
  filter(lng > 25, lng < 60) %>%
  group_by(lat = round(lat, 1), lng = round(lng, 1)) %>%
  summarize(value = sum(TOT_P, na.rm = TRUE)) %>%
  ungroup() %>%
  tidyr::complete(lat, lng)

# make each latitude "highlight-able"
sd <- crosstalk::SharedData$new(d, ~lat)

# one ridgeline per latitude; population scales the vertical offset
p <- ggplot(sd, aes(lng, lat + 5 * (value / max(value, na.rm = TRUE)))) +
  geom_line(
    aes(group = lat, text = paste("Population:", value)),
    size = 0.4, alpha = 0.8, color = '#5A3E37', na.rm = TRUE
  ) +
  coord_equal(0.9) +
  ggthemes::theme_map()

# clicking a line highlights its latitude; persistent = TRUE accumulates selections
ggplotly(p) %>% highlight(persistent = TRUE)
12 / 22
13 / 22

Linking multiple views
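
A rough sketch of the idea (illustrative data, not the exact code behind this slide): two views built from the same SharedData object are linked, so brushing in one highlights the matching points in the other.

library(plotly)
library(crosstalk)

# both views read from the same SharedData object
sd <- SharedData$new(mtcars)

p1 <- plot_ly(sd, x = ~wt, y = ~mpg) %>% add_markers()
p2 <- plot_ly(sd, x = ~hp, y = ~qsec) %>% add_markers()

# brushing points in one panel highlights them in the other
subplot(p1, p2) %>% highlight(on = "plotly_selected")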

14 / 22

Customize/transform the selection

# gg: a plotly object built on a crosstalk::SharedData (e.g., ggplotly(p) from the previous example)
highlight(
  gg, persistent = TRUE, dynamic = TRUE, selectize = TRUE,
  selected = attrs_selected(mode = "markers+lines", marker = list(symbol = "x"))
)
15 / 22

Linking animated views
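
A rough sketch of the idea (gapminder data assumed for illustration; not the exact code from these slides): keying the data by country makes each country highlight-able, and the selection persists across animation frames driven by year.

library(plotly)
library(crosstalk)
library(gapminder)  # illustrative dataset

# key by country so a selection carries across animation frames
sd <- SharedData$new(gapminder, ~country)

plot_ly(sd, x = ~gdpPercap, y = ~lifeExp,
        frame = ~year, ids = ~country,
        text = ~country, hoverinfo = "text") %>%
  add_markers(alpha = 0.5) %>%
  layout(xaxis = list(type = "log")) %>%
  highlight(on = "plotly_click", persistent = TRUE)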

16 / 22
17 / 22
library(leaflet)
library(crosstalk)
library(plotly)

sd <- SharedData$new(quakes)
stations <- filter_slider("station", "Number of Stations", sd, ~stations)

p <- plot_ly(sd, x = ~depth, y = ~mag) %>%
  add_markers(alpha = 0.5) %>%
  highlight("plotly_selected", dynamic = TRUE)

map <- leaflet(sd) %>%
  addTiles() %>%
  addCircles()

bscols(p, map, stations)
18 / 22
library(leaflet)
library(crosstalk)
library(plotly)

# Input data for every view!
sd <- SharedData$new(quakes)
stations <- filter_slider("station", "Number of Stations", sd, ~stations)

p <- plot_ly(sd, x = ~depth, y = ~mag) %>%
  add_markers(alpha = 0.5) %>%
  highlight("plotly_selected", dynamic = TRUE)

map <- leaflet(sd) %>%
  addTiles() %>%
  addCircles()

bscols(p, map, stations)

TAKE HOME MESSAGE: Build upon uniform data structures!

19 / 22

Standing on the shoulders of giants

20 / 22

Slides released under Creative Commons

22 / 22
