R snippets: Visualizing tables in ggplot2

Saturday, May 5, 2012

Recently I wanted to recreate assocplot using ggplot2. In the end, I propose a simple way to visualize data arranged in two-way tables using geom_tile.

I used the Titanic data set as an example, combining the age and sex dimensions to get two-way data.

I plot the Pearson residuals of the chi-squared test (as in assocplot) on the left and the probability of survival on the right. A nice feature of geom_tile is that it clearly highlights missing data (children were not crew members). Here is the code that generates the plots:

library(ggplot2)
library(grid)
library(reshape2)

# Left panel: Pearson residuals of the chi-squared test.
# Collapse the four-way Titanic table to Class x (Age, Sex) counts.
m <- acast(melt(unclass(Titanic)), Class ~ Age + Sex, sum)
names(dimnames(m)) <- c("Class", "Age_Sex")
df <- melt(unclass(chisq.test(m)$residuals), value.name = "residuals")
g1 <- ggplot(df, aes(x = Class, y = Age_Sex)) +
  geom_tile(aes(fill = residuals)) +
  scale_fill_gradientn(colours = c("blue", "white", "red")) +
  theme_bw()

# Right panel: probability of survival.
# x holds the counts for Survived = "No" and "Yes", in that order.
m <- acast(melt(unclass(Titanic)), Class ~ Age + Sex,
           function(x) {x[2] / sum(x)})
names(dimnames(m)) <- c("Class", "Age_Sex")
df <- melt(m, value.name = "survived")
g2 <- ggplot(df, aes(x = Class, y = Age_Sex)) +
  geom_tile(aes(fill = survived)) +
  scale_fill_gradient(low = "blue", high = "red") +
  theme_bw()

# Draw both plots side by side in a 1 x 2 grid layout.
grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))
print(g1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(g2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
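
As a cross-check on what the left panel shows, the Pearson residuals returned by chisq.test can be recomputed by hand as (observed - expected) / sqrt(expected). The following is a minimal base-R sketch, not part of the original code: the names counts, expected and pearson are illustrative, and the count matrix is built with apply instead of acast, so its columns come out in a different order than in the plot.

```r
# Class x Age x Sex counts, summed over Survived (base R only)
counts <- apply(Titanic, c("Class", "Age", "Sex"), sum)

# Flatten Age and Sex into one column dimension, as in the post
age_sex <- as.vector(outer(dimnames(counts)$Age, dimnames(counts)$Sex,
                           paste, sep = "_"))
m <- matrix(counts, nrow = dim(counts)[1],
            dimnames = list(Class = dimnames(counts)$Class,
                            Age_Sex = age_sex))

# Expected counts under independence of rows and columns
expected <- outer(rowSums(m), colSums(m)) / sum(m)

# Pearson residuals computed by hand
pearson <- (m - expected) / sqrt(expected)

# Compare with the residuals reported by chisq.test
fit <- suppressWarnings(chisq.test(m))
all.equal(pearson, fit$residuals, check.attributes = FALSE)
```

Positive residuals mark cells with more passengers than independence would predict, negative residuals mark cells with fewer.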

And the result:

9 comments:

Salil, May 6, 2012 at 10:56 AM
Can you please explain what the 'survived' scale represents?

Thanks, I really like this idea.

Bogumił Kamiński, May 6, 2012 at 12:39 PM
The "survived" scale represents the probability that a member of a given group (an intersection of Class/Age/Sex) survived the Titanic disaster.

melt(unclass(Titanic)) results in a data frame with 5 columns:

> head(melt(unclass(Titanic)))
  Class    Sex   Age Survived value
1   1st   Male Child       No     0
2   2nd   Male Child       No     0
3   3rd   Male Child       No    35
4  Crew   Male Child       No     0
5   1st Female Child       No     0
6   2nd Female Child       No     0

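Then, if we use the formula "Class~Age+Sex", the aggregating function is called once for each Class/Age/Sex combination. Each call receives a two-element vector taken from the "value" column: the counts for the "Survived" levels "No" and "Yes", in that order. This is why x[2] / sum(x) is the share of survivors.

You can verify this by running the following code: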

> acast(melt(unclass(Titanic)), Class~Age+Sex,
+ function(x) { str(x); x[2] / sum(x)})
num(0)
num [1:2] 4 140
num [1:2] 13 80
num [1:2] 89 76
num [1:2] 3 20
num [1:2] 118 57
num [1:2] 154 14
num [1:2] 387 75
num [1:2] 670 192
num [1:2] 0 1
num [1:2] 0 13
num [1:2] 17 14
num [1:2] 0 0
num [1:2] 0 5
num [1:2] 0 11
num [1:2] 35 13
num [1:2] 0 0
     Adult_Female Adult_Male Child_Female Child_Male
1st     0.9722222 0.32571429    1.0000000  1.0000000
2nd     0.8602151 0.08333333    1.0000000  1.0000000
3rd     0.4606061 0.16233766    0.4516129  0.2708333
Crew    0.8695652 0.22273782          NaN        NaN

Interestingly, the first "num(0)" in the output comes from an additional call of the function inside vaggregate, which happens because no "fill" argument is passed to acast.
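
The two-element vector can also be inspected without reshape2. The following is a base-R sketch and not what acast runs internally: as.data.frame(Titanic) stands in for melt(unclass(Titanic)), with the count column named Freq instead of value, and the names cell, totals, p_survive and f are illustrative.

```r
# Long format: columns Class, Sex, Age, Survived, Freq
d <- as.data.frame(Titanic)

# One Class/Age/Sex cell keeps two rows: Survived = "No", then "Yes"
cell <- d[d$Class == "1st" & d$Age == "Adult" & d$Sex == "Female", ]
x <- cell$Freq
x[2] / sum(x)  # share of survivors among adult women in 1st class

# The same probabilities for every cell at once
totals <- apply(Titanic, c("Class", "Sex", "Age"), sum)
p_survive <- unclass(Titanic)[, , , "Yes"] / totals

# The aggregating function applied to a zero-length vector, which is
# the kind of input reported as num(0) in the str() output above
f <- function(x) x[2] / sum(x)
f(numeric(0))  # NA
```

Cells with no passengers, such as children among the crew, give 0 / 0 and therefore NaN, which is what geom_tile shows as missing.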

Anonymous, June 3, 2012 at 9:40 AM
Very cool.

I'm having trouble following some of the code, though. In the acast function, why does x[2] return the number of survivors for each group?

Anonymous, June 4, 2012 at 5:51 AM
Thanks for the quick response!

I now understand why [2] would yield all the "Yes" values, but I'm still a bit confused as to why x corresponds to the survived column. Is this because acast looks for the first argument not specified in the formula?

Thanks again.

bi2open.fr, December 13, 2012 at 2:43 AM
Hello, I tried your code, but it cannot display the plot. Do you have any idea what the problem is?
It displays black.
Thank you in advance.

Bogumił Kamiński, December 13, 2012 at 3:16 PM
Maybe your default graphic device is not wide enough.
Try widening it using the mouse.
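
If resizing the window by hand is inconvenient, one option is to draw into a device whose size is set explicitly. The following is a sketch under assumptions: the 12 x 5 inch size and the temporary PDF file are arbitrary choices, and the rectangles are placeholders standing in for print(g1, vp = ...) and print(g2, vp = ...).

```r
library(grid)

# A file device with an explicit size in inches (values are arbitrary)
out <- tempfile(fileext = ".pdf")
pdf(out, width = 12, height = 5)

grid.newpage()
pushViewport(viewport(layout = grid.layout(1, 2)))

# Placeholder panels; in the post these would be the two ggplot objects
for (col in 1:2) {
  pushViewport(viewport(layout.pos.row = 1, layout.pos.col = col))
  grid.rect()
  grid.text(paste("panel", col))
  popViewport()
}

dev.off()
```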