R to the rescue

I wanted to create a graph for a paper I was writing: a rather complex graph that would show the difference between six different techniques on five different types of storms on three different criteria. Representing 6x5x3=90 interactions in an easily interpretable way is hard, especially if you want to do it with confidence bounds and the like. I had a general idea of how to create the graph and was wincing internally at the thought of doing it in Excel. Chances were that I would have to rerun the cases if I discovered problems, and would have to regenerate the graphs quite often. Enter the statistical package, R.

This is the final graph, so you know where I'm going with this:
and this is the description of the graph in the paper so that you understand the comparisons made possible by the organization of the graph:
Each row of graphs consists of the evaluation of a case on the three criteria described in Section 1b. In each graph, the best two methods are shown in black. If several techniques tied for second place (within the bounds of statistical significance shown by the error bars), as in the case of mismatches for the first case, there may be more than two black bars in a graph. Similarly, the worst two methods (with a rank of 5 or 6) are shown with white bars. Gray bars indicate middling (rank of 3 or 4) performance.
So, here are the issues: (1) create a figure with multiple graphs, (2) arrange them in a grid, (3) have Greek symbols for the y-axes, (4) give the bars in each barplot different colors indicating rank, and (5) draw confidence intervals (error bars) on all the graphs.

I first put all the numbers into a text file. Then, I ran the R script, snippets of which are below.
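Judging from the column indexing used later in the script, the file seems to hold one row per case/technique pair: the case name, the technique name, and then four columns (value, lower bound, upper bound, rank) for each of the three criteria. Here is a minimal sketch that generates a synthetic file with that layout, which can be handy for trying out the script. The column names, case names, file name, and all numbers are made up, and ranking the smallest value as 1 is only an assumption:

```r
# Synthetic stand-in for allscores.csv: 5 cases x 6 techniques = 30 rows.
# Column names and all values are hypothetical; only the column order matters
# to the script, which indexes columns by position.
set.seed(1)
cases <- paste("Case", 1:5)
techniques <- paste0("T", 1:6)
scores <- data.frame(case = rep(cases, each = 6),
                     technique = rep(techniques, times = 5))
for (stat in c("mismatch", "jump", "length")) {
  value <- runif(30, min = 1, max = 10)
  scores[[paste0(stat, "_value")]] <- value
  scores[[paste0(stat, "_lb")]] <- value * 0.9   # lower confidence bound
  scores[[paste0(stat, "_ub")]] <- value * 1.1   # upper confidence bound
  # Rank the six techniques within each case (assumption: 1 = smallest value).
  scores[[paste0(stat, "_rank")]] <- ave(value, scores$case, FUN = rank)
}
# Written under a different name so a real allscores.csv is not overwritten.
write.csv(scores, "allscores_example.csv", row.names = FALSE)
```

Reading this file back with `read.table(..., header=TRUE, sep=",")` gives a 30-row, 14-column data frame.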

First, read the data and set up margins for the plot. The output will go into a PNG image:

png( filename="allscores.png", width=1200, height=1500, pointsize=20 );

par(mar=c(3.1, 4.3, 4.1, 1.1))

data <- read.table("allscores.csv", header=TRUE, sep=",");
We want a grid of 5 rows and 3 columns of graphs. Set up the grid, the technique names, and the axis labels:
par( mfrow = c(5,3))
technique <- data[1:6,2];
ylabels <- c( expression(sigma[size]~km^2), expression(e[xy]~km), "Median duration (s)")
shortstat <- c( "Mismatches", "Jumps", "Length")
Loop through and pull out the data:
for (caseno in 1:5) {
  startrow <- caseno*6 - 5;
  case <- data[startrow,1];
  for (statno in 0:2) {
    values <- data[startrow:(startrow+5) , 3+statno*4];
    valuesLB <- data[startrow:(startrow+5) , 4+statno*4];
    valuesUB <- data[startrow:(startrow+5) , 5+statno*4];
    rank <- data[startrow:(startrow+5) , 6+statno*4];
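To check the index arithmetic: case k occupies rows 6k-5 through 6k, and criterion s (0, 1, or 2) occupies columns 3+4s through 6+4s. A small sketch that tabulates these positions, assuming the 14-column layout that the indexing implies:

```r
# Rows occupied by each case: caseno*6 - 5 through caseno*6.
for (caseno in 1:5) {
  startrow <- caseno * 6 - 5
  cat("case", caseno, ": rows", startrow, "to", startrow + 5, "\n")
}
# Columns for each criterion: value, lower bound, upper bound, rank.
col_index <- sapply(0:2, function(statno) 3:6 + statno * 4)
dimnames(col_index) <- list(c("value", "LB", "UB", "rank"),
                            c("Mismatches", "Jumps", "Length"))
print(col_index)
```

The last case ends at row 30 and the last rank column is column 14, so the file needs exactly 30 data rows and 14 columns.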
Set up the colors for the bars in the barplot:
    colors <- rep("gray", 6);
    for (i in 1:6) {
      if ( rank[i] == 1 || rank[i] == 2 ){ colors[i] <- "black"; }
      if ( rank[i] == 5 || rank[i] == 6 ){ colors[i] <- "white"; }
    }
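The same color assignment can also be written without an explicit loop. A sketch using nested `ifelse`, with a made-up rank vector in which two techniques tie for second place; any rank other than 1, 2, 5, or 6 stays gray:

```r
# Hypothetical ranks for six techniques (two of them tie for rank 2).
ranks <- c(3, 1, 2, 2, 6, 5)
colors <- ifelse(ranks %in% c(1, 2), "black",
          ifelse(ranks %in% c(5, 6), "white", "gray"))
print(colors)
```

With the tie, three bars come out black, which matches the behavior described in the figure caption.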
Draw the bar plot:
    xpoints <- barplot(height=values, col=colors, ylab="", names.arg=technique, xlab="", ylim=c(0,max(valuesUB)) )
Draw the error bars:
    lh <- 0.2;
    segments(xpoints, valuesLB, xpoints, valuesUB, col="red")
    segments(xpoints-lh, valuesLB, xpoints+lh, valuesLB, col="red")
    segments(xpoints-lh, valuesUB, xpoints+lh, valuesUB, col="red")
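An alternative worth knowing is base R's `arrows()`, which can draw the vertical line and both caps in one call when given `angle=90` and `code=3`. A self-contained sketch with made-up values (not the data from the paper), written to a temporary PNG file:

```r
# Made-up values and confidence bounds for three bars.
values   <- c(4.0, 6.5, 5.2)
valuesLB <- c(3.5, 6.0, 4.4)
valuesUB <- c(4.6, 7.1, 5.9)
png(filename = file.path(tempdir(), "errorbars_example.png"))
# barplot() returns the x-coordinates of the bar midpoints.
xpoints <- barplot(height = values, ylim = c(0, max(valuesUB)))
# angle = 90 turns the arrowheads into flat caps; code = 3 draws them at
# both ends. length is the cap size in inches.
arrows(xpoints, valuesLB, xpoints, valuesUB,
       angle = 90, code = 3, length = 0.05, col = "red")
dev.off()
```

One difference from the three `segments()` calls: the cap width here is given in inches rather than in x-axis units, so it does not scale with the number of bars.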
Set up the title, increasing the font of the y-axis by 40%:
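    title(ylab=ylabels[1+statno], cex.lab=1.4)
    title(main=paste0(case, ": ", shortstat[1+statno], " by technique") )
  }  # end of loop over criteria
}  # end of loop over cases
dev.off()  # close the PNG device so that the image file gets written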
And voilà ... The neat thing is that I can now rerun this script on my data file whenever the numbers change and get the graph. No pointing and clicking required ...
