Saturday, August 25, 2012

Exporting ctree object to Asymptote

When producing regression or classification trees in GNU R (with the standard rpart, or with ctree from the party package), I am often unsatisfied with the default plots. One of many possible solutions is to export the tree plot to Asymptote.
The code I have prepared generates an Asymptote file from a fitted ctree object. Here is the function that does the conversion.

treeAsy <- function(tree,       # ctree to be plotted
                    off.f,      # tree plot fixed shift
                    off.v,      # tree plot variable shift
                    file.name,  # output file name
                    preamble) { # preamble for asy
  minv <- +Inf
  maxv <- -Inf
  response <- names(tree@responses@variables)

  plot.node <- function(root,
                 nest = 0,            # level in a tree
                 pOffset = 0,         # plotting offset
                 condition = "root",  # split condition text
                 id = "root") {       # block name in asy
    if (length(root$prediction) > 1) {
        stop("Only single prediction value supported")
    }
    if (root$prediction < minv) minv <<- root$prediction
    if (root$prediction > maxv) maxv <<- root$prediction
    child.l <- ""
    child.r <- ""
    if (!root$terminal) {
        if (class(root$psplit) == "orderedSplit") {
            varN <- root$psplit$variableName
            point <- root$psplit$splitpoint
            left <- paste(varN, "$\\leq$", point, sep="")
            right <- paste(varN, "$>$", point, sep="")
        } else {
            stop("Only orderedSplit supported")
        }       

        add <- "add(new void(picture pic, transform t) {
  blockconnector operator --=blockconnector(pic,t);\n  "
        child.l <- paste(plot.node(root$left, nest + 1,
            pOffset - off.f - off.v ^ nest,
            left, paste(id,"l",sep="")),
          add,id,"--Down--Left--Down--",id,"l;\n});\n\n", sep="")
        child.r <- paste(plot.node(root$right, nest + 1,
            pOffset + off.f + off.v ^ nest,
            right, paste(id,"r",sep="")),
          add,id,"--Down--Right--Down--",id,"r;\n});\n\n", sep="")
    }
    paste("block ", id, " = rectangle(Label(\"",
      condition, "\"),
  pack(Label(\"n=", sum(root$weights), "\"),
       Label(\"", response, "=",
       format(root$prediction), "\")),
  (", pOffset, ",", -nest, "), lightgray, col(",
      root$prediction, "));",
      "\ndraw(", id,");\n\n",
      child.l, child.r, sep="")
  }

  treestruct <- plot.node(tree@tree)
 
  cat(file=file.name,
      preamble,
      "\nimport flowchart;\n",
      "pen col(real x) {
  real minv = ", minv, ";
  real maxv = ", maxv, ";
  real ratio = 1 - (x - minv) / (maxv - minv);
  return rgb(1, ratio, ratio);
}\n\n",
      treestruct, "\n", sep="")
  # system() is portable; shell() exists only in R for Windows
  system(paste("asy -f png", file.name))
}

Each node in the plot shows the condition leading to it, the number of observations and the predicted value of the response variable. The intensity of red indicates the relative size of the prediction.
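The colour rule lives in the Asymptote col() function written by treeAsy, but it can be sketched in R to see what it does. The helper name nodeColour is an assumption made for this illustration and is not part of the code above.

```r
# Sketch of the colour rule used by the generated col() function.
# nodeColour is a hypothetical helper name, not part of treeAsy.
nodeColour <- function(x, minv, maxv) {
  # ratio is 1 for the smallest prediction and 0 for the largest
  ratio <- 1 - (x - minv) / (maxv - minv)
  # red channel stays at 1; green and blue fade as the prediction grows
  rgb(1, ratio, ratio)
}

# Smallest, middle and largest prediction in a made-up range
nodeColour(c(10, 55, 100), minv = 10, maxv = 100)
```

The smallest prediction maps to white and the largest to pure red. Note that the rule divides by maxv - minv, so it is undefined for a tree in which all nodes share one prediction value.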

To keep the example simple, the code is deliberately limited. It currently handles only regression trees with continuous predictors, and it will generate errors if variable names contain TeX special characters (like &). Additionally, the tree layout can only be controlled manually, either by setting the arguments off.f and off.v or by changing the picture size in the preamble (one could also write code to lay out the plot automatically).
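One possible way around the TeX limitation is to escape variable names before they are pasted into the labels. The following is only a minimal sketch: escapeTeX is a hypothetical helper that is not part of the code above, and it covers only the common characters & % $ # _ { }, not backslash, ~ or ^.

```r
# escapeTeX is a hypothetical helper, not part of treeAsy.
# It puts a backslash in front of characters that TeX treats specially.
escapeTeX <- function(x) {
  # The bracket expression lists the characters to escape; the
  # replacement is a literal backslash followed by the matched character.
  gsub("([&%$#_{}])", "\\\\\\1", x)
}

escapeTeX(c("Solar.R", "a&b", "wind_speed"))
```

It would have to be applied to the variable name only (for example to varN inside plot.node), not to the whole label, because the labels deliberately contain TeX math such as $\leq$.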

The code produces PNG output because I needed this format to show the picture on the blog, but of course you can generate an EPS or PDF file instead, which is probably the more suitable option.
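Switching the format only means changing the value passed to the -f option of asy. As a small sketch, the command could be built by a helper; asyCommand is a hypothetical name introduced here, not part of the code above.

```r
# asyCommand is a hypothetical helper; "-f" selects Asymptote's output format.
asyCommand <- function(file.name, format = "pdf") {
  # shQuote protects file names that contain spaces
  paste("asy -f", format, shQuote(file.name))
}

cmd <- asyCommand("tree.asy", "pdf")
print(cmd)
# system(cmd)  # uncomment to run; assumes asy is on the PATH
```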

And here is an example of its use, based on the standard ctree example:

library(party)

airq <- subset(airquality, !is.na(Ozone))
airct <- ctree(Ozone ~ ., data = airq)
treeAsy(airct, -0.25, 1.4, "tree.asy",
  "size(22cm,12cm, keepAspect=false);")
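To get a feeling for what the two layout arguments do, it may help to compute the horizontal distance between a node and its children at each level. This sketch assumes the expression off.f + off.v ^ nest from plot.node; childShift is a hypothetical helper name used only here.

```r
# childShift is a hypothetical helper mirroring the expression
# off.f + off.v ^ nest used inside plot.node.
childShift <- function(nest, off.f, off.v) off.f + off.v ^ nest

# Shifts for the first four levels with the settings from the example call
shifts <- childShift(0:3, off.f = -0.25, off.v = 1.4)
round(shifts, 3)
# 0.750 1.150 1.710 2.494
```

With off.v above 1 the shift grows with depth, while a value below 1 would make it shrink. off.f moves all levels by the same amount.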

It gives the following output:



I find this much nicer than the default plot generated by plot(airct):


2 comments:

  1. You could try

    plot(airct, type = "simple")

    which is rather similar to what you do in Asymptote. Also, if you want to refine these panel functions for the terminal nodes, you can plug in your own. See the source code of the party package for a look at how these are constructed (using "grid").

    Finally, the partykit package has a rewritten class for tree objects that makes a few things easier.

    Hope that helps,
    Z

  2. Thanks for the tips, especially about the partykit package. I did not know it and will have to check it out.

    As for using the standard party functionality: for me it was easier to write an export to Asymptote than to write my own plug-in for party plotting (I did consider it at the start).

    The two main things I wanted were: (1) an identical look for terminal and split nodes and (2) consistent graphics when I embed the picture in a LaTeX document.

    Of course you can convert plot(airct) to LaTeX using tikzDevice, but the output file is not OK as it stands; after export one has to correct it manually.
