Saturday, November 19, 2011

randu dataset, part 2

In my last post I plotted the randu dataset to show that all its points lie on 15 parallel planes. But I was not fully satisfied with that solution and decided to show this numerically as well.

It can be done in four steps:
  1. identifying four points lying on the same plane and finding its equation (we know that there are 15 planes, so by the pigeonhole principle any 15*3+1 = 46 points must include four on the same plane);
  2. applying the equation to all points in the randu dataset to divide them into 15 classes;
  3. verifying that the classes separate the points into 15 parallel planes;
  4. plotting the solution, with a different color for the points on each plane.
Here is the code I used to do this:

library(rgl)
library(caTools)

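# STEP 1
# Try 4-point subsets of the first 46 rows until one is (almost) coplanar.
all.combs <- combs(1:46, 4)
i <- 1
repeat {
      # Three coefficients fitted to four points: an (almost) exact fit
      # means the four points are (almost) coplanar.
      model <- lm(z ~ x + y, data = randu[all.combs[i, ], ])
      if (summary(model)$r.squared > 0.99999) {
            break
      }
      i <- i + 1
}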

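# STEP 2
# Vertical offset of every point from the plane found in step 1
line.class <- predict(model, randu) - randu$z
# Round the offsets to integers to get one label per plane
# (the + 10 only shifts the labels)
line.class <- factor(round(line.class) + 10)

# STEP 3
# Common slopes for x and y plus one intercept per class = parallel planes
summary(lm(z ~ x + y + line.class, data = randu))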

# STEP 4
with(randu, plot3d(x, y, z, axes = FALSE, col = line.class,
       xlab = "", ylab = "", zlab = ""))
# view3d() replaces rgl.viewpoint(), which is deprecated in current rgl
view3d(theta = -3.8, phi = 3.8, fov = 0, zoom = 0.7)
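
The count 15*3+1 in step 1 is a pigeonhole bound, and it also fixes how many subsets the search may have to try. A minimal sketch of that arithmetic follows; the names n.planes and n.points are made up for this illustration and do not appear in the code above.

```r
# Hypothetical names, used only for this illustration.
n.planes <- 15
n.points <- n.planes * 3 + 1          # 46, the count used in step 1

# Pigeonhole: the fullest plane holds at least ceiling(46 / 15) points.
ceiling(n.points / n.planes)          # 4

# With one point fewer, 3 per plane is still possible,
# so four coplanar points are not guaranteed.
ceiling((n.points - 1) / n.planes)    # 3

# Number of 4-point subsets of 46 points, i.e. the most fits
# the search in step 1 would ever have to try.
choose(n.points, 4)                   # 163185
```

In practice the loop should stop long before that, at the first subset with an almost exact fit.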

At step three we can see that the model obtains a perfect fit. The final figure is plotted below:

2 comments:

  1. It would be a cool follow-up to show analytically that some rotation of randu-generated data lies on a series of planes. Maybe this has already been shown...

  2. You can find it at http://en.wikipedia.org/wiki/RANDU.
    However, I thought that the code is a nice example of how one can numerically categorize and plot data located on parallel planes:
    1) without knowing the exact relationship formula;
    2) taking into consideration rounding of observations.
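
    For reference, the relationship given on that page can also be checked directly against the dataset. This is only a sketch: it assumes the recurrence x[k+2] = 6*x[k+1] - 9*x[k] (mod 2^31) from the Wikipedia article, which after scaling to [0, 1) means that 9*x - 6*y + z should be an integer for every row. The names k and fit are made up for the example, and the formula is exactly the piece of knowledge the code in the post does without.

```r
# Sketch only: `randu` is the data frame shipped with R's datasets package;
# the coefficients 9 and 6 are taken from the RANDU recurrence on Wikipedia,
# not from the code in the post.
data(randu)

# If z = 6*y - 9*x (mod 1), then 9*x - 6*y + z is an integer for every row.
k <- with(randu, 9 * x - 6 * y + z)

# Largest distance from an integer; small but not exactly zero,
# because the dataset stores rounded values.
max(abs(k - round(k)))

# One integer per plane, with the number of points on each.
table(round(k))

# Same idea as step 3, but with the classes taken from the formula.
fit <- lm(z ~ x + y + factor(round(k)), data = randu)
coef(fit)[c("x", "y")]   # should be close to -9 and 6
```

    The number of distinct values in the table is the number of planes that actually contain points in this 400-row sample.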
