Friday, September 7, 2012

Simulation metamodeling with GNU R

I am one of the organizers of ESSA2013 conference that will take place in September 2013 in Warsaw, Poland. The conference scope is social simulation and in particular methods of statistical analysis of simulation output (metamodeling). As we have just issued Call for Papers for the conference so I decided to post a simple example of a metamodel.

Recently I had to calculate variance of a random variable given as:
q(p-2+q/N),
where p and N are model parameters and q is a random variable that has binomial distribution with N Bernoulli trials and probability of success 2-p. After some lengthily calculations I came up with the following formula for the variance:
However the result is so long that I was not sure that it was correct, so I decided to verify it using metamodeling.

First I have generated the data for the simulation with the following code and visualized it. The data covers a grid where p is from interval [1;2] and N spans from 1 to 10. In each grid point 100000 simulations are performed and variance of q(p-2+q/N) is calculated:

library(plyr)
library(lattice)

sim <- function(param) {
  q <- rbinom(100000, param$N, 2 - param$p)
  c(v = var((param$p - 2 + q / param$N) * q))
}

set.seed(1)
data.set <- expand.grid(p = seq(1, 2, len = 201), N = 1:10)
data.set <- ddply(data.set, .(p,N), sim, .progress = "text")
levelplot(v ~ p + N, data = data.set,
          col.regions = terrain.colors, xlab = "p", ylab = "N")

Here is the resulting plot:
Next I have estimated the linear regression model with parameters reflecting analytical specification given above:

summary(lm(v~(p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4))
             * (N + I(1 / N)), data = data.set))

The output looks as follows:

Call:
lm(formula = v ~ (p + I(p^2) + I(p^3) + I(p^4)) * (N + I(1/N)),
    data = data.set)

Residuals:
       Min         1Q     Median         3Q        Max
-0.0116801 -0.0007195  0.0000063  0.0007608  0.0098659

Coefficients:
                Estimate Std. Error t value Pr(>|t|)   
(Intercept)    32.089136   0.235380  136.33   <2e-16 ***
p             -88.232888   0.655760 -134.55   <2e-16 ***
I(p^2)         88.225183   0.674590  130.78   <2e-16 ***
I(p^3)        -38.095417   0.303854 -125.37   <2e-16 ***
I(p^4)          6.014949   0.050597  118.88   <2e-16 ***
N              -8.012673   0.027781 -288.42   <2e-16 ***
I(1/N)        -26.082527   0.303365  -85.98   <2e-16 ***
p:N            20.034053   0.077398  258.84   <2e-16 ***
p:I(1/N)       75.213337   0.845166   88.99   <2e-16 ***
I(p^2):N      -18.033947   0.079621 -226.50   <2e-16 ***
I(p^2):I(1/N) -79.203708   0.869434  -91.10   <2e-16 ***
I(p^3):N        7.014852   0.035863  195.60   <2e-16 ***
I(p^3):I(1/N)  36.085047   0.391618   92.14   <2e-16 ***
I(p^4):N       -1.002405   0.005972 -167.85   <2e-16 ***
I(p^4):I(1/N)  -6.013089   0.065212  -92.21   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.002197 on 1995 degrees of freedom
Multiple R-squared: 0.9999,     Adjusted R-squared: 0.9999
F-statistic: 1.976e+06 on 14 and 1995 DF,  p-value: < 2.2e-16

As we can see the model fit is almost perfect and the estimates are very close to analytical results. This reassured me that my calculation was correct.

In this way you could use simulation to verify analytical results. Of course simulation metamodeling has much broader applications - especially when we do not have analytical results.

I hope to organize a special session during ESSA2013 dedicated to simulation metamodeling using GNU R. So if you are interested in such topics and willing to promote GNU R in simulation society you are welcome to come to Warsaw.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.