Recently I had to calculate variance of a random variable given as:
q(p-2+q/N),
where p and N are model parameters and q is a random variable that has binomial distribution with N Bernoulli trials and probability of success 2-p. After some lengthily calculations I came up with the following formula for the variance:However the result is so long that I was not sure that it was correct, so I decided to verify it using metamodeling.
First I have generated the data for the simulation with the following code and visualized it. The data covers a grid where p is from interval [1;2] and N spans from 1 to 10. In each grid point 100000 simulations are performed and variance of q(p-2+q/N) is calculated:
library(plyr)
library(lattice)
sim <- function(param) {
q <- rbinom(100000, param$N, 2 - param$p)
c(v = var((param$p - 2 + q / param$N) * q))
}
set.seed(1)
data.set <- expand.grid(p = seq(1, 2, len = 201), N = 1:10)
data.set <- ddply(data.set, .(p,N), sim, .progress = "text")
levelplot(v ~ p + N, data = data.set,
col.regions = terrain.colors,
xlab = "p", ylab = "N")
Here is the resulting plot:
Next I have estimated the linear regression model with parameters reflecting analytical specification given above:
summary(lm(v~(p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4))
* (N + I(1 / N)), data = data.set))
The output looks as follows:
Call:
lm(formula = v ~ (p + I(p^2) + I(p^3) + I(p^4)) * (N +
I(1/N)),
data =
data.set)
Residuals:
Min 1Q
Median 3Q Max
-0.0116801 -0.0007195
0.0000063 0.0007608 0.0098659
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept)
32.089136 0.235380 136.33
<2e-16 ***
p
-88.232888 0.655760 -134.55 <2e-16 ***
I(p^2)
88.225183 0.674590 130.78
<2e-16 ***
I(p^3)
-38.095417 0.303854 -125.37 <2e-16 ***
I(p^4)
6.014949 0.050597 118.88
<2e-16 ***
N
-8.012673 0.027781 -288.42 <2e-16 ***
I(1/N)
-26.082527 0.303365 -85.98
<2e-16 ***
p:N
20.034053 0.077398 258.84
<2e-16 ***
p:I(1/N) 75.213337 0.845166
88.99 <2e-16 ***
I(p^2):N -18.033947 0.079621 -226.50 <2e-16 ***
I(p^2):I(1/N)
-79.203708 0.869434 -91.10
<2e-16 ***
I(p^3):N 7.014852 0.035863
195.60 <2e-16 ***
I(p^3):I(1/N) 36.085047
0.391618 92.14 <2e-16 ***
I(p^4):N -1.002405 0.005972 -167.85 <2e-16 ***
I(p^4):I(1/N)
-6.013089 0.065212 -92.21
<2e-16 ***
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.002197 on 1995 degrees of
freedom
Multiple R-squared: 0.9999, Adjusted R-squared: 0.9999
F-statistic: 1.976e+06 on 14 and 1995 DF, p-value: < 2.2e-16
As we can see the model fit is almost perfect and the estimates are very close to analytical results. This reassured me that my calculation was correct.
In this way you could use simulation to verify analytical results. Of course simulation metamodeling has much broader applications - especially when we do not have analytical results.
I hope to organize a special session during ESSA2013 dedicated to simulation metamodeling using GNU R. So if you are interested in such topics and willing to promote GNU R in simulation society you are welcome to come to Warsaw.
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.