The story is about calculating the variance of the random variable given by the formula:
q(p-2+q/N)
where p and N are model parameters and q is a random variable with a binomial distribution with N Bernoulli trials and probability of success 2-p. As I have written before, the properly specified formula for the metamodel has the following form:
v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4)) * (N + I(1 / N))
where v is an estimate of the variance. Such a model was discussed in my last post (it is described there in more detail, so I do not repeat the details here).
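As a side note, this formula expands into 15 terms (an intercept, the powers of p, the terms N and 1/N, and their interactions). One quick way to list them, purely for illustration, is:
colnames(model.matrix(~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4)) * (N + I(1 / N)),
                      data = data.frame(p = 1, N = 1)))
These names will reappear below as the names of the estimated coefficients.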
However, one can easily notice that for p=1 and p=2 the variance of the random variable is equal to 0. This is because in these cases q itself has no variance (the probability of success is 1 and 0 respectively, so q is always equal to N and always equal to 0).
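One can verify this directly by computing the exact variance through enumeration of the possible values of q (the exact.var helper below is only illustrative):
# exact variance of q * (p - 2 + q / N) obtained by enumerating q = 0, ..., N
exact.var <- function(p, N) {
    q <- 0:N                     # possible values of q
    w <- dbinom(q, N, 2 - p)     # their probabilities
    x <- q * (p - 2 + q / N)     # the analyzed random variable
    sum(w * x ^ 2) - sum(w * x) ^ 2
}
exact.var(1, 5)   # 0 (q is always equal to N)
exact.var(2, 5)   # 0 (q is always equal to 0)
exact.var(1.5, 5) # positive for p between 1 and 2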
When estimating the metamodel one can take this into account in two ways:
- the weights of the cases where p equals 1 or 2 can be increased (so we use Weighted Least Squares);
- the predictions at the points where p equals 1 or 2 can be restricted to be exactly 0 (so we use Constrained Least Squares).
The code given below estimates the original model and the two corrected versions:
library(plyr)
library(limSolve)

# simulate 1000 realizations of the random variable for given p and N
# and return their sample variance
sim <- function(param) {
    q <- rbinom(1000, param$N, 2 - param$p)
    c(v = var((param$p - 2 + q / param$N) * q))
}
set.seed(1)
data.set <- expand.grid(p = seq(1, 2, len = 101), N = 1:10)
data.set <- ddply(data.set, .(p, N), sim, .progress = "text")
# ordinary least squares fit of the metamodel
lm.coef <- coef(lm(v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4)) * (N + I(1 / N)),
                   data = data.set))
# weighted least squares: a very large weight is put on the points
# where the simulated variance is exactly 0 (p equal to 1 or 2)
weights <- ifelse(data.set$v == 0, 100000, 1)
wlm.coef <- coef(lm(v ~ (p + I(p ^ 2) + I(p ^ 3) + I(p ^ 4)) * (N + I(1 / N)),
                    data = data.set, weights = weights))
# constrained least squares: design matrix with columns ordered as in lm.coef
A <- with(data.set, cbind(1, p, p ^ 2, p ^ 3, p ^ 4, N, 1 / N,
                          p * N, p / N, p ^ 2 * N, p ^ 2 / N,
                          p ^ 3 * N, p ^ 3 / N, p ^ 4 * N, p ^ 4 / N))
colnames(A) <- names(lm.coef)
B <- data.set$v
# equality constraints: fitted values must be exactly 0
# at the points where the simulated variance is 0
E <- A[B == 0, ]
F <- B[B == 0]
lsei.coef <- lsei(A, B, E, F)$X
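As a quick sanity check, one can confirm that the constrained fit satisfies the imposed restriction, i.e. that its fitted values at the constrained points are numerically zero:
range(A[B == 0, ] %*% lsei.coef)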
Now we can compare the coefficients from these models to their analytical values given in the last post:
true.coef <- c(32, -88, 88, -38, 6, -8, -26, 20,
               75, -18, -79, 7, 36, -1, -6)
coef.dev <- cbind(lm.coef, wlm.coef, lsei.coef) - true.coef
print(coef.dev, digits = 2)
# black points: deviations of OLS estimates, red points: deviations of WLS estimates
dotchart(coef.dev[, 1], pch = 19)
points(coef.dev[, 2], 1:nrow(coef.dev), col = "red", pch = 19)
abline(v = 0, col = "gray", lty = 2)
The code first tabulates the deviations of the estimates from the true values:
              lm.coef wlm.coef lsei.coef
(Intercept)   -0.6056    0.060     0.060
p              1.5061   -0.235    -0.235
I(p^2)        -1.3576    0.323     0.323
I(p^3)         0.5243   -0.185    -0.185
I(p^4)        -0.0731    0.038     0.038
N              0.0088   -0.060    -0.060
I(1/N)         1.1510    0.354     0.354
p:N           -0.0065    0.173     0.173
p:I(1/N)      -3.0646   -0.965    -0.965
I(p^2):N      -0.0102   -0.182    -0.182
I(p^2):I(1/N)  2.9942    0.953     0.953
I(p^3):N       0.0115    0.084     0.084
I(p^3):I(1/N) -1.2730   -0.404    -0.404
I(p^4):N      -0.0029   -0.014    -0.014
I(p^4):I(1/N)  0.1991    0.062     0.062
and next plots a visualization of the differences (on the plot black points represent OLS estimates and red points represent WLS estimates):
It can be seen that:
- the precision of estimation is improved by using additional information about the problem structure;
- the estimates from WLS and CLS are identical to the third decimal place (a direct check is shown below).
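The second observation can be checked directly by comparing the two coefficient vectors:
max(abs(wlm.coef - lsei.coef))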