Dear all,
A few days ago I posted a question on B-splines and predictions with gls
(original question below). I got useful replies from Javier Seoane who
pointed out that the prediction problem was related to data-dependent
transformation in bs(), Thomas Yee (see below) who provided me with a
nice solution to this problem (smartpred library, see
http://www.stat.auckland.ac.nz/~yee/) and José Pinheiro who indicated
that "prediction standard errors is probably the most pressing and
recurrent request for an update of NLME".
Following Thomas Yee suggestions (use of smartpred library), I was able
to get the predictions (with the modified versions of gls and
predict.gls available in smartpred):
> fm <- gls(poids ~ bs(age, df = 4),
+ correlation = corExp(form = ~ age | numani),
+ weights = varPower(form = ~ fitted(.)),
data = Data, method = "ML", smart = T)
> ## Fitted values
> ## age = 300 is NOT an observed value
> Data$Fit <- fitted(fm)
> unique(Data[is.element(Data$age, c(30, 60, 90, 300)),
+ match(c("age", "Fit"), table = names(Data))])
age Fit
180 90 13.231687
409 60 10.770752
670 30 7.519429
> ## Predicted values
> ## including age = 300
> data.frame(age = c(300, 90, 60, 30),
+ Pred = predict(fm,
+ newdata = data.frame(age = c(300, 90, 60, 30))))
age Pred
Conflicting definitions of "predict.gls" on databases "D:\These"
and "nlme3" in: UseMethod("predict")
1 300 20.304121
2 90 13.231687
3 60 10.770752
4 30 7.519429
Many thanks to all, Merry Christmas and Happy New Year from sunny
Senegal !
Renaud
************** Original question
I have fitted a gls model where the response is a B-spline function of a
continuous variable (age), and variance is modelled with an exponential
correlation structure and a power function of the fitted values. I have
no problem to get the fitted values and I can compute the standard error
of the fitted mean. However, prediction fails: predict.gls does not seem
to work with B-spline transformations of the predictor. Is there a way
to solve this ? Furthermore, I would like to compute the standard error
of the predicted values (se of new observations in the range of the
observed values). How can I do this ?
Below are some code and results.
Thank you for your attention,
Renaud
> version
Professional Edition Version 6.0.3 Release 2 for Microsoft Windows :
2001
>
> summary(Data[, match(c("numani", "age", "poids"), table = names(Data))])
numani age poids
SNLOOV15809: 17 Min.: 0 Min.: 2.30
SNLOOV14696: 17 1st Qu.: 44 1st Qu.: 8.60
SNLOOV14053: 17 Median:115 Median:13.60
SNLOOV02497: 17 Mean:146 Mean:14.06
SNLOOV15464: 16 3rd Qu.:239 3rd Qu.:18.90
SNLOOV15177: 16 Max.:398 Max.:36.40
(Other):1545
> fm <- gls(poids ~ bs(age, df = 4),
+ correlation = corExp(form = ~ age | numani),
+ weights = varPower(form = ~ fitted(.)),
+ data = Data, method = "ML")
>
> summary(fm)
Generalized least squares fit by maximum likelihood
Model: poids ~ bs(age, df = 4)
Data: Data
AIC BIC logLik
7863.04 7906.284 -3923.52
Correlation Structure: Exponential spatial correlation
Formula: ~ age | numani
Parameter estimate(s):
range
0.9156445
Variance function:
Structure: Power of variance covariate
Formula: ~ fitted(.)
Parameter estimates:
power
1.152723
Coefficients:
Value Std.Error t-value p-value
(Intercept) 3.30796 0.0776681 42.59097 <.0001
bs(age, df = 4)1 6.06705 0.2474282 24.52045 <.0001
bs(age, df = 4)2 17.48736 0.6028137 29.00956 <.0001
bs(age, df = 4)3 14.85172 0.8443603 17.58931 <.0001
bs(age, df = 4)4 21.44970 0.7188427 29.83921 <.0001
Correlation:
(Intr) b(,d=4)1 b(,d=4)2 b(,d=4)3
bs(age, df = 4)1 -0.762
bs(age, df = 4)2 0.241 -0.615
bs(age, df = 4)3 -0.309 0.508 -0.779
bs(age, df = 4)4 0.010 -0.160 0.422 -0.640
Standardized residuals:
Min Q1 Med Q3 Max
-2.638735 -0.6878759 -0.09464405 0.5930081 4.360848
Residual standard error: 0.1418703
Degrees of freedom: 1645 total; 1640 residual
>
>## Fitted values
> Data$Fit <- fitted(fm)
> unique(Data[is.element(Data$age, c(30, 60, 90)),
+ match(c("age", "Fit"), table = names(Data))])
age Fit
180 90 13.231687
409 60 10.770752
670 30 7.519429
>
>## Predicted values
> data.frame(age = c(90, 60, 30),
Pred = predict(fm, newdata = data.frame(age = c(90, 60,
30))))
age Pred
1 90 24.75766
2 60 17.28133
3 30 3.30796
******************** Thomas Yee's reply
If you use bs and ns only, then specifying both Boundary.knots and
knots will solve your problem. More generally, smart prediction can
solve your problem. Have a look at the smartpred library starting from
http://www.stat.auckland.ac.nz/~yee/ It would be quite easy to make
"gls" smart so that it can handle poly() and scale() etc.
Below is code from Splus 6.0 (I don't have Splus for Windows) that I've
smartened. It only has a few lines added. If you source the files in
the smartpred library then it should hopefully work. Alternatively, if
you email me your gls function, I can add those few lines (or you can
do it yourself).
Cheers
Thomas
# A bad test example
fm <- gls(follicles ~ bs(2*pi*Time, 6), Ovary,
correlation = corAR1(form = ~ 1 | Mare))
small <- Ovary[1:4,]
predict(fm, small) - fitted(fm)[1:4] # Should be all 0's
gls <-
## fits linear model with serial correlation and variance functions,
## by maximum likelihood using a Newton-Raphson algorithm.
function(model, data = sys.parent(), correlation = NULL, weights = NULL,
subset,
method = c("REML", "ML"), na.action = na.fail, control = list(),
verbose = F, smart=T)
{
Call <- match.call()
## control parameters
controlvals <- glsControl()
controlvals[names(control)] <- control
if(smart) setup.smart("write") # This is new
##
## checking arguments
##
if(!inherits(model, "formula") || length(model) != 3) {
stop("\nModel must be a formula of the form \"resp ~ pred\"")
}
method <- match.arg(method)
REML <- method == "REML"
## check if correlation is present and has groups
if(!is.null(correlation)) {
groups <- getGroupsFormula(correlation)
corStr <- getStrataFormula(correlation)
}
else {
groups <- NULL
corStr <- NULL
}
## create a gls structure containing the plug-ins
glsSt <- glsStruct(corStruct = correlation, varStruct =
varFunc(weights
))
## extract a data frame with enough information to evaluate
## formula, groups, corStruct, and varStruct
mfArgs <- list(formula = asOneFormula(formula(glsSt), model, groups,
corStr), data = data, na.action = na.action)
if(!missing(subset)) {
mfArgs[["subset"]] <- asOneSidedFormula(Call[["subset"]])[[
2]]
}
dataMod <- do.call("model.frame", mfArgs)
origOrder <- row.names(dataMod)
# preserve the original order
if(!is.null(groups)) {
## sort the model.frame by groups and get the matrices and
parameters
## used in the estimation procedures
## always use innermost level of grouping
groups <- eval(parse(text = paste("~1", deparse(groups[[2]]),
sep = "|")))
grps <- getGroups(dataMod, groups, level = length(
getGroupsFormula(groups, asList = T)))
## ordering data by groups
ord <- order(grps)
grps <- grps[ord]
dataMod <- dataMod[ord, , drop = F]
revOrder <- match(origOrder, row.names(dataMod))
}
else grps <- NULL
## obtaing basic model matrices
X <- model.frame(model, dataMod)
## keeping the contrasts for later use in predict
contr <- lapply(X, function(el)
if(inherits(el, "factor")) contrasts(el))
contr <- contr[!unlist(lapply(contr, is.null))]
X <- model.matrix(model, X)
y <- eval(model[[2]], dataMod)
N <- nrow(X)
p <- ncol(X)
# number of coefficients
parAssign <- attr(X, "assign")
## creating the condensed linear model
attr(glsSt, "conLin") <- list(Xy = array(c(X, y), c(N, ncol(X) + 1),
list(row.names(dataMod), c(dimnames(X)[[2]], deparse(model[[
2]])))), dims = list(N = N, p = p, REML = as.integer(REML)),
logLik = 0, sigma = controlvals$sigma)
## initialization
glsEstControl <- controlvals[c("singular.ok", "qrTol")]
glsSt <- initialize(glsSt, dataMod, glsEstControl)
parMap <- attr(glsSt, "pmap")
##
## getting the fitted object, possibly iterating for variance
functions
##
numIter <- numIter0 <- 0
attach(controlvals)
repeat {
oldPars <- c(attr(glsSt, "glsFit")[["beta"]], coef(glsSt))
if(length(coef(glsSt))) {
# needs ms()
aMs <- ms( ~ - logLik(glsSt, glsPars), start = list(
glsPars = c(coef(glsSt))), control = list(
rel.tolerance = msTol, maxiter = msMaxIter,
scale = msScale), trace = msVerbose)
coef(glsSt) <- aMs$parameters
numIter0 <- aMs$numIter <- aMs$flags[31]
}
attr(glsSt, "glsFit") <- glsEstimate(glsSt, control =
glsEstControl)
## checking if any updating is needed
if(!needUpdate(glsSt)) break
## updating the fit information
numIter <- numIter + 1
glsSt <- update(glsSt, dataMod)
## calculating the convergence criterion
aConv <- c(attr(glsSt, "glsFit")[["beta"]], coef(glsSt))
conv <- abs((oldPars - aConv)/ifelse(aConv == 0, 1, aConv))
aConv <- c(beta = max(conv[1:p]))
conv <- conv[ - (1:p)]
for(i in names(glsSt)) {
if(any(parMap[, i])) {
aConv <- c(aConv, max(conv[parMap[, i]]))
names(aConv)[length(aConv)] <- i
}
}
if(verbose) {
cat("\nIteration:", numIter)
if(length(coef(glsSt)) > 0) {
cat("\nObjective:", format(aMs$value),
", ms iterations:", aMs$numIter, "\n")
print(glsSt)
}
cat("\nConvergence:\n")
print(aConv)
}
if(max(aConv) <= tolerance) {
break
}
if(numIter > maxIter) {
stop("Maximum number of iterations reached without
convergence."
)
}
}
detach()
## wrapping up
glsFit <- attr(glsSt, "glsFit")
namBeta <- names(glsFit$beta)
attr(parAssign, "varBetaFact") <- varBeta <- glsFit$sigma * glsFit$
varBeta * sqrt((N - REML * p)/(N - p))
varBeta <- crossprod(varBeta)
dimnames(varBeta) <- list(namBeta, namBeta)
##
## fitted.values and residuals (in original order)
##
Fitted <- fitted(glsSt)
## putting groups back in original order, if present
if(!is.null(grps)) {
grps <- grps[revOrder]
Fitted <- Fitted[revOrder]
Resid <- y[revOrder] - Fitted
attr(Resid, "std") <- glsFit$sigma/(varWeights(glsSt)[revOrder]
)
}
else {
Resid <- y - Fitted
attr(Resid, "std") <- glsFit$sigma/(varWeights(glsSt))
}
## getting the approximate var-cov of the parameters
if(controlvals$apVar) {
apVar <- glsApVar(glsSt, glsFit$sigma, .relStep = controlvals[[
".relStep"]], minAbsPar = controlvals[["minAbsParApVar"
]], natural = controlvals[["natural"]], natUncons =
controlvals[["natUnconstrained"]])
}
else {
apVar <- "Approximate variance-covariance matrix not available"
}
## getting rid of condensed linear model and fit
dims <- attr(glsSt, "conLin")[["dims"]]
dims[["p"]] <- p
attr(glsSt, "conLin") <- NULL
attr(glsSt, "glsFit") <- NULL
attr(glsSt, "fixedSigma") <- (controlvals$sigma > 0)
##
## creating the gls object
##
estOut <- list(modelStruct = glsSt, dims = dims, contrasts = contr,
coefficients = glsFit[["beta"]], varBeta = varBeta, sigma =
glsFit$sigma, apVar = apVar, logLik = glsFit$logLik, numIter =
if(needUpdate(glsSt)) numIter else numIter0, groups = grps,
call = Call, method = method, fitted = Fitted, residuals =
Resid, parAssign = parAssign)
if(inherits(data, "groupedData")) {
## saving labels and units for plots
attr(estOut, "units") <- attr(data, "units")
attr(estOut, "labels") <- attr(data, "labels")
}
if(!is.null(grps) && any(diff(ord) != 1)) {
attr(estOut, "order") <- list(order = ord, revOrder = revOrder)
}
if(smart) {
# print("in here")
estOut$smart.prediction <- get.smart.prediction() # This is new
# print(estOut$smart.prediction)
}
attr(estOut, "namBetaFull") <- dimnames(X)[[2]]
oldClass(estOut) <- "gls"
estOut
}
predict.gls <-
function(object, newdata, na.action = na.fail)
{
##
## method for predict() designed for objects inheriting from
oldClass gls
##
if(missing(newdata) || is.null(newdata)) {
# will return fitted values
return(fitted(object))
}
# Smart prediction: handle the prediction flaw
setup.smart("read", smart.prediction=object$smart.prediction)
form <- getCovariateFormula(object)
mfArgs <- list(formula = form, data = newdata, na.action =
na.action)
dataMod <- do.call("model.frame", mfArgs)
## making sure factor levels are the same as in contrasts
contr <- object$contrasts
for(i in names(dataMod)) {
if(inherits(dataMod[, i], "factor") && !is.null(contr[[i]])) {
levs <- levels(dataMod[, i])
levsC <- dimnames(contr[[i]])[[1]]
if(any(wch <- is.na(match(levs, levsC)))) {
stop(paste("Levels", paste(levs[wch], collapse
= ","), "not allowed for", i))
}
# attr(dataMod[,i], "contrasts") <- contr[[i]][levs, ,
drop = FALSE]
contr[[i]] <- contr[[i]][levs, , drop = FALSE]
}
}
N <- nrow(dataMod)
if(length(all.vars(form)) > 0) {
X <- model.matrix(form, dataMod, contr)
}
else {
X <- array(1, c(N, 1), list(row.names(dataMod), "(Intercept)"))
}
cf <- coef(object)
val <- c(X[, names(cf), drop = F] %*% cf)
attr(val, "label") <- "Predicted values"
if(!is.null(aux <- attr(object, "units")$y)) {
attr(val, "label") <- paste(attr(val, "label"), aux)
}
wrapup.smart() # not needed actually
val
}
--
Dr Renaud Lancelot, vétérinaire
CIRAD, Département Elevage et Médecine Vétérinaire (CIRAD-Emvt)
Programme Productions Animales
http://www.cirad.fr/presentation/programmes/prod-ani.shtml (Français)
http://www.cirad.fr/presentation/en/program-eng/prod-ani.shtml (English)
ISRA-LNERV tel (221) 832 49 02
BP 2057 Dakar-Hann fax (221) 821 18 79 (CIRAD)
Senegal e-mail renaud.lancelot@cirad.fr
|