s-news
[Top] [All Lists]

Summary: B-splines and predictions with gls

To: NLME help <nlme-help@franz.stat.wisc.edu>
Subject: Summary: B-splines and predictions with gls
From: Renaud Lancelot <lancelot@sentoo.sn>
Date: Fri, 21 Dec 2001 12:34:54 +0000
Cc: S-news <s-news@wubios.wustl.edu>, "José Pinheiro" <jose.pinheiro@pharma.novartis.com>, yee@aitken.scitec.auckland.ac.nz, Dave Atkins <datkins@u.washington.edu>, Javier Seoane <seoane@ebd.csic.es>
Organization: CIRAD
Dear all,

A few days ago I posted a question on B-splines and predictions with gls
(original question below). I got useful replies from Javier Seoane who
pointed out that the prediction problem was related to data-dependent
transformation in bs(), Thomas Yee (see below) who provided me with a
nice solution to this problem (smartpred library, see
http://www.stat.auckland.ac.nz/~yee/) and José Pinheiro who indicated
that "prediction standard errors is probably the most pressing and
recurrent request for an update of NLME".

Following Thomas Yee suggestions (use of smartpred library), I was able
to get the predictions (with the modified versions of gls and
predict.gls available in smartpred):

> fm <- gls(poids ~ bs(age, df = 4),
+           correlation = corExp(form =  ~ age | numani),
+           weights = varPower(form =  ~ fitted(.)),
            data = Data, method = "ML", smart = T)
> ## Fitted values
> ## age = 300 is NOT an observed value
> Data$Fit <- fitted(fm)
> unique(Data[is.element(Data$age, c(30, 60, 90, 300)),
+        match(c("age", "Fit"), table = names(Data))])
    age       Fit 
180  90 13.231687
409  60 10.770752
670  30  7.519429
> ## Predicted values
> ## including age = 300
> data.frame(age = c(300, 90, 60, 30),
+            Pred = predict(fm,
+                   newdata = data.frame(age = c(300, 90, 60, 30))))
  age      Pred 
    Conflicting definitions of "predict.gls" on databases "D:\These"
and  "nlme3" in: UseMethod("predict")
1 300 20.304121
2  90 13.231687
3  60 10.770752
4  30  7.519429

Many thanks to all, Merry Christmas and Happy New Year from sunny
Senegal !

Renaud


************** Original question

I have fitted a gls model where the response is a B-spline function of a
continuous variable (age), and variance is modelled with an exponential
correlation structure and a power function of the fitted values. I have
no problem to get the fitted values and I can compute the standard error
of the fitted mean. However, prediction fails: predict.gls does not seem
to work with B-spline transformations of the predictor. Is there a way
to solve this ? Furthermore, I would like to compute the standard error
of the predicted values (se of new observations in the range of the
observed values). How can I do this ?

Below are some code and results.

Thank you for your attention,

Renaud

> version
Professional Edition Version 6.0.3 Release 2 for Microsoft Windows :
2001 
>
> summary(Data[, match(c("numani", "age", "poids"), table = names(Data))])
          numani          age          poids      
 SNLOOV15809:  17      Min.:  0      Min.: 2.30  
 SNLOOV14696:  17   1st Qu.: 44   1st Qu.: 8.60  
 SNLOOV14053:  17    Median:115    Median:13.60  
 SNLOOV02497:  17      Mean:146      Mean:14.06  
 SNLOOV15464:  16   3rd Qu.:239   3rd Qu.:18.90  
 SNLOOV15177:  16      Max.:398      Max.:36.40  
     (Other):1545                                
> fm <- gls(poids ~ bs(age, df = 4),
+           correlation = corExp(form =  ~ age | numani),
+           weights = varPower(form =  ~ fitted(.)),
+           data = Data, method = "ML")
>
> summary(fm)
Generalized least squares fit by maximum likelihood
  Model: poids ~ bs(age, df = 4) 
  Data: Data 
      AIC      BIC   logLik 
  7863.04 7906.284 -3923.52

Correlation Structure: Exponential spatial correlation
 Formula:  ~ age | numani 
 Parameter estimate(s):
     range 
 0.9156445
Variance function:
 Structure: Power of variance covariate
 Formula:  ~ fitted(.) 
 Parameter estimates:
    power 
 1.152723

Coefficients:
                    Value Std.Error  t-value p-value 
     (Intercept)  3.30796 0.0776681 42.59097  <.0001
bs(age, df = 4)1  6.06705 0.2474282 24.52045  <.0001
bs(age, df = 4)2 17.48736 0.6028137 29.00956  <.0001
bs(age, df = 4)3 14.85172 0.8443603 17.58931  <.0001
bs(age, df = 4)4 21.44970 0.7188427 29.83921  <.0001

 Correlation: 
                 (Intr) b(,d=4)1 b(,d=4)2 b(,d=4)3 
bs(age, df = 4)1 -0.762                           
bs(age, df = 4)2  0.241   -0.615                  
bs(age, df = 4)3 -0.309    0.508   -0.779         
bs(age, df = 4)4  0.010   -0.160    0.422   -0.640

Standardized residuals:
       Min         Q1         Med        Q3      Max 
 -2.638735 -0.6878759 -0.09464405 0.5930081 4.360848

Residual standard error: 0.1418703 
Degrees of freedom: 1645 total; 1640 residual
>
>## Fitted values
> Data$Fit <- fitted(fm)
> unique(Data[is.element(Data$age, c(30, 60, 90)),
+             match(c("age", "Fit"), table = names(Data))])
    age       Fit 
180  90 13.231687
409  60 10.770752
670  30  7.519429
>
>## Predicted values
> data.frame(age = c(90, 60, 30),
             Pred = predict(fm, newdata = data.frame(age = c(90, 60,
30))))
  age     Pred 
1  90 24.75766
2  60 17.28133
3  30  3.30796


******************** Thomas Yee's reply

If you use bs and ns only, then specifying both Boundary.knots and
knots will solve your problem.  More generally, smart prediction can
solve your problem. Have a look at the smartpred library starting from
http://www.stat.auckland.ac.nz/~yee/ It would be quite easy to make
"gls" smart so that it can handle poly() and scale() etc.

Below is code from Splus 6.0 (I don't have Splus for Windows) that I've
smartened. It only has a few lines added. If you source the files in
the smartpred library then it should hopefully work.  Alternatively, if
you email me your gls function, I can add those few lines (or you can
do it yourself).

Cheers

Thomas


# A bad test example 
fm <- gls(follicles ~ bs(2*pi*Time, 6), Ovary,
          correlation = corAR1(form = ~ 1 | Mare))
small <- Ovary[1:4,]
predict(fm, small) - fitted(fm)[1:4]   # Should be all 0's 


gls <- 
## fits linear model with serial correlation and variance functions, 
## by maximum likelihood using a Newton-Raphson algorithm.
function(model, data = sys.parent(), correlation = NULL, weights = NULL,
subset,
    method = c("REML", "ML"), na.action = na.fail, control = list(), 
    verbose = F, smart=T)
{
    Call <- match.call()
    ## control parameters
    controlvals <- glsControl()
    controlvals[names(control)] <- control

    if(smart) setup.smart("write")   # This is new

    ##
    ## checking arguments
    ##
    if(!inherits(model, "formula") || length(model) != 3) {
        stop("\nModel must be a formula of the form \"resp ~ pred\"")
    }
    method <- match.arg(method)
    REML <- method == "REML"
    ## check if correlation is present and has groups
    if(!is.null(correlation)) {
        groups <- getGroupsFormula(correlation)
        corStr <- getStrataFormula(correlation)
    }
    else {
        groups <- NULL
        corStr <- NULL
    }
    ## create a gls structure containing the plug-ins
    glsSt <- glsStruct(corStruct = correlation, varStruct =
varFunc(weights
        ))
    ## extract a data frame with enough information to evaluate
    ## formula, groups, corStruct, and varStruct
    mfArgs <- list(formula = asOneFormula(formula(glsSt), model, groups,
        corStr), data = data, na.action = na.action)
    if(!missing(subset)) {
        mfArgs[["subset"]] <- asOneSidedFormula(Call[["subset"]])[[
            2]]
    }
    dataMod <- do.call("model.frame", mfArgs)
    origOrder <- row.names(dataMod)
    # preserve the original order
    if(!is.null(groups)) {
        ## sort the model.frame by groups and get the matrices and
parameters
        ## used in the estimation procedures
        ## always use innermost level of grouping
        groups <- eval(parse(text = paste("~1", deparse(groups[[2]]),
            sep = "|")))
        grps <- getGroups(dataMod, groups, level = length(
            getGroupsFormula(groups, asList = T)))
        ## ordering data by groups
        ord <- order(grps)
        grps <- grps[ord]
        dataMod <- dataMod[ord,  , drop = F]
        revOrder <- match(origOrder, row.names(dataMod))
    }
    else grps <- NULL
    ## obtaing basic model matrices
    X <- model.frame(model, dataMod)
    ## keeping the contrasts for later use in predict
    contr <- lapply(X, function(el)
    if(inherits(el, "factor")) contrasts(el))
    contr <- contr[!unlist(lapply(contr, is.null))]
    X <- model.matrix(model, X)
    y <- eval(model[[2]], dataMod)
    N <- nrow(X)
    p <- ncol(X)
    # number of coefficients
    parAssign <- attr(X, "assign")
    ## creating the condensed linear model
    attr(glsSt, "conLin") <- list(Xy = array(c(X, y), c(N, ncol(X) + 1),
        list(row.names(dataMod), c(dimnames(X)[[2]], deparse(model[[
        2]])))), dims = list(N = N, p = p, REML = as.integer(REML)),
        logLik = 0, sigma = controlvals$sigma)
    ## initialization
    glsEstControl <- controlvals[c("singular.ok", "qrTol")]
    glsSt <- initialize(glsSt, dataMod, glsEstControl)
    parMap <- attr(glsSt, "pmap")
    ##
    ## getting the fitted object, possibly iterating for variance
functions
    ##
    numIter <- numIter0 <- 0
    attach(controlvals)
    repeat {
        oldPars <- c(attr(glsSt, "glsFit")[["beta"]], coef(glsSt))
        if(length(coef(glsSt))) {
            # needs ms()
            aMs <- ms( ~  - logLik(glsSt, glsPars), start = list(
                glsPars = c(coef(glsSt))), control = list(
                rel.tolerance = msTol, maxiter = msMaxIter,
                scale = msScale), trace = msVerbose)
            coef(glsSt) <- aMs$parameters
            numIter0 <- aMs$numIter <- aMs$flags[31]
        }
        attr(glsSt, "glsFit") <- glsEstimate(glsSt, control = 
            glsEstControl)
        ## checking if any updating is needed
        if(!needUpdate(glsSt)) break
        ## updating the fit information
        numIter <- numIter + 1
        glsSt <- update(glsSt, dataMod)
        ## calculating the convergence criterion
        aConv <- c(attr(glsSt, "glsFit")[["beta"]], coef(glsSt))
        conv <- abs((oldPars - aConv)/ifelse(aConv == 0, 1, aConv))
        aConv <- c(beta = max(conv[1:p]))
        conv <- conv[ - (1:p)]
        for(i in names(glsSt)) {
            if(any(parMap[, i])) {
                aConv <- c(aConv, max(conv[parMap[, i]]))
                names(aConv)[length(aConv)] <- i
            }
        }
        if(verbose) {
            cat("\nIteration:", numIter)
            if(length(coef(glsSt)) > 0) {
                cat("\nObjective:", format(aMs$value), 
                    ", ms iterations:", aMs$numIter, "\n")
                print(glsSt)
            }
            cat("\nConvergence:\n")
            print(aConv)
        }
        if(max(aConv) <= tolerance) {
            break
        }
        if(numIter > maxIter) {
            stop("Maximum number of iterations reached without
convergence."
                )
        }
    }
    detach()
    ## wrapping up
    glsFit <- attr(glsSt, "glsFit")
    namBeta <- names(glsFit$beta)
    attr(parAssign, "varBetaFact") <- varBeta <- glsFit$sigma * glsFit$
        varBeta * sqrt((N - REML * p)/(N - p))
    varBeta <- crossprod(varBeta)
    dimnames(varBeta) <- list(namBeta, namBeta)
    ##
    ## fitted.values and residuals (in original order)
    ##
    Fitted <- fitted(glsSt)
    ## putting groups back in original order, if present
    if(!is.null(grps)) {
        grps <- grps[revOrder]
        Fitted <- Fitted[revOrder]
        Resid <- y[revOrder] - Fitted
        attr(Resid, "std") <- glsFit$sigma/(varWeights(glsSt)[revOrder]
            )
    }
    else {
        Resid <- y - Fitted
        attr(Resid, "std") <- glsFit$sigma/(varWeights(glsSt))
    }
    ## getting the approximate var-cov of the parameters 
    if(controlvals$apVar) {
        apVar <- glsApVar(glsSt, glsFit$sigma, .relStep = controlvals[[
            ".relStep"]], minAbsPar = controlvals[["minAbsParApVar"
            ]], natural = controlvals[["natural"]], natUncons = 
            controlvals[["natUnconstrained"]])
    }
    else {
        apVar <- "Approximate variance-covariance matrix not available"
    }
    ## getting rid of condensed linear model and fit
    dims <- attr(glsSt, "conLin")[["dims"]]
    dims[["p"]] <- p
    attr(glsSt, "conLin") <- NULL
    attr(glsSt, "glsFit") <- NULL
    attr(glsSt, "fixedSigma") <- (controlvals$sigma > 0)
    ##
    ## creating the  gls object
    ##
    estOut <- list(modelStruct = glsSt, dims = dims, contrasts = contr,
        coefficients = glsFit[["beta"]], varBeta = varBeta, sigma = 
        glsFit$sigma, apVar = apVar, logLik = glsFit$logLik, numIter = 
        if(needUpdate(glsSt)) numIter else numIter0, groups = grps,
        call = Call, method = method, fitted = Fitted, residuals = 
        Resid, parAssign = parAssign)
    if(inherits(data, "groupedData")) {
        ## saving labels and units for plots
        attr(estOut, "units") <- attr(data, "units")
        attr(estOut, "labels") <- attr(data, "labels")
    }
    if(!is.null(grps) && any(diff(ord) != 1)) {
        attr(estOut, "order") <- list(order = ord, revOrder = revOrder)
    }

    if(smart) {
      # print("in here")
        estOut$smart.prediction <- get.smart.prediction() # This is new
      # print(estOut$smart.prediction)
    }


    attr(estOut, "namBetaFull") <- dimnames(X)[[2]]
    oldClass(estOut) <- "gls"
    estOut
}


predict.gls <- 
function(object, newdata, na.action = na.fail)
{
    ##
    ## method for predict() designed for objects inheriting from
oldClass gls
    ##
    if(missing(newdata) || is.null(newdata)) {
        # will return fitted values
        return(fitted(object))
    }

        # Smart prediction: handle the prediction flaw
        setup.smart("read", smart.prediction=object$smart.prediction)

    form <- getCovariateFormula(object)
    mfArgs <- list(formula = form, data = newdata, na.action =
na.action)
    dataMod <- do.call("model.frame", mfArgs)
    ## making sure factor levels are the same as in contrasts
    contr <- object$contrasts
    for(i in names(dataMod)) {
        if(inherits(dataMod[, i], "factor") && !is.null(contr[[i]])) {
            levs <- levels(dataMod[, i])
            levsC <- dimnames(contr[[i]])[[1]]
            if(any(wch <- is.na(match(levs, levsC)))) {
                stop(paste("Levels", paste(levs[wch], collapse
                     = ","), "not allowed for", i))
            }
            #      attr(dataMod[,i], "contrasts") <- contr[[i]][levs, ,
drop = FALSE]
            contr[[i]] <- contr[[i]][levs,  , drop = FALSE]
        }
    }
    N <- nrow(dataMod)
    if(length(all.vars(form)) > 0) {
        X <- model.matrix(form, dataMod, contr)
    }
    else {
        X <- array(1, c(N, 1), list(row.names(dataMod), "(Intercept)"))
    }
    cf <- coef(object)
    val <- c(X[, names(cf), drop = F] %*% cf)
    attr(val, "label") <- "Predicted values"
    if(!is.null(aux <- attr(object, "units")$y)) {
        attr(val, "label") <- paste(attr(val, "label"), aux)
    }

        wrapup.smart()    # not needed actually

    val
}

-- 
Dr Renaud Lancelot, vétérinaire
CIRAD, Département Elevage et Médecine Vétérinaire (CIRAD-Emvt)
Programme Productions Animales
http://www.cirad.fr/presentation/programmes/prod-ani.shtml (Français)
http://www.cirad.fr/presentation/en/program-eng/prod-ani.shtml (English)

ISRA-LNERV                      tel    (221) 832 49 02
BP 2057 Dakar-Hann              fax    (221) 821 18 79 (CIRAD)
Senegal                         e-mail renaud.lancelot@cirad.fr

<Prev in Thread] Current Thread [Next in Thread>
  • Summary: B-splines and predictions with gls, Renaud Lancelot <=