[S] Multiple imputation capability added to transcan in Hmisc

To: s-news <s-news@wubios.wustl.edu>
Subject: [S] Multiple imputation capability added to transcan in Hmisc
From: Frank E Harrell Jr <fharrell@virginia.edu>
Date: Thu, 23 Jul 1998 12:13:14 -0400
Reply-to: Frank E Harrell Jr <fharrell@virginia.edu>
Sender: owner-s-news@wubios.wustl.edu

The transcan function in the Hmisc library, which among other things
develops imputation models, now handles multiple imputation.  An example
from the help file follows this note.  A new function, fit.mult.impute,
runs an S-PLUS regression modeling function separately for each
imputation, then computes a coefficient vector averaged over the
imputations and an imputation-corrected variance-covariance matrix.
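
As a rough illustration of what "imputation-corrected" can mean, the
sketch below pools completed-data fits with Rubin's combining rules:
average the coefficients, then add the between-imputation variance
(inflated by 1 + 1/M) to the average within-imputation variance.  This
is an assumption about the general technique, not a copy of the
fit.mult.impute source.  The function combine.imputations and the toy
numbers are made up for illustration, and the code is written for R.

```r
# Hypothetical helper, NOT part of Hmisc: pool M completed-data fits
# using Rubin's rules (assumed here; fit.mult.impute may differ in detail).
#   coefs: M x p matrix, one row of coefficient estimates per imputation
#   vcovs: list of M p x p variance-covariance matrices, one per imputation
combine.imputations <- function(coefs, vcovs) {
  M <- nrow(coefs)
  beta.bar <- colMeans(coefs)       # coefficients averaged over imputations
  W <- Reduce("+", vcovs) / M       # average within-imputation variance
  B <- var(coefs)                   # between-imputation variance
  V <- W + (1 + 1/M) * B            # total (imputation-corrected) variance
  list(coefficients = beta.bar,
       var = V,
       vif = diag(V) / diag(W))     # inflation relative to within variance
}

# Toy input (made-up numbers): M = 3 imputations, p = 2 coefficients
coefs <- rbind(c(1.0, 2.0),
               c(1.2, 1.8),
               c(0.8, 2.2))
vcovs <- list(diag(c(0.04, 0.09)),
              diag(c(0.04, 0.09)),
              diag(c(0.04, 0.09)))

pooled <- combine.imputations(coefs, vcovs)
print(pooled$coefficients)
print(pooled$var)
print(pooled$vif)
```

The vif component is the ratio of total to within-imputation variance,
which is one plausible reading of the "Variance Inflation Factors Due to
Imputation" printed in the example.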

On another note:
The document "An Introduction to S-PLUS and the Hmisc and Design
Libraries" by Alzola & Harrell has been much improved.   Thanks to the
Y & Y LaTeX system the document also has hyperreferences and bookmarks.
Libraries and documents are available from our web page, and will be 
updated on StatLib tonight.
---------------------------------------------------------------------------
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Director, Division of Biostatistics and Epidemiology
Dept of Health Evaluation Sciences
University of Virginia School of Medicine
http://www.med.virginia.edu/medicine/clinical/hes/biostat.htm

> # Example with completely random missing data
> set.seed(1)
> x1 <- factor(sample(c('a','b','c'),100,T))
> x2 <- (x1=='b') + 3*(x1=='c') + rnorm(100)
> y  <- x2 + 1*(x1=='c') + rnorm(100)
> x1[1:20] <- NA
> x2[18:23] <- NA
> n <- naclus(data.frame(x1,x2,y))
> plot(n); naplot(n)  # Show patterns of NAs
> f  <- transcan(~y + x1 + x2, n.impute=10, shrink=F)
> options(digits=3)
> summary(f)

transcan(x =  ~ y + x1 + x2, n.impute = 10, shrink = F)

R-squared achieved in predicting each variable:

     y   x1    x2 
 0.905 0.74 0.839

Adjusted R-squared:

     y    x1    x2 
 0.898 0.718 0.826

Coefficients of canonical variates for predicting each (row) variable

       y    x1    x2 
 y        0.37  0.61
x1  1.23       -0.03
x2  1.06 -0.02      

Summary of imputed values

x1 
   n missing unique Mean 
 200 0       3      2.12

1 (55, 28%), 2 (66, 33%), 3 (79, 40%) 

x2 
  n missing unique Mean      .05      .10      .25      .50      .75      .90      .95 
 60 0       55     1.55 -1.62653 -1.38461  0.04995  1.66340  3.35163  3.81074  4.12569

lowest : -1.6265 -1.5601 -1.3651 -0.9406 -0.9054
highest:  3.9874  4.1248  4.1420  4.7891  5.0992 

Starting estimates for imputed values:

     y x1 x2 
 0.814  1  1

> attr(f,'imputed')

$y:
NULL

$x1:
   1 2 3 4 5 6 7 8 9 10 
 1 1 2 1 2 2 2 2 2 2  2
 2 2 1 2 2 1 2 2 2 2  1
 3 3 2 2 3 2 1 3 2 3  2
 4 2 1 1 2 2 1 2 2 2  2
 5 2 2 1 1 2 1 1 2 1  1
 6 3 3 2 3 3 2 2 3 3  3
 7 1 1 1 1 1 1 1 1 1  1
 8 3 3 3 3 3 3 3 3 3  3
 9 1 2 2 2 2 2 2 1 2  2
10 1 1 2 3 1 1 2 2 1  1
11 3 3 3 3 3 3 3 2 3  3
12 3 3 3 3 3 3 3 3 3  3
13 1 1 2 1 1 1 1 1 1  1
14 3 3 3 3 3 3 3 3 3  3
15 2 2 2 2 2 2 2 2 2  3
16 1 1 3 2 2 1 1 1 1  2
17 3 3 3 3 2 3 3 3 2  3
18 3 3 3 2 3 2 3 3 3  3
19 1 1 3 1 1 1 2 1 2  1
20 3 3 3 3 3 2 3 3 3  3

$x2:
        1     2     3      4     5      6      7      8      9     10 
18  1.866  2.37  1.79  1.995 2.517  2.153  2.668  2.259  3.527  1.786
19 -0.330 -1.56 -1.63  1.537 0.711  1.251 -1.365  0.217 -0.568 -0.144
20  4.142  3.72  5.10  3.840 3.371  3.987  3.345  2.779  3.424  2.573
21  0.115  1.15 -1.63  0.518 0.947  1.078 -1.627  0.935  1.448 -0.905
22  3.605  4.79  2.40  4.125 3.657  2.790  3.456  3.807  3.306  3.675
23  0.793 -1.63  1.54 -0.586 0.612 -0.941  0.731 -0.533 -0.324 -1.627

> f  <- transcan(~y + x1 + x2, n.impute=10, shrink=T)
> summary(f)

transcan(x =  ~ y + x1 + x2, n.impute = 10, shrink = T)

R-squared achieved in predicting each variable:

     y    x1   x2 
 0.904 0.739 0.84

Adjusted R-squared:

     y    x1    x2 
 0.897 0.718 0.826

Shrinkage factors:

     y    x1    x2 
 0.937 0.952 0.939

Coefficients of canonical variates for predicting each (row) variable

       y    x1    x2 
 y       -0.35  0.57
x1  1.17       -0.03
x2  1.00  0.02      

Summary of imputed values

x1 
   n missing unique  Mean 
 200 0       3      2.195

1 (47, 24%), 2 (67, 34%), 3 (86, 43%) 

x2 
  n missing unique  Mean     .05     .10     .25     .50     .75     .90     .95 
 60 0       53     1.651 -1.6265 -0.5036  0.3479  1.7213  3.2052  3.7244  3.9763

lowest : -1.6265 -1.1599 -1.0086 -0.4474 -0.4165
highest:  3.8195  3.9647  4.1976  4.6530  4.7187 

Starting estimates for imputed values:

     y x1 x2 
 0.814  1  1

> h <- fit.mult.impute(y ~ x1 + x2, lm, f)

Variance Inflation Factors Due to Imputation:

 (Intercept)  x11  x12   x2 
        1.24 1.27 1.26 1.28

> h

Coefficients:
 (Intercept)    x11   x12    x2 
       0.366 0.0892 0.494 0.954 #AVERAGE OVER IMPUTATIONS

Degrees of freedom: 100 total; 96 residual
Residual standard error: 0.888  #NOTE: 0.888 is from last imputation

> diag(Varcov(h))
[1] 0.02306 0.01699 0.01188 0.00897

> h.complete <- lm(y ~ x1 + x2, na.action=na.omit)
> h.complete

Coefficients:
 (Intercept)    x11   x12    x2 
        0.35 0.0689 0.465 0.934

Degrees of freedom: 77 total; 73 residual
Dropped 23 cases due to missing values 
Residual standard error: 0.928 

> diag(Varcov(h.complete))
[1] 0.0276 0.0182 0.0141 0.0108  # NOTE: larger than from imputing

# Note: had Design's ols function been used in place of lm, any
# function run on h (anova, summary, etc.) would have automatically
# used imputation-corrected variances and covariances
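
The two sets of variances printed above can be compared directly.  A
small sketch in R, using only the numbers shown in this message (the
variable names are made up for illustration):

```r
# Diagonals of Varcov() copied from the output above
v.imputed  <- c(0.02306, 0.01699, 0.01188, 0.00897)  # from fit.mult.impute
v.complete <- c(0.0276, 0.0182, 0.0141, 0.0108)      # from lm with na.omit
names(v.imputed) <- names(v.complete) <- c("(Intercept)", "x11", "x12", "x2")

# A ratio above 1 means the complete-case variance is the larger one
ratio <- v.complete / v.imputed
print(round(ratio, 2))
```

Every ratio exceeds 1 (roughly 1.07 to 1.20), so in this example
dropping the 23 incomplete cases costs more precision than multiple
imputation does.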
