Re: bootstrapping of multiple datasets

To: Tim Hesterberg <timh@insightful.com>
Subject: Re: bootstrapping of multiple datasets
From: Peng Huang <huangp@musc.edu>
Date: Thu, 29 Jun 2006 14:14:19 -0400
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <SE2KEXCH01jE5aHaZVr00000276@se2kexch01.insightful.com>
References: <6.0.1.1.2.20060616090403.01bcc0a8@email.med.yale.edu> <SE2KEXCH01jE5aHaZVr00000276@se2kexch01.insightful.com>
Hi Tim,

How can we find complete information about which libraries are available and which routines each library contains? Thanks!

Peng

Tim Hesterberg wrote:

From your description, it sounds like you want to just run a calculation
for each of the 20 data sets.  This could be done using miApply() or
miEval() in the missing data library (use library(missing)).
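
The per-data-set calculation can also be sketched with base S functions
alone. This is only an illustration under stated assumptions: it uses
split() and lapply() rather than miApply() or miEval(), whose arguments are
not shown in this thread; the data frame `stacked`, the index column `imp`,
and the simulated values are all hypothetical; and it was written against R
semantics, so it may need small changes in S-PLUS.

```r
# Simulated stand-in for the stacked file: 20 imputed data sets, 50 rows each.
# 'imp' is a hypothetical name for the imputation index (1, 2, ..., 20).
set.seed(1)
n.imp <- 20
n.obs <- 50
stacked <- data.frame(
  imp = rep(1:n.imp, each = n.obs),
  predictor = rnorm(n.imp * n.obs)
)
stacked$response <- 2 + 3 * stacked$predictor + rnorm(n.imp * n.obs)

# Split the stacked data by imputation index and fit the same model to each piece.
pieces <- split(stacked, stacked$imp)
fits <- lapply(pieces, function(d) lm(response ~ predictor, data = d))

# One row of coefficients per imputed data set (a 20 x 2 matrix).
coefs <- t(sapply(fits, coef))
print(dim(coefs))
print(apply(coefs, 2, mean))
```

The matrix `coefs` holds only the point estimates. Its spread reflects
variation across imputations, not sampling variation.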

I would recommend that you not use the bootstrap() function to do the 20
calculations, because that could lead to confusion.  From my understanding
of your intent, it actually has nothing to do with bootstrapping.
Doing so would capture only one of the two sources of variation: it would
capture variation due to multiple imputations, but not variation due to
the original sample being randomly sampled from a population, which is
what bootstrapping is for.  Bootstrap confidence interval routines would
therefore give confidence intervals that are much too short.

There are routines in the missing data library that incorporate
both sources of variation, provided that the results of each imputation
include standard errors that reflect the sampling variability.
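
To show how the two sources of variation can be combined, here is a
minimal sketch of the standard combining rule for multiple imputation
(Rubin's rules). It is not taken from the missing data library, whose
routines may differ in detail, and the estimates and standard errors below
are made-up numbers for illustration only.

```r
# Hypothetical per-imputation results: one estimate and one standard error each.
est <- c(2.9, 3.1, 3.0, 3.2, 2.8)
se  <- c(0.20, 0.22, 0.21, 0.19, 0.20)
m   <- length(est)

pooled.est <- mean(est)     # combined point estimate
within     <- mean(se^2)    # average sampling variance (within imputations)
between    <- var(est)      # variance of the estimates across imputations

# Total variance adds both sources; (1 + 1/m) corrects for finite m.
total.var <- within + (1 + 1/m) * between
pooled.se <- sqrt(total.var)

print(c(estimate = pooled.est, se = pooled.se))
print(sqrt(between))        # spread across imputations alone, for comparison
```

The comparison at the end shows why intervals built only from the spread
across imputations come out too short: the within-imputation term is
missing.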

Tim Hesterberg

Dear S folks,

I occasionally use S-PLUS through the menus and wonder whether anyone is willing to share a few lines of code for the following task, which is essentially data management.

I have twenty imputed datasets because my original data had 20% missing values, which appear to be MAR according to the ISNI index of Troxel et al. I want to bootstrap the coefficients of a model that will be fit to each of the twenty imputed datasets. The logical approach seems to be to bootstrap each of the 20 datasets, then combine all the output and take the grand mean and grand percentiles to construct the final bootstrap estimate and confidence intervals.

The bootstrap command from the point and click menu works like this:

coef(lm(response~predictor, data.frame))

What I want to do is run the bootstrap function for twenty different data sets that are combined in one data file and indexed by a variable called _imputation_ = 1, 2, ..., 20.

Any ideas on how I might code this in S-PLUS?

Thanks in advance for your help.

Terrence E. Murphy, Ph.D.
Program on Aging
Yale University
1 Church St., 7th Floor
New Haven, CT 06437
terrence.murphy@yale.edu
phone: 203-764-9805
fax: 203-764-9831
