
Re: impute.transcan from the Hmisc library

To: f.harrell@vanderbilt.edu
Subject: Re: impute.transcan from the Hmisc library
From: gerald.jean@dgag.ca
Date: Fri, 29 May 2009 13:49:24 -0400
Cc: s-news@wubios.wustl.edu
In-reply-to: <4A1F1256.2050304@vanderbilt.edu>

Hello all,

Yes, updating "transcan" and its associated functions did the trick.
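
A minimal sketch of the single-imputation workflow discussed in this thread,
for readers of the archive. It assumes R with a current Hmisc installed (in
S+ the library call differs); the variable names, the seed, and the choice of
pl = FALSE and pr = FALSE to silence plots and printing are illustrative and
not taken from the original data.

```r
library(Hmisc)

# Fix the random seed so the example is reproducible
set.seed(1)

# Two illustrative variables; x2 gets four missing values
x1 <- rnorm(20, 0, 1)
x2 <- rnorm(20, 0.25, 1)
x2[c(2, 8, 11, 20)] <- NA
d <- data.frame(x1, x2)

# Single imputation: imputed = TRUE stores the imputed values in the fit
z <- transcan(~ x1 + I(x2), imputed = TRUE, data = d,
              pl = FALSE, pr = FALSE)

# list.out = TRUE returns the completed variables as a list
# instead of creating objects in the working directory
completed <- impute(z, data = d, list.out = TRUE,
                    pr = FALSE, check = FALSE)

# Count the remaining missing values per variable
n.missing <- sapply(completed, function(x) sum(is.na(x)))
print(n.missing)

# Convert the list back to a data frame if one is needed
d.completed <- as.data.frame(completed)
```

If the counts printed at the end are not all zero, the installed transcan
code may predate the fix mentioned in the quoted messages.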

Thanks again to Frank Harrell for his kind support,

Gérald Jean
Senior Statistical Advisor,
VP Planning and Market Development,
Desjardins Groupe d'Assurances Générales
telephone            : (418) 835-4900 ext. 7639
fax                  : (418) 835-6657
e-mail               : gerald.jean@dgag.ca

"In God we trust, all others must bring data"  W. Edwards Deming

Frank E Harrell Jr <f.harrell@vanderbilt.edu> wrote on 2009/05/28
18:38:14:

> gerald.jean@dgag.ca wrote:
> > Hello Frank,
> >
> > Frank E Harrell Jr <f.harrell@vanderbilt.edu> wrote on 2009/05/26
> > 18:05:29:
> >
> >> gerald.jean@dgag.ca wrote:
> >>> Hello there,
> >>>
> >>> TIBCO Spotfire S+ Version 8.1.1 for Linux 2.6.9-34.EL, 64-bit : 2008
> >>>
> >>> I am trying to impute missing values from roughly 100 variables in a
> >>> data frame.  Running "transcan" is no problem, it converges and gives
> >>> sensible results.  The problem arises when trying to impute the missing
> >>> values in the original data set.  The goal is getting a new data frame
> >>> from the old one, with all missing values imputed.  Due to the large
> >>> number of variables to be imputed it is not practical to use, as in the
> >>> help file,
> >>>
> >>>> var.name <- impute(x.trans, var.name)
> >>> I know that I can use, to impute all variables at once:
> >>>
> >>>> impute(x.trans)
> >>> But with a large number of variables and a large number of objects in
> >>> the working directory it clutters this directory with too many objects.
> >>> The method suggested in the help file:
> >>>
> >>> ## Not run:
> >>> xt <- transcan(~. , data=mine,
> >>>                imputed=TRUE, shrink=TRUE, n.impute=10, trantab=TRUE)
> >>> attach(mine, pos=1, use.names=FALSE)
> >>> impute(xt, imputation=1) # use first imputation
> >>> detach(1, 'mine2')
> >>> ## End(Not run)
> >> Gerald,
> >>
> >> # Here is how to create a completed dataset
> >> d <- data.frame(x1, x2)
> >> z <- transcan(~x1 + I(x2), n.impute=5, data=d)
> >> imputed <- impute(z, imputation=1, data=d,
> >>                    list.out=TRUE, pr=FALSE, check=FALSE)
> >> sapply(imputed, function(x)sum(is.imputed(x)))
> >> sapply(imputed, function(x)sum(is.na(x)))
> >>
> >> In my test NAs are imputed.  I don't know if there is a difference
> >> between S+ and R here.  Before doing anything else give a data=
> >> argument to transcan and see what happens.
> >>
> >> Frank
> >>
> > First, thanks for your reply.  I tested your example and it works like a
> > charm in S+.  This puzzled me, since I was using the "data" argument of
> > the impute function.  Hence I looked for what was different between your
> > use of transcan and impute and my own.  I saw that you were using the
> > "n.impute" argument of transcan and the "imputation" argument of impute.
> > I didn't use those since my application didn't require multiple
> > imputation.  I then tried to re-run my problem with those arguments and
> > it worked with no problem.  I then re-ran your example without doing
> > multiple imputation and it did not work.  Here is exactly what I did:
> >
> > ttt.x1 <- rnorm(20, 0, 1)
> > ttt.x2 <- rnorm(20, 0.25, 1)
> > ttt.x2[c(2, 8, 11, 20)] <- NA
> > ttt.x2
> > library(section = "Hmisc", pos = 3)
> > ttt.d <- data.frame(ttt.x1, ttt.x2)
> > ttt.z <- transcan(~ttt.x1 + I(ttt.x2), imputed = TRUE, data = ttt.d)
> > ttt.imputed <- impute(ttt.z, data = ttt.d,
> >                    list.out=TRUE, pr=FALSE, check=FALSE)
> > sapply(ttt.imputed, function(x)sum(is.imputed(x)))
> >  ttt.x1 ttt.x2
> >       0      0
> > sapply(ttt.imputed, function(x)sum(is.na(x)))
> >  ttt.x1 ttt.x2
> >       0      4
> >
> > Hence, if I get this right, the "n.impute" argument to transcan MUST be
> > used for this type of application?
>
> I tried your example in R and it worked fine without n.impute.  If you
> want to see if there were any updates to R that fixed an old problem,
> run
> source('http://biostat.mc.vanderbilt.edu/cgi-bin/viewvc.cgi/*checkout*/Hmisc/trunk/R/transcan.s?rev=611')
> and then re-try your code.
>
> Next time preface your code with set.seed(1).
>
>
> >
> > Thanks again, and thanks for such great libraries, I use them,
> > particularly Hmisc, all the time!
>
> You're welcome Gerald.
> Frank
>
> >
> > Gérald
> >
> >>> may work in R but not in S+, the "save" argument to "detach" is not
> >>> currently supported.
> >>>
> >>> I then tried setting the "list.out" argument to TRUE.
> >>>
> >>> ttt.impute <- impute(RTA.Socio.transcan, data = TelePV.RTA.Socio,
> >>>                      list.out = TRUE)
> >>>
> >>> According to the help file:
> >>>
> >>>    list.out
> >>>           If var is not specified, you can set list.out=TRUE to have
> >>>           impute.transcan return a list containing variables with
> >>>           needed values imputed. This list will contain a single
> >>>           imputation.
> >>>
> >>> It runs OK but the values are not imputed:
> >>>
> >>>> lapply(ttt.impute, FUN = function(x) sum(is.na(x)))
> >>> $adhesion:
> >>> [1] 161
> >>> $X1.c06poparea:
> >>> [1] 8
> >>> $X2.c06dwlgarea:
> >>> [1] 8
> >>> $X3.pctpop0.14:
> >>> [1] 9
> >>> $X40.avg.persfam:
> >>> [1] 8
> >>> $X41.pct.move1an:
> >>> [1] 11
> >>> ...
> >>>
> >>> exactly the original missing values?  Am I missing something?  I have
> >>> been banging my head since yesterday trying to do this, to no avail.  I
> >>> can do it manually through a loop but I thought this was the purpose of
> >>> "impute".
> >>>
> >>> Thanks for any insights,
> >>>
> >>> Gérald Jean
> >>>
> >>>
> >> --
> >> Frank E Harrell Jr   Professor and Chair           School of Medicine
> >>                       Department of Biostatistics   Vanderbilt University
> >>
>
>
> --
> Frank E Harrell Jr   Professor and Chair           School of Medicine
>                       Department of Biostatistics   Vanderbilt University


