s-news

Re: Factor handling question.

To: s-news@wubios.wustl.edu, tplate@acm.org
Subject: Re: Factor handling question.
From: gerald.jean@dgag.ca
Date: Wed, 19 Dec 2007 10:43:35 -0500
In-reply-to: <47684BEC.8080909@acm.org>

Thanks to Tony Plate, Lawrence Hunsicker and Andreas Krause for their
replies.

In my opinion, Tony's solution was the most appropriate and the most
elegant for my problem.

Thanks again,

Gérald Jean
Senior Statistical Advisor, Actuarial
telephone            : (418) 835-4900 ext. 7639
fax                  : (418) 835-6657
e-mail               : gerald.jean@dgag.ca


"In God we trust, all others must bring data"  W. Edwards Deming


                                                                           
From: Tony Plate <tplate@acm.org>
Date: 2007/12/18 17:38
To: gerald.jean@dgag.ca
cc: s-news@wubios.wustl.edu
Subject: Re: [S] Factor handling question.




Try using factor(..., exclude=NULL), e.g.:

 > b <- c(-Inf, 0, 1, 3, Inf)
 > x <- c(-2:8,NA)
 > factor(cut(x, breaks=b), levels=c(1:4,NA), labels=LETTERS[1:5],
 +        exclude=NULL)
  [1] A A A B C C D D D D D E
 >

-- Tony Plate
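
For readers working in R rather than S-PLUS, the same goal (bin a numeric variable, keep the missing values as a named level, and pick the reference level) can be sketched as below. This is only an illustration under assumptions, not the code from the thread: the vector `x`, the three-bin breaks and the English labels are made up for the example, and `addNA()` and `relevel()` are R functions standing in for the S-PLUS `na.include()` and the re-levelling call to `factor()`.

```r
# Hypothetical data standing in for my.data$var1 (not from the thread)
x   <- c(250, 750, 1200, NA, 12000, NA)
brk <- c(-Inf, 500, 1000, Inf)
lab <- c("500 or less", "501 - 1000", "Over 1000")

# Bin the values; at this point missing values are NA, not a level
f <- cut(x, breaks = brk, labels = lab)

# Turn NA into an explicit level, then give that level a readable name
f <- addNA(f)
levels(f)[is.na(levels(f))] <- "Missing"

# Move the desired reference level to the front
f <- relevel(f, ref = "501 - 1000")

levels(f)
table(f)
```

Naming the missing level after the factor is built avoids having to pass matching `levels` and `labels` vectors of the same length, which is where the original call ran into trouble.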


gerald.jean@dgag.ca wrote:
> Hello there,
>
> I am creating a factor variable from a continuous variable with many
> "NA" values.  The approach I would most naturally use doesn't work and I
> don't understand why.  Here is the code:
>
>> ttt.visa <- factor(na.include(cut(my.data$var1,
> +       breaks = c(-Inf, 500, 1000, 1500, 2000, 3000,
> +                  4000, 5000, 7500, 10000, Inf),
> +       labels = c("  500 et moins", "  501 -  1000 ",
> +                  " 1001 -  1500 ", " 1501 -  2000 ",
> +                  " 2001 -  3000 ", " 3001 -  4000 ",
> +                  " 4001 -  5000 ", " 5001 -  7500 ",
> +                  " 7501 - 10000 ", " Plus de 10000"),
> +       include.lowest = T, factor.result = T)),
> +     levels = c("  501 -  1000 ", "  500 et moins", " 1001 -  1500 ",
> +                " 1501 -  2000 ", " 2001 -  3000 ", " 3001 -  4000 ",
> +                " 4001 -  5000 ", " 5001 -  7500 ", " 7501 - 10000 ",
> +                " Plus de 10000", "NA"),
> +     labels = c("  501 -  1000 ", "  500 et moins", " 1001 -  1500 ",
> +                " 1501 -  2000 ", " 2001 -  3000 ", " 3001 -  4000 ",
> +                " 4001 -  5000 ", " 5001 -  7500 ", " 7501 - 10000 ",
> +                " Plus de 10000", " Manquant     "))
> Problem in factor(na.include(cut(FMD.final.OK$mnt.limi.vi..: invalid labels
> argument, length 11 should be 10 or 1
> Use traceback() to see the call stack
>
> The call to "factor" is only there to re-assign the reference level and
> to give the missing values a meaningful name.
>
> On the other hand, the following code works perfectly well, even though I
> am not merging any levels:
>
> ttt.visa <- merge.levels(na.include(cut(my.data$var1,
>       breaks = c(-Inf, 500, 1000, 1500, 2000, 3000,
>                  4000, 5000, 7500, 10000, Inf),
>       labels = c("  500 et moins", "  501 -  1000 ",
>                  " 1001 -  1500 ", " 1501 -  2000 ",
>                  " 2001 -  3000 ", " 3001 -  4000 ",
>                  " 4001 -  5000 ", " 5001 -  7500 ",
>                  " 7501 - 10000 ", " Plus de 10000"),
>       include.lowest = T, factor.result = T)),
>     k = list("  501 -  1000 " = "  501 -  1000 ",
>              "  500 et moins" = "  500 et moins",
>              " 1001 -  1500 " = " 1001 -  1500 ",
>              " 1501 -  2000 " = " 1501 -  2000 ",
>              " 2001 -  3000 " = " 2001 -  3000 ",
>              " 3001 -  4000 " = " 3001 -  4000 ",
>              " 4001 -  5000 " = " 4001 -  5000 ",
>              " 5001 -  7500 " = " 5001 -  7500 ",
>              " 7501 - 10000 " = " 7501 - 10000 ",
>              " Plus de 10000" = " Plus de 10000",
>              " Manquants    " = "NA"))
>
> Any insight into why "factor" doesn't work while "merge.levels" does?
> Also, are there other ways to do this more efficiently, perhaps with less
> coding?
>
> Thanks,
>
> Gérald Jean
> Senior Statistical Advisor, Actuarial
> telephone            : (418) 835-4900 ext. 7639
> fax                  : (418) 835-6657
> e-mail               : gerald.jean@dgag.ca
>
> "In God we trust, all others must bring data"  W. Edwards Deming
>
> This communication (and/or the attachments) is intended for named
> recipients only and may contain privileged or confidential information
> which is not to be disclosed. If you received this communication by
> mistake please destroy all copies.
>



