s-news

Factor handling question.

To: s-news@wubios.wustl.edu
Subject: Factor handling question.
From: gerald.jean@dgag.ca
Date: Tue, 18 Dec 2007 16:07:20 -0500
Hello there,

I am creating a factor variable from a continuous variable with many missing
values ("NA"). The approach I would most naturally use doesn't work, and I
don't understand why. Here is the code:

> ttt.visa <- factor(na.include(cut(my.data$var1,
+                                   breaks = c(-Inf, 500, 1000, 1500, 2000, 3000,
+                                   4000, 5000, 7500, 10000, Inf),
+                                   labels =c("  500 et moins", "  501 -  1000 ",
+                                     " 1001 -  1500 ", " 1501 -  2000 ",
+                                     " 2001 -  3000 ", " 3001 -  4000 ",
+                                     " 4001 -  5000 ", " 5001 -  7500 ",
+                                     " 7501 - 10000 ", " Plus de 10000"),
+                                   include.lowest = T, factor.result = T)),
+              levels = c("  501 -  1000 ", "  500 et moins", " 1001 -  1500 ",
+                " 1501 -  2000 ", " 2001 -  3000 ", " 3001 -  4000 ",
+                " 4001 -  5000 ", " 5001 -  7500 ", " 7501 - 10000 ",
+                " Plus de 10000", "NA"),
+              labels = c("  501 -  1000 ", "  500 et moins", " 1001 -  1500 ",
+                " 1501 -  2000 ", " 2001 -  3000 ", " 3001 -  4000 ",
+                " 4001 -  5000 ", " 5001 -  7500 ", " 7501 - 10000 ",
+                " Plus de 10000", " Manquant     "))
Problem in factor(na.include(cut(FMD.final.OK$mnt.limi.vi..: invalid labels argument, length 11 should be 10 or 1
Use traceback() to see the call stack

The call to "factor" is only there to reassign the reference level and to
give the missing values a meaningful name.
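One possible reading of the error, offered as an assumption rather than a diagnosis: the message says factor() received 11 labels but only 10 levels, as if the missing-value entry had been dropped from "levels". R, which shares the S language but may differ from S-PLUS on this point, behaves that way: its factor() removes NA from "levels" by default (exclude = NA). The small R sketch below uses made-up data to reproduce a comparable error and to show that exclude = NULL keeps the missing-value level so it can be labelled.

```r
# Made-up data (not the original my.data$var1): a factor whose missing
# value is kept as an explicit level
x <- factor(c("a", "b", NA), exclude = NULL)

# Default exclude = NA drops NA from 'levels', leaving 2 levels for 3 labels,
# so R stops with an "invalid 'labels'" error much like the S-PLUS one
res <- tryCatch(
  factor(x, levels = c("a", "b", NA), labels = c("A", "B", "Missing")),
  error = function(e) e
)
print(conditionMessage(res))

# With exclude = NULL the NA level is kept: "b" becomes the reference
# (first) level and the missing values get a readable label
ok <- factor(x, levels = c("b", "a", NA), labels = c("B", "A", "Missing"),
             exclude = NULL)
print(levels(ok))
```

Whether S-PLUS's factor() has the same exclude argument and default is not confirmed here; its help page would settle that.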

On the other hand, the following code works perfectly well, even though it
doesn't actually merge any levels:

ttt.visa <- merge.levels(na.include(cut(my.data$var1,
                                  breaks = c(-Inf, 500, 1000, 1500, 2000, 3000,
                                  4000, 5000, 7500, 10000, Inf),
                                  labels =c("  500 et moins", "  501 -  1000 ",
                                    " 1001 -  1500 ", " 1501 -  2000 ",
                                    " 2001 -  3000 ", " 3001 -  4000 ",
                                    " 4001 -  5000 ", " 5001 -  7500 ",
                                    " 7501 - 10000 ", " Plus de 10000"),
                                  include.lowest = T, factor.result = T)),
                         k = list("  501 -  1000 " = "  501 -  1000 ",
                                  "  500 et moins" = "  500 et moins",
                                  " 1001 -  1500 " = " 1001 -  1500 ",
                                  " 1501 -  2000 " = " 1501 -  2000 ",
                                  " 2001 -  3000 " = " 2001 -  3000 ",
                                  " 3001 -  4000 " = " 3001 -  4000 ",
                                  " 4001 -  5000 " = " 4001 -  5000 ",
                                  " 5001 -  7500 " = " 5001 -  7500 ",
                                  " 7501 - 10000 " = " 7501 - 10000 ",
                                  " Plus de 10000" = " Plus de 10000",
                                  " Manquants    " = "NA"))

Any insight into why "factor" fails while "merge.levels" works? Also, are
there other ways to do this more efficiently, perhaps with less code?
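On the second question, here is one more compact route, sketched in R rather than S-PLUS, so whether it carries over is an assumption: cut() builds the factor, addNA() turns the missing values into an explicit level, that level is renamed, and relevel() moves "  501 -  1000 " to the front as the reference level. The vector var1 is a hypothetical stand-in for my.data$var1, and addNA() and relevel() are R functions that may not exist in S-PLUS.

```r
# Hypothetical stand-in for my.data$var1
var1 <- c(250, 800, NA, 1200, 12000, NA, 4500)

brks <- c(-Inf, 500, 1000, 1500, 2000, 3000, 4000, 5000, 7500, 10000, Inf)
labs <- c("  500 et moins", "  501 -  1000 ", " 1001 -  1500 ",
          " 1501 -  2000 ", " 2001 -  3000 ", " 3001 -  4000 ",
          " 4001 -  5000 ", " 5001 -  7500 ", " 7501 - 10000 ",
          " Plus de 10000")

# Bin the continuous variable; NA values stay NA at this point
f <- cut(var1, breaks = brks, labels = labs, include.lowest = TRUE)

# Make the missing values an explicit level, then give it a readable name
f <- addNA(f)
levels(f)[is.na(levels(f))] <- " Manquant     "

# Put "  501 -  1000 " first so it serves as the reference level
f <- relevel(f, ref = "  501 -  1000 ")
print(table(f))
```

Renaming the NA level in place avoids restating all eleven labels a second time, which is where most of the repetition in the original call comes from.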

Thanks,

Gérald Jean
Senior Statistical Advisor, Actuarial
Telephone: (418) 835-4900 ext. 7639
Fax:       (418) 835-6657
Email:     gerald.jean@dgag.ca

"In God we trust, all others must bring data"  W. Edwards Deming
