s-news

RE: Columns of data.frame converted from numeric to character .

To: "'Barnali Das'" <DASB@WESTAT.com>, "'Gerald.Jean@spgdag.ca'" <Gerald.Jean@spgdag.ca>, s-news@wubios.wustl.edu
Subject: RE: Columns of data.frame converted from numeric to character .
From: "Gunter, Bert" <bert_gunter@merck.com>
Date: Wed, 29 Nov 2000 12:47:51 -0500
Actually, it's a bit more interesting than that. The behavior you report is
EXPECTED. Here's why:

        > z_data.frame(1:3,2:4)
        > z
          X1 X2 
        1  1  2
        2  2  3
        3  3  4
        > z[,2]_letters[1:3]
        > z
          X1 X2 
        1  1  a
        2  2  b
        3  3  c
        > apply(z,2,mode)
                  X1          X2 
         "character" "character"

        #### This is what we expect because (I assume) apply() works on arrays
        ### and therefore must implicitly coerce everything to character to get
        ### an array, which must be of one mode only.
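
The coercion described above can be reproduced directly. This is a minimal sketch, assuming a current R session rather than the S-PLUS version used in this post; the factor column is built explicitly so the result does not depend on any version's default character-to-factor conversion, and `m` is a name introduced only for illustration.

```r
# Hypothetical data frame mirroring z above; X2 is an explicit factor.
z <- data.frame(X1 = 1:3, X2 = factor(c("a", "b", "c")))

# apply() needs an array, so the data frame is first turned into a matrix.
# A matrix has a single mode, so the numeric column is carried along
# to character together with the factor labels.
m <- as.matrix(z)
mode(m)

# Consequently every column reports the same mode.
apply(z, 2, mode)
```

Calling as.matrix() by hand is only a way to make the hidden step visible; apply() performs an equivalent conversion internally.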

        #### However, what's going on here?

        > lapply(z,mode)
        $X1:
        [1] "numeric"

        $X2:
        [1] "numeric"

        ### One should have gotten "numeric" and "character", right?
        ## Here's the reason:

        > lapply(z,is.factor)
        $X1:
        [1] F

        $X2:
        [1] T

        ### The data.frame constructor by default coerces character data to a
        ### factor, which is an object of mode "numeric" (integer, actually)
        ### with a levels attribute that is "character".
        ### Thus:

        > lapply(z,levels)
        $X1:
        NULL

        $X2:
        [1] "a" "b" "c"
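
To see why a factor reports mode "numeric", it may help to look inside one. The following is a small sketch, assuming a current R session; `f` and `codes` are hypothetical names that do not appear in the session above.

```r
# A factor built by hand from the same three letters.
f <- factor(c("a", "b", "c"))

mode(f)              # the stored values are numeric codes, not strings

codes <- unclass(f)  # strip the class to expose the underlying object
is.integer(codes)    # the codes are integers ...
attributes(codes)    # ... and the labels live in the "levels" attribute

as.character(f)      # the labels are recovered by looking up each code
```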

        ### The array constructor that is implicitly called by the use of
        ### apply() coerced the factor to "character" and then the whole array
        ### (both columns) to character to minimize information loss.

        ## So everything is actually working as it's supposed to. If you want
        ## to avoid the automatic factor coercion, you should do the following:

        > z_data.frame(1:3,4:6)

        > z[,2]_I(letters[1:3])  ## Note the use of I(), the identity function, which keeps the vector as is and prevents auto coercion
        > lapply(z,mode)
        $X1:
        [1] "numeric"

        $X2:
        [1] "character"
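
The same idea can be sketched at construction time. This assumes a current R session, where the defaults for character columns may differ from the S-PLUS version shown in this post; `modes` and `is_fac` are illustrative names only.

```r
# Wrapping the character vector in I() asks data.frame() to keep it as is.
z <- data.frame(X1 = 1:3, X2 = I(c("a", "b", "c")))

modes  <- sapply(z, mode)       # per-column mode
is_fac <- sapply(z, is.factor)  # neither column should be a factor

modes
is_fac
```

Because I() does not rely on a default setting, the protected column stays character whether or not the surrounding call would otherwise convert strings to factors.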


        Cheers,

Bert Gunter
Biometrics Research
Merck & Company
PO Box 200, Rahway, NJ 07065-0900
Ph: (732) 594-7765    Fax: 594-1565

"The business of the statistician is to catalyze the scientific learning
process."    --  George E.P. Box

