Re: Burned by factors

To: alan.hochberg@prosanos.com
Subject: Re: Burned by factors
From: Terry Therneau <therneau@mayo.edu>
Date: Mon, 17 Mar 2008 09:09:06 -0500 (CDT)
Cc: s-news@lists.biostat.wustl.edu
Reply-to: Terry Therneau <therneau@mayo.edu>
I had to chuckle at Doug's synopsis:

> Yes.  This tends to be a religious issue with fundamentalists in both
> camps.  I believe that the "factors are beneficial" camp is larger
> than the "factors are the work of the devil" camp.

  I think factors are useful, but not every character variable should be a 
factor.  
  In some releases of Splus and R, however, the "turn characters into factors" 
behavior was so ingrained that it was very hard to keep a character variable 
in a data frame at all, even when you wanted it that way.  I think this is the 
behavior that led to the most impassioned arguments about factors.  But that 
was a transient design flaw, now fixed and behind us.  (Not as transient as it 
should have been, but the difficulty of convincing certain people that it was 
a design flaw is another story.)
  
  In choosing between two behaviors:
    a. Splus/R automatically turns characters into factors, and the user turns 
       some of these back into strings;
    b. there is no automatic conversion, and the user turns the appropriate 
       strings into factors.
    
I vote strongly for (b).  In my experience I get caught by a "gotcha" less 
often, and nothing important is broken by taking that road.  But with current 
releases of the code, (a) also works well and seems to be preferred by more 
people.  

  Alan's suggestion of better warnings is an interesting idea.  The frequency of 
'a factor bit me' questions on the news lists argues that several of the 
anomalies that arise are far from obvious.  
  
        Terry T.