s-news

Re: Removing levels from a factor

To: "Austin, Matt" <maustin@amgen.com>
Subject: Re: Removing levels from a factor
From: Frank E Harrell Jr <f.harrell@vanderbilt.edu>
Date: Wed, 05 Mar 2008 12:41:31 -0600
Cc: "s-news@lists.biostat.wustl.edu" <s-news@lists.biostat.wustl.edu>
In-reply-to: <A413DCB0A7390F41B86A3EF6E2C9743A37BDAC6C69@usto-pmsg-mbs02.am.corp.amgen.com>
References: <A413DCB0A7390F41B86A3EF6E2C9743A37BDAC6C69@usto-pmsg-mbs02.am.corp.amgen.com>
User-agent: Thunderbird 2.0.0.6 (X11/20071022)

Austin, Matt wrote:
I'm not Terry, but I'll give my opinion.  I find factors quite useful and very 
natural--but I started programming using S and didn't migrate from another 
language/application.

When a modeling application implicitly coerces a vector from character to 
factor, how does it choose the reference level when using treatment-style 
contrasts? It sorts alphabetically, which means in treatment contrasts you 
would be comparing all other levels to the level that sorts alphabetically 
first. In most applications of medical research, we have a preconceived notion 
of what should be the reference (placebo, active comparator, lowest dose).  By 
using factors we can explicitly state the ordering.  This is also very useful 
for displaying data in either tables or figures: ordering is important, and it 
is taken care of when constructing the factor objects.
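
A minimal sketch of the reference-level point, assuming R (S-PLUS may differ in details). The arm names and the object names `arm`, `f_default`, and `f_explicit` are hypothetical and chosen only for illustration.

```r
# Hypothetical treatment arms; not taken from any real study.
arm <- c("placebo", "high dose", "low dose", "placebo", "low dose", "high dose")

# Default coercion: levels are sorted, so "high dose" sorts first and
# becomes the reference level under treatment contrasts.
f_default <- factor(arm)
levels(f_default)

# Explicit ordering: the first level listed ("placebo") becomes the reference.
f_explicit <- factor(arm, levels = c("placebo", "low dose", "high dose"))
levels(f_explicit)

# The contrast matrix has one column per non-reference level;
# each column is that level compared with the reference.
contr.treatment(levels(f_explicit))
```

The same level order is what tables and plots pick up, which is why setting it once when the factor is built can save work later.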

I suppose it's a tradeoff between having to explicitly drop unused levels when 
needed and having to take care of the ordering just before the 
modeling/display.  I chose the first.

--Matt

Matt Austin
Global Statistical Lead, PMO
Director, Biostatistics
Amgen, Inc.
maustin@amgen.com

I have to agree with Matt. In my experience and in the experience of a large number of R users in our department, the advantages of factors far outweigh the disadvantages.

Frank Harrell




-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu 
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Thompson, David (MNR)
Sent: Wednesday, March 05, 2008 7:07 AM
To: Terry Therneau; s-news@lists.biostat.wustl.edu; Mark.Hearnden@nt.gov.au
Subject: Re: [S] Removing levels from a factor

Terry,

Regarding the 'factors are only occasionally useful' comment, what are the 
situations where factors are _actually_ useful?
Do not (most?) modelling functions that require factors usually coerce 
character values as required?

Thanks, DaveT.
*************************************
Silviculture Data Analyst
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
david.john.thompson@ontario.ca
http://ofri.mnr.gov.on.ca
*************************************
-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Terry
Therneau
Sent: March 5, 2008 09:05 AM
To: s-news@lists.biostat.wustl.edu; Mark.Hearnden@nt.gov.au
Subject: Re: [S] Removing levels from a factor

The easiest thing is to turn off factors:

   options(stringsAsFactors=FALSE)
   data$Treatment <- as.character(data$Treatment)

Now you can subset the data frame and things will work as you would
anticipate.

Comment: factors are occasionally useful, but only occasionally.  Much
grief can be avoided by turning them off by default.  Our biostat group
(>100 people, over 1200 projects a year) has had the above option as a
part of our global defaults for many years, and has not yet seen a
downside to the decision.

      Terry Therneau
      Mayo Clinic
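
A minimal sketch of the two approaches discussed in this thread, assuming R (S-PLUS may differ in details). The data frame `dat` and its values are hypothetical; the column is built with an explicit `factor()` call so the example does not depend on the `stringsAsFactors` setting.

```r
# Hypothetical data; values are made up for illustration.
dat <- data.frame(Treatment = factor(c("A", "B", "C", "A", "B")),
                  y = c(1.2, 0.7, 3.1, 1.5, 0.9))

# Subsetting rows does not drop the unused level "C".
sub <- dat[dat$Treatment != "C", ]
levels(sub$Treatment)
table(sub$Treatment)   # "C" still appears, with a zero count

# Option 1: store the column as character, so there are no levels to carry.
sub_chr <- sub
sub_chr$Treatment <- as.character(sub_chr$Treatment)
table(sub_chr$Treatment)

# Option 2: keep the factor and rebuild it, which discards unused levels
# while preserving the order of the remaining ones.
sub_fac <- sub
sub_fac$Treatment <- factor(sub_fac$Treatment)
levels(sub_fac$Treatment)
```

Option 1 gives up explicit level ordering; Option 2 keeps it at the cost of one extra call after each subset.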
--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news
--------------------------------------------------------------------



--
Frank E Harrell Jr   Professor and Chair           School of Medicine
                     Department of Biostatistics   Vanderbilt University

