I'm not Terry, but I'll give my opinion. I find factors quite useful and very
natural--but I started programming using S and didn't migrate from another
language/application.
When a modeling application implicitly coerces a vector from character to
factor, how does it choose the reference level when using treatment-style
contrasts? It sorts the levels alphabetically, which means that with treatment
contrasts you would be comparing all other levels to the level that sorts
alphabetically first. In most applications of medical research, we have a
preconceived notion of what should be the reference (placebo, active
comparator, lowest dose). By using factors we can explicitly state the
ordering. This is also very useful for displaying data in either tables or
figures--ordering is important and this is taken care of when constructing the
factor objects.
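A minimal sketch of the point about reference levels, written in R syntax
(S-PLUS may differ in details such as its default contrasts, so
contr.treatment is called explicitly here). The treatment names are made up
for illustration.

```r
# Hypothetical treatment arms stored as character strings
trt <- c("placebo", "high", "low", "placebo", "low", "high")

# Implicit coercion: levels are sorted alphabetically, so "high" comes first
f_default <- factor(trt)
levels(f_default)

# Explicit ordering: "placebo" is stated as the first (reference) level
f_explicit <- factor(trt, levels = c("placebo", "low", "high"))
levels(f_explicit)

# Treatment contrasts compare every other level to the first level;
# the columns of the contrast matrix are the non-reference levels
ref_default  <- levels(f_default)[1]
ref_explicit <- levels(f_explicit)[1]
cmp_default  <- colnames(contr.treatment(levels(f_default)))
cmp_explicit <- colnames(contr.treatment(levels(f_explicit)))
```

With the default coercion the comparisons are against "high"; with the
explicit levels they are against "placebo", and tables or plots built from
f_explicit follow the stated order as well.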
I suppose it's a tradeoff between having to explicitly drop unused levels when
needed and having to take care of the ordering at the appropriate time, just
before the modeling/display. I chose the former.
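For the first option, dropping unused levels after a subset might look like
the sketch below. It assumes R semantics for subsetting a factor; the dose
labels are hypothetical.

```r
# Factor with an explicit, meaningful level order
dose <- factor(c("placebo", "low", "high", "low"),
               levels = c("placebo", "low", "high"))

# Plain subsetting keeps every level, including the now-unused "high"
kept <- dose[dose != "high"]
levels(kept)

# Two ways to drop the unused level while preserving the stated order
kept_a <- dose[dose != "high", drop = TRUE]
kept_b <- factor(kept)
levels(kept_a)
levels(kept_b)
```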
--Matt
Matt Austin
Global Statistical Lead, PMO
Director, Biostatistics
Amgen, Inc.
maustin@amgen.com
-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Thompson, David (MNR)
Sent: Wednesday, March 05, 2008 7:07 AM
To: Terry Therneau; s-news@lists.biostat.wustl.edu; Mark.Hearnden@nt.gov.au
Subject: Re: [S] Removing levels from a factor
Terry,
Regarding the 'factors are only occasionally useful' comment, what are the
situations where factors are _actually_ useful?
Don't (most?) modelling functions that require factors coerce character
values as required?
Thanks, DaveT.
*************************************
Silviculture Data Analyst
Ontario Forest Research Institute
Ontario Ministry of Natural Resources
david.john.thompson@ontario.ca
http://ofri.mnr.gov.on.ca
*************************************
-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Terry Therneau
Sent: March 5, 2008 09:05 AM
To: s-news@lists.biostat.wustl.edu; Mark.Hearnden@nt.gov.au
Subject: Re: [S] Removing levels from a factor
The easiest thing is to turn off factors:
options(stringsAsFactors=FALSE)
data$Treatment <- as.character(data$Treatment)
Now you can subset the data frame and things will work as you would
anticipate.
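A small sketch of the difference, in R syntax with a made-up data frame (the
column y and the treatment codes are hypothetical). The factor column is
created explicitly so the example does not depend on the session's
stringsAsFactors setting.

```r
# Hypothetical data with Treatment stored as a factor
data <- data.frame(Treatment = c("A", "B", "C", "A"),
                   y = 1:4,
                   stringsAsFactors = TRUE)

# Subsetting the factor version: level "C" lingers with a zero count
sub_f <- data[data$Treatment != "C", ]
tab_f <- table(sub_f$Treatment)

# After converting to character, the subset only knows about A and B
data$Treatment <- as.character(data$Treatment)
sub_c <- data[data$Treatment != "C", ]
tab_c <- table(sub_c$Treatment)

names(tab_f)
names(tab_c)
```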
Comment: factors are occasionally useful, but only occasionally. Much grief
can be avoided by turning them off by default. Our biostat group (>100 people,
over 1200 projects a year) has had the above option as a part of our global
defaults for many years, and has not yet seen a downside to the decision.
Terry Therneau
Mayo Clinic
--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu. To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message: unsubscribe s-news
--------------------------------------------------------------------