| To: | s-news@lists.biostat.wustl.edu |
|---|---|
| Subject: | Burned by factors |
| From: | "Kevin Wright" <kw.statr@gmail.com> |
| Date: | Wed, 5 Mar 2008 12:12:18 -0600 |
One place I find factors nice is when I create trellis plots of
subsets of data. Factors keep the panels in the same place on the
page across different subsets of data, even when one subset has a
panel with no data.
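
The panel behavior follows from a factor keeping its full set of levels when it is subset. A minimal sketch in R (S-PLUS should behave similarly, but that is an assumption); the variable `region` and its values are made up for illustration:

```r
# Hypothetical grouping variable; the names are illustrative only
region <- factor(c("east", "north", "south", "west"))

# Drop every "north" observation from the data
subset1 <- region[region != "north"]

# The level "north" is still present even though no data uses it,
# so anything that lays out one panel per level keeps its place
levels(subset1)

# The count for "north" is zero rather than missing
table(subset1)
```

A character vector would simply lose the "north" group here, which is why the panel layout can shift when the conditioning variable is not a factor.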

On the whole, however, I spend a significant amount of time fighting
factors, and my conclusion is that S is far too eager to convert data
to factors.

Here's an example that burned me badly and cost me about six hours of
work. Data was read from two different files. In one file, 'age' was
read as numeric; in the second file, 'age' contained an unexpected,
non-numeric value, which caused a behind-the-scenes conversion of age
to a factor (instead of a numeric with a missing value). Later in the
code, merging the two caused unexpected results. Here is the essence
of what happened:

    age1 <- factor(c("20", "21", "22"))
    age2 <- c(20, 21, 22)
    ifelse(c(T, T, F), age1, age2)
    [1] 1 2 22

The desired result was: 20, 21, 22.
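
The 1 and 2 in that output are the factor's internal integer codes, not the ages its labels spell out. One common workaround, sketched here for R (not necessarily what the original code ended up doing), is to convert the factor through character before combining; `age1_num` is a name introduced only for this illustration:

```r
age1 <- factor(c("20", "21", "22"))
age2 <- c(20, 21, 22)

# as.numeric() applied directly to a factor returns the integer codes
codes <- as.numeric(age1)

# Converting to character first recovers the values the labels spell out
age1_num <- as.numeric(as.character(age1))

# A cheap guard before combining columns that came from different files
stopifnot(is.numeric(age1_num), is.numeric(age2))

result <- ifelse(c(TRUE, TRUE, FALSE), age1_num, age2)
result
```

With real data, a label that is not a number becomes NA (with a warning) during the conversion, which is the "numeric with a missing value" outcome described above.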

Give a client a report whose erroneous results trace back to this
phenomenon, and you too will become paranoid about conversions to
factors.

K Wright