Actually, it's a bit more interesting than that. The behavior you report is
EXPECTED. Here's why:
> z <- data.frame(1:3, 2:4)
> z
X1 X2
1 1 2
2 2 3
3 3 4
> z[,2] <- letters[1:3]
> z
X1 X2
1 1 a
2 2 b
3 3 c
> apply(z,2,mode)
X1 X2
"character" "character"
### This is what we expect because (I assume) apply() works on arrays and
### therefore must implicitly coerce everything to character to get an array,
### which must be of one mode only.
#### However, what's going on here?
> lapply(z,mode)
$X1:
[1] "numeric"
$X2:
[1] "numeric"
### One should have gotten "numeric" and "character", right?
## Here's the reason:
> lapply(z,is.factor)
$X1:
[1] F
$X2:
[1] T
### The data.frame constructor by default coerces character data to a factor,
### which is an object of mode "numeric" (integer, actually) with a "levels"
### attribute that is of mode "character".
### Thus:
> lapply(z,levels)
$X1:
NULL
$X2:
[1] "a" "b" "c"
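For anyone who wants to see this directly, here is a minimal sketch of what a factor looks like underneath. It assumes R rather than the S-PLUS session shown in this message, and the names f, codes and labs are purely illustrative.

```r
# Illustrative only: build a factor by hand and inspect its pieces.
f <- factor(c("a", "b", "c"))

codes <- as.integer(f)   # the integer codes stored inside the factor
labs  <- levels(f)       # the character labels those codes index into

print(mode(f))           # "numeric": the codes, not the labels, set the mode
print(typeof(f))         # "integer"
print(codes)
print(labs)
print(labs[codes])       # indexing the levels by the codes recovers the strings
```
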
### The array constructor that is implicitly called by apply() coerced the
### factor to "character", and then the whole array (both columns) to
### character, to minimize information loss.
## So everything is actually working as it's supposed to. If you want to avoid
## the automatic factor coercion, you should do the following:
> z <- data.frame(1:3, 4:6)
> z[,2] <- I(letters[1:3]) ## Note the use of the identity function I(), which prevents auto coercion
> lapply(z,mode)
$X1:
[1] "numeric"
$X2:
[1] "character"
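The same comparison can be sketched in R, with two caveats that are my assumptions rather than anything in the session shown here: R 4.0.0 and later no longer convert strings to factors by default, so stringsAsFactors = TRUE is passed explicitly to reproduce the coercion, and the column names X1 and X2 are set by hand.

```r
# Assumes R, not S-PLUS. stringsAsFactors = TRUE is given explicitly because
# recent R versions leave character columns alone by default.
z <- data.frame(X1 = 1:3, X2 = letters[1:3], stringsAsFactors = TRUE)

col_modes  <- sapply(z, mode)       # per-column mode, no matrix conversion
col_factor <- sapply(z, is.factor)  # which columns became factors
arr_modes  <- apply(z, 2, mode)     # apply() converts to a matrix first

# Wrapping the strings in I() protects them from the factor conversion.
z2 <- data.frame(X1 = 1:3, X2 = I(letters[1:3]), stringsAsFactors = TRUE)
col_modes2 <- sapply(z2, mode)

print(col_modes)
print(col_factor)
print(arr_modes)
print(col_modes2)
```
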
Cheers,
Bert Gunter
Biometrics Research
Merck & Company
PO Box 200, Rahway, NJ 07065-0900
Ph: (732) 594-7765 Fax: 594-1565
"The business of the statistician is to catalyze the scientific learning
process." -- George E.P. Box