I still think funny things are going on --- if there aren't bugs then
there are annoyingly misleading ``features''.
Consider the following:
# Create a silly data frame, and investigate the modes of
# its columns.
> junk <- data.frame(x=1:10,y=11:20,z=21:30)
> print(lapply(junk,mode))
$x:
[1] "numeric"
$y:
[1] "numeric"
$z:
[1] "numeric" # All is in harmony.
# Now do the conversion to character, using the ``$'' syntax
# to refer to components:
> junk$z <- ifelse(junk$z<26,"a","b")
> print(junk)
    x  y z
 1  1 11 a
 2  2 12 a
 3  3 13 a
 4  4 14 a
 5  5 15 a
 6  6 16 b
 7  7 17 b
 8  8 18 b
 9  9 19 b
10 10 20 b      # It looks like it worked.
> print(lapply(junk,mode))
$x:
[1] "numeric"
$y:
[1] "numeric"
$z:
[1] "character" # We get character here, not numeric; apparently
# junk$z has ***not*** been coerced to a factor.
# Restore ``junk'' to its original beauty:
> junk$z <- 21:30
> print(junk)
    x  y  z
 1  1 11 21
 2  2 12 22
 3  3 13 23
 4  4 14 24
 5  5 15 25
 6  6 16 26
 7  7 17 27
 8  8 18 28
 9  9 19 29
10 10 20 30     # Ommmmmmmmmm.
# Now do the conversion to character, using the [,'name'] syntax
# to refer to components:
> junk[,'z'] <- ifelse(junk$z<26,"a","b")
> print(junk)
    x  y z
 1  1 11 a
 2  2 12 a
 3  3 13 a
 4  4 14 a
 5  5 15 a
 6  6 16 b
 7  7 17 b
 8  8 18 b
 9  9 19 b
10 10 20 b      # Same appearance as before ... but
> print(lapply(junk,mode))
$x:
[1] "numeric"
$y:
[1] "numeric"
$z:
[1] "numeric" # Here junk$z ***has*** been coerced to a factor.
Why the difference?
Whatever the reason, this business of coercing character to factor
in data frames is a real annoyance; it misleads and confuses and
disrupts and ought to be done away with.
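
For anyone puzzled by why a mode of "numeric" on a column that prints as
letters points to a factor: a factor is stored as integer codes plus a set
of levels, so mode() reports the codes. The following is only a minimal
sketch in R syntax, separate from the session above; the name ``f'' is made
up for illustration, and is.factor() is used as a more direct check than
mode().

```r
# Illustrative only: build a factor by hand from the same ifelse() result.
f <- factor(ifelse(21:30 < 26, "a", "b"))

mode(f)          # "numeric": mode() sees the underlying integer codes
is.factor(f)     # TRUE: the direct way to ask the question
levels(f)        # "a" "b"
as.character(f)  # recovers the original character values
```

Applying is.factor() to each column, e.g. lapply(junk, is.factor), would
show directly which of the two assignment forms produced a factor.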
cheers,
Rolf Turner