Rolf et al:
I believe I can explain the behavior that you described. I have
interspersed my remarks with your text, marking each with %%% to set it apart:
%%% ( The following is quoted from Rolf Turner's posting )
> I still think funny things are going on --- if there aren't bugs then
> there are annoyingly misleading ``features''.
>
> Consider the following:
>
> # Create a silly data frame, and investigate the modes of
> # its columns.
> > junk <- data.frame(x=1:10,y=11:20,z=21:30)
> > print(lapply(junk,mode))
>
> $x:
> [1] "numeric"
>
> $y:
> [1] "numeric"
>
> $z:
> [1] "numeric" # All is in harmony.
>
> # Now do the conversion to character, using the ``$'' syntax
> # to refer to components:
> > junk$z <- ifelse(junk$z<26,"a","b")
> > print(junk)
>
> x y z
> 1 1 11 a
> 2 2 12 a
> 3 3 13 a
> 4 4 14 a
> 5 5 15 a
> 6 6 16 b
> 7 7 17 b
> 8 8 18 b
> 9 9 19 b
> 10 10 20 b # It looks like it worked.
>
> > print(lapply(junk,mode))
> $x:
> [1] "numeric"
>
> $y:
> [1] "numeric"
>
> $z:
> [1] "character" # We get character here, not numeric; apparently
> # junk$z has ***not*** been coerced to a factor.
>
%%% Yes. The reason is that the list syntax junk$z invokes list methods,
which do not look at the dim attribute of junk. They therefore do not treat
it as a dataframe and do not do the automatic coercion.
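%%% A minimal sketch of the point, assuming the same construction of junk
as in the quoted session: a dataframe is a list of columns that also
answers to dim(), so both views of the object are available.

```r
junk <- data.frame(x = 1:10, y = 11:20, z = 21:30)

# List view: junk is a list with one component per column.
print(is.list(junk))    # TRUE
print(length(junk))     # 3
print(names(junk))      # "x" "y" "z"

# Matrix-like view: rows by columns.
print(dim(junk))        # 10 3
```

The $ form works through the first view, the [,'z'] form through the second.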
> # Restore ``junk'' to its original beauty:
> > junk$z <- 21:30
> > print(junk)
> x y z
> 1 1 11 21
> 2 2 12 22
> 3 3 13 23
> 4 4 14 24
> 5 5 15 25
> 6 6 16 26
> 7 7 17 27
> 8 8 18 28
> 9 9 19 29
> 10 10 20 30 # Ommmmmmmmmm.
>
> # Now do the conversion to character, using the [,'name'] syntax
> # to refer to components:
> > junk[,'z'] <- ifelse(junk$z<26,"a","b")
> > print(junk)
> x y z
> 1 1 11 a
> 2 2 12 a
> 3 3 13 a
> 4 4 14 a
> 5 5 15 a
> 6 6 16 b
> 7 7 17 b
> 8 8 18 b
> 9 9 19 b
> 10 10 20 b # Same appearance as before ... but
>
> > print(lapply(junk,mode))
> $x:
> [1] "numeric"
>
> $y:
> [1] "numeric"
>
> $z:
> [1] "numeric" # Here junk$z ***has*** been coerced to a factor.
>
> Why the difference?
%%% Using the junk[,'z'] syntax forces use of a dataframe method for the
substitution, since the dim attribute of the list is explicitly invoked.
Hence the coercion is automatically done. (A factor is stored as integer
codes, which is why mode() reports "numeric" for it.)
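%%% Since mode() cannot tell a factor from an ordinary numeric column, a
more direct check may help. The sketch below assumes the same junk as in
the quoted session; ab, via_dollar and via_bracket are names made up for
this illustration. What the two is.factor() calls report depends on the
software and version in use, so no output is shown for them.

```r
junk <- data.frame(x = 1:10, y = 11:20, z = 21:30)
ab <- ifelse(junk$z < 26, "a", "b")

via_dollar <- junk
via_dollar$z <- ab           # list-style replacement

via_bracket <- junk
via_bracket[, "z"] <- ab     # matrix-style replacement

# is.factor() answers the question directly.
print(is.factor(via_dollar$z))
print(is.factor(via_bracket$z))

# mode() cannot: a factor is stored as integer codes.
print(mode(factor(ab)))      # "numeric"
```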
> Whatever the reason, this business of coercing character to factor
> in data frames is a real annoyance; it misleads and confuses and
> disrupts and ought to be done away with.
>
%%% I don't think so. Using the junk$z syntax explicitly asks that
the junk object be treated as a list, without reference to its other
attributes (like "dim"), and that list operations, here list substitution, be
used. Therefore you get a change to character mode.
Use of the junk[,'z'] syntax specifically asks that the "dim"
attribute be considered and that dataframe methods, here dataframe
substitution, be used. Hence the automatic coercion is done. The problem
here is the subtlety of object orientation -- the two syntaxes are NOT in
fact equivalent, for the reasons described, although we are perhaps
insufficiently warned about the difference.
The coercion to factors exacerbates the problem, of course, but
there are good reasons for doing this in many -- probably most -- cases. In
any case, changing this automatic coercion would break a lot of code and
therefore is out of the question. You are always free to write your own
versions of the functions that do not do this coercion.
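One hedged sketch of a workaround, short of rewriting the replacement
functions: wrap the value in I(), which asks that it be kept as is, or
convert a factor back with as.character(). This assumes the same junk as
in the quoted session; whether I() is honored by the replacement method
in your version is an assumption that should be checked.

```r
junk <- data.frame(x = 1:10, y = 11:20, z = 21:30)

# I() asks that the character vector be stored as is, not as a factor.
junk[, "z"] <- I(ifelse(junk$z < 26, "a", "b"))
print(is.factor(junk$z))

# If a column has already become a factor, as.character() recovers
# the original strings from its levels.
f <- factor(c("a", "b", "a"))
print(as.character(f))   # "a" "b" "a"
```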
S-Plus is a programming language, and like all programming languages
it has subtle features/traps that programmers must heed. This example is one of
them. I think careful reflection shows that, annoying as it may seem, the
behavior that you have noted is entirely proper and consistent with the
language's syntax and semantics and therefore is most certainly NOT a bug. As
always, those programmers who wish to take advantage of the power that these
features provide must also be cognizant of ALL the consequences.
Thanks for bringing these issues to our attention and, in
particular, for forcing me to think about them carefully.
Cheers,
Bert Gunter
Biometrics Research RY 70-38
Merck & Company
P.O. Box 2000
Rahway, NJ 07065-0900
Phone: (732) 594-7765
mailto: bert_gunter@merck.com

"The business of the statistician is to catalyze the scientific learning
process." -- George E.P. Box