S-PLUS 2000 Release 3 on a Dell Wintel box running Windows 2000.
I've run across something very obscure and could use some net wisdom. It
surfaced when Hmisc's cut2 suddenly stopped working, complaining about NAs.
After much chasing down and isolating, I've traced the problem to a failure
in duplicated(). Since that calls an internal function, I'm now stuck.
The 12-word summary is that *sometimes* duplicated doesn't recognize all
NAs as being duplicates of each other. The usual behavior is as follows:
> duplicated(c(NA, NA, NA))
[1] F T T
>
However, I have a vector of mode numeric (called "bad.vector") in which two
NAs are not recognized as being the same:
> bad.vector
[1] NA 1
> mode(bad.vector)
[1] "numeric"
> bad.vector <- c(bad.vector, NA, NA)
> bad.vector
[1] NA 1 NA NA
> mode(bad.vector)
[1] "numeric"
> duplicated(bad.vector)
[1] F F F T
I tried dumping the vector to look for a clue, but the dump looked normal.
Moreover, after restoring it, the vector was no longer "bad":
> bad.vector.2 <- bad.vector
> data.dump("bad.vector")
[1] "dumpdata"
> rm(bad.vector)
> data.restore("dumpdata", print = T)
"bad.vector": 4 values of mode "numeric"
[1] "dumpdata"
> bad.vector
[1] NA 1 NA NA
> duplicated(bad.vector)
[1] F F T T
> duplicated(bad.vector.2)
[1] F F F T
So, four questions:
1. Anyone know what's happening here?
2. What is the internal representation of numeric NAs?
3. How can I see if the internal representations differ?
4. Could this be due to a corrupted file? How can I tell?
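One hypothesis worth testing, offered only as a sketch: suppose S-PLUS stores a numeric NA as an IEEE 754 NaN, and suppose duplicated compares values by their raw 64-bit pattern. Then two NAs carrying different NaN "payload" bits would look distinct. Nothing above confirms either assumption. The code below is Python, not S-PLUS, and the helpers make_nan, bit_pattern, duplicated_bitwise and duplicated_na_aware are hypothetical. It shows how the bit patterns of doubles can be inspected, and how such a comparison would produce both results seen above.

```python
import math
import struct

# Assumption (not confirmed for S-PLUS): a numeric NA is an IEEE 754 NaN,
# and different NaNs may carry different low-order "payload" bits.

def make_nan(payload):
    """Hypothetical helper: build a quiet NaN whose low mantissa bits hold `payload`."""
    bits = 0x7FF8000000000000 | payload
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def bit_pattern(x):
    """Return the raw 64-bit IEEE 754 pattern of a double as 16 hex digits."""
    return format(struct.unpack("<Q", struct.pack("<d", x))[0], "016x")

def duplicated_bitwise(values):
    """Flag repeats by raw bit pattern: NaNs with different payloads stay distinct."""
    seen, flags = set(), []
    for v in values:
        key = bit_pattern(v)
        flags.append(key in seen)
        seen.add(key)
    return flags

def duplicated_na_aware(values):
    """Flag repeats, treating every NaN as equal to every other NaN."""
    seen, flags = set(), []
    for v in values:
        key = "NaN" if math.isnan(v) else bit_pattern(v)
        flags.append(key in seen)
        seen.add(key)
    return flags

na_a, na_b = make_nan(1), make_nan(2)  # two NAs with different, made-up payloads
bad = [na_a, 1.0, na_b, na_b]
print([bit_pattern(v) for v in bad])
print(duplicated_bitwise(bad))   # [False, False, False, True]
print(duplicated_na_aware(bad))  # [False, False, True, True]
```

Under these assumptions, bitwise matching gives F F F T, the pattern from bad.vector.2, while NA-aware matching gives F F T T, the pattern after data.restore. That would be consistent with, though not proof of, the restore rewriting every NA to a single canonical bit pattern.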
Note that bad.vector is a small piece of a larger vector. In that larger
vector, *some* of the NAs were recognized as duplicates, and *some* were
not.
Any ideas?
David O Nelson, Ph.D. (daven@llnl.gov)
Lawrence Livermore National Laboratory
Box 808, L-441
Livermore CA 94551
ph: +1.925.423.8898
fax: +1.925.422.2282