Re: Behavior of cor() with na.methods "omit" and

To: S-News Mail List <s-news@lists.biostat.wustl.edu>
Subject: Re: Behavior of cor() with na.methods "omit" and
From: "Kim Elmore" <Kim.Elmore@noaa.gov>
Date: Wed, 09 Nov 2005 21:08:14 -0600
Cc: Tim Hesterberg <timh@insightful.com>
In-reply-to: <SE2KEXCH01PrvCWO6hm000004f5@se2kexch01.insightful.com>
References: <437228B0.2050800@noaa.gov> <SE2KEXCH01PrvCWO6hm000004f5@se2kexch01.insightful.com>
This is wild...

I did this with S-Plus 7.0.0 for Windows under XP. I'm now at home with S-Plus 6.2 under Windows XP, and I get exactly the same results I got on my work machine: cor() using na.method = "available" yields exactly 0.5 times the value of cor() using na.method = "omit".

But wait! I failed to tell everyone that I'm loading the resample and maps libraries with a .First function when I start the session:

> .First
function()
{
        library(resample, first = T)
        library(maps)
}


I detached the resample library and now cor() behaves identically with na.method = "omit" and na.method = "available". Obviously, a different cor() is used under the resample library...

Hence, I can now refine my question: why do these two NA methods yield different results when the resample library is attached?
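
One way to narrow this down, offered only as a sketch, is to compute the coefficient straight from its definition, so the result does not depend on which cor() comes first on the search list. The names d1, d2 and r.manual are made up for illustration, and the find() call assumes it reports search-list locations in S-PLUS the way it does in R:

```r
# Data from the example quoted below
rn1 <- c(0.52170, -0.34945, 0.76141)
rn2 <- c(0.44834, -2.10544, -0.91762)

# Pearson correlation from its definition (no call to cor())
d1 <- rn1 - mean(rn1)
d2 <- rn2 - mean(rn2)
r.manual <- sum(d1 * d2) / sqrt(sum(d1^2) * sum(d2^2))
r.manual             # about 0.7177, matching cor(rn1, rn2)

# Ratio of the reported "available" result to the manual value
0.35884 / r.manual   # about 0.5

# Assumption: lists every attached database that defines cor()
find("cor")
```

If find("cor") lists more than one location while resample is attached, the first entry is the version that a plain call to cor() picks up.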

Kim Elmore

At 11:01 AM 11/9/2005, you wrote:
I can't reproduce that.  I get:

Enterprise Developer Version 7.0.0  for Microsoft Windows : 2005
Working data will be in d:/timh
> rn1 <- c(0.52170, -0.34945,  0.76141)
> rn2 <- c(0.44834, -2.10544, -0.91762)
> cor(rn1, rn2)
[1] 0.7176714
> cor(rn1, rn2, na.method = "omit")
[1] 0.7176714
> cor(rn1, rn2, na.method = "available")
[1] 0.7176714
> cor(cbind(rn1, rn2))
          rn1       rn2
rn1 1.0000000 0.7176714
rn2 0.7176714 1.0000000


What version of S-PLUS are you using?

I also note that the last answer you got is 0.5 times the other answer.

>I can't quite figure out what the cor() function does with the different
>na.methods "omit" and "available". I find that each of these methods
>yields a different answer, even if there is no missing data.
>
>example:
> > rn1 <- rnorm(3)
> > rn1
>[1]  0.52170 -0.34945  0.76141
> > rn2 <- rnorm(3)
> > rn2
>[1]  0.44834 -2.10544 -0.91762
> > cor(rn1, rn2)
>[1] 0.71767
> > cor(rn1, rn2, na.method = "omit")
>[1] 0.71767
>
>That is exactly what I'd expect. But
>
> > cor(rn1, rn2, na.method = "available")
>[1] 0.35884
>
>gives a different result.
>
>Why is this, when there is no missing data? If there is missing data,
>the two methods also yield different answers, though "omit" does what
>I'd expect, which is compute the correlation based on whatever pairs are
>available. I can't figure out what "available" does. Any insight will be
>deeply appreciated.
>
>Kim Elmore
>
>
>--
>            Kim Elmore, OU/CIMMS/NSSL
>   "All of weather is divided into three parts:
>Yes, No and Maybe. The greatest of these is Maybe."

