Paul Matthias Diderichsen <paulmatthias.diderichsen@abbott.com> wrote:
> Dear S-plus users.
> I learned more or less all I know about Q-Q plots by reading
> http://en.wikipedia.org/wiki/Qq_plot and related resources on the web. I
> think that I now understand more or less how these beasts work.
>
> Of course, there's still one issue: There seems to be a discrepancy
> between what wikipedia says and what S-plus does.
>
> > a<-data.frame(NORM=rnorm(3))
> # a$NORM could be any 3 values since I'm only interested in x below...
> > qqmath(~NORM,data=a,panel=function(x,y)print(sort(x)))
> [1] -0.9674216 0.0000000 0.9674216
> > qnorm(c(1/6,3/6,5/6))
> [1] -0.9674216 0.0000000 0.9674216
>
> versus: "k/(n + 1)-quantiles of the comparison distribution (e.g. the
> normal distribution) on the horizontal axis (for k = 1, ..., n)"
>
> In the example above, I create a qqplot with three points. As I understand
> wikipedia, these should be (on the x-axis) at the 1/4, 2/4, and 3/4
> quantiles of the normal distribution. However, instead they are at the
> 1/6, 3/6, and 5/6 quantiles.
>
> I'd appreciate it a lot, if anybody can explain to me what is "right". (Of
> course: Gurus - feel free to correct/add to the wikipedia entry in case
> it's not accurate!)
And then Paul followed up with:
> Apparently, S-plus selects the midpoint of n intervals between 0 and 1 for
> its horizontal values in qqmath. Of course, for n->infinity the sets used
> by S-plus and wikipedia converge. But which one is "most right" for
> finite/small n?
And Timothy Li added:
> S-plus appears to be using the formula (k-0.375)/(n+0.25) that is
> advocated elsewhere on the internet. See
> http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap55/sect44.htm
>
> and
>
> http://www.okstate.edu/sas/v8/saspdf/qc/chap10.pdf#search=%22construction%20of%20q-q%20plot%22

By default, S-PLUS uses f(i) = (i - 0.5)/n, not (i - 0.375)/(n + 0.25).
For n = 3 this gives 1/6, 3/6 and 5/6, which matches Paul's output. The
formula is described in Cleveland, W. (1993), Visualizing Data, which
notes that the precise form of f(i) is not important for a quantile
plot. (It can make a difference when n = 3, but you probably should not
be looking at a quantile plot with only three data points.)

The probability values that S-PLUS uses for qqmath and qqnorm come from
the ppoints function. Its optional argument a controls the precise
values of f(i); the values returned are (seq(n) - a)/(n + 1 - 2*a).
The default, a=0.5, gives (i-0.5)/n. Setting a=0.375 gives the
(i-0.375)/(n+0.25) formula noted by Timothy Li, and setting a=0 gives
the i/(n+1) values described in Wikipedia.
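
To see the three choices side by side for Paul's n = 3 case, here is a
minimal sketch. plot_positions is a hypothetical helper (not an S-PLUS
function) that simply writes out the formula above, so it does not rely
on the default of any particular ppoints implementation:

```r
# Hypothetical helper: the formula quoted above for ppoints(n, a),
# written out explicitly. Assumes n >= 1.
plot_positions <- function(n, a = 0.5) {
  (seq(n) - a) / (n + 1 - 2 * a)
}

n <- 3
p_default   <- plot_positions(n, a = 0.5)    # (i - 0.5)/n
p_a375      <- plot_positions(n, a = 0.375)  # (i - 0.375)/(n + 0.25)
p_wikipedia <- plot_positions(n, a = 0)      # i/(n + 1)

# One row of probabilities per choice of a
print(rbind(default = p_default, a375 = p_a375, wikipedia = p_wikipedia))

# Normal quantiles at the default positions, for comparison with
# the qnorm(c(1/6,3/6,5/6)) call in Paul's message
print(qnorm(p_default))
```

All three choices put the middle point at 0.5 and differ only in how
far the outer points sit from 0 and 1.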

In qqmath, the probability values are set by the f.value argument,
which defaults to ppoints. To make a qqmath plot with the Wikipedia
probability values you could do:

    x <- rnorm(12)
    myppoints <- function(n) ppoints(n, a=0)
    qqmath(~ x, f.value=myppoints)

qqnorm is less flexible: qqnorm.default calls ppoints with the default
value of a. To use different probability values in qqnorm, you need to
define your own ppoints whose default value of a is the one you want.
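
If masking ppoints is not appealing, another route is to compute the
plotting positions yourself and draw the plot by hand. A minimal sketch,
using only qnorm, sort and plot; qq_coords is a hypothetical helper, not
an S-PLUS function, and it assumes x has at least one value and no
missing values:

```r
# Hypothetical helper: coordinates of a normal Q-Q plot for a chosen a.
qq_coords <- function(x, a = 0.5) {
  n <- length(x)
  p <- (seq(n) - a) / (n + 1 - 2 * a)  # same formula as ppoints(n, a)
  list(x = qnorm(p), y = sort(x))      # theoretical vs. sample quantiles
}

x <- rnorm(12)
xy <- qq_coords(x, a = 0)              # Wikipedia's k/(n + 1) positions
plot(xy$x, xy$y,
     xlab = "Normal quantiles", ylab = "Sorted data")
```

Changing a in the call to qq_coords switches between the conventions
discussed above without touching ppoints.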
-Stephen Kaluzny
Insightful Corp.