s-news

Re: QQ plot x-axis values ['Watchdog': checked]

To: s-news@lists.biostat.wustl.edu
Subject: Re: QQ plot x-axis values ['Watchdog': checked]
From: Stephen Kaluzny <spk@insightful.com>
Date: Tue, 10 Oct 2006 16:37:36 -0700
In-reply-to: <C3C1925D945BF844911C8EEA752666AE031F290C@psbexmb2.psb.bls.gov>; from Li.Timothy@bls.gov on Tue, Oct 10, 2006 at 10:22:14AM -0400
References: <OFC4C90050.37AAF9AF-ONC1257203.004C4655-C1257203.004CA4AD@abbott.com> <C3C1925D945BF844911C8EEA752666AE031F290C@psbexmb2.psb.bls.gov>
User-agent: Mutt/1.2.5.1i
Paul Matthias Diderichsen <paulmatthias.diderichsen@abbott.com> wrote:

> Dear S-plus users.
> I learned more or less all I know about Q-Q plots by reading 
> http://en.wikipedia.org/wiki/Qq_plot and related resources on the web. I
> think that I now understand more or less how these beasts work.
> 
> Of course, there's still one issue: There seems to be a discrepancy 
> between what wikipedia says and what S-plus does.
> 
> > a<-data.frame(NORM=rnorm(3))
> # a$NORM could be any 3 values since I'm only interested in x below...
> > qqmath(~NORM,data=a,panel=function(x,y)print(sort(x)))
> [1] -0.9674216  0.0000000  0.9674216
> > qnorm(c(1/6,3/6,5/6))
> [1] -0.9674216  0.0000000  0.9674216
> 
> versus: "k/(n + 1)-quantiles of the comparison distribution (e.g. the
> normal distribution) on the horizontal axis (for k = 1, ..., n)"
> 
> In the example above, I create a qqplot with three points. As I understand 
> wikipedia, these should be (on the x-axis) at the 1/4, 2/4, and 3/4 
> quantiles of the normal distribution. However, instead they are at the 
> 1/6, 3/6, and 5/6 quantiles.
> 
> I'd appreciate it a lot, if anybody can explain to me what is "right". (Of 
> course: Gurus - feel free to correct/add to the wikipedia entry in case 
> it's not accurate!)

And then Paul followed up with:

> Apparently, S-plus selects the midpoint of n intervals between 0 and 1 for 
> its horizontal values in qqmath. Of course, for n->infinity the sets used
> by S-plus and wikipedia converge. But which one is "most right" for 
> finite/small n?

And Timothy Li added:

> S-plus appears to be using the formula (k-0.375)/(n+0.25) that is
> advocated elsewhere on the internet. See
> http://www.tau.ac.il/cc/pages/docs/sas8/stat/chap55/sect44.htm
> 
> and
> 
> http://www.okstate.edu/sas/v8/saspdf/qc/chap10.pdf#search=%22construction%20of%20q-q%20plot%22

By default, S-PLUS uses the formula f(i) = (i - 0.5)/n. This is described
in Cleveland, W. (1993), Visualizing Data, where it is noted that the
precise form of f(i) is not important for a quantile plot. (It may make
a difference when n = 3, but you probably should not be looking at
a quantile plot with only three data points.)
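To put a number on "not important", here is a small sketch. It is written in Python only as a language-neutral illustration; the helper names are hypothetical and are not S-PLUS functions. It compares the default (i - 0.5)/n positions with the k/(n + 1) positions quoted from Wikipedia and reports the largest gap between the two sets of probabilities for several n.

```python
# Compare the default plotting positions f(i) = (i - 0.5)/n with the
# k/(n + 1) positions quoted from Wikipedia. Hypothetical helpers, for
# illustration only; these are not S-PLUS functions.

def midpoint_positions(n):
    """f(i) = (i - 0.5)/n for i = 1..n."""
    return [(i - 0.5) / n for i in range(1, n + 1)]

def wikipedia_positions(n):
    """k/(n + 1) for k = 1..n."""
    return [k / (n + 1) for k in range(1, n + 1)]

def max_gap(n):
    """Largest absolute difference between the two sets of probabilities."""
    pairs = zip(midpoint_positions(n), wikipedia_positions(n))
    return max(abs(p - q) for p, q in pairs)

for n in (3, 10, 100, 1000):
    print(n, round(max_gap(n), 4))
```

The gap is largest at the two extreme points and works out to (n - 1)/(2n(n + 1)). For n = 3 that is 1/12 (1/4 versus 1/6), and it shrinks roughly like 1/(2n) as n grows.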

The probability values used in S-PLUS for qqmath and qqnorm are computed
by the ppoints function. Its optional argument a controls the precise
values of f(i); the values returned are (seq(n) - a)/(n + 1 - 2*a).
The default a = 0.5 gives the (i - 0.5)/n values. Setting a = 0.375 gives
the formula noted by Timothy Li, and setting a = 0 gives the k/(n + 1)
values quoted from Wikipedia.
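A minimal sketch of the ppoints formula quoted above, again in Python for illustration. `ppoints_sketch` is a hypothetical stand-in that only evaluates (seq(n) - a)/(n + 1 - 2*a). It is not the actual S-PLUS source, which may handle extra cases such as vector input.

```python
def ppoints_sketch(n, a=0.5):
    """Hypothetical re-implementation of (seq(n) - a)/(n + 1 - 2*a),
    returned as a list of probabilities for i = 1..n."""
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

# The three choices of a discussed in this thread, for n = 3.
for a in (0.5, 0.375, 0.0):
    print(a, [round(p, 4) for p in ppoints_sketch(3, a)])
```

With n = 3 this gives 1/6, 1/2, 5/6 for a = 0.5 and 1/4, 1/2, 3/4 for a = 0. For a = 0.375 the denominator n + 1 - 2*0.375 equals n + 0.25, which is exactly the (k - 0.375)/(n + 0.25) form Timothy Li cited.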

The probability values used by qqmath are specified in its f.value
argument, whose default is ppoints. To call qqmath with the Wikipedia
probability values, you could do:

    x <- rnorm(12)
    # f.value takes a function of n that returns n probabilities
    myppoints <- function(n) ppoints(n, a=0)   # i/(n + 1), i = 1..n
    qqmath(~ x, f.value=myppoints)
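What qqmath plots on the horizontal axis are standard normal quantiles evaluated at these probabilities. The following Python sketch assumes only the ppoints formula given earlier and uses the standard library's `statistics.NormalDist` in place of qnorm. `normal_axis_positions` is a hypothetical helper. For n = 3 and the default a = 0.5, it should reproduce the -0.9674216, 0, 0.9674216 values in Paul's original example.

```python
from statistics import NormalDist

def normal_axis_positions(n, a=0.5):
    """Hypothetical helper: standard normal quantiles at the ppoints-style
    probabilities (i - a)/(n + 1 - 2*a), i = 1..n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)]

print(normal_axis_positions(3))       # default a = 0.5: qnorm(c(1/6, 3/6, 5/6))
print(normal_axis_positions(3, a=0))  # Wikipedia's k/(n + 1): qnorm(c(1/4, 2/4, 3/4))
```

Switching to a = 0 pulls the outer points in from about ±0.967 to about ±0.674. That is the n = 3 discrepancy Paul noticed.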

To use different probability values in qqnorm, you need to define your
own ppoints whose default value of a is the one you want, because
qqnorm.default calls ppoints with the default value of a.

-Stephen Kaluzny
 Insightful Corp.
