s-news

Re: Quartiles

To: ripley@stats.ox.ac.uk (Prof Brian Ripley)
Subject: Re: Quartiles
From: "Edward Malthouse" <ecm@casbah.it.northwestern.edu>
Date: Fri, 7 Dec 2001 13:55:30 -0600 (CST)
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <Pine.LNX.4.31.0112071905480.1544-100000@gannet.stats> from "Prof Brian Ripley" at Dec 07, 2001 07:09:18 PM
> 
> On Fri, 7 Dec 2001, Edward Malthouse wrote:
> 
> > See Hyndman and Fan (1996), "Sample quantiles in statistical
> > packages,"  American Statistician, Vol 50, number 4, pp-361-365.
> >
> > If I recall correctly, they give about 9 different definitions of
> > quantiles that have been proposed in the literature and a list of
> > desirable properties.  They show which definitions have which
> > properties.  As I recall, the method they recommend isn't in SAS,
> > SPSS, or S-plus.  The default in SAS is one of the better ones.
> 
> Better for what? (Not that we were discussing SAS.)

On page 364, Hyndman and Fan state
"Splus:  the quantile() command of Splus 3.1 uses [definition 7]
(although S-PLUS(1991) states that [definition 5] is used)."
Definition 5 satisfies all 6 of their desirable properties and is
the one they recommend, while Definition 7 does not satisfy
properties 3 and 5.
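
To make the difference between the two definitions concrete, here is
a minimal Python sketch. It assumes the usual position rules (n*p + 0.5
for definition 5 and (n-1)*p + 1 for definition 7) with linear
interpolation between order statistics; the function names are my own
and are not taken from S-PLUS or the paper.

```python
import math

def value_at(sorted_x, h):
    """Linearly interpolate between order statistics at 1-based position h."""
    n = len(sorted_x)
    h = min(max(h, 1.0), float(n))  # clamp to the range of the data
    lo = math.floor(h)
    hi = min(lo + 1, n)
    return sorted_x[lo - 1] + (h - lo) * (sorted_x[hi - 1] - sorted_x[lo - 1])

def quantile_def5(x, p):
    # Assumed rule for definition 5: position n*p + 0.5
    xs = sorted(x)
    return value_at(xs, len(xs) * p + 0.5)

def quantile_def7(x, p):
    # Assumed rule for definition 7: position (n - 1)*p + 1
    xs = sorted(x)
    return value_at(xs, (len(xs) - 1) * p + 1)

x = [8, 9, 10, 11, 14, 20]
print([quantile_def5(x, p) for p in (0.25, 0.5, 0.75)])  # 9, 10.5, 14
print([quantile_def7(x, p) for p in (0.25, 0.5, 0.75)])  # 9.25, 10.5, 13.25
```

On this sample the definition 7 sketch reproduces the quantile(x)
output quoted later in this message.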

The default in proc univariate is PCTLDEF=5, which corresponds to
definition 2 in Hyndman and Fan.  This one is not very good, so my
earlier recollection that the SAS default is one of the better ones
was wrong.  PCTLDEF=4 corresponds to definition 6 in Hyndman and
Fan, which satisfies all but desirable property 3.
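
For comparison, here is a rough Python sketch of the two definitions
that these PCTLDEF values are said to map to.  It assumes definition 2
is the inverse empirical CDF with averaging when n*p is an integer, and
definition 6 interpolates at position (n+1)*p.  The function names are
hypothetical, and this is not SAS code.

```python
import math

def quantile_def2(x, p):
    """Assumed definition 2: step function, averaging at the jumps."""
    xs = sorted(x)
    n = len(xs)
    h = n * p
    k = math.floor(h)
    if h == k:  # n*p is an integer (exact float comparison, fine for a sketch)
        lo, hi = max(k, 1), min(k + 1, n)
        return (xs[lo - 1] + xs[hi - 1]) / 2
    return xs[k]  # the ceil(n*p)-th order statistic, as a 0-based index

def quantile_def6(x, p):
    """Assumed definition 6: linear interpolation at position (n + 1)*p."""
    xs = sorted(x)
    n = len(xs)
    h = min(max((n + 1) * p, 1.0), float(n))  # clamp to the data range
    lo = math.floor(h)
    hi = min(lo + 1, n)
    return xs[lo - 1] + (h - lo) * (xs[hi - 1] - xs[lo - 1])

x = [8, 9, 10, 11, 14, 20]
print([quantile_def2(x, p) for p in (0.25, 0.5, 0.75)])  # 9, 10.5, 14
print([quantile_def6(x, p) for p in (0.25, 0.5, 0.75)])  # 8.75, 10.5, 15.5
```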

Desirable Property 3 is
Freq(X_k \le \hat{Q}_i(p)) = Freq(X_k \ge \hat{Q}_i(1-p))
meaning the number of cases that are <= the p quantile should equal
the number of cases that are >= the 1-p quantile.
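
The property can be checked numerically for one sample and one p.  The
sketch below is only an illustration: the helper names are mine, the
quantile rule is the assumed definition 7 position (n-1)*p + 1, and a
single sample can only exhibit a mismatch, never show that the property
holds in general.

```python
import math

def quantile_def7(x, p):
    # Assumed rule for definition 7: interpolate at position (n - 1)*p + 1
    xs = sorted(x)
    n = len(xs)
    h = (n - 1) * p + 1
    lo = math.floor(h)
    hi = min(lo + 1, n)
    return xs[lo - 1] + (h - lo) * (xs[hi - 1] - xs[lo - 1])

def property3_counts(x, quantile, p):
    """Return (#{x_k <= Q(p)}, #{x_k >= Q(1 - p)}) for one sample and one p."""
    q_low = quantile(x, p)
    q_high = quantile(x, 1 - p)
    at_or_below = sum(1 for v in x if v <= q_low)
    at_or_above = sum(1 for v in x if v >= q_high)
    return at_or_below, at_or_above

# The sample used elsewhere in this thread: the two counts agree here.
print(property3_counts([8, 9, 10, 11, 14, 20], quantile_def7, 0.25))

# A made-up sample with a tie: the two counts differ.
print(property3_counts([1, 1, 2, 3], quantile_def7, 0.25))
```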

FYI:  regarding SPSS, they state
"The frequencies command of SPSS appears to use [definition 6],
although this is nowhere documented."

> 
> > Functions that compute quantiles in S-plus are (were not)
> > consistent (Here is an example using S-plus 2000.  It may have been
> > fixed as there was discussion on this list a few years ago.):
> >
> > > x <- c(8,9,10,11,14,20)
> > > quantile(x)
> >    0%  25%  50%   75% 100%
> >     8 9.25 10.5 13.25   20
> > > summary(x)
> >  Min. 1st Qu. Median Mean 3rd Qu. Max.
> >     8    9.25   10.5   12    13.2   20
> 
> Nope: they *were* and are consistent. They are printed to different
> numbers of significant digits.  See
> 
> > print(summary(x), digits=5)
>   Min. 1st Qu. Median  Mean 3rd Qu.  Max.
>   8.00  9.25   10.50  12.00 13.25   20.00

You are correct.  It is the stem function that is inconsistent.

> stem(x)

N = 6   Median = 10.5
Quartiles = 9, 14
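
One rule that reproduces 9 and 14 for this sample is Tukey-style
hinges, i.e. the medians of the lower and upper halves of the sorted
data.  Whether stem() actually uses this rule is an assumption on my
part; the sketch below only shows that the numbers match, and the
function name is hypothetical.

```python
from statistics import median

def hinges(x):
    """Medians of the lower and upper halves of the sorted data."""
    xs = sorted(x)
    half = (len(xs) + 1) // 2  # both halves share the middle value when n is odd
    return median(xs[:half]), median(xs[-half:])

print(hinges([8, 9, 10, 11, 14, 20]))  # lower and upper hinge: 9 and 14
```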

Ed Malthouse

Dr. Edward C. Malthouse
Assistant Professor
Integrated Marketing Communications Department
Medill School of Journalism
1908 Sheridan Road
Evanston, IL  60208-1290
Tele:  847-467-3376
Fax:  847-491-5925
