>
> On Fri, 7 Dec 2001, Edward Malthouse wrote:
>
> > See Hyndman and Fan (1996), "Sample quantiles in statistical
> > packages," The American Statistician, Vol. 50, No. 4, pp. 361-365.
> >
> > If I recall correctly, they give about 9 different definitions of
> > quantiles that have been proposed in the literature and a list of
> > desirable properties. They show which definitions have which
> > properties. As I recall, the method they recommend isn't in SAS,
> > SPSS, or S-plus. The default in SAS is one of the better ones.
>
> Better for what? (Not that we were discussing SAS.)
On page 364, Hyndman and Fan state:
"Splus: the quantile() command of Splus 3.1 uses [definition 7]
(although S-PLUS(1991) states that [definition 5] is used)."
Definition 5 satisfies all six of their desirable properties and is
the one they recommend, while definition 7 does not satisfy
properties 3 and 5.
The default in SAS PROC UNIVARIATE is PCTLDEF=5, which corresponds to
definition 2 in Hyndman and Fan. This one is not very good (my
memory was wrong). PCTLDEF=4 corresponds to definition 6 in
Hyndman and Fan, which satisfies all but desirable property 3.
Desirable property 3 is
Freq(X_k \le \hat{Q}_i(p)) = Freq(X_k \ge \hat{Q}_i(1-p))
that is, the number of observations <= the p quantile should equal
the number of observations >= the (1-p) quantile.
FYI, on SPSS they state:
"The frequencies command of SPSS appears to use [definition 6],
although this is nowhere documented."
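To make the differences concrete, here is a minimal Python sketch of definitions 2, 5, 6 and 7, written from the plotting positions Hyndman and Fan give (p_k = (k - 0.5)/n for definition 5, k/(n + 1) for 6, (k - 1)/(n - 1) for 7; definition 2 inverts the empirical CDF and averages where it jumps). It is not the code used by S-PLUS, SAS or SPSS, and the name hf_quantile is hypothetical. Applied to x = c(8,9,10,11,14,20) from the example quoted below, definition 7 reproduces quantile(x) (9.25, 10.5, 13.25), definition 6 gives quartiles of 8.75 and 15.5, and definitions 2 and 5 both give 9 and 14.

```python
import math


def hf_quantile(data, p, definition):
    """Sample p-quantile under Hyndman & Fan (1996) definition 2, 5, 6 or 7.

    Order statistics are 1-based, as in the paper (x_(1) <= ... <= x_(n)).
    """
    x = sorted(data)
    n = len(x)

    def order_stat(k):
        # Clamp k into 1..n so extreme p returns the sample min or max.
        return x[min(max(k, 1), n) - 1]

    if definition == 2:
        # Inverse empirical CDF, averaging where the CDF jumps (SAS PCTLDEF=5).
        j = math.floor(n * p)
        if n * p - j > 0:
            return order_stat(j + 1)
        return 0.5 * (order_stat(j) + order_stat(j + 1))

    # Definitions 5, 6, 7 interpolate linearly between adjacent order
    # statistics; m shifts the position so that p_k = (k - 0.5)/n,
    # k/(n + 1) or (k - 1)/(n - 1) respectively.
    if definition == 5:
        m = 0.5
    elif definition == 6:
        m = p
    elif definition == 7:
        m = 1 - p
    else:
        raise ValueError("only definitions 2, 5, 6 and 7 are sketched here")
    h = n * p + m
    j = math.floor(h)
    g = h - j
    return (1 - g) * order_stat(j) + g * order_stat(j + 1)


x = [8, 9, 10, 11, 14, 20]  # same data as the S-PLUS example
for d in (2, 5, 6, 7):
    print("definition", d, [hf_quantile(x, p, d) for p in (0.25, 0.5, 0.75)])

# Assumption: summary() printed 3 significant digits, so 13.25 shows as 13.2.
print(f"{hf_quantile(x, 0.75, 7):.3g}")
```

The last line assumes summary() printed three significant digits, which is what its output suggests. Python rounds the exact tie 13.25 to even, giving 13.2 as in the printout; how S-PLUS itself rounds is not something I have checked.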
>
> > Functions that compute quantiles in S-plus are not (or were not)
> > consistent. Here is an example using S-plus 2000; it may have been
> > fixed, as there was discussion on this list a few years ago:
> >
> > > x <- c(8,9,10,11,14,20)
> > > quantile(x)
> >     0%   25%   50%   75%  100%
> >      8  9.25  10.5 13.25    20
> > > summary(x)
> >  Min. 1st Qu. Median Mean 3rd Qu. Max.
> >     8    9.25   10.5   12    13.2   20
>
> Nope: they *were* and are consistent. They are printed to different
> numbers of significant digits. See
>
> > print(summary(x), digits=5)
>  Min. 1st Qu. Median  Mean 3rd Qu.  Max.
>  8.00    9.25  10.50 12.00   13.25 20.00
You are correct. It is the stem function that is inconsistent.
> stem(x)
N = 6 Median = 10.5
Quartiles = 9, 14
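For what it's worth, the 9 and 14 that stem() reports are exactly Tukey's hinges for this x (the medians of the lower half 8, 9, 10 and the upper half 11, 14, 20). I haven't checked the S-PLUS source, so it is only an assumption that stem() computes hinges; for this particular x, Hyndman and Fan's definitions 2 and 5 happen to give 9 and 14 as well. A minimal Python sketch of the hinges, with the index arithmetic borrowed from R's fivenum() (tukey_hinges is a hypothetical name):

```python
import math


def tukey_hinges(data):
    """Tukey's five-number summary: min, lower hinge, median, upper hinge, max.

    Mirrors the depth arithmetic of R's fivenum(): each hinge is the median
    of the lower or upper half, the middle value being shared when n is odd.
    """
    x = sorted(data)
    n = len(x)
    n4 = math.floor((n + 3) / 2) / 2
    depths = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # Average the 1-based order statistics at floor(d) and ceil(d).
    return [0.5 * (x[math.floor(d) - 1] + x[math.ceil(d) - 1]) for d in depths]


print(tukey_hinges([8, 9, 10, 11, 14, 20]))
```

For this x it returns [8.0, 9.0, 10.5, 14.0, 20.0], i.e. the median 10.5 and "quartiles" 9 and 14 that stem() printed.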
Ed Malthouse
Dr. Edward C. Malthouse
Assistant Professor
Integrated Marketing Communications Department
Medill School of Journalism
1908 Sheridan Road
Evanston, IL 60208-1290
Tele: 847-467-3376
Fax: 847-491-5925