Re: [S] Histograms.

To: Rolf Turner <rolf@math.unb.ca>
Subject: Re: [S] Histograms.
From: Prof Brian D Ripley <ripley@stats.ox.ac.uk>
Date: Sun, 31 Oct 1999 21:33:54 +0000 (GMT)
Cc: s-news@wubios.wustl.edu
In-reply-to: <199910311834.OAA09137@tanner.math.unb.ca>
Sender: owner-s-news@wubios.wustl.edu
On Sun, 31 Oct 1999, Rolf Turner wrote:

> 
> 
> In response to a question from Kin Cheung, Albyn Jones writes:
> 
> >  a histogram, or in general a density does not have to be bounded 
> >  by 1, the total area has to be 1.  Consider for example the normal
> >  with small standard deviation:
> >  
> >  > dnorm(0,0,.001)
> >  [1] 398.9423
> 
> All very true, but I think the argument name ``probability'' in
> hist() is misleading.  If I set ``probability = T'' I'd expect to get
> probabilities, numbers in the range 0 to 1.  In other words, what the
> elementary stats texts call a ``relative frequency'' histogram.
> (Admittedly the documentation for hist() does indeed make it clear
> --- if one reads it --- that one is getting a p.d.f. and not relative
> frequencies.)

That depends on the nationality of the stats texts, we find. Bill Venables
and I have pointed out here a few times that in British and Australian
usage a histogram always has area one; anything else is not a histogram.
This matters when the bin widths differ (as they can with hist), because
frequency or relative frequency plots can then be seriously misleading.
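
To make the point about unequal bin widths concrete, here is a minimal
sketch in plain Python rather than S, so that it does not lean on any
particular hist() argument list. The data, the breaks and the helper name
bin_counts are all made up for illustration. It computes relative
frequencies and density heights (relative frequency divided by bin width)
for the same bins and compares the bar areas each would give.

```python
from bisect import bisect_right

def bin_counts(values, breaks):
    """Count values in bins [b0, b1), [b1, b2), ...; the last bin is closed on the right."""
    counts = [0] * (len(breaks) - 1)
    for v in values:
        if not breaks[0] <= v <= breaks[-1]:
            raise ValueError(f"{v} lies outside the breaks")
        i = min(bisect_right(breaks, v) - 1, len(counts) - 1)
        counts[i] += 1
    return counts

# Hypothetical data and deliberately unequal bins (widths 0.5, 0.5, 3.0).
data = [0.05, 0.1, 0.2, 0.3, 0.4, 0.45, 0.6, 0.9, 1.5, 3.5]
breaks = [0.0, 0.5, 1.0, 4.0]

widths = [hi - lo for lo, hi in zip(breaks, breaks[1:])]
counts = bin_counts(data, breaks)                      # [6, 2, 2]
rel_freq = [c / len(data) for c in counts]             # sums to 1, each value <= 1
density = [rf / w for rf, w in zip(rel_freq, widths)]  # heights may exceed 1

# Area of each bar when the density is used as the height: equals the
# proportion of data in the bin, so the total area is 1.
density_area = [d * w for d, w in zip(density, widths)]

# Area of each bar when the relative frequency is used as the height over
# the true bin widths: the wide bin is inflated in proportion to its width.
naive_area = [rf * w for rf, w in zip(rel_freq, widths)]

print("relative frequencies:", [round(x, 3) for x in rel_freq])
print("density heights:     ", [round(x, 3) for x in density])
print("density bar areas:   ", [round(x, 3) for x in density_area])
print("naive bar areas:     ", [round(x, 3) for x in naive_area])
```

In this made-up example the first density height is above 1 although the
total area is 1, and the widest bin, which holds a fifth of the data, gets
roughly 0.6 of the 1.0 units of bar area when relative frequencies are
drawn as heights.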

> If the histogram is to be thought of as a probability ***density***
> function, the argument should ``say so'' --- i.e. the argument should
> be ``probability.density = T''.  Or, perhaps to shorten things down,
> ``pdf = T''.

Well, it does say so in the help page.

> One does, from time to time, want ``relative frequency'' histograms;
> it would be nice if hist() provided a facility for this.  As things
> stand, one has to modify the code of hist() locally, or fiddle about
> with barplot().  Neither is difficult to do, but it's a mild pain
> in the pohutukawa.

Perhaps; just give it a name other than hist or histogram, to avoid
confusing those who were taught in other traditions, and have it warn
that such plots can be seriously misleading if variable bin widths are used.
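
A rough sketch of what such a separately named facility might look like,
again in plain Python rather than S. The function name
relative_frequencies, its arguments and the wording of the warning are all
hypothetical; this is not part of hist() or of any existing library.

```python
import warnings
from bisect import bisect_right

def relative_frequencies(values, breaks, tol=1e-9):
    """Return the proportion of values in each bin; warn if bin widths differ.

    Bins are [b0, b1), [b1, b2), ..., with the last bin closed on the right.
    """
    if not values:
        raise ValueError("no data supplied")
    widths = [hi - lo for lo, hi in zip(breaks, breaks[1:])]
    if max(widths) - min(widths) > tol * max(widths):
        # The warning suggested above: bar heights are not comparable
        # across bins of different width.
        warnings.warn(
            "bin widths differ: relative frequency bars can be seriously "
            "misleading; consider a density-scaled histogram instead",
            stacklevel=2,
        )
    counts = [0] * len(widths)
    for v in values:
        if not breaks[0] <= v <= breaks[-1]:
            raise ValueError(f"{v} lies outside the breaks")
        i = min(bisect_right(breaks, v) - 1, len(counts) - 1)
        counts[i] += 1
    return [c / len(values) for c in counts]

# Equal widths: no warning is issued.
print(relative_frequencies([0.1, 0.6], [0.0, 0.5, 1.0]))

# Unequal widths: the same call issues a UserWarning.
print(relative_frequencies([0.1, 0.2, 0.7, 2.0], [0.0, 0.5, 1.0, 4.0]))
```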

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

