>
>
> In response to a question from Kin Cheung, Albyn Jones writes:
>
> > a histogram, or in general a density does not have to be bounded
> > by 1, the total area has to be 1. Consider for example the normal
> > with small standard deviation:
> >
> > > dnorm(0,0,.001)
> > [1] 398.9423
>
> All very true, but I think the argument name ``probability'' in
> hist() is misleading. If I set ``probability = T'' I'd expect to get
> probabilities, numbers in the range 0 to 1. In other words, what the
> elementary stats texts call a ``relative frequency'' histogram.
> (Admittedly the documentation for hist() does indeed make it clear
> --- if one reads it --- that one is getting a p.d.f. and not relative
> frequencies.)
>
> If the histogram is to be thought of as a probability ***density***
> function, the argument should ``say so'' --- i.e. the argument should
> be ``probability.density = T''. Or, perhaps to shorten things down,
> ``pdf = T''.

Oh dear, not again. I fear the battle over this one is virtually lost,
unfortunately. The misunderstanding is so ingrained and so persistent it may
be impossible now to eradicate.
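
The distinction at issue can be checked numerically. The following is a
minimal sketch only, assuming R syntax and assuming that hist() returns
`breaks`, `counts` and `density` components (as R's does; S-PLUS may differ).
The sample and variable names are made up for illustration.

```r
# Illustrative data: a sample with a very small standard deviation
set.seed(1)
x <- rnorm(1000, mean = 0, sd = 0.001)

h <- hist(x, plot = FALSE)             # compute the bins without drawing

width    <- diff(h$breaks)             # bin widths
rel.freq <- h$counts / sum(h$counts)   # relative frequencies, each in [0, 1]
dens     <- rel.freq / width           # density heights: relative frequency per unit of x

max(rel.freq) <= 1                     # relative frequencies never exceed 1
max(dens) > 1                          # density heights can, when bins are narrow
sum(dens * width)                      # the total area under the density is 1
all.equal(dens, h$density)             # matches what hist() itself reports
```

The two scales differ only by the bin width, so with equal-width bins the
pictures have the same shape, but only the density version is comparable with
other density estimates.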

I'm even inclined to suggest a radical solution. Leave hist() (and
histogram(), where the position is even worse) strictly alone for, ahhh, less
sophisticated people to play with as they wish, and channel all serious
estimators of the pdf through density(), including what is now produced by
hist(..., prob=T). This might be accommodated by something like density(x,
type = "hist"). The object produced by density() should carry a special class
which would have a plot method. If the plot method coincided with the print
(or show) method, it could even be made to behave in much the same way as
hist(), that is, as a plotting function (though it would need some facility
for being added to an existing plot; I don't see a problem with this). None
of this would be at all likely to break existing code.
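
One way the proposal above might look is sketched below. This is an assumption
throughout, not an existing interface: hist_density(), the class name
"histdensity" and the `add` argument are hypothetical names chosen for
illustration, the syntax is R's, and density() itself is left untouched
rather than given a type = "hist" option.

```r
# Hypothetical constructor: a histogram-type density estimate as an object
hist_density <- function(x, breaks = "Sturges") {
  h <- hist(x, breaks = breaks, plot = FALSE)
  structure(list(breaks = h$breaks, y = h$density, n = length(x)),
            class = "histdensity")
}

# Plot method: draws the estimate as a step function.
# add = TRUE overlays it on an existing plot instead of starting a new one.
plot.histdensity <- function(x, add = FALSE, ...) {
  xs <- x$breaks
  ys <- c(x$y, x$y[length(x$y)])   # repeat the last height to close the final bin
  if (add) {
    lines(xs, ys, type = "s", ...)
  } else {
    plot(xs, ys, type = "s", xlab = "x", ylab = "Density", ...)
  }
  invisible(x)
}

# Print method coincides with the plot method, so printing the object draws it
print.histdensity <- function(x, ...) {
  plot(x, ...)
  invisible(x)
}

# Usage with made-up data
set.seed(3)
x <- rnorm(200)
d <- hist_density(x)
plot(d)                        # new plot
plot(d, add = TRUE, lty = 2)   # overlay on the existing plot
```

Since hist() is only called with plot = FALSE and nothing existing is
redefined, a sketch of this kind would not disturb existing code.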

I think this would make the whole subject easier to teach and encourage
students to view (true) histograms and other simple density estimators in
exactly the same light, as they should be. As it is now, the two seem to be
permanently separated in most students' minds (if kernel density estimators
ever make it there, that is!).
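
To see the two estimators in the same light, they can be drawn on one set of
axes. A minimal sketch, assuming R syntax (where `probability = TRUE` puts the
histogram on the density scale) and made-up data:

```r
# Illustrative sample
set.seed(2)
x <- rnorm(200)

# True histogram on the density scale
hist(x, probability = TRUE, main = "Two density estimates of one sample")

# Kernel density estimate of the same sample, overlaid on the same scale
d <- density(x)
lines(d, lwd = 2)

# Both estimates have total area (approximately) 1.
# Riemann sum over the equally spaced grid that density() returns:
kernel.area <- sum(d$y) * diff(d$x[1:2])
kernel.area
```

The overlay only makes sense because both curves estimate the same thing; a
relative frequency histogram would sit on a different vertical scale.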

> One does, from time to time, want ``relative frequency'' histograms;

Well, this one doesn't. :-)

Bill Venables.
--
-----------------------------------------------------------------
Bill Venables, Statistician, CMIS Environmetrics Project.
Physical address:                      Postal address:
CSIRO Marine Laboratories,             PO Box 120,
233 Middle St, Cleveland, Queensland   Cleveland, Qld, 4163
AUSTRALIA                              AUSTRALIA
Telephone: +61 7 3826 7251             Email: Bill.Venables@cmis.csiro.au
Fax:       +61 7 3826 7304
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.