PDFs and CDFs

To: <s-news@lists.biostat.wustl.edu>
Subject: PDFs and CDFs
From: "Lambert.Winnie" <lambert.winifred@ensco.com>
Date: Wed, 27 Feb 2002 15:05:19 -0500
All,

I am using S-PLUS 6 R2 on Windows XP.

I have a probability density function (mypdf) from a large dataset (> 10000
observations), calculated using the density() function.  It looks like this:

> mypdf
$x:
 [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39

$y:
 [1] 0.000746826 0.007841673 0.006721434 0.012696042 0.014936520 0.028752800 0.032113519 0.051530994 0.061613142 0.075802840 0.081404030
[12] 0.093726665 0.079910383 0.070575058 0.057505600 0.049290515 0.029126214 0.029126214 0.022031367 0.022031367 0.026138909 0.020911127
[23] 0.019790888 0.015683345 0.011575803 0.011575803 0.012322629 0.009708738 0.009708738 0.008215086 0.008215086 0.007094847 0.003360717
[34] 0.001493652 0.002987304 0.001867065 0.001120239 0.000373413 0.000000000 0.000373413

The values in mypdf$x are speeds.  I looked for a function to calculate the
CDF and ended up using cumsum() as a proxy.  I found cdf.compare, but I don't
want to compare anything at this point, just look at the CDF.  I also want to
calculate the probability of exceeding a certain value in $x, say 20.  To do
this, I had to brute-force it by matching the cumsum() results with the value
at the same index in $x.  The S-PLUS probability functions that I found in the
Guide to Statistics all require a theoretical distribution to be specified.
Is there a function that will calculate the CDF of a dataset, and the
probability of occurrence of a value in that dataset, without a priori
knowledge of a theoretical distribution and without my having to write the
code by hand?  I am looking for something that takes a value of $x (like '20')
and returns the corresponding CDF value.
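
For illustration only, here is a minimal sketch of the cumsum-and-match
approach described above, written in Python rather than S-PLUS.  It assumes
the tabulated $y values can be treated as probability mass at each integer
speed in $x (they sum to roughly 1 with unit spacing).  The names cdf_at and
prob_exceeding are hypothetical helpers, not S-PLUS functions.

```python
from bisect import bisect_right
from itertools import accumulate

# Values copied from mypdf$x and mypdf$y in the post.
x = list(range(40))
y = [
    0.000746826, 0.007841673, 0.006721434, 0.012696042, 0.014936520,
    0.028752800, 0.032113519, 0.051530994, 0.061613142, 0.075802840,
    0.081404030, 0.093726665, 0.079910383, 0.070575058, 0.057505600,
    0.049290515, 0.029126214, 0.029126214, 0.022031367, 0.022031367,
    0.026138909, 0.020911127, 0.019790888, 0.015683345, 0.011575803,
    0.011575803, 0.012322629, 0.009708738, 0.009708738, 0.008215086,
    0.008215086, 0.007094847, 0.003360717, 0.001493652, 0.002987304,
    0.001867065, 0.001120239, 0.000373413, 0.000000000, 0.000373413,
]

# Running total of y (the cumsum() step), rescaled so the last value is 1.
running = list(accumulate(y))
cdf = [value / running[-1] for value in running]


def cdf_at(speed):
    """Step-function CDF: cumulative value at the largest x <= speed."""
    i = bisect_right(x, speed)
    return cdf[i - 1] if i > 0 else 0.0


def prob_exceeding(speed):
    """Probability of a speed strictly greater than the given value."""
    return 1.0 - cdf_at(speed)


print(prob_exceeding(20))
```

With the values above, prob_exceeding(20) comes out at about 0.146.  The
lookup treats the CDF as a step function; interpolating between grid points
would be an alternative if the speeds are not restricted to integers.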

I would think that with this many data points, I would not need to fit the data
to a distribution, since the sample is large enough to reveal the 'true'
distribution of the data.  Not being a statistician, I could be wrong on that
point.  Nonetheless, a function like the one I describe above would still be
useful.  S-PLUS being the wonderful software that it is, I assume it exists; I
just can't find it.  Thanks.
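
A distribution-free CDF of this kind is usually called an empirical CDF: for
any value, it is the fraction of observations less than or equal to that
value.  The sketch below (again Python, for illustration only) works from raw
observations rather than the density() output.  The speeds list is made-up
sample data and make_ecdf is a hypothetical helper name.

```python
from bisect import bisect_right


def make_ecdf(observations):
    """Return a function giving the fraction of observations <= a value."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one observation")

    def ecdf(value):
        # bisect_right counts how many sorted observations are <= value.
        return bisect_right(ordered, value) / n

    return ecdf


# Made-up speeds standing in for the real dataset.
speeds = [3, 7, 11, 11, 12, 15, 21, 24, 8, 30]
ecdf = make_ecdf(speeds)

p_at_most_20 = ecdf(20)          # fraction of speeds <= 20
p_exceed_20 = 1.0 - ecdf(20)     # fraction of speeds > 20
print(p_at_most_20, p_exceed_20)
```

No theoretical distribution is specified anywhere in this calculation; the
sorted sample itself defines the CDF.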


***********************************************************************
Winifred C. Lambert                Senior Scientist/Meteorologist
ENSCO, Inc.
Aerospace Sciences and Engineering Division
1980 N. Atlantic Ave, Suite 230
Cocoa Beach, FL  32931
VOICE: 321.853.8130  FAX: 321.853.8415
lambert.winifred@ensco.com

AMU Quarterly Reports are available online:
http://science.ksc.nasa.gov/amu/home.html
***********************************************************************
