On Wed, 27 Feb 2002, Lambert.Winnie wrote:
[snip]
> The values in mypdf$x are speeds. I looked for a function to
> calculate the CDF, and ended up using cumsum() as a proxy. I found
> cdf.compare, but I don't want to compare anything at this point, just
> look at it. I also want to calculate the probability of exceeding a
> certain value in $x, say 20. In order to do this, I had to
> brute-force it using the cumsum() results and matching them with a
> value of the same index in $x. The S-PLUS probability functions that
> I found in the Guide to Statistics all require the designation of a
> theoretical distribution. Is there a function that will calculate the
> CDF of a dataset and the probability of occurrence of a value in that
> dataset without having to have a priori knowledge of a theoretical
> distribution? Or without having to manually write code to do it?
> Something that I can input a value of $x into (like '20') and have it
> automatically correspond to a CDF value?
>
> I would think that with this many data-points, I would not need to fit
> the data to a distribution since the sample is large enough to reveal
> the 'true' distribution of the data. Not being a statistician, I
> could be wrong on that point. Nonetheless, a function like I describe
> above would still be useful. S-PLUS being the wonderful software that
> it is, I assume it exists, I just can't find it. Thanks.
One quick way to just get the plot is:
> cdf.compare(x,x)
This compares the CDF with itself, so the plot is essentially drawn twice.
It is not very efficient, but it is quick and easy.
Another approach is to load the "hmisc" library and use ecdf.
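As a rough sketch of the ecdf route, assuming an R session (where ecdf() in the stats package returns a step function you can evaluate at any value) and simulated speeds standing in for mypdf$x, the probability of exceeding 20 could be read off like this. The names speeds, Fn, p.below, p.above and p.direct are only illustrative:

```r
# Simulated speeds stand in for mypdf$x (hypothetical data).
set.seed(1)
speeds <- rgamma(5000, shape = 4, scale = 4)

# ecdf() returns a step function Fn with Fn(q) = proportion of data <= q.
Fn <- ecdf(speeds)

p.below <- Fn(20)        # empirical P(speed <= 20)
p.above <- 1 - Fn(20)    # empirical P(speed > 20)

# The same exceedance probability computed directly from the data.
p.direct <- mean(speeds > 20)
```

No theoretical distribution is needed here; the step function is built from the sorted data alone.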
If you want functionality like dnorm, pnorm, qnorm, and rnorm, here are
some quick (and still very dirty) versions for empirical distributions.
The raw data is the second argument. You can use them to get plots similar
to those from cdf.compare and ecdf, but there are slight differences:
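# Density: kernel density estimate of the data, interpolated at x.
demp <- function(x, data, ...){
    temp <- density(data, ...)
    approx(temp, xout=x)$y
}

# CDF: proportion of the data at or below q, i.e. P(X <= q).
# smooth=TRUE interpolates linearly between the sorted data values.
pemp <- function(q, data, smooth=FALSE, ...){
    if (smooth){
        approx(x=sort(data), y=seq(0, 1, length=length(data)),
               xout=q, ...)$y
    } else {
        # sapply handles a single q and a vector of q's alike
        sapply(q, function(x, d){ mean(d <= x) }, d=data)
    }
}

# Quantiles: the inverse of the CDF.
qemp <- function(p, data, ...){
    quantile(data, p, ...)
}

# Random draws: resample the data (with replacement by default).
remp <- function(n, data, replace=TRUE, ...){
    sample(data, n, replace=replace, ...)
}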
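To tie this back to the original question, here is a small usage sketch. It assumes R, uses made-up speeds in place of mypdf$x, and repeats condensed stand-ins for the unsmoothed pemp and for qemp only so that the snippet runs on its own. The stand-in uses <=, the usual CDF convention; for continuous data such as speeds the choice between < and <= rarely matters.

```r
# Condensed stand-ins for the unsmoothed pemp() and for qemp(),
# repeated here only so the snippet is self-contained.
pemp <- function(q, data) sapply(q, function(x, d) mean(d <= x), d = data)
qemp <- function(p, data) quantile(data, p)

# Hypothetical speeds standing in for mypdf$x.
set.seed(42)
speeds <- rweibull(2000, shape = 2, scale = 15)

p.exceed <- 1 - pemp(20, speeds)         # empirical P(speed > 20)
p.grid   <- pemp(c(10, 20, 30), speeds)  # CDF at several speeds at once
q90      <- qemp(0.9, speeds)            # speed exceeded about 10% of the time
```

So a value such as 20 goes straight in as the first argument, with no cumsum() bookkeeping and no index matching.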
Hope this helps,
--
Greg Snow, PhD                Office: 223A TMCB
Department of Statistics      Phone:  (801) 378-7049
Brigham Young University      Dept.:  (801) 378-4505
Provo, UT 84602               email:  gls@byu.edu