s-news
[Top] [All Lists]

Re: confidence intervals for proportions (LONG)

To: s-news@lists.biostat.wustl.edu
Subject: Re: confidence intervals for proportions (LONG)
From: "Chris Evans" <chris1@psyctc.org>
Date: Wed, 25 Apr 2001 00:00:07 +0100
In-reply-to: <3AE2A53F.14671.11F7851B@localhost>
Organization: PSYCTC.org
A number of people have asked me to post the responses I got to 
this.  Here they are, with the permissions of the authors.  Many 
thanks again to everyone.

On 22 Apr 2001, at 9:32, Chris Evans wrote:

> Thanks to Frank Harrell, Timothy J. Wade, Terry Therneau, Brad J. 
> Bickerstaff and David Parkhurst who all sent me extremely useful 
> answers.  I'm much the wiser about exact and approximated 
> binomial confidence intervals now and have some references I may 
> find time and energy to chase.  However, the simplest answer for 
> those with the Hmisc library is its binconf function which was clear 
> in the "Categorical Data" part of the help on the library.  
> 
> Turned out the differences were small for my data, not problematical 
> for my purposes but very clear if you needed high precision. 
> However, it's lovely to know that definitively now. Interesting to see 
> that the method I implemented omitted a continuity correction in 
> some other "normal approximation" methods.  

and my original question was:


> A simple question.  I have written a simple bit of S+ to give me
> confidence intervals for proportions based on the normal approximation
> based on: Gardner, M. J., S. B. Gardner and P. D. Winter (1989).
> Confidence Interval Analysis (C.I.A.) Microcomputer Program Manual.
> London, British Medical Journal.
> 
> That refers to a better approximation for small n or min(p,1-p) being
> small and gives a reference to Armitage and Berry (?) but I won't get
> access to that for some time and I can't find anything to do this from
> the obvious places (statlib...)  It strikes me that someone somewhere
> has probably written this or could point me to some WWW source that
> will tell me what I need to do.
> 
> TIA,

From here on should cut and paste into S+ (or R?) fine:

##This arose from a request I put the s-news list.  I 
was asking for pointers to 
##the exact confidence interval for a proportion.  I 
had written the following
##crude function that uses the Gaussian approximation 
without a continuity correction
ci.prop<- function(p,n,int=.95) {
# Confidence interval for an observed proportion p 
in sample size n
# Method from Gardner, M. J. and D. G. Altman 
(1989). 
#    "Statistics with confidence.  Confidence 
intervals and statistical guidelines"
#    London, British Medical Journal., p.29
# Uses crude normal approximation method.  Must 
replace with accurate method.

if (n != as.integer(n)) stop("n passed to function 
ci.prop must be an integer")
if (n < 2) stop("n passed to function ci.prop must 
be at least 2")
if (p < 0 || p > 1) stop("p passed to function 
ci.prop must be between 0 and 1 incl.")
se.p <- sqrt(p*(1-p)/n)
int <- .5 + int/2       
qn <- qnorm(int)
half <- qn*se.p
if (p == 0) lwr <- 0
else lwr <- p - half
if (lwr < 0) lwr <- 0
if (p == 1) upr <- 1
else upr <- p + half
if (upr > 1) upr <- 1
c(lwr,upr)
}

##The first response I got made this

exact.binom.CI<- function(P, N, k)
{
#Date sent:             Fri, 20 Apr 2001 12:20:38 -0700
#To:                    "Chris Evans" <chris1@psyctc.org>
#From:                  "Timothy J. Wade" 
<tjwade@uclink4.berkeley.edu>
#Subject:               Re: [S] confidence intervals for 
proportions

#I didn't write this, I asked the same question a 
while back and was kindly
#given the following function. Unfortunately I forget 
who sent this one to
#me, but I received several different user-written 
functions.

#Exact Binomial Condfidence limits. ;
#To get 95% limits, enter (0.95, trials, successes);
#Ref: Clopper & Pearson, Biometrika (1934), v26, 
p.404 ;
if(N > k & k >= 1)return(c(pl = qbeta((1 - P)/2, k, 
N - k + 1), pu = qbeta(1 - (1 - P)/2, k + 1, N - 
k)))
if(N > 0 & N == k)return(c(pl = P^(1/n), pu = 1))
if(N > 0 & k == 0)return(c(pl = 0, pu = (1 - 
P^(1/N)))) 
}

##Then David Parkhurst pointed me straight!
#From:"David Parkhurst" 
<parkhurs@indiana.edu>
#To:"Chris Evans" <chris1@psyctc.org>
#Subject:Re: [S] confidence intervals for 
proportions
#Datesent:              Fri, 20 Apr 2001 14:38:49 -0500

#Otherreferences that may be of use are:
#Box,G. E. P., W. G. Hunter, et al. (1978). 
Statistics for Experimenters.
#NewYork, John Wiley & Sons.  (They provide a graph 
you can use.)

#Agresti,A. and B. A. Coull (1998). "Approximate is 
better than "exact"
#forinterval estimation of binomial proportions." 
American Statistician 52:119--126.

#Clopper,C. J. and E. S. Pearson (1934). "The use of 
confidence or
#fiduciallimits illustrated in the case of the 
binomial." Biometrika 26: 404-413.

#Iwould NOT recommend using the normal distribution 
for tail probabilities
#unlessboth N*pi and N*(1-pi) are bigger than 15 or 
more.  (Pi here is
#thebinomial proportion, of course, and not 3.14.....)

#DaveParkhurst


##Then Brad Bickerstaff pointed me to the binconf 
function built into the Hmisc library. 
##Frank Harrell did too!
##The version I have is:
binconf<- function(x, n, alpha = 0.05)
{
nu1 <- 2 * (n - x + 1)
nu2 <- 2 * x
ll <- if(x > 0) x/(x + qf(1 - alpha/2, nu1, nu2) * 
(n - x + 1)) else 0
nu1p <- nu2 + 2
nu2p <- nu1 - 2
pp <- if(x < n) qf(1 - alpha/2, nu1p, nu2p) else 1
ul <- ((x + 1) * pp)/(n - x + (x + 1) * pp)
zcrit <-  - qnorm(alpha/2)
z2 <- zcrit * zcrit
p <- x/n
cl <- (p + z2/2/n + c(-1, 1) * zcrit * sqrt((p * (1 
- p) + z2/4/n)/n))/(1 + z2/n)
if(x == 1)
cl[1] <-  - log(1 - alpha)/n
if(x == (n - 1))
cl[2] <- 1 + log(1 - alpha)/n
res <- rbind(c(ll, ul), cl)
dimnames(res) <- list(c("Exact", "Wilson"), 
c("Lower", "Upper"))
res
}


##Next I got this comprehensive function from Terry 
Therneau
cibinom<- function(k, n, p=.95, 
method=c('exact', 'arcsin', 'normal', 
'anscombe')) {
#From:                  Terry Therneau <therneau@mayo.edu>
#Date sent:             Fri, 20 Apr 2001 15:18:57 -0500 
(CDT)
#To:                    chris1@psyctc.org
#Subject:               Better binomial CI

#These routines have been used for several years at 
our institution:

#
#Exact binomial confidence limits
#Based on equation 10.7 of Feller volume 1
#The other approximations make use of the continuity 
correction


nn <- max(length(k), length(n), length(p))
if (nn>1) {
k <- rep(k, length=nn)
n <- rep(n, length=nn)
p <- rep(p, length=nn)
}
p2 <- (1-p)/2
method <- match.arg(method)
if (method=='exact') {
dummy1 <- ifelse(k==0, 1, k)  #avoid an error 
message of qbeta
dummy2 <- ifelse(k==n, n-1, k)
upper <- ifelse(k==n, 1, 1- qbeta(p2, n-dummy2, 
k+1))
lower <- ifelse(k==0, 0, 1- qbeta(1-p2, n+1-k, 
dummy1))
}

else if (method=='normal'){ #normal approx, poorest
pp  <- k/n
std <- sqrt(pp * (1-pp) /n)
upper <- pp + .5/n - qnorm(p2)*std      
lower <- pp + qnorm(p2)*std - .5/n
}

else if (method=='arcsin'){ # use the arcsin 
transform 
ap <- asin(sqrt(k/n))
ap1<- asin(sqrt(ifelse(k==n, k, k+.5)/n))
ap2<- asin(sqrt(ifelse(k==0, k, k-.5)/n))
std<- sqrt(.25/n)
upper <- (sin(ap1 - qnorm(p2)*std))^2
lower <- (sin(ap2 + qnorm(p2)*std))^2
}

else if (method=='anscombe'){ # anscombe's method
con <- 3/8
ap <- asin(sqrt((k+con)/(n + 2*con)))
ap1<- asin(sqrt((con+ifelse(k==n, k, k+.5))/(n + 
2*con)))
ap2<- asin(sqrt((con+ifelse(k==0, k, k-.5))/(n + 
2*con)))
std<- sqrt(.25/(n+.5))
upper <- ((sin(ap1 - qnorm(p2)*std))^2) 
lower <- ((sin(ap2 + qnorm(p2)*std))^2) 
}
else stop("Invalid method")

if (nn==1) c(lower=lower, upper=upper)
else       cbind(lower=lower, upper=upper)
}


##I got this message from Richard Howarth
#From:"Richard J. Howarth" 
<r.howarth@ucl.ac.uk>
#To:Chris Evans <chris1@psyctc.org>
#Subject:Re: [S] confidence intervals for 
proportions
#Datesent:              Mon, 23 Apr 2001 18:03:01 +0100

#Blyth(1986) showed that accurate values of the one-
sided lower and
#upperbounds on the proportion 100(n/N), where n is 
the number of occurrences in
#atotal count of N items, is given by:

#p(n)u= 100[Beta(1-a, n+1, N-n)]
#p(n)l= 100[1-Beta(1-a, N-n+1, n)]

#whereBeta(1-a, v1,v2) is the 100(1-a)th percentile 
of the inverse beta
#distributionwhith shape parameters v1 and v2, the 
qbeta() function in
#Splus.Use (a/2) in the above for two-sided bounds.

#Inthe special case where x might be expected to  be 
present, but is not
#observed,

#p(0)u= 100[1-a^(1/N)]

#SeeBlyth, CR (1986) Approximate Binomial Confidence 
Limits, JASA, 81:
#843-855.

#Regards,
#RichardHowarth


##The first bit of that seemed to square with what 
the other "exact" methods
##were offering (which makes me puzzled by the 
"Approximate" in the title
##of Blyth's article but I'll follow that up.

##The second bit turns out to be a simple definition 
of qbeta(1-a,1,N) and 
##related to a "3/n" rule of thumb I'd read 
somewhere.  

##So I knocked up:
upr.not.seen<- function(n,ci=.95) {
alpha <- 1-ci
upr.blyth.prop <- 1-alpha^(1/n)
upr.blyth.perc <- 100*upr.blyth.prop
upr.blyth.n <- floor(1/upr.blyth.prop)
upr.JV.JL.JZ.k <- upr.blyth.prop*(n+1) 
as.data.frame(cbind(upr.blyth.prop,upr.blyth.perc,up
r.blyth.n,upr.JV.JL.JZ.k))
}

upr.not.seen(40)
upr.not.seen(30)
upr.not.seen(20)
upr.not.seen(10)
print("## ")
##Uglybut confirmed the utility of the rule of thumb


##Isaid that to Richard 
##while seeking his permission to circulate his 
response and got this too:

#DearChris,

#Thanksfor your message. Jovanovic & Viana (1986), 
Jovanovic & Levy (1987)
#andJovanovic & Zalenski (1997) showed that a useful 
(and slightly
#conservative)approximations for p(0)u 90%, 95%, 99% 
and 99.9% when n=0,
#N>20are given by k/(N+1) where k = 2.25, 3, 4.5 and 
7 respectively.

##so here goes:
print("## ")
upr.not.seen(40,ci=.9)
upr.not.seen(30,ci=.9)
upr.not.seen(20,ci=.9)
upr.not.seen(10,ci=.9)

upr.not.seen(40,ci=.99)
upr.not.seen(30,ci=.99)
upr.not.seen(20,ci=.99)
upr.not.seen(10,ci=.99)

upr.not.seen(40,ci=.999)
upr.not.seen(30,ci=.999)
upr.not.seen(20,ci=.999)
upr.not.seen(10,ci=.999)

## Hm.  Doesn't suggest that 2.25 or 7 are particularly well chosen but perhaps
## I've got something wrong here.
print(" ## ")

#Jovanovic BD & Levy PS 1997, A look at the rule of three. American
#Statistician 51: 137-139.

#-- & Viana MAG 1996, Upper confidence bounds for binomial probability in
#safety evaluations. Proc 1996 ASA Ann Conf., Chicago, Biopharm.
#Section., ASA, Alexandria, Virginia.

#-- & Zalenski RJ 1997, Safety evaluation and confidence intervals when the
#number of observed events is small or zero.  Ann. Emergency Medicine, 30;
#301-306.  [that could be where you saw it!].

## Certainly sounds nearer my journal reading than many of the refs. so far
## but I think I saw it in a Martin Bland short article in the BMJ (British
## Medical Journal, but I haven't made a note of it.

## That's everything I've had and stunning it's been.  Thanks to everyone.

ci.prop(0/100,100)
cibinom(0,100)
cibinom(0,1000,method="normal")
cibinom(0,100,method="arcsin")
cibinom(0,100,method="normal")
cibinom(0,100,method="anscombe")

binconf(0,100)# from hmisc library

exact.binom.CI(.95,100,0)
#
#
#data from Gardner & Altman, results check with the 
book:
ci.prop(81/263,263)
#Terry Therneau's function showing different result
#owing to inclusion of a continuity correction:
cibinom(81,263,method="normal")





Chris Evans <chris@psyctc.org>
Consultant Psychiatrist in Psychotherapy,
Rampton Hospital; Associate R&D Director,
Tavistock & Portman NHS Trust;
Hon. SL Institute of Psychiatry
*** My views are my own and not representative 
of those institutions ***

<Prev in Thread] Current Thread [Next in Thread>