
[S] reasonable p-values for Fisher exact's test - WAS strange ...

To: Snews <s-news@wubios.wustl.edu>
Subject: [S] reasonable p-values for Fisher exact's test - WAS strange ...
From: "Charles C. Berry" <cberry@tajo.ucsd.edu>
Date: Tue, 24 Mar 1998 16:38:18 -0800
References: <Pine.SGI.3.95.980324144546.24547A-100000@orca.akctr.noaa.gov>
Sender: owner-s-news@wubios.wustl.edu

Before this thread enters an infinite loop, a few observations:

First, class(fisher.test(...)) == "htest"

So print.htest() formats the results of fisher.test(). It prints the
p-value as follows:

  cat("p-value =", format(round(x$p.value, 4)), "\n")

(on Version 3.4 Release 1 for Sun SPARC, SunOS 4.1.3_U1 : 1996)

So the reports that fisher.test() *seemed* to work OK only imply that
the p-value was OK to the 4 decimal places that get printed.
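
As a small sketch of how that rounding can hide a discrepancy, the same
format(round(, 4)) call can be applied by hand. The value of p below is
simply copied from the fisher.test() output for matrix(1,nr=2,nc=2)
quoted further down in this message; the names shown and hidden are
just for illustration.

```r
# p-value returned by fisher.test() for matrix(1, nr=2, nc=2),
# copied from the output quoted later in this message
p <- 0.9999998808

# what print.htest() displays: the value rounded to 4 decimal places
shown <- format(round(p, 4))

# the discrepancy from the exact p-value of 1 that the rounding hides
hidden <- 1 - p
```

Here shown is the string "1", while hidden is about 1.2e-07, so the
printed report cannot distinguish the approximate result from the exact
one.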

Also, note that fisher.test() uses an algorithm which allows R x C
tables. The general algorithm isn't needed for simple 2 x 2 tables (and
it wouldn't be too hard to put in a switch for such tables), but it is
what gets used even there.

Getting to the point:

This algorithm usually yields answers that differ numerically from the
exact hypergeometric probability, viz. the result of:

> fisher.test(matrix(c(0,2,2,2),nc=2))$p
[1] 0.4666666

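differs from the exact two-sided p-value, which is the sum of the first
and third probabilities (the tables no more probable than the observed
one) in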

> dhyper(0:2,2,4,2)
[1] 0.40000000 0.53333333 0.06666667

by an amount

> fisher.test(matrix(c(0,2,2,2),nc=2))$p-sum(dhyper(c(0,2),2,4,2))
[1] -2.78155e-08
> 

And this isn't an isolated case. Each of the following summaries is of
20 differences that would all equal zero under exact arithmetic (the
exact p-value for each table is obviously 1):

> summary(sapply(1:20,function(x) fisher.test(matrix(c(1,1,x,x),nc=2))$p-1.0))
       Min.    1st Qu.     Median       Mean   3rd Qu.      Max. 
 -1.407e-05 -1.997e-06 -8.941e-08 -3.189e-07 1.192e-06 1.562e-05
> summary(sapply(1:20,function(x) fisher.test(matrix(c(2,2,x,x),nc=2))$p-1.0))
       Min.    1st Qu.    Median       Mean   3rd Qu.      Max. 
 -2.325e-05 -2.295e-06 2.384e-07 -7.927e-07 2.712e-06 7.868e-06

Only 1 of the 40 tables, c(1,1,5,5), gives exactly 0.0 as the result.
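
The reason the exact p-value is 1 for every one of these tables can be
checked with dhyper(): in a table c(a,a,x,x) the observed count a is the
most probable value of the first cell, so every possible table is "no
more probable" than the observed one and the probabilities sum to 1. A
minimal sketch, in which is.mode() and ok are hypothetical names
introduced only for this check:

```r
# Hypothetical helper: is the observed first-cell count 'a' the mode of
# the hypergeometric distribution for the table matrix(c(a, a, x, x), ncol=2)?
is.mode <- function(a, x) {
    m <- a + x                          # both row totals equal a + x
    k <- 2 * a                          # first column total
    support <- max(0, k - m):min(k, m)  # feasible first-cell counts
    probs <- dhyper(support, m, m, k)
    # small relative tolerance so rounding cannot spoil the comparison
    dhyper(a, m, m, k) >= max(probs) * (1 - 1e-7)
}

# check all 40 tables summarised above (a = 1 or 2, x = 1, ..., 20)
ok <- all(sapply(1:20, function(x) is.mode(1, x) && is.mode(2, x)))
```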

So, fisher.test() apparently uses an approximation which gives a correct
answer for the first 5 or 6 significant digits most of the time.

Even though the table

        matrix(1,nr=2,nc=2)

would obviously lead to a p-value of exactly 1.0, it seems of little
practical import that fisher.test() reports it as 

> print(fisher.test(matrix(1,nr=2,nc=2))$p,digits=10)
[1] 0.9999998808
> 

If this is a problem, then dhyper() can be used directly for 2 x 2
tables. It seems to generate results that are close to machine accuracy.
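
One way this might be done is sketched below. fisher2x2() is a
hypothetical helper, not a function supplied with S; it sums the
hypergeometric probabilities of all tables no more probable than the
observed one, which is the two-sided p-value used in the dhyper()
comparison above. The relative tolerance of 1e-7 is an assumption of
this sketch, there only so that tied probabilities are not lost to
rounding.

```r
# Hypothetical helper: two-sided Fisher exact p-value for a 2 x 2 table,
# computed directly from hypergeometric probabilities.
fisher2x2 <- function(tab) {
    m <- sum(tab[1, ])      # first row total
    n <- sum(tab[2, ])      # second row total
    k <- sum(tab[, 1])      # first column total
    lo <- max(0, k - n)     # smallest feasible value of tab[1, 1]
    hi <- min(k, m)         # largest feasible value of tab[1, 1]
    probs <- dhyper(lo:hi, m, n, k)
    p.obs <- dhyper(tab[1, 1], m, n, k)
    # sum over the tables no more probable than the observed one
    sum(probs[probs <= p.obs * (1 + 1e-7)])
}

fisher2x2(matrix(c(0, 2, 2, 2), ncol = 2))   # 7/15 in exact arithmetic
fisher2x2(matrix(1, nrow = 2, ncol = 2))     # 1 in exact arithmetic
```

This handles only 2 x 2 tables and the two-sided alternative; anything
larger still needs the R x C algorithm in fisher.test().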

-- 

Charles C. Berry                         (619) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry@tajo.ucsd.edu            UC San Diego
http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0622
