Frank E Harrell Jr wrote:
>
> The Hosmer-Lemeshow test has limited power and is dependent on
> choice of cutpoints in predicted probabilities. A newer test by
> Hosmer et al that is better in most ways may be obtained using
> the Design library's lrm and resid.lrm functions. -Frank Harrell
Pedram, for the article see an earlier post also by Frank Harrell
http://www5.biostat.wustl.edu/s-news/s-news-archive/199904/msg00150.html
Small changes in the cutpoint choices have been shown to
lead to significantly different results. In case you still need
the HL tests (e.g. for comparisons) and you're willing to
accept code of an Splus newbie you can find below my attempt at it.
Code, meant for self consumption, was written in the last 2 days and
although not extensively tested i was able to verify that results on a
large dataset were identical to an independent established method.
Improvements, bugs etc are appreciated.
--ameen.
--
Dr. A. Abu-Hanna Meibergdreef 15
Medical Informatics 1105 AZ Amsterdam
AMC - U of Amsterdam e-mail: a.abu-hanna@amc.uva.nl
tel: +31-20-566 5959 fax: +31-20-6919840
#-----------------------------------------------------------------------
# Code could be more compact and neater as data conversions have been
used
# too opportunistically here. Improvements are appreciated
# (a.abu-hanna@amc.uva.nl). You may want to parametrize the function to
work
# with N bins instead of 10 or to add warnings when the number of
expected
# events is too small (e.g. < 5).
#
# Example of use:
# logit <- -7.7631 + 0.0737*SOCRE + 0.9971*(log(SCORE+1))
# prob <- exp(logit)/(1+exp(logit))
# result <- hos.lem.test(prob, event)
# result[[1]] # summary table of the H test (based on fixed probs.)
# result[[2]] # summary table of the C-hat test (based on percentiles)
# result[[3]] # Pearson Chi-square statistics for H
# result[[4]] # Pearson Chi-square statistics for C-hat
# result[[5]] # sign. level for H statistics
# result[[6]] # sign. level for C-hat statistics
#-----------------------------------------------------------------------
hos.lem.test <- function(p, y, df=8)
{
# p is vector of estimated probabilities.
# y is an outcome binary vector of 0 (no event) and 1 (event).
# df is degrees of freedom for the Chi-square statistics.
# You may want to adjust the degrees of freedom based on the nature of
the
# dataset. With 10 bins in total, it is common to use 8 dfs for the
training
# (that is developmental) set, and 9 and 10 dfs with a validation test
# (e.g. depending on whether it is an internal or external validation).
#-----------------------------------------------------------------------
# collapse p into 10 groups based on fixed values of estimated
probabilities
# (for use with H test).
p.fixed <- cut(p, breaks = seq(0,1,.1), inc=T);
# collapse p into 10 groups based on percentiles (for use with C test).
# Note that observations with the same estimated probability will fall
in
# the same bin possibly resulting in some groups being (much) larger
than
# others.
q <- quantile(p, probs=seq(0,1,.1));
q[1] <- 0; q[11] <- 1;
p.quant <- cut(p, breaks = q, inc=T);
# Construct 2x10 H and C tables based on p.fixed and p.quant.
H.table <- table(y, p.fixed);
C.table <- table(y, p.quant);
# Obtain mean estimated probabilities of event (=1) in each bin.
pmean.h <- tapply(p, p.fixed, mean);
pmean.c <- tapply(p, p.quant, mean);
# Obtain total number of observations in each bin (=0 + 1).
bin.n.h <- H.table[1,] + H.table[2,];
bin.n.c <- C.table[1,] + C.table[2,];
# Obtain vector of Pearson Chi-square values. Each element in
# the vector represents the Chi-square value for the corresponding bin.
H.bin <- ((H.table[2,] - (bin.n.h *
pmean.h))^2)/((bin.n.h*pmean.h)*(1-pmean.h));
C.bin <- ((C.table[2,] - (bin.n.c *
pmean.c))^2)/((bin.n.c*pmean.c)*(1-pmean.c));
# Obtain Pearson Chi-square statistics.
H <- sum(as.vector(H.bin));
C <- sum(as.vector(C.bin));
# Construct summary tables. Each table includes a raw for the following
# bin information: number of 0 and 1 observations, mean estimated
# probability, expected number of events, mean estimated probability,
# expected number of events, total number of observations, H or C
# Pearson Chi-square.
H.summary <- as.data.frame(rbind(
H.table, pmean.h, pmean.h*bin.n.h, bin.n.h, H.bin),
row.names = c("0", "1", "mean prob", "exp.#events", "total #obs", "H
val"));
C.summary <- as.data.frame(rbind(
C.table, pmean.c, pmean.c*bin.n.c, bin.n.c, C.bin),
row.names = c("0", "1", "mean prob", "exp.#events", "total #obs", "C
val"));
# Return summary tables, H and C statistics and their corresponding
sig. levels
return(list(H.summary, C.summary, H, C, 1 - pchisq(H, df), 1-pchisq(C,
df)));
}
|