Walt, you might be interested in the following reference:
Hsieh FY, Bloch DA, Larsen MD. A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine. 1998;17:1623-34.
However, now that you have the data, I think the precision your sample can deliver is best read from the confidence intervals of the fitted model, rather than from a post hoc sample size calculation; see e.g.:
Smith AH, Bates MN. Confidence limit analyses should replace power calculations in the interpretation of epidemiologic studies. Epidemiology. 1992;3(5):449-52.
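To make the confidence interval point concrete, here is a minimal sketch in Python for the simplest case, a single binary predictor, where the logit coefficient is the log odds ratio of a 2x2 table. The function name `odds_ratio_ci` and the split of the 50 target cases across the predictor are hypothetical, chosen only for illustration; the real split would come from the data.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Wald confidence interval for the odds ratio of a 2x2 table.

    a: targets with the predictor present      b: targets with it absent
    c: non-targets with the predictor present  d: non-targets with it absent
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio; the smallest cells dominate it.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (
        math.exp(log_or),
        math.exp(log_or - z * se),
        math.exp(log_or + z * se),
    )


# Hypothetical split (an assumption, not from the thread): 50 targets
# divided 30/20, and 143,750 non-targets divided evenly.
or_hat, lower, upper = odds_ratio_ci(30, 20, 71_875, 71_875)
print(f"OR = {or_hat:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Under these assumed counts the interval is wide and includes 1, even with 143,800 observations in total. The standard error is driven almost entirely by the 1/30 and 1/20 terms, so the precision depends on the number of target cases rather than on the overall sample size.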
Neal
Hi, I'm hoping someone has some insight about sample size and logit estimation that could help me. I inherited a logit model from a client in the direct marketing area. The previous consultant used approximately 143,800 observations in the training data set, of which only 50 (0.03%) were the target ( = 1) value for the dependent variable. The literature I could find gives very little guidance on sample sizes (Hosmer & Lemeshow have some material, but they basically say that little has been done). Does anyone know of some literature or even rules-of-thumb about sample sizes and/or ratio of target to non-target values of the dependent variable? The use of 143,800 observations is excessive. Does this do anything to the significance of the estimates (e.g., am I always guaranteed very small p-values?)? Is oversampling of the target value the key and if so, how do I calculate weights for the estimations? Any guidance or suggestions in this area are definitely welcome.
Walt Paczkowski
_________________________________
Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
(V) 609-936-8999
(F) 609-936-3733