| To: | s-news@lists.biostat.wustl.edu |
|---|---|
| Subject: | Sample Size and Logit Models |
| From: | "Walter R. Paczkowski" <dataanalytics@earthlink.net> |
| Date: | Sun, 05 Aug 2007 21:15:13 -0400 |

Hi,

I'm hoping someone has some insight about sample size and logit estimation that could help me. I inherited a logit model from a client in the direct marketing area. The previous consultant used approximately 143,800 observations in the training data set, of which only 50 (0.03%) were the target (= 1) value for the dependent variable. The literature I could find gives very little guidance on sample sizes (Hosmer & Lemeshow have some material, but they basically say that little has been done).

Does anyone know of some literature or even rules-of-thumb about sample sizes and/or ratio of target to non-target values of the dependent variable? The use of 143,800 observations is excessive. Does this do anything to the significance of the estimates (e.g., am I always guaranteed very small p-values?)? Is oversampling of the target value the key and if so, how do I calculate weights for the estimations?

Any guidance or suggestions in this area are definitely welcome.

Walt Paczkowski

Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ 08536
(V) 609-936-8999
(F) 609-936-3733
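
To make the weighting question concrete, here is a minimal sketch of one common way weights are computed when the target value is oversampled: each class is weighted by its population share divided by its share in the estimation sample. Everything in it beyond the two counts quoted above is an assumption for illustration. It treats the 143,800 records as the population, supposes a hypothetical design that keeps all 50 targets and draws 450 non-targets at random, and uses made-up function names. It also shows the matching adjustment to the intercept for the case where the model is fitted without weights.

```python
import math


def case_control_weights(pop_rate, sample_rate):
    """Return (w1, w0): weights for target and non-target rows.

    Each weight is the class share in the population divided by the
    class share in the estimation sample.
    """
    w1 = pop_rate / sample_rate
    w0 = (1.0 - pop_rate) / (1.0 - sample_rate)
    return w1, w0


def intercept_offset(pop_rate, sample_rate):
    """Amount to subtract from an intercept fitted on the unweighted subsample."""
    odds_ratio = ((1.0 - pop_rate) / pop_rate) * (sample_rate / (1.0 - sample_rate))
    return math.log(odds_ratio)


# Counts quoted in the post; treating them as the population is an assumption.
n_total, n_events = 143_800, 50
pop_rate = n_events / n_total

# Hypothetical design: keep every target, draw 450 non-targets at random.
n_controls = 450
sample_rate = n_events / (n_events + n_controls)

w1, w0 = case_control_weights(pop_rate, sample_rate)
offset = intercept_offset(pop_rate, sample_rate)

print(f"population target rate: {pop_rate:.6f}")
print(f"sample target rate:     {sample_rate:.2f}")
print(f"weights (target, non-target): {w1:.6f}, {w0:.6f}")
print(f"intercept offset: {offset:.4f}")
```

With these weights the weighted sample has the same size as the subsample and the same target rate as the full data set. This is only a sketch under the stated assumptions, and it does not address whether 50 target cases are enough to support the model.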