Sample Size and Logit Models

To: s-news@lists.biostat.wustl.edu
Subject: Sample Size and Logit Models
From: "Walter R. Paczkowski" <dataanalytics@earthlink.net>
Date: Sun, 05 Aug 2007 21:15:13 -0400
Hi,

I'm hoping someone has some insight into sample size and logit estimation that could help me. I inherited a logit model from a client in the direct marketing area. The previous consultant used approximately 143,800 observations in the training data set, of which only 50 (about 0.03%) had the target (= 1) value for the dependent variable.

The literature I could find gives very little guidance on sample sizes (Hosmer & Lemeshow have some material, but they basically say that little has been done). Does anyone know of some literature, or even rules of thumb, about sample sizes and/or the ratio of target to non-target values of the dependent variable?

Using all 143,800 observations seems excessive to me. Does a sample this large do anything to the significance of the estimates (e.g., am I always guaranteed very small p-values)? Is oversampling of the target value the key and, if so, how do I calculate weights for the estimation?
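To make the weighting question concrete, the following is a minimal sketch (in Python, not taken from the client's model) of one commonly described scheme: keep every target observation, draw a random subsample of the non-targets, and weight each class by the ratio of its population share to its sample share. The 5,000 sampled non-targets and all function names are hypothetical choices for illustration only, and the scheme assumes the full file's target rate is the population rate.

```python
import math

# Figures from the post: 50 targets among roughly 143,800 observations.
n_total = 143_800
n_targets = 50
tau = n_targets / n_total            # target rate in the full file
target_pct = 100 * tau               # the same rate as a percentage

def class_weights(tau, n_targets, n_nontargets_sampled):
    """Weights that restore the full-file class shares in an oversampled file.

    Assumes all targets are kept and non-targets are randomly subsampled.
    Returns (sample target rate, weight per target, weight per non-target).
    """
    ybar = n_targets / (n_targets + n_nontargets_sampled)
    w_target = tau / ybar
    w_nontarget = (1 - tau) / (1 - ybar)
    return ybar, w_target, w_nontarget

def intercept_offset(tau, ybar):
    """Amount to subtract from the intercept of an UNWEIGHTED logit fitted
    to the oversampled file (an alternative to weighting)."""
    return math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

# Hypothetical design: keep all 50 targets, sample 5,000 non-targets.
n_nontargets_sampled = 5_000
ybar, w1, w0 = class_weights(tau, n_targets, n_nontargets_sampled)
offset = intercept_offset(tau, ybar)

print(f"target rate in full file: {target_pct:.4f}%")
print(f"target rate in sample:    {100 * ybar:.4f}%")
print(f"weights: target={w1:.4f}, non-target={w0:.4f}")
print(f"intercept offset: {offset:.4f}")
```

Under these hypothetical numbers each target observation gets a weight of about 0.035 and each non-target a weight of about 1.01, so the weighted sample reproduces the original target rate; the intercept offset is ln(28.75), about 3.36. Whether weighting or the intercept adjustment is preferable here is exactly the kind of guidance being asked for.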

Any guidance or suggestions in this area are definitely welcome.

Walt Paczkowski

_________________________________

Walter R. Paczkowski, Ph.D.
Data Analytics Corp.
44 Hamilton Lane
Plainsboro, NJ  08536
(V) 609-936-8999
(F) 609-936-3733
