s-news

Re: general statistical issue: weighting observations in logit-regressio

To: fharrell@virginia.edu, dcts@dcts.de
Subject: Re: general statistical issue: weighting observations in logit-regression
From: Hongjiew@aol.com
Date: Thu, 31 Oct 2002 11:47:47 -0500
Cc: s-news@lists.biostat.wustl.edu
I agree that if any weighted sampling is used, the parameter estimates need to
be adjusted to make valid inferences about the original population. But there
can be practical reasons to use response-based sampling. In my area (database
marketing), we frequently encounter situations where the event is very "rare",
for example P(Y=1) <= 0.01. There can be computational problems associated with
prediction in that case; see "Predictive performance of the binary logit model
in unbalanced samples" by J. S. Cramer in The Statistician (1999). One of the
nice properties of logistic regression (I am not sure it carries over to
general logit models) is that if oversampling is done on the response
variable, the coefficient estimates of the predictors are not affected, apart
from sampling variability; only the intercept needs to be adjusted. There is
also research comparing the statistical efficiency of different sampling
schemes in the logistic regression setting. For example, suppose N (the overall
population) = 100K, of which 10% have Y=1. One may take a response-based sample
(so that #y=1 is close to #y=0), or one may take a true random sample. The
first sample can usually be much smaller than the second and still give
comparable estimates (see "The effect of sample size and proportion of buyers
in the sample on the performance of list segmentation equations generated by
regression analysis" by Berger and Magliozzi in the Journal of Direct
Marketing).
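
To make the intercept adjustment concrete, here is a minimal sketch in Python
(not S-PLUS, and not taken from either reference above). It assumes the
population event rate is known and that sampling depended only on the response.
Under those assumptions the fitted intercept is shifted by
log[((1 - pop_rate) / pop_rate) * (sample_rate / (1 - sample_rate))], which is
subtracted back out. The function and argument names are hypothetical, chosen
only for illustration.

```python
import math


def intercept_offset(pop_rate, sample_rate):
    """Log-odds shift in the intercept caused by response-based sampling.

    pop_rate    -- P(Y=1) in the original population (assumed known)
    sample_rate -- fraction of Y=1 in the response-based sample
    """
    return math.log((1 - pop_rate) / pop_rate * sample_rate / (1 - sample_rate))


def corrected_intercept(b0_sample, pop_rate, sample_rate):
    """Intercept on the population scale; slope coefficients are left as fitted."""
    return b0_sample - intercept_offset(pop_rate, sample_rate)


def population_probability(p_sample, pop_rate, sample_rate):
    """Map a probability predicted by the sample-fitted model to the population scale."""
    logit = math.log(p_sample / (1 - p_sample)) - intercept_offset(pop_rate, sample_rate)
    return 1 / (1 + math.exp(-logit))


# The example above: 10% of the population has Y=1, and the sample is balanced.
offset = intercept_offset(pop_rate=0.10, sample_rate=0.50)
```

With these numbers the offset is log(9), so a prediction of 0.5 from the model
fitted to the balanced sample corresponds to 0.1 in the population.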

 



In a message dated 10/30/2002 8:27:15 PM Eastern Standard Time, 
fharrell@virginia.edu writes:

> 
> 
> On Wed, 30 Oct 2002 23:22:05 +0100
> DCTS <dcts@dcts.de> wrote:
> 
> > 
> > I am confronted with a Logit-regression, in which y=0 is much less frequent
> > than y=1. It is argued that the less frequent observations with y=0 should
> > receive higher weights in the regression, such that the proportion is
> > balanced between Ys being 0 and 1.
> 
> Who argues that?  No, you don't want to distort the data.  If your sample is 
> a random sample from the population to which you want to infer, then rely on 
> maximum likelihood to give good parameter estimates.  You weight observations 
> if you oversampled a segment of the population and you want to represent the 
> original population [even then don't always weight as this reduces efficiency 
> when compared with covariate adjustment for oversampling factors].
> 
> Frank Harrell
> 
> > 
> > To my knowledge there are usually two motivations to use weights others than
> > unity:
> > - prior knowledge of the probability of y=0
> > - optimisation of a cost function (in the example above y=0 is much more
> > expensive and should be predicted with higher attention)
> > 
> > In my limited econometric library and in the internet I wasn't able to find
> > a discussion on the issue of weighting observations. If someone has a good
> > hint to a source or could sketch the ideas of consequences, pros and cons I
> > would be very pleased.
> > 
> > 
> > Thank you,
> > Thomas
> > 
> > --------------------------------------------------------------------
> > This message was distributed by s-news@lists.biostat.wustl.edu.  To
> > unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> > the BODY of the message:  unsubscribe s-news
> 
> 
> -- 
> Frank E Harrell Jr              Prof. of Biostatistics & Statistics
> Div. of Biostatistics & Epidem. Dept. of Health Evaluation Sciences
> U. Virginia School of Medicine  http://hesweb1.med.virginia.edu/biostat