s-news

Re: [S] Re: Selection of analysis method

To: murray.finkelstein@utoronto.ca
Subject: Re: [S] Re: Selection of analysis method
From: Frank E Harrell Jr <fharrell@virginia.edu>
Date: Sun, 26 Dec 1999 20:54:01 -0500
Cc: s-news@wubios.wustl.edu
Organization: University of Virginia
References: <Pine.OSF.4.05.9912231538370.25490-100000@yule.ucdavis.edu> <38666BD0.BB6D7193@utoronto.ca>
Sender: owner-s-news@wubios.wustl.edu
Dear Murray,

I have used the Cox model to good effect for data such as yours; it
allowed me to use a single model for the zero and the positive costs.
To avoid extreme ties, I replaced each zero with a random amount drawn
uniformly between $0 and $1.  In the data I analyzed, the proportional
hazards assumption was well met except for one risk factor, on which I
stratified.  See the following for some talks on the subject, along
with S-Plus code.  You'll also see some material on the use of AVAS
for such models.  -Frank Harrell

http://hesweb1.med.virginia.edu/biostat/teaching/hpstat95.pdf
http://hesweb1.med.virginia.edu/biostat/presentations/feh/ichpr99/slide.pdf
http://hesweb1.med.virginia.edu/biostat/presentations/dia.econ97.pdf
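
For readers without access to the slides, here is a minimal sketch of the
idea described above: jitter the zero costs, then fit a Cox model with cost
playing the role of the "survival time". It is written in plain Python for
illustration and is not the S-Plus code from the talks. The function names
(jitter_zero_costs, cox_partial_likelihood, fit_cox), the single binary
covariate, the Breslow handling of any remaining ties, and the made-up data
are all assumptions; stratification is not shown.

```python
import math
import random


def jitter_zero_costs(costs, rng, upper=1.0):
    """Replace each zero cost with a Uniform(0, upper) draw to break ties."""
    return [rng.uniform(0.0, upper) if c == 0 else c for c in costs]


def cox_partial_likelihood(beta, costs, x):
    """Log partial likelihood, score and information for one covariate.

    Every cost is treated as an observed "event time" (no censoring);
    any remaining ties are handled with the Breslow approximation.
    """
    loglik = score = info = 0.0
    for c_i, x_i in zip(costs, x):
        s0 = s1 = s2 = 0.0
        # Risk set: subjects whose cost is at least c_i.
        for c_j, x_j in zip(costs, x):
            if c_j >= c_i:
                w = math.exp(beta * x_j)
                s0 += w
                s1 += w * x_j
                s2 += w * x_j * x_j
        mean = s1 / s0
        loglik += beta * x_i - math.log(s0)
        score += x_i - mean
        info += s2 / s0 - mean * mean
    return loglik, score, info


def fit_cox(costs, x, tol=1e-8, max_iter=50):
    """Newton-Raphson estimate of the single log hazard ratio."""
    beta = 0.0
    for _ in range(max_iter):
        _, score, info = cox_partial_likelihood(beta, costs, x)
        if abs(score) < tol or info <= 0.0:
            break
        beta += score / info
    return beta


if __name__ == "__main__":
    rng = random.Random(1999)
    raw = [0, 0, 0, 35.0, 0, 120.0, 0, 60.0, 15.0, 210.0]  # made-up costs
    smoker = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1]                # made-up covariate
    beta = fit_cox(jitter_zero_costs(raw, rng), smoker)
    print("log hazard ratio for smoker:", beta)
```

Because cost stands in for survival time, a negative coefficient means a
lower "hazard" of stopping at a small cost, that is, higher costs for that
group. In practice one would use an established routine (for example coxph
in S-Plus) rather than this hand-rolled fit.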


"Dr. Murray Finkelstein" wrote:

> Greetings, everyone.
>
> I need some assistance with the selection of an analysis method for a health
> economics data set.
>
> I have a file which contains a random sample from the population. The
> dependent variables are the costs of physician services over a 1 year
> period for various disease diagnoses. Independent variables include age,
> sex, and use of tobacco and alcohol. I have coded the latter 2 as
> dichotomous. My difficulty is that, when looking at heart disease as a
> diagnosis, the majority of the population, even at the oldest ages, has a
> physician cost of $0. Among those for whom costs were incurred, the costs
> are, more or less, lognormally distributed. I wish to compute the average
> incremental cost in the population attributable to smoking and saved by
> alcohol consumption. (The good news is that, for heart disease and stroke,
> the odds ratio for having a physician's service is 1.5 for those consuming
> less than 1 drink per day compared with 1 or more drinks per day.)
>
> The data thus consist of 2 subpopulations: the majority with a cost of $0
> and the remainder for whom the cost is lognormally distributed. I've
> considered a variety of methods, including linear regression (because the
> residuals will not be normally distributed, inference will not be valid,
> but I assume that the means will be correctly computed), tobit regression
> (assuming a censor point of log($1), but again the residuals will not be
> appropriately distributed), and an accelerated failure time model with a
> lognormal link.
>
> I'd be grateful for advice on how to analyze this dataset.
>
> Best wishes,
>
> Murray Finkelstein
> murray.finkelstein@utoronto.ca
>
> -----------------------------------------------------------------------
> This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
> send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
> message:  unsubscribe s-news

--
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat


