Dear Murray,
I have used the Cox model to good effect for data such as yours; it
allowed me to use a single model for both the zero and the nonzero costs.
To avoid extreme ties, I added a random amount, uniformly distributed
between $0 and $1, to each of the $0 costs. In the data I analyzed, the
proportional hazards assumption was well met except for one risk factor,
on which I stratified. See the following for some talks on the subject,
along with S-Plus code. You'll also find some material on the use of AVAS
for such models. -Frank Harrell
http://hesweb1.med.virginia.edu/biostat/teaching/hpstat95.pdf
http://hesweb1.med.virginia.edu/biostat/presentations/feh/ichpr99/slide.pdf
http://hesweb1.med.virginia.edu/biostat/presentations/dia.econ97.pdf
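For readers who want to see the tie-breaking step concretely, here is a minimal Python sketch. It is not the S-Plus code from the talks above. The function name `jitter_zero_costs` and the example costs are hypothetical. It only shows the jittering; the jittered costs would then serve as the response ("time") in a Cox model, presumably with every observation treated as an uncensored event.

```python
import random
from typing import List, Optional


def jitter_zero_costs(costs: List[float], seed: Optional[int] = None) -> List[float]:
    """Replace each $0 cost with a draw from Uniform(0, 1); keep nonzero costs.

    Hypothetical helper for illustration only. Its purpose is to break the
    massive tie at $0 before the costs are used as the response in a Cox
    model. Because every draw is below $1, the jittered zeros still sort
    below all nonzero costs, provided those are at least $1.
    """
    rng = random.Random(seed)  # dedicated generator so a seed makes results reproducible
    return [rng.uniform(0.0, 1.0) if c == 0 else c for c in costs]


if __name__ == "__main__":
    costs = [0, 0, 125.50, 0, 980.00, 42.10]  # hypothetical example costs in dollars
    print(jitter_zero_costs(costs, seed=1))
```

Fixing the seed keeps the analysis reproducible. A reasonable check, under the same assumptions, is to refit with a few different seeds and confirm that the estimated effects barely change.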
"Dr. Murray Finkelstein" wrote:
> Greetings, everyone.
>
> I need some assistance with the selection of an analysis method for a health
> economics data set.
>
> I have a file which contains a random sample from the population. The
> dependent variables are the costs of physician services over a 1-year period
> for various disease diagnoses. Independent variables include age, sex, and use
> of tobacco and alcohol. I have coded the latter 2 as dichotomous. My difficulty
> is that, when looking at heart disease as a diagnosis, the majority of the
> population, even at the oldest ages, has a physician cost of $0. Among those
> for whom costs were incurred, the costs are, more or less, lognormally
> distributed. I wish to compute the average incremental cost in the population
> attributable to smoking and saved by alcohol consumption. (The good news is
> that, for heart disease and stroke, the odds ratio for having a physician's
> service is 1.5 for those consuming less than 1 drink per day compared with
> those consuming 1 or more drinks per day.)
>
> The data thus consists of 2 subpopulations: the majority with a cost of $0 and
> the remainder, for whom the cost is lognormally distributed. I've considered a
> variety of methods, including linear regression (because the residuals will
> not be normally distributed, inference will not be valid, but I assume that
> the means will be correctly computed), tobit regression (assuming a censoring
> point of log($1); but again the residuals will not be appropriately
> distributed), and an accelerated failure time model with a lognormal link.
>
> I'd be grateful for advice on how to analyze this dataset.
>
> Best wishes,
>
> Murray Finkelstein
> murray.finkelstein@utoronto.ca
>
> -----------------------------------------------------------------------
> This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
> send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
> message: unsubscribe s-news
--
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat