s-news

Re: Real-Life

To: Jim Stapleton <stapleton@stt.msu.edu>
Subject: Re: Real-Life
From: Spencer Graves <spencer.graves@PDF.COM>
Date: Wed, 25 Jun 2003 19:48:23 -0700
Cc: s-news@lists.biostat.wustl.edu
References: <5.1.0.14.0.20030625171229.01e48868@assist.stt.msu.edu>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
I've gotten a couple of responses to my earlier reply so I will elaborate here. Consider:

          qqnorm(c(rnorm(99), 1e6), datax=TRUE)

I just did this 10 times, and in every case the image was a vertical line of 99 points plus one point far off to the right: a very obvious outlier.
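The same observation can be checked numerically, without looking at a plot. The following is only a sketch in R syntax (S-PLUS behaviour is not verified here); the seed, the `plot.it = FALSE` argument and the variable names are assumptions added for illustration. It asks `qqnorm` for the plotting coordinates and compares the horizontal spread of the 99 normal points with the full horizontal range.

```r
# Sketch: numeric version of the qqnorm() check above (R syntax assumed).
set.seed(1)                      # arbitrary seed, for reproducibility only
x  <- c(rnorm(99), 1e6)          # 99 standard normals plus one gross outlier
qq <- qqnorm(x, datax = TRUE, plot.it = FALSE)   # coordinates only, no plot

# With datax = TRUE the data are on the horizontal axis (qq$x).
bulk_width <- diff(range(qq$x[1:99]))   # spread of the 99 normal points
full_width <- diff(range(qq$x))         # spread including the outlier
frac <- bulk_width / full_width         # share of the axis used by the bulk

cat("fraction of horizontal axis spanned by the 99 points:", frac, "\n")
cat("index of the point with the largest normal quantile:", which.max(qq$y), "\n")
```

Because the 99 normal values occupy only a few millionths of the horizontal axis, they collapse into what looks like a vertical line, with the outlier alone at the far right.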

If the outlier were not obvious, it might still impact the analysis, but in most situations I doubt that the impact would be great. I can think of one exception: if it occurred at a high-leverage point in a regression analysis. In the standard normal probability plot of effects from a saturated 2-level fractional factorial, an outlier will create a gap in the middle or a shift to the right or the left, so the apparently insignificant effects clearly have an average different from zero.
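Both cases can be illustrated with a small sketch, again in R syntax and under stated assumptions. The data, the seeds, the shift sizes (`delta`, `d_run`) and the helper names are all hypothetical, and a full 2^3 design with all interactions stands in for the saturated fractional factorial. In the regression part, the same shift is applied once at a high-leverage point and once near the centre of the x values. In the factorial part, a shift of `d_run` in a single run moves every effect estimate by plus or minus `2 * d_run / N`, which is what splits the otherwise null effects into two groups.

```r
# Part 1: the same shift at a high-leverage vs. a low-leverage point.
x <- c(1:20, 100)                       # x = 100 is far from the other x values
set.seed(3)                             # arbitrary seed
y <- 2 + 0.5 * x + rnorm(21)            # hypothetical straight-line data

slope <- function(resp) unname(coef(lm(resp ~ x))[2])
bump  <- function(resp, i, d) { resp[i] <- resp[i] + d; resp }

delta  <- 50                                        # assumed outlier size
d_high <- slope(bump(y, 21, delta)) - slope(y)      # shift at x = 100
d_low  <- slope(bump(y, 15, delta)) - slope(y)      # shift at x = 15
h      <- hatvalues(lm(y ~ x))                      # leverages

cat("slope change, high leverage:", d_high, " leverage:", h[21], "\n")
cat("slope change, low leverage: ", d_low,  " leverage:", h[15], "\n")

# Part 2: one outlying run in a saturated two-level design (2^3, 7 effects).
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
X <- model.matrix(~ A * B * C, design)[, -1]        # 8 x 7 contrast columns
effects_of <- function(resp) drop(crossprod(X, resp)) / (nrow(X) / 2)

set.seed(2)                             # arbitrary seed
y_null <- rnorm(8)                      # no real effects, noise only
d_run  <- 20                            # assumed outlier size in run 1
shift  <- effects_of(bump(y_null, 1, d_run)) - effects_of(y_null)

print(shift)                            # every effect moves by +/- 2 * d_run / 8
```

In the regression part the slope moves far more when the shift sits at the high-leverage point than when it sits near the centre. In the factorial part every effect estimate moves by the same amount in absolute value, some up and some down, so on a normal plot the null effects separate into two clusters with a gap between them.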

I repeat a version of the question that started this thread: Does someone have a real-life example [or a sufficiently serious hypothetical] where a subtle deviation from normality might invalidate a standard analysis?

Spencer Graves
#####################################
Wouldn't the problems you mention be caught by reasonable plots like
normal probability plots of data and residuals?

Spencer Graves

Jim Stapleton wrote:
> My earlier answer wasn't "real-life", but there are certainly examples
> for which the distr. is a mixture of a normal and, with small
> probability, a distribution with  prob. mass far to the right. I
> remember a client who had some rat swimming-time data. He came to me
> because some of the rats drowned, but I noticed that a few rats had very
> large times, so that the distr. of times seemed to be a mixture of this
> kind. It turned out that some rats were able to "hang" along the side of
> the tub, and achieve very large times. I used that example for years to
> motivate nonparametric methods.
>
> One correction for my first message:  In the sentence
>
>> "Let X be a std. normal cdf with prob .999 and be 1 million with prob.
>> 0.001.    X has cdf F(x)"    replace "cdf" by "random variable."
>
> Jim Stapleton
> Professor and Graduate Director
> Dept. of Statistics and Probability
> Michigan State University
> 517-355-9678



