s-news

Re: How to Sum the Lengths of Plotted Lines in a Time

To: "Liaw, Andy" <andy_liaw@merck.com>
Subject: Re: How to Sum the Lengths of Plotted Lines in a Time
From: Spencer Graves <spencer.graves@PDF.COM>
Date: Wed, 20 Aug 2003 09:12:57 -0700
Cc: Andrew White <andrew_white@hmsa.com>, "S-News List (E-mail)" <s-news@wubios.wustl.edu>
References: <3A822319EB35174CA3714066D590DCD50205C9F9@usrymx25.merck.com>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.0.2) Gecko/20030208 Netscape/7.02
If the X[i] are independent with constant variance, then var(X[i]-X[i-1]) = 2*var(X). However, for a standard autoregressive(1) process with mean 0,

          X[i] = phi*X[i-1] + a[i],

where the a's are independent with constant variance, and (-1) < phi < 1. Therefore,

          var(X) = phi^2*var(X) + var(a)

Meanwhile,

          (X[i]-X[i-1]) = (phi-1)*X[i-1] + a[i],

so      

          var(X[i]-X[i-1]) = ((1-phi)^2)*var(X) + var(a).

With the additional assumption of normality, one could work out the expectation of the absolute differences. I'd guess that it would be expressible in terms of the gamma function, perhaps gamma(0.5), and a simple expression involving root(pi).
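
As a rough check on the algebra above: substituting var(a) = (1-phi^2)*var(X) into the last equation gives var(X[i]-X[i-1]) = 2*(1-phi)*var(X), and for a normal variable with mean 0 and standard deviation s the expected absolute value is s*sqrt(2/pi), which can also be written s*sqrt(2)/gamma(0.5). The sketch below simulates a Gaussian AR(1) series and compares both quantities with these expressions. It is written in Python rather than S-Plus, and phi = 0.6, the series length, the seed, and the helper names are all illustrative assumptions.

```python
import math
import random


def simulate_ar1(phi, n, sd_a=1.0, seed=0):
    """Simulate a mean-0 AR(1) series X[i] = phi*X[i-1] + a[i] with normal a's."""
    rng = random.Random(seed)
    # Draw the starting value from the stationary distribution (no burn-in needed).
    x = rng.gauss(0.0, sd_a / math.sqrt(1.0 - phi ** 2))
    series = []
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sd_a)
        series.append(x)
    return series


def variance(y):
    """Sample variance with divisor n - 1."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / (len(y) - 1)


phi = 0.6  # assumed value, chosen only for illustration
x = simulate_ar1(phi, n=50_000)
d = [b - a for a, b in zip(x, x[1:])]  # first differences, like diff(x)

var_x = variance(x)
var_d = variance(d)
mean_abs_d = sum(abs(v) for v in d) / len(d)

# Expressions derived from the AR(1) equations plus normality.
theory_var_d = 2.0 * (1.0 - phi) * var_x
theory_mean_abs_d = math.sqrt(2.0) / math.gamma(0.5) * math.sqrt(theory_var_d)

print("var of differences:", var_d, "vs 2*(1-phi)*var(X):", theory_var_d)
print("mean |difference|:", mean_abs_d, "vs normal-theory value:", theory_mean_abs_d)
```

With a long series the simulated and theoretical values should agree to within sampling error; a real series would still need the diagnostic plots mentioned below before trusting the AR(1) and normality assumptions.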

However, before I went into this, I would want to make normal probability plots, plot the ACF, plot X[i] vs. X[i-1], etc. Then we can talk about the need for something new and different.

Best Wishes,
Spencer Graves

Liaw, Andy wrote:
If I'm not mistaken, one half the mean of the *squared* first order
differences is essentially the sample variance, so one can draw the
connection with acf and pacf based on that, I'd guess.   Not sure what it is
if you average the *absolute* first order differences.
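
A small sketch of the claim above, in Python rather than S-Plus: for independent observations, half the mean of the squared first differences should be close to the sample variance, while for a strongly trending series it is not. The two test series (seeded normal noise and a 36-point ramp) and the function names are assumptions made for illustration.

```python
import random


def sample_variance(y):
    """Sample variance with divisor n - 1."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / (len(y) - 1)


def half_mean_squared_diff(y):
    """One half the mean of the squared first-order differences."""
    d = [b - a for a, b in zip(y, y[1:])]
    return 0.5 * sum(v * v for v in d) / len(d)


rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(20_000)]  # independent values
ramp = [float(i) for i in range(1, 37)]               # steadily rising values

noise_ratio = half_mean_squared_diff(noise) / sample_variance(noise)
ramp_ratio = half_mean_squared_diff(ramp) / sample_variance(ramp)

print("independent noise, ratio to variance:", noise_ratio)  # close to 1
print("steady ramp, ratio to variance:", ramp_ratio)         # far below 1
```

The ratio of the two quantities therefore carries information about serial dependence, which is what links it to the acf.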

Andy


-----Original Message-----
From: Spencer Graves [mailto:spencer.graves@PDF.COM]
Sent: Tuesday, August 19, 2003 9:04 PM
To: Andrew White
Cc: S-News List (E-mail)
Subject: Re: [S] How to Sum the Lengths of Plotted Lines in a Time Series


For the total distance, if y = a vector of observations at equally spaced points in time, sum(abs(diff(y))) should give you the total vertical distance traced by the line; mean(abs(diff(y))) should normalize it for the number of observations.
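
For readers without S-Plus, here is a minimal Python sketch of the same calculation. The first two functions mirror sum(abs(diff(y))) and mean(abs(diff(y))). The third is an addition not in the message above: it measures the Euclidean length of the plotted segments, which depends on an assumed horizontal spacing dt between observations and therefore on how the axes are scaled. All function names and the example vector are hypothetical.

```python
import math


def total_vertical_distance(y):
    """Analogue of sum(abs(diff(y))): summed absolute first differences."""
    return sum(abs(b - a) for a, b in zip(y, y[1:]))


def mean_vertical_distance(y):
    """Analogue of mean(abs(diff(y))): average absolute first difference."""
    return total_vertical_distance(y) / (len(y) - 1)


def plotted_line_length(y, dt=1.0):
    """Euclidean length of the segments joining (i*dt, y[i]) to ((i+1)*dt, y[i+1])."""
    return sum(math.hypot(dt, b - a) for a, b in zip(y, y[1:]))


y = [0.0, 3.0, 3.0, 0.0]
print(total_vertical_distance(y))      # 6.0
print(mean_vertical_distance(y))       # 2.0
print(plotted_line_length(y, dt=4.0))  # 14.0 (segments of length 5, 4 and 5)
```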

Are you familiar with the use of autocorrelation (acf) and partial autocorrelations (pacf; function acf with type="partial") for model identification, as described, e.g., in Box, Jenkins, and Reinsel (1994) Time Series Analysis: Forecasting and Control (Prentice Hall)? My preference today for basic time series analysis is to use ACF and PACF for model identification and then use state space techniques a la West and Harrison (1997) Bayesian Forecasting and Dynamic Models (Springer). There should be some relationship between your "total distance" and the ACF, but I'm not certain what it is.
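
To make the ACF idea concrete, the sketch below computes sample autocorrelations by hand in Python (it is not the S-Plus acf function, and the name sample_acf is hypothetical). It is applied to a simulated AR(1) series with an assumed phi = 0.5, for which the autocorrelations should fall off roughly as phi^k. One relationship that does hold for the squared version of the distance is that the mean squared first difference divided by twice the variance is approximately 1 minus the lag-1 autocorrelation; the absolute version needs a distributional assumption.

```python
import random


def sample_acf(y, max_lag):
    """Sample autocorrelations r[0..max_lag], using divisor n at every lag."""
    n = len(y)
    m = sum(y) / n
    c0 = sum((v - m) * (v - m) for v in y) / n
    acf = []
    for k in range(max_lag + 1):
        ck = sum((y[i] - m) * (y[i + k] - m) for i in range(n - k)) / n
        acf.append(ck / c0)
    return acf


# Simulated AR(1) series; phi, length and seed are illustrative assumptions.
rng = random.Random(7)
phi = 0.5
x, series = 0.0, []
for _ in range(20_500):
    x = phi * x + rng.gauss(0.0, 1.0)
    series.append(x)
series = series[500:]  # drop a burn-in stretch

r = sample_acf(series, max_lag=3)

# Mean squared first difference relative to twice the (divisor-n) variance.
n = len(series)
m = sum(series) / n
c0 = sum((v - m) * (v - m) for v in series) / n
msd = sum((b - a) ** 2 for a, b in zip(series, series[1:])) / (n - 1)
msd_ratio = msd / (2.0 * c0)

print("sample acf at lags 0-3:", r)
print("msd / (2*var):", msd_ratio, "vs 1 - r[1]:", 1.0 - r[1])
```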

hope this helps.  spencer graves

Andrew White wrote:

I want to develop an estimate of the "complexity" of any time series trend line in terms of its "cumulative trend-line distance".

I need help in how to calculate the "cumulative distance" or sum of lengths of individual lines connecting each data point to its next data point - like tracing the (jagged) time series plot line with your finger and measuring the total length traced.

Consider a regular time series plotted using ts.plot()

The index of complexity I am toying with would use as a reference the Minimum Length where each data point would have the same value across all time periods (observation points): a straight horizontal line. The Maximal Length would then be where maximal data value variability occurs between every adjacent data point in the series (sorta like a massive earthquake). Intermediate lengths would represent intermediate forms of time series data variability.

Obviously I need to "stabilize" the data ranges being referenced and eliminate influences of scale. That comes later.

But I am stuck initially with just how to use S-Plus commands to sum the lengths of the plotted line segments between adjacent points, from the starting point to the ending point.

Note: I believe this measure is different in principle from just measuring variance, for the following reason. I ran some time series test cases for 36 time periods: (1) the same value repeated, giving a straight horizontal line; (2) a steadily rising value, giving a straight angled line across the time series plot; (3) flat for half the points, then a steady decline; (4) maximum variation, a seesaw between two extreme data values, giving a maximally jagged plot; and (5) a random series of values set by rnorm(). The variance of #2 is greater than that of #5 (random), and yet #2 is far more regular and "less complex" in my view than the random changes in direction and length of #5.
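
The five test cases can be sketched as follows, in Python rather than S-Plus. The actual data values were not given in the message, so every value below (the constant level, the slope, the seesaw extremes of 0 and 1, and the seeded standard normal draws standing in for rnorm()) is an assumption chosen only to reproduce the shapes described.

```python
import random


def sample_variance(y):
    """Sample variance with divisor n - 1."""
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / (len(y) - 1)


def total_vertical_distance(y):
    """Summed absolute first differences, like sum(abs(diff(y)))."""
    return sum(abs(b - a) for a, b in zip(y, y[1:]))


n = 36
rng = random.Random(42)
cases = {
    "1 constant": [5.0] * n,
    "2 steady rise": [float(i) for i in range(1, n + 1)],
    "3 flat then decline": [18.0] * 18 + [18.0 - i for i in range(1, 19)],
    "4 seesaw": [0.0 if i % 2 == 0 else 1.0 for i in range(n)],
    "5 random": [rng.gauss(0.0, 1.0) for _ in range(n)],
}

# Map each case name to (variance, total vertical distance).
summary = {
    name: (sample_variance(y), total_vertical_distance(y))
    for name, y in cases.items()
}

for name, (var, dist) in summary.items():
    print(f"{name:20s} variance = {var:8.3f}  distance = {dist:7.3f}")
```

With these assumed values the steady rise has a much larger variance than the random series, as described above, yet its total distance (35) is the same as that of the 0/1 seesaw. That coincidence is purely a matter of scale, which suggests the distance measure will need the rescaling step already mentioned before it can rank series by "complexity".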

Anyone have a method or can suggest some S-Plus standard functions to measure the cumulative line lengths - or get a better measure of the "complexity" of a time series plot-line?

Many thanks in advance.

Andy White
Andrew N. White, Ph.D. - Manager Research Unit
Financial Reporting & Medical Economics Dept.
Hawaii Medical Service Association
- Blue Cross Blue Shield of Hawaii
An Independent Licensee of the Blue Cross and Blue Shield Association
- 818 Keeaumoku Street, Honolulu, HI 96814 Ph. 808-948-5344

- Email: andrew_white@hmsa.com



--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu. To unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with the BODY of the message: unsubscribe s-news







