If X[i] are independent with constant variance, then
var(X[i]-X[i-1]) = 2*var(X). However, consider a standard
autoregressive(1) process with mean 0,
X[i] = phi*X[i-1] + a[i],
where the a's are independent with constant variance, and (-1) < phi <
1. Assuming the process is stationary,
var(X) = phi^2*var(X) + var(a),
so var(X) = var(a)/(1-phi^2). Meanwhile,
(X[i]-X[i-1]) = (phi-1)*X[i-1] + a[i],
so
var(X[i]-X[i-1]) = ((1-phi)^2)*var(X) + var(a) = 2*(1-phi)*var(X),
which reduces to 2*var(X) when phi = 0.
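A quick simulation can check the variance of the differences. The sketch
below is in Python rather than S-Plus; the normal a's, phi = 0.6, the seed
and the burn-in length are illustrative assumptions only. It compares the
sample variance of the first differences with 2*var(a)/(1+phi), which
follows from substituting var(X) = var(a)/(1-phi^2) into the last equation.

```python
import random
import statistics


def simulate_ar1(phi, n, sd_a=1.0, seed=0, burn_in=500):
    """Simulate a mean-zero AR(1) series X[i] = phi*X[i-1] + a[i].

    Normal innovations, the seed and the burn-in are assumptions
    chosen for illustration.
    """
    rng = random.Random(seed)
    x = 0.0
    series = []
    for i in range(n + burn_in):
        x = phi * x + rng.gauss(0.0, sd_a)
        if i >= burn_in:  # discard the start-up transient
            series.append(x)
    return series


def diff(y):
    """First differences, analogous to diff(y) in S-Plus."""
    return [b - a for a, b in zip(y, y[1:])]


phi = 0.6
var_a = 1.0
x = simulate_ar1(phi, n=50_000, sd_a=var_a ** 0.5)

observed = statistics.variance(diff(x))
predicted = 2.0 * var_a / (1.0 + phi)  # equals 2*(1-phi)*var(X)

print(f"observed var of differences: {observed:.3f}")
print(f"predicted 2*var(a)/(1+phi):  {predicted:.3f}")
```

The two numbers should agree to within sampling error; changing phi shows
how positive autocorrelation shrinks the variance of the differences.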
With the additional assumption of normality, one could work out the
expectation of the absolute differences. I'd guess that it would be
expressible in terms of the gamma function, perhaps gamma(0.5), and a
simple expression involving root(pi).
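For what it is worth, if the a's are normal and the process is stationary,
the differences are themselves normal with mean 0, and the mean absolute
value of a mean-zero normal variable is its standard deviation times
sqrt(2/pi). That constant can be written as sqrt(2)*gamma(1)/gamma(0.5),
since gamma(0.5) = sqrt(pi). A minimal Python sketch (the function name is
made up for illustration):

```python
import math


def expected_abs_diff(phi, var_a):
    """E|X[i]-X[i-1]| for a stationary AR(1) with normal a's (assumed)."""
    # var(X) = var(a)/(1-phi^2), then var(diff) = (1-phi)^2*var(X) + var(a)
    var_x = var_a / (1.0 - phi ** 2)
    var_d = (1.0 - phi) ** 2 * var_x + var_a
    return math.sqrt(2.0 / math.pi) * math.sqrt(var_d)


# The half-normal constant sqrt(2/pi), written with the gamma function.
half_normal_const = math.sqrt(2.0) * math.gamma(1.0) / math.gamma(0.5)

print(half_normal_const, math.sqrt(2.0 / math.pi))
print(expected_abs_diff(0.0, 1.0))  # independent case, equals 2/sqrt(pi)
print(expected_abs_diff(0.6, 1.0))
```

Under these assumptions mean(abs(diff(y))) estimates this quantity, so it
carries the same information about phi as the variance of the differences.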
However, before I went into this, I would want to make normal
probability plots, plot the ACF, plot X[i] vs. X[i-1], etc. Then we can
talk about the need for something new and different.
Best Wishes,
Spencer Graves
Liaw, Andy wrote:
If I'm not mistaken, one half the mean of the *squared* first order
differences is essentially the sample variance, so one can draw the
connection with acf and pacf based on that, I'd guess. Not sure what it is
if you average the *absolute* first order differences.
Andy
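The claim about squared differences is easy to check numerically for
independent data. The sketch below is in Python rather than S-Plus, and the
mean, standard deviation, seed and sample size are arbitrary illustrative
choices. For a stationary but autocorrelated series the same quantity
estimates var(X)*(1 - rho(1)) instead, where rho(1) is the lag-1
autocorrelation, which is one way to see the link to the ACF.

```python
import random
import statistics

rng = random.Random(1)
# Independent normal draws; mean 10 and sd 3 are arbitrary.
x = [rng.gauss(10.0, 3.0) for _ in range(20_000)]

# First differences, analogous to diff(x) in S-Plus.
d = [b - a for a, b in zip(x, x[1:])]

half_mean_sq_diff = 0.5 * sum(v * v for v in d) / len(d)
sample_var = statistics.variance(x)

print(f"half mean squared difference: {half_mean_sq_diff:.3f}")
print(f"sample variance:              {sample_var:.3f}")
```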
-----Original Message-----
From: Spencer Graves [mailto:spencer.graves@PDF.COM]
Sent: Tuesday, August 19, 2003 9:04 PM
To: Andrew White
Cc: S-News List (E-mail)
Subject: Re: [S] How to Sum the Lengths of Plotted Lines in a Time Series
For the total distance, if y = a vector of observations at equally
spaced points in time, sum(abs(diff(y))) should give you the total
length of the line; mean(abs(diff(y))) should normalize it for the
number of observations.
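As a sketch of the two expressions, here is a Python analogue (not S-Plus).
One caveat: sum(abs(diff(y))) adds up only the vertical movement. If the
literal length of the plotted line is wanted, each segment also has a
horizontal extent; the dt argument below is a hypothetical time spacing in
the same units as y, and it is exactly where the scale question comes in.

```python
import math


def diff(y):
    """First differences, analogous to diff(y) in S-Plus."""
    return [b - a for a, b in zip(y, y[1:])]


def total_vertical_distance(y):
    """Analogue of sum(abs(diff(y))): total up-and-down movement."""
    return sum(abs(d) for d in diff(y))


def plotted_line_length(y, dt=1.0):
    """Euclidean length of the polyline through the points (i*dt, y[i]).

    dt is a hypothetical spacing; the result depends on how the two
    axes are scaled relative to each other.
    """
    return sum(math.hypot(dt, d) for d in diff(y))


y = [0.0, 3.0, 3.0, -1.0]
n_steps = len(y) - 1

print(total_vertical_distance(y))            # 3 + 0 + 4
print(total_vertical_distance(y) / n_steps)  # analogue of mean(abs(diff(y)))
print(plotted_line_length(y, dt=4.0))        # 5 + 4 + sqrt(32)
```

For a flat series the vertical distance is 0 while the plotted length is
just the elapsed time, which matches the "Minimum Length" reference case.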
Are you familiar with the use of autocorrelations (acf) and partial
autocorrelations (pacf; function acf with type="partial") for model
identification, as described, e.g., in Box, Jenkins, Reinsel (1994) Time
Series Analysis, Forecasting and Control (Prentice Hall)? My preference
today for basic time series analysis is to use ACF and PACF for model
identification and then use state space techniques a la West and
Harrison (1997) Bayesian Forecasting and Dynamic Models (Springer).
There should be some relationship between your "total distance" and the
ACF, but I'm not certain what it is.
hope this helps. spencer graves
Andrew White wrote:
I want to develop an estimate of the "complexity" of any time series
trend line in terms of its "cumulative trend-line distance".
I need help in how to calculate the "cumulative distance" or sum of
lengths of individual lines connecting each data point to its next
data point - like tracing the (jagged) time series plot line with your
finger and measuring the total length traced.
Consider a regular time series plotted using ts.plot().
The index of complexity I am toying with would use as a reference the
Minimum Length, where each data point has the same value across all
time periods (observation points): a straight horizontal line. The
Maximal Length would then be where maximal data value variability
occurs between every adjacent data point in the series (sorta like a
massive earthquake). Intermediate lengths would represent intermediate
forms of time series data variability.
Obviously I need to "stabilize" the data ranges being referenced and
eliminate influences of scale. That comes later.
But I am stuck initially with just how to use S-Plus commands to sum
the sequential series of plotted time series plot lengths between
adjacent points from the starting point to the ending point.
Note: I believe this measure is different in principle from just
measuring variance, for the following reason. I ran some time series
test cases for 36 time periods: (1) same value repeated = straight
horizontal line, (2) a steadily rising value = forms a straight angled
line across the time series plot, (3) flat for half the points then
steady decline, (4) max variation or seesaw between two extreme data
values = maximum jaggy plot, and (5) a random series of values set by
rnorm(). Now the variance of #2 is greater than that of #5 (random),
and yet #2 is far more regular and "less complex" in my view
than the random changes in direction and length of #5.
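The five test cases can be mocked up to compare variance with a
difference-based measure. This is only a sketch in Python rather than
S-Plus: the specific values in each series, the seed, and above all the
division by the standard deviation to remove scale are assumptions made
here for illustration, not part of the proposal above.

```python
import random
import statistics


def mean_abs_diff(y):
    """Analogue of mean(abs(diff(y))): average step between neighbours."""
    return sum(abs(b - a) for a, b in zip(y, y[1:])) / (len(y) - 1)


def scaled_roughness(y):
    """Mean absolute difference divided by the standard deviation.

    The scaling is a hypothetical choice for removing units; a constant
    series is given roughness 0 by convention.
    """
    sd = statistics.stdev(y)
    return 0.0 if sd == 0 else mean_abs_diff(y) / sd


n = 36
rng = random.Random(42)
cases = {
    "1 constant": [5.0] * n,
    "2 steady rise": [float(i) for i in range(n)],
    "3 flat then fall": [18.0] * (n // 2)
    + [18.0 - i for i in range(1, n // 2 + 1)],
    "4 seesaw": [1.0 if i % 2 else -1.0 for i in range(n)],
    "5 random normal": [rng.gauss(0.0, 1.0) for _ in range(n)],
}

for name, y in cases.items():
    print(
        f"{name:18s} var={statistics.variance(y):8.3f} "
        f"roughness={scaled_roughness(y):6.3f}"
    )
```

With these inputs the steady rise has by far the largest variance but a
roughness close to 0, while the seesaw sits near the top of the scale,
which is the ordering described in the note.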
Anyone have a method or can suggest some S-Plus standard functions to
measure the cumulative line lengths - or get a better measure of the
"complexity" of a time series plot-line?
Many thanks in advance.
Andy White
Andrew N. White, Ph.D. - Manager Research Unit
Financial Reporting & Medical Economics Dept.
Hawaii Medical Service Association
- Blue Cross Blue Shield of Hawaii
An Independent Licensee of the Blue Cross and Blue Shield Association
- 818 Keeaumoku Street, Honolulu, HI 96814 Ph. 808-948-5344
- Email: andrew_white@hmsa.com
--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu. To
unsubscribe send e-mail to
s-news-request@lists.biostat.wustl.edu with
the BODY of the message: unsubscribe s-news
--------------------------------------------------------------------