[S] managing a lot of time series

To: "S-news (E-mail)" <S-news@wubios.wustl.edu>
Subject: [S] managing a lot of time series
From: Vadim Ogranovich <vograno@arbitrade.com>
Date: Mon, 28 Feb 2000 13:56:37 -0600
Sender: owner-s-news@wubios.wustl.edu
Dear All,

My question is how best to manage a large number (~4000) of time series. I
have ~4000 products and, for each product, its historical daily market data;
altogether 10M observations. To be concrete, let's say that for each product
I have its daily prices and sales volumes. In the name of "vectorization" I
create four vectors of length 10M: name, date, price, and volume, where the
data for each product are stacked one after another. Now I can efficiently
compute "derived" values like dollar.volume = volume * price, which makes me
feel good.
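
A minimal sketch of this stacked layout on toy data. The product names and
numbers are made up for illustration, a plain day index stands in for real
dates, and `<-` is used for assignment:

```r
# Toy stacked layout: two hypothetical products, three days each (made-up values).
name   <- c("A", "A", "A", "B", "B", "B")
date   <- c(1, 2, 3, 1, 2, 3)             # day index standing in for real dates
price  <- c(10, 11, 12, 5, 6, 7)
volume <- c(100, 200, 300, 10, 20, 30)

# Element-wise "derived" value: one vectorized multiply across all products.
dollar.volume <- volume * price
```

Because the operation is element-wise, it does not need to know where one
product ends and the next begins.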
What turns out to be a problem is the computation of "time series" statistics
such as moving averages. For example, to compute a vector of moving averages
of prices I have to do the following, provided that the data are sorted by
"name" and "date":
x _ tapply(price, name, moving.average)
x _ unlist(x)

It is the call to tapply() that takes quite a bit of time. So my narrow
question is whether there is a more efficient substitute for tapply() that
takes advantage of the fact that the data are already sorted by "name".
My more general question is how to better structure the data so that both
vector and "time series" computations are efficient.
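
For concreteness, the tapply() approach can be reproduced on toy data roughly
as follows. The moving.average() below is only a hypothetical stand-in (a
trailing mean over k points, NA until the window is full), since the real
function is not shown here; the data are made up, and `<-` is used for
assignment:

```r
# Hypothetical stand-in for moving.average(): trailing mean over k points.
# The first k-1 values of each series are NA because the window is not full.
moving.average <- function(x, k = 2) {
    n <- length(x)
    if (n < k) return(rep(NA, n))
    cs <- c(0, cumsum(x))                  # cs[i + 1] is the sum of x[1:i]
    c(rep(NA, k - 1), (cs[(k + 1):(n + 1)] - cs[1:(n - k + 1)]) / k)
}

# Made-up stacked data, already sorted by name (and by date within name).
name  <- c("A", "A", "A", "B", "B", "B")
price <- c(10, 11, 12, 5, 6, 7)

x <- tapply(price, name, moving.average)   # one result vector per product
x <- unlist(x)                             # back to a single stacked vector
```

The result lines up with price only because the data are sorted by name, so
the per-product pieces come back in the same order in which they are stacked.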

Thank you,
Vadim

P.S. I use V5.1 on UNIX
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news
