s-news
[Top] [All Lists]

Re: svd for large array

To: jsv@stat.ohio-state.edu
Subject: Re: svd for large array
From: Prof Brian Ripley <ripley@stats.ox.ac.uk>
Date: Sun, 24 Jul 2005 08:07:53 +0100 (BST)
Cc: s-news@lists.biostat.wustl.edu
In-reply-to: <1442.24.210.95.83.1122158816.squirrel@www.stat.ohio-state.edu>
References: <1781.24.210.95.83.1122043216.squirrel@www.stat.ohio-state.edu> <Pine.LNX.4.61.0507231731070.7549@gannet.stats> <1442.24.210.95.83.1122158816.squirrel@www.stat.ohio-state.edu>
That's 100 x 30000, and you mentioned 8000 x 100. If you want N >> M (and that is not at all clear from your original posting), you need to transpose the matrix or modify my algorithm accordingly.

On Sat, 23 Jul 2005 jsv@stat.ohio-state.edu wrote:

Dear Prof. Ripley:

 Here is what I get

A <- matrix(rnorm(3000000),100,30000)
S <- svd(A,nu=0)
Problem in 1:ll: Cannot create - data would have length greater than 536
870911 (.Machine$integer.max/sizeof(integer))
Use traceback() to see the call stack

traceback()
10: eval(action, sys.parent())
9: doErrorAction("Problem in 1:ll: Cannot create - data would have lengt
h greater than 536870911 (.Machine$integer.max/sizeof(integer))",
8: 1:ll
7: array(0., c(p, p))
6: .Fortran(if(!cmplx) "dsvdcs" else "zsvdcs",
5: .Fortran(if(!cmplx) "dsvdcs" else "zsvdcs",
4: svd.default(A, nu = 0)
3: svd(A, nu = 0)
2: eval(expression(S <- svd(A, nu = 0)))
1:
Message: Problem in 1:ll: Cannot create - data would have length greater
than 536870911 (.Machine$integer.max/sizeof(integer))

I interpreted line 7 from the traceback as svd's attempt to
allocate a 30K x 30K intermediate result matrix.  This is corroborated
by the fact that the same message appears in the allocation

A <- matrix(0,30000,30000)
Problem in 1:ll: Cannot create - data would have length greater than 536
870911 (.Machine$integer.max/sizeof(integer))
Use traceback() to see the call stack

I think that construction of a full symmetric matrix may be part of a
computationally fast algorithm, but I would be willing to trade some
extra time to do the computation with less space.

I am using "svd" that is included in the initial loading of Splus.

Thanks for any further pointers you might provide.

Joe Verducci

______________________________


What do you want from the svd?  In a M x N system the left singular
vectors form a M x M system, taking up much more space that the 2Gb
Windows will allow S-PLUS.  By default svd() avoids this by giving you
only N (for M > N) components of the singular vectors.

You will probably do better to do things in two steps, for example

A <- matrix(rnorm(30), 10, 3)
S <- svd(A, nu=0)
S$u <- A %*% S$v / rep(S$d, each=nrow(A))

Now, if finding S is too slow (and it should not be with LAPACK-based
code, so you might want to try the Matrix library) you can get the
components (less accurately) from eigen(crossprod(A)),

E <- eigen(crossprod(A))
S <- list(d=sqrt(E$values), v=E$vectors)
S$u <- A %*% S$v / rep(S$d, each=nrow(A))

My 1GB laptop swapped rather a lot, but was able to do that in less than a
minute CPU.

On Fri, 22 Jul 2005 jsv@stat.ohio-state.edu wrote:

 I'm using Splus 7.0 on a Windows XP with 4Gb memory.  I would like to
perform an svd on a rectangular array where one dimension is very large
(around 30K) and the other is relatively small (around 100).
Unfortunately, svd is not one of the functions listed to take advantage
of the out-of-memory mode in version 7.0.  Right now my feasible limits
in-memory are (8K x 100).  It should be possible to use program an SVD
for rectangular arrays using sequential optimization, but before going
in that route, I am hoping someone here knows of an available code.

--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu.  To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message:  unsubscribe s-news





--
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

<Prev in Thread] Current Thread [Next in Thread>