s-news
[Top] [All Lists]

Re: svd for large array

To: "Prof Brian Ripley" <ripley@stats.ox.ac.uk>
Subject: Re: svd for large array
From: jsv@stat.ohio-state.edu
Date: Sat, 23 Jul 2005 18:46:56 -0400 (EDT)
Cc: jsv@stat.ohio-state.edu, s-news@lists.biostat.wustl.edu
Importance: Normal
In-reply-to: <Pine.LNX.4.61.0507231731070.7549@gannet.stats>
References: <1781.24.210.95.83.1122043216.squirrel@www.stat.ohio-state.edu> <Pine.LNX.4.61.0507231731070.7549@gannet.stats>
User-agent: SquirrelMail/1.4.4
Dear Prof. Ripley:

  Here is what I get

> A <- matrix(rnorm(3000000),100,30000)
> S <- svd(A,nu=0)
Problem in 1:ll: Cannot create - data would have length greater than 536
870911 (.Machine$integer.max/sizeof(integer))
Use traceback() to see the call stack

> traceback()
10: eval(action, sys.parent())
9: doErrorAction("Problem in 1:ll: Cannot create - data would have lengt
h greater than 536870911 (.Machine$integer.max/sizeof(integer))",
8: 1:ll
7: array(0., c(p, p))
6: .Fortran(if(!cmplx) "dsvdcs" else "zsvdcs",
5: .Fortran(if(!cmplx) "dsvdcs" else "zsvdcs",
4: svd.default(A, nu = 0)
3: svd(A, nu = 0)
2: eval(expression(S <- svd(A, nu = 0)))
1:
Message: Problem in 1:ll: Cannot create - data would have length greater
 than 536870911 (.Machine$integer.max/sizeof(integer))

I interpreted line 7 from the traceback as svd's attempt to
allocate a 30K x 30K intermediate result matrix.  This is corroborated
by the fact that the same message appears in the allocation

> A <- matrix(0,30000,30000)
Problem in 1:ll: Cannot create - data would have length greater than 536
870911 (.Machine$integer.max/sizeof(integer))
Use traceback() to see the call stack

 I think that construction of a full symmetric matrix may be part of a
computationally fast algorithm, but I would be willing to trade some
extra time to do the computation with less space.

I am using "svd" that is included in the initial loading of Splus.

Thanks for any further pointers you might provide.

Joe Verducci

______________________________


> What do you want from the svd?  In a M x N system the left singular
> vectors form a M x M system, taking up much more space that the 2Gb
> Windows will allow S-PLUS.  By default svd() avoids this by giving you
> only N (for M > N) components of the singular vectors.
>
> You will probably do better to do things in two steps, for example
>
> A <- matrix(rnorm(30), 10, 3)
> S <- svd(A, nu=0)
> S$u <- A %*% S$v / rep(S$d, each=nrow(A))
>
> Now, if finding S is too slow (and it should not be with LAPACK-based
> code, so you might want to try the Matrix library) you can get the
> components (less accurately) from eigen(crossprod(A)),
>
> E <- eigen(crossprod(A))
> S <- list(d=sqrt(E$values), v=E$vectors)
> S$u <- A %*% S$v / rep(S$d, each=nrow(A))
>
> My 1GB laptop swapped rather a lot, but was able to do that in less than a
> minute CPU.
>
> On Fri, 22 Jul 2005 jsv@stat.ohio-state.edu wrote:
>
>>  I'm using Splus 7.0 on a Windows XP with 4Gb memory.  I would like to
>> perform an svd on a rectangular array where one dimension is very large
>> (around 30K) and the other is relatively small (around 100).
>> Unfortunately, svd is not one of the functions listed to take advantage
>> of the out-of-memory mode in version 7.0.  Right now my feasible limits
>> in-memory are (8K x 100).  It should be possible to use program an SVD
>> for rectangular arrays using sequential optimization, but before going
>> in that route, I am hoping someone here knows of an available code.
>
> --
> Brian D. Ripley,                  ripley@stats.ox.ac.uk
> Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
> University of Oxford,             Tel:  +44 1865 272861 (self)
> 1 South Parks Road,                     +44 1865 272866 (PA)
> Oxford OX1 3TG, UK                Fax:  +44 1865 272595
> --------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu.  To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message:  unsubscribe s-news
>



<Prev in Thread] Current Thread [Next in Thread>