s-news
[Top] [All Lists]

[S] power set of a vector - summary

To: s-news <s-news@wubios.wustl.edu>
Subject: [S] power set of a vector - summary
From: Bin Chen <pbchen@gsbphd6.uchicago.edu>
Date: Wed, 20 Oct 1999 22:31:06 -0500 (CDT)
Sender: owner-s-news@wubios.wustl.edu
Many thanks to Mr. Berry and Mr. Lancelot for sharing their wisdom with
me; I've got my job done with their very helpful tips:

On Wed, 20 Oct 1999 17:18:34 +0100, Renaud Lancelot
<lancelot@telecomplus.sn> wrote:

Bin Chen wrote:
> 
> Hi. Is there a simple command that gives the power set of a vector
> (that is, all its subsets)? I'm trying to run 256 variants of a least
> squares model involving a maximum of 8 independent variables
> (including the intercept) and then compute their posterior odds. I
> figure if I can get all the subsets of the names of the X data.frame I
> would be able to do this quickly, using a loop in which
> 
> subset.of.names <-- ?? names(X) ??
> lm(data.frame(y,X[,row(matrix(names(X)))[names(X)==subset.of.names]]))


Hi Bin,

I recently met the same problem and here is how I did:

Suppose we want to perform all subset models of:
lm(cost ~ age + type + car.age, data=na.omit(claims))


1. Write a subsets() function, found in VR 2nd ed. p. 149 or in the
S-news archives:

subsets <- function(n, k, set = 1:n)
    if(k <= 0) NULL else
      if(k >= n) set else
        rbind(cbind(set[1], Recall(n - 1, k - 1, set[-1])),
        Recall(n - 1, k, set[-1]))

2. Compute all the covariate combinations:
> f <- c("age", "type", "car.age")
> form <- NULL
> K <- 0
> for(i in 1:length(f)) {
        m <- subsets(length(f), i)
        l <- ifelse(i < length(f), dim(m)[1], 1)
        for(j in 1:l) {
                K <- K + 1
                formu <- NULL
                for(k in 1:i) {
                        if(i < length(f))
                                formu <- paste(formu, f[m[j, k]], sep =
ifelse(k == 1, "", "+"))
                        else formu <- paste(formu, f[k], sep = ifelse(k ==
1, "", "+"))
                }
                formu <- paste("cost ~ ", formu, sep = "")
                form[K] <- formu
        }
}
> rm(m, l, K, formu)
> form
[1] "cost ~ age"              "cost ~ type"             "cost ~
car.age"          "cost ~ age+type"        
[5] "cost ~ age+car.age"      "cost ~ type+car.age"     "cost ~
age+type+car.age"

3. Compute all the linear models:
> for(i in 1:length(form)) {
        assign(paste("lmod.", i, sep = ""), try(lm(form[i], data =
na.omit(claims))), where = 1, immediate = 
                T)
}
> objects(pattern = "lmod.*")
[1] "lmod.1" "lmod.2" "lmod.3" "lmod.4" "lmod.5" "lmod.6" "lmod.7"
> tapply(objects(pattern = "lmod.*"), 1:length(objects(
        pattern = "lmod.*")), function(x)
class(get(x)))
    1    2    3    4    5    6    7 
 "lm" "lm" "lm" "lm" "lm" "lm" "lm"
> result <- tapply(objects(pattern = "lmod.*"), 1:length(
        objects(pattern = "lmod.*")), function(x)
summary(get(x)))
> names(result) <- form
> result[3]
$"cost ~ car.age":

Call: lm(formula = form[i], data = na.omit(claims))
Residuals:
    Min     1Q Median    3Q   Max 
 -144.7 -49.47 -29.69 12.78 576.3

Coefficients:
                Value Std. Error   t value  Pr(>|t|) 
(Intercept)  235.3179   10.3098    22.8247    0.0000
  car.age.L -120.6810   20.8369    -5.7917    0.0000
  car.age.Q  -23.3411   20.6196    -1.1320    0.2599
  car.age.C   -0.3798   20.3999    -0.0186    0.9852

Residual standard error: 114.2 on 119 degrees of freedom
Multiple R-Squared: 0.224 
F-statistic: 11.45 on 3 and 119 degrees of freedom, the p-value is
1.195e-006 

Correlation of Coefficients:
          (Intercept) car.age.L car.age.Q 
car.age.L 0.0488                         
car.age.Q 0.0265      0.0420             
car.age.C 0.0050      0.0159    0.0259   


*********************

Careful:

1. assign() will overwrite any existing object with the same name as the
assigned name

2. If you plan to include interaction and / or some of your models are
saturated or lead to singular matrices, you will have errors and for()
will stop. That's the reason why I have embedded the fitting function
within a try() call. This function exists in S+ 2000 and is as
following:

try <- function(expr, first = T)
{
# evaluates expr (in frame of caller) and returns either
# its value or the error message (with class "Error").
# DO NOT USE THE 'first' ARGUMENT.
        restart(first)
        if(first) {
                first <- F
                expr
        }
        else {
                error.msg <- get.message()
                class(error.msg) <- "Error"
                error.msg
        }
}

In case of an error occurring within try(), an object of class "error"
is returned, which can be separated from regular fitted objects before
editing the summaries or doing whatever you want.

3. You can also decide to set options(warn=2) which turns any warning
(false convergence, etc.) into an error.

Hope this helps,

Renaud

-- 
Renaud Lancelot
ISRA-LNERV
BP 2057 Dakar-Hann
Senegal

tel     (221) 832 49 02
fax     (221) 821 18 79
email   renaud.lancelot@cirad.fr


On Wed, 20 Oct 1999 09:48:09 -0700 (PDT), "Charles C. Berry"
<cberry@tajo.ucsd.edu> wrote:

Hmmm. This sounds more like a job for leaps(), but if lm() really is the
way to 
go,

all.combos <- as.logical(
                as.matrix(
                        do.call("expand.grid",rep(list(c(0,1)),8))
                        ))

returns a matrix whose 256 rows give all possible subsets

then names(X)[all.combos[i,]] will return the names of the vraibles 
corresponding to the subset for the i^{th} row.


>
>Hi. Is there a simple command that gives the power set of a vector
>(that is, all its subsets)? I'm trying to run 256 variants of a least
>squares model involving a maximum of 8 independent variables
>(including the intercept) and then compute their posterior odds. I
>figure if I can get all the subsets of the names of the X data.frame I
>would be able to do this quickly, using a loop in which
>
>subset.of.names <-- ?? names(X) ??
>lm(data.frame(y,X[,row(matrix(names(X)))[names(X)==subset.of.names]]))
>...
>
>Thanks a lot in advance.
>
>- Bin Chen
>-----------------------------------------------------------------------
>This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
>send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
>message:  unsubscribe s-news

Charles C. Berry                        (858) 534-2098 
                                         Dept of Family/Preventive
Medicine
E mailto:cberry@tajo.ucsd.edu            UC San Diego
http://hacuna.ucsd.edu/members/ccb.html  La Jolla, San Diego 92093-0645

 



-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

<Prev in Thread] Current Thread [Next in Thread>