
Fwd: Correlation matrix with dummy coded variables

To: "Fletcher, Thomas" <fletchert@umsl.edu>
Subject: Fwd: Correlation matrix with dummy coded variables
From: "Richard M. Heiberger" <rmh@temple.edu>
Date: Mon, 10 Oct 2005 15:02:26 -0400
Cc: <s-news@lists.biostat.wustl.edu>
To find out which dummy variables have been automatically created,
use the contrasts() function.  To control the dummy variables, you can
assign a contrast matrix with contrasts(f) <- value.  See ?contrasts
for details.

To see how the dummy variables look in your dataset, use the x=T argument
to the lm function and look at the x component of the fitted object.

> tmp<- data.frame(g=factor(rep(letters[1:4], 2)), y=rnorm(8), x=rnorm(8))
> tmp
  g            y           x 
1 a  0.312621837  0.82913486
2 b -1.138111370 -1.45293697
3 c -1.236898635 -0.47729847
4 d -0.575734340 -0.46414376
5 a  0.879161805 -0.68452210
6 b -0.272059928  0.25805670
7 c  0.021852414 -0.36991149
8 d -1.077289717 -1.30991319
> contrasts(tmp$g)
  [,1] [,2] [,3] 
a   -1   -1   -1
b    1   -1   -1
c    0    2   -1
d    0    0    3
> tmp.lm <- lm(y ~ x + g, data=tmp, x=T)
> tmp.lm$x
  (Intercept)           x g1 g2 g3 
1           1  0.82913486 -1 -1 -1
2           1 -1.45293697  1 -1 -1
3           1 -0.47729847  0  2 -1
4           1 -0.46414376  0  0  3
5           1 -0.68452210 -1 -1 -1
6           1  0.25805670  1 -1 -1
7           1 -0.36991149  0  2 -1
8           1 -1.30991319  0  0  3
> 
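
For anyone following along in R rather than S-PLUS, here is a minimal
sketch of the same inspection. It assumes R, where the default contrasts
for unordered factors are treatment contrasts rather than the Helmert
contrasts shown above, so Helmert coding is assigned explicitly. The seed
is arbitrary and the y and x values will not match the transcript.
model.matrix() is shown as an alternative to x=TRUE.

```r
# Sketch for R (assumption: the session above is S-PLUS, whose default
# contrasts are Helmert; R defaults to treatment contrasts).
set.seed(1)  # arbitrary seed; values differ from the transcript above
tmp <- data.frame(g = factor(rep(letters[1:4], 2)), y = rnorm(8), x = rnorm(8))

# Assign Helmert contrasts explicitly to reproduce the coding shown above.
contrasts(tmp$g) <- contr.helmert(4)

# x = TRUE stores the design (model) matrix in the fitted object.
tmp.lm <- lm(y ~ x + g, data = tmp, x = TRUE)
X <- tmp.lm$x

# model.matrix() builds the same design matrix without fitting the model.
X2 <- model.matrix(y ~ x + g, data = tmp)
```

The columns g1, g2, g3 of X are the three Helmert contrast columns, one
row per observation.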

## If you prefer treatment contrasts, i.e. the (B==1) type indicator columns, use
contrasts(tmp$g) <- contr.treatment(4)
contrasts(tmp$g)
tmp.lm <- lm(y ~ x + g, data=tmp, x=T)
tmp.lm$x
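
With treatment contrasts the reference category is the level whose row
in the contrast matrix is all zeros. A small sketch of how to check that,
assuming R: the `base` argument of contr.treatment() used in the second
half is R's and may not be available in S-PLUS.

```r
# Sketch for R: treatment ("B==1" style) contrasts and the reference level.
g <- factor(rep(letters[1:4], 2))

# Each column is an indicator for one non-reference level; the first
# level ("a") is the reference because its row is all zeros.
contrasts(g) <- contr.treatment(4)
ct <- contrasts(g)

# Assumption: R's contr.treatment() takes a `base` argument selecting the
# reference level; here the fourth level ("d") becomes the reference.
contrasts(g) <- contr.treatment(4, base = 4)
ct.base4 <- contrasts(g)
```

Printing ct and ct.base4 side by side shows which level each dummy
column stands for.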


## You can control the order of the levels; it doesn't have to be alphabetical.

## defaults to alphabetical
size <- factor(c("little", "medium", "big"))
levels(size)
contrasts(size)

## take control of order
size <- factor(c("little", "medium", "big"),
              levels=c("little", "medium", "big"))
levels(size)
contrasts(size)
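
The level order matters for interpretation because, under treatment
contrasts, the first level is the reference. A sketch assuming R with its
default contrast options; relevel() is an R convenience that is not part
of the example above, and the object names are only illustrative.

```r
# Sketch for R (assumes the default options("contrasts"), i.e. treatment
# contrasts for unordered factors).

# Alphabetical default: "big" sorts first and so becomes the reference.
size.alpha <- factor(c("little", "medium", "big"))

# Explicit order: "little" is first and so becomes the reference.
size.ordered <- factor(c("little", "medium", "big"),
                       levels = c("little", "medium", "big"))

# relevel() moves one chosen level to the front and keeps the rest in order.
size.ref <- relevel(size.alpha, ref = "medium")

ref.row <- contrasts(size.ordered)[1, ]  # all zeros for the reference level
```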


As you note, to interpret the regression coefficients you need to
know the dummy variables.  The x=T argument to lm provides them for you
in the x component of the fit.
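
For the correlation matrix in the second question, one possible approach
is to pass the stored design matrix to cor(). This is only a sketch,
assuming R and treatment contrasts; the seed is arbitrary, and the
intercept column is dropped because a constant column has no defined
correlation.

```r
# Sketch for R: correlation matrix of the response, the covariate, and
# the dummy columns actually used in the fit.
set.seed(1)  # arbitrary seed
tmp <- data.frame(g = factor(rep(letters[1:4], 2)), y = rnorm(8), x = rnorm(8))
contrasts(tmp$g) <- contr.treatment(4)
tmp.lm <- lm(y ~ x + g, data = tmp, x = TRUE)

# Drop the constant intercept column, then bind the response alongside.
vars <- cbind(y = tmp$y, tmp.lm$x[, -1])
cor.mat <- cor(vars)
```

Because the dummies come from the same matrix that lm used, the column
names in cor.mat (g2, g3, g4 here) match the coefficient names in the
regression output.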

---- Original message ----
>Date: Mon, 10 Oct 2005 12:32:00 -0500
>From: "Fletcher, Thomas" <fletchert@umsl.edu>  
>Subject: [S] Correlation matrix with dummy coded variables  
>To: <s-news@lists.biostat.wustl.edu>
>
>   I am trying to do (understand) 2 related things involving dummy coded
>   variables.
>
>   1. When assessing a model with functions such as lm or lme, SPLUS
>      automatically creates dummy variables when appropriate. How can you
>      interpret these? That is, how do you know what is the reference
>      category and which of the new dummy variables are related to which
>      level of the original variable? Is it the case that the original
>      levels are alphabetically arranged and the first is the reference?
>       a. For example, suppose original variable is group (A, B, C, D)
>          with 4 levels. Would the new dummy variables be group1, group2,
>          group3 where, group 1 is (B=1), group2 is (C=1), group3 is (D=1).
>
>   2. Suppose you want to create a correlation matrix involving the new
>      dummy codes from question one above. Is there a built in function
>      (i.e., something within `cor') to assist in this? Do you have to
>      `manually' create the new dummies? I think this latter step
>      shouldn't be too difficult; I am just struggling with trying to
>      match up the regression results with the correlation matrix I need
>      to present.
