[S] RE: Generating correlated binary data

To: <s-news@wubios.wustl.edu>
Subject: [S] RE: Generating correlated binary data
From: "Christopher R. Bilder" <bilder@stat.ksu.edu>
Date: Thu, 24 Jun 1999 11:54:40 -0500
Importance: Normal
In-reply-to: <199906240650.BAA15340@wubios.wustl.edu>
Sender: owner-s-news@wubios.wustl.edu

Hello,

In my research, I have found four papers on generating correlated binary
data:

1) Park, Park & Shin (1996) - uses the Poisson distribution
2) Emrich & Piedmonte (1991) - uses the multivariate normal distribution
3) Gange (1995) - uses iterative proportional fitting
4) Lee (1993) - gives two methods: a) linear programming, b) copulas
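
To give a feel for the multivariate normal approach in item 2, here is a
minimal sketch of the general thresholding idea: draw correlated normals and
cut each one at a quantile. It is not the published algorithm; as I
understand it, the paper also works out which latent correlation yields a
requested binary correlation, and that step is skipped here. The names
rbin.pair and latent.rho are made up for illustration, and the code is
written in R syntax, which I assume carries over to S-Plus.

```r
# Sketch only: threshold a correlated bivariate normal to get two binary
# variables.  rbin.pair and latent.rho are hypothetical names.
rbin.pair <- function(n, p1, p2, latent.rho)
{
        z1 <- rnorm(n)
        # z2 is standard normal with correlation latent.rho to z1
        z2 <- latent.rho * z1 + sqrt(1 - latent.rho^2) * rnorm(n)
        # P(z <= qnorm(p)) = p, so each column has the requested mean
        cbind(x1 = as.numeric(z1 <= qnorm(p1)),
              x2 = as.numeric(z2 <= qnorm(p2)))
}

set.seed(1)
b <- rbin.pair(10000, 0.3, 0.6, 0.5)
apply(b, 2, mean)   # should be close to 0.3 and 0.6
cor(b)[1, 2]        # positive, but smaller than latent.rho
```

The correlation of the binary pair is weaker than the latent normal
correlation, which is why a real implementation has to solve for latent.rho.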


Lee's paper gives S-Plus functions to do his methods!

I hope this helps.

Christopher R. Bilder
Kansas State University
Department of Statistics
Dickens Hall, Room 9B
Manhattan, KS 66506
Office: (785) 532-0527
Fax: (785) 532-7736
bilder@stat.ksu.edu
www-personal.ksu.edu/~bilder
STAT 351: www.ksu.edu/stats/tch/bilder/s351

GO BIG RED!

-----Original Message-----
From: owner-s-news-digest@wubios.wustl.edu
[mailto:owner-s-news-digest@wubios.wustl.edu]
Sent: Thursday, June 24, 1999 1:50 AM
To: s-news-digest@wubios.wustl.edu
Subject: S-News Digest V1 #486


S-News Digest         Thursday, June 24 1999         Volume 01 : Number 486



       In this issue:

               [S] block-kriging
               [S] S-plus performance
               [S] phreg
               [none]
               [S] Generating correlated binary random variables
               [S] How do you create a formula for a model with a large
number of variables?
               Re: [S] How do you create a formula for a model with a large
number of variables?
               [S] Remove column(s) from data frame?
               RE: [S] Generating correlated binary random variables

----------------------------------------------------------------------

Date: Wed, 23 Jun 1999 09:08:25 +0200
From: R.N.M.Duin@rikz.rws.minvenw.nl
Subject: [S] block-kriging

Hi

We are using S-Plus 4.5 with the add-on Spatial Stats package. Has anyone
already written a block-kriging function, or does anyone know how to do
this? What we want is to estimate the average value of a variable within a
prescribed local area. The main problem is estimating the covariance between
the blocks.

with best wishes Richard

- -----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu.  To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message:  unsubscribe s-news

------------------------------

Date: Wed, 23 Jun 1999 09:54:55 -0300
From: José Ailton Alencar andrade <andrade@inep.gov.br>
Subject: [S] S-plus performance

    Hello All,

    I am using S-Plus 4.5 release 2 for Windows 95. S-Plus has a lot of
memory problems. For example, I cannot import a big data frame from SAS
because it stops working or gets too slow.

    What can I do, in terms of setup, to improve S-Plus performance?

    bye.


------------------------------

Date: Wed, 23 Jun 1999 08:46:56 -0500
From: "Therneau, Terry M., Ph.D." <therneau@mayo.edu> (Terry Therneau)
Subject: [S] phreg

   I have compared fits between SAS phreg and Splus coxph on scores of
data sets.  Both procedures are reliable, and their answers agree.  A
query about "why are they different" gets sent to me once or twice a year.
In order of frequency the usual reasons are
        a. phreg was run with ties=breslow (its default) and coxph with
            ties=efron (its default).
        b. the SAS and Splus data sets differ.
        c. trivial differences due to convergence.
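
Reason (a) is easy to see directly. The sketch below is written for R, where
the survival package supplies coxph and the bundled lung data set; I am
assuming a reasonably recent version in which the argument is called ties
(older versions call it method), and the choice of data set and covariates
is purely illustrative.

```r
library(survival)   # recommended package distributed with R

# Same model, same data, two ways of handling tied event times
fit.efron   <- coxph(Surv(time, status) ~ age + sex, data = lung,
                     ties = "efron")
fit.breslow <- coxph(Surv(time, status) ~ age + sex, data = lung,
                     ties = "breslow")

# Side-by-side coefficients; with ties present they differ slightly
cbind(efron = coef(fit.efron), breslow = coef(fit.breslow))
```

Setting the same tie-handling option in both programs removes this source of
disagreement.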

As to the last one: phreg has a smaller "epsilon" for convergence than
coxph, and often does one more iteration.  I am opinionated on this: if the
se of beta is .1, say, then iterating until the MLE is correct to the .001
digit is silly -- a really smart program wouldn't even print those digits.

   It is possible that you have found a test case in which one or the other
of the programs is in error.  I am always interested in such potential
cases, but need more details, and probably a copy of the data, to follow up.

   My favorite "differences" email to date was the one that began "I have
found a bug in your coxph program; it gives a different answer than SAS".


 Terry M. Therneau, Ph.D.                        (507) 284-3694
 Head, Section of Biostatistics                  (507) 284-9542  FAX
 Mayo Clinic                                     therneau.terry@mayo.edu
 Rochester, Minn 55905

------------------------------

Date: Wed, 23 Jun 1999 13:05:22 -0500 (CDT)
From: Erin Hodgess <hodgess@uhdux2.dt.uh.edu>
Subject: [none]

Hi there!

Here's another variation on the theme:

> grid1
function(x, n = length(x))
{
        y <- rep(cbind.data.frame(x), n)
        z <- expand.grid(y)
        z
}

> xa
[1] "a" "b"
> grid1(xa,3)
  X1 X1 X1
1  a  a  a
2  b  a  a
3  a  b  a
4  b  b  a
5  a  a  b
6  b  a  b
7  a  b  b
8  b  b  b


Sincerely,
Erin M. Hodgess, Ph.D.
Assistant Professor
Department of Computer and Mathematical Sciences
University of Houston - Downtown
One Main Street
Houston, TX 77002
e-mail: hodgess@uhdux2.dt.uh.edu


Subject: Re2: [splus-users,11557] [S] Combinatory function
In-Reply-To: <199906221414.XAA25328@cabbage.math.keio.ac.jp>


Hi!

>dd1<-seq(from=1, by=1, length=5)
>comb <- expand.grid(x1 =dd1 , x2 =dd1, x3 =dd1, x4 = dd1, x5=dd1,
>xx6=dd1)
>This program gives you all combinations of c(1,2,3,4,5).
>If you replace "1" by "a", "2" by "b",-----, "5" by "e",
>it results in all combinations of a,b,c,d,e.


   A program like the one below may be simpler; you do not have
to replace numbers by letters:

function()
{
        dd1 <- c("a", "b")
        comb <- expand.grid(x1 = dd1, x2 = dd1, x3 = dd1)
        print(comb)
}


The result is:

  x1 x2 x3
1  a  a  a
2  b  a  a
3  a  b  a
4  b  b  a
5  a  a  b
6  b  a  b
7  a  b  b
8  b  b  b

K. Takezawa

   *****    Kunio Takezawa, Ph.D. (takezawa@affrc.go.jp)    *****
  *****            Research Information Section              *****
 **** Hokuriku National Agricultural Experiment Station, JAPAN ****
*****     <http://www.inada.affrc.go.jp/~takezawa/patent-e.html>    *****


------------------------------

Date: Wed, 23 Jun 1999 14:09:38 -0400
From: "Davidov, Ori" <ori_davidov@merck.com>
Subject: [S] Generating correlated binary random variables

Hi,

Does anyone out there have, or know of, an S-Plus function that generates
correlated binary rv's? Obviously there are many dependence structures and
therefore many possible functions.

Any information will be helpful to me at this point, thanks,

Ori

------------------------------

Date: Wed, 23 Jun 1999 17:07:37 -0400
From: "Michael Radmacher" <mdradmac@helix.nih.gov>
Subject: [S] How do you create a formula for a model with a large number of
variables?

I'm trying to do a stepwise logistic regression using the step.glm function.
My question is about how to create the formula for my model.  I have a large
set of variables to work with (over 100) and want to do a step-forward
regression which starts with a null model, adding one variable at a time
until it attains the best fit.

The problem is, as input to step.glm, I must define the scope of models for
the stepwise search.  The formula for the lower bound is easy (it's RESPONSE
~ 1) but the upper bound should contain all of the more than 100 potential
regressor variables.  My question is, how can this be done in Splus without
having to explicitly list every single one of the variables in the formula
(i.e., RESPONSE ~ X1 + X2 + ... + X100 + ...)?

I've tried doing this by using a matrix, X, where each column of the matrix
contains one of the regressor variables and then using the formula RESPONSE
~ X.  The problem with this is that X is considered a single factor and in
the stepwise regression, only the entire matrix is considered for addition,
not individual columns of the matrix.

I think there must be a simple solution to this problem, but haven't had any
luck looking through the manuals I have. Any help you can give would be
greatly appreciated.

Thanks,
Michael Radmacher



------------------------------

Date: Wed, 23 Jun 1999 16:55:19 -0500 (CDT)
From: Edward Malthouse <ecm@casbah.acns.nwu.edu>
Subject: Re: [S] How do you create a formula for a model with a large number
of variables?

> I'm trying to do a stepwise logistic regression using the step.glm function.
> My question is about how to create the formula for my model.  I have a large
> set of variables to work with (over 100) and want to do a step-forward
> regression which starts with a null model, adding one variable at a time
> until it attains the best fit.
>
> The problem is, as input to step.glm, I must define the scope of models for
> the stepwise search.  The formula for the lower bound is easy (it's RESPONSE
> ~ 1) but the upper bound should contain all of the more than 100 potential
> regressor variables.  My question is, how can this be done in Splus without
> having to explicitly list every single one of the variables in the formula
> (i.e., RESPONSE ~ X1 + X2 + ... + X100 + ...)?
>
> I've tried doing this by using a matrix, X, where each column of the matrix
> contains one of the regressor variables and then using the formula RESPONSE
> ~ X.  The problem with this is that X is considered a single factor and in
> the stepwise regression, only the entire matrix is considered for addition,
> not individual columns of the matrix.
>
> I think there must be a simple solution to this problem, but haven't had any
> luck looking through the manuals I have. Any help you can give would be
> greatly appreciated.
>
> Thanks,
> Michael Radmacher

When I have to analyze a data set in Splus with many variables, I
create a text file with my formula and source it in.  Utility
programs such as awk and perl are indispensable when it comes to
creating such a file.  For example, suppose the file "formula.s"
contains

myform <- y~x1+x2+x3+x4+x5+x6+x7+x8+x9+x10

Then I type
> source("formula.s")
> fit <- lm(myform, data=mydata)
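
The same formula can also be assembled inside S from the column names,
without an external file. The following is a minimal sketch in R syntax,
which I assume is close to what S-Plus accepts; the data frame mydata, its
column names and the two-step limit are invented for illustration, and R's
step() stands in for step.glm.

```r
# Hypothetical data: 100 regressors X1..X100 plus a binary RESPONSE
set.seed(1)
mydata <- as.data.frame(matrix(rnorm(200 * 100), 200, 100))
names(mydata) <- paste("X", 1:100, sep = "")
mydata$RESPONSE <- rbinom(200, 1, 0.5)

# Paste the regressor names together and convert the string to a formula
xnames <- setdiff(names(mydata), "RESPONSE")
upper <- as.formula(paste("~", paste(xnames, collapse = " + ")))

# Forward search from the null model; steps = 2 just keeps the demo short
null.fit <- glm(RESPONSE ~ 1, family = binomial, data = mydata)
fit <- step(null.fit, scope = list(lower = ~ 1, upper = upper),
            direction = "forward", steps = 2, trace = 0)
```

Because the upper scope is built from names(mydata), adding or dropping
columns in the data frame changes the search space automatically.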

Ed Malthouse

Dr. Edward C. Malthouse
Assistant Professor
Integrated Marketing Communications Department
Medill School of Journalism
1908 Sheridan Road
Evanston, IL  60208-1290
Tele:  847-467-3376
Fax:  847-491-5925

------------------------------

Date: Wed, 23 Jun 1999 21:04:27 -0500
From: "David Parkhurst" <parkhurs@indiana.edu>
Subject: [S] Remove column(s) from data frame?

How do I remove one or more columns from a data frame?  Please reply
directly to me and I'll summarize unless it's too trivial.
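
For readers of the archive, a few common ways to do this are sketched below.
The data frame df is made up, and the code is R syntax that I assume behaves
the same way in S-Plus.

```r
# Hypothetical example data frame
df <- data.frame(a = 1:3, b = c("u", "v", "w"), c = c(2.5, 3.5, 4.5))

# Drop by position: a negative index removes that column
df1 <- df[, -2]

# Drop by name: keep the columns whose names are not in the drop list
drop.names <- c("b", "c")
df2 <- df[, is.na(match(names(df), drop.names)), drop = FALSE]

# Drop a single named column by assigning NULL to it
df3 <- df
df3$c <- NULL
```

The drop = FALSE in the second form keeps the result a data frame even when
only one column survives.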

Thanks.

Dave Parkhurst


------------------------------

Date: Thu, 24 Jun 1999 12:59:57 +1000
From: "Ellis, Nick (Marine, Cleveland)" <Nick.Ellis@cmis.csiro.au>
Subject: RE: [S] Generating correlated binary random variables

> -----Original Message-----
> From: Davidov, Ori [mailto:ori_davidov@merck.com]
> Sent: Thursday, 24 June 1999 4:10
> To: 's-news@wubios.wustl.edu'
> Subject: [S] Generating correlated binary random variables
>
>
>
> Hi,
>
> Does anyone out there have, or know of, an S-Plus function that generates
> correlated binary rv's? Obviously there are many dependence structures and
> therefore many possible functions.
>
> Any information will be helpful to me at this point, thanks,
>
> Ori

Here's one idea.

Take independent X ~ Bernoulli(p), Y ~ Bernoulli(q), W ~ Bernoulli(r) and
define
Z = (X|Y)&W
Then Z ~ Bernoulli((p+(1-p)q)r) and
Cov(X,Z) = p(1-p)(1-q)r so that
Cor(X,Z) = sqrt(p(1-p)r) (1-q) / sqrt[(p+(1-p)q) (1-pr-(1-p)qr)]

You can tune q and r to give desired values of E(Z) and Cor(X,Z). For
negative correlations use
Z = ((!X)|Y)&W. The reason I used two auxiliary random variables Y and W was
to get a truth table for X and Z with all cells having non-zero probability
(see tables in example below).
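
The tuning step can be done in closed form for the positive-correlation
case. Writing a = p+(1-p)q and m = E(Z) = a r, the correlation formula above
rearranges to (1-a)/a = Cor(X,Z) sqrt((1-p)/p) sqrt((1-m)/m), which gives a,
and then q = (a-p)/(1-p) and r = m/a. The sketch below implements that
rearrangement in R syntax; the function name tune.qr and its arguments are
hypothetical, and not every (m, rho) target is attainable.

```r
# Sketch: solve for q and r given p, target mean m = E(Z) and target
# correlation rho = Cor(X,Z) >= 0.  tune.qr is a hypothetical name.
tune.qr <- function(p, m, rho)
{
        k <- rho * sqrt((1 - p) / p) * sqrt((1 - m) / m)
        a <- 1 / (1 + k)          # a = P(X or Y) = p + (1-p)q
        q <- (a - p) / (1 - p)
        r <- m / a
        if (q < 0 || q > 1 || r < 0 || r > 1)
                stop("target mean and correlation not attainable")
        c(q = q, r = r)
}

# Feeding back the values from the worked example should recover
# q = r = 0.5 up to rounding
tune.qr(0.5, 0.375, 0.2581989)
```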

Examples follow:

> p<-q<-r<-.5
> x<-rbinom(1000,1,p)
> y<-rbinom(1000,1,q)
> w<-rbinom(1000,1,r)
> z<-(x|y)&w
> apply(cbind(x,y,w,z),2,mean)
     x     y    w     z
 0.511 0.473 0.48 0.356
> (p+(1-p)*q)*r # E(z)
[1] 0.375
> round(var(cbind(x,y,w,z)),2)
      x     y     w    z
x  0.25 -0.01  0.00 0.06
y -0.01  0.25 -0.01 0.05
w  0.00 -0.01  0.25 0.19
z  0.06  0.05  0.19 0.23
> p*(1-p)*(1-q)*r # Cov(x,z)
[1] 0.0625
> round(cor(cbind(x,y,w,z)),2)
      x     y     w    z
x  1.00 -0.03 -0.01 0.26
y -0.03  1.00 -0.04 0.20
w -0.01 -0.04  1.00 0.77
z  0.26  0.20  0.77 1.00
> sqrt(p*(1-p)*r)* (1-q) / sqrt((p+(1-p)*q)* (1-p*r-(1-p)*q*r)) # Cor(x,z)
[1] 0.2581989
> table(x,x|y)
  FALSE TRUE
0   250  239
1     0  511
> table(x,x&y)
  FALSE TRUE
0   489    0
1   277  234
> table(x,z)
  FALSE TRUE
0   376  113
1   268  243

Nick Ellis
CSIRO Marine Research   mailto:Nick.Ellis@marine.csiro.au
PO Box 120                      ph    +61 (07) 3826 7260
Cleveland QLD 4163      fax   +61 (07) 3826 7222
Australia                       http://www.marine.csiro.au


------------------------------

End of S-News Digest V1 #486
****************************

