I would like to generate a dataset of many (say 1000) correlated binary variables and a
continuous outcome variable that is correlated with a subset of the binary variables. I am trying
to mimic data from HIV genotyping, where the binary variables represent the presence or absence of a
mutation at positions of the HIV genome. The outcome is a measure of HIV drug resistance. Mutations
at some positions confer resistance while others don't.
Let X1,...,Xp be p binary variables and Y the outcome. I want E(Y|X1,...,Xp) = E(Y|Xm,...,Xp), where
1<m<p, and E(Y|Xi=0) = E(Y|Xi=1) for 1<=i<m. For this to hold, we probably want Corr(Xi,Xj) =
0 for i<m and j>=m.
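One possible way to set this up, sketched in Python with only the standard library. It is a rough illustration under my own assumptions, not an established method for genotype data. Each binary variable is a thresholded latent normal, and each block shares one latent factor, so variables are correlated within a block while the two blocks are independent of each other. Independence is stronger than Corr(Xi,Xj) = 0, but it is what guarantees E(Y|Xi=0) = E(Y|Xi=1) for i<m. The function name simulate, the parameters rho, prevalence and noise_sd, the uniform effect sizes, and the linear form of Y are all hypothetical choices made for illustration.

```python
import random
from statistics import NormalDist


def simulate(n, p, m, rho=0.5, prevalence=0.2, noise_sd=1.0, seed=0):
    """Simulate n rows of p correlated binary variables and a continuous outcome.

    Columns 0..m-2 (X1..X_{m-1}) form the "null" block, which does not affect Y.
    Columns m-1..p-1 (Xm..Xp) form the "causal" block, which drives Y.
    All modelling choices here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Threshold chosen so that P(latent normal < cut) = prevalence.
    cut = NormalDist().inv_cdf(prevalence)
    # Assumed effect sizes for Xm..Xp (arbitrary positive values).
    betas = [rng.uniform(0.5, 2.0) for _ in range(p - m + 1)]
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    X, Y = [], []
    for _ in range(n):
        # One shared latent factor per block; the two factors are independent,
        # so the null block is independent of the causal block.
        f_null, f_causal = rng.gauss(0, 1), rng.gauss(0, 1)
        row = []
        for j in range(p):
            f = f_null if j < m - 1 else f_causal
            # z is standard normal; latent correlation within a block is rho.
            z = a * f + b * rng.gauss(0, 1)
            row.append(1 if z < cut else 0)
        # Y depends only on the causal block plus independent Gaussian noise.
        y = sum(beta * x for beta, x in zip(betas, row[m - 1:]))
        y += rng.gauss(0, noise_sd)
        X.append(row)
        Y.append(y)
    return X, Y, betas


if __name__ == "__main__":
    # 1000 binary variables, the last 100 of which influence the outcome.
    X, Y, betas = simulate(n=200, p=1000, m=901)
    print(len(X), len(X[0]), len(betas))
```

Note that rho is the correlation of the latent normals; the correlation between the resulting binary variables is lower. A single shared factor also gives every pair in a block the same correlation, which is probably too simple for real genotype data.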
Please help - I am on a tight deadline.