Hi, some more notes, the data frame generated with my simulated data is:
my.frame
patientid time aspirations
1 A 1 3
2 B 1 3
3 C 1 1
4 D 1 7
5 E 1 3
6 F 1 0
7 G 1 4
8 H 1 1
9 I 1 3
10 J 1 6
11 A 2 1
12 B 2 1
13 C 2 0
14 D 2 2
15 E 2 3
16 F 2 1
17 G 2 4
18 H 2 2
19 I 2 2
20 J 2 6
21 A 3 1
22 B 3 4
23 C 3 3
24 D 3 5
25 E 3 5
26 F 3 5
27 G 3 3
28 H 3 2
29 I 3 2
30 J 3 7
31 A 4 2
32 B 4 3
33 C 4 1
34 D 4 6
35 E 4 3
36 F 4 5
37 G 4 3
38 H 4 4
39 I 4 6
40 J 4 4
Using your data I get:
my.frame
patientid time aspirations
1 A 1 11
2 B 1 9
3 C 1 11
4 D 1 22
5 E 1 21
6 F 1 19
7 G 1 19
8 H 1 15
9 I 1 13
10 J 1 7
11 A 2 15
12 B 2 17
13 C 2 14
14 D 2 17
15 E 2 15
16 F 2 24
17 G 2 12
18 H 2 11
19 I 2 12
20 J 2 4
21 A 3 13
22 B 3 12
23 C 3 15
24 D 3 6
25 E 3 15
26 F 3 17
27 G 3 13
28 H 3 7
29 I 3 19
30 J 3 11
31 A 4 18
32 B 4 15
33 C 4 15
34 D 4 9
35 E 4 13
36 F 4 13
37 G 4 12
38 H 4 10
39 I 4 17
40 J 4 12
Running the same model I get:
> my.analysis<-glme(data=my.frame,fixed=aspirations~time, random=~1 |
patientid, family="poisson",method="REPQL")
> summary(my.analysis)
Generalized linear mixed-effects model fit by restricted PQL
Family: Poisson with Log link
Data: my.frame
AIC BIC logLik
34.92479 41.47514 -13.4624
Random effects:
Formula: ~ 1 | patientid
(Intercept) Residual
StdDev: 0.1340106 1.072718
Fixed effects: aspirations ~ time
Value Std.Error DF t-value p-value
(Intercept) 2.711057 0.1182229 29 22.93175 <.0001
time -0.037834 0.0409368 29 -0.92419 0.363
Deviance Within-Group Residuals:
Min Q1 Med Q3 Max
-2.591542 -0.5766377 -0.03564258 0.5698866 1.859828
Number of Observations: 40
Number of Groups: 10
Here is the information.
1. The model fits well, since the residual is close to 1.0, which we expect
it is an estimate of the dispersion.
2. The intercept values of .1340106 is the between patient variance of the
rate.
3. The time component, has a negative value, but it is not significant. E.G.
there is no significant linear relationship between time and log(lambda).
Here is the same analysis, which takes into consideration the correlation
between observations:
my.analysis<-glme(data=my.frame,fixed=aspirations~time,
correlation=corSymm(), random=~1 | patientid, family=poisson,method="REPQL")
> summary(my.analysis)
Generalized linear mixed-effects model fit by restricted PQL
Family: Poisson with Log link
Data: my.frame
AIC BIC logLik
31.17987 47.55573 -5.589933
Random effects:
Formula: ~ 1 | patientid
(Intercept) Residual
StdDev: 0.000300613 1.252183
Correlation Structure: General
Formula: ~ 1 | patientid
Parameter estimate(s):
Correlation:
1 2 3
2 0.292
3 -0.310 0.337
4 -0.557 0.294 0.820
Fixed effects: aspirations ~ time
Value Std.Error DF t-value p-value
(Intercept) 2.639046 0.1401231 29 18.83376 <.0001
time 0.003548 0.0501044 29 0.07082 0.944
Deviance Within-Group Residuals:
Min Q1 Med Q3 Max
-3.181238 -0.8460662 -0.2967695 0.7238918 2.393916
Number of Observations: 40
Number of Groups: 10
This allows one to make conclusions about parameter estimates in the
presence of correlated data. These are correlations amongst the weighted
residuals.
The full fit which includes a separate variance for each time period as well
the correlation structure gives the following model.
> my.analysis<-glme(data=my.frame,
fixed=aspirations~time,
random=~1 | patientid,
correlation=corSymm(),
weights=varIdent(form=~ 1 | time),
family=poisson,method="REPQL")
> summary(my.analysis)
Generalized linear mixed-effects model fit by restricted PQL
Family: Poisson with Log link
Data: my.frame
AIC BIC logLik
32.14563 53.43425 -3.072814
Random effects:
Formula: ~ 1 | patientid
(Intercept) Residual
StdDev: 0.0002803696 1.409948
Correlation Structure: General
Formula: ~ 1 | patientid
Parameter estimate(s):
Correlation:
1 2 3
2 0.519
3 -0.101 0.234
4 -0.517 0.071 0.710
Variance function:
Structure: Different standard deviations per stratum
Formula: ~ 1 | time
Parameter estimates:
1 2 3 4
1 0.939777 0.7741384 0.5449109
Fixed effects: aspirations ~ time
Value Std.Error DF t-value p-value
(Intercept) 2.648033 0.1546734 29 17.12016 <.0001
time -0.003349 0.0462071 29 -0.07248 0.9427
Deviance Within-Group Residuals:
Min Q1 Med Q3 Max
-3.265855 -0.8536019 -0.2971708 0.7908088 2.489928
Number of Observations: 40
Number of Groups: 10
##Now we generate the residuals. Note that we divide by the
##square root of the variance function, to get normalized residuals:
> tmp<-resid(my.analysis)/sqrt(fitted(my.analysis))
> dim(tmp)<-c(10,4)
This generates a matrix with one row per patient and one column for each
time:
First we compare the correlation structure and find the match is reasonably
close:
> cor(tmp)
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.49635398 -0.1009276 -0.51696713
[2,] 0.4963540 1.00000000 0.2347860 0.08737366
[3,] -0.1009276 0.23478603 1.0000000 0.70900454
[4,] -0.5169671 0.08737366 0.7090045 1.00000000
Next we look at the standard deviations for each time and compare them to
the model values:
> colStdevs(tmp)
[1] 1.4105200 1.3696283 1.0902171 0.7701237
###from the data we can calculate the standard deviation across patients for
###each time period: It is the residual standard error times the standard
###deviation
> c( 1, 0.939777, 0.7741384 ,0.5449109)*1.409948
[1] 1.409948 1.325037 1.091495 0.768296
They are very close. The model fits the data. Again we have no reason to
suspect any changes over time of the number of aspirations, after accounting
for patient, within patient correlation and variance.
> References:
1. Modern Applied Statistics with S, Fourth Edition, Venables and Ripley,
pages 292-299, If you can work through the math, its probably the easiest to
understand, good use of consistent notation.
2. Mixed Effects Models in S and Splus by Pinheiro and Bates. This will
explain the weights and correlation arguments.
3. glme is based on lme.
While I find it interesting that you can fit data with correlated Poisson
variables, it is often difficult to simulate the models with correlated
response data.
I hope this does help. It is a lot of detail, but the model progression
should be of interest.
For 5 observations per patient, just change the rep(LETTERS(1:10),4) to
rep(LETTERS(1:10,5) and rep(1:4, each=10) to rep(1:5, each=10). I might use
rep(0:4,each=10), so that the intercept gives the fixed effect for time
zero.
Tom
________________________________________
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Pat Farr
Sent: Wednesday, May 25, 2005 10:18 PM
To: Thomas Jagger
Cc: s-news@lists.biostat.wustl.edu
Subject: Re: [S] Question in Time Series
I am afraid to say I am not following you. Laura Holt kindly and promptly
proposed a way to tackle my problem. This was posted and short after Alain
Yamakana pointed out a flaw. How to fix the flaw is now the challenge.
Would GLME help me out? Could apply GLME to the following data?
serie1<-c(11, 9, 11, 22, 21, 19, 19, 15, 13, 7)
serie2<-c(15, 17, 14, 17, 15, 24, 12, 11, 12, 4)
serie3<-c(13, 12, 15, 6, 15, 17, 13, 7, 19, 11)
serie4<-c(18, 15, 15, 9, 13, 13, 12, 10, 17, 12)
I would appreciate to see step-by-step how GLME works. What would be an
introductory text on GLME? The method suggested by Laura Holt is posted
under "Question in Time Series -SUMMARY". The only difficulty with the
method is how to make it applicable to the case of say, 50 patients. Hope to
read from you again,
Thanks,
Pat
Thomas Jagger <tjagger@blarg.net> wrote:
Hi, one more shot at the data set. Since you are using count data we will
derive an example based on the new GLME (Generalized Linear Mixed Effects)
functionality and assume that the data are conditionally independent Poisson
random variables given the Poisson rate lambda. Note that this functionality
does allow correlation in the residuals, but we will assume that the
correlation is the result of using a random effect for the patient. This is
reasonable, since each patient can be seen as a random selection from a
larger pool of patients.
Suppose you create a data frame using the following sample data:
This sample has a base rate for each patient as 1,2,1,4,2,2,3,4 and we
assume that there is a linear trend in the log of the Poisson rate, with the
log rate as
0,.1,.2.,.3
Thus the actual rates for a patient D is 4*exp(c(0,.1,.2,.3)).
Generate the data:
aspirations<-rpois(40,
lambda=rep(c(1,2,1,4,2,3,2,2,3,4),4)+rep(exp(c(0,.1,.2,.3)),each=10))
I create a data frame from the data set and create a factor using the first
10 letters for the patientid. The time is 1,2,3,4 for each patient.
We assume that observations 1:10 are time 1, 11:20 time 2 etc, and that the
patients are in order, e.g. patient C is observation 3,13,23,33 etc.
my.frame< data.frame(patientid=rep(LETTERS[1:10],4),time=rep(1:4,each=10),
aspirations=aspirations)
Now, conditional on the individual we could do the following analysis, using
V7.0 or V6.2 Correlated Data library:
library(CorrelatedData)
my.analysis<-glme(data=my.frame,fixed=aspirations~time,
random=~1 | patientid, family="poisson",method="REPQL")
This assumes that the number of aspirations is Poisson with rate lambda that
depends on the time and the year.
The Poisson rate is modeled as:
log(lambda)=a*time+error
where a is the regression on time, here time is a value from 1 to 4, and
error is modeled as a random effect, one value for each of the patients.
Here is the result:
summary(my.analysis)
Generalized linear mixed-effects model fit by restricted PQL
Family: Poisson with Log link
Data: my.frame
AIC BIC logLik
80.87125 87.4216 -36.43563
Random effects:
Formula: ~ 1 | patientid
(Intercept) Residual
StdDev: 0.3547657 0.8713678
Fixed effects: aspirations ~ time
Value Std.Error DF t-value p-value
(Intercept) 0.8496517 0.2286127 29 3.716555 0.0009
time 0.1042568 0.0694778 29 1.500576 0.1443
Deviance Within-Group Residuals:
Min Q1 Med Q3 Max
-2.201736 -0.7919849 -0.1346267 0.5977866 1.484844
Number of Observations: 40
Number of Groups: 10
Also it is useful to get the fixed effect predictions and full prediction:
ww<-predict(my.analysis,level=c(0,1),type="response")
patientid predict.fixed predict.patientid
1 A 2.595836 1.891837
2 B 2.595836 2.423822
3 C 2.595836 1.645689
4 D 2.595836 3.769375
5 E 2.595836 2.852344
6 F 2.595836 2.423822
7 G 2.595836 2.852344
8 H 2.595836 2.151678
9 I 2.595836 2.706990
10 J 2.595836 4.252084
11 A 2.881080 2.099722
12 B 2.881080 2.690164
13 C 2.881080 1.826526
14 D 2.881080 4.183575
15 E 2.881080 3.165775
16 F 2.881080 2.690164
17 G 2.881080 3.165775
18 H 2.881080 2.388116
19 I 2.881080 3.004449
20 J 2.881080 4.719326
21 A 3.197669 2.330451
22 B 3.197669 2.985774
23 C 3.197669 2.027235
24 D 3.197669 4.643288
25 E 3.197669 3.513647
26 F 3.197669 2.985774
27 G 3.197669 3.513647
28 H 3.197669 2.650535
29 I 3.197669 3.334594
30 J 3.197669 5.237911
31 A 3.549046 2.586533
32 B 3.549046 3.313867
33 C 3.549046 2.249998
34 D 3.549046 5.153518
35 E 3.549046 3.899746
36 F 3.549046 3.313867
37 G 3.549046 3.899746
38 H 3.549046 2.941790
39 I 3.549046 3.701017
40 J 3.549046 5.813481
Since the link function is the log one can verify both the fixed data and
the patient data. Here we remove the intercept and check on the
coefficients, using the type="link" predictions.
Remove the base value and check on the fixed response (trend).
We divide by the base value so we get the trend.
round(log(1/2.59836*ww[,2])[c(1,11,21,31)],2)
[1] 0.00 0.10 0.21 0.31
Now we check on the random effects, note since we have a multiplicative
model, we must divide the full response by the fixed response:
round((2.59836*ww[,3]/ww[,2])[1:10],2)
[1] 1.89 2.43 1.65 3.77 2.86 2.43 2.86 2.15 2.71 4.26 compare to:
1 2 1 4 2 3 2 2 3 4
The product of the fixed response and the random effect should be close to
the initial rate value for lambda used in generating the samples.
See the CorrelatedData library for help on these functions.
Tom Jagger
Research Associate
FSU Geography Department
________________________________________
From: s-news-owner@lists.biostat.wustl.edu
[mailto:s-news-owner@lists.biostat.wustl.edu] On Behalf Of Pat Farr
Sent: Tuesday, May 24, 2005 2:02 PM
To: s-news@lists.biostat.wustl.edu
Subject: [S] Question in Time Series
Hi there!
I collected observations on four different occasions equally (or almost)
spaced 6 months. On each occasion, I registered the frequency of 10
different
individuals' aspirations. This results in data like {y11,
y12,....,y110,y21,...,y210,....,y41,y42,...,y410}. I want to analyze any
temporal correlation starting with rts() and tsplot(). However, I am facing
difficulty in determining the appropriate frequency. When I used
rts(....,frequency=10), I got nice plot, but my series goes from 1 to 40. On
the other hand if defined z1 = y11+y12+...+y10 and z2=... I ended up with
only {z1, z2, z3, z4}. Does this mean my data are not appropriate for time
series analysis? If it is, what would be the appropriate frequency? I am
afraid I am missing theory here.
Thanks for your kindness,
Pat
--------------------------------------------------------------------
This message was distributed by s-news@lists.biostat.wustl.edu. To
unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
the BODY of the message: unsubscribe s-news
|