Hi S+ers,
I did not see this message appear on the list. Hence, I resend it. Sorry
if crossposting occurred.
I have a dataframe with the columns "cluster", "period" and "sp" (for
species), and a last column called "r1" filled with a series of 1, which
I use to aggregate the data (see below; not very elegant, but it works).
As a first step, I would like to remove those rows which have exactly
the same values for the three columns (there are many of them and the
dataframe has ca. 25'000 rows), e.g.
>come1
cluster period sp r1
206 17 1 casp 1
207 17 1 casp 1
208 16 2 casp 1
276 12 1 casp 1
277 12 2 casp 1
278 11 2 casp 1
493 3 1 casp 1
494 3 1 casp 1
525 10 1 casp 1
784 6 2 cavi 1
...
To do this, I used the following command:
>aggregate.data.frame(come1,list(cluster=come1$cluster,period=come1$period,sp=come1$sp),FUN=mean)[,c(1,2,3,7)]
cluster period sp r1
cluster period sp r1
1 3 1 casp 1
2 10 1 casp 1
3 17 1 casp 1
4 11 2 casp 1
5 12 1 casp 1
6 12 2 casp 1
7 16 2 casp 1
8 6 2 cavi 1
...
but there is probably a much simpler way to reach the same result,
especially since some columns appeared to be duplicated in the resulting
dataframe (that why I have to specify a subset of columns in the final
brackets). Thanks for any better solution on this particular point !!
Second, I would then like to get a table with every single combination
of "cluster" and "period" defining the rows, and "sp" defining the
columns. When I do this in Excel or Access, I usually use a column of
zero (see "r1" in the df above) to specify as the values to aggregate.
The result should be a table with 0 or 1 as values for the species'
columns (I will have about 80 of them), of the type:
cluster period casp cavi ...
3 1 1 0
6 2 0 1
10 1 1 0
11 2 1 0
12 1 1 0
12 2 1 0
16 2 1 0
17 1 1 0
...
I was thinking to use either the 'table' or 'crosstabs' functions, but
it does not really give me what I want. I need to be able to use the
resulting table as a dataframe for further analyses (i.e. being able to
read the "casp", "cavi", ... columns). 'Table' could perhaps work if I
could define the first argument as a factor defined from the combination
of "cluster" AND "period" (every single combination would be a level -->
see e.g. for cluster 12 in the table above). Entering all three columns
as arguments in 'table' results in a three-way array which is not
exactly what I want.
Thanks a lot in anticipation for your help.
I will summarize,
Cheers
Antoine
+----------------------------------------------------------------+
| |
| Antoine Guisan, Dr. |
| |
| Swiss Center for Faunal Cartography (CSCF) |
| Terreaux 14 tel +41 32 724 92 97 |
| CH-2000 Neuchâtel fax +41 32 717 79 69 |
| Suisse Antoine.Guisan@cscf.unine.ch |
| http://www.cscf.ch |
| |
| See also http://www.cscf.ch/projets/odo/index_fr.htm |
http://www.cscf.ch/projets/chs/index.htm
| |
| You can also send me a fax at my home office |
| fax +41 21 683 30 67 |
| |
+----------------------------------------------------------------+
|