s-news
To: s-news@lists.biostat.wustl.edu
Subject: Re:
From: Yannis.C.Tzamouranis@dynegy.com
Date: Fri, 3 May 2002 09:45:38 -0500

Folks:
I have a question pertaining to cleaning raw data from a variety of
sources.
Instead of writing many small functions to alter data (e.g. take one
source's "999s" out and replace them with NAs, or delete certain rows), I
want to write a generic routine.

The input to this would be a data frame like this:
Flags  Rules    Actions
0      "<="     NA
999    "="      NA
-3     "="      removeRow
Any    my.func  0

and vectors or dataframes of raw data:
x <- c(235.6, 232.5, 368.2, 335.1, 209.9, 193.2, NA, 357.8, 999, -9,
       -20, 468.1, 381.3, 393.1, NA, 400.7)

I would like to combine the flags, rules and actions vectors so that the
first rule is combined with the first flag and then the first action is
applied if true, e.g.
"look in vector x for values that are <= 0; if you find any, set them to NA"
"look in vector x for values that are = 999; if you find any, set them to NA"
"look in vector x for values that are = -3; if you find any, remove the row"
"apply my.func to all elements of the vector; set those elements for which
my.func is TRUE to 0"
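
To make the intended behaviour concrete, here is a minimal sketch in R syntax (S-PLUS may differ in details such as the stringsAsFactors argument). Several things here are assumptions and not part of the description above: the function name clean, the placeholder body of my.func, storing all table entries as character strings, reading "=" as the equality test "==", and testing every rule against the raw values so that one rule's action does not hide values from a later rule. The loop runs over the rows of the rule table only; each rule is applied to the whole vector at once.

```r
# Placeholder predicate; the real my.func is not given in the message.
my.func <- function(v) v > 450 & v < 500

# The rule table from the message, with every column stored as character.
rules <- data.frame(
  Flags   = c("0", "999", "-3", "Any"),
  Rules   = c("<=", "=", "=", "my.func"),
  Actions = c("NA", "NA", "removeRow", "0"),
  stringsAsFactors = FALSE
)

clean <- function(x, rules) {
  x0   <- x                           # rules are tested against the raw values
  drop <- rep(FALSE, length(x))       # elements marked for removal
  for (i in seq_len(nrow(rules))) {   # loops over rules, not over the data
    op <- rules$Rules[i]
    if (op == "=") op <- "=="         # assumption: "=" means equality
    f <- get(op, mode = "function")   # operator or named function, by name
    if (rules$Flags[i] == "Any") {
      hit <- f(x0)                    # one-argument predicate
    } else {
      hit <- f(x0, as.numeric(rules$Flags[i]))
    }
    hit <- !is.na(hit) & hit          # an NA comparison counts as no match
    act <- rules$Actions[i]
    if (act == "removeRow") {
      drop <- drop | hit
    } else if (act == "NA") {
      x[hit] <- NA
    } else {
      x[hit] <- as.numeric(act)
    }
  }
  x[!drop]
}

x <- c(235.6, 232.5, 368.2, 335.1, 209.9, 193.2, NA, 357.8, 999, -9,
       -20, 468.1, 381.3, 393.1, NA, 400.7)
cleaned <- clean(x, rules)
```

For a data frame of raw data, the same function could be applied column by column, although removing rows would then need the drop masks of all columns to be combined first.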

I have a couple of conceptual problems with this:
1) I want to avoid for loops, since the input data can be LONG
2) I need to extract the elements of Rules (e.g. "<=" or my.func) and use
them as operators...
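
On point 2, one possible approach (a sketch in R syntax, not checked against S-PLUS) is to look the operator up by name: in R, operators such as "<=" are ordinary functions, so get() can fetch them from a character string exactly as it fetches a named function. The helper name rule_to_fun is hypothetical, and mapping "=" to "==" is an assumption about what the table means.

```r
# Hypothetical helper: turn a rule string into a callable function.
rule_to_fun <- function(rule) {
  if (rule == "=") rule <- "=="     # assumption: "=" in the table means equality
  get(rule, mode = "function")      # finds operators and named functions alike
}

x <- c(235.6, 999, -9, NA)

le <- rule_to_fun("<=")
le(x, 0)        # FALSE FALSE TRUE NA

eq <- rule_to_fun("=")
eq(x, 999)      # FALSE TRUE FALSE NA

# A named predicate is resolved the same way.
missing <- rule_to_fun("is.na")
missing(x)      # FALSE FALSE FALSE TRUE
```

Because the returned function is applied to the whole vector in one call, no loop over the data is needed.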
Any ideas?  Has anyone done something like this before?  If this is not the
right way to go about it, any other approaches?

Thanks,
Yannis Tzamouranis

