s-news

Re: replacement of character in column by values

To: <s-news@lists.biostat.wustl.edu>
Subject: Re: replacement of character in column by values
From: <D.Ciraki@lse.ac.uk>
Date: Tue, 30 Oct 2007 21:51:38 -0000
References: <64E0B1A1FF5F1B4497212A108E11118F31F558@MAPIUDEM2.sim.umontreal.ca>
If you have the finmetrics module, there are a couple of useful vectorised 
functions that run fast on fairly large data sets. The first is tslag(), 
which can be used within ifelse(), but you will need to convert the 
data.frame into a matrix and back again if you want ifelse() to work on the 
entire table. For example, if your original data are in a data frame called 
MyRows, you can do something like:

MyRows.mat <- as.matrix(MyRows)
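# NA with a 1 directly above and directly below becomes 1
MyRows.mat <- ifelse(tslag(MyRows.mat)==1 & tslag(MyRows.mat,-1)==1 & 
                     is.na(MyRows.mat),1,MyRows.mat)
# NA with a 0 directly above and directly below becomes 0
MyRows.mat <- ifelse(tslag(MyRows.mat)==0 & tslag(MyRows.mat,-1)==0 & 
                     is.na(MyRows.mat),0,MyRows.mat)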
MyRows.df <- data.frame(MyRows.mat) # convert back to data.frame
colIds(MyRows.df) <- colIds(MyRows) # replace original column names
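
If you do not have finmetrics, the same lag/lead comparison can be sketched 
with plain indexing. This is only an illustration, not the finmetrics 
implementation: fill.single.gaps is a made-up helper name, the columns are 
assumed to hold only 0, 1 and NA, and (like the one-step lag and lead above) 
it fills an NA only when the values directly above and below it are equal, 
so runs of two or more NAs are left alone.

```r
# Hypothetical helper (not part of finmetrics): fill an isolated NA when the
# values directly above and below it in the same column are equal.
fill.single.gaps <- function(x) {
  n <- length(x)
  if (n < 3) return(x)
  prev <- c(NA, x[-n])   # value one row above (lag 1)
  nxt  <- c(x[-1], NA)   # value one row below (lead 1)
  fill <- is.na(x) & !is.na(prev) & !is.na(nxt) & prev == nxt
  x[fill] <- prev[fill]
  x
}

# Example data copied from the original question
MyRows <- data.frame(A = c(0, NA, NA, 0, 1, NA, 1),
                     B = c(0, NA, NA, 0, 1, NA, 1),
                     C = c(0, NA, NA, 0, 0, NA, 0),
                     D = c(0, NA, NA, 0, 1, NA, 1))

# Apply column by column and rebuild the data frame
MyRows.filled <- data.frame(lapply(MyRows, fill.single.gaps))
```

On the example data this fills row 6 in every column but leaves rows 2 and 3 
as NA, because neither of them has non-missing values on both sides.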

A more efficient alternative (also needing finmetrics) is to use the interpNA 
function with linear interpolation. The interpolated value is 0 if the NA 
lies between zeros and 1 if it lies between ones, as you would like, and a 
single NA yields 0.5 only if it lies between a 1 and a 0 (or vice versa). So 
the trick is to interpolate linearly and then set the 0.5 values back to NA. 
(If a run of several NAs can lie between a 0 and a 1, the interpolated values 
are other fractions, so in that case reset anything that is not 0 or 1.) 
E.g.

MyRows.df <- interpNA(MyRows,method="linear") # fill NAs by linear interpolation
MyRows <- ifelse(MyRows.df==0.5,NA,MyRows.df) # overwrite the original data

This should be very fast even if you have a large number of rows and columns.
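
The interpolation trick can also be sketched without finmetrics by using 
approx(). Again this is an assumption-laden illustration rather than what 
interpNA does internally: fill.by.interp is a made-up helper name, the 
columns are assumed to hold only 0, 1 and NA, and instead of testing for 0.5 
it resets every interpolated value that is not exactly 0 or 1, so gaps of any 
length between a 0 and a 1 stay NA.

```r
# Hypothetical helper (not part of finmetrics): linearly interpolate interior
# NAs with approx(), then keep only results that are exactly 0 or 1.
fill.by.interp <- function(x) {
  ok <- !is.na(x)
  if (sum(ok) < 2) return(x)          # approx() needs two known points
  idx <- seq(along = x)
  # Leading and trailing NAs fall outside the known range and remain NA
  y <- approx(idx[ok], x[ok], xout = idx)$y
  # Fractions can only come from gaps between a 0 and a 1: reset them to NA
  y[!is.na(y) & y != 0 & y != 1] <- NA
  y
}

# One column of the example data from the original question
A <- c(0, NA, NA, 0, 1, NA, 1)
A.filled <- fill.by.interp(A)
```

For a whole data frame the helper can be applied column by column, e.g. 
data.frame(lapply(MyRows, fill.by.interp)).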

-----Original Message-----
From: s-news-owner@lists.biostat.wustl.edu on behalf of El Imam Hanan Attia Rizk
Sent: Tue 10/30/2007 8:36 PM
To: s-news@lists.biostat.wustl.edu
Subject: [S] replacement of character in column by values
 
Dear S-PLUS users,
I am working with a large data set. I have two data frames that I bound by 
rows (rbind), and I got these columns (A, B, C, D):
 
 A   B   C   D
 0   0   0   0
NA  NA  NA  NA
NA  NA  NA  NA
 0   0   0   0
 1   1   0   1
NA  NA  NA  NA
 1   1   0   1

I would like to go through each column and replace NA by 0 if it is between 
two zeros, and by 1 if it is between two ones.
 
I would really appreciate it if any of you could give me some suggestions.

Hanan Elimam
Ph.D student, Pharmaceutical Sciences
Université de Montréal
Faculté de pharmacie
Pavillon Jean-Coutu, bureau 3173
2940 Chemin de la polytechnique
Montréal, H3T 1J4
 
tél.:  514-343-6111, poste 0388
FAX: 514-343-7073


