jmp-l
[Top] [All Lists]

Parsing ugly data from Excel

To: jmp-l@lists.biostat.wustl.edu
Subject: Parsing ugly data from Excel
From: kyle hamar <kyle.hamar@gmail.com>
Date: Fri, 6 Jan 2006 09:54:06 -0800
Domainkey-signature: a=rsa-sha1; q=dns; c=nofws; s=beta; d=gmail.com; h=received:message-id:date:from:to:subject:mime-version:content-type:content-transfer-encoding:content-disposition; b=AMDDQZkUXoxNbWBFW2nWhmDQ2JGAtA40PHM5LmWDjfv8Rr9S6AQhIJ29YO87xeAi9L4hgLkWWwQyLAltOOx6QLEc9ZvbF90hoiMOqLmqacjBiv4pdBHVn542AMXFE1f9q6yGw2bK/QRJ8zPuTZqCDTUQs3Y2IOKIL2m0hFN3hCI=
Hello all. I am new to the list and finding the archives to be useful
while learning JSL.

Perhaps you have found (and hopefully solved) a similar problem more
efficiently:

I have data in excel files that contains entries such as "<1" for
below detection limit results. My goal is to automate the creation of
simple run charts from JMP.

To display the "<1" values as "1" on the charts I am stripping the "<"
sign like this:

file = "k:\testCNH4OH.xls";
table = open(file);
Current Data Table(table);
nr = NRow(table);
nc = NCol(table);
show(nr, nc);
table<<Minimize Window;

//note: I am only concerned about data started from column 15 and higher
for (j=15, j<=nc, j++,
    col = Column(j);
    name = col<<Get Name;
    write("\!r"||name);
    for (i=1, i<=nr, i++,
        Column Name(j)[i]=Munger(Column Name(j)[i], 1, "<", "");
//strip the < here
         write("."); //progress is slow - display something during operation
        ));

Sadly, the Munger will not operate on the data unless it is in character format.
This forces me to change all the columns to character first:

    for (j=15, j<=nc, j++,
        write("\!rcol "||char(j)||" ");
        col = Column(j);         //assign a reference to this col
        name = col<<Get Name;     //get the col name
        write("Named: "||name||" ");
        col("name") << data type (character);
        write("has been set to type character.");
        );

Then change them back to numeric for graphing after the stripping
operation above:
    // change them to numeric format
    for (j=15, j<=nc, j++,
        write("\!rcol "||char(j)||" ");
        col = Column(j);         //assign a reference to this col
        name = col<<Get Name;     //get the col name
        write("Named: "||name||" ");
        col("name") << data type (numeric);
        col("name") << Preselect Role(Y);
        write("has been set to type numeric with role Y.");
        );

This process is not foolproof. I have one excel file that fails for
some unknown reason. I think a non-printing character (perhaps a Kanji
character or some such) stops the process cold. JSL does not appear to
understand the full range of printable character sets.

Is anyone else parsing data to remove unwanted garbage like "<" signs?
What is your solution?

Kyle

<Prev in Thread] Current Thread [Next in Thread>