Re: handling octals generated by importData

To: Jonathan Dakin <jdakin@overwey.demon.co.uk>
Subject: Re: handling octals generated by importData
From: Tony Plate <tplate@blackmesacapital.com>
Date: Mon, 09 Jan 2006 18:08:30 -0700
Cc: Splus Mailing List <s-news@lists.biostat.wustl.edu>
In-reply-to: <000f01c61545$8e675060$1afea8c0@jdakin>
References: <000f01c61545$8e675060$1afea8c0@jdakin>
User-agent: Mozilla Thunderbird 1.0.5 (Windows/20050711)
I suspect one problem you are having is that you might be confusing the contents of the string with the way it is printed out.

E.g., consider this example:

> x <- "\001\240"
> x
[1] "\001\240"
> nchar(x) # number of characters in the string is 2, not 8!
[1] 2
> AsciiToInt(x)
[1]   1 160
>

Note that the backslash in the definition of the string, and in how it is printed, are not actually part of the string -- they are just part of how a string containing these non-printing characters is described.
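
The same distinction can be seen outside S-PLUS. The following is a minimal sketch in Python (an analogue for illustration, not S-PLUS code), relying on the fact that Python also accepts octal escapes such as \001 and \240 in string literals. The variable names are made up for the example.

```python
# Illustrative Python analogue of the S-PLUS session above.
x = "\001\240"                    # two characters, each written as an octal escape

printed = repr(x)                 # how the string is displayed (escapes are shown)
n_chars = len(x)                  # how many characters the string really holds
codes = [ord(ch) for ch in x]     # integer code of each character
has_backslash = "\\" in x         # is a literal backslash character present?

print(printed)
print(n_chars)        # 2
print(codes)          # [1, 160]
print(has_backslash)  # False
```

The printed form contains backslashes, but the string itself does not, so a search for "\" has nothing to find.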

If I understand correctly what you are trying to do, that is why your attempts to find a generic identifier based on "\" are not succeeding: the string contains no backslash character to match.

This doesn't solve your problem, but maybe it gives some better understanding of what's going on. The function AsciiToInt() might help. I thought there was an inverse of AsciiToInt() in S-PLUS, but I can't find it now.
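
One possible approach is to work from the integer character codes rather than from the printed form. The sketch below is in Python, not S-PLUS, and only illustrates the idea; the helper name strip_nonprinting, the sample cells, and the choice of 32 to 126 as the "printable" range are all assumptions made for this example. In S-PLUS the codes would come from AsciiToInt(), and an inverse function would still be needed to rebuild the string.

```python
# Hypothetical helper: keep only characters whose code lies in a printable range.
# The default range 32..126 (plain printable ASCII) is an assumption; it would also
# drop legitimate accented characters, so adjust it to suit the data.
def strip_nonprinting(s, low=32, high=126):
    return "".join(ch for ch in s if low <= ord(ch) <= high)

# Made-up sample cells resembling the imported fields described in the question.
cells = ["\240\240\240\240\240", "\001\240", "\001\24012.5", "plain text"]

cleaned = [strip_nonprinting(c) for c in cells]
print(cleaned)  # ['', '', '12.5', 'plain text']
```

Cells that consisted only of control codes become empty strings, which are then easy to detect and treat as missing.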

-- Tony Plate

Jonathan Dakin wrote:
Could I ask if anyone has come up against handling octal characters generated by importing data from Excel files? Excel sheets seem to be littered with invisible control codes, both inside apparently empty cells and prefixed to data within cells. When such data is imported into S using importData, fields such as "\240\240\240\240\240" or "\001\240" pop up. I attached a sample file with code to a previous posting:

http://www.biostat.wustl.edu/archives/html/s-news/2005-12/msg00061.html

I'm trying to write code to weed out these nonsense characters. However, they don't behave in the normal way. Each is preceded by "\", which would be an obvious marker for all such fields. But substring(x,1,1) returns "\240", which makes a generic identification that would fit all possible octals impossible. (An is.octal function would be nice!) Does anyone know of another approach to this? Many thanks. (I'm using S-PLUS 6.1 under W2K.)

Jonathan Dakin
Portsmouth Hospital UK
