The Hmisc library has a little function cleanup.import that when run on a data
frame
imported from an Excel spreadsheet will clear up this kind of problem plus a few
others, as well as make the data frame stored more efficiently. cleanup.import
will
sense that a character vector has values that are always numeric (or blank) and
will
change it to a numeric vector. -Frank Harrell
Peter Wollan wrote:
> Another problem with importing from Excel: if a column is mostly numeric,
> but has a few character values somewhere in the middle, S-plus imports the
> column as numeric and replaces the characters with NA. Columns which have
> enough characters in the first few rows (100?) are imported as character.
> Columns which are mostly dates are imported as dates, but deviations from
> the date format are killed.
>
> I'd like to propose a solution: allow the option of importing from Excel
> ***entirely as character data***. This would enable the user to see what is
> actually in the file, and to convert to numeric through functions which
> compare with the original and flag deletions.
>
> The Import routine, as it is now, runs into mysterious S-plus conventions,
> which interact with mysterious Excel conventions, to produce serious data
> problems. I'd be pleased to hear better alternatives than importing as
> character data (which can't even be done now).
>
> This may only be a problem to people who use S-plus in an environment where
> Excel is used as a data recording or transfer tool, but surely there are
> more than two of us.
>
> Peter Wollan
> wollan@mayo.edu
>
> -----Original Message-----
>
> >Date: Tue, 28 Sep 1999 10:01:59 +1200
> >From: Michael Camilleri <branzmtc@branz.org.nz>
> >Subject: [S] Excel Import Problem
> >
> >I have been having some problems with importing Excel spreadsheets into
> >S-plus, and hope someone might have some insight (or at least avoid the
> same
> >problems).
> >
> >When importing a ragged array from an Excel spreadsheet (i.e. there is a
> >header, and many empty cells in the first 100 rows) S-Plus does not import
> >all the columns. It will stop importing columns after it encounters a
> column
> >of empty cells, or a block of empty cells in the first 100 rows (approx).
> >S-PLus appears to only look at the first 100 or so rows to determine the
> >number of columns to import.
> >
> >If I specify the number of columns (EndCol parameter) to match the actual
> >data, then it crashes.
> >
> >The only way around it seems to be to put in the Excel missing value codes
> >#N/A, which is a pain to do, and complicates my procedure which converts
> the
> >ragged array to a list.
> >
> >Be warned: this problem can cause data loss by failing to import data. Be
> >safe and put #N/A codes in empty cells in XL.
> >
> >Any ideas?
> >
> >Michael Camilleri
> >Energy and Environment Section
> >BRANZ
> >
> >Private Bag 50908
> >Porirua City
> >New Zealand
> >
> >Ph: +64 6 235 7600
> >Fax: +64 4 235 6070
> >
> >email: branzmtc@branz.org.nz
> >http://www.branz.org.nz
> >
>
> -----------------------------------------------------------------------
> This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
> send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
> message: unsubscribe s-news
--
Frank E Harrell Jr
Professor of Biostatistics and Statistics
Division of Biostatistics and Epidemiology
Department of Health Evaluation Sciences
University of Virginia School of Medicine
http://hesweb1.med.virginia.edu/biostat
-----------------------------------------------------------------------
This message was distributed by s-news@wubios.wustl.edu. To unsubscribe
send e-mail to s-news-request@wubios.wustl.edu with the BODY of the
message: unsubscribe s-news
|