s-news
[Top] [All Lists]

Re: Importing from Excel

To: <P.J.Wells@open.ac.uk>, <s-news@wubios.wustl.edu>
Subject: Re: Importing from Excel
From: "Michael Schmidt" <Michael.Schmidt2@verizon.net>
Date: Wed, 22 Jan 2003 15:15:22 -0500
Importance: Normal
In-reply-to: <F9D44AC0D89CD311B9E3000629385C400B70E3CD@marconi.open.ac.uk>
Perhaps this isn't the most elegant answer available, but the 10 minute
answer is to import the 13 spreadsheets into a single Access database,
ensure that each company has a common identifier (number or string),
create a query based on the 13 tables, and re-export the data for
analysis.  

The fact that your spreadsheets are not identical in shape is irrelevant
using this approach.

I'd be happy to walk you through a query if you are unfamiliar with
Access.  It really isn't difficult, especially for just pulling data
together from separate tables. 

Cheers,
Michael  

> -----Original Message-----
> From: s-news-owner@lists.biostat.wustl.edu [mailto:s-news-
> owner@lists.biostat.wustl.edu] On Behalf Of P.J.Wells@open.ac.uk
> Sent: Wednesday, January 22, 2003 2:47 PM
> To: s-news@wubios.wustl.edu
> Subject: [S] Importing from Excel
> 
> My dataset of accounting data on 110,000 companies has its primary
> existence
> in the form of Excel spreadsheets.
> 
> Each year's data  is stored in a directory containing 13 spreadsheet
files
> (with columns for each of 105 variables) corresponding to the
one-digit
> SIC
> code applicable to the company, with supplementary files where the
number
> of
> firms under a given one-digit code exceeds the maximum number of rows
> allowed in Excel.
> 
> Thus, obtaining an S+ vector for a given variable for a given year
> involves
> extracting the relevant column from each spreadsheet and concatenating
the
> results.
> 
> Since the structure of the spreadsheets is slightly different for
> different
> years it is necessary to vary the column chosen for a given variable
> according to year for which it is sought, and also to switch between
> scripts
> for reading the spreadsheets and concatenating the results.
> 
> A function intended to do this (essentially a wrapper for
import.data() )
> is
> outlined at the end of this message: my problem is that the working
> version's results suffer from the following errors:
> 
> 1) length of resulting vectors does not equal the total number of
firms.
> 
> I suspect this may be related to the fact that NAs (in the form of
empty
> spreadsheet cells) are prevalent in the data; to try to accommodate
this I
> have set explicit StartRow and EndRow arguments to import.data() as
> appropriate, and have also experimented with different inputs to the
> Delimiters argument, but without improvement.
> 
> 2) vectors of 1990 data turn out identical to those for 1992 (i.e.
> all.equal(1990, 1992) returns T) !
> 
> I can't *see* anything wrong with the syntax of the relevant if() ...
else
> ... lines (and, yes, I have checked the spreadsheets concerned to see
that
> the data really is different...) and moreover the very similar section
> selecting the data to be extracted works OK.
> 
> I'm just about at my wits' end on this, and I *really* don't want to
get
> into writing an Excel macro to fiddle the data into a more S+ friendly
> form...
> 
> Although 131 messages turned up in response to a search on "import"
AND
> "Excel" in the S-news archive none of them appear to be relevant to my
> problems.
> 
> Any comments or suggestions gratefully received.
> 
> Julian Wells
> 
> School of Management
> The Open University
> Walton Hall
> Milton Keynes
> MK7 6AA
> United Kingdom
> 
> (01908) 654658
> +44 (1908) 654658
> 
> ===========================
> 
> Outline function:
> 
> make.data <- function(date, data){
> 
>       if (date==90) switch(data,
>               "bank" = {data.col <- 47},
>               ...             # etc.#
> 
>       else if (date==91 | date==93 | date==95) switch(data,
>               "bank" = {data.col <- 95},
>               ...             # etc.#
> 
>       else if (date==92 | date==94 | date==96) switch(data,
>               "bank" = {data.col <- 94},
>               ...             # etc.#
> 
>       # sets StartCol, EndCol = data.col, as needed
> 
> 
>       switch(data,
>               "bank" = {data.mode <- "numeric"},
>               # ...           # etc.#
> 
>       # sets data.mode [="numeric" for most]
> 
> 
>       file.string <- "C:\\..."
> 
>       if (date==90)   {import.data(DataFrame = "temp0",
>               FileName=paste(file.string, "0w.xls", sep=""),
> FileType="Excel",
>               StartCol=data.col, EndCol=data.col,
>               StartRow=3, EndRow=1920)
>       guiClose("data.frame", "temp0")
> 
>       # ...
> 
>       import.data(DataFrame = "temp9",
>               FileName=paste(file.string, "9w.xls", sep=""),
> FileType="Excel",
>               StartCol=data.col, EndCol=data.col,
>               StartRow=3, EndRow=7792)
>       guiClose("data.frame", "temp9")
> 
>       temp.0 <- as.vector(temp0$V1, mode=data.mode)
>       # ...
>       temp.9 <- as.vector(temp9$V1, mode=data.mode)
> 
>       output <- c(temp.0, ... , temp.9)
>       }
> 
>       else if (date==91 | date==92){
>               # as before, for different set of *.xls files
>       }
> 
>       else if (date==93 | date==94){
>               # as before, for a further set of *.xls files
>       }
> 
>       else{
>               # as before, for yet another set of *.xls files
>       }
> }
> 
> --------------------------------------------------------------------
> This message was distributed by s-news@lists.biostat.wustl.edu.  To
> unsubscribe send e-mail to s-news-request@lists.biostat.wustl.edu with
> the BODY of the message:  unsubscribe s-news


<Prev in Thread] Current Thread [Next in Thread>