Hello S-users
I received many ideas on how to handle my fixed-format data with
varying line lengths. Thanks to Ursula Becker, Daniel Erdin, Martin Reist,
James Holtman, Nick Ellis and Jean Adams.
Most of the suggestions involved reading the data into another program
such as Access, SAS or SPSS, putting it into the right format there and
exporting it again. As my real dataset is very large, the Access solution
was not workable.
James Holtman provided a very interesting perl script solution. He wrote:
Here is the perl script to process your data into a 'comma separated'
input. This is much easier to process:
use strict;
$" = ','; # use ',' (comma) as output separator
while (<>) {
    chomp;
    my @a = unpack("a2 a1 a8 a2 a2 a1 a3 a3", $_);
    $#a = 7; # force to 8 fields if some are missing
    print "@a\n";
}
Here is the output of the program for your data:
C:\> perl test.pl tempxx
13, ,19960425,14,01, ,234,abc
11, ,19950425,14,02,,,
15, ,19930425,14,12,,,
You can get perl for Windows free off the net at:
http://www.activestate.com/ActivePerl/default.htm
What I finally did was the following (I found a hint while searching
S-News at http://lib.stat.cmu.edu/cgi-bin/iform?SNEWS ):
make.fields(c(1,3,4,12,14,16,17,20), "test1.txt", "test.new", separator=",",
blanks.out=F)
13, ,19960425,14,01, ,234,abc
11, ,19950425,14,02
15, ,19930425,14,12
import.data("example", "test.new",
FileType="ASCII",
TargetStartCol=1, EndCol="END",
Format="%2s, %1*, %8f, %2f, %2f, %1*, %3f, %3s",
Delimiters=",")
> example
  V1       V2 V3 V4  V5  V6
1 13 19960425 14  1 234 abc
2 11 19950425 14  2  NA
3 15 19930425 14 12  NA
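A note on how the two approaches relate: the running sum of the field
widths in the unpack template (2, 1, 8, 2, 2, 1, 3, 3) gives exactly the
start columns passed to make.fields (1, 3, 4, 12, 14, 16, 17, 20). The
following is only a minimal perl sketch of that bookkeeping, not James'
script. The helper names start_columns and split_fixed are made up for
illustration, and the three sample lines are reconstructed from the
output printed above, so they are an assumption about what the raw file
looks like.

```perl
use strict;
use warnings;

# Field widths taken from the unpack template "a2 a1 a8 a2 a2 a1 a3 a3".
my @widths = (2, 1, 8, 2, 2, 1, 3, 3);

# Hypothetical helper: turn field widths into 1-based start columns,
# i.e. the kind of vector passed to make.fields.
sub start_columns {
    my @w = @_;
    my $col = 1;
    my @starts;
    for my $w (@w) {
        push @starts, $col;
        $col += $w;
    }
    return @starts;
}

# Hypothetical helper: split one fixed-width line into as many fields
# as there are widths, padding short lines with empty strings.
sub split_fixed {
    my ($line, @w) = @_;
    my $template = join ' ', map { "a$_" } @w;
    my @f = unpack($template, $line);
    push @f, '' while @f < @w;
    return @f;
}

# Sample lines reconstructed from the output shown earlier (assumption).
my @sample = (
    "13 199604251401 234abc",
    "11 199504251402",
    "15 199304251412",
);

print join(',', start_columns(@widths)), "\n";
print join(',', split_fixed($_, @widths)), "\n" for @sample;
```

Run as is, it should print the start columns on the first line, followed
by the three comma separated records, with the short lines padded to 8
fields.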
Best regards
Regula
--
Dr. Regula Schwager-Suter
Institute of Animal Science
Swiss Federal Institute of Technology
ETH-Zentrum CLU C5
8092 Zurich
Switzerland
phone +41 (0)1 632 39 56
fax +41 (0)1 632 12 60
e-mail regula.suter@inw.agrl.ethz.ch
http://www.tz.inw.agrl.ethz.ch/~suter