Earlier in the week I posted the following:
I am creating a subset of data from a data matrix as follows:
theSubset=theData[theData$Fieldname=="A string",]
For some reason, a row of "NA" values is inserted into theSubset, screwing up the statistics I want to do on that subset of data. I can't figure out why it is doing this. There are no "NA" values in the field that I am using to subset the data. The data set was imported from an Excel spreadsheet.
Solution:
The following worked:
theSubset=theData[as.character(theData$Fieldname)=="A string",]
After looking at the data more closely, I found that the column Fieldname contained an NA value. If the column was not converted to character first, a row of NAs was inserted into theSubset. Also, the logical vector returned by the comparison consisted of T and F values plus one NA (from the one comparison against the NA value). levels(theData$Fieldname) did not indicate the presence of an NA value; I had to sort the data manually to find it.
I would not expect a row of NA values to be inserted into theSubset in any case, but that seems to be the way factors work.
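For anyone who wants to reproduce the behaviour, here is a minimal sketch, assuming R and a small made-up data frame (the values and the Value column are hypothetical, not my real data). It shows that an NA in the logical row index is what produces the row of NAs, and two ways to keep it out: which() and %in%. In current R, as far as I can tell, as.character() leaves the NA in place, so one of these may be needed in addition to the conversion.

```r
# Hypothetical data standing in for theData; Fieldname has one missing value.
theData <- data.frame(
  Fieldname = factor(c("A string", "Other", NA, "A string")),
  Value = c(1.5, 2.0, 3.5, 4.0)
)

# The comparison is NA wherever Fieldname is NA.
keep <- theData$Fieldname == "A string"

# An NA in a logical row index produces a row filled with NAs.
withNA <- theData[keep, ]

# which() returns only the positions that are TRUE, dropping the NA.
theSubset <- theData[which(keep), ]

# Alternative: %in% never returns NA, so no extra row appears.
theSubset2 <- theData[theData$Fieldname %in% "A string", ]

# levels() does not list NA, but is.na() locates it without sorting.
naRows <- which(is.na(theData$Fieldname))
```

Here withNA has one more row than theSubset, and naRows gives the position of the missing Fieldname entry.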
Thanks to Stephen Smith at the Bedford Institute of Oceanography in Dartmouth, Nova Scotia, for help with this.
Jim Daley
Coldwater Fisheries Unit Leader
NYSDEC Bureau of Fisheries
625 Broadway, 5th Floor
Albany, NY 12233-4753
jgdaley@gw.dec.state.ny.us