Friday, February 24, 2012

Importing CSV files

Hello,

I need to import a bunch of .csv files. The problem I am having is the
"non data" information in the files creating bogus rows and column
definitions. Here is an example of the csv file.

CBOT - End-of-Day Futures Bulk Download 2001.
2 Year U.S. Treasury Notes Futures

DateSymbolMonth CodeYear CodeOpen
20010103ZTH2001102.09375
20010104ZTH2001102.03125
20010105ZTH2001102.28125

In this case, there are bogues rows created with the text at the
beginning of the file, and also the column names get placed into a row
as well. My question is; how do you import the file and strip out the
"non-data" data? so that only the actual data gets inserted into the db?

Thanks,

TGru

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!Write a dotnet or vb exe to loop thru your csv file and place valid
rows in a separate file, then run bcp using this new file.|||Use BCP or BULK INSERT with the FIRSTROW (-F) option.

--
David Portas
SQL Server MVP
--|||tgru (tgru@.devdex.com) writes:
> I need to import a bunch of .csv files. The problem I am having is the
> "non data" information in the files creating bogus rows and column
> definitions. Here is an example of the csv file.
> CBOT - End-of-Day Futures Bulk Download 2001.
> 2 Year U.S. Treasury Notes Futures
> Date Symbol Month Code Year Code Open
> 20010103 ZT H 2001 102.09375
> 20010104 ZT H 2001 102.03125
> 20010105 ZT H 2001 102.28125

That doesn't look like a CSV file to me...

> In this case, there are bogues rows created with the text at the
> beginning of the file, and also the column names get placed into a row
> as well. My question is; how do you import the file and strip out the
> "non-data" data? so that only the actual data gets inserted into the db?

Was it only the column headers, you could use -F 2 with BCP to specify
that the BCP is to start with the second record. (I believe -F is the
option. Double-check with Books Online.) But the introducing text is
more difficult to handle. Assuming that there are no tabs in the text,
BCP may consider the row as an error. Then again, you can control how
many errors BCP may accept, so if you only can make sure that you get
in sync, it may be possible.

However, hharry's suggestion that you write a program that strips the
header info, is probably a safer way.

--
Erland Sommarskog, SQL Server MVP, esquel@.sommarskog.se

Books Online for SQL Server SP3 at
http://www.microsoft.com/sql/techin.../2000/books.asp|||"tgru" <tgru@.devdex.com> wrote in message news:421f97a6$1_2@.127.0.0.1...
> Hello,
> I need to import a bunch of .csv files. The problem I am having is the
> "non data" information in the files creating bogus rows and column
> definitions. Here is an example of the csv file.
> CBOT - End-of-Day Futures Bulk Download 2001.
> 2 Year U.S. Treasury Notes Futures
> Date Symbol Month Code Year Code Open
> 20010103 ZT H 2001 102.09375
> 20010104 ZT H 2001 102.03125
> 20010105 ZT H 2001 102.28125
>
> In this case, there are bogues rows created with the text at the
> beginning of the file, and also the column names get placed into a row
> as well. My question is; how do you import the file and strip out the
> "non-data" data? so that only the actual data gets inserted into the db?
> Thanks,
> TGru
> *** Sent via Developersdex http://www.developersdex.com ***
> Don't just participate in USENET...get rewarded for it!

Take a look at AWK (I use the GNU version GAWK).

This is a old old text stream (file) processing utility that uses regular
expressions to let you filter and/or reformat data.

You could have it create a new import file that ignores blank lines and
headers (ie. starts with "CBOT').

No comments:

Post a Comment