Friday, February 24, 2012

Importing ASCII file using DTS with no row delimiter

Hi,

I am trying to import a file using DTS that does not have any row delimiter. It is a file that was exported from an OLD OLD OLD Macintosh database program (now defunct) called TouchBase. The file uses quotes as text seperators, and tabs for the column delimiters.

Unfortunately, from what I can tell, there are now row delimiters, so the data just keeps going and going, ie like this:

"Bob Smith" "1234 Market Street" "San Diego" "CA" "Josh Smith" "1212 Anywhere Street" "San Diego" "CA" "Jane Smith" "1234 Jane Street" "San Diego" "CA"

Any idea how I can get this to import, or some way to add the delimiter to the file?May be you can use third party tools to the text file to add row delimiter.

And refer to this SQLDTS (http://www.sqldts.com/default.aspx?6,101,257,0,1) link to accomplish the task.|||I can't seem to find any 3rd party tools using google. I've opened the file using a freeware hex editor, and used global search and replace to add one, but then I found out that some records meet the search pattern i chose in the middle of the record, not the end, so that screwed the whole file up..

Very frustrating because I have multiple problems with the file:

#1 - The User used CRLF in a field called "Notes" which is just a free flowing field where the user can type as much or as little as they want in a little diary on the record.

#2 - The User also used tabs in the notes fields, which is throwing off the delimiter for the column.

#3 - The User used double quotes " inside the notes field so that is also throwing it off.

UGH.. Not sure how to fix this mess.|||how big is the file?|||File is about 2.5 megs.

Finally got it.

Used a freeware hex/ascii editor called xvi (search google for hex editor, it's the first one that comes up).

What I did first was stip all CRLF out of the file, since they were only in places where they shouldn't be, not at the end of each row.

Then I went through and did a global replace on all "<TAB>" and changed it to {~} (I first made sure she wasn't using those characters any where in the file!)

Then, I went and stripped all the tabs and all the quotes from the file.

Then i went back and replaced the {~} with "<TAB>" so they would be where they were supposed to.

Finally, I went and I added the CRLF to the end of each record. The way I did this was the last field in each record was a "Date last modified". I asked the client if they needed this, and they said no.

So, I took and used the "wildcard" feature in XVI and did a global replace. Searched for:

"MM/DD/YY"<TAB> and replaced with "MM/DD/YY"<CR><LF>

However, it was kinda tricky, I had to actually do 4 global replaces, because it's an old mac database, and it stored the data in M/D/YY format, MM/D/YY format, M/DD/YY format, and MM/DD/YY format, so there were 4 possible combinations of the way the date was stored for date last modified.

Thank god it worked!

I had 4 errors when importing, and of course SQL server aborts the import if there are any errors. I had to go back in the file and scroll down to the aproximate part where the error was, and look for the problem. Turns out there were 4 records that were missing a field, and that was throwing the whole import off, so I added the field in where apropriate, or just deleted the record.

All in all, this job took me about 15 hours to scrub the file, do all my home work, and get everything together.. Definately a great learning experience for me, and VERY rewarding that I finally got her data to her!!!!|||If you need to do alot of this type of, I would get Codewright. I think Borland now owns the product. It is pricey but well worth it.

No comments:

Post a Comment