I have a flat file data source - call it "order". It's a text file that looks something like this:
ORDERNAME| Example1
CUSTOMER|Acme Industries
COST|11611
ITEMS
B1|550S162-43(33)|35.708|1|636
T1|550S162-43(33)|20.967|1|636
T2|550S162-43(33)|20.967|1|636
W1|350S162-43(33)|1.330|2|501
W21|350S162-43(33)|1.330|1|911
W2|350S162-43(33)|3.044|2|501
W20|350S162-43(33)|3.044|1|911
I would like to write the metadata to an [order header] table and the ITEMS to an [order detail] table. Can someone direct me to an example of something similar?
There are at least two ways to do this: either by using a custom script transform, or by reading the source file twice, once for header rows and once for details. I posted a quick comparison of the options and a sample of the second approach on my blog (pardon the site, it's still under construction: http://agilebi.com/cs/blogs/jwelch/default.aspx). You didn't specify whether the source file contains multiple orders, which does have some impact on the solution.
I'm sure there are other approaches for this as well. I'd be interested to hear about other techniques, as I've run into this problem a few times myself, and am not completely satisfied with the options I've explored.
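For illustration only, the "read the source file twice" idea amounts to two passes over the same text file: the first pass keeps only the rows before the ITEMS marker (the order header), the second keeps only the rows after it (the order details). Here is a minimal C# sketch of that idea outside of SSIS; the file name is a placeholder and the ITEMS marker comes from the sample above.

using System;
using System.Collections.Generic;
using System.IO;

class TwoPassExample
{
    // Pass 1: collect everything before the ITEMS marker (the order header rows).
    static List<string> ReadHeaderRows(string path)
    {
        List<string> rows = new List<string>();
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null && line != "ITEMS")
                rows.Add(line);
        }
        return rows;
    }

    // Pass 2: collect everything after the ITEMS marker (the order detail rows).
    static List<string> ReadDetailRows(string path)
    {
        List<string> rows = new List<string>();
        bool inDetails = false;
        using (StreamReader reader = new StreamReader(path))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (inDetails)
                    rows.Add(line);
                else if (line == "ITEMS")
                    inDetails = true;
            }
        }
        return rows;
    }

    static void Main()
    {
        string path = "order.txt"; // hypothetical file name

        foreach (string header in ReadHeaderRows(path))
            Console.WriteLine("HEADER: " + header);

        foreach (string detail in ReadDetailRows(path))
            Console.WriteLine("DETAIL: " + detail);
    }
}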
jwelch wrote: There are at least two ways to do this, either by using a custom script transform [...]

You can use a conditional split instead of a custom script.

I ran into a problem with the conditional split when my header and detail rows had a different number of columns. How'd you work around that with the conditional split?
jwelch wrote: I ran into a problem with the conditional split when my header and detail rows had a different number of columns. How'd you work around that with the conditional split?

Normally, I read in the full record as one big text string and work with it that way.
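For illustration, the "one big text string" idea comes down to reading each record as a single pipe-delimited string and branching on its field count, which is the same test a conditional split expression would make. A minimal C# sketch, with the field counts taken from the sample file above (the class and method names are just placeholders):

using System;

static class RowClassifier
{
    public enum RowType { Header, ItemsMarker, Detail }

    public static RowType Classify(string line)
    {
        if (line == "ITEMS")
            return RowType.ItemsMarker;

        // Header rows in the sample have 2 fields (e.g. "CUSTOMER|Acme Industries"),
        // detail rows have 5 (e.g. "B1|550S162-43(33)|35.708|1|636").
        int fieldCount = line.Split('|').Length;
        return fieldCount <= 2 ? RowType.Header : RowType.Detail;
    }

    static void Main()
    {
        Console.WriteLine(Classify("CUSTOMER|Acme Industries"));       // Header
        Console.WriteLine(Classify("B1|550S162-43(33)|35.708|1|636")); // Detail
    }
}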
Yeah, I touched on that in the blog post. Guess I just don't like string manipulation.

Since my original post -- I've tried writing a class that reads the file into a StreamReader object and then moves the data into a DataSet. Although not complete, this approach may turn out to be the simplest. If successful, I'll post the code once it's complete.
JohnBGood wrote: Since my original post -- I've tried writing a class that reads the file into a StreamReader object and then moves the data into a DataSet. Although not complete, this approach may turn out to be the simplest. If successful, I'll post the code once it's complete.
The only problem with this approach is this: building a class (i.e., programming) isn't really simple for those who don't write code. It might be simple for you, but perhaps not for others -- just something to consider.
I do understand your points. Writing code from scratch to do ETL instead of using an ETL tool such as SSIS does seem counterintuitive.
With that said, I did finally manage to get my C# ETL utility working correctly. To me it's an easier approach simply because I'm more comfortable with C# file I/O than with SSIS. My initial thought was to use SSIS, but after reading up a little on working with nested data, it seemed to me that an SSIS approach would require more time (at least for me) to implement.
As far as the raw data file goes, it comes from a third-party system, so we don't have a lot of control over what they provide.
Thanks for taking the time to answer my questions. Very much appreciated. I'll post the code for my C# routine tomorrow AM.
Here's the code in C#...
using System;
using System.Data;
using System.IO;

public class BOM
{
    // Reads the order file into a DataSet: header rows go into one table,
    // detail (ITEMS) rows into another.
    public DataSet GetOrderInfo(string logFileName)
    {
        DataTable tblOrderInfo = CreateOrderInfo();      // procedure that creates the order header DataTable
        DataTable tblConnectors = CreateConnectorInfo(); // created here but not yet populated in this version
        DataTable tblMembers = CreateMemberInfo();       // procedure that creates the order detail DataTable
        DataSet bom = new DataSet();

        if (File.Exists(logFileName))
        {
            FileStream fs = new FileStream(logFileName, FileMode.Open, FileAccess.Read, FileShare.None);
            StreamReader sr = new StreamReader(fs);
            string Line;

            // Header section: everything up to the ITEMS marker, one row per line.
            while ((Line = sr.ReadLine()) != null && Line != "ITEMS")
            {
                string[] columns = Line.Split(new char[] { '|' });
                DataRow newrow = tblOrderInfo.NewRow();
                int j = 0;
                foreach (string Data in columns)
                {
                    newrow[j] = Data;
                    j++;
                }
                tblOrderInfo.Rows.Add(newrow);
            }

            // Detail section: everything after the ITEMS marker, one row per line.
            while ((Line = sr.ReadLine()) != null)
            {
                string[] columns = Line.Split(new char[] { '|' });
                DataRow newrow = tblMembers.NewRow();
                int j = 0;
                foreach (string Data in columns)
                {
                    newrow[j] = Data;
                    j++;
                }
                tblMembers.Rows.Add(newrow);
            }

            bom.Tables.Add(tblOrderInfo);
            bom.Tables.Add(tblMembers); // the detail table; the original referenced tblItems, which doesn't exist
            sr.Close();
            fs.Close();
            return bom;
        }
        else
        {
            return null;
        }
    }
}
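If it helps, here is one way the DataSet returned by GetOrderInfo might then be pushed into the [order header] and [order detail] tables using SqlBulkCopy. This is only a sketch: the connection string is a placeholder, the table order (header added first, detail second) comes from the code above, and it assumes the columns created in CreateOrderInfo/CreateMemberInfo line up with the destination columns.

using System.Data;
using System.Data.SqlClient;

public static class OrderLoader
{
    // Illustrative only: bulk-copies the two DataTables produced by BOM.GetOrderInfo
    // into the destination tables.
    public static void Load(DataSet bom, string connectionString)
    {
        if (bom == null) return; // GetOrderInfo returns null when the file is missing

        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            conn.Open();

            using (SqlBulkCopy copy = new SqlBulkCopy(conn))
            {
                copy.DestinationTableName = "[order header]";
                copy.WriteToServer(bom.Tables[0]); // header table was added first in GetOrderInfo
            }

            using (SqlBulkCopy copy = new SqlBulkCopy(conn))
            {
                copy.DestinationTableName = "[order detail]";
                copy.WriteToServer(bom.Tables[1]); // detail table was added second
            }
        }
    }
}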