Tuesday, August 29, 2006

Flat File Schema and Early Termination

Anyone who has worked with flat file schemas must have used, or at least heard of, the property called "Early Termination". This property is exposed with BizTalk SP2; otherwise it can only be edited using a text editor. There is some confusion about what happens when "Early Termination" is set to True.

Let me illustrate it with an example:

Suppose the FF schema has 5 fields (A, B, C, D and E), each 10 characters long. The file looks something like:

AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEEEEE.

Now suppose we set "Early Termination" to True. Of the following scenarios, the ones marked "Parsed" are parsed properly by the BizTalk schema and the rest fail.

1) AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDEEEEEEE (Parsed)
2) AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDDE (Parsed)
3) AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDDDDD (Not Parsed)
4) AAAAAAAAAABBBBBBBBBBC CCCCCCCDDDDDDDDDDE (Parsed)

So this property only allows flexibility for the last field in the schema.
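The rule can be written down as a small Python sketch. This is only a toy model of the behaviour described above (every field except the last must be complete, and the last must contain at least one character); it is not BizTalk's parser, and the names FIELD_WIDTHS and parses are made up for illustration.

```python
# Toy model of the rule described in the post, not BizTalk's actual parser.
# FIELD_WIDTHS and parses() are hypothetical names used for illustration.
FIELD_WIDTHS = [10, 10, 10, 10, 10]  # fields A..E, 10 characters each


def parses(record, widths=FIELD_WIDTHS, early_termination=True):
    """Return True if a positional record would be accepted under this model."""
    full_length = sum(widths)            # 50 in the example
    if not early_termination:
        # Without early termination every field must be complete.
        return len(record) == full_length
    # With early termination only the last field may be short,
    # but it must still contain at least one character.
    before_last = sum(widths[:-1])       # 40 in the example
    return before_last < len(record) <= full_length


scenario_1 = "A" * 10 + "B" * 10 + "C" * 10 + "D" * 10 + "E" * 7
scenario_2 = "A" * 10 + "B" * 10 + "C" * 10 + "D" * 10 + "E"
scenario_3 = "A" * 10 + "B" * 10 + "C" * 10 + "D" * 10

for name, record in [("1", scenario_1), ("2", scenario_2), ("3", scenario_3)]:
    print(name, "Parsed" if parses(record) else "Not Parsed")
```

Under this model the flexibility is limited to the length of the final field, which is the point made above.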

I faced a situation in a project where the source system sent a file that looked something like:

AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDD

It sent fewer than 5 fields (say 4), and the last field in the file (not in the schema) was partially or completely filled. In either scenario the file is rejected. The problem was that the source system was very inflexible and would not change the file to suit BizTalk's parsing needs.

I thought of several options. I knew that we had to pad the file, but the question was how to do it. A BizTalk schema does not pad an incoming file; it only pads an outgoing file. Since we were receiving the file, we could not use padding.

The other option was to create a pipeline component that would read the stream and pad it before it hits the BizTalk FF schema, or to create some kind of pre-processor (a Windows service) that would do the padding while reading the stream.

It seemed that stream manipulation would be the only way around this issue, but I wanted to avoid it, as I am not a BIG fan of stream manipulation because of the inherent dangers associated with it.

So I thought hard, and it clicked that I could use the padding property of the BizTalk schema to achieve what I needed. All I needed to do was turn the padding property "inside out". In other words, I did the following:

1) Created a simple orchestration with one receive shape and one send shape.
2) Defined a schema with one element only. This way the 5 fields are taken together as a single field, and the padding property pads from the last position of the incoming data up to the maximum positional length of the complete record in the schema (50 in our case).
3) In the schema, set the padding to left-justified and used the hexadecimal value of "Tab" (0x09) as the pad character.
4) Defined a single message with the above schema.
5) Created a send pipeline that refers to this single schema with padding.
6) Made the receive and send ports refer to the same message.

Complete the orchestration and then drop the file. For a file like:

AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDD

The output will be padded to the complete record length of 50.
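To make the effect concrete, here is a minimal Python sketch of what the padded output looks like. It only imitates the result of the send pipeline; it is not how BizTalk implements padding, and pad_record, RECORD_LENGTH and PAD_CHAR are hypothetical names chosen for this illustration.

```python
# Illustration of the padded output only; BizTalk does this inside the
# send pipeline. pad_record, RECORD_LENGTH and PAD_CHAR are hypothetical.
RECORD_LENGTH = 50   # five fields of 10 characters each
PAD_CHAR = "\t"      # tab (0x09), the pad character chosen in step 3


def pad_record(record, length=RECORD_LENGTH, pad_char=PAD_CHAR):
    """Left-justify the record and fill the remainder with the pad character."""
    body = record.rstrip("\r\n")         # ignore any trailing line ending
    if len(body) > length:
        raise ValueError("record is longer than the schema allows")
    return body.ljust(length, pad_char)  # data on the left, padding on the right


short = "AAAAAAAAAABBBBBBBBBBCCCCCCCCCCDDDDDDD"
padded = pad_record(short)
print(len(short), len(padded))           # lengths before and after padding
print(repr(padded))                      # repr makes the trailing tabs visible
```

Because the data stays on the left, the first four fields keep their positions and only the tail of the record is filled in.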

Then run this file through the original FF schema with 5 fields. It will be parsed properly and will give you the right XML.