How would you fetch just first and the last record from a sequential file without using operating system commands and where design should be platform independent. So following is one of the approach to address this problem:
Following points explain the above figure. However, if you want to know more data analysis, data science then you can follow
1. Read sequential file in parallel.
2. Enable "ROW NUMBER COLUMN" property of sequential file stage. This will give us the unique number of type BigInt into a separate column. Now name this column as ROWNUM. Remember user has to define this column into the output schema as this can be omitted in RCP is enabled into the design.
3. Now output of our sequential file stage would go into a transformer. Transformer stage will have two output links where a constraint will be applied on one input to only allow record where ROWNUM = 1. This will give us the first record of the file.
4. Now the second output of the transformer will be the input for a copy stage where an in-line sort will be performed on ROWNUM column in descending order. You can change the copy stage to run into sequential mode otherwise it would also work into parallel. You can drop the ROWNUM column here.
5. The sorted output of the copy stage would now be the input of another transformer stage where a stage variable StageVar will be defined as Integer (with 0 as initial value) and would be assigned by StageVar + 1 value. This will assign unique number to the records being passed from the transformer. A constraint will be defined in this transformer stage to filter only those records where StageVar = 1. This will give us the last record from the file.
6. Both the output from step 3 and 6 will be the input for funnel stage which will ultimately populating the any stage to store the first and last record from a sequential file.
This is one of the approach to handle this requirement. There can be lost of other way to address this situation. Even the approach given here can be tuned a bit more but this one would also do the trick.
If you have a more sophisticated and optimized way to handle it then please share it with the fellow readers.
https://www.beingmoneysmart.com
No comments:
Post a Comment