7.6.11

Tips - How to prepare for DataStage interview Part 2

The tips explained in Part 1 should definitely help anyone preparing for an interview. Here are a few more tips for friends preparing for DataStage interviews:


1. Know the Designer and Director clients - Some interview questions may have nothing to do with job design or implementation. They may be about other features of the tool that you use for development or maintenance. For example, the Designer client can identify the dependencies between jobs; the Find and Advanced Find features are used for this purpose. Another example: you can generate a report on a job, which captures the design, its description and other properties in XML format.

The Director client has a few features worth knowing as well. You can see the % CPU utilization of a job, and Director can also help free up resources. Check whether you can delete a job from the Director client (you can) and which rights you need to do so. Knowing these things will help you understand the tool.

2. How a job runs - Now things are getting serious. This is an area where you have to be very careful, so go ahead only if you really know it. The question may be something like: what actually happens when you run a DataStage job? The answer is simple. When the job is compiled, DataStage validates the design, including the stages and their properties, the links, and any logic written inside a Transformer, and generates the OSH script for the job. If you have used a Transformer, C++ code is generated and compiled for it as well. At run time the config file is applied to the compiled job to generate the score. Once the score exists, the job is ready to be executed and the processes can be created. Process creation is done with a fork. The number of processes depends on the number of stages and on the number of nodes defined in the config file. Remember that allocating the processes to one processor or to several is actually the operating system's job. When the processes execute, the different operators work according to their properties.

You should also know that the fork starts from the conductor process on the conductor node. Next come the section leaders, one per node, which manage the individual stage-level processes known as players. A section leader handles the communication between the conductor and its players. The interviewer can now ask how many processes a job will create. You should analyse the job along the following lines:
a) How many stages the job has
b) How many nodes the config file contains
c) The type of parallelism used in the stages, for example whether any stage runs in sequential mode
Work out the number of processes based on the above points.
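The calculation above can be sketched in a few lines of Python. This is only a rough estimate under stated assumptions, not how the engine itself counts: one conductor, one section leader per node, one player per parallel stage per node, and one player per sequential stage. It ignores operator combination and any operators DataStage inserts on its own (such as sorts or buffers), so the real count can be lower or higher. The function name and parameters are made up for illustration.

```python
def estimate_processes(num_nodes, parallel_stages, sequential_stages=0):
    """Rough estimate of the OS processes created by one parallel job run.

    Assumptions (simplified): one operator per stage, no operator
    combination, and no automatically inserted operators.
    """
    conductor = 1                    # single conductor process
    section_leaders = num_nodes      # one section leader per node
    # A parallel stage runs one player on every node;
    # a sequential stage runs a single player.
    players = parallel_stages * num_nodes + sequential_stages
    return conductor + section_leaders + players


# Example: 2-node config file, 3 parallel stages, 1 sequential stage
print(estimate_processes(num_nodes=2, parallel_stages=3, sequential_stages=1))
```

With these inputs the sketch counts 1 conductor + 2 section leaders + 7 players. Compare such an estimate with the score of a real job to see where the engine combined or inserted operators.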

3. Know the operators - It is the operators that actually do the trick. There may be a single operator behind a stage, so learn which operator is associated with which stage.

4. Understand the job score - Try reading the score of a job. Start with a simple job with only a few stages. See how DataStage builds the execution plan behind the scenes to run your design. Try to spot any operator that, in your view, shouldn't be there but is. For example, DataStage may insert a sort automatically between stages in order to produce correct results. See whether you can spot any such occurrence. Also look at how the links have been named and how many virtual datasets you can find. DataStage creates virtual datasets for the links used in the job. The interviewer might ask you to analyse a sample score and answer a few questions about it.

But how do you see the score of a job? This could be the trillion-dollar question for some. Don't worry: there is an environment variable in DataStage called APT_DUMP_SCORE. Set it to TRUE and you can see the score in the job log.

5. Clear your thoughts on partitioning, collection and sorting - These are three completely different concepts. Try the different methods of partitioning and collection. You should be so familiar with these techniques that you can identify them just by looking at the symbol on the link. The interviewer could give you a sample job design and ask what a "fan out" or "bow tie" on a link means.
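To make the difference between the three concepts concrete, here is a minimal Python sketch. It is not DataStage code and does not reproduce the engine's actual algorithms. It only mimics the ideas: a hash partitioner that sends rows with the same key to the same partition, a sort that runs inside each partition, and a sort-merge collector that brings the partitions back into one ordered stream. All function names and the sample rows are invented for illustration.

```python
import heapq
import zlib


def hash_partition(rows, key, num_partitions):
    """Partitioning: rows with the same key value go to the same partition."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 gives a stable hash, unlike Python's built-in hash() on strings
        bucket = zlib.crc32(str(row[key]).encode()) % num_partitions
        parts[bucket].append(row)
    return parts


def sort_partitions(parts, key):
    """Sorting: each partition is sorted on its own, independently."""
    return [sorted(part, key=lambda r: r[key]) for part in parts]


def sort_merge_collect(parts, key):
    """Collection: merge already-sorted partitions into one sorted stream."""
    return list(heapq.merge(*parts, key=lambda r: r[key]))


rows = [{"id": i, "cust": c} for i, c in enumerate("BACABC")]
parts = hash_partition(rows, "cust", 2)
sorted_parts = sort_partitions(parts, "cust")
collected = sort_merge_collect(sorted_parts, "cust")
print([r["cust"] for r in collected])
```

Note that sorting each partition does not give a globally sorted result by itself. The final order comes from the collector, which is why the three concepts have to be kept apart.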
