DataStage Interview Questions - Part 1

Here are a few questions to help you prepare for a DataStage interview:

1) Explain the architecture of IBM Information Server.

2) What is the difference between SMP and MPP?

3) How to check the current version of the DataStage application?

4) In what order does DataStage execute stage variables?

5) How is a constraint different from a stage variable?

6) How do you calculate a date difference in the Transformer stage?

7) What is a sparse lookup, and how is it different from a normal lookup?

8) How does a sparse lookup affect performance?

9) What would be your first three steps when asked to optimize an existing job?

10) Consider a scenario where you have two different projects set up on a single server. The projects are similar but differ in data. Can a dataset created in one project be used in the other? If not, why?

11) Up to what record size can DataStage handle? What would happen if an Oracle table's record size is greater than the record size DataStage supports? How would you handle this?

12) How would you delete datasets in bulk?

13) What do you understand by an operator, and how is it associated with a stage?

14) When should we use the APT_ORACLE_LOAD_OPTIONS environment variable?

15) How is the load method different from upsert?

16) Where are datasets stored in a DataStage setup? What is the significance of the scratch area?

17) What is a fork join?

18) How would you clear the locks from a job?

19) Can a DataStage job be deleted from the Director client?

20) How would you turn a warning into an informational message in the Director client?

21) Can we connect to Lotus Notes from DataStage? Does IBM support this interface?

22) What can be set from the Tunables page of the Administrator client?

23) What is runtime column propagation?

24) Which is faster, a Transformer or a Filter?

25) How does the score change when you have a Transformer in the job?

26) Why do you need a C++ compiler to be installed on the server?

27) Can you run multiple copies of the same job at the same time? How would you differentiate between the copies?

28) How would you catch Oracle exceptions when using the Oracle Enterprise stage?

29) How would you kill a particular DataStage session?

30) For what kind of requirement would you use Analyzer?

31) How would you perform impact analysis in DataStage?

32) What is the drawback of leaving a Peek stage permanently in the job design?

33) What is the difference between BuildOps and Wrapper stages?

34) Can a column generator stage be the first stage in any job design?

35) List the services that need to be started for successful use of DataStage.

36) How can you stop and start DataStage services in a Windows environment?

37) How can we export only the executable of a job?

38) Can duplicate records be removed just by using the Sort stage?

39) We have a job that processes 50 million records and takes 4 minutes to run. If we add a Copy stage in between, would it affect the job's performance?

40) Can one DataStage project have multiple configuration files?

41) How would you release resource locks?

42) How would you turn a warning into an info?

43) What is a checkpoint with respect to sequences? How would you set this property for all sequences in a project in one shot?

44) Can the RCP property be set for all jobs?

45) Do shared containers support RCP?

More questions on the way...


  1. Answer for Q3:

    Q - How to check the current version of the DataStage application?

    A - The version can be checked in the version.xml file, which is available at the \IBM\ location. This file also lists the patches applied to the application.
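
    If you want to read that file from a script rather than open it by hand, something like the following might work. It is only a rough sketch: the layout of version.xml is not described here, so the code makes no assumption about element names and simply lists every element with its attributes. The function name summarize_xml is made up for illustration, and the file path is whatever applies on your installation.

```python
import xml.etree.ElementTree as ET


def summarize_xml(path):
    """Return a list of (tag, attributes) pairs for every element in the file.

    Generic on purpose: no element or attribute names are assumed,
    because the structure of version.xml is not documented here.
    """
    tree = ET.parse(path)
    rows = []
    for elem in tree.iter():
        # Drop an XML namespace prefix such as "{uri}Tag" if one is present.
        tag = elem.tag.split("}")[-1]
        rows.append((tag, dict(elem.attrib)))
    return rows
```

    Calling summarize_xml with the path to your copy of the file and printing each pair should be enough to spot the entries that describe the installed version and patches.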

  2. Answer for Q23, "What is runtime column propagation?":

    RCP is a property that allows columns to be propagated at runtime. This means that if we do not load the column information, the stage will derive the column definitions from the source, i.e. the database or schema files.

  3. If a job has more columns than are defined in the metadata and runtime column propagation (RCP) is enabled, those extra columns are propagated to the rest of the job.
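
    The behaviour described in the two RCP answers above can be pictured with a small Python analogy. This is not DataStage code, only a sketch of the idea: run_stage, defined_columns and rcp_enabled are hypothetical names, and rows are modelled as plain dictionaries.

```python
def run_stage(rows, defined_columns, rcp_enabled):
    """Pass rows through a stage that only has `defined_columns` in its metadata.

    With rcp_enabled=True, columns not listed in the metadata are carried
    through to the output; with rcp_enabled=False they are dropped.
    """
    output = []
    for row in rows:
        if rcp_enabled:
            # Propagate every incoming column, defined or not.
            output.append(dict(row))
        else:
            # Keep only the columns the stage metadata knows about.
            output.append({c: row[c] for c in defined_columns if c in row})
    return output


source_rows = [{"id": 1, "name": "a", "extra": "x"}]
with_rcp = run_stage(source_rows, ["id", "name"], rcp_enabled=True)
without_rcp = run_stage(source_rows, ["id", "name"], rcp_enabled=False)
```

    Here with_rcp still carries the "extra" column, while without_rcp contains only "id" and "name".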

  4. srinivas reddy,
    Q1) Architecture of IBM Information Server.
    It has two components:
    1) Server components
    DataStage server
    DataStage repository
    Package installer

    2) Client components
    DataStage Administrator
    DataStage Manager
    DataStage Director
    DataStage Designer