29.1.11

DataStage Interview Questions - Part 2

1. What is "Parallel Processing"?
In simple words, parallel processing means executing a job on multiple processors at the same time, so that the ETL work is performed simultaneously rather than one step after another.

2. What are the types of parallelism?
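There are two types of parallelism: pipeline parallelism and partition parallelism. In pipeline parallelism, a downstream stage starts processing rows as soon as the upstream stage produces them, instead of waiting for the upstream stage to finish. In partition parallelism, the data is split into partitions and each partition is processed by a separate instance of the same stage.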
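A rough way to picture the two types is the Python sketch below. It is not how DataStage implements them: it only models the data flow in a single process, with generators standing in for pipelined stages and a hash on a key column standing in for a hash partitioner. The stage functions, the "id" key and the doubling rule are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]

# --- Pipeline parallelism (modelled with generators) ---
# Each stage pulls rows from the previous one as they are produced, so the
# downstream stage starts working before the upstream stage has finished.
def extract(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        yield dict(row)

def transform(rows: Iterable[Row]) -> Iterator[Row]:
    for row in rows:
        # Hypothetical business rule: double the amount.
        yield {**row, "amount": row["amount"] * 2}

def load(rows: Iterable[Row]) -> List[Row]:
    return list(rows)

# --- Partition parallelism ---
# Split the rows into n partitions by hashing a key column; every partition
# is then processed by its own copy of the same stage logic.
def hash_partition(rows: Iterable[Row], key: str, n: int) -> Dict[int, List[Row]]:
    parts: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return dict(parts)

def run_partitioned(
    rows: Iterable[Row],
    key: str,
    n: int,
    stage: Callable[[Iterable[Row]], List[Row]],
) -> List[Row]:
    parts = hash_partition(rows, key, n)
    results: List[Row] = []
    # In DataStage each partition would run on its own node at the same time;
    # this sketch processes them one after another in a single process.
    for p in sorted(parts):
        results.extend(stage(parts[p]))
    return results

if __name__ == "__main__":
    rows = [{"id": i, "amount": i} for i in range(6)]
    print(load(transform(extract(rows))))
    print(run_partitioned(rows, "id", 2, lambda part: load(transform(extract(part)))))
```

In a real job the two types are combined: every partition flows through the pipelined stages, so both kinds of parallelism apply at once.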

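3. How is the degree of parallelism determined?
The degree of parallelism is determined by the configuration file (the file named by the $APT_CONFIG_FILE environment variable), essentially by the number of processing nodes it defines.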
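As an illustration, the Python sketch below counts the node entries in a configuration file, which roughly corresponds to the number of parallel instances a stage gets by default. It assumes the usual node "name" { ... } layout; the host name and paths in SAMPLE_CONFIG are made up, and the regular expression is not a full parser for the format.

```python
import re
from typing import List

# Matches lines of the form:  node "node1"
# (assumed layout; node pools and other constraints are ignored here).
NODE_PATTERN = re.compile(r'^[ \t]*node[ \t]+"([^"]+)"', re.MULTILINE)

# Hypothetical two-node configuration on a single server.
SAMPLE_CONFIG = """
{
    node "node1"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/ds/node1" {pools ""}
        resource scratchdisk "/scratch/node1" {pools ""}
    }
    node "node2"
    {
        fastname "etlserver"
        pools ""
        resource disk "/data/ds/node2" {pools ""}
        resource scratchdisk "/scratch/node2" {pools ""}
    }
}
"""

def node_names(config_text: str) -> List[str]:
    """Return the names of the processing nodes declared in the file."""
    return NODE_PATTERN.findall(config_text)

def degree_of_parallelism(config_text: str) -> int:
    """Approximate degree of parallelism: one instance per declared node."""
    return len(node_names(config_text))

if __name__ == "__main__":
    print(node_names(SAMPLE_CONFIG), degree_of_parallelism(SAMPLE_CONFIG))
```

Pointing a job at a file with four nodes instead of two would, under the same assumption, double the number of partitions without changing the job design.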

4. Can different configuration files be used in different jobs?
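Yes. Each job can use its own configuration file, for example by adding $APT_CONFIG_FILE as a job parameter.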

5. What is the benefit of using different configuration files in different jobs?
To run different jobs on different hardware resources, for example giving a large job more nodes than a small one.

6. Can we force objects to run on a particular CPU?
No. The request goes to the operating system, and the operating system chooses the CPU.

7. If a job contains a Transformer stage, will it take longer to compile?
Yes. A parallel Transformer stage is compiled into C++ code, which adds to the job's compile time.

8. What is the use of a virtual dataset?
A virtual dataset connects the output of one operator to the input of another. It exists only while the job runs and is not written to disk.

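9. What is the use of the $APT_DUMP_SCORE environment variable?
Set this variable to write the job's score to the job log. The score shows how the job was laid out to run: its operators, the datasets that connect them, and the nodes and processes they use.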

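10. What is the execution order of DataStage stage variables?
Top to bottom. Stage variables are evaluated in the order they are listed, once for each input row, before the output column derivations.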
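The top-to-bottom order matters when one stage variable reads another. A common use is detecting a key change: the variable that compares the current key with the previous key must come above the variable that stores the current key. The Python sketch below mimics that with hypothetical stage variables svIsNewKey and svPrevKey, assuming the input is sorted on the key and that svPrevKey starts out empty (None here).

```python
from typing import Iterable, List, Optional, Tuple

def flag_new_keys(keys: Iterable[str]) -> List[Tuple[str, bool]]:
    # Stage variables keep their values from one row to the next.
    sv_prev_key: Optional[str] = None  # hypothetical svPrevKey
    out = []
    for key in keys:
        # Listed first, so evaluated first: still sees the PREVIOUS row's key.
        sv_is_new_key = key != sv_prev_key  # hypothetical svIsNewKey
        # Listed second: now remember the current key for the next row.
        sv_prev_key = key
        out.append((key, sv_is_new_key))
    return out

def flag_new_keys_reversed(keys: Iterable[str]) -> List[Tuple[str, bool]]:
    # Same variables in the opposite order: svPrevKey is overwritten before
    # svIsNewKey reads it, so the comparison is always False.
    sv_prev_key: Optional[str] = None
    out = []
    for key in keys:
        sv_prev_key = key
        sv_is_new_key = key != sv_prev_key
        out.append((key, sv_is_new_key))
    return out

if __name__ == "__main__":
    print(flag_new_keys(["A", "A", "B", "B", "B", "C"]))
    print(flag_new_keys_reversed(["A", "A", "B"]))
```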

2 comments:

  1. Hi all,
    I want to pass the current date into a job-level parameter, like
    /datastage/seqfiles/currentdate()_reject.txt
    so that my output is stored in this format: 20110515_reject.txt,
    and the next day 20110516_reject.txt, and so on.
    Please help.

  2. Hi, you can do this with a User Variables Activity.
    There, give the file name, then concatenate (with :) the date extracted with Field(), then concatenate "_reject.txt".

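For comparison, if the date is computed outside DataStage, for example in a wrapper script that passes the full path to the job as a parameter value, the file name can be built as in this Python sketch. The function name reject_file_path is hypothetical, and this is not the User Variables Activity approach described in the reply.

```python
from datetime import date
from typing import Optional

def reject_file_path(base_dir: str, run_date: Optional[date] = None) -> str:
    """Build <base_dir>/YYYYMMDD_reject.txt, defaulting to today's date."""
    run_date = run_date or date.today()
    # %Y%m%d gives e.g. 20110515; rstrip avoids a double slash.
    return f"{base_dir.rstrip('/')}/{run_date:%Y%m%d}_reject.txt"

if __name__ == "__main__":
    print(reject_file_path("/datastage/seqfiles", date(2011, 5, 15)))
```

Calling it with no date uses the current day, so a daily run produces a new file name each day.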