Talend tRunJob Component
A job in Talend is what we create to perform a particular operation. But what if we have several individual jobs and would like to call them from within one another? Imagine that you are the director of your organisation and you need to pull daily sales data from various shops located across the city and have it all inserted into a single Excel file. You would create individual Talend jobs that each connect to one shop’s server, pull data from it and append it to an Excel sheet. You would have to run them one after the other, since you are inserting all the data into a single file. Parallel execution is not an option: if one job tries to access an Excel file that another job already has open, it fails.
This concept is foolproof! Except there are a number of snags in its practical implementation :-
- You would need to run each job manually, one after the other, to load an entire day’s data. Now assuming you have just 5 shops around London, then it’s not too cumbersome a task. However, if you have 15 – 20 shops, then it’d be slightly more of a problem. And what if you have shops all over England?
- Since you are running every job manually, there is always a good chance of running one job more than once and/or skipping a job. Either mistake can lead to major inconsistencies in your data.
- Since we would have to run each job manually, one after the other, a significant portion of our day would be spent fiddling with, or at least thinking about, the jobs. This can be very counterproductive.
An elegant solution would be to automate the entire process, not only so that it is more accurate, but also so that it saves you precious time and energy. But since you already have individual jobs, it doesn’t make sense to discard them completely and build a new system from scratch.
It is in precisely this situation that you will think about using the tRunJob component.
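To make the idea concrete, here is a minimal Java sketch of what "one parent running the shop jobs in order" amounts to. It is not Talend code: the class `JobSequence`, the method `runAll` and the job names are hypothetical, and each job is reduced to something that returns an exit code (0 for success, by assumption).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntSupplier;

public class JobSequence {

    // Runs each job exactly once, in insertion order, and stops at the
    // first non-zero exit code. Returns the names of the jobs that succeeded.
    static List<String> runAll(Map<String, IntSupplier> jobs) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, IntSupplier> job : jobs.entrySet()) {
            int exitCode = job.getValue().getAsInt();
            if (exitCode != 0) {
                System.out.println(job.getKey() + " failed with exit code " + exitCode);
                break;
            }
            completed.add(job.getKey());
        }
        return completed;
    }

    public static void main(String[] args) {
        // Hypothetical shop jobs; a LinkedHashMap keeps them in the order added.
        Map<String, IntSupplier> jobs = new LinkedHashMap<>();
        jobs.put("LoadShopA", () -> 0);
        jobs.put("LoadShopB", () -> 0);
        jobs.put("LoadShopC", () -> 0);
        System.out.println(runAll(jobs)); // prints [LoadShopA, LoadShopB, LoadShopC]
    }
}
```

Because the order is fixed in one place and every job appears once, the two manual mistakes described above (running a job twice, skipping a job) cannot happen.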
The tRunJob component can be used to execute one job from within another job. Let us try to understand this better with the help of an example. Suppose we have a simple delimited (CSV) file with data as follows :-
Below is a simple job that reads data from it and displays it on the console :-
The job above filters the data from the file to display the first 5 rows on the console. Now let us try to run this job from another job. Create a new job in your Talend Open Studio and drag and drop the tRunJob component onto it. It should look like this in your window :-
Notice the red exclamation mark on top of the component; it indicates that something is wrong with it. When you hover over the component, it should show a message like this :-
As the message clearly indicates, you need to specify the job that the component should run. The job name can be provided either through a context variable or by direct input. For now, double-click the component to open its Component Window. Find the Job field. There is a disabled text box beside it, followed by a small button labelled with an ellipsis (...).
Click that button to get a pop-up window showing all jobs currently present in your repository. Select the job that you wish to run; in this case, I’ll select the simple job that I’ve created :-
Click OK and press the F6 key to run the job. You should get the same output as from your original job.
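Why is the output identical? The parent does nothing except hand control to the child. The Java sketch below illustrates that relationship only; it is not the code Talend generates. The names `ParentChildSketch`, `childJob` and `parentJob` are hypothetical, and the sample rows are made up for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

public class ParentChildSketch {

    // Stand-in for the child job: keep only the first `limit` rows of the file.
    static List<String> childJob(List<String> csvLines, int limit) {
        return csvLines.stream().limit(limit).collect(Collectors.toList());
    }

    // Stand-in for the parent job: it only delegates to the child,
    // which is the role tRunJob plays in this example.
    static List<String> parentJob(List<String> csvLines) {
        return childJob(csvLines, 5);
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the delimited file.
        List<String> csvLines = List.of(
                "1;Anna", "2;Ben", "3;Cara", "4;Dev", "5;Eli", "6;Fay", "7;Gus");
        // Running the child directly and running it through the parent
        // print the same five rows.
        childJob(csvLines, 5).forEach(System.out::println);
        parentJob(csvLines).forEach(System.out::println);
    }
}
```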
Let’s have another look at the Component Window. Next to the ellipsis button that we clicked, we have the option to select the desired version of the job. The default value is always ‘Latest’, but you can change it to a fixed value if you like. Beside it is the Context field, which lets you specify the context from which the job name variable should be pulled. This is blank for now and is to be used only if we are using context variables.
Other than these, the following settings can be found in the Component Window :-
| Section | Setting | Description |
|---|---|---|
| Basic Settings | Schema and Edit Schema | A description of the number of columns to be processed and passed on to the next component. The schema can be either built-in or stored in your Repository. Click Edit schema to make changes to the schema. If the current schema is of the Repository type, three options are available: (1) View schema: view the schema only. (2) Change to built-in property: change the schema to Built-in for local changes. (3) Update repository connection: change the schema stored in the repository and decide whether to propagate the changes to all the Jobs upon completion. If you just want to propagate the changes to the current Job, select No upon completion and choose this schema metadata again in the [Repository Content] window. This component also offers a dynamic schema feature. It is designed to retrieve unknown columns of a table and is intended for that purpose only; it is not intended to be used to create new tables. |
| Basic Settings | Copy Child Job Schema | Fetches the child job schema. |
| Basic Settings | Use Dynamic Job | Lets you call and execute multiple jobs. When this option is selected, only the latest version of the jobs can be called, an independent process is used to run the subjob, and the Context and Use an independent process to run subjob options disappear. |
| Basic Settings | Use an independent process to run subjob | When a subjob requires a large amount of memory, select this option to run it in its own independent process. |
| Basic Settings | Die on child error | When this option is cleared, the parent job does not fail even if the child job fails during execution. |
| Basic Settings | Transmit whole context | Select this check box to get all the context variables from the parent Job. Deselect it to get all the context variables from the child Job. If this check box is selected and the parent and child Jobs have the same context variables defined, the parent Job's values are used during the child Job execution unless values are defined for them in the Context Param table; otherwise, the values defined in the Context Param table are used. |
| Basic Settings | Context Param | Changes the value of selected context parameters. Click the [+] button to add the parameters defined in the Context tab of the child Job. The values defined here are used during the child Job execution even if Transmit whole context is selected. |
| Advanced Settings | Propagate the child result to the output schema | Propagates the output data stored in the buffer memory via the tBufferOutput component in the child Job to the output component in the parent Job. |
| Advanced Settings | Print Parameters | If selected, the internal and external parameters are displayed in the console. |
| Advanced Settings | tStatCatcher Statistics | If selected, processing metadata is collected at the job level as well as at the individual component level. |
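The interplay between Transmit whole context and Context Param is easier to see as a merge of three maps. The Java sketch below encodes the precedence described in the table under my reading of it: child values first, parent values on top when the whole context is transmitted, and Context Param values always last. The class `ContextResolution` and the method `resolve` are hypothetical names, not part of Talend.

```java
import java.util.Map;
import java.util.TreeMap;

public class ContextResolution {

    // Returns the context values the child job would run with,
    // assuming the precedence: child < parent (if transmitted) < Context Param.
    static Map<String, String> resolve(Map<String, String> childContext,
                                       Map<String, String> parentContext,
                                       Map<String, String> contextParam,
                                       boolean transmitWholeContext) {
        Map<String, String> effective = new TreeMap<>(childContext);
        if (transmitWholeContext) {
            effective.putAll(parentContext); // parent values replace child values
        }
        effective.putAll(contextParam);      // explicit overrides always win
        return effective;
    }

    public static void main(String[] args) {
        // Hypothetical context variables for illustration.
        Map<String, String> child = Map.of("inputFile", "child.csv", "rowLimit", "5");
        Map<String, String> parent = Map.of("inputFile", "parent.csv", "rowLimit", "10");
        Map<String, String> override = Map.of("rowLimit", "20");

        System.out.println(resolve(child, parent, override, true));
        // prints {inputFile=parent.csv, rowLimit=20}
        System.out.println(resolve(child, parent, override, false));
        // prints {inputFile=child.csv, rowLimit=20}
    }
}
```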
Let’s try using tRunJob to propagate results to another component. For this I have duplicated my original simple read job and made a minor modification, as can be seen in the image below :-
I have replaced the final tLogRow with tBufferOutput_1. This saves the results obtained in the job to the memory buffer, which we will use in our parent job. To do this, we need to modify our parent job as well. The image below shows the modifications I have made to the parent job :-
To achieve this, first drag and drop a tMap component from the palette onto your job design area. Then open your tRunJob’s Component properties window and click on Copy Child Job Schema. You will get a pop-up window showing the child job’s schema, as can be seen in the image below :-
Click OK and close the window. Then right-click the tRunJob component and select Row and then Main. Drop the resulting line onto the tMap component. Then double-click the tMap component to open the tMap Editor window. You should be able to see your tRunJob component’s schema on the left-hand side. In the right-most section, click the [+] sign to add an output table. You can now drag and drop every source column onto the output table, as can be seen in the image below :-
Note that we have added an additional column, LineNo, to this schema. To add it, click on the [+] at the bottom right-hand side of the editor. This adds a column to your current output table. You can enter the name of the column on the right side of the output table. On the left side, when you click on the empty field, you will notice a button labelled with an ellipsis (...) appearing on the right-hand side. Click on it to open another window.
This is the Expression Builder, where you can provide the expression that computes the column’s value for each row. For now, select the Numeric category, click on the sequence function and click OK. Click OK again to exit the tMap Editor window. Then simply add a tLogRow component and join tMap’s output table to it. You can do this by right-clicking the tMap component and selecting the name of our output table. You can also select the Table mode in your tLogRow component window to display the log in a better format on the console. Save the job and press F6 to execute it.
You should get an output similar to above.
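If it helps to see the data flow without the Studio, here is a rough Java sketch of the pipeline we just built. It is an illustration under assumptions, not Talend's generated code: `childJob` stands in for the child job ending in tBufferOutput (the buffer is modelled as a plain list of string arrays), and `parentJob` stands in for tRunJob feeding tMap, with a simple counter starting at 1 playing the role of the sequence function that fills LineNo. All names and the sample rows are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class BufferPropagationSketch {

    // Child job stand-in: split each delimited line into fields and
    // collect the rows in an in-memory buffer instead of printing them.
    static List<String[]> childJob(List<String> csvLines, String delimiter) {
        List<String[]> buffer = new ArrayList<>();
        for (String line : csvLines) {
            buffer.add(line.split(Pattern.quote(delimiter)));
        }
        return buffer;
    }

    // Parent job stand-in: read the child's buffered rows and prepend
    // a LineNo value that starts at 1 and increases by 1 per row.
    static List<String> parentJob(List<String[]> childRows) {
        List<String> output = new ArrayList<>();
        int lineNo = 1;
        for (String[] row : childRows) {
            output.add(lineNo + "|" + String.join("|", row));
            lineNo++;
        }
        return output;
    }

    public static void main(String[] args) {
        // Made-up sample rows standing in for the delimited file.
        List<String> csvLines = List.of("Anna;London", "Ben;Leeds", "Cara;York");
        parentJob(childJob(csvLines, ";")).forEach(System.out::println);
        // prints:
        // 1|Anna|London
        // 2|Ben|Leeds
        // 3|Cara|York
    }
}
```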