How to capture Dataiku job details and insert in a control table in Redshift

rsingamsetty · June 2020

Hi All

I have developed a simple pipeline, which will read data from source-->prepare--write to target.

Here both SORUCE1 and TARGET 1 are Redshift tables

SOURCE1 --> Prepare-->TARGET 1

Now i want to automate this flow using scenario SCENARIO 1 using SQL Trigger

SQL Trigger will check the date change in SOURCE 1 and will run BUILD STEP on TARGET 1 dataset

(Note: Here only RECURSIVE FORCE BUILD mode is only able to pull newly added data from SORUCE1 to TARGET1 other BUILD mode like Smart, Missing,non recursive data are not detecting data changes in SOURCE1, not sure why and force build is overwriting target table than appending)

Once job has run i want to capture details like below and insert them into a CONTROL_TABLE in REDSHIFT with same schema

JOB_NAME,JOB_STATE,JOB_START_TIME,JOB_END_TIME,RECORD_COUNT (TARGET1), RECORD_CREATED_DATE (current-date)

How can i achieve this using Triggers, Steps, and Reporters under scenario ?

Right now i see an option to use manage_job api under which we have list_jobs() which gives this infrmation but i want approach from GUI than writing PYTHON script

Thanks

Rajesh

Liev · June 2020

Hi @rsingamsetty

Since you're already using scenarios, why not add an extra step to run an SQL script and insert the row into your control table after the first step runs?

rsingamsetty · June 2020

Hi ,

That would a good option but i need to extract job details first like job_name,start time ,end time etc

i am not sure from which metadata tables i can pull this information. Any help around this would be really appreciated

Thanks

Rajesh

Liev · June 2020

If you're an administrator, you have access to internal metrics datasets. In the flow top right, click \

+ Dataset > Internal > Internal Stats.

You should be able to find there info about the jobs submitted and their status. So perhaps you'll need to use a combination of those elements in order to get the info you require.

How to capture Dataiku job details and insert in a control table in Redshift

Answers

Categories

Setup Info

Tags