How to capture Dataiku job details and insert them into a control table in Redshift
Hi All
I have developed a simple pipeline which reads data from a source, prepares it, and writes it to a target.
Here both SOURCE1 and TARGET1 are Redshift tables:
SOURCE1 --> Prepare --> TARGET1
Now I want to automate this flow with a scenario (SCENARIO 1) using a SQL trigger.
The SQL trigger will check for a date change in SOURCE1 and will run a BUILD step on the TARGET1 dataset.
(Note: only the RECURSIVE FORCE BUILD mode is able to pull newly added data from SOURCE1 to TARGET1; the other build modes (smart, missing data, non-recursive) are not detecting the data changes in SOURCE1, I am not sure why, and the force build overwrites the target table instead of appending.)
Once the job has run, I want to capture details like the below and insert them into a CONTROL_TABLE in Redshift with the same schema:
JOB_NAME, JOB_STATE, JOB_START_TIME, JOB_END_TIME, RECORD_COUNT (of TARGET1), RECORD_CREATED_DATE (current date)
How can I achieve this using triggers, steps, and reporters under a scenario?
Right now I see an option to use the manage_job API, under which list_jobs() gives this information, but I would prefer a GUI approach over writing a Python script.
Thanks
Rajesh
Answers
-
Since you're already using scenarios, why not add an extra step to run an SQL script and insert the row into your control table after the first step runs?
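For reference, if a small scripted step turns out to be acceptable after all, the same insert can be done from an "Execute Python code" step in the scenario. Here is a minimal sketch, assuming CONTROL_TABLE already exists on the same Redshift connection as TARGET1; the job values are placeholders (the later answers cover where to get the real ones), and the insert-via-pre_queries trick is a commonly used workaround rather than an official insert API:

import dataiku
from dataiku import SQLExecutor2

# Bind the executor to TARGET1 so queries run on its Redshift connection.
target = dataiku.Dataset("TARGET1")
executor = SQLExecutor2(dataset=target)

# Row count of the target table (the physical table name is an assumption).
count_df = executor.query_to_df("SELECT COUNT(*) AS n FROM target1")
record_count = int(count_df["n"][0])

# Placeholder job details; replace with real values once you have them.
insert_sql = """
    INSERT INTO control_table
        (job_name, job_state, job_start_time, job_end_time,
         record_count, record_created_date)
    VALUES
        ('build_target1', 'DONE',
         '2024-01-01 00:00:00', '2024-01-01 00:05:00',
         {n}, CURRENT_DATE)
""".format(n=record_count)

# query_to_df is SELECT-oriented, so the insert is passed as a pre-query
# followed by an explicit COMMIT, then a dummy SELECT satisfies the API.
executor.query_to_df("SELECT 1", pre_queries=[insert_sql, "COMMIT"])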
-
Hi,
That would be a good option, but I need to extract the job details first, like job_name, start time, end time, etc.
I am not sure which metadata tables I can pull this information from. Any help with this would be really appreciated.
Thanks
Rajesh
-
If you're an administrator, you have access to internal metrics datasets. In the flow, top right, click + Dataset > Internal > Internal Stats.
You should be able to find information there about the jobs submitted and their status. So perhaps you'll need to use a combination of those elements to get the info you require.
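And if a few lines of Python do become acceptable, the jobs API Rajesh mentioned can feed the SQL step shown earlier. A rough sketch, assuming the code runs inside the same project; the exact dictionary keys returned by list_jobs() vary by DSS version, so print one element on your instance to confirm them before relying on this:

import dataiku
from datetime import datetime

client = dataiku.api_client()
project = client.get_default_project()

# list_jobs() returns recent jobs as plain dictionaries, most recent first
# on typical instances (verify the ordering on yours).
jobs = project.list_jobs()
latest = jobs[0]

# Key names below are assumptions to be checked against the real payload.
job_name = latest.get("def", {}).get("id")
job_state = latest.get("state")
start_ms = latest.get("startTime")
end_ms = latest.get("endTime")

# Timestamps are usually epoch milliseconds; convert for the INSERT.
job_start = datetime.fromtimestamp(start_ms / 1000.0) if start_ms else None
job_end = datetime.fromtimestamp(end_ms / 1000.0) if end_ms else None

print(job_name, job_state, job_start, job_end)

These values can then be substituted into the INSERT statement from the earlier sketch, all within a single Python step at the end of the scenario.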