How to capture Dataiku job details and insert in a control table in Redshift

Options
rsingamsetty
rsingamsetty Dataiku DSS Core Designer, Dataiku DSS & SQL, Registered Posts: 18 ✭✭✭✭✭

Hi All

I have developed a simple pipeline, which will read data from source-->prepare--write to target.

Here both SORUCE1 and TARGET 1 are Redshift tables

SOURCE1 --> Prepare-->TARGET 1

Now i want to automate this flow using scenario SCENARIO 1 using SQL Trigger

SQL Trigger will check the date change in SOURCE 1 and will run BUILD STEP on TARGET 1 dataset

(Note: Here only RECURSIVE FORCE BUILD mode is only able to pull newly added data from SORUCE1 to TARGET1 other BUILD mode like Smart, Missing,non recursive data are not detecting data changes in SOURCE1, not sure why and force build is overwriting target table than appending)

Once job has run i want to capture details like below and insert them into a CONTROL_TABLE in REDSHIFT with same schema

JOB_NAME,JOB_STATE,JOB_START_TIME,JOB_END_TIME,RECORD_COUNT (TARGET1), RECORD_CREATED_DATE (current-date)

How can i achieve this using Triggers, Steps, and Reporters under scenario ?

Right now i see an option to use manage_job api under which we have list_jobs() which gives this infrmation but i want approach from GUI than writing PYTHON script

Thanks

Rajesh

Answers

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭
    Options

    Hi @rsingamsetty

    Since you're already using scenarios, why not add an extra step to run an SQL script and insert the row into your control table after the first step runs?

  • rsingamsetty
    rsingamsetty Dataiku DSS Core Designer, Dataiku DSS & SQL, Registered Posts: 18 ✭✭✭✭✭
    Options

    Hi ,

    That would a good option but i need to extract job details first like job_name,start time ,end time etc

    i am not sure from which metadata tables i can pull this information. Any help around this would be really appreciated

    Thanks

    Rajesh

  • Liev
    Liev Dataiker Alumni Posts: 176 ✭✭✭✭✭✭✭✭
    Options

    If you're an administrator, you have access to internal metrics datasets. In the flow top right, click \

    + Dataset > Internal > Internal Stats.

    You should be able to find there info about the jobs submitted and their status. So perhaps you'll need to use a combination of those elements in order to get the info you require.

Setup Info
    Tags
      Help me…