Capture timestamp and duration for each individual activity in a job.
How can we retrieve the start and end timestamps for each individual activity that runs in parallel within a job?
Answers
-
Hi,
I hope that you are doing well! :)
You could use the internal stats datasets (https://doc.dataiku.com/dss/latest/connecting/internal-stats.html) to retrieve this data. A DSS admin can create an internal stats dataset from +Dataset → Internal and select the type "Jobs".
By clicking Preview, you will be able to create a dataset with the start and end timestamps and other information.
You can also access this information through code using the Python API; here is an example covering reading the jobs status:
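For instance, a minimal sketch using the public Python API (run from inside DSS; PROJECT_KEY is a placeholder to replace with your own project key):
import dataiku

# Minimal sketch: list the project's jobs and inspect each job's raw entry.
# Each entry returned by list_jobs() describes one job run, including its
# state and timing information.
client = dataiku.api_client()
project = client.get_project('PROJECT_KEY')

for job in project.list_jobs():
    print(job)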
Let me know if this covers everything or if you have any further questions. :)
Best,
Yasmine
-
Yeah, this definitely helps, but to be precise: is there a way we can get the individual start and end timestamps for all the datasets built in one single scenario?
-
Turribeach
I presume you want to do this with the Python API, right?
-
Turribeach
From the job you can get the job status, which lists all the job activities:
import dataiku
client = dataiku.api_client()
project = client.get_project('PROJECT_KEY')
job_handle = project.get_job('JOB_ID')
job_handle.get_status()
Sample activity:
{'activityId': 'some_activity_id',
'state': 'DONE',
'activityType': 'join',
'engineType': 'SQL',
'totalTime': 327,
'preparingTime': 73,
'waitingTime': 0,
'runningTime': 262}
This doesn't give a straightforward start/end date time, although you can calculate it (see the sketch below). With the activity ID you can also call:
job_handle.get_log(activity='ACTIVITY_ID')
and get the full log of that activity.
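A rough sketch of that calculation, assuming the *Time fields are durations in milliseconds (the unit is not confirmed here) and that you already know the activity's start time from another source, for example by parsing the activity log returned by get_log():
from datetime import datetime, timedelta

# Rough sketch only: 'activity' is one of the per-activity dicts shown above,
# and activity_start is a placeholder start time you would obtain elsewhere
# (e.g. from the activity log). Verify on your instance whether totalTime is
# in milliseconds or seconds before relying on this.
activity = {'activityId': 'some_activity_id', 'totalTime': 327,
            'preparingTime': 73, 'waitingTime': 0, 'runningTime': 262}

activity_start = datetime(2024, 1, 1, 12, 0, 0)   # placeholder start time
activity_end = activity_start + timedelta(milliseconds=activity['totalTime'])
print(activity_start, activity_end)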
-
Hi Godwin Joshua,
Our company L3 analytics is a Dataiku certified partner. We focus on providing platform support and administration for Dataiku. If you need assistance implementing and supporting the platform, please reach out to us at info@l3-analyticsinc.com / john@l3-analyticsinc.com
Thank you.
Here is the solution:
From your flow go to: +DATASET → Internal → Internal Stats
From the Type drop-down, select "Jobs" and create the dataset.
You should see all your jobs by project with their time_start and time_end information.
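If you would rather read that dataset from code, here is a minimal sketch (assuming the internal stats dataset was named "jobs_stats" when it was created; the name is a placeholder, and any columns other than time_start/time_end may differ on your instance):
import dataiku

# Read the internal stats dataset created above into a pandas DataFrame.
# "jobs_stats" is a placeholder name; time_start / time_end are the columns
# mentioned above, other column names may vary by DSS version.
jobs_df = dataiku.Dataset("jobs_stats").get_dataframe()
print(jobs_df[["time_start", "time_end"]].head())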
-
Turribeach
@John_wilson / @Yasmine_T The OP wanted activity start/end times, not job ones.
-
Thanks a lot! I am now closer to the solution with start and end timestamps! But how can we dynamically fetch these job IDs, which keep changing after each run? Is there any other way?
-
Turribeach
Use the list_jobs() method:
jobs = project.list_jobs()
for job in jobs:
    print(job)
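For example, a rough sketch that picks the job IDs up dynamically and feeds them back into the get_job()/get_status() calls from my earlier post (the 'def'/'id' key path inside each list_jobs() entry is an assumption; print one raw entry first to confirm where the job ID lives on your instance):
import dataiku

client = dataiku.api_client()
project = client.get_project('PROJECT_KEY')   # replace with your project key

# Iterate over all jobs of the project without hard-coding any job ID.
for job in project.list_jobs():
    job_id = job.get('def', {}).get('id')     # assumed key path, verify it
    if job_id:
        status = project.get_job(job_id).get_status()
        print(job_id, list(status.keys()))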
PS: Please always use the code block (the </> icon) to post code so it can be copy/pasted.