Check if a dataset is currently being built with the Python API
Tanguy
Is there a way to check programmatically whether a dataset is currently being built? I am looking for this feature to prevent concurrent scenarios from clashing over a common dataset that is being built.
I had hoped we could query the dataset through Dataiku's Python API, much like we can check whether a scenario is running. However, I could not find such a method among the dataset methods (whether using dataiku.Dataset() or project.get_dataset()).
Is my request feasible?
Best Answer
JordanB (Dataiker)
Hi @tanguy,
You can accomplish this by inspecting your running jobs: whenever a dataset is built, a job is started. I've provided some sample code below that you can customize to your specific needs.
import dataiku

client = dataiku.api_client()
project = client.get_project('projectkey')  # replace with your project key

jobs = project.list_jobs()
# print(jobs)  # uncomment to note the job name of the dataset build

for job_info in jobs:
    job_id = job_info['def']['id']
    job_name = job_info['def']['name']
    # Fetch a handle on the job to read its current state
    job_state = project.get_job(job_id).get_status()['baseStatus']['state']
    # 'Build dataset-name' is a placeholder for the job name noted above
    if job_name == 'Build dataset-name' and job_state == 'RUNNING':
        print(f"{job_name} is running")
Thanks!
Jordan
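For use as a guard at the start of a scenario step, the same check can be wrapped in a function that returns a boolean. The following is only a minimal sketch: `is_job_running` is a hypothetical helper name (not part of the Dataiku API), and it assumes the job name matches the one noted from the jobs list, as in the sample above. It relies only on the calls already used in the answer (`list_jobs()`, `get_job()`, `get_status()`).

```python
def is_job_running(project, job_name):
    """Return True if a job named `job_name` is currently RUNNING.

    `project` is assumed to behave like the project handle used above:
    it must provide list_jobs() and get_job(job_id).get_status().
    """
    for job_info in project.list_jobs():
        job_def = job_info.get("def", {})
        if job_def.get("name") != job_name:
            continue  # not the build we are looking for
        status = project.get_job(job_def["id"]).get_status()
        if status["baseStatus"]["state"] == "RUNNING":
            return True
    return False


# Usage inside DSS (placeholder names taken from the sample above):
# import dataiku
# project = dataiku.api_client().get_project("projectkey")
# if is_job_running(project, "Build dataset-name"):
#     print("Dataset is being built, skipping this run")
```

Note that matching on the job name is a convention rather than a guarantee: a dataset rebuilt as part of a larger job may not appear under its own name, so it is worth printing the jobs list first to confirm what to match.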