Is there a way to check programmatically whether a dataset is currently being built? I need this to prevent concurrent scenarios from "clashing" over a common dataset while it is being built.
I had hoped to query the dataset through Dataiku's Python API, much as we can check whether a scenario is running. However, I could not find such a method among the dataset methods (using either dataiku.Dataset() or project.get_dataset()).
Is my request feasible?
Hi @tanguy,
You can accomplish this by interrogating your jobs: building a dataset starts a job, so a dataset is being built while its build job is running. I've provided some sample code below that you can customize to your specific needs.
import dataiku

client = dataiku.api_client()
project = client.get_project('projectkey')  # replace with your project key

# Each job summary stores its definition under 'def'.
# Run print(jobs) once to find the exact name of your dataset's build job.
jobs = project.list_jobs()
for job_summary in jobs:
    job_id = job_summary['def']['id']
    job_name = job_summary['def']['name']
    # Fetch the current status of this job
    job_state = project.get_job(job_id).get_status()['baseStatus']['state']
    if job_name == 'Build dataset-name' and job_state == 'RUNNING':
        print(f"{job_name} is running")
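To use this check as a guard against clashing scenarios, the same calls can be wrapped into a reusable function plus a simple polling wait. This is a minimal sketch, not an official Dataiku helper: it assumes `project` is the project handle returned by `client.get_project(...)` and relies only on `list_jobs()`, `get_job(id).get_status()` and the `['def']` / `['baseStatus']['state']` fields used above. The names `is_dataset_being_built`, `wait_until_not_building`, `active_states`, `poll_seconds` and `timeout_seconds` are hypothetical. Only `RUNNING` counts as active by default; if queued jobs should count too, their state names would need to be confirmed from `get_status()` output on your instance.

```python
import time


def is_dataset_being_built(project, build_job_name, active_states=("RUNNING",)):
    """Return True if a job named build_job_name is in one of active_states.

    Hypothetical helper. `project` is assumed to be the handle returned by
    dataiku.api_client().get_project(...); only list_jobs() and
    get_job(id).get_status() are called on it.
    """
    for job_summary in project.list_jobs():
        job_def = job_summary["def"]
        if job_def["name"] != build_job_name:
            continue  # skip other jobs before paying for a status request
        state = project.get_job(job_def["id"]).get_status()["baseStatus"]["state"]
        if state in active_states:
            return True
    return False


def wait_until_not_building(project, build_job_name, poll_seconds=30, timeout_seconds=3600):
    """Poll until no matching job is active.

    Returns True once the dataset is free, or False if timeout_seconds
    elapses while a matching job is still active (hypothetical helper).
    """
    deadline = time.monotonic() + timeout_seconds
    while is_dataset_being_built(project, build_job_name):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True
```

In a scenario's Python step, you could call `wait_until_not_building(project, 'Build dataset-name')` before building and raise an error if it returns False. Polling only narrows the race window: two scenarios that check at the same moment can both see the dataset as free.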
Thanks!
Jordan
Thank you @JordanB!