
check if a dataset is currently building with python API

Solved!
tanguy

Is there a way to check programmatically whether a dataset is currently being built? I am looking for this feature to prevent concurrent scenarios from clashing over a shared dataset that is being built.

I was hoping we could query the dataset through Dataiku's Python API, much like we can check whether a scenario is running. However, I could not find such a method when inspecting the dataset objects returned by dataiku.Dataset() and project.get_dataset().

Is my request feasible?

 

 

1 Solution
JordanB
Dataiker

Hi @tanguy,

You can accomplish this by inspecting the project's jobs: whenever a dataset is built, a job is started. I've provided some sample code below that you can customize to your specific needs.

import dataiku

client = dataiku.api_client()
project = client.get_project('projectkey')  # replace with your project key

# Print project.list_jobs() once to find the exact name of the build job
for job_summary in project.list_jobs():
    job_id = job_summary['def']['id']
    job_name = job_summary['def']['name']
    # Fetch the current state of this job
    job_state = project.get_job(job_id).get_status()['baseStatus']['state']
    if job_name == 'Build dataset-name' and job_state == 'RUNNING':
        print(f"{job_name} is running")
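
For use in a scenario step, the same check could be wrapped in a small helper that returns a boolean. This is only a sketch: the function name is_job_running is hypothetical, and it relies on nothing beyond the list_jobs(), get_job() and get_status() calls and the dictionary keys used in the snippet above.

```python
def is_job_running(project, job_name):
    """Return True if a job named `job_name` is RUNNING in `project`.

    `project` is assumed to be a project handle such as the one returned by
    dataiku.api_client().get_project(...). The helper name is hypothetical;
    only list_jobs(), get_job() and get_status() are called on the handle.
    """
    for job_summary in project.list_jobs():
        if job_summary['def']['name'] != job_name:
            continue  # skip other jobs before querying their status
        status = project.get_job(job_summary['def']['id']).get_status()
        if status['baseStatus']['state'] == 'RUNNING':
            return True
    return False
```

A scenario could call is_job_running(project, 'Build dataset-name') and wait or skip its own build when the result is True. Keep in mind that checking and then building is not atomic, so two scenarios that check at the same moment could still both start a build.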

 

Thanks!
Jordan

tanguy
Author

Thank you @JordanB!