Check if a dataset is currently building with the Python API

Tanguy

Is there a way to check programmatically whether a dataset is currently being built? I am looking for this feature to prevent concurrent scenarios from "clashing" on a common dataset that is being built.

I would hope we could query the dataset using Dataiku's Python API, much like we can check whether a scenario is running. However, I did not manage to find such a method by inspecting the dataset methods (using either dataiku.Dataset() or project.get_dataset()).

Is my request feasible?

Best Answer

  • JordanB
    edited July 17 Answer ✓

    Hi @tanguy,

    You can accomplish this by inspecting the project's jobs: whenever a dataset is built, a job is started, so a running build job for the dataset means the dataset is currently being built. I've provided some sample code below that you can customize to your specific needs.

    import dataiku

    client = dataiku.api_client()
    project = client.get_project('projectkey')  # replace with your project key

    # Uncomment to inspect the job list and note the exact name of the dataset build job:
    # print(project.list_jobs())
    for job_summary in project.list_jobs():
        job_id = job_summary['def']['id']
        job_name = job_summary['def']['name']
        # Fetch a handle on the job and read its current state
        job_state = project.get_job(job_id).get_status()['baseStatus']['state']
        # Replace 'Build dataset-name' with the job name noted above
        if job_name == 'Build dataset-name' and job_state == 'RUNNING':
            print(f"{job_name} is running")

    Thanks!
    Jordan
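
To use this check as a guard in a scenario, the same logic could be wrapped in two small helpers. This is only a minimal sketch, not an official Dataiku pattern: `is_build_running` and `wait_until_free` are hypothetical names, the (name, state) pairs are assumed to be collected exactly as in the loop above, and the job name 'Build dataset-name' is the placeholder from that example.

```python
import time
from typing import Callable, Iterable, Tuple


def is_build_running(job_states: Iterable[Tuple[str, str]], build_job_name: str) -> bool:
    """Return True if a job with the given name is in the RUNNING state.

    job_states is an iterable of (job_name, state) pairs, assumed to be
    gathered from project.list_jobs() / get_status() as in the answer above.
    """
    return any(
        name == build_job_name and state == "RUNNING"
        for name, state in job_states
    )


def wait_until_free(
    check: Callable[[], bool],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
) -> bool:
    """Poll check() until it returns False.

    Returns True once the dataset is free, or False if the timeout expires
    while check() still reports a running build.
    """
    deadline = time.monotonic() + timeout_seconds
    while check():
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_seconds)
    return True


# Hypothetical wiring inside DSS (not executed here):
# def dataset_is_building():
#     project = dataiku.api_client().get_project('projectkey')
#     states = (
#         (j['def']['name'],
#          project.get_job(j['def']['id']).get_status()['baseStatus']['state'])
#         for j in project.list_jobs()
#     )
#     return is_build_running(states, 'Build dataset-name')
#
# if wait_until_free(dataset_is_building):
#     ...  # safe to start the build
```

Note that checking and then building is not atomic: another scenario can still start a build between the check and your own build, so this narrows the window for a clash rather than eliminating it.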
