dataikuapi - How to rerun a job through Python using the API?

etienne_95 · July 2020

Hi,

I would like to rerun a job I made in DSS through the API using Python. I used this documentation (https://doc.dataiku.com/dss/latest/python-api/jobs.html), but it doesn't describe how to define the job properly.

I get this error: "dataikuapi.utils.DataikuException: java.lang.IllegalArgumentException: Computable not found or not buildable:TEST.Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332". I didn't find any documentation about it.

Here is my script:

import dataikuapi
import time

host = "http://192.168.1.14:10000"
apiKey = "BTs4pTOGCOtsBgAiSE5YHrS6GPmnBjHh"

client = dataikuapi.DSSClient(host, apiKey)

# client is now a DSSClient and can perform all authorized actions.
# For example, list the project keys for which the API key has access
# dss_projects = client.list_project_keys()
# print(dss_projects)

project = client.get_project('TEST')
dss_job = project.list_jobs()
# print(dss_job)

# failed_jobs = [job for job in dss_job if job['state'] == 'SUCCESS']
# print(failed_jobs)

# Start a job
print("Step 1 - Job definition")

definition = {
    "type": "NON_RECURSIVE_FORCED_BUILD",
    'projectKey': 'TEST',
    'id': 'Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332',
    'name': 'Build churn_test_tmp_prepared',
    'initiator': 'admin',
    'triggeredFrom': 'RECIPE',
    'recipe': 'compute_churn_test_tmp_prepared',

    "outputs": [{
        'id': 'Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332',
        'type': 'DATASET',
        'targetDatasetProjectKey': 'TEST',
        'targetDataset': 'churn_test_tmp_prepared'
    }]
}
print("Step 2 - start job")

job = project.start_job(definition)
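The error names TEST.Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332 as the missing computable, which suggests DSS treated the job ID placed in outputs[].id as the name of the thing to build. Below is a minimal sketch of a fix. It assumes that outputs[].id should hold the dataset name (the shape the answer below uses, with "partition": "NP", which DSS uses for non-partitioned datasets). It also assumes that the run-specific fields copied from the old job (id, name, initiator, triggeredFrom, recipe) are not needed to start a new one. to_start_job_definition is a hypothetical helper, not part of dataikuapi:

```python
def to_start_job_definition(old_definition, partition="NP"):
    """Hypothetical helper: turn a previous job definition into a minimal
    start_job() definition that rebuilds the same datasets.

    Assumes each output names its dataset in "targetDataset", as in the
    definition from the question.
    """
    outputs = [
        # Identify each output by the dataset name, not by the job ID
        {"id": output["targetDataset"], "partition": partition}
        for output in old_definition.get("outputs", [])
    ]
    # Run-specific fields (id, name, initiator, ...) are deliberately dropped
    return {"type": old_definition["type"], "outputs": outputs}


previous = {
    "type": "NON_RECURSIVE_FORCED_BUILD",
    "projectKey": "TEST",
    "id": "Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332",
    "name": "Build churn_test_tmp_prepared",
    "outputs": [{
        "id": "Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332",
        "type": "DATASET",
        "targetDatasetProjectKey": "TEST",
        "targetDataset": "churn_test_tmp_prepared",
    }],
}

new_definition = to_start_job_definition(previous)
print(new_definition)
```

The resulting dictionary can then be passed to project.start_job(new_definition).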

duphan · July 2020

Hello,

In the next section of the doc you can find the syntax to run a job: https://doc.dataiku.com/dss/latest/python-api/jobs.html#starting-new-jobs

Please find below a function that rebuilds a given list of datasets (passed in the outputs argument):

import time

def run_job(project, outputs, job_type="NON_RECURSIVE_FORCED_BUILD"):
    # Each output is identified by its dataset name; "NP" = not partitioned
    definition = {
        "type": job_type,
        "outputs": [{"id": output_name, "partition": "NP"} for output_name in outputs]
    }
    job = project.start_job(definition)
    state = ''
    print('Building Datasets %s' % outputs)
    # Poll the job status every second until it reaches an end state
    while state not in ('DONE', 'FAILED', 'ABORTED'):
        time.sleep(1)
        state = job.get_status()['baseStatus']['state']
    return state
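The polling loop above has no upper bound, so a job that never finishes would block the script forever. Here is a minimal sketch of the same polling with a timeout. It assumes the status shape used in the answer (get_status()['baseStatus']['state']) and treats DONE, FAILED and ABORTED as the end states, as that loop does. wait_for_job, timeout_s, poll_interval_s and FakeJob are hypothetical names. FakeJob only stands in for a real dataikuapi job so the sketch runs without a DSS instance:

```python
import time

# The end states the answer's loop waits for (assumed to be final)
TERMINAL_STATES = ("DONE", "FAILED", "ABORTED")


def wait_for_job(job, timeout_s=3600.0, poll_interval_s=1.0):
    """Hypothetical helper: poll job.get_status() until the job reaches an
    end state, or raise TimeoutError once timeout_s seconds have passed."""
    deadline = time.monotonic() + timeout_s
    while True:
        state = job.get_status()["baseStatus"]["state"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError("job still %s after %.0f s" % (state, timeout_s))
        time.sleep(poll_interval_s)


class FakeJob:
    """Stand-in for a job object: replays a fixed list of states,
    repeating the last one once the list is exhausted."""

    def __init__(self, states):
        self._states = list(states)

    def get_status(self):
        if len(self._states) > 1:
            state = self._states.pop(0)
        else:
            state = self._states[0]
        return {"baseStatus": {"state": state}}


final_state = wait_for_job(FakeJob(["RUNNING", "RUNNING", "DONE"]), poll_interval_s=0.01)
print(final_state)
```

Against a real instance, the call would be state = wait_for_job(project.start_job(definition)), followed by whatever handling is needed when state is not 'DONE'. Returning the state instead of raising on FAILED leaves that decision to the caller.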

Cheers,

Du

etienne_95 · July 2020

Hi Du,

Thanks for your answer, it's working.
