
dataikuapi - How to rerun a job through Python using the API?

Hi,

 

I would like to rerun a job I created in DSS through the API using Python. I followed this documentation (https://doc.dataiku.com/dss/latest/python-api/jobs.html), but it does not describe how to define the job properly.

I get this error: "dataikuapi.utils.DataikuException: java.lang.IllegalArgumentException: Computable not found or not buildable:TEST.Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332" and I could not find any documentation about it.

Here is my script:

import dataikuapi
import time

host = "http://192.168.1.14:10000"
apiKey = "BTs4pTOGCOtsBgAiSE5YHrS6GPmnBjHh"

client = dataikuapi.DSSClient(host, apiKey)

# client is now a DSSClient and can perform all authorized actions.
# For example, list the project keys for which the API key has access
# dss_projects = client.list_project_keys()
# print(dss_projects)

project = client.get_project('TEST')
dss_job = project.list_jobs()
# print(dss_job)

# failed_jobs = [job for job in dss_job if job['state'] == 'SUCCESS']
# print(failed_jobs)

# Start a job
print("Step 1 - Job definition")

definition = {
"type": "NON_RECURSIVE_FORCED_BUILD",
'projectKey': 'TEST',
'id': 'Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332',
'name': 'Build churn_test_tmp_prepared',
'initiator': 'admin',
'triggeredFrom': 'RECIPE',
'recipe': 'compute_churn_test_tmp_prepared',

"outputs": [{
'id': 'Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332',
'type': 'DATASET',
'targetDatasetProjectKey': 'TEST',
'targetDataset': 'churn_test_tmp_prepared'
}]
}
print("Step 2 - start job")

job = project.start_job(definition)

 

Dataiker

Hello,

In the next section of the doc you can find the syntax to run a job: https://doc.dataiku.com/dss/latest/python-api/jobs.html#starting-new-jobs

Please find below a function that rebuilds a given list of datasets (the outputs variable):

import time

def run_job(project, outputs, job_type="NON_RECURSIVE_FORCED_BUILD"):
    # Each output "id" is the name of a dataset to build, not a job ID.
    definition = {
        "type": job_type,
        "outputs": [{
            "id": output_name,
            "partition": "NP"
        } for output_name in outputs]
    }
    job = project.start_job(definition)
    state = ''
    print('Building Datasets %s' % outputs)
    # Poll once per second until the job reaches a terminal state.
    while state not in ('DONE', 'FAILED', 'ABORTED'):
        time.sleep(1)
        state = job.get_status()['baseStatus']['state']
    return state
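
The error in the question appears to come from the output "id" being set to the old job ID (Build_churn_test_tmp_prepared_2020-07-22T15-15-12.332) rather than to the dataset name (churn_test_tmp_prepared). As a rough sketch of how an old job definition could be turned into a fresh one, the helpers below pull the dataset names out of a definition shaped like the one in the question and start a new build. The helper names (dataset_names_from_definition, build_definition, run_job_and_wait) and the poll_seconds parameter are hypothetical, not part of dataikuapi; the only API calls used are project.start_job and job.get_status, as in the function above. It also assumes the old definition stores the dataset name under "targetDataset", as in the question's script.

```python
import time

# States after which the job will not change any more (same list as above).
TERMINAL_STATES = {"DONE", "FAILED", "ABORTED"}


def dataset_names_from_definition(old_definition):
    """Extract dataset names from an old job definition.

    Assumes each output looks like the one in the question:
    {"type": "DATASET", "targetDataset": "<dataset name>", ...}.
    """
    return [
        output["targetDataset"]
        for output in old_definition.get("outputs", [])
        if output.get("type") == "DATASET"
    ]


def build_definition(dataset_names, job_type="NON_RECURSIVE_FORCED_BUILD"):
    """Build a new job definition whose output ids are dataset names."""
    return {
        "type": job_type,
        # "NP" is the partition value used in the function above.
        "outputs": [{"id": name, "partition": "NP"} for name in dataset_names],
    }


def run_job_and_wait(project, dataset_names, poll_seconds=1.0,
                     job_type="NON_RECURSIVE_FORCED_BUILD"):
    """Start a build for the given datasets and return the final state."""
    job = project.start_job(build_definition(dataset_names, job_type))
    while True:
        state = job.get_status()["baseStatus"]["state"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
```

With a real project handle, this would be called as run_job_and_wait(project, dataset_names_from_definition(old_definition)), where old_definition is the dictionary copied from the previous job. Returning the final state lets the caller decide what to do when the job ends as FAILED or ABORTED.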

Cheers,

Du 

Author

Hi Du,

Thanks for your answer. It's working 🙂 
