Rename a dataset with the Python internal API

steven
steven Registered Posts: 5 ✭✭✭✭

Hello everyone!

I'm working on a Python notebook that duplicates (copy-pastes) a project. I have already changed the paths of my imported datasets as explained here: https://www.dataiku.com/learn/guide/tips/duplicate-project.html.

However, I can't rename the datasets the way I renamed the 'hiveTableName'. Any help, please? :)


# For each dataset
for i in range(len(imported_datasets)):
    # Get the name of the dataset
    dataset_name = imported_datasets[i]["name"]
    print("editing path of : " + dataset_name)
    # Get its definition by name
    dataset = imported_project.get_dataset(dataset_name)
    definition = dataset.get_definition()

    # Update name, path, Hive table name
    # Updating the name doesn't work this way. @TODO
    dataset.dataset_name = prefixe + "_" + dataset.dataset_name
    definition["name"] = prefixe + "_" + definition["name"]

    definition["params"]["hiveTableName"] = prefixe + "_" + definition["params"]["hiveTableName"]
    definition["params"]["path"] = prefixe + "_" + definition["params"]["path"]

    # Save the updated definition back to the dataset
    dataset.set_definition(definition)

Error: DataikuException: com.dataiku.dip.server.controllers.NotFoundException: dataset does not exist: TEST_IMPORT.TEST_IMPORT_test_import_d_sync

The first TEST_IMPORT is my project key; the second is the prefix (prefixe in the script), which I add to distinguish the datasets from those of the original project.

Best Answer

  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    Answer ✓

    Hi,

    Indeed, renaming datasets through the API is not supported.

    It is actually not needed: since you updated the path and the Hive table name, the datasets can coexist in the two projects. This already guarantees logical separation across projects, even though the dataset names are the same. Hence you can use your script, except for this part:

    definition["name"] = prefixe + "_" + definition["name"]

    which is not needed.

    Note that you would have achieved the same result automatically, without code, if you had put ${projectKey}_ as a prefix in the path and Hive sections of your connection before creating the initial project. This is documented here: https://doc.dataiku.com/dss/latest/connecting/relocation.html

    Cheers,

    Alex
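
The advice in the answer above (prefix the storage location, leave the dataset name alone) can be sketched as a small helper that works on the definition dictionary only. This is an illustration under assumptions, not the official way to do it: the function name prefix_dataset_storage and the sample values are hypothetical, and it assumes the definition exposes "path" and "hiveTableName" under "params", as in the script from the question.

```python
import copy


def prefix_dataset_storage(definition, prefix):
    """Return a copy of a dataset definition with a prefixed storage location.

    Only params["path"] and params["hiveTableName"] are changed.
    definition["name"] is deliberately left untouched, since renaming
    a dataset this way is what triggered the NotFoundException.
    """
    updated = copy.deepcopy(definition)  # do not mutate the caller's dict
    params = updated.get("params", {})
    for key in ("path", "hiveTableName"):
        if key in params:
            # Same concatenation as in the original script
            params[key] = prefix + "_" + params[key]
    return updated


if __name__ == "__main__":
    # Hypothetical definition, shaped like the one used in the question
    sample = {
        "name": "test_import_d_sync",
        "params": {
            "path": "test_import_d_sync",
            "hiveTableName": "test_import_d_sync",
        },
    }
    print(prefix_dataset_storage(sample, "TEST_IMPORT"))
```

Inside the loop from the question, this would presumably replace the manual edits with a single call such as dataset.set_definition(prefix_dataset_storage(dataset.get_definition(), prefixe)), with the two lines that touch the dataset name dropped.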

Answers

  • steven
    steven Registered Posts: 5 ✭✭✭✭
    OK, thanks for the answer. I tried what you linked, but it seems my datasets are still connected.
    Correct me if I'm wrong: I went to the home page of my imported project, then Settings, Settings again, Code recipes, and changed the option to 'Table references with variables' (which is supposed to add ${projectKey}_ to the table name, I think?).
    Thanks again :)

    Edit: My datasets are on HDFS
  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    Yes, the relocation setting described in the link will work when duplicating future projects. For your current project, which was created before this setting was in place, you will have to use your script to apply it "a posteriori". Your script is correct; you simply need to remove this part:
    definition["name"] = prefixe + "_" + definition["name"]
  • steven
    steven Registered Posts: 5 ✭✭✭✭
    Ok this is working great :)
    I am going to be a pain, but another question: I didn't find a way to share a Dataset via the API. Is it possible or not?
  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭
    This is not currently part of our API. We are planning to add it in our next release :)