
Is It Possible to Add Multiple/All Datasets as Additional Content to a Bundle at Once

Solved!
kathyqingyuxu

Hello everyone!

 

I'd like to know whether it's possible to add multiple (or all) datasets as additional content to a project bundle at once, for example by highlighting several and adding them together, rather than selecting them one at a time. A typical case is a project with many uploaded datasets that I want to include in the bundle. Right now I can only add them by clicking each one individually in the drop-down. Is there a way to select all the relevant datasets at once? Thanks!

 

Best,

Kathy

AlexT
Dataiker

Hi @kathyqingyuxu ,

There is currently no bulk select-and-add for datasets in the UI when creating a bundle, so you have to add them one by one.

If you have a lot of datasets, you can use the Python API to create a bundle that includes all of them. You can find an example here:

https://community.dataiku.com/t5/Using-Dataiku/Create-Bundle-which-includes-contents-with-Python-API... 

A slightly modified version that adds all uploaded datasets would be:


import dataiku

client = dataiku.api_client()
proj_key = "<replace-project-key>"

# Requires a pre-created project variable called "version", set to a number,
# which is used as a counter for the bundles
proj = client.get_project(proj_key)
proj_vars = proj.get_variables()
current_vers = proj_vars['standard']['version']

# Reset the bundle exporter settings to an empty dict and save them.
# With this reset, there is no need to create a dummy bundle first.
settings = proj.get_settings()
raw = settings.get_raw()
raw['bundleExporterSettings'] = {}
settings.save()

# Retrieve and update the export options
# (setdefault guards against keys that are missing after the reset)
settings = proj.get_settings()
bundle_settings = settings.get_raw().setdefault('bundleExporterSettings', {})
export_options = bundle_settings.setdefault('exportOptions', {})
export_options['exportUploads'] = True
export_options['exportSavedModels'] = True
export_options['exportManagedFolders'] = True

# Collect all uploaded datasets; to include other dataset types,
# modify the if statement
included_datasets = export_options.setdefault('includedDatasetsData', [])
datasets = proj.list_datasets()

for dataset in datasets:
    if dataset['type'] == 'UploadedFiles':
        ds_name = dataset['name']
        included_datasets.append({"name": ds_name})

# Uncomment to add models
#export_options.setdefault('includedSavedModels', []).append({"id": "<model_ID>"})
settings.save()

# Set the bundle ID
bundle_id = "version_" + str(current_vers)

# Create the bundle
try:
    proj.export_bundle(bundle_id)
    print(f"The bundle with ID {bundle_id} was created successfully")
except Exception:
    print("There was an error creating the bundle. See the traceback:")
    raise

# Increase the count of the version variable
proj_vars['standard']['version'] = current_vers + 1
proj.set_variables(proj_vars)
print("Increased the count for the version")
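
Since the question also asks about adding all datasets rather than only uploaded ones, here is a minimal sketch of how the filtering step in the script could be pulled out into a small helper. The name build_included_datasets and the sample list are hypothetical, not part of the Dataiku API; the helper only assumes that each item exposes 'name' and 'type' keys, as the items from proj.list_datasets() do in the script above. I have not verified which dataset types support having their data included in a bundle, so treat the "all types" case as something to test on your instance.

```python
def build_included_datasets(datasets, types=None):
    """Return bundle entries ({"name": ...}) for the datasets to include.

    datasets: iterable of dict-like items with 'name' and 'type' keys.
    types:    optional collection of dataset types to keep; None keeps all.
    """
    wanted = None if types is None else set(types)
    return [
        {"name": ds["name"]}
        for ds in datasets
        if wanted is None or ds["type"] in wanted
    ]


# Hypothetical sample standing in for the result of proj.list_datasets()
sample = [
    {"name": "customers", "type": "UploadedFiles"},
    {"name": "orders", "type": "PostgreSQL"},
    {"name": "events", "type": "Filesystem"},
]

all_entries = build_included_datasets(sample)
uploaded_only = build_included_datasets(sample, types=["UploadedFiles"])
print(all_entries)
print(uploaded_only)
```

In the script, the for loop could then be replaced by assigning build_included_datasets(proj.list_datasets()) to export_options['includedDatasetsData'], passing types=["UploadedFiles"] to keep the original behaviour.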


