Recipe zone manipulation
I would like to create a file (from Python) in a specific zone. The problem is, I could not find any way to read the zone attribute of an existing file from Python, let alone change/set that attribute to a different zone.
Best Answer
-
Turribeach
This sample shows how to deal with zones:
After your dataset = builder.create() statement you could have something like this:
# Create new Flow Zone
flow = project.get_flow()
zone = flow.create_zone("zone1")

# Move item to new flow zone
zone.add_item(dataset)
But this assumes you need to create a new zone every time, so you may need to customise that to move the new dataset into an existing zone instead. The zone data is stored in a different place than the dataset, so you have to create the dataset first and then move it.
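For completeness, a minimal sketch of the existing-zone variant, assuming a zone named "zone1" already exists in the Flow and that project and dataset are the objects from the code above:

# Look up an existing zone by name instead of creating one
flow = project.get_flow()
zone = next(z for z in flow.list_zones() if z.name == "zone1")

# Move the new dataset into that zone
zone.add_item(dataset)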
Answers
-
Turribeach
What do you mean by create a file in a specific zone? You can't have files in Flow Zones in Dataiku. You can have a Dataiku Managed Folder in a Flow Zone and create files inside that folder. Is that what you want?
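For reference, a minimal sketch of that managed-folder approach. The project, folder and zone names are placeholders, and it assumes the script runs inside DSS:

import dataiku

client = dataiku.api_client()
project = client.get_project('my_project_name')

# Create a managed folder and move it into a flow zone
folder = project.create_managed_folder('my_folder')
zone = project.get_flow().create_zone('zone1')
zone.add_item(folder)

# Upload a local file into the folder
with open('/tmp/report.csv', 'rb') as f:
    folder.put_file('report.csv', f)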
-
You're correct, I wasn't specific enough earlier. In reality, I have a recipe that sits in a certain zone. Within this recipe, I create some files in addition to the output files that are generated when the recipe runs. Currently everything, including the recipe and all the files, lands in the default zone regardless of where I initiated the process. That's why I'm looking for ways to modify the zone attribute.
-
Turribeach
Can you please post your Python code in a code block (click on the </> icon on the toolbar)?
-
import dataiku
import pyarrow as pa
import pyarrow.parquet as pq

client = dataiku.api_client()
project = client.get_project('my_project_name')

# Collect the names of all datasets in the project
dataset_names = [d['name'] for d in project.list_datasets()]

if 'my_dataset_name' in dataset_names:
    dataset = project.get_dataset('my_dataset_name')
    dataset.clear()
else:
    builder = project.new_managed_dataset('my_dataset_name')
    builder.with_store_into('my_storage_location', format_option_id='PARQUET_HIVE')
    dataset = builder.create()

# my_df (a pandas DataFrame) and s3 (a pyarrow S3 filesystem) are defined elsewhere
my_s3_path = 'something_s3' + '/my_project_name' + '/my_dataset_name'
pq.write_to_dataset(table=pa.Table.from_pandas(my_df), root_path=my_s3_path, filesystem=s3)
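To tie this back to the accepted answer: one way to make the new dataset land next to the recipe is to read the zone of an item that is already in the right place and move the new dataset there. This is a sketch, assuming a recent dataikuapi version where datasets expose get_zone() and move_to_zone(), and 'my_existing_dataset' is a hypothetical dataset that already sits in the target zone:

# After dataset = builder.create():
# Read the zone of a dataset that is already in the target zone
existing = project.get_dataset('my_existing_dataset')  # hypothetical name
zone = existing.get_zone()  # assumes recent dataikuapi with get_zone()
print(zone.id, zone.name)

# Move the newly created dataset into that same zone
dataset.move_to_zone(zone)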
-
Thank you very much. That was quite helpful.