I would like to create a file (from Python) in a specific zone. The problem is that I could not find any way to read the zone attribute of an existing file, let alone change/set that attribute to a different zone, from Python.
This sample shows how to deal with zones:
After your dataset = builder.create() statement you could have something like this:
# Create new Flow Zone
flow = project.get_flow()
zone = flow.create_zone("zone1")
# Move item to new flow zone
zone.add_item(dataset)
But this assumes you need to create a new zone every time, so you may need to customise it to move the new dataset into an existing zone instead. The zone data is stored in a different place than the dataset, so you have to first create the dataset and then move it.
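A sketch of that customisation, which looks the zone up by name and only creates it if it does not exist yet. It assumes `flow.list_zones()` returns zone objects with a `name` attribute (as in dataikuapi); check the API docs for your DSS version:

```python
def get_or_create_zone(flow, zone_name):
    """Return the flow zone with the given name, creating it if absent."""
    for zone in flow.list_zones():
        if zone.name == zone_name:
            return zone
    return flow.create_zone(zone_name)

# Usage inside DSS (project/zone names are placeholders):
# flow = project.get_flow()
# zone = get_or_create_zone(flow, "zone1")
# zone.add_item(dataset)
```

This way the code is safe to re-run: the second run finds the existing zone instead of creating a duplicate.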
What do you mean by creating a file in a specific zone? You can't have files in Flow Zones in Dataiku. You can have a Dataiku Managed Folder in a Flow Zone and create files inside that folder. Is that what you want?
You're correct. I wasn't specific enough earlier. In reality, I have a recipe that exists in a certain zone. Within this recipe, I create some files in addition to the output files that are generated upon starting the recipe. Currently, everything, including the recipe and all the files, land in the default zone regardless of where I initiated the process. This is why I'm seeking ways to modify the zone attribute.
Can you please post your Python code in a code block (click on the </> icon on the toolbar)?
import dataiku
import pyarrow as pa
import pyarrow.parquet as pq

client = dataiku.api_client()
project = client.get_project('my_project_name')

# Collect the names of all existing datasets in the project
dataset_list = project.list_datasets()
dataset_names = []
for dataset in dataset_list:
    dataset_names.append(dataset['name'])

# Reuse the dataset if it already exists, otherwise create a managed one
if 'my_dataset_name' in dataset_names:
    dataset = project.get_dataset('my_dataset_name')
    dataset.clear()
else:
    builder = project.new_managed_dataset('my_dataset_name')
    builder.with_store_into('my_storage_location', format_option_id='PARQUET_HIVE')
    dataset = builder.create()

# Write the dataframe directly to S3 as a Parquet dataset
my_s3_path = 'something_s3' + '/my_project_name' + '/my_dataset_name'
pq.write_to_dataset(table=pa.Table.from_pandas(my_df),
                    root_path=my_s3_path,
                    filesystem=s3,
                    )
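To drop the freshly created dataset into the same zone as an existing flow item (for example one of the recipe's other outputs) rather than into a new zone, one option is to locate that zone first. This is only a sketch: it assumes each flow zone exposes an `items` collection whose entries carry an `id` attribute, which is how dataikuapi's `DSSFlowZone` behaves in recent versions, but verify against the docs for your DSS release:

```python
def find_zone_of_item(flow, item_id):
    """Return the first flow zone containing an item with the given id, or None."""
    for zone in flow.list_zones():
        if any(getattr(item, "id", None) == item_id for item in zone.items):
            return zone
    return None

# Usage inside DSS (names are placeholders):
# flow = project.get_flow()
# zone = find_zone_of_item(flow, "my_existing_dataset")
# if zone is not None:
#     zone.add_item(dataset)
```

Returning `None` when no zone matches lets the caller fall back to the default zone (or create one) instead of raising.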
Thank you very much. That was quite helpful.