recipe zone manipulation

Solved!
ThPr

I would like to create a file (from Python) in a specific zone. The problem is that I could not find any way to read the zone attribute of an existing file, let alone change or set that attribute to a different zone, from Python.

Turribeach

What do you mean by creating a file in a specific zone? You can't have files in Flow Zones in Dataiku. You can have a Dataiku Managed Folder in a Flow Zone and create files inside that folder. Is that what you want?

ThPr

You're correct. I wasn't specific enough earlier. In reality, I have a recipe that exists in a certain zone. Within this recipe, I create some files in addition to the output files that are generated when the recipe starts. Currently, everything, including the recipe and all the files, lands in the default zone regardless of where I initiated the process. This is why I'm looking for a way to modify the zone attribute.

Turribeach

Can you please post your Python code in a code block (click on the </> button on the toolbar)?

ThPr
import dataiku
import pyarrow as pa
import pyarrow.parquet as pq

# my_df (a pandas DataFrame) and s3 (a pyarrow-compatible S3 filesystem)
# are assumed to be defined earlier in the recipe.
client = dataiku.api_client()
project = client.get_project('my_project_name')

# Reuse the dataset if it already exists; otherwise create it as a managed dataset.
dataset_names = [d['name'] for d in project.list_datasets()]
if 'my_dataset_name' in dataset_names:
    dataset = project.get_dataset('my_dataset_name')
    dataset.clear()
else:
    builder = project.new_managed_dataset('my_dataset_name')
    builder.with_store_into('my_storage_location', format_option_id='PARQUET_HIVE')
    dataset = builder.create()

# Write the DataFrame as Parquet files directly to the dataset's S3 location.
my_s3_path = 'something_s3' + '/my_project_name' + '/my_dataset_name'
pq.write_to_dataset(
    table=pa.Table.from_pandas(my_df),
    root_path=my_s3_path,
    filesystem=s3,
)
Turribeach

This sample shows how to deal with zones:

https://developer.dataiku.com/latest/concepts-and-examples/flow.html#creating-a-zone-and-adding-item...

After your dataset = builder.create() statement you could have something like this:

# Create new Flow Zone
flow = project.get_flow()
zone = flow.create_zone("zone1")

# Move item to new flow zone
zone.add_item(dataset)

This assumes you want to create a new zone every time, so you may need to customise it to move the new dataset into an existing zone instead. Zone membership is stored separately from the dataset itself, so you have to create the dataset first and then move it.
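
For the existing-zone case, something along these lines might work. It is only a sketch: the helper names `get_or_create_zone` and `move_to_zone_of` are made up for illustration, and it assumes the flow object returned by `project.get_flow()` exposes `list_zones()`, `create_zone()` and `get_zone_of_object()`, and that zones have a `name` property and an `add_item()` method, as in the Flow Zones section of the developer guide linked above. I have not tested it against your instance.

```python
def get_or_create_zone(flow, zone_name):
    """Return the Flow Zone called zone_name, creating it only if it is missing.

    `flow` is assumed to be the object returned by project.get_flow().
    """
    for zone in flow.list_zones():
        if zone.name == zone_name:
            return zone
    return flow.create_zone(zone_name)


def move_to_zone_of(flow, new_item, reference_item):
    """Move new_item into whichever zone currently holds reference_item.

    reference_item could be, for example, an existing dataset that already
    sits in the zone you launched the recipe from (an assumption about your
    flow). Items that are in no specific zone are reported as belonging to
    the default zone.
    """
    zone = flow.get_zone_of_object(reference_item)
    zone.add_item(new_item)
    return zone


# Hypothetical usage after `dataset = builder.create()`:
#   flow = project.get_flow()
#   get_or_create_zone(flow, "zone1").add_item(dataset)
# or, to follow an existing dataset into its zone:
#   move_to_zone_of(flow, dataset, project.get_dataset("my_input_dataset"))
```

The second helper also covers the original question about reading the zone of an existing item: `get_zone_of_object()` returns the zone, and you can inspect its `name` before deciding where to move the new dataset.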

ThPr

Thank you very much. That was quite helpful.