I would like to create a file (from Python) in a specific zone. The problem is that I could not find any way to read the zone attribute of an existing file, let alone change/set that attribute to a different zone, from Python.
This sample shows how to deal with zones:
After your dataset = builder.create() statement you could have something like this:
# Create new Flow Zone
flow = project.get_flow()
zone = flow.create_zone("zone1")
# Move item to new flow zone
zone.add_item(dataset)
But this assumes you need to create a new zone every time, so you may need to customise it to move the new dataset into an existing zone instead. The zone data is stored in a different place than the dataset, so you have to first create the dataset and then move it.
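A sketch of that customisation, which looks the zone up by name and only creates it if it does not exist yet. It assumes `flow.list_zones()` returns zone objects with a `name` attribute (as in dataikuapi); check the API docs for your DSS version:

```python
def get_or_create_zone(flow, zone_name):
    """Return the flow zone with the given name, creating it if absent."""
    for zone in flow.list_zones():
        if zone.name == zone_name:
            return zone
    return flow.create_zone(zone_name)

# Usage inside DSS (project/zone names are placeholders):
# flow = project.get_flow()
# zone = get_or_create_zone(flow, "zone1")
# zone.add_item(dataset)
```

This way the code is safe to re-run: the second run finds the existing zone instead of creating a duplicate.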
What do you mean by creating a file in a specific zone? You can't have files in Flow Zones in Dataiku. You can have a Dataiku Managed Folder in a Flow Zone and create files inside that folder. Is that what you want?
You're correct. I wasn't specific enough earlier. In reality, I have a recipe that exists in a certain zone. Within this recipe, I create some files in addition to the output files that are generated upon starting the recipe. Currently, everything, including the recipe and all the files, land in the default zone regardless of where I initiated the process. This is why I'm seeking ways to modify the zone attribute.
Can you please post your Python code in a code block (click on the </> icon on the toolbar)?
import dataiku
import pyarrow as pa
import pyarrow.parquet as pq

client = dataiku.api_client()
project = client.get_project('my_project_name')

# Collect the names of all existing datasets in the project
dataset_list = project.list_datasets()
dataset_names = []
for dataset in dataset_list:
    dataset_names.append(dataset['name'])

# Reuse the dataset if it already exists, otherwise create a managed one
if 'my_dataset_name' in dataset_names:
    dataset = project.get_dataset('my_dataset_name')
    dataset.clear()
else:
    builder = project.new_managed_dataset('my_dataset_name')
    builder.with_store_into('my_storage_location', format_option_id='PARQUET_HIVE')
    dataset = builder.create()

# Write the dataframe directly to S3 as a Parquet dataset
my_s3_path = 'something_s3' + '/my_project_name' + '/my_dataset_name'
pq.write_to_dataset(table=pa.Table.from_pandas(my_df),
                    root_path=my_s3_path,
                    filesystem=s3,
                    )
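To drop the freshly created dataset into the same zone as an existing flow item (for example one of the recipe's other outputs) rather than into a new zone, one option is to locate that zone first. This is only a sketch: it assumes each flow zone exposes an `items` collection whose entries carry an `id` attribute, which is how dataikuapi's `DSSFlowZone` behaves in recent versions, but verify against the docs for your DSS release:

```python
def find_zone_of_item(flow, item_id):
    """Return the first flow zone containing an item with the given id, or None."""
    for zone in flow.list_zones():
        if any(getattr(item, "id", None) == item_id for item in zone.items):
            return zone
    return None

# Usage inside DSS (names are placeholders):
# flow = project.get_flow()
# zone = find_zone_of_item(flow, "my_existing_dataset")
# if zone is not None:
#     zone.add_item(dataset)
```

Returning `None` when no zone matches lets the caller fall back to the default zone (or create one) instead of raising.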
Thank you very much. That was quite helpful.