recipe zone manipulation

ThPr
ThPr Registered Posts: 4

I would like to create a file (from Python) in a specific zone. The problem is that I could not find any way to read the zone attribute of an existing file from Python, let alone change it to a different zone.

Best Answer

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,124 Neuron
    edited July 17 Answer ✓

    This sample shows how to deal with zones:

    https://developer.dataiku.com/latest/concepts-and-examples/flow.html#creating-a-zone-and-adding-items-in-it

    After your dataset = builder.create() statement, you could add something like this:

    # Create new Flow Zone
    flow = project.get_flow()
    zone = flow.create_zone("zone1")
    
    # Move item to new flow zone
    zone.add_item(dataset)

    However, this assumes you want to create a new zone every time, so you may need to adapt it to move the new dataset into an existing zone instead. Zone membership is stored separately from the dataset itself, so you have to create the dataset first and then move it.
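To reuse an existing zone instead of creating a new one on every run, one option is to look the zone up by name and only create it when it is missing. This is a sketch that assumes the dataikuapi flow objects behave as in the linked documentation: `flow.list_zones()` returns zone objects with a `name` property, and `flow.create_zone(name)` creates a new zone. The helper name `find_or_create_zone` is hypothetical.

```python
def find_or_create_zone(flow, zone_name):
    """Return the Flow Zone called zone_name, creating it only if none exists.

    `flow` is assumed to be a dataikuapi DSSProjectFlow (from project.get_flow()).
    """
    # Reuse the first zone whose display name matches
    for zone in flow.list_zones():
        if zone.name == zone_name:
            return zone
    # No match: create a new zone with that name
    return flow.create_zone(zone_name)
```

After `dataset = builder.create()`, this would be used as `zone = find_or_create_zone(project.get_flow(), "zone1")` followed by `zone.add_item(dataset)`, so repeated runs reuse the zone rather than creating another one.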

Answers

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,124 Neuron

    What do you mean by creating a file in a specific zone? You can't have files in Flow Zones in Dataiku. You can have a Dataiku Managed Folder in a Flow Zone and create files inside that folder. Is that what you want?
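If a managed folder is the right container, files can be written into it from Python code. A minimal sketch, assuming a `dataiku.Folder` handle and its `upload_data(path, data)` method for writing raw bytes; the helper name `upload_csv` and the folder and file names in the usage line are hypothetical.

```python
import csv
import io


def upload_csv(folder, path, header, rows):
    """Serialize rows as CSV and store them as `path` inside a managed folder.

    `folder` is assumed to be a dataiku.Folder handle, e.g. dataiku.Folder("my_folder"),
    whose upload_data(path, data) method writes raw bytes into the folder.
    """
    buf = io.StringIO()
    writer = csv.writer(buf)  # default dialect: comma-separated, "\r\n" line endings
    writer.writerow(header)
    writer.writerows(rows)
    folder.upload_data(path, buf.getvalue().encode("utf-8"))
```

For example, `upload_csv(dataiku.Folder("my_folder"), "report.csv", ["a", "b"], [[1, 2]])`. The folder itself sits in a Flow Zone like a dataset does, so, assuming `add_item` accepts a `DSSManagedFolder` handle in your DSS version, it can be moved with `zone.add_item(...)`.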

  • ThPr
    ThPr Registered Posts: 4

    You're correct, I wasn't specific enough earlier. In reality, I have a recipe that exists in a certain zone. Within this recipe, I create some files in addition to the output files that are generated when the recipe runs. Currently, everything, including the recipe and all the files, lands in the default zone regardless of where I started the process. This is why I'm looking for a way to modify the zone attribute.
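One way to handle this is to read the zone of an item that already sits where the outputs should go (the recipe, or one of its input datasets if the recipe itself landed in the default zone) and move the new items there. This sketch assumes dataikuapi objects expose `get_zone()` returning a `DSSFlowZone` with an `id` and an `add_item()` method; check that these exist in your DSS version. The helper name `move_next_to` is hypothetical.

```python
def move_next_to(reference, items):
    """Move each item into the Flow Zone that currently holds `reference`.

    `reference` and `items` are assumed to be dataikuapi objects exposing
    get_zone() (e.g. DSSDataset or DSSRecipe). Items already in that zone
    are skipped. Returns the list of items that were moved.
    """
    target = reference.get_zone()
    moved = []
    for item in items:
        # Only move items that are not already in the target zone
        if item.get_zone().id != target.id:
            target.add_item(item)
            moved.append(item)
    return moved
```

A call could look like `move_next_to(project.get_dataset("my_input_dataset"), [project.get_dataset("my_dataset_name")])`, where both dataset names are placeholders.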

  • Turribeach
    Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,124 Neuron

    Could you please post your Python code in a code block (click the </> button on the toolbar)?

  • ThPr
    ThPr Registered Posts: 4
    edited July 17
    import dataiku
    import pyarrow as pa
    import pyarrow.parquet as pq
    
    # Assumed to be defined earlier in the recipe (not shown in this post):
    #   my_df - the pandas DataFrame to write
    #   s3    - a pyarrow-compatible S3 filesystem object
    
    client = dataiku.api_client()
    project = client.get_project('my_project_name')
    
    # Reuse the dataset if it already exists, otherwise create it
    dataset_names = [d['name'] for d in project.list_datasets()]
    if 'my_dataset_name' in dataset_names:
        dataset = project.get_dataset('my_dataset_name')
        dataset.clear()
    else:
        builder = project.new_managed_dataset('my_dataset_name')
        builder.with_store_into('my_storage_location', format_option_id='PARQUET_HIVE')
        dataset = builder.create()  # new datasets land in the default zone
    
    # Write the DataFrame as Parquet under the dataset's S3 path
    my_s3_path = 'something_s3' + '/my_project_name' + '/my_dataset_name'
    pq.write_to_dataset(table=pa.Table.from_pandas(my_df),
            root_path=my_s3_path,
            filesystem=s3,
    )
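Applied to the code above, the zone move can go right after the create-or-reuse step, so that both a freshly created dataset and one left in the default zone by an earlier run end up in the intended zone. This is a sketch, not a tested DSS recipe: it assumes `DSSDataset.get_zone()` and `DSSFlowZone.add_item()` behave as described in the linked flow documentation, and that `zone` was obtained beforehand (for example from `project.get_flow().list_zones()`). The helper name `get_or_create_dataset` is hypothetical.

```python
def get_or_create_dataset(project, name, connection, zone):
    """Return a cleared or newly created managed dataset that sits in `zone`."""
    existing = {d["name"] for d in project.list_datasets()}
    if name in existing:
        dataset = project.get_dataset(name)
        dataset.clear()  # reuse the existing dataset, dropping its old data
    else:
        builder = project.new_managed_dataset(name)
        builder.with_store_into(connection, format_option_id="PARQUET_HIVE")
        dataset = builder.create()  # created in the default zone
    # Move the dataset only if it is not already in the target zone
    if dataset.get_zone().id != zone.id:
        zone.add_item(dataset)
    return dataset
```

With this, the branch in the post becomes `dataset = get_or_create_dataset(project, 'my_dataset_name', 'my_storage_location', zone)`, followed by the same Parquet write.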
  • ThPr
    ThPr Registered Posts: 4

    Thank you very much. That was quite helpful.
