I have a dataset that is partitioned by a descriptive column, and I want to apply a Python recipe to each partition, but I get this error:

Path does not exist: Error while connecting to dataset NLP_SARA.Gph1_2_data (partition *)

caused by: DataStoreIOException: Path does not exist in the dataset: '/*/'

Can someone please help?

  • Alexandru (Dataiker), edited July 17


    The wildcard "*" is not supported. To run something across all partitions, you need to explicitly list the discrete partitions to build.

    So instead of "*", you would pass something like Part1,Part2,Part3 (a comma-separated list of partition identifiers).
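    A minimal sketch of turning explicit partition values into a single spec string, assuming DSS reads discrete partitions as a comma-separated list (the format the scenario code in this answer passes to `build_dataset`). The helper `to_partition_spec` is hypothetical, not part of the Dataiku API; it simply refuses the unsupported "*" wildcard and values that would break the comma-separated format.

```python
def to_partition_spec(partitions):
    """Join discrete partition identifiers into one comma-separated spec.

    Hypothetical helper (not a Dataiku API). Assumes DSS accepts a
    comma-separated list of explicit partition values.
    """
    values = [str(p) for p in partitions]
    if not values:
        raise ValueError("no partitions given; list them explicitly")
    for value in values:
        if not value or value == "*":
            # The "*" wildcard is not supported, so reject it up front.
            raise ValueError(f"invalid partition identifier: {value!r}")
        if "," in value:
            # A comma inside a value would be read as a separator.
            raise ValueError(f"partition identifier contains a comma: {value!r}")
    return ",".join(values)


print(to_partition_spec(["Part1", "Part2", "Part3"]))  # Part1,Part2,Part3
```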

    To build all partitions, you first need their identifiers. To generate the list of all partitions, you can run the following in a notebook:

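    import dataiku
    # Read the partition identifiers of the dataset
    dataset = dataiku.Dataset("my_dataset_name")
    partitions = dataset.list_partitions()
    # Join them into one comma-separated spec, e.g. "Part1,Part2,Part3"
    partitions_str = ",".join(partitions)
    print(partitions_str)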

    To actually build all partitions, you can use a Scenario (for example, an "Execute Python code" step):

    import dataiku
    from dataiku.scenario import Scenario

    scenario = Scenario()
    dataset = dataiku.Dataset("split_input_dataset")
    partitions = dataset.list_partitions()  # get all partitions of the input dataset
    partitions_str = ",".join(partitions)  # concatenate them into one comma-separated spec
    # Build the output dataset for every listed partition
    scenario.build_dataset("split_output_dataset", partitions=partitions_str)
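    The scenario logic can also be wrapped in a small function so it can be checked outside DSS. This is a sketch under assumptions: `build_all_partitions` is a hypothetical helper, and it takes the listing and building steps as callables, so inside DSS you would pass `dataset.list_partitions` and `scenario.build_dataset` from the code above. It also guards against an empty partition list instead of sending an empty spec to the build.

```python
def build_all_partitions(list_partitions, build_dataset, output_name):
    """Build every partition of `output_name` found in the input dataset.

    Hypothetical wrapper (not a Dataiku API). Inside a DSS scenario, pass
    dataiku.Dataset("split_input_dataset").list_partitions and
    Scenario().build_dataset as the two callables.
    """
    # Deduplicate and sort so the spec is stable between runs.
    partitions = sorted(set(list_partitions()))
    if not partitions:
        # Fail loudly rather than building with an empty partition spec.
        raise RuntimeError("input dataset has no partitions to build")
    partitions_str = ",".join(partitions)
    build_dataset(output_name, partitions=partitions_str)
    return partitions_str


# Outside DSS, a fake build function just records what would be built.
calls = []


def fake_build(name, partitions=None):
    calls.append((name, partitions))


print(build_all_partitions(lambda: ["b", "a", "b"], fake_build, "out"))  # a,b
print(calls)  # [('out', 'a,b')]
```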

