caused by: DataStoreIOException: Path does not exist in the dataset: '/*/'

saraa1 Registered Posts: 13 ✭✭✭

Hello all,

I have a dataset that is partitioned by a descriptive column, and I want to apply a Python recipe to each partition, but I get this error:

Path does not exist: Error while connecting to dataset NLP_SARA.Gph1_2_data (partition *)

caused by: DataStoreIOException: Path does not exist in the dataset: '/*/'

Can someone please help?

Thank you very much!

Answers

  • Alexandru
    Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,209
    edited July 17

    Hi,

    The wildcard "*" is not supported here. To run something across all partitions, you need to explicitly list the discrete partitions to build.

    https://doc.dataiku.com/dss/latest/partitions/identifiers.html#ranges-specifications

    So instead of "*", the spec would be something like Part1/Part2/Part3, with one entry per partition.
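The explicit list can be assembled in plain Python before it is handed to DSS. This is a minimal sketch: `build_partition_spec` is a hypothetical helper, not part of the Dataiku API. It follows the separators used in this answer ("/" between values for the partition field, "," for the Scenario call) and rejects the "*" wildcard and empty values up front. Rejecting values that contain the separator is an extra assumption on my part, since such a value would be ambiguous once joined.

```python
from typing import Iterable


def build_partition_spec(partitions: Iterable[str], sep: str = "/") -> str:
    """Join explicit partition identifiers into one spec string.

    Hypothetical helper (not a Dataiku API). "*" is rejected because the
    wildcard is not accepted as a partition spec here.
    """
    values = list(partitions)
    if not values:
        raise ValueError("no partitions given")
    for value in values:
        if not value or value == "*":
            raise ValueError(f"invalid partition identifier: {value!r}")
        if sep in value:
            # Assumption: a value containing the separator would be ambiguous once joined.
            raise ValueError(f"partition {value!r} contains the separator {sep!r}")
    return sep.join(values)


print(build_partition_spec(["Part1", "Part2", "Part3"]))           # → Part1/Part2/Part3
print(build_partition_spec(["Part1", "Part2", "Part3"], sep=","))  # → Part1,Part2,Part3
```

In a notebook, the input list would typically come from `dataiku.Dataset("my_dataset_name").list_partitions()`, as in the notebook snippet in this answer.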

    If you want to build all partitions initially, you can do so from a Scenario, for example.

    To generate the list of all partitions, you can run the following in a notebook. It prints a "/"-separated string that you can use in place of "*":

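    import dataiku

    # Replace "my_dataset_name" with the name of your partitioned dataset
    dataset = dataiku.Dataset("my_dataset_name")
    partitions = dataset.list_partitions()  # list of partition identifiers, e.g. ["Part1", "Part2"]
    partitions_str = "/".join(partitions)   # e.g. "Part1/Part2"
    print(partitions_str)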

    To actually build all partitions, you can use Python code in a Scenario:

    from dataiku.scenario import Scenario
    import dataiku

    scenario = Scenario()

    dataset = dataiku.Dataset("split_input_dataset")
    partitions = dataset.list_partitions()  # get all partitions of the input dataset
    partitions_str = ",".join(partitions)   # comma-separated list of partitions

    # Build the matching partitions of the output dataset
    scenario.build_dataset("split_output_dataset", partitions=partitions_str)
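If you would rather build each partition as its own job, for example so that one failing partition does not stop the others, the loop can be factored out. This is a minimal sketch: `build_each_partition` and `fake_build` are hypothetical names, and the fake build function only exists so the example runs outside DSS.

```python
from typing import Callable, Iterable, List


def build_each_partition(build: Callable[[str], None],
                         partitions: Iterable[str]) -> List[str]:
    """Run `build` once per partition and return the partitions that failed."""
    failed: List[str] = []
    for partition in partitions:
        try:
            build(partition)
        except Exception as exc:
            # Log and continue, so one failing partition does not stop the others.
            print(f"build failed for partition {partition!r}: {exc}")
            failed.append(partition)
    return failed


# Local demo with a fake build function (no DSS needed): "Part2" fails.
def fake_build(partition: str) -> None:
    if partition == "Part2":
        raise RuntimeError("simulated failure")


print(build_each_partition(fake_build, ["Part1", "Part2", "Part3"]))  # → ['Part2']
```

Inside a Python scenario, `build` could be `lambda p: scenario.build_dataset("split_output_dataset", partitions=p)`, with the partitions taken from `dataset.list_partitions()` as in the snippet above. This assumes `build_dataset` raises an exception when a build fails; check the scenario API documentation for how failures are actually reported. The trade-off is one job per partition, which is usually slower than a single build with a comma-separated list.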
    
