caused by: DataStoreIOException: Path does not exist in the dataset: '/*/'
Hello all,
I have a dataset that is partitioned by a descriptive column. I want to apply a Python recipe to each partition, but I get this error:
Path does not exist: Error while connecting to dataset NLP_SARA.Gph1_2_data (partition *)
caused by: DataStoreIOException: Path does not exist in the dataset: '/*/'
Can someone please help?
Thank you very much!
Answers
Alexandru (Dataiker)
Hi,
The wildcard "*" is not supported. To run something across all partitions, you need to explicitly list the discrete partitions to build.
https://doc.dataiku.com/dss/latest/partitions/identifiers.html#ranges-specifications
So the partition spec would be something like Part1/Part2/Part3... instead of "*".
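To make the "list them explicitly" idea concrete, here is a small sketch that turns a list of discrete partition values into a spec string in the Part1/Part2/Part3 form above. `partition_spec` is a hypothetical helper (not part of the Dataiku API); it simply refuses a "*" value, since the wildcard is not expanded, and uses "/" as the separator as in the example.

```python
def partition_spec(values, sep="/"):
    """Join explicit discrete partition values into one spec string.

    Hypothetical helper: "*" is not supported as a partition value,
    so any value containing it is rejected instead of passed through.
    """
    values = [str(v) for v in values]
    if not values:
        raise ValueError("no partition values given")
    for v in values:
        if "*" in v:
            raise ValueError(f"wildcards are not supported: {v!r}")
        if sep in v:
            raise ValueError(f"value {v!r} contains the separator {sep!r}")
    return sep.join(values)


print(partition_spec(["Part1", "Part2", "Part3"]))  # Part1/Part2/Part3
```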
If you want to build all partitions initially, you can do so from a Scenario, for example.
To generate the list of all partitions, you can run the following in a notebook:
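import dataiku

# Replace with the name of your partitioned dataset
dataset = dataiku.Dataset("my_dataset_name")
# List every partition identifier of the dataset
partitions = dataset.list_partitions()
# Join them into an explicit spec such as Part1/Part2/Part3
partitions_str = '/'.join(partitions)
print(partitions_str)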
To actually build all partitions, you can use a Python step in a Scenario:
from dataiku.scenario import Scenario
import dataiku

scenario = Scenario()
dataset = dataiku.Dataset("split_input_dataset")
partitions = dataset.list_partitions()  # get all partitions from the input dataset
partitions_str = ','.join(partitions)  # concatenate them into a comma-separated spec
# Build the output dataset for all of those partitions
scenario.build_dataset("split_output_dataset", partitions=partitions_str)
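Since the original goal was to run the recipe for each partition, the same `scenario.build_dataset` call can also be made once per partition instead of once with the whole comma-joined list. This is only a sketch: `build_each_partition` is a hypothetical helper, and the build function is passed in as a parameter so the loop can be tried outside DSS with a stand-in (inside a scenario you would pass `scenario.build_dataset`).

```python
def build_each_partition(build_dataset, dataset_name, partitions):
    """Call build_dataset once per partition id and return the ids built.

    `build_dataset` is assumed to accept (dataset_name, partitions=...),
    like scenario.build_dataset in the Scenario example.
    """
    built = []
    for partition_id in partitions:
        # One build call per partition instead of a single joined spec
        build_dataset(dataset_name, partitions=partition_id)
        built.append(partition_id)
    return built


# Inside a DSS scenario this would be, for example:
#   scenario = Scenario()
#   parts = dataiku.Dataset("split_input_dataset").list_partitions()
#   build_each_partition(scenario.build_dataset, "split_output_dataset", parts)

# Stand-in build function that only records the calls it receives
calls = []

def fake_build(name, partitions=None):
    calls.append((name, partitions))

build_each_partition(fake_build, "split_output_dataset", ["Part1", "Part2"])
print(calls)
```

Each call is a separate build job, so expect this to be slower than the single comma-joined build above; it is mainly useful when you want the partitions processed one at a time.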