Ignoring input subfolder
Hi All,
We are trying to load a dataset that is updated daily, and we are using partitioning to do this.
However, the input folder contains a subfolder, _delta_log, which we want to ignore. The required data is in the date subfolders (see screenshot).
In the advanced options tab I see that it is possible to exclude files, but I cannot find any documentation on this subject. Can someone help me with an expression to ignore the _delta_log subfolder and all its contents?
Operating system used: Windows
Best Answer
-
Ignacio_Toledo Dataiku DSS Core Designer, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 415 Neuron
I think for the expression, you just need
_delta_log
dropping the single quotes that I used in the comment (which were to indicate the expression only).
You could also try:
_delta_log/*
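If it helps to test candidates before putting them in the dataset settings, here is a minimal sketch using Python's fnmatch. This is Python's glob matcher, not the one DSS uses, and I am assuming the rule sees a path with a leading slash; the file names and the kept_paths helper are made up for illustration. It shows why a pattern that does not cover the whole path may leave the _delta_log files in.

```python
from fnmatch import fnmatchcase

# Hypothetical listing; the leading "/" is an assumption about how paths
# are presented to the exclusion rule.
PATHS = [
    "/_delta_log/00000000000000000000.json",
    "/2023-01-01/part-00000.parquet",
    "/2023-01-02/part-00000.parquet",
]


def kept_paths(paths, pattern):
    """Return the paths that the glob pattern does NOT exclude."""
    return [p for p in paths if not fnmatchcase(p, pattern)]


if __name__ == "__main__":
    # Compare a few candidate exclusion patterns side by side.
    for pattern in ("_delta_log", "_delta_log/*", "/_delta_log/*", "*_delta_log*"):
        print(f"{pattern!r} keeps {kept_paths(PATHS, pattern)}")
```

With this matcher the pattern has to match the whole path string, so only the last two candidates drop the log file. Whether DSS matches against the full path, a relative path, or just the file name is something to confirm by trying the variants.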
Hope this helps
Answers
-
Ignacio_Toledo
Hi @KevinHart
I believe you are talking about this advanced option?
It means you have to give either a glob or a regex expression to set up the rule. I'm not an expert on these kinds of expressions (I usually test by trial and error, or with some kind of online tool), but if the only folder you need to avoid is _delta_log, then select Glob and add '_delta_log'.
Hope that helps.
-
Hi @Ignacio_Toledo,
Many thanks for your reply!
Unfortunately, this did not help.
The sample file is still a file in the _delta_log subfolder, see screenshot. Perhaps a regex would work better in this case indeed.
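For the regex route, a candidate expression can be checked offline first. The sketch below uses Python's re module, which is not necessarily the regex engine DSS uses, and the pattern names and sample paths are hypothetical. It covers two cases, because I do not know whether the exclusion rule searches within the path or requires the whole path to match.

```python
import re

# For matchers that look for the pattern anywhere in the path:
# "_delta_log" must be a complete path segment.
DELTA_LOG_ANYWHERE = re.compile(r"(^|/)_delta_log(/|$)")

# For matchers that require the whole path to match the pattern.
DELTA_LOG_WHOLE = re.compile(r"(.*/)?_delta_log(/.*)?")


def is_excluded_search(path):
    """True if the path contains a _delta_log segment (search semantics)."""
    return DELTA_LOG_ANYWHERE.search(path) is not None


def is_excluded_fullmatch(path):
    """True if the entire path matches the pattern (full-match semantics)."""
    return DELTA_LOG_WHOLE.fullmatch(path) is not None


if __name__ == "__main__":
    # Hypothetical paths, including a look-alike folder that should be kept.
    for path in (
        "/_delta_log/00000000000000000000.json",
        "/2023-01-01/part-00000.parquet",
        "/my_delta_log_backup/notes.json",
    ):
        print(path, is_excluded_search(path), is_excluded_fullmatch(path))
```

Both patterns treat _delta_log as a complete folder name, so a folder such as my_delta_log_backup would not be excluded by accident.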
-