Using Dataiku

Hi all,
I'm working on a Python recipe to automate file validation in Dataiku using managed folders. My goal is to:
- Scan a "validation" folder for Excel or CSV files.
- Check that they contain the exact column headers that I defined.
- Route them to either an "inprogress" or "rejected" folder based on the result.
I’m using dataiku.Folder(...).list_paths_in_partition() and get_download_stream() to read files, but even correctly formatted .xlsx files seem to end up in the rejected folder. My code tries to read the files with pandas.read_excel() and falls back to read_csv() if needed.
Despite this, files are consistently rejected with read errors, even though they open fine in Excel.
Has anyone successfully implemented this kind of folder-based validation workflow? Are there any known issues with pandas.read_excel() in Dataiku, or is there a better pattern?
Any examples, insights, or debugging tips would be greatly appreciated!
Thanks in advance!
Answers
Turribeach (Neuron):
Paste your read/validation Python code and a sample XLS file that fails validation. You can remove real data and leave just dummy data.
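
For reference, here is a minimal sketch of the workflow described in the question. It is not the original poster's code. The expected headers and the helper names (`headers_match`, `read_table`, `route`, `run`) are hypothetical, and the folder names "validation", "inprogress" and "rejected" are assumed to be usable as managed folder lookups. One thing worth ruling out, offered as an assumption rather than a diagnosis: the object returned by get_download_stream() may not be seekable, while pandas.read_excel() needs a seekable file-like object. The sketch therefore reads the whole file into memory and wraps it in io.BytesIO before parsing, and it prints the actual exception instead of rejecting silently.

```python
import io

# Hypothetical header list; replace with the columns you actually require.
EXPECTED_HEADERS = ["id", "name", "amount"]


def headers_match(columns, expected=EXPECTED_HEADERS):
    """Return True only if the column names equal `expected`, in order."""
    return [str(c) for c in columns] == list(expected)


def read_table(raw, path):
    """Parse raw file bytes into a DataFrame, choosing the reader by extension."""
    # Imported here so the pure helpers can be exercised without pandas installed.
    import pandas as pd

    buffer = io.BytesIO(raw)  # seekable in-memory copy of the file
    if path.lower().endswith((".xlsx", ".xls")):
        return pd.read_excel(buffer)
    return pd.read_csv(buffer)


def route(path, raw):
    """Return the name of the destination folder for one file."""
    try:
        df = read_table(raw, path)
    except Exception as exc:
        # Surface the real read error; it usually points at the cause.
        print(f"{path}: read failed: {exc!r}")
        return "rejected"
    return "inprogress" if headers_match(df.columns) else "rejected"


def run():
    """Scan the source folder and copy each file to its destination."""
    # Only available inside DSS, hence the local import.
    import dataiku

    source = dataiku.Folder("validation")
    targets = {
        "inprogress": dataiku.Folder("inprogress"),
        "rejected": dataiku.Folder("rejected"),
    }
    for path in source.list_paths_in_partition():
        with source.get_download_stream(path) as stream:
            raw = stream.read()  # read everything before handing it to pandas
        targets[route(path, raw)].upload_data(path, raw)
```

Calling `run()` from the Python recipe would process every file once. If the printed exception mentions a missing Excel engine such as openpyxl, the recipe's code environment is the more likely culprit than the file itself.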