Best choice: replacing datasets in a flow (conformity of deposited datasets)

Kevin_dataiku67
Level 1

Hello all,

Thanks in advance for your answers.

Let me explain my problem. I will have several Excel datasets that need to be joined together before further processing (no problem with the join recipes and the rest).

However, these datasets will have to be replaced every month and go through the same processing (same flow). One solution would be to create a folder for each dataset upstream of the flow, use Python to check the conformity of the files deposited in each folder, and then run the joins and the processing (calculations).
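To give an idea, this is roughly the kind of check I have in mind, as a Python recipe or scenario step (just a sketch using the Dataiku managed-folder API, with made-up folder and file names):

import dataiku

# Hypothetical mapping: one managed folder per monthly input file
EXPECTED_FILES = {
    "sales_folder": "sales.xlsx",
    "clients_folder": "clients.xlsx",
}

for folder_name, expected_file in EXPECTED_FILES.items():
    folder = dataiku.Folder(folder_name)
    paths = folder.list_paths_in_partition()  # e.g. ["/sales.xlsx"]
    if "/" + expected_file not in paths:
        raise ValueError(
            "Folder %s: expected %s, found %s" % (folder_name, expected_file, paths)
        )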

Do you think it is possible to verify the datasets deposited in the folders without code? Or is it better to handle only that verification in code and do the rest with visual recipes?

I am looking for the best solution. Thank you very much!

2 Replies
AlexT
Dataiker

Hi @Kevin_dataiku67,

Have you considered using partitioning here, e.g. a partitioned dataset or folder in this case? Each file would be added under a path such as YYYY-MM, and you could then build your flow for the new monthly partition.

https://doc.dataiku.com/dss/latest/partitions/fs_datasets.html
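For example, if the monthly files land under paths like 2024-01/sales.xlsx, a time-based partitioning pattern along the lines of %Y-%M/.* (a single month-level time dimension) should map each monthly drop to its own partition; the exact pattern syntax is described in the doc above.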

Kevin_dataiku67
Level 1
Author

Thank you for your feedback, Alex! That's great, I didn't know you could do that!

But how can I check that the file dropped in the folder is valid? (i.e. that the user does not drop just anything in place of the initial dataset to be replaced)

In my opinion, I would have to check the column names via Python, no?
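Something along these lines is what I mean, just a rough sketch assuming pandas can read the Excel file and using made-up names:

import io

import dataiku
import pandas as pd

EXPECTED_COLUMNS = {"client_id", "date", "amount"}  # hypothetical expected schema

folder = dataiku.Folder("sales_folder")        # hypothetical folder name
path = folder.list_paths_in_partition()[0]     # assume a single deposited file

# Read only the header row of the Excel file to compare column names
with folder.get_download_stream(path) as stream:
    header = pd.read_excel(io.BytesIO(stream.read()), nrows=0)

missing = EXPECTED_COLUMNS - set(header.columns)
if missing:
    raise ValueError("File %s is missing expected columns: %s" % (path, sorted(missing)))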

Thanks for your expertise!
