Hello everyone, I have the following problem: I have built a flow with multiple input datasets. Now I want to run it with different scenarios (different input data). To swap the data, I open the uploaded dataset and change it under Settings, but this takes a long time. Isn't there a simpler way to replace the data?

I also ran into another issue: my input data was updated and a new column was added. The new column is not recognized in my flow and is not displayed, even though it is visible in the uploaded input file. I am looking for a solution that does not use Python.
Where is the input data stored (what technology, what kind of connection)? When you say "different input data", what exactly do you mean? What is actually changing here: table names, file names, etc.?
Currently, we are using Excel files that are stored locally on a drive. In the Excel files a new column has been added; the rest of the columns have remained the same.
Well, first you need to move away from Excel files, which are a format for human users, and move to something like CSV, which is a computer-oriented format. You can automate the extraction of CSV from your XLS files using a VBA macro.

Then you need to move your files to a shared storage layer. This could be a network drive you can mount in Dataiku or a cloud storage bucket. On this storage layer you would create a Dataiku folder and use the Files in Folder dataset to read the files dynamically. I posted about some hidden features of the Files in Folder dataset here. You can point more than one Files in Folder dataset at a single Dataiku folder; basically, use one Files in Folder dataset per type of file you load. The file names can be dynamic, because you can set a pattern that selects which files to read from the folder. In other words, your whole flow could run on newly added files without you having to modify any recipe.
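To illustrate the first step, here is a minimal sketch of such a VBA macro. It assumes the workbook that contains the data is the one running the macro, and it writes one CSV per worksheet next to the workbook; the macro and file names are illustrative, so adapt the output path to point at your shared storage layer.

```vba
' Sketch: export every worksheet of this workbook to its own CSV file
' in the workbook's folder. Adjust basePath to your shared storage.
Sub ExportSheetsToCsv()
    Dim ws As Worksheet
    Dim basePath As String
    basePath = ThisWorkbook.Path & Application.PathSeparator

    Application.ScreenUpdating = False
    For Each ws In ThisWorkbook.Worksheets
        ws.Copy  ' copies the sheet into a new single-sheet workbook
        ActiveWorkbook.SaveAs _
            Filename:=basePath & ws.Name & ".csv", _
            FileFormat:=xlCSV, _
            CreateBackup:=False
        ActiveWorkbook.Close SaveChanges:=False
    Next ws
    Application.ScreenUpdating = True
End Sub
```

You could then schedule this macro (or trigger it on save) so that fresh CSVs land in the folder your Files in Folder dataset reads from, and a new column in the Excel file simply appears as a new column in the exported CSV.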