Excel (xlsx) file with multiple sheets as input in flow
Hi,
I have created a flow which has excel file xlsx as input dataset.
In single excel file I have multiple sheet.
Based on multiple sheets I have created 5 different dataset and completed my flow.
The problem here is I want to upload files on monthly basis.
So every time manually I want to remove the existing file and upload the new file to proceed further which is not automated and progressive one.
Is there any way that i can run my flow by passing input file name as parameter based on that it can read file from the folder and use the existing dataset to run the flow.
Use case.
Input file: single excel file with multiple sheets
Upload monthly basis.
Create and store the data into table for each sheet seperately.
Need to automate the process read the input file from folder
Can you provide me the steps as solution.
Answers
-
AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
Hi Sushil,
There are a few ways of doing this depending on how automated you want this and how you are uploading the files.
Firstly, you can use Scenarios (https://knowledge.dataiku.com/latest/courses/automation/scenarios-summary.html) which will run parts of a flow and can be triggered manually, by a dataset changing, or based on time.
This will be a good option if you are just running the process after overwriting an existing uploaded dataset.
If you want to use a managed folder and a variable (https://knowledge.dataiku.com/latest/kb/o16n/variables/index.html) to name the file then manually start the scenario, you can do this too.
Step 1 - create a new variable for your filename in the project variables:
Step 2 - replace the 'Files to Include' field in the 'Create dataset from folder' step with a variable.
Anytime you load a new file to the folder you will just have to update the variable nam and you can then select 'build all' or run a Scenario to build the flow from this point.
After you have done all these - if you want to package it all up nicely you may want to consider converting this into an Application (https://knowledge.dataiku.com/latest/kb/o16n/dataiku-applications/index.html) . This will give you a GUI to do things like uploading a Folder, changing the variables, and running the Scenario (and even showing dashboards or downloading result files).
Hope that helps - let me know if you get stuck or need more detail.
-
Hi,
Thanks for detailed steps.
I have used the managed folder option and by using variable name i replicated the steps fl provided by you.
When running the build all, the job is getting completed but there was no changes when I see on summary.
I need to go to settings and do the test means, it's showing the file name when I save manually then only reflecting.
I was struck here why it was automatically not reflecting when ever the file name changes in variable and execute the flow by using build all option means.. .
Why it's need to save manually in settings to reflect the changes
Can you help on these
-
AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
Because of the file type, you need to make sure you have 'Force Build All Dependencies' selected, otherwise it will just try to build only where it's missing files.
You shouldn't have to go into the Settings and click Test for anything other than the preview.
The Force Build Dependencies will also be needed if you are building a Scenario
-
Hi,
Thanks for quick response.
If the flow is having any data preparation after that means it's working fine when we select force rebuild and chages are reflecting on explore screen.
The problem is when we go to dataset which is just before to dataprepation recipe, in that when click on explore the changes is not reflecting in screen.
Manually we need to go to settings and go to format and preview tab and click on preview and save.
Then only it reflecting the changes in explore screen. Till that it show only last fetch data in screen.
-
AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker
I assume this won't matter too much if you are automating the process?
If it does then can you try going into the dataset and selecting 'Configure Sample' and checking the 'Auto refresh sample' checkbox at the bottom? See if that helps.