Excel (xlsx) file with multiple sheets as input in flow

SUSHIL
Level 3
Excel (xlsx) file with multiple sheets as input in flow

Hi, 

I have created a flow which has  excel file xlsx as input dataset. 

In single excel file I have multiple sheet. 

Based on multiple sheets I have created 5 different dataset  and completed my flow. 

The problem here is I want to  upload files on monthly basis. 

So every time manually I want to remove the existing file and upload the new file to proceed further which is not automated and progressive one. 

Is there any way that i can run my flow by passing input file name as parameter based on that it can read file from the folder and use the existing dataset to run the flow. 

Use case. 

Input file: single excel file with multiple sheets

Upload monthly basis. 

Create and store the data into table for each sheet  seperately. 

Need to automate the process  read the input file from folder

Can you provide me the steps as solution. 

0 Kudos
5 Replies
AlexGo
Dataiker

Hi Sushil,

There are a few ways of doing this depending on how automated you want this and how you are uploading the files. 

Firstly, you can use Scenarios (https://knowledge.dataiku.com/latest/courses/automation/scenarios-summary.html) which will run parts of a flow and can be triggered manually, by a dataset changing, or based on time.

This will be a good option if you are just running the process after overwriting an existing uploaded dataset.

 

If you want to use a managed folder and a variable (https://knowledge.dataiku.com/latest/kb/o16n/variables/index.html) to name the file then manually start the scenario, you can do this too.

Step 1 - create a new variable for your filename in the project variables:

Screen Shot 2022-04-12 at 3.40.43 PM.png

Step 2 - replace the 'Files to Include' field in the 'Create dataset from folder' step with a variable.

Screen Shot 2022-04-12 at 3.39.09 PM.png

Anytime you load a new file to the folder you will just have to update the variable nam and you can then select 'build all' or run a Scenario to build the flow from this point. 

 

After you have done all these - if you want to package it all up nicely you may want to consider converting this into an Application (https://knowledge.dataiku.com/latest/kb/o16n/dataiku-applications/index.html) . This will give you a GUI to do things like uploading a Folder, changing the variables, and running the Scenario (and even showing dashboards or downloading result files).  

Screen Shot 2022-04-12 at 3.52.15 PM.png

 

Hope that helps - let me know if you get stuck or need more detail.

0 Kudos
SUSHIL
Level 3
Author

Hi, 

Thanks for detailed steps. 

I have used the managed folder option and by  using variable name i replicated the steps fl provided by you. 

When running the build all, the job is getting completed but there was no changes when I see on summary. 

I need to go to settings and do the test means, it's showing the file name when I save manually then only reflecting. 

I was struck  here why it was automatically not reflecting when ever the file name changes in variable and execute the flow by using build all option means.. . 

Why it's need to save manually in settings to reflect the changes

Can you help on these

0 Kudos
AlexGo
Dataiker

Because of the file type, you need to make sure you have 'Force Build All Dependencies' selected, otherwise it will just try to build only where it's missing files.

Screen Shot 2022-04-13 at 9.34.54 AM.png

 

You shouldn't have to go into the Settings and click Test for anything other than the preview.

The Force Build Dependencies will also be needed if you are building a Scenario

Screen Shot 2022-04-13 at 9.34.12 AM.png

 

0 Kudos
SUSHIL
Level 3
Author

Hi, 

Thanks for quick response. 

If the flow is having any data preparation after that means it's working fine when we select force rebuild and chages are reflecting on explore screen. 

The problem is when we go to dataset which is just before to dataprepation recipe, in that when click on explore the changes is not reflecting in screen. 

Manually we need to go to settings and  go to format and preview  tab and click on preview and save. 

Then only it reflecting the changes in explore  screen. Till that it show only last fetch data in screen. 

 

 

0 Kudos
AlexGo
Dataiker

I assume this won't matter too much if you are automating the process?

If it does then can you try going into the dataset and selecting 'Configure Sample' and checking the 'Auto refresh sample' checkbox at the bottom? See if that helps.

Screen Shot 2022-04-13 at 10.28.11 AM.png

 

0 Kudos