Excel (xlsx) file with multiple sheets as input in flow

SUSHIL
SUSHIL Registered Posts: 22 ✭✭✭

Hi,

I have created a flow which has an Excel (xlsx) file as its input dataset.

This single Excel file contains multiple sheets.

Based on those sheets, I have created 5 different datasets and completed my flow.

The problem is that I want to upload a new file every month.

So every time I have to manually remove the existing file and upload the new one to proceed, which is neither automated nor scalable.

Is there any way I can run my flow by passing the input file name as a parameter, so that it reads the file from a folder and uses the existing datasets to run the flow?

Use case:

Input file: a single Excel file with multiple sheets

Uploaded on a monthly basis

Create and store the data into a separate table for each sheet

Need to automate the process of reading the input file from a folder

Can you provide the steps for a solution?
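For context, the per-sheet split I am after is roughly what pandas does with sheet_name=None; a minimal sketch (file and sheet names are just examples):

```python
import pandas as pd

def load_sheets(path: str) -> dict:
    """Read every sheet of an .xlsx workbook into its own DataFrame.

    With sheet_name=None, pandas returns a {sheet_name: DataFrame}
    dict with one entry per sheet, so each sheet can then be stored
    as a separate table.
    """
    return pd.read_excel(path, sheet_name=None)
```

Each value in the returned dict is one sheet's rows, ready to be written to its own table.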


Answers

  • AlexGo
    AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker

    Hi Sushil,

    There are a few ways of doing this depending on how automated you want this and how you are uploading the files.

    Firstly, you can use Scenarios (https://knowledge.dataiku.com/latest/courses/automation/scenarios-summary.html) which will run parts of a flow and can be triggered manually, by a dataset changing, or based on time.

    This will be a good option if you are just running the process after overwriting an existing uploaded dataset.

    If you want to use a managed folder and a variable (https://knowledge.dataiku.com/latest/kb/o16n/variables/index.html) to name the file then manually start the scenario, you can do this too.

    Step 1 - create a new variable for your filename in the project variables:

    [Screenshot: Screen Shot 2022-04-12 at 3.40.43 PM.png]

    Step 2 - replace the 'Files to Include' field in the 'Create dataset from folder' step with a reference to that variable.

    [Screenshot: Screen Shot 2022-04-12 at 3.39.09 PM.png]
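    As a sketch of the two steps above (assuming the variable is named input_filename - the name and file name here are just examples), the project variables and the 'Files to Include' field would look like:

    ```
    Project variables (Step 1):
    {
      "input_filename": "sales_2022_04.xlsx"
    }

    Files to Include (Step 2):
    ${input_filename}
    ```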

    Anytime you load a new file to the folder, you will just have to update the variable value, and you can then select 'Build all' or run a Scenario to build the flow from this point.

    After you have done all this - if you want to package it all up nicely, you may want to consider converting this into an Application (https://knowledge.dataiku.com/latest/kb/o16n/dataiku-applications/index.html). This will give you a GUI for things like uploading files to a Folder, changing the variables, and running the Scenario (and even showing dashboards or downloading result files).

    [Screenshot: Screen Shot 2022-04-12 at 3.52.15 PM.png]

    Hope that helps - let me know if you get stuck or need more detail.
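    And if you eventually want to drive this entirely from code instead of the UI, the public API client can update the variable and launch the Scenario for you. A minimal sketch, assuming a variable named input_filename - the host, API key, project key, and scenario id are all placeholders:

    ```python
    def with_filename(variables: dict, filename: str) -> dict:
        """Return a copy of the project variables dict ('standard'/'local')
        with the input_filename variable set in the 'standard' section."""
        updated = dict(variables)
        updated["standard"] = {**variables.get("standard", {}),
                               "input_filename": filename}
        return updated

    def run_monthly(host: str, api_key: str, project_key: str,
                    scenario_id: str, filename: str) -> None:
        # Requires the dataiku-api-client package and a reachable DSS node.
        import dataikuapi
        client = dataikuapi.DSSClient(host, api_key)
        project = client.get_project(project_key)
        # Point the ${input_filename} variable at this month's file...
        project.set_variables(with_filename(project.get_variables(), filename))
        # ...then run the Scenario that (force-)builds the flow.
        project.get_scenario(scenario_id).run_and_wait()
    ```

    This is the same variable-plus-Scenario pattern as above, just triggered from a script instead of the UI.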

  • SUSHIL
    SUSHIL Registered Posts: 22 ✭✭✭

    Hi,

    Thanks for detailed steps.

    I used the managed folder option and, using the variable name, replicated the steps you provided.

    When running 'Build all', the job completes, but I see no changes in the summary.

    I need to go to Settings and run the test: it shows the new file name, but the changes are only reflected after I save manually.

    I am stuck here: why are the changes not reflected automatically whenever the file name in the variable changes and I execute the flow using the 'Build all' option?

    Why does it need a manual save in Settings for the changes to be reflected?

    Can you help with this?

  • AlexGo
    AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker

    Because of the file type, you need to make sure you have 'Force Build All Dependencies' selected; otherwise it will only try to build where files are missing.

    [Screenshot: Screen Shot 2022-04-13 at 9.34.54 AM.png]

    You shouldn't have to go into the Settings and click Test for anything other than the preview.

    The 'Force Build All Dependencies' option will also be needed if you are building from a Scenario.

    [Screenshot: Screen Shot 2022-04-13 at 9.34.12 AM.png]

  • SUSHIL
    SUSHIL Registered Posts: 22 ✭✭✭

    Hi,

    Thanks for quick response.

    If the flow has any data preparation after that point, it works fine when we select force rebuild, and the changes are reflected on the Explore screen.

    The problem is with the dataset just before the data preparation recipe: when we click Explore, the changes are not reflected on the screen.

    We need to manually go to Settings, open the 'Format/Preview' tab, click Preview, and save.

    Only then are the changes reflected on the Explore screen; until then, it shows only the last fetched data.

  • AlexGo
    AlexGo Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 18 Dataiker

    I assume this won't matter too much if you are automating the process?

    If it does, can you try going into the dataset, selecting 'Configure sample', and checking the 'Auto refresh sample' checkbox at the bottom? See if that helps.

    [Screenshot: Screen Shot 2022-04-13 at 10.28.11 AM.png]
