Is it possible to extract multiple datasets from one workbook

Options
pyousefi
pyousefi Registered Posts: 10 ✭✭✭✭

I have a sizeable excel document with around 10 sheets with different columns and different data.

Is there any way for me to use the same backend uploaded file to extract different datasets or do I have to upload the Excel workbook each time?

Answers

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,595 Neuron
    Options

    @pyousefi

    Screenshot-2020-03-10-at-15.54.14

    I think there is a newish plug-in for this called Excel sheet importer. It is listed as working with DSS V7.0

    https://www.dataiku.com/product/plugins/excel-sheet-importer/

    I've not used this personally. However, it looks cool. Will this do what you need?

    I invite you to let us know how you get on with this. If you run into problems let us know.

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 317 Neuron
    Options

    In addition to the plug-in @tgb417
    suggested, the following approach is an option. Although it doesn't require a plug-in, it may be less flexible.

    Create a folder to upload the Excel file to (new Dataset, "Folder", store into filesystem folders). Upload the file to that folder.

    Then from the vertical 3 dots menu for the file (hover over the file to see it), choose Create a Dataset. Click the Test button and then click on the Format / Preview tab. You can choose the Excel tab here. Create the dataset. Repeat this process for other Excel tabs.

    Marlan

  • Diwei
    Diwei Registered Posts: 5 ✭✭✭✭
    Options

    Hi @pyousefi
    ,

    It is possible. Instead of creating plenty datasets, I personally prefere to extarct every worksheet in python directly.

    1. Create a folder XXXXXXX

    2. Drop the EXCEL.xlsx into it

    3. Take the below codes in the Python, then u are able to extract all the worksheets:

    ***********************************************************************

    # Read recipe inputs
    folder = dataiku.Folder("XXXXXXX")
    with folder .get_download_stream('EXCEL.xlsx') as file:
    data = file.read()
    df_1 =pd.read_excel(data,sheet_name='worksheet_1', engine='openpyxl')
    df_2 =pd.read_excel(data,sheet_name='worksheet_2', engine='openpyxl')

    ...

    ***********************************************************************

    u can add as many as u wish.

Setup Info
    Tags
      Help me…