Is it possible to extract multiple datasets from one workbook
I have a sizeable excel document with around 10 sheets with different columns and different data.
Is there any way for me to use the same backend uploaded file to extract different datasets or do I have to upload the Excel workbook each time?
Answers
-
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron
I think there is a newish plug-in for this called Excel sheet importer. It is listed as working with DSS V7.0
https://www.dataiku.com/product/plugins/excel-sheet-importer/
I've not used this personally. However, it looks cool. Will this do what you need?
I invite you to let us know how you get on with this. If you run into problems let us know.
-
Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 319 Neuron
In addition to the plug-in @tgb417
suggested, the following approach is an option. Although it doesn't require a plug-in, it may be less flexible.Create a folder to upload the Excel file to (new Dataset, "Folder", store into filesystem folders). Upload the file to that folder.
Then from the vertical 3 dots menu for the file (hover over the file to see it), choose Create a Dataset. Click the Test button and then click on the Format / Preview tab. You can choose the Excel tab here. Create the dataset. Repeat this process for other Excel tabs.
Marlan
-
Hi @pyousefi
,It is possible. Instead of creating plenty datasets, I personally prefere to extarct every worksheet in python directly.
1. Create a folder XXXXXXX
2. Drop the EXCEL.xlsx into it
3. Take the below codes in the Python, then u are able to extract all the worksheets:
***********************************************************************
# Read recipe inputs
folder = dataiku.Folder("XXXXXXX")
with folder .get_download_stream('EXCEL.xlsx') as file:
data = file.read()
df_1 =pd.read_excel(data,sheet_name='worksheet_1', engine='openpyxl')
df_2 =pd.read_excel(data,sheet_name='worksheet_2', engine='openpyxl')...
***********************************************************************
u can add as many as u wish.