Added on June 14, 2019 3:31PM
Hi,
If the columns differ across sheets, you will need to upload the file several times, once for each group of sheets that share the same columns. Another option is to put the file in a managed folder and create several files-in-folder datasets on top of it. You can then use a Stack recipe to remap the columns as necessary.
Alternatively, you could read the file with Python from a DSS managed folder, like this:
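import os

import dataiku
import pandas as pd

# Resolve the path of the Excel file inside the managed folder
# (get_path() only works for folders stored on the local filesystem)
folder_path = dataiku.Folder("MYFOLDER").get_path()
excel_path = os.path.join(folder_path, "MYFILE.xlsx")
sheet_list = ["SHEET1", "SHEET2"]

# Read each sheet into its own DataFrame, keyed by sheet name
# (xlrd 2.0 and later no longer reads .xlsx; use engine="openpyxl" there)
df_dict = {
    k: pd.read_excel(excel_path, sheet_name=k, engine="xlrd")
    for k in sheet_list
}

# Record which sheet each row came from
for k, df in df_dict.items():
    df["origin_sheet_name"] = k

# Stack all sheets; columns missing from a sheet are filled with NaN
df_stacked = pd.concat(list(df_dict.values()), axis=0, ignore_index=True, sort=False)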
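If the sheets do not share the same column names, a renaming step before the concat can play the role of the column remapping mentioned earlier. The following is only a minimal sketch: the two small DataFrames stand in for sheets already read into df_dict, and column_maps with its column names is a hypothetical mapping you would adapt to your own file.

```python
import pandas as pd

# Hypothetical stand-ins for two sheets whose columns differ;
# in practice these would be the DataFrames held in df_dict.
df_dict = {
    "SHEET1": pd.DataFrame({"Customer": ["a", "b"], "Amount": [10, 20]}),
    "SHEET2": pd.DataFrame({"client": ["c"], "total": [30], "note": ["late"]}),
}

# Hypothetical per-sheet mapping from original column names to a common schema.
column_maps = {
    "SHEET1": {"Customer": "customer", "Amount": "amount"},
    "SHEET2": {"client": "customer", "total": "amount"},
}

frames = []
for sheet_name, df in df_dict.items():
    # rename() returns a new DataFrame; unmapped columns keep their names
    renamed = df.rename(columns=column_maps.get(sheet_name, {}))
    renamed["origin_sheet_name"] = sheet_name
    frames.append(renamed)

# Columns present in only some sheets (here "note") are filled with NaN
df_stacked = pd.concat(frames, axis=0, ignore_index=True, sort=False)
```

Columns that are not listed in the mapping are kept under their original name, so nothing is silently dropped.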
Hope it helps,
Alex
Hi,
Updating this thread to let you know that Dataiku now lets you use the sheet name of an Excel file as a column in the dataset. When you configure the format of an uploaded Excel file, there is a checkbox 'Add the sheet name as an output column' which does just that. The steps are outlined in this tutorial.
I hope it helps!
Cheers,
Ashley