Hi,
If the columns differ across sheets, you will need to upload the file several times, once for each group of sheets that share the same columns. Another option is to put the file in a managed folder and create several files-in-folder datasets on top of it. You can then use a stack recipe to remap the columns as necessary.
Alternatively, you could read the file with Python from a DSS managed folder, like this:
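import os
import dataiku
import pandas as pd

# get_path() only works for managed folders stored on the local filesystem
folder_path = dataiku.Folder("MYFOLDER").get_path()
excel_path = os.path.join(folder_path, "MYFILE.xlsx")
sheet_list = ["SHEET1", "SHEET2"]

# Read each sheet into its own DataFrame, keyed by sheet name.
# openpyxl is needed for .xlsx files; xlrd 2.0+ only reads .xls
df_dict = {
    k: pd.read_excel(excel_path, sheet_name=k, engine="openpyxl")
    for k in sheet_list
}

# Tag each row with the sheet it came from
for k, df in df_dict.items():
    df["origin_sheet_name"] = k

# Stack all sheets; columns missing from a sheet are filled with NaN
df_stacked = pd.concat(list(df_dict.values()), axis=0, ignore_index=True, sort=False)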
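If the sheets name the same columns differently, you can rename them before stacking, which is roughly what the column remapping in a stack recipe does. Here is a minimal sketch of that idea. The sheet contents, the column names and the column_maps dictionary are all made up for illustration, and small in-memory DataFrames stand in for the sheets read from the Excel file:

```python
import pandas as pd

# Hypothetical stand-ins for two sheets whose column names differ
df_dict = {
    "SHEET1": pd.DataFrame({"id": [1, 2], "amount": [10.0, 20.0]}),
    "SHEET2": pd.DataFrame({"ID": [3], "Amount (USD)": [30.0]}),
}

# Hypothetical per-sheet mapping from original column names to shared names
column_maps = {
    "SHEET1": {},
    "SHEET2": {"ID": "id", "Amount (USD)": "amount"},
}

frames = []
for sheet_name, df in df_dict.items():
    # rename() returns a new DataFrame, so the original sheet is left untouched
    renamed = df.rename(columns=column_maps.get(sheet_name, {}))
    renamed["origin_sheet_name"] = sheet_name
    frames.append(renamed)

# Columns now line up, so the rows stack without extra NaN-filled columns
df_stacked = pd.concat(frames, axis=0, ignore_index=True, sort=False)
```

Any column you leave out of the mapping keeps its original name and simply ends up as a separate column with NaN for the other sheets.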
Hope it helps,
Alex