Added on June 19, 2019 8:58PM
Hi,
I am having trouble creating a dataset from an Excel file. I am using the REST API client in a Python recipe. I have succeeded in doing so from a CSV file with this code:
import os

import dataiku
import pandas as pd

client = dataiku.api_client()
project = client.get_project('******')
folder_path = '/Users/dmer/Desktop/*****/'
for file in os.listdir(folder_path):
    if not file.endswith('.csv'):
        continue
    dataset = project.create_dataset(
        file[:-4],  # dots are not allowed in dataset names
        'Filesystem',
        params={
            'connection': 'filesystem_root',
            'path': folder_path + file,
        },
        formatType='csv',
        formatParams={
            'separator': ',',
            'style': 'excel',  # excel-style quoting
            'parseHeaderRow': True,
        })
    # Declare every column as a string, using the header row read by pandas
    df = pd.read_csv(folder_path + file)
    dataset.set_schema({'columns': [{'name': column, 'type': 'string'} for column in df.columns]})
When I change it to the Excel format:
client = dataiku.api_client()
project = client.get_project('*****')
folder_path = '/Users/dmer/Desktop/*****/'
for file in os.listdir(folder_path):
    if not file.endswith('.xls'):
        continue
    dataset = project.create_dataset(
        file[:-4],  # dots are not allowed in dataset names
        'Filesystem',
        params={
            'connection': 'filesystem_root',
            'path': folder_path + file,
        },
        formatType='xls',
        formatParams={
            'separator': ',',
            'style': 'excel',  # excel-style quoting
            'parseHeaderRow': True,
        })
    df = pd.read_excel(folder_path + file)
    dataset.set_schema({'columns': [{'name': column, 'type': 'string'} for column in df.columns]})
I get this exception:
DataikuException: com.dataiku.common.server.DKUControllerBase$MalformedRequestException: Could not parse a SerializedDataset from request body
Could you please help me?
Thanks
Hi,
The simplest way to populate a dataset is to:
1. create an empty "managed dataset" from the Flow (a one-time visual operation),
2. add it as an output of your recipe, and then
3. write to this output using one of the Dataiku methods, such as .write_with_schema()
See https://doc.dataiku.com/dss/latest/code_recipes/python.html for more details.
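The three steps above could look roughly like the sketch below. It assumes the recipe already has an output dataset, here given the hypothetical name `excel_output`, and that the Excel files are readable from the DSS server. The helpers `excel_files` and `write_excel_to_output` are illustrative names, not part of the Dataiku API; only `dataiku.Dataset(...).write_with_schema(...)` comes from the library.

```python
import os


def excel_files(folder_path):
    """Return the sorted .xls/.xlsx file names found in folder_path."""
    return sorted(
        name for name in os.listdir(folder_path)
        if name.lower().endswith((".xls", ".xlsx"))
    )


def write_excel_to_output(folder_path, output_name="excel_output"):
    """Read every Excel file in folder_path and write the rows to the recipe output.

    Must run inside a DSS Python recipe. "excel_output" is a hypothetical
    dataset name and has to be declared as an output of the recipe.
    Returns the number of rows written.
    """
    import dataiku  # only importable inside DSS
    import pandas as pd

    frames = [
        # dtype=str mirrors the all-string schema used in the question
        pd.read_excel(os.path.join(folder_path, name), dtype=str)
        for name in excel_files(folder_path)
    ]
    if not frames:
        return 0  # nothing to write

    combined = pd.concat(frames, ignore_index=True)
    # write_with_schema sets the dataset schema from the DataFrame, then writes the rows
    dataiku.Dataset(output_name).write_with_schema(combined)
    return len(combined)
```

Because the schema is taken from the DataFrame, no separate `set_schema` call or format parameters are needed with this approach.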
Another option, if you have many files, is to add them all to a Dataiku Folder and then use the "files in folder" functionality (https://doc.dataiku.com/dss/latest/connecting/files-in-folder.html).
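The "files in folder" dataset itself is configured visually. As a related alternative, the same managed folder can also be read directly from a Python recipe. The sketch below assumes a managed folder with the hypothetical name `excel_inputs` declared as a recipe input; `is_excel_path` and `read_excel_from_folder` are illustrative helpers, not Dataiku functions.

```python
def is_excel_path(path):
    """Return True for paths ending in .xls or .xlsx (case-insensitive)."""
    return path.lower().endswith((".xls", ".xlsx"))


def read_excel_from_folder(folder_name="excel_inputs"):
    """Load every Excel file of a managed folder into a dict of DataFrames.

    Must run inside DSS. "excel_inputs" is a hypothetical folder name.
    Keys are the paths inside the folder, values are all-string DataFrames.
    """
    import io

    import dataiku  # only importable inside DSS
    import pandas as pd

    folder = dataiku.Folder(folder_name)
    frames = {}
    for path in folder.list_paths_in_partition():
        if not is_excel_path(path):
            continue
        # Stream the file content out of the folder, then parse it in memory
        with folder.get_download_stream(path) as stream:
            frames[path] = pd.read_excel(io.BytesIO(stream.read()), dtype=str)
    return frames
```

Each DataFrame returned this way can then be written to an output dataset with `.write_with_schema()`.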
Cheers,
Alex