
How to automate uploading local files to Dataiku dataset?

Level 2
Use case: you want to sync a local file with a Dataiku dataset.
3 Replies
Level 3

Hi Frank,



You should do this in a Python custom recipe in a plugin or in a scenario (I usually do it in a scenario), with something similar to:




import dataiku
from dataikuapi import SyncRecipeCreator
from dataiku.scenario import Scenario
import pandas as pd

# Handles used below (this runs inside DSS, e.g. in a scenario Python step)
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())
scenario = Scenario()

folder_path = "path/to/file/" # Don't forget the trailing /
file = "file.csv"
dataset_name = "input_dataset" # placeholder name for the new dataset

df = pd.read_csv(folder_path + file) # adapt the parameters (e.g. sep) to match the format below

dataset = project.create_dataset(dataset_name, 'Filesystem', params={'connection': 'filesystem_root', 'path': folder_path + file}, formatType='csv', formatParams={'separator': ';', 'style': 'no_escape_no_quote', 'parseHeaderRow': True}) # adapt these too

dataset.set_schema({'columns': [{'name': column, 'type': 'string'} for column in df.columns]}) # I usually set everything to string and change the types afterwards

# Create a sync recipe from the new dataset to "output_dataset"
builder = SyncRecipeCreator("sync_output_dataset", project)
builder = builder.with_input(dataset_name)
builder = builder.with_output("output_dataset", append=False)
recipe = builder.build()

# Build the output from the scenario
scenario.build_dataset("output_dataset", build_mode='NON_RECURSIVE_FORCED_BUILD')


 

Level 2
Author
Hi Alan,
Thanks for the reply! Appreciate it!
When I run this line: "df = pd.read_csv(folder_path + file)"
I get this error: file does not exist
I think it actually tries to read from the DSS server's drive, not from my local machine.
Any thoughts?
Thanks,
Frank
Level 3
Hi,

Yes, you have to upload the file to the DSS server first. If you are using the REST API, you have to do that too :S
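
For the upload step itself, here is a minimal sketch of something you could run on your own machine, not on the DSS server. It assumes the public dataikuapi client is installed locally and that the target is an existing "uploaded files" dataset; to my knowledge such datasets accept files through uploaded_add_file, but check that against your DSS version. The host URL, API key, project key and dataset name are placeholders, not values from this thread.

```python
from pathlib import Path


def upload_local_file(client, project_key, dataset_name, local_path):
    """Push one local file into an existing 'uploaded files' dataset.

    `client` is expected to behave like dataikuapi.DSSClient.
    Returns the file name that was sent to DSS.
    """
    local_path = Path(local_path)
    if not local_path.is_file():
        # Fail early, on the local machine, instead of on the DSS server
        raise FileNotFoundError(f"No such local file: {local_path}")

    dataset = client.get_project(project_key).get_dataset(dataset_name)
    with local_path.open("rb") as fp:
        # Assumption: the dataset is of the "uploaded files" type
        dataset.uploaded_add_file(fp, local_path.name)
    return local_path.name


def connect_and_upload():
    """Example wiring; every value below is a hypothetical placeholder."""
    import dataikuapi  # imported lazily so the helper above has no hard dependency

    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
    return upload_local_file(client, "MYPROJECT", "my_uploaded_dataset",
                             "path/to/file/file.csv")
```

You would call connect_and_upload() from a local scheduler (cron, Task Scheduler), and then let a scenario on the DSS side rebuild whatever depends on that dataset.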