How to automate uploading local files to Dataiku dataset?

Frank
Level 2
Use case: you want to keep a local file in sync with a Dataiku dataset.
3 Replies
Alan_Fusté
Level 3
Re: How to automate uploading local files to Dataiku dataset?

Hi Frank,



You could write a custom Python recipe in a plugin, or a Python step in a scenario (I usually do it in a scenario), with something similar to:




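import dataiku
from dataikuapi import SyncRecipeCreator
from dataiku.scenario import Scenario
import pandas as pd

# Handles used further down
scenario = Scenario()
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())

dataset_name = "input_dataset"  # placeholder name, choose your own
folder_path = "path/to/file/"  # Don't forget the last /
file = "file.csv"

# Read the file only to get the column names; you should adapt the parameters
df = pd.read_csv(folder_path + file)

# Create a Filesystem dataset pointing at the file; adapt the connection and format parameters here too
dataset = project.create_dataset(dataset_name, 'Filesystem', params={'connection': 'filesystem_root', 'path': folder_path + file}, formatType='csv', formatParams={'separator': ';', 'style': 'no_escape_no_quote', 'parseHeaderRow': True})

# I usually set every column to string first and change the types afterwards
dataset.set_schema({'columns': [{'name': column, 'type': 'string'} for column in df.columns]})

# Sync recipe from the new dataset to "output_dataset" (with_output expects it to exist already)
builder = SyncRecipeCreator("sync_output_dataset", project)
builder = builder.with_input(dataset_name)
builder = builder.with_output("output_dataset", append=False)
recipe = builder.build()

# Build only the output dataset, without rebuilding upstream datasets
scenario.build_dataset("output_dataset", build_mode='NON_RECURSIVE_FORCED_BUILD')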


 

Frank
Level 2
Re: How to automate uploading local files to Dataiku dataset?
Hi Alan,
Thanks for the reply! Appreciate it!
When I run "df = pd.read_csv(folder_path + file)", I get this error: file does not exist.
I think it is actually trying to read from the DSS server's drive, not from my local machine.
Any thoughts?
Thanks,
Frank
Alan_Fusté
Level 3
Re: How to automate uploading local files to Dataiku dataset?
Hi,

Yes, the code runs on the DSS server, so you have to upload the file to the server first (you have to do that even if you are using the REST API) :S
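For the upload step itself, one option is a small script that runs on your own machine and pushes the file through the public API client. This is only a sketch, not a tested setup: it assumes the dataikuapi package is installed locally and that you have an API key; the URL, key, project key, and dataset name are placeholders, and upload_local_file is a helper name made up for this example.

```python
import os


def upload_local_file(project, dataset_name, local_path, create=True):
    """Push one local file into an "uploaded files" dataset of a DSS project.

    `project` is expected to be a dataikuapi DSSProject handle.
    """
    if create:
        # Creates a new, empty dataset of type "uploaded files"
        dataset = project.create_upload_dataset(dataset_name)
    else:
        # Reuses a dataset created by an earlier run
        dataset = project.get_dataset(dataset_name)
    with open(local_path, "rb") as fp:
        # Sends the bytes over HTTP, so the file ends up on the DSS server
        dataset.uploaded_add_file(fp, os.path.basename(local_path))
    return dataset


def main():
    # Imported here so the helper above can be read and tested without the package
    import dataikuapi

    # Placeholder values: replace with your own instance URL, API key and project key
    client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
    project = client.get_project("MYPROJECT")
    upload_local_file(project, "local_file_upload", "path/to/file/file.csv")
```

Calling main() from a local cron job or Windows Task Scheduler entry would cover the automation part. You will probably still need to check the format settings of the uploaded dataset (separator, header row) before syncing it to another dataset.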