How to automate uploading local files to a Dataiku dataset?
Frank
Use case: you want to keep a Dataiku dataset in sync with a file on your local machine.
Answers
Hi Frank,
You can do this with Python code, either in a custom recipe in a plugin or in a scenario (I usually do it in a scenario), with something similar to:
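import dataiku
from dataikuapi import SyncRecipeCreator
from dataiku.scenario import Scenario
import pandas as pd
# Handles to the current project and scenario (this code runs in a scenario step on the DSS server)
project = dataiku.api_client().get_default_project()
scenario = Scenario()
folder_path = "path/to/file/"  # Don't forget the trailing /
file = "file.csv"
dataset_name = "input_dataset"  # placeholder name for the Filesystem dataset to create; adapt it
df = pd.read_csv(folder_path + file, sep=';')  # adapt the parameters to your file (same separator as below)
# Create a Filesystem dataset that points at the file on the server (adapt the format parameters too)
dataset = project.create_dataset(dataset_name, 'Filesystem',
    params={'connection': 'filesystem_root', 'path': folder_path + file},
    formatType='csv',
    formatParams={'separator': ';', 'style': 'no_escape_no_quote', 'parseHeaderRow': True})
# I usually declare every column as string and change the types afterwards
dataset.set_schema({'columns': [{'name': column, 'type': 'string'} for column in df.columns]})
# Create a Sync recipe that copies the new dataset into "output_dataset" (assumed to exist already)
builder = SyncRecipeCreator("sync_output_dataset", project)
builder = builder.with_input(dataset_name)
builder = builder.with_output("output_dataset", append=False)
recipe = builder.build()
# Build only the output dataset, forcing a rebuild even if DSS considers it up to date
scenario.build_dataset("output_dataset", build_mode='NON_RECURSIVE_FORCED_BUILD')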
Hi Alan,
Thanks for the reply! Appreciate it!
When I run this: "df = pd.read_csv(folder_path + file)"
I get this error: file does not exist.
I think it actually tries to read from the DSS server's drive, not from my local machine.
Any thoughts?
Thanks,
Frank
Hi,
Yes, the code runs on the DSS server, so you first have to upload the file to the DSS server (if you are using the REST API, you have to do that too :S).
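For the upload step itself, here is a minimal sketch that runs on your local machine with the dataikuapi package, assuming the target is an existing "Uploaded files" dataset and that you have a DSS API key. It only re-uploads the file when its content has changed since the last run. The helper names (file_digest, sync_file, get_upload_dataset) are hypothetical; DSSClient, get_project, get_dataset and uploaded_add_file are the dataikuapi calls it relies on.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a local file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def sync_file(dataset, local_path, last_digest=None):
    """Upload local_path to an "Uploaded files" dataset if its content changed.

    Assumption: `dataset` is a dataikuapi DSSDataset of type "Uploaded files"
    (anything with an uploaded_add_file(fp, filename) method works). Returns
    the file's current digest so the caller can store it for the next run.
    """
    digest = file_digest(local_path)
    if digest != last_digest:
        with open(local_path, "rb") as fp:
            dataset.uploaded_add_file(fp, os.path.basename(local_path))
    return digest


def get_upload_dataset(host, api_key, project_key, dataset_name):
    """Connect to DSS from the local machine and return a handle to the dataset."""
    import dataikuapi  # imported lazily so the helpers above work without it installed

    client = dataikuapi.DSSClient(host, api_key)
    return client.get_project(project_key).get_dataset(dataset_name)
```

Usage from a local scheduled job (cron, Task Scheduler): dataset = get_upload_dataset("http://your-dss-host:11200", "YOUR_API_KEY", "MYPROJECT", "uploaded_file"), then digest = sync_file(dataset, "path/to/file.csv", previous_digest), storing digest between runs. The host, key, project key and dataset name are placeholders. I have not checked whether uploading a file with the same name replaces the previous copy or adds a second one, so verify that on your instance.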