How to automate uploading local files to a Dataiku dataset?

Frank Registered Posts: 11 ✭✭✭✭
Use case: you want to sync a local file with a Dataiku dataset.

Answers

  • Alan_Fusté
    Alan_Fusté Partner, Registered Posts: 43 Partner
    edited July 18

    Hi Frank,

    You could do this with a custom Python recipe in a plugin, or in a scenario (I usually do it in a scenario), with something similar to:


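    import dataiku
    from dataikuapi import SyncRecipeCreator
    from dataiku.scenario import Scenario
    import pandas as pd

    # Handles the steps below rely on
    scenario = Scenario()
    project = dataiku.api_client().get_project(dataiku.default_project_key())

    dataset_name = "input_dataset"  # placeholder name for the dataset to create
    folder_path = "path/to/file/"  # Don't forget the trailing /
    file = "file.csv"

    # Note: this path is resolved on the DSS server, not on your own machine
    df = pd.read_csv(folder_path + file)  # adapt the parameters (e.g. sep) to your file

    # Declare a Filesystem dataset pointing at the file; adapt the format parameters too
    dataset = project.create_dataset(dataset_name, 'Filesystem', params={'connection': 'filesystem_root', 'path': folder_path + file}, formatType='csv', formatParams={'separator': ';', 'style': 'no_escape_no_quote', 'parseHeaderRow': True})

    # I usually set every column to string first and change the types afterwards
    dataset.set_schema({'columns': [{'name': column, 'type': 'string'} for column in df.columns]})

    # Create a sync recipe from the new dataset to the output dataset
    builder = SyncRecipeCreator("sync_output_dataset", project)
    builder = builder.with_input(dataset_name)
    builder = builder.with_output("output_dataset", append=False)
    recipe = builder.build()

    # Build the output dataset from the scenario
    scenario.build_dataset("output_dataset", build_mode='NON_RECURSIVE_FORCED_BUILD')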

  • Frank
    Frank Registered Posts: 11 ✭✭✭✭
    Hi Alan,
    Thanks for the reply! Appreciate it!
    When I run "df = pd.read_csv(folder_path + file)", I get an error saying the file does not exist.
    I think it actually tries to read from the DSS server's drive, not from my local machine.
    Any thoughts?
    Thanks,
    Frank
  • Alan_Fusté
    Alan_Fusté Partner, Registered Posts: 43 Partner
    Hi,

    Yes, you have to upload the file to the DSS server first (if you are using the REST API, you have to do that too) :S
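
One possible way to do that upload step from your own machine, as a minimal sketch rather than a definitive recipe: run a small script locally that uses the dataikuapi client to push the file into an "uploaded files" dataset. The helper name upload_local_file, the dataset name, the host URL and the API key are hypothetical placeholders. The sketch assumes the public API client methods create_upload_dataset, get_dataset and uploaded_add_file are available in your DSS version, so check them against your installation.

```python
from pathlib import Path


def upload_local_file(project, dataset_name, local_path, create=False):
    """Push one file from this machine into an 'uploaded files' dataset.

    `project` is assumed to be a dataikuapi project handle, e.g. the result
    of DSSClient(...).get_project(...). Set create=True on the first run to
    create the upload dataset; later runs reuse the existing one.
    """
    local_path = Path(local_path)
    if create:
        dataset = project.create_upload_dataset(dataset_name)
    else:
        dataset = project.get_dataset(dataset_name)

    # The file is read here, on the client side, and streamed to the server
    with local_path.open("rb") as fp:
        dataset.uploaded_add_file(fp, local_path.name)
    return dataset


# Example usage from a local machine (all values are placeholders):
#
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
#   project = client.get_project("MYPROJECT")
#   upload_local_file(project, "local_file_upload", "path/to/file/file.csv", create=True)
```

After the upload, the dataset's format and schema may still need to be detected or configured before it can be synced. Scheduling the script locally (cron, Task Scheduler) is one way to keep the dataset in step with the local file.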