Updating Datasets from 'in flow' folder

ennpi
ennpi Registered Posts: 1 ✭✭✭

Hello,

I created a folder in a flow and manually uploaded CSV files to it. I then created a dataset from that folder (by going into the folder and right-click => Create dataset).

What I want is that when I upload a file with the same name to the same folder, the dataset is recreated from the newly uploaded file (only the row contents change).

When I try and rebuild the dataset, nothing changes.

How can I achieve that?

thanks

Best Answer

  • StanG
    StanG Dataiker, Registered Posts: 52 Dataiker
    Answer ✓

    Hi,

    Actually you don't need to build the dataset to get the changes. The dataset is automatically updated with the new file you upload to the folder.
    It's just that the sample preview is not automatically reloaded. You can go to Configure sample > Save and refresh sample to see the changes after the file upload.

Answers

  • Kenza29
    Kenza29 Registered Posts: 7 ✭✭✭

    "You can go to Configure sample > Save and refresh sample to see the changes after the file upload." -- can we do this with a Python script in a webapp?

    I have the same issue: I am updating a managed folder from my webapp, and I want to refresh and save through the webapp without going back to the flow in my project.

    If you have any solution to propose, that would be great.

    Thank you!

  • StanG
    StanG Dataiker, Registered Posts: 52 Dataiker
    edited July 17

    Hi !
    As it is explained here (https://doc.dataiku.com/dss/latest/explore/sampling.html#refreshing-the-sample), you would need to rebuild the dataset in order to refresh the sample.
    However, a Dataset created from a Folder is not buildable because there is no recipe that goes from the Folder to the Dataset. You would need to create a Sync recipe after the Dataset that can be rebuilt using the API each time you need to refresh the sample.

    To run a recipe using the API, you can do:

    import dataiku

    # Get a handle on the current project, then run the recipe by name
    project = dataiku.api_client().get_default_project()
    recipe = project.get_recipe(RECIPE_NAME)
    recipe.run()
  • Kenza29
    Kenza29 Registered Posts: 7 ✭✭✭

    Alright, thanks a lot, this works very well.

    Unfortunately, I run into an issue when trying to upload different files to different managed folders through the webapp.

    I am using this script to clear the folder and upload a new file to it:

    import os
    import shutil
    import json

    import dataiku
    from flask import request

    @app.route('/upload-to-dss', methods=['POST'])
    def upload_to_dss():
        mf = dataiku.Folder('folder1')  # name of the folder in the flow
        path = mf.get_path()
        for filename in os.listdir(path):
            file_path = os.path.join(path, filename)
            # If the element is a file or symlink, delete it
            if os.path.isfile(file_path) or os.path.islink(file_path):
                print("deleting file:", file_path)
                os.unlink(file_path)
            # In case it is a folder, remove it recursively
            elif os.path.isdir(file_path):
                print("deleting folder:", file_path)
                shutil.rmtree(file_path)
        # Upload the new file from the request into the (now empty) folder
        f = request.files.get('file')
        target_path = '/%s' % f.filename
        mf.upload_stream(target_path, f)
        return json.dumps({"status": "ok"})

    and I duplicated this code once per folder, but only the first file is recognized and uploaded to the first folder; for the other folders, the newly uploaded files never appear.

    Is there a way to manage several folders at once and upload a different kind of file to each of them?

    Thank you so much
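    One possible way to avoid duplicating the route per folder is to parameterize the folder name and factor the directory-clearing logic into a plain helper. This is only a sketch of that idea (the `clear_directory` helper is my own name, not a Dataiku API), using only the standard library:

    ```python
    import os
    import shutil

    def clear_directory(path):
        """Delete every file, symlink, and subfolder inside `path`,
        leaving the directory itself in place."""
        for filename in os.listdir(path):
            file_path = os.path.join(path, filename)
            # Files and symlinks are unlinked directly
            if os.path.isfile(file_path) or os.path.islink(file_path):
                os.unlink(file_path)
            # Subfolders are removed recursively
            elif os.path.isdir(file_path):
                shutil.rmtree(file_path)
    ```

    A single route could then take the folder name as a URL parameter, e.g. `@app.route('/upload-to-dss/<folder_name>', methods=['POST'])`, and call `dataiku.Folder(folder_name)` inside the handler, so one handler serves every managed folder instead of one copy of the code per folder.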
