Updating Datasets from 'in flow' folder

Solved!
ennpi
Level 1

Hello,

I created a folder in a flow and manually uploaded CSV files to it. I then created a dataset from that folder (by going into the folder and right-click => Create dataset).

What I want is that when I upload a file with the same name to the same folder, the dataset is recreated from the newly uploaded file (only the row content changes).

When I try to rebuild the dataset, nothing changes.

How can I achieve that?

thanks

4 Replies
StanG
Dataiker

Hi,

Actually, you don't need to build the dataset to get the changes. The dataset is automatically updated with the new file you upload to the folder.
It's just that the sample preview is not automatically reloaded. You can go to Configure sample > Save and refresh sample to see the changes after the file upload.
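
For instance, a minimal sketch (assuming the dataset created from the folder is named "my_dataset"; adjust to your own dataset name): reading the dataset from a notebook or recipe right after the upload already returns the new rows, without any rebuild.

import dataiku

# Assumption: "my_dataset" is the dataset created from the folder
ds = dataiku.Dataset("my_dataset")
df = ds.get_dataframe()
print(len(df))  # already reflects the newly uploaded file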

Kenza29
Level 2

"You can go to Configure sample > Save and refresh sample to see the changes after the file upload." -- can we do this with a Python script in a webapp?

I have the same issue: I am updating a managed folder from my webapp, and I want to refresh and save through the webapp without going back to the flow in my project.

If you have any solution to propose, that would be great.

Thank you!

StanG
Dataiker

Hi!
As explained here (https://doc.dataiku.com/dss/latest/explore/sampling.html#refreshing-the-sample), you need to rebuild the dataset in order to refresh the sample.
However, a dataset created from a folder is not buildable, because there is no recipe that goes from the folder to the dataset. You would need to create a Sync recipe after the dataset; its output can then be rebuilt using the API each time you need to refresh the sample.

To run a recipe using the API, you can do:

import dataiku

project = dataiku.api_client().get_default_project()
recipe = project.get_recipe(RECIPE_NAME)  # name of the Sync recipe in the flow
recipe.run()
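
From a webapp backend, a minimal sketch of an endpoint wrapping this call (the recipe name "sync_folder_dataset" is an assumption; use the name of your own Sync recipe):

import json
import dataiku

@app.route('/refresh-sample', methods=['POST'])
def refresh_sample():
    project = dataiku.api_client().get_default_project()
    # Assumption: "sync_folder_dataset" is the Sync recipe placed after the dataset
    recipe = project.get_recipe("sync_folder_dataset")
    recipe.run()
    return json.dumps({"status": "ok"})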
Kenza29
Level 2

Alright, thanks a lot, this works very well.

Unfortunately, I run into a problem when trying to upload different files to different managed folders through the webapp.

I am using this script to clear the folder and upload a new file to it:

import os
import json
import shutil

import dataiku
from flask import request

@app.route('/upload-to-dss', methods=['POST'])
def upload_to_dss():
    mf = dataiku.Folder('folder1')  # name of the folder in the flow
    path = mf.get_path()
    list_dir = os.listdir(path)
    for filename in list_dir:
        file_path = os.path.join(path, filename)
        # If the element is a file (or a symlink), delete it
        if os.path.isfile(file_path) or os.path.islink(file_path):
            print("deleting file:", file_path)
            os.unlink(file_path)
        # In case it is a folder, delete it recursively
        elif os.path.isdir(file_path):
            print("deleting folder:", file_path)
            shutil.rmtree(file_path)
    f = request.files.get('file')
    target_path = '/%s' % f.filename
    mf.upload_stream(target_path, f)
    return json.dumps({"status": "ok"})

I duplicated this code as many times as there are folders, but it only recognizes the first file and uploads it to the first folder; for the other folders, I can't see the newly uploaded files.

Is there a way to manage several folders at the same time and upload different kinds of files to them?

Thank you so much
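
For the multi-folder question above, one possible direction is a single route that receives the target folder name together with the file. This is a hypothetical sketch; the folder names and the 'folder' form field are assumptions:

import os
import json
import shutil

import dataiku
from flask import request

# Assumption: these managed folders exist in the flow
ALLOWED_FOLDERS = {'folder1', 'folder2'}

@app.route('/upload-to-folder', methods=['POST'])
def upload_to_folder():
    # The client sends the target folder name as a form field
    folder_name = request.form.get('folder')
    if folder_name not in ALLOWED_FOLDERS:
        return json.dumps({"status": "error", "message": "unknown folder"})
    mf = dataiku.Folder(folder_name)
    path = mf.get_path()
    # Clear the folder first, as in the script above
    for filename in os.listdir(path):
        file_path = os.path.join(path, filename)
        if os.path.isfile(file_path) or os.path.islink(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            shutil.rmtree(file_path)
    f = request.files.get('file')
    mf.upload_stream('/%s' % f.filename, f)
    return json.dumps({"status": "ok"})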
