Changing datasets from filesystem_folders issue, won't register new dataset.
I am not hooked into an SFTP server or database for Dataiku; I upload files manually right now. When I remove a dataset and add a new one, it does not register until I go to the output dataset -> Settings and click Test & Get Schema. I also tried setting a trigger to run when the dataset in that folder changes, and that is not working either. Has anyone else run into this issue, and is it because I'm using flat files and am not plugged into anything?
Dataiku version used: 14.4.1
Best Answer
-
Turribeach
Yes, this is expected. You can't change the format of the files and expect that not to cause trouble. Having said that, you can use a Python code recipe to read a file with an arbitrary format from a managed folder and write it to the output dataset without having to detect the schema. This, however, will NOT be the end of your problems. If you change a dataset's schema, the rest of the flow is affected, so you will have to propagate the schema changes and see what breaks.
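A minimal sketch of the Python recipe approach described above, assuming the uploaded files are CSVs and that pandas is available in the recipe's code environment. The folder name, the output dataset name, and the helpers `pick_csv_path` and `run_recipe` are hypothetical; adjust them to your flow. The rule for choosing a file (last CSV in sorted order) is only an illustration.

```python
import io

FOLDER_NAME = "uploads"        # hypothetical managed folder name
OUTPUT_NAME = "uploads_clean"  # hypothetical output dataset name


def pick_csv_path(paths):
    """Return the last CSV path in sorted order, or None if there is none."""
    csv_paths = sorted(p for p in paths if p.lower().endswith(".csv"))
    return csv_paths[-1] if csv_paths else None


def run_recipe():
    # Imported here so the helper above can be exercised outside DSS.
    import dataiku
    import pandas as pd

    folder = dataiku.Folder(FOLDER_NAME)
    path = pick_csv_path(folder.list_paths_in_partition())
    if path is None:
        raise ValueError("no CSV file found in the managed folder")

    # Read the raw bytes from the folder and parse them with pandas,
    # so no dataset-level format or schema detection is involved.
    with folder.get_download_stream(path) as stream:
        df = pd.read_csv(io.BytesIO(stream.read()))

    # write_with_schema sets the output schema from the DataFrame columns.
    dataiku.Dataset(OUTPUT_NAME).write_with_schema(df)
```

In a real recipe you would call `run_recipe()` as the last line. The output schema then follows whatever columns the file has, so downstream recipes still need their schemas propagated when the columns change.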
Answers
-
Turribeach
Don't upload files directly to a dataset; they become linked to the dataset and are harder to update. Instead, create a new Dataiku managed folder and add the files to it. Then add a "files in folder" dataset to see the data. Not only will this let you consolidate all files of the same structure, it also lets you quickly change files as needed or use complex file selection criteria. You can also add file metadata very easily, as shown in my post.
-
@Turribeach, I have both of those, but the problem I run into is uploading a new file does not change the schema until I go to the files in folder dataset and hit test schema. Is that expected functionality?
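If the goal is only to avoid clicking Test & Get Schema by hand, that step could probably be scripted, for example from a Python step in a scenario. The sketch below assumes that `autodetect_settings()` on a dataset handle from the public API performs a comparable detection; this has not been verified against 14.4.1. The function names and the dataset name are hypothetical.

```python
def redetect_schema(project, dataset_name):
    """Re-run settings autodetection for one dataset and save the result."""
    dataset = project.get_dataset(dataset_name)
    # autodetect_settings() returns a settings object holding the detected
    # format and schema; nothing is persisted until save() is called.
    settings = dataset.autodetect_settings()
    settings.save()
    return settings


def run_step(dataset_name="uploads_files"):  # hypothetical dataset name
    import dataiku  # only available inside DSS

    project = dataiku.api_client().get_default_project()
    return redetect_schema(project, dataset_name)
```

This only refreshes the schema of the files in folder dataset itself. Recipes further down the flow would still need their schemas propagated.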