Importing a CSV file
Hello,
I would like to import a CSV file into DSS. I know the manual method, but if the file is updated, the updates are not reflected in DSS.
Is there a way to make sure that, when the import job is re-run, the changes made to the file are taken into account?
Thanks in advance for your answers.
Operating system used: Windows
Answers
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hello @Datause,
If I understand correctly, you have a local CSV file where you sometimes make local modifications to the file, and want those modifications to then be reflected in DSS. For an uploaded dataset specifically, the data is static unless a new file is uploaded to the uploaded dataset. This is generally because you are making changes locally (i.e. on your own computer), so the DSS server won't know about any changes until you manually update the dataset on the DSS server.
There are a number of automated ways to pick up data in DSS, but it depends on your workflow what will make the most sense. How does your source file end up getting modified? Is there any automated process that updates the CSV file or are you manually modifying it locally?
Ultimately I think the real problem you're facing is likely how to automatically get your updates to a location where DSS can automatically pick them up, so you don't need to keep manually updating an uploaded dataset yourself. On that front, I wonder if you can pull whatever process you are currently doing that updates the CSV file into DSS itself, so that everything can more easily be automated? Let me know if that makes sense, and if you can give any details on the process that currently leads to the updated CSV file so that we might provide some suggestions on how to accomplish the steps in DSS.
On the more specific point of ensuring that changes to a CSV file are reflected in DSS: once you are able to get your files to a location that DSS can read from (e.g. S3, the DSS server, or HDFS), you can use a Managed Folder that points to that storage location, and then create a dataset from the managed folder. Once a file is updated in the underlying managed folder location (e.g. in S3 or your Hadoop filesystem), those changes are automatically propagated to DSS and to the dataset ("testdataset" in this example).
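To make the difference between the two approaches concrete, here is a minimal plain-Python sketch. It does not use the DSS API: `UploadedDataset` and `FolderDataset` are hypothetical classes that only model the behaviour described above, namely that an uploaded dataset keeps its own copy of the file, while a dataset built on a managed folder reads the files where they live each time it is read.

```python
import csv
import shutil
from pathlib import Path


def read_rows(path):
    """Read a CSV file into a list of dicts, one per record."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


class UploadedDataset:
    """Hypothetical model of an uploaded dataset: it stores its own copy of the file."""

    def __init__(self, source, storage_dir):
        self.copy = Path(storage_dir) / Path(source).name
        shutil.copyfile(source, self.copy)  # snapshot taken once, at upload time

    def read(self):
        # Later edits to the original file are never seen here.
        return read_rows(self.copy)


class FolderDataset:
    """Hypothetical model of a dataset built on a managed folder: it reads files in place."""

    def __init__(self, folder):
        self.folder = Path(folder)

    def read(self):
        rows = []
        # The folder is listed and every CSV is re-read on each call,
        # so modifications to the source files are picked up.
        for path in sorted(self.folder.glob("*.csv")):
            rows.extend(read_rows(path))
        return rows
```

In this model, re-running a job against `FolderDataset` sees the edited file, while `UploadedDataset` keeps returning the original snapshot until a new file is uploaded.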
I hope that information makes sense. Please feel free to provide details on your current workflow and use case and what's changing the CSV file, and I would be happy to provide some more specific suggestions.
Thanks,
Sarina
tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,601 Neuron
@SarinaS's suggestions make a lot of sense. The other thing to consider is what kinds of changes are occurring:
- Add: adding new records
- Delete: deleting existing records
- Update: changing current records
If, for example, you can guarantee that you will only ever be adding records, your process may be easier to automate. DSS can handle a managed folder containing more than one file, and if the files all share the same layout, the process is much simpler: you just need to append all of the daily datasets.
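The append-only case can be sketched in plain Python as follows. This is an illustration of the idea, not how DSS implements it: `append_daily_files` is a hypothetical helper that assumes one CSV per day in a single folder, each file containing only new records and starting with the same header row. It refuses to stack files whose layout differs, since that is exactly the situation that breaks a simple append.

```python
import csv
from pathlib import Path


def append_daily_files(folder, pattern="*.csv"):
    """Stack every CSV in `folder` into one header plus a list of rows.

    Assumes append-only data: each file holds only new records and all
    files share the same header (layout). Raises ValueError otherwise.
    """
    header = None
    rows = []
    for path in sorted(Path(folder).glob(pattern)):
        with open(path, newline="", encoding="utf-8") as f:
            reader = csv.reader(f)
            file_header = next(reader, None)
            if file_header is None:
                continue  # skip completely empty files
            if header is None:
                header = file_header  # the first file defines the layout
            elif file_header != header:
                raise ValueError(f"{path.name} has a different layout: {file_header}")
            rows.extend(reader)  # append this day's records
    return header, rows
```

Sorting the file names keeps the output in a stable order when the files are named by date; with append-only data, the order does not affect which records end up in the result.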
However, if you also have deletes and updates, the process can be significantly more challenging. In those cases it is very helpful to have a guaranteed unique key for all records, so that you know which records need to be deleted or updated.
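Here is a minimal sketch of why a unique key matters. It compares an old and a new snapshot of the same CSV and sorts the keys into added, deleted and updated records. The function names and the `id` column used in the usage note are hypothetical, and the code assumes the key column really is unique in each snapshot (it raises an error if not).

```python
import csv
import io


def index_by_key(csv_text, key):
    """Parse CSV text into {key value: row}; fail if the key is not unique."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        k = row[key]
        if k in rows:
            raise ValueError(f"duplicate key {k!r}: column {key!r} is not a unique key")
        rows[k] = row
    return rows


def classify_changes(old_text, new_text, key):
    """Compare two snapshots and list which keys were added, deleted or updated."""
    old = index_by_key(old_text, key)
    new = index_by_key(new_text, key)
    return {
        "added": sorted(k for k in new if k not in old),
        "deleted": sorted(k for k in old if k not in new),
        "updated": sorted(k for k in new if k in old and new[k] != old[k]),
    }
```

For example, going from rows `1,a / 2,b / 3,c` to `1,a / 2,B / 4,d` with `key="id"` reports key `4` as added, `3` as deleted and `2` as updated. Without a unique key there is no reliable way to tell an updated record from a deleted record plus a new one.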
Good luck with your project. Let us know how you are getting on.