Reading multiple flat files and loading in Database tables
Here is my requirement:
1. On daily basis we are receiving multiple flat files in a folder
2, Each of the flat file has records in csv format
3. For each of the daily dump number of files and file names will be same
4. We need to load records from these files in postgresql database
5. Table Name should be same as filenames i.e. there will be one to one mapping between table name and filename
6. We need to overwrite previous records in tables with new records which are received on daily basis
Please suggest solution for above problem statement. Solution Flow with visual recipe is preferable
Operating system used: Windows
Answers
-
Emma Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 52 Dataiker
Hey @chiraggiri
,You can accomplish this task using visual recipes as follows:
From the Flow > +DATASET / Internal / Files from Folder > Read from: "YOUR FOLDER NAME HERE".
If all of the files have the same schema you can leave File selection: ALL, if they do not you will need to perform this operation for each file, File selection: Explicitly select files. (See the first screenshot). When new files are dropped into the Folder, the "Files from Folder" dataset will automatically update. IF the names are the same then there will be an override and new data is shown - if not, the data will be appended.
Then, from the newly created dataset you will use a SYNC recipe to move the dataset from local filesystem to Postgres. (See the second screenshot).
Once you have set up the Flow you will need to automate the SYNC process to run daily using a Scenario.
Hope that helps,
Emma