Inconsistent Schema in SFTP folder
I am trying to pull an SFTP folder into my Dataiku project, but I get a schema error because the schema is inconsistent across the files in the folder. The folder is populated by a 3rd party, so I don't have access to amend the schema, and I am concerned that new files (this is a daily report run) may arrive with the wrong schema, which would break my flow. We have alerted the 3rd party to the inconsistency, but I would like this flow to be "future proof". Any ideas on how to approach this from the Dataiku side?
Answers
Turribeach (Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Neuron 2023; Posts: 2,073)
If the input file is going to change and add new fields, then you should look at loading the file using Python, so that you can handle new fields in code and either drop them or push them automatically downstream in your flow. You can push schema changes using Dataiku's built-in schema propagation feature, either by calling the Dataiku Python API or via the GUI. But even doing all this is no guarantee that your flow won't break. A change in data types, the removal of a key column, or in certain cases even the addition of a new column can still break it. So I am afraid there are no silver bullets here. Changing the data inputs is not something trivial.
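To illustrate the "handle new fields in code" idea, here is a minimal sketch in plain Python rather than anything Dataiku-specific. The column names, `EXPECTED_SCHEMA`, `KEY_COLUMNS` and `conform_rows` are all hypothetical, and it assumes the daily report is a CSV whose text has already been read from the SFTP folder. It drops columns you don't expect, fills missing optional columns with `None`, and fails loudly if a key column is missing or a value no longer matches the expected type.

```python
import csv
import io

# Hypothetical expected schema: column name -> converter for its type.
EXPECTED_SCHEMA = {"report_date": str, "customer_id": int, "amount": float}
# Hypothetical columns the rest of the flow cannot work without.
KEY_COLUMNS = {"report_date", "customer_id"}


def conform_rows(csv_text, schema=EXPECTED_SCHEMA, key_columns=KEY_COLUMNS):
    """Return (rows, extra_columns) with rows forced into the expected schema."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = set(reader.fieldnames or [])

    # A missing key column cannot be repaired in code, so stop here.
    missing_keys = set(key_columns) - header
    if missing_keys:
        raise ValueError(f"missing key columns: {sorted(missing_keys)}")

    # Unexpected columns are reported and dropped, not passed downstream.
    extra_columns = sorted(header - set(schema))

    rows = []
    for raw in reader:
        row = {}
        for name, convert in schema.items():
            value = raw.get(name)
            # Missing or empty optional values become None; a value of the
            # wrong type makes the converter raise ValueError.
            row[name] = convert(value) if value not in (None, "") else None
        rows.append(row)
    return rows, extra_columns


if __name__ == "__main__":
    sample = "customer_id,report_date,amount,new_field\n7,2024-01-01,12.5,x\n"
    rows, extra = conform_rows(sample)
    print(rows, extra)
```

Returning the list of dropped columns lets you log or alert on them, so you notice when the 3rd party changes the file instead of silently ignoring it. Raising on a missing key column or a failed type conversion is a deliberate choice here: those are the cases that cannot be fixed safely in code.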