Check and reload schema automatically
Hi,
When I run a scenario, its jobs fail if a column has been added to or removed from one of the input tables (which come from an external Postgres DB). When this happens, I have to open the affected dataset, go to its settings, check the schema, reload it and save (cf. screenshot).
Is there any way to automate this process, for example with a specific step in the scenario? I've seen steps like "Run checks" or "Check project consistency", but I feel these only perform checks and won't reload the table schemas.
Thanks in advance!
Nicolas
Best Answer
Automatically changing dataset schemas would be quite dangerous for DSS usage, since downstream recipes or notebooks could start failing, so DSS pushes the user to take action manually and make sure the dataset's settings are correct.
If you know which columns were added or deleted, and when a change happens, you can use the Python API or the public API in an "Execute Python code" step to alter the dataset's settings. Since the dataset is a Postgres table, you can also fetch the schema using https://doc.dataiku.com/dss/latest/python-api/sql.html#dataiku.SQLExecutor2.query_to_iter because the returned object has a get_schema() method (no need to iterate over the rows).
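A minimal sketch of what such a Python step could look like, not a definitive implementation. The names diff_columns, reload_schema, dataset_name and table_name are hypothetical. It also assumes that get_schema() returns a list of column dicts with "name" and "type" keys holding DSS types, which is worth verifying on your instance before relying on it.

```python
def diff_columns(current, detected):
    """Return (added, removed) column names between two lists of column dicts."""
    current_names = [col["name"] for col in current]
    detected_names = [col["name"] for col in detected]
    added = [name for name in detected_names if name not in current_names]
    removed = [name for name in current_names if name not in detected_names]
    return added, removed


def reload_schema(dataset_name, table_name):
    """Hypothetical helper: align a dataset's schema with its Postgres table."""
    # Imported here so diff_columns stays usable outside of DSS.
    import dataiku

    # Ask the database for the table's columns without fetching any row.
    executor = dataiku.SQLExecutor2(dataset=dataiku.Dataset(dataset_name))
    reader = executor.query_to_iter('SELECT * FROM "%s" LIMIT 0' % table_name)
    # Assumption: each schema entry is a dict with "name" and "type" keys.
    detected = [{"name": col["name"], "type": col["type"]}
                for col in reader.get_schema()]

    # Read the schema currently stored in DSS through the public API.
    project = dataiku.api_client().get_project(dataiku.default_project_key())
    dataset = project.get_dataset(dataset_name)
    schema = dataset.get_schema()

    added, removed = diff_columns(schema["columns"], detected)
    if added or removed:
        # Only overwrite the stored schema when something actually changed.
        schema["columns"] = detected
        dataset.set_schema(schema)
    return added, removed
```

Calling reload_schema("my_input_dataset", "my_table") (both names are placeholders) at the start of the scenario returns the added and removed column names, which you could log or use to stop the scenario before downstream recipes fail.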
Answers
I see! I thought it would be possible to do this via the UI, but it looks like I'll need to use the API instead.
Thanks!