
Check and reload schema automatically

Level 1

Hi,

When running a scenario, the jobs fail if a column has been added to or deleted from one of the input tables (coming from an external Postgres DB). When this happens, I have to open the affected table, go to its settings, check the schema, then reload and save it (see screenshot).

Screenshot 2020-08-10 at 09.45.33.png

Is there any way to automate this process? Perhaps with a specific step in the scenario? I've seen steps like "Run checks" and "Check project consistency", but I get the impression these only run checks and won't reload the table schemas.

Thanks in advance!

Nicolas

3 Replies
Dataiker

Hi @Nicolas 

I think your query should be addressed here in the docs.

Good luck!

Dataiker

Automatically changing dataset schemas would be quite dangerous in DSS: recipes or notebooks downstream could start failing. So DSS pushes the user to take action manually and confirm that the dataset's settings are correct.

If you know which columns were added/deleted, and when the change happens, you can use the Python API or the public API in an "Execute Python code" scenario step to alter the dataset's settings. Since the dataset is a Postgres table, you can also fetch the live schema using https://doc.dataiku.com/dss/latest/python-api/sql.html#dataiku.SQLExecutor2.query_to_iter because the returned object has a get_schema() method (no need to iterate the rows).
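To illustrate, here is a minimal sketch of what such a step could look like. The schema-diff helper is plain Python; the DSS-specific calls (SQLExecutor2, the dataset settings API, and the connection/dataset names) are assumptions based on the public API and are shown only as comments:

```python
# Sketch of an "Execute Python code" scenario step that reconciles a
# dataset's stored schema with the live columns of its source table.

def schema_diff(current_cols, live_cols):
    """Return (added, removed) column names between the dataset's stored
    schema and the live table schema. Columns are dicts with a "name" key."""
    current = {c["name"] for c in current_cols}
    live = {c["name"] for c in live_cols}
    return sorted(live - current), sorted(current - live)

# Inside DSS, the surrounding logic might look like this (untested sketch;
# "my_postgres_conn" and "my_table" are hypothetical names):
#
#   import dataiku
#   from dataiku import SQLExecutor2
#
#   executor = SQLExecutor2(connection="my_postgres_conn")
#   reader = executor.query_to_iter('SELECT * FROM "my_table" LIMIT 0')
#   live_schema = reader.get_schema()  # no need to iterate the rows
#
#   dataset = dataiku.api_client().get_default_project().get_dataset("my_table")
#   added, removed = schema_diff(dataset.get_schema()["columns"], live_schema)
#   if added or removed:
#       dataset.set_schema({"columns": live_schema})

if __name__ == "__main__":
    old = [{"name": "id", "type": "int"}, {"name": "email", "type": "string"}]
    new = [{"name": "id", "type": "int"}, {"name": "signup_date", "type": "date"}]
    print(schema_diff(old, new))  # -> (['signup_date'], ['email'])
```

Diffing first, rather than overwriting the schema unconditionally, lets the step log or alert on what changed before downstream recipes see the new columns.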

Level 1
Author

I see! I thought it would be possible via the UI, but it looks like I'll need to use the API instead.

Thanks!
