Check and reload schema automatically

Nicolas Registered Posts: 2 ✭✭✭

Hi,

When I use a scenario, the jobs fail if a column has been added to or deleted from one of the input tables (which come from an external Postgres DB). When this happens, I have to open the affected table, go to its settings, check the schema, reload it and save it (cf. screenshot).

Screenshot 2020-08-10 at 09.45.33.png

Is there any way to automate this process, for example with a specific step in the scenario? I've seen steps like "Run checks" or "Check project consistency", but I feel these only perform checks and won't reload the table schemas.

Thanks in advance!

Nicolas

Best Answer

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    Answer ✓

    Automatically changing dataset schemas would be quite dangerous for DSS usage, because recipes or notebooks downstream could fail, so DSS pushes the user to take action manually and make sure the dataset's settings are correct.

    If you know which columns are added or deleted, and when the change happens, you can use the Python API or the public API in an "Execute Python code" step to alter the dataset's settings. Since the dataset is a Postgres table, you can also fetch the schema using https://doc.dataiku.com/dss/latest/python-api/sql.html#dataiku.SQLExecutor2.query_to_iter: the returned object has a get_schema() method, so there is no need to iterate over the rows.
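The approach described in the answer could look roughly like the sketch below, meant for an "Execute Python code" scenario step. It is only an illustration, not an official recipe: the helper names `diff_columns` and `reload_schema_from_table` are made up for this example, and it assumes that `get_schema()` on the object returned by `query_to_iter` yields a list of column dicts with `name` and `type` keys that `Dataset.write_schema()` accepts as-is. It is worth verifying that on your DSS version before relying on it, and keeping in mind the warning above that downstream recipes may break once the schema changes.

```python
def diff_columns(current, detected):
    """Return (added, removed) column names between two lists of column dicts.

    Hypothetical helper: both arguments are assumed to be lists of dicts
    that each have at least a "name" key.
    """
    current_names = [col["name"] for col in current]
    detected_names = [col["name"] for col in detected]
    added = [name for name in detected_names if name not in current_names]
    removed = [name for name in current_names if name not in detected_names]
    return added, removed


def reload_schema_from_table(dataset_name, table_name):
    """Hypothetical helper: overwrite the dataset schema with the table's schema."""
    # Imported here because the dataiku package only exists inside DSS.
    import dataiku

    dataset = dataiku.Dataset(dataset_name)
    executor = dataiku.SQLExecutor2(dataset=dataset)

    # LIMIT 0 returns no rows; only the schema of the result is needed.
    reader = executor.query_to_iter('SELECT * FROM "{}" LIMIT 0'.format(table_name))
    detected = reader.get_schema()  # assumed: list of {"name": ..., "type": ...}

    added, removed = diff_columns(dataset.read_schema(), detected)
    if added or removed:
        print("Schema change on {}: added={}, removed={}".format(
            dataset_name, added, removed))
        # Assumption: the detected column dicts are valid input for write_schema.
        dataset.write_schema(detected)
    return added, removed
```

In a scenario, the step would then call something like `reload_schema_from_table("my_dataset", "my_table")` (both names are placeholders) before the build steps. Printing the added and removed columns keeps a trace in the scenario log, so an unexpected schema change is still visible even though it no longer blocks the job.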
