Set up a scenario that runs whenever a data change is detected by a Hive query on the Hive database

sasidharp
sasidharp Registered Posts: 27 ✭✭✭✭

I have a Hive server connection. Whenever the data team adds data to a Hive server table, the query result changes in Dataiku. Each time data is added, I want every downstream dataset to be updated and rebuilt so it picks up the missing/new values.

Please share the visual process if possible, rather than a conceptual paragraph.

Thanks in advance

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Hi,

    There is a "trigger on SQL query change" type of trigger to initiate such scenarios. Typically, a query like select count(*) from ... is used to detect when rows are added to a table. But your wording seems to imply that you want only the new rows to be processed by DSS, not the full table, which is not really possible without partitioning. Is that the case?
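
    If it helps, here is a minimal sketch of how you could test such a probe query from a Python notebook before pasting it into the trigger settings. The connection name and table name are placeholders, and depending on your setup you may need a Hive executor instead of SQLExecutor2:

    ```python
    # Sketch only: assumes a connection named "hive_conn" and a table named data_in_hive.
    # The built-in "trigger on SQL query change" simply re-runs a query like this on its
    # polling interval and fires the scenario whenever the result differs from the last run.
    from dataiku import SQLExecutor2

    PROBE_QUERY = "SELECT COUNT(*) AS row_count FROM data_in_hive"

    executor = SQLExecutor2(connection="hive_conn")  # placeholder connection name
    df = executor.query_to_df(PROBE_QUERY)
    print(df["row_count"].iloc[0])
    ```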

  • sasidharp
    sasidharp Registered Posts: 27 ✭✭✭✭

    Yes, we are not partitioning the dataset, and we need to add only the new data; dropping and re-adding the whole dataset is a long-running process.

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker

    Attached is a project with an example of a scenario to rebuild a flow.

    The Hive table is named data_in_hive and is fed by the editable dataset + sync recipe (this is just a test harness, to make it easier to update the Hive table's contents).

    The Hive table is read as the Hive dataset data_from_hive and is used by the flow.

    The scenario listens to the row count in data_in_hive and, whenever it changes:

    - force-rebuilds the first dataset after the Hive dataset

    - smart-rebuilds the rest of the flow, i.e. the scenario asks DSS to rebuild the output dataset and all intermediate datasets that may be needed (a scripted equivalent is sketched below)
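
    For reference, the same two steps can also be written as a custom Python scenario. This is just a sketch: the dataset names are placeholders for your own flow, and the build_mode values are assumptions based on the dataiku.scenario API:

    ```python
    # Sketch of the two build steps as a custom Python scenario.
    from dataiku.scenario import Scenario

    scenario = Scenario()

    # 1. Force-rebuild the first dataset right after the Hive dataset, so DSS
    #    recomputes it even though the Hive source looks unchanged to DSS.
    scenario.build_dataset("first_dataset_after_hive",          # placeholder name
                           build_mode="NON_RECURSIVE_FORCED_BUILD")

    # 2. Smart-rebuild the rest of the flow: DSS rebuilds the output dataset
    #    and any intermediate datasets it considers out of date.
    scenario.build_dataset("final_output_dataset",              # placeholder name
                           build_mode="RECURSIVE_BUILD")
    ```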
