My BQ table becomes unavailable when my workflow is running

Options
kccruz
kccruz Registered Posts: 1 ✭✭✭

Hi. I have a straightforward workflow that is scheduled to run hourly. The issue I'm currently experiencing is that the output tables become unqueryable / unusable while the workflow is running. Any information on why this happens and how I can fix this? Thanks!

The workflow goes: SQL Query --> prepare recipe --> save to google cloud storage --> sync to BQ table historical --> sql query recipe --> BQ table historical daily (getting the recent snapshot only -- so a subset of the historical table).

Answers

  • fchataigner2
    fchataigner2 Dataiker Posts: 355 Dataiker
    Options

    Hi

    for the flow you describe, at each run the output BQ table will be dropped, created anew, and filled again with a INSERT INTO from the historical table. So while the insert is running, the output BQ table is empty (and/or locked). If you need a 0 downtime scheme, you need to have a way to "swap" the table that you then query (outside DSS?). For example you can use views: load to table_in_the_back1, when it's ready (re)create view on table_in_the_back1, and use view for querying. On the next run, load to table_in_the_back2, when it's ready drop and recreate view on table_in_the_back2. That way the downtime is only the time needed to change the view. Then you alternate between table_in_the_back1 and table_in_the_back2.

Setup Info
    Tags
      Help me…