My BQ table becomes unavailable when my workflow is running

kccruz
Level 1
My BQ table becomes unavailable when my workflow is running

Hi. I have a straightforward workflow that is scheduled to run hourly. The issue I'm currently experiencing is that the output tables become unqueryable / unusable while the workflow is running. Any information on why this happens and how I can fix this? Thanks!

The workflow goes: SQL Query --> prepare recipe --> save to google cloud storage --> sync to BQ table historical --> sql query recipe --> BQ table historical daily (getting the recent snapshot only -- so a subset of the historical table).

0 Kudos
1 Reply
fchataigner2
Dataiker

Hi

for the flow you describe, at each run the output BQ table will be dropped, created anew, and filled again with a INSERT INTO from the historical table. So while the insert is running, the output BQ table is empty (and/or locked). If you need a 0 downtime scheme, you need to have a way to "swap" the table that you then query (outside DSS?). For example you can use views: load to table_in_the_back1, when it's ready (re)create view on table_in_the_back1, and use view for querying. On the next run, load to table_in_the_back2, when it's ready drop and recreate view on table_in_the_back2. That way the downtime is only the time needed to change the view. Then you alternate between table_in_the_back1 and table_in_the_back2.

0 Kudos

Labels

?
Labels (2)
A banner prompting to get Dataiku