
My BQ table becomes unavailable when my workflow is running

kccruz
Level 1

Hi. I have a straightforward workflow that is scheduled to run hourly. The issue I'm currently experiencing is that the output tables become unqueryable/unusable while the workflow is running. Any information on why this happens and how I can fix it? Thanks!

The workflow goes: SQL query --> Prepare recipe --> save to Google Cloud Storage --> sync to a BQ historical table --> SQL query recipe --> BQ historical daily table (keeping only the most recent snapshot, so a subset of the historical table).

fchataigner2
Dataiker

Hi,

For the flow you describe, at each run the output BQ table is dropped, created anew, and refilled with an INSERT INTO from the historical table. So while the insert is running, the output BQ table is empty (and/or locked). If you need a zero-downtime scheme, you need a way to "swap" the table that you then query (outside DSS?). For example, you can use views: load into table_in_the_back1, and when it's ready, (re)create the view on table_in_the_back1 and use the view for querying. On the next run, load into table_in_the_back2, and when it's ready, drop and recreate the view on table_in_the_back2, alternating between table_in_the_back1 and table_in_the_back2 on subsequent runs. That way the downtime is only the time needed to change the view.
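To make that concrete, here is a minimal BigQuery SQL sketch of the swap. The dataset and view names (my_dataset, daily_snapshot) are hypothetical placeholders; only table_in_the_back1 and table_in_the_back2 come from the description above:

-- Run N: reload table_in_the_back1 while queries keep hitting the view,
-- then repoint the view once the load has finished.
CREATE OR REPLACE VIEW my_dataset.daily_snapshot AS
SELECT * FROM my_dataset.table_in_the_back1;

-- Run N+1: reload table_in_the_back2, then repoint the view again.
CREATE OR REPLACE VIEW my_dataset.daily_snapshot AS
SELECT * FROM my_dataset.table_in_the_back2;

Downstream consumers query my_dataset.daily_snapshot, so the only unavailability is the instant the CREATE OR REPLACE VIEW statement runs.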
