In situations where the dataset to be built and all dependencies of that dataset are SQL datasets, Smart Reconstruction does not actually rebuild any of the datasets (after the initial build). Per Clément_Stenac's post here, this is because there is no general way to determine if the data in a SQL table has changed.
While this behavior is certainly understandable, it is not typically what the user desires. At least that is our experience. In our use of DSS, flows of all SQL datasets are very common.
We most commonly run into this when creating build steps in Scenarios. Users don't realize (or forget) that they need change the build option from the default "Build required datasets" (equivalent to Smart Reconstruction) to "Force re-build dataset and dependencies".
As a result we have have a number of instances of Scenarios happily running for some time before the developer realizes that none of the tables they thought were getting refreshed actually were NOT getting refreshed. Often it is a customer telling us which is particularly embarrassing.
So the request is to improve the handling of rebuilding of flows with SQL dataset inputs. The goal is to avoid the experience of Scenarios running successfully but none of the datasets are getting updated.
It's not obvious though how to do this. Here are some ideas:
Make "force re-build" rather than "build required" the default in Scenario build steps (Scenarios are where the impacts are most problematic). This is perhaps overkill but it would solve the problem we experience.
As a variation on #1, enable companies to set the default build type for build steps in Scenarios. That way, companies like ours who mostly use SQL datasets can change the default to "force re-build".
Enhance the smart reconstruction / build required functionality to use force rebuild when all dependencies are SQL datasets.
For external SQL datasets, enable user to provide a query that will change when the underlying table is update. This would be equivalent to the SQL query trigger functionality in Scenarios. This could default to a row count (SELECT COUNT(*) FROM TABLE).
Of these options, #3 is my preferred option. This will work for flows that start with a SQL Script recipe (as ours often do since we may reference dozens of tables from our database and creating datasets for all of those would be big pain) rather than datasets pointing to external SQL tables.
Thanks to @sridarvs for posting about this issue. I had been thinking about it already but seeing another company running into the same issue prompted me to write up the idea.
Thanks for considering this.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.