API to check managed SQL dataset schema consistency
I'm looking for an API that checks the schema consistency of a managed SQL dataset.
In the UI this can be done manually via the dataset's Settings > Connection > Test, or via Settings > Schema > Check Now.
Judging from the browser's developer console, the UI calls one of these two private APIs:
/dip/api/datasets/managed-sql/test/
/dip/api/datasets/test-schema-consistency
Best Answer
-
The API does actually exist, and I was able to use it successfully:
1. Get the project's flow with project.get_flow().
2. Call flow.start_tool(type="CHECK_CONSISTENCY") (docs).
3. Call tool.update(options) to get a future. Note that options are required; I used:
```python
options = {
    "recheckAll": True,
    "datasets": {"consistencyWithData": True},
    "recipes": {"schemaConsistency": True, "otherExpensiveChecks": True},
}
future = tool.update(options)
```
4. Call future.wait_for_result(), which returns a dict of results.
5. Parse results and voila!
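The five steps above can be wrapped in a small helper. This is a minimal sketch, not an official recipe: it assumes `project` is a dataikuapi `DSSProject` handle (inside DSS, for example `dataiku.api_client().get_default_project()`), reuses the options from step 3, and also assumes the flow tool handle exposes `stop()` to release the server-side tool once the check is done. The name `run_consistency_check` is hypothetical.

```python
import copy
from typing import Any, Dict, Optional

# Options reported as working in the answer above; which keys are strictly
# required may vary with the DSS version.
DEFAULT_CONSISTENCY_OPTIONS: Dict[str, Any] = {
    "recheckAll": True,
    "datasets": {"consistencyWithData": True},
    "recipes": {"schemaConsistency": True, "otherExpensiveChecks": True},
}


def run_consistency_check(project: Any, options: Optional[Dict[str, Any]] = None) -> Any:
    """Run the Flow's CHECK_CONSISTENCY tool on `project` and return its result.

    `project` is assumed to be a dataikuapi DSSProject handle (or any object
    with the same get_flow() / start_tool() / update() / wait_for_result() calls).
    """
    if options is None:
        # Copy so a caller mutating the options cannot alter the module default.
        options = copy.deepcopy(DEFAULT_CONSISTENCY_OPTIONS)
    flow = project.get_flow()                         # step 1
    tool = flow.start_tool(type="CHECK_CONSISTENCY")  # step 2
    try:
        future = tool.update(options)                 # step 3: returns a future
        return future.wait_for_result()               # step 4: blocks until done
    finally:
        # Assumption: the tool handle has stop(); calling it releases the tool
        # even if the check raised.
        tool.stop()
```

Parsing the returned dict (step 5) is left out, because its exact layout isn't shown in this thread.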
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 1,971 Neuron
I don't believe these are available in the public API. How badly do you want them? There might be a way to call them, but it will take some work and it won't be supported.
There is a Flow Actions > Check Consistency option in the Flow that runs against all the datasets.
-
@Turribeach
Yes, I would like an API or some automated method if at all possible. We're building automation around validations, and with so many projects and developers, a step that requires manual clicks in the Flow is likely to be missed.
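For automated validation across many projects, the CHECK_CONSISTENCY flow tool from the accepted answer could be run as a sweep. A minimal sketch, assuming `client` is a dataikuapi `DSSClient` (for example `dataiku.api_client()`) whose `list_project_keys()` and `get_project()` return project keys and project handles, and assuming the flow tool handle has `stop()`. The name `check_all_projects` is hypothetical.

```python
from typing import Any, Dict, Optional

# Options taken from the accepted answer; adjust to the checks you need.
CONSISTENCY_OPTIONS: Dict[str, Any] = {
    "recheckAll": True,
    "datasets": {"consistencyWithData": True},
    "recipes": {"schemaConsistency": True, "otherExpensiveChecks": True},
}


def check_all_projects(client: Any, options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Run the CHECK_CONSISTENCY flow tool on every project visible to `client`.

    Returns {project_key: result}. If a project fails, the exception object is
    stored as its value so that one bad project does not abort the sweep.
    """
    if options is None:
        options = CONSISTENCY_OPTIONS
    results: Dict[str, Any] = {}
    for key in client.list_project_keys():
        tool = None
        try:
            flow = client.get_project(key).get_flow()
            tool = flow.start_tool(type="CHECK_CONSISTENCY")
            # update() returns a future; wait_for_result() blocks until the check ends
            results[key] = tool.update(options).wait_for_result()
        except Exception as exc:  # record the failure and keep sweeping
            results[key] = exc
        finally:
            # Assumption: the tool handle exposes stop() to release server-side state
            if tool is not None:
                tool.stop()
    return results
```

Projects are checked one after another, so with the expensive recipe checks enabled a full sweep may take a while. Deciding which results count as failures depends on the result layout, which this thread doesn't show.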
-
Turribeach
So I did get the private SQL test-connection API working; it is not available via the public API. It might be possible to get these ones working too. But first, raise it with Dataiku Support to confirm these are not available via the public API, then raise a Product Idea on the community site. Then we can discuss how to take it forward.
-
Hi @Turribeach
I raised a Product Idea here: https://community.dataiku.com/t5/Product-Ideas/Expose-public-API-to-test-schema-consistency-of-managed-SQL/idi-p/38203
There might be some slight overlap with your idea to test connections: https://community.dataiku.com/t5/Product-Ideas/Expose-Public-API-to-test-SQL-connections/idi-p/34644
Dataiku Support confirmed that this API doesn't exist today:
https://support.dataiku.com/support/tickets/56243
Let me know how to proceed.
Much appreciated!
WH
-
Turribeach
Yes, the Test Connection idea is closely related (you didn't vote for it, though; just click the up arrow to vote). A few months ago I started trying to use this API (see this post) and eventually got it working. I will post it tomorrow, but you will need to see whether you can make it work for these APIs.
-
Many thanks @Turribeach, looking forward to testing your method.
-
@Turribeach, I was wondering whether you've had a chance to post about using the API yet?
-
Turribeach
Hi, apologies for the delay. I haven't forgotten, but I need to rewrite this at home, so it's taking a bit of time. I will post back here when done.
-
I used your solution:
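```python
import dataiku

# Setup not shown in the original post: get the current project and its flow
project = dataiku.api_client().get_default_project()
flow = project.get_flow()

tool = flow.start_tool(type='CHECK_CONSISTENCY')
options = {
    "recheckAll": False,
    "datasets": {"consistencyWithData": True},
    "recipes": {"schemaConsistency": False, "otherExpensiveChecks": False},
}
future = tool.update(options)
result = future.wait_for_result()
```
However, the results still include "recipes" entries with state "checked", even though I disabled both recipe checks. I'd like to avoid this. Any ideas?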