Seeking Guidance on Programmatically Obtaining BigQuery Credentials
Hello Dataiku Community,
I have been exploring the integration of Google BigQuery with Dataiku, and I'm currently faced with a challenge regarding the programmatic acquisition of BigQuery credentials.
Specifically, I'm looking for recommendations on how to obtain BigQuery credentials programmatically. While I understand that manually generating and managing service account keys is one approach, I'm interested in exploring alternatives that align with a more automated and scalable workflow.
I want to use the BigQuery Python client and initialise a BigQuery client with the BigQuery credentials set up in Dataiku under "Profile & Settings" --> "Credentials".
If any of you have prior experience or insights on programmatically acquiring BigQuery credentials, I would greatly appreciate your input.
Thank you in advance for your time and expertise.
Best regards,
Thomas
Answers
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,124 Neuron
Why don't you generate the service account keys on the fly and set them in the connection properties via the Dataiku Python API? The service account key files need to be accessible to the Dataiku server, but they could come from a filer mount, for instance. To Dataiku these would look like standard service account keys, but you could provision/generate them from your automated workflow.
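A rough sketch of that with the Dataiku Python API (the connection name, the key file path and the parameter that holds the key are assumptions; inspect the definition on your own instance first to find the real field name):

import dataiku

client = dataiku.api_client()

# "bigquery_prod" and the key path are placeholders for your own setup
connection = client.get_connection("bigquery_prod")
definition = connection.get_definition()

# Print the params to find which field actually holds the service account
# key on your instance; "serviceAccountKey" below is only an assumption.
print(definition["params"])

with open("/mnt/filer/keys/sa-key.json") as f:
    definition["params"]["serviceAccountKey"] = f.read()

connection.set_definition(definition)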
-
thomaslprru Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 10 ✭
Thank you for your answer.
Yes of course, using a service account is a good way of doing it. The only problem is that it hides the identity of the person running the workflow. I'm going to use this alternative until I find a better option.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,124 Neuron
What are you trying to achieve exactly? There are ways to pass down the user ID if required, but this may not necessarily fit when you deploy projects to the Automation node, where they should run in unattended mode with a generic account.
-
thomaslprru Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 10 ✭
The aim is to check different things about the BigQuery tables and run different kinds of queries. This project will be a single scenario called only from the DSS environment. That's why knowing the identity of the people using this script would help a lot. I will start with service account credentials only and check later whether we can improve the scenario using personal credentials.
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,124 Neuron
It's still not clear to me what your actual requirement is. Are you trying to implement row level security?
-
thomaslprru Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 10 ✭
No.
In fact, the script runs in the staging environment. The aim is to be able to compare data from different environments.
To do this, we use the BigQuery Python library, which will execute various queries to compare data across environments (number of rows, schemas, etc.) in order to trigger various automated actions (launch data replications, etc.).
Using service accounts is a short-term possibility. The problem is that this script is called by users, and not all of them have the same rights in BigQuery. So the challenge is to authorise queries only against the tables for which the user connected to Dataiku has rights.
Service accounts don't have the same rights as users, so using user rights rather than service account rights would improve the relevance of the script.
I hope this makes sense.
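For context, the kind of comparison I have in mind would look roughly like this with the BigQuery Python client (project IDs and the table name are placeholders, and both clients are assumed to already be authenticated, with service account keys for now):

from google.cloud import bigquery

# Placeholders for the two environments being compared
staging = bigquery.Client(project="my-staging-project")
prod = bigquery.Client(project="my-prod-project")

t_staging = staging.get_table("my-staging-project.my_dataset.my_table")
t_prod = prod.get_table("my-prod-project.my_dataset.my_table")

print("row counts match:", t_staging.num_rows == t_prod.num_rows)
print("schemas match:", t_staging.schema == t_prod.schema)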
-
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,124 Neuron
If you are going to be using Python I would go a different way. I am pretty sure there is a way to see all the tables a user can see using the BigQuery INFORMATION_SCHEMA. For instance, when you create a BigQuery connection and create a SQL Notebook in a Dataiku project, you can list all the tables the connection account can see, so this should be possible via SQL as well. Failing that, build a permissioning table which maps the Dataiku user ID to the tables the user can see.
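For example, something along these lines with the BigQuery Python client (project and dataset names are placeholders); INFORMATION_SCHEMA only returns the tables the querying credentials can actually see:

from google.cloud import bigquery

bq_client = bigquery.Client()

# Lists the tables visible to whatever credentials this client runs with;
# the project and dataset names are placeholders.
query = """
    SELECT table_schema, table_name
    FROM `my-project.my_dataset.INFORMATION_SCHEMA.TABLES`
    ORDER BY table_name
"""
for row in bq_client.query(query).result():
    print(row.table_schema, row.table_name)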
In Python you can get the user running the Python recipe/scenario with this code:
import dataiku

dataiku_client = dataiku.api_client()
dataiku_client.get_auth_info()['authIdentifier']
If you need to impersonate users you can use this method:
dataiku_user = dataiku_client.get_user("the_user_to_impersonate")
client_as_user = dataiku_user.get_client_as()
So finally: get a service account key with access to all tables, then get the user ID running the recipe, then check which tables the user has access to and run your schema comparison. I think it should work.
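As a rough sketch of that flow (the key file path, project names and the permissioning table are assumptions, not anything Dataiku or BigQuery provide out of the box):

import dataiku
from google.cloud import bigquery
from google.oauth2 import service_account

# Service account key with access to all tables (path is a placeholder)
creds = service_account.Credentials.from_service_account_file("/mnt/filer/keys/sa-key.json")
bq_client = bigquery.Client(credentials=creds, project="my-project")

# Dataiku user running the recipe/scenario
user_id = dataiku.api_client().get_auth_info()["authIdentifier"]

# Hypothetical permissioning table mapping Dataiku user IDs to visible tables
job_config = bigquery.QueryJobConfig(
    query_parameters=[bigquery.ScalarQueryParameter("user_id", "STRING", user_id)]
)
allowed = {
    row.table_name
    for row in bq_client.query(
        "SELECT table_name FROM `my-project.admin.user_table_permissions` WHERE user_id = @user_id",
        job_config=job_config,
    ).result()
}

# Compare schemas only for the tables this user is entitled to see
for table_name in sorted(allowed):
    t_staging = bq_client.get_table(f"my-staging-project.my_dataset.{table_name}")
    t_prod = bq_client.get_table(f"my-prod-project.my_dataset.{table_name}")
    print(table_name, "schema matches:", t_staging.schema == t_prod.schema)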