I'm working with a group that has a single full access log in to a particular database.
We would like to give Dataiku Users access to a subset of the data from this database and continue to use the ability to have a Dataiku scenario refresh the data accessible by these users.
I know that I can lock down access to the login credentials for the database by creating a connection.
However, that connection using the one login account will have significant access to the database.
That's Ok for the Dataiku Administrators. However, for other users of the system, we want to have restricted read-only access to a specifically curated sub-set of the data.
What are the best approaches folks have when using Dataiku to segment data access?
I've learned that to lock down the data connection you have to work with the Security Section. I've been trying the following.
So my Administrator accounts have access to this Data Connection and my regular users do not.
However, I'm not clear what the best method is to set up the sharing and refreshing of a subset of the data would be.
Using the Admin account I've been thinking about creating a project specifically for ETL. That syncs the data to a local PostgreSQL database and then somehow gives access to only the desired subset of extracted data. However, I'm not completely sure how to pull this off, and I'm wondering if this is the right approach, or if there is a more clever DSS way to get this done.
Help from anyone who has done something like this would be greatly appreciated.
A couple of questions:
What is the original data source?
Does the original data source have the ability to do row-level security?
Can you create views on the underlying data source?
Do the users need to run a version of the flow that the batch process needs?
Are the accounts of the end-users separately credentialed from the super account you would like to grant access too?
The data source is a mySQL database.
If row level security is possible we are unaware, or unable to make those adjustments to this is managed database.
That said the key lock down we are trying to achieve is showing a select group of objects for read only access through a Dataiku connection.
I like the idea of a view. It is attractive to me. I’m wondering if MySQL can use multiple schemas and views can be created in a specific schema, and the connection is setup to point at a special reporting schema. As new views are added to that reporting schema they might be automatically made available.
Users need to be able to create there own flows based on the results of the view or “hidden” SQL query hidden from them in Dataiku. And this source of data needs to be refreshable as new data is included in this database.
The accounts in Dataiku are separately credentialed. The database connection is shared, and has more access than we want to share to the general Dataiku Data Scientist users.
Note: I'm not a MySQL expert but have been bouncing around DBs for a while, so I can't get directly into the mechanics of MySQL.
One possible option is to define and use two sets of roles on MySQL:
In MySQL 8.0, the SQL ANSI standard of Roles was introduced. Roles are sets of permissions attached to a user in MySQL (or really most modern DBMSs). You can attach a role at any time during a connection, but likely we would want to do this at connection time.
In the new MySQL connection in DSS under advanced parameters, "post-connect commands" section, you could invoke the SET ROLE command.
Setting the Role automatically changes the permissions context of the user. This is how we can have a user with two sets of permissions depending upon the situation. The danger is the user elevating their credentials. If you create two separate DSS connections (one for the readers and one for the admins), you could apply the role to the user each time of login without worrying about carrying two different users in the DB.
A word of caution, I am unsure if there is an ability to deny a user to set a role they can access. I don't have a MySQL DB to test this out. I would look for the ability to set REVOKE SET ROLE as a permission in a role. If there is not, then the same user could elevate their role by using SQL in a notebook.
I hope this helps.
There are some very interesting clues in your comments. I'm not clear that I can implement from the notes below. However, this does give me some keywords to try to look at as part of my research process.
One of my initial concerns is once rolls are implemented what is the default behavior for an existing connection to the database that likely does not have a reference to a role set. Will that connection have to be altered in the next time the connection is made to reference a role.
We would be looking to any existing connection to have unchanged connection processes. And the new Dataiku connections to use a role which in effect moves the connections access down to a very limited read-only access.