More intuitive security and access control for DSS

User Story

As a part-time administrator of a few DSS instances, I would like a more intuitive and discoverable way to set up DSS security / access control, one that helps me understand my use case and make the necessary adjustments across the various security settings spread throughout the system.  This would give me more confidence in deploying important applications to users throughout my organization.

Questions:

  • Does this need to be looked at from scratch?
  • Does this suggest the need for an alternate overall security module that supplements the security settings spread throughout the system?
  • Is a DSS Academy security course needed?

Notes:  

This idea started as a question about: 

As an administrator working with inexperienced end-users, I'd like to share an application whose instances can use read-only data sets shared from the original project.  The need to copy datasets when using an application limits the size of the dataset that is realistic to use with applications.

COS

This should be an option.  The current behavior should continue to be the default behavior for backward compatibility.

Below is a thread about setting up this use case.  It ended up somewhere else entirely: a group of folks working together with read access to a dataset.

--Tom
7 Comments
Manuel
Dataiker Alumni

It depends on the specific scenario, but breaking down your template project and making use of shared datasets might be a solution for what you describe. See the pattern below:

Screenshot 2021-07-05 at 14.44.39.png

@Manuel , I think that “shared data sets” is the key idea here.  I’m not completely clear what you are describing.  

In your example you have two projects: one doing the data ETL, and the application project consuming some data.

In the ETL project, the authors of that project are working with a connection to a PostgreSQL database.  I understand the idea of using a scenario to keep a dataset fresh.  I don’t understand exactly what you mean by “user group has read access to connection”.  How does group security work on connections?  Does one need multiple connections to the same underlying data source? Does one need multiple database accounts?  I’m not clear exactly how to set something like this up.  

Is there current documentation or training materials describing this set of techniques?

--Tom

Manuel
Dataiker Alumni

The use of the shared dataset is what mitigates the need for duplicating a large dataset with every application instance. https://doc.dataiku.com/dss/latest/security/exposed-objects.html

Connections can be restricted to selected security user groups. So, when you are using a shared dataset, you need to make sure that the users who will instantiate applications have at least read access to that connection, so that they can read the shared dataset. These users won't need access to the ETL project.

I hope this makes it clear.
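
To make the access rule above concrete, here is a minimal sketch in plain Python. It is only a toy model of the rule as described in this comment, not DSS code: the `Connection` and `User` classes, the group names, and the `can_read_shared_dataset` helper are all hypothetical and do not correspond to any DSS API.

```python
from dataclasses import dataclass, field


@dataclass
class Connection:
    """Hypothetical stand-in for a DSS connection and the groups allowed to read it."""
    name: str
    readable_by: set = field(default_factory=set)


@dataclass
class User:
    """Hypothetical stand-in for a DSS user with group and project memberships."""
    name: str
    groups: set = field(default_factory=set)
    projects: set = field(default_factory=set)


def can_read_shared_dataset(user: User, connection: Connection) -> bool:
    # Rule from the comment: read access to the connection (through any group)
    # is what matters; membership of the ETL project is not required.
    return bool(user.groups & connection.readable_by)


conn = Connection("pg_curated", readable_by={"app_users"})
app_user = User("app_user", groups={"app_users"})       # no access to the ETL project
etl_only = User("etl_only", groups={"marketing"}, projects={"ETL"})

print(can_read_shared_dataset(app_user, conn))
print(can_read_shared_dataset(etl_only, conn))
```

In this model the user in `app_users` can read the shared dataset without belonging to the ETL project, while belonging to the ETL project alone does not grant read access through this connection.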

@Manuel ,

I think that is making a bit more sense.  I’m not somewhere I can get to a DSS instance at the moment, so I’m not sure I fully understand this yet. I’ll follow up if I have further questions.  

--Tom

@Manuel,

Going somewhat beyond this initial use case (outside the scope of applications):

Is there a way to share a curated set of datasets not with other projects, but with specific other groups and users who have read-only access to specific datasets? 

Based on your description above, do you need to do something like:

  • Create an ETL Project. ->
  • Share specific data sets with a Sharing Project ->
  • Give folks who need to see the specific data sets access to that particular project.

--Tom

Manuel
Dataiker Alumni

To do what you now ask, ignore the application pattern:

  • To share datasets with specific user groups, use the connection's "freely usable by" configuration
  • To make the datasets read-only, use the connection's "allow write" configuration

So, you could configure two connections pointing to the same database:

  • The first, only accessible by you, to write the curated datasets
  • The second, accessible to specific groups, to read the curated datasets

I have never had to do this, but I hope it helps.
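
As a rough illustration of the two-connection pattern, the sketch below models both connections as plain Python dictionaries. Everything in it is an assumption made for illustration: the key names (`usableBy`, `allowedGroups`, `allowWrite`) simply mirror the "freely usable by" and "allow write" settings mentioned above and are not confirmed DSS field names, and the connection, database, and group names are made up. It does not talk to a DSS instance.

```python
def make_connection(name, database, allowed_groups, allow_write):
    """Build a hypothetical connection definition (illustrative keys, not a DSS API)."""
    return {
        "name": name,
        "type": "PostgreSQL",
        "params": {"db": database},             # both connections point to the same database
        "usableBy": "ALLOWED",                  # restrict use to the listed groups
        "allowedGroups": sorted(allowed_groups),
        "allowWrite": allow_write,
    }


def can_use(conn, user_groups):
    # A user may use the connection if any of their groups is allowed.
    return bool(set(conn["allowedGroups"]) & set(user_groups))


def can_write(conn, user_groups):
    # Writing needs both access to the connection and the write flag.
    return conn["allowWrite"] and can_use(conn, user_groups)


# First connection: only the curator's group, write enabled.
writer = make_connection("curated_rw", "analytics", ["data_admins"], allow_write=True)
# Second connection: consumer groups, write disabled.
reader = make_connection("curated_ro", "analytics", ["analysts", "app_users"], allow_write=False)

for groups in (["data_admins"], ["analysts"]):
    print(groups,
          "| read via curated_ro:", can_use(reader, groups),
          "| write via curated_rw:", can_write(writer, groups))
```

The point of the split is that read-only consumers never get a connection through which a write is possible, while the curator keeps a separate connection for maintaining the datasets.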

@Manuel , thanks 

--Tom
