Exporting data traceable in log files

Solved!
MRvLuijpen
Hello Dataiku Community,

I was wondering: if a user exports data out of DSS, what traces can be found in the log files?

I know that there are several ways to export data, and I would like to know which of them can be traced in the logs:

- the main project menu "Export this project", which results in a ZIP download

- the "Export" entry in a dataset's Actions menu, which lets the user download/export a dataset

- inside a dataset, by selecting data and using copy/paste

- from Python/R code, where it is possible to export data to files that are connected to DSS.

From a data security point of view, it is sometimes necessary to restrict or trace the export of sensitive information.

The second part of this question: what are the possibilities for restricting the export of data? We did find this link, but it seems to apply at the DSS instance level: https://doc.dataiku.com/dss/latest/security/advanced-options.html#restricting-exports

1 Solution
Clément_Stenac

Hi,

User actions can be traced using the DSS audit log (https://doc.dataiku.com/dss/latest/security/audit-trail.html), i.e. the JSON files in the "run/audit" folder of DSS.

Each line contains a msgType indicating the action, plus additional details about that action (such as the name of the project, dataset, ...).

  • Exporting a project can be tracked by looking for "msgType": "project-export-download"
  • Exporting a dataset can be tracked with "msgType": "dataset-export"
  • Copy-pasting is a purely client-side action that cannot be tracked by any means. You will always find a dataset-read-data-sample entry logged before it, though
  • For custom code, it's structurally impossible to know what people "do" with the data they get, since the code is arbitrary. However, you can track which datasets were read using "msgType": "dataset-read-data"
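
As a rough illustration of how these msgType values could be searched for, here is a minimal Python sketch. It assumes that each file in "run/audit" holds one JSON object per line, and that msgType sits either at the top level of that object or nested under a "message" key; both points are assumptions to check against your own audit files. The function names and the directory path in the usage note are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator, Optional

# msgType values mentioned in this thread.
EXPORT_RELATED_TYPES = {
    "project-export-download",
    "dataset-export",
    "dataset-read-data-sample",
    "dataset-read-data",
}


def find_msg_type(event: dict) -> Optional[str]:
    """Return the event's msgType, or None if it has none.

    Assumption: msgType is either a top-level key or nested under "message".
    """
    value = event.get("msgType")
    if value is None and isinstance(event.get("message"), dict):
        value = event["message"].get("msgType")
    return value if isinstance(value, str) else None


def iter_export_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed audit events whose msgType is one of the types above."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if isinstance(event, dict) and find_msg_type(event) in EXPORT_RELATED_TYPES:
            yield event


def scan_audit_dir(audit_dir: str) -> Counter:
    """Count export-related events per msgType across all files in a folder."""
    counts: Counter = Counter()
    for path in sorted(Path(audit_dir).glob("*")):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for event in iter_export_events(handle):
                counts[find_msg_type(event)] += 1
    return counts
```

It could be called as scan_audit_dir("/path/to/dss_data_dir/run/audit"), where the path is a placeholder for your own DSS data directory. The other fields of each event (user, project, dataset) are left untouched in the yielded dictionaries, since their exact names are not given in this thread.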

We confirm that the restrictions you found are the ones that are implemented. It's important to understand that it's structurally impossible to completely prevent data export, if only because users can "see" data in DSS, which at the very least allows them to take a picture of their screen. It's more a matter of having an appropriate level of restrictions against errors and as much tracing as feasible, but tracing is necessarily incomplete in a coding environment.

2 Replies
MRvLuijpen
Author

Hello Clément,

Thank you for your response.