Exporting data traceable in log files

MRvLuijpen
MRvLuijpen Partner, L2 Admin, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 107 Neuron

Hello Dataiku Community,

I was wondering: if a user exports data outside DSS, what traces can be found in the log files?

I know that there are several ways to export data, and I would like to know which of them can be traced in the logs:

- the main project menu "Export this project", which results in a ZIP download

- the dataset Action menu "Export", which lets the user download/export a dataset

- selecting data inside a dataset and copy/pasting it

- exporting data from Python/R to files which are connected to DSS

From a data security point of view, it is sometimes necessary to restrict or trace the export of sensitive information.

The second part of this question: what are the possibilities for restricting the export of data? We found this link, but it seems to apply at the DSS instance level: https://doc.dataiku.com/dss/latest/security/advanced-options.html#restricting-exports

Best Answer

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    Answer ✓

    Hi,

    User actions can be traced using the DSS audit log (https://doc.dataiku.com/dss/latest/security/audit-trail.html), i.e. the JSON files in the "run/audit" folder of DSS.

    Each line will contain a msgType indicating the action and additional details about the action (like name of project, dataset, ...)

    • Exporting a project can be tracked by looking for "msgType": "project-export-download"
    • Exporting a dataset can be tracked with "msgType": "dataset-export"
    • Copy-pasting is a purely client-side action, so it cannot be tracked by any means. However, it will always be preceded by a dataset-read-data-sample entry in the audit log
    • For custom code, it's structurally impossible to know what people "do" with the data they get, since it's purely arbitrary code. However, you can track which datasets were read using "msgType": "dataset-read-data"
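
To illustrate the message types listed above, here is a minimal Python sketch that scans an audit folder for them. It rests on several assumptions that are not confirmed in this thread: that each audit file holds one JSON object per line, that "msgType" may sit either at the top level or nested inside the record (so the sketch searches for it recursively), and that the folder is readable from where the script runs. The names find_msg_type and scan_audit_dir are hypothetical helpers, not part of any Dataiku API.

```python
import json
from pathlib import Path

# Message types mentioned in the answer above.
EXPORT_MSG_TYPES = {
    "project-export-download",
    "dataset-export",
    "dataset-read-data-sample",
    "dataset-read-data",
}


def find_msg_type(record):
    """Return the first string "msgType" found anywhere in a parsed record.

    Assumption: the key may be nested, so dicts and lists are searched
    recursively. Returns None when no such key exists.
    """
    if isinstance(record, dict):
        if isinstance(record.get("msgType"), str):
            return record["msgType"]
        for value in record.values():
            found = find_msg_type(value)
            if found is not None:
                return found
    elif isinstance(record, list):
        for item in record:
            found = find_msg_type(item)
            if found is not None:
                return found
    return None


def scan_audit_dir(audit_dir, wanted=EXPORT_MSG_TYPES):
    """Yield (file name, msgType, parsed record) for each matching line."""
    for path in sorted(Path(audit_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8", errors="replace") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    continue  # skip lines that are not standalone JSON
                msg_type = find_msg_type(record)
                if msg_type in wanted:
                    yield path.name, msg_type, record
```

Pointing scan_audit_dir at the "run/audit" folder of the DSS data directory and iterating over the result would list every matching entry, which could then be filtered further by project or dataset name using the additional details in each record.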

    We confirm that the restrictions you found are the ones that are implemented. It's important to understand that it's structurally impossible to completely prevent data export, if only because users can "see" data in DSS, which at the very least allows them to take a picture of their screen. It's more a matter of having an appropriate level of restrictions against errors and as much tracing as feasible, but tracing is necessarily incomplete in a coding environment.

Answers

  • MRvLuijpen
    MRvLuijpen Partner, L2 Admin, L2 Designer, Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Dataiku DSS Adv Designer, Registered, Dataiku DSS Developer, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Frontrunner 2022 Participant, Neuron 2023 Posts: 107 Neuron

    Hello Clément,

    Thank you for your response.
