Dataset still in the server even when deleted in the web interface

Binh

Hello,

When we upload our datasets directly from the browser in DSS, the file can be stored in two locations:

  • Default location, where the file is stored in dss/uploads
  • filesystem_managed, where the file is stored in dss/managed_datasets/uploads

When we delete the file in the browser, the dataset is deleted (it no longer appears in the flow), but when we look at the server from the terminal, the uploaded file is still in the folders mentioned above.

So the files are never deleted and take up too much space on the server. We have to delete them by hand in the terminal.

Do you know how we can automate this, so that when a file is deleted in the browser, it is also deleted on the server?
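
As a stopgap for the manual terminal cleanup, here is a minimal Python sketch that lists upload folders no longer backed by a live dataset and removes them, with a dry run by default. It assumes (unverified) that each uploaded dataset has its own subdirectory directly under the uploads folder, named after the dataset; the function names are hypothetical, and the set of live dataset names would have to come from the project (for example via the public API). Check the real folder layout on your instance before running it with dry_run=False.

```python
from pathlib import Path
import shutil


def find_orphaned_upload_dirs(uploads_root, live_names):
    """Return subdirectories of uploads_root whose names are not in live_names.

    ASSUMPTION: one subdirectory per uploaded dataset, named after the dataset.
    Plain files directly under uploads_root are ignored.
    """
    root = Path(uploads_root)
    return sorted(p for p in root.iterdir() if p.is_dir() and p.name not in live_names)


def remove_dirs(dirs, dry_run=True):
    """Delete the given directories; with dry_run=True, only report what would be removed."""
    removed = []
    for d in dirs:
        if not dry_run:
            shutil.rmtree(d)  # irreversible: removes the folder and everything in it
        removed.append(d)
    return removed
```

Running it first with the default dry_run=True lets you review the list before anything is actually deleted.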

Thank you.


Answers

  • will_nowak

    Hello! Be sure to select the `Drop Data` radio button when trying to delete data.

    It should look as follows:

    [screenshot not available]

    How are you trying to delete the data? The button mentioned above appears if you right-click a dataset in the flow and then select `Delete`.

  • Binh

    Hello,

    When I want to delete an uploaded file, the "Drop data" option does not appear.

    I can tick it when I delete datasets built from files I put on the server myself, but not for uploaded files.

    (To delete the data, I click on the dataset, then click Delete in the right panel.)

    Binh

  • Binh
    I just replied with a screenshot (I can't add a screenshot to a reply, so I added a new answer instead).
  • will_nowak
    Hello!

    Indeed, this is a feature and not a bug. For files uploaded to the server, DSS doesn't allow deleting the underlying data from the flow, in order to protect any downstream recipes and datasets (once the initial inputs are deleted, the flow can no longer be rebuilt). In addition, DSS is not meant to be a tool for managing the data uploaded to the server, but rather a tool for working with that data once it is there.

    That said, I understand the need. One possible solution is to write a macro that uses the public API to remove datasets from your project: https://doc.dataiku.com/dss/latest/publicapi/client-python/datasets.html#basic-operations
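
    A minimal sketch of such a macro step, assuming the `dataikuapi` client, where a project's `get_dataset(name)` returns a dataset object whose `delete()` accepts a `drop_data` flag. The helper name `delete_project_datasets` is hypothetical. Whether `drop_data=True` also removes the files under the uploads folders is exactly what this thread is unsure about, so verify it on a throwaway uploaded dataset first. Inside DSS, `dataiku.api_client()` returns an authenticated client you can use instead of building one by hand.

```python
def delete_project_datasets(project, dataset_names, drop_data=True):
    """Delete the named datasets from a DSS project through the public API.

    `project` is expected to behave like dataikuapi's DSSProject:
    get_dataset(name) returns an object with delete(drop_data=...).
    ASSUMPTION: drop_data=True may or may not clear uploaded files on disk;
    check on a test dataset.
    """
    deleted = []
    for name in dataset_names:
        project.get_dataset(name).delete(drop_data=drop_data)
        deleted.append(name)
    return deleted


# Hypothetical usage (host, API key and project key are placeholders):
# import dataikuapi
# client = dataikuapi.DSSClient("http://localhost:11200", "YOUR_API_KEY")
# project = client.get_project("MYPROJECT")
# delete_project_datasets(project, ["uploaded_sales"])
```

    Taking the project object as a parameter keeps the deletion logic independent of how the client is created, so the same function works from a macro, a notebook, or an external script.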

    Please let me know if this is of any assistance.