Add "clear dataset data by tag" macro similar to "delete datasets by tag"

Hi there!

So I've been working on my own macro to do this, and with our upgrade from v7 to v9 I saw that a new built-in macro is available that allows users to delete datasets by tag.

How about adding a similar macro to clear the data of datasets by tag? The "Clear intermediate datasets" macro is fine for simple projects, but it offers limited options for more complex, multi-level projects (e.g. with multi-source data preparation, training, deployment and prediction). It would be quite useful for users who work in a big data environment.

Tell me what you think!

Cheers,

Pierre

2 Comments
HarizoR
Developer Advocate

Hi Pierre,

It would definitely be possible for you to create such a macro! You can retrieve each dataset's tags through the Dataset API, filter on the tag you are interested in, and then clear the matching dataset(s). The code would look like this:

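import dataiku

client = dataiku.api_client()
project = client.get_project("MY_PROJECT_KEY")

# Each list item is a dict-like summary of a dataset (name, type, tags, ...)
for dataset in project.list_datasets():
    if "MY_TAG" in dataset.get("tags", []):
        print("Clearing dataset {}".format(dataset["name"]))
        # Get a handle on the actual dataset and clear its data; calling
        # .clear() on the list item itself would only empty that Python dict
        project.get_dataset(dataset["name"]).clear()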
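If you need to select datasets on several tags at once (for instance one tag per stage of a multi-level project), the tag filter can be factored out into a small helper. This is a minimal sketch, assuming the list items expose `name` and `tags` keys as in the snippet above; the helper names `select_datasets_by_tags` and `clear_datasets` and the `match`/`dry_run` parameters are hypothetical. The actual clearing call is passed in as a function, so the selection can be checked without touching any data.

```python
def select_datasets_by_tags(dataset_items, tags, match="any"):
    """Return the names of datasets whose tags match `tags`.

    dataset_items: iterable of dict-like items with "name" and "tags" keys
                   (assumed to be the output of project.list_datasets()).
    match: "any" -> at least one of the tags is present,
           "all" -> every one of the tags is present.
    """
    wanted = set(tags)
    if not wanted:
        # An empty tag set would match every dataset in "all" mode: refuse it
        raise ValueError("at least one tag is required")
    if match not in ("any", "all"):
        raise ValueError("match must be 'any' or 'all'")
    selected = []
    for item in dataset_items:
        item_tags = set(item.get("tags", []))
        if match == "any" and wanted & item_tags:
            selected.append(item["name"])
        elif match == "all" and wanted <= item_tags:
            selected.append(item["name"])
    return selected


def clear_datasets(names, clear_fn, dry_run=True):
    """Call clear_fn(name) for each dataset name unless dry_run is set."""
    for name in names:
        if dry_run:
            print("Would clear dataset {}".format(name))
        else:
            print("Clearing dataset {}".format(name))
            clear_fn(name)
    return list(names)
```

Inside DSS, it could be called as `clear_datasets(select_datasets_by_tags(project.list_datasets(), ["prep", "training"]), lambda name: project.get_dataset(name).clear(), dry_run=False)`, where the tag names are only examples; running it once with the default `dry_run=True` is a cheap way to see which datasets would be affected.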

To package this as a more polished macro, you can take inspiration from the source code of the "Clear intermediate datasets" macro.
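For reference, a Python macro in a DSS plugin is a class deriving from `dataiku.runnables.Runnable` with a `run(self, progress_callback)` method. The sketch below is only an outline under stated assumptions: the plugin's `runnable.json` is assumed to declare two hypothetical parameters, `tag` and `dry_run`; the class name and the `_get_project` helper are illustrative; the summary string returned by `run` assumes a macro result type that displays text; and the fallback base class exists only so the logic can be exercised outside DSS.

```python
try:
    # Base class provided by DSS for plugin macros
    from dataiku.runnables import Runnable
except ImportError:
    # Fallback so this sketch can be tried outside DSS (testing only)
    class Runnable(object):
        pass


class ClearDatasetsByTag(Runnable):
    """Hypothetical macro: clear every dataset of the project carrying a tag."""

    def __init__(self, project_key, config, plugin_config):
        self.project_key = project_key
        self.config = config
        self.plugin_config = plugin_config

    def get_progress_target(self):
        # No progress bar for this simple macro
        return None

    def _get_project(self):
        # Imported lazily so the class can be defined without DSS
        import dataiku
        return dataiku.api_client().get_project(self.project_key)

    def run(self, progress_callback):
        tag = self.config.get("tag")                # hypothetical parameter
        dry_run = self.config.get("dry_run", True)  # hypothetical parameter
        if not tag:
            raise ValueError("Please provide a tag")
        project = self._get_project()
        names = [d["name"] for d in project.list_datasets()
                 if tag in d.get("tags", [])]
        if not dry_run:
            for name in names:
                # Clear the data of the dataset itself, not the list item
                project.get_dataset(name).clear()
        verb = "Would clear" if dry_run else "Cleared"
        return "{} {} dataset(s): {}".format(verb, len(names), ", ".join(names))
```

Defaulting `dry_run` to true means a first run of the macro only reports the matching datasets, which seems a reasonable safeguard for an operation that deletes data.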

Best,

Harizo

AshleyW
Dataiker

Hi @rnorm,

HarizoR's quick start is a good one! We'll mark this idea as Not Planned.

Best,

Ashley

Status changed to: Rejected