Add "clear dataset data by tag" macro similar to "delete datasets by tag"

Hi there !

So I've been working on my own macro to do this, and with our upgrade from v7 to v9 I saw that a new built-in macro is available that allows users to delete datasets using tags.

How about adding a similar macro to clear the data of datasets using tags ? The "Clear intermediate datasets" macro is fine for simple projects, but its options are limited when working with more complex, multi-level projects (for example, projects with multi-source data prep, training, deployment and prediction). It would be quite useful for users who work in a big data environment.

Tell me what you think !

Cheers,

Pierre

2 Comments
HarizoR
Developer Advocate

Hi Pierre,

It would definitely be possible for you to create such a macro ! You can retrieve the tagging information using the Dataset API, apply some filtering and then clear the relevant Dataset(s). The code would look like this:

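import dataiku

client = dataiku.api_client()
project = client.get_project("MY_PROJECT_KEY")

# list_datasets() returns dict-like list items, not dataset handles
for item in project.list_datasets():
    if "MY_TAG" in item.get("tags", []):
        print("Clearing dataset {}".format(item["name"]))
        # Fetch the dataset handle first: calling .clear() on the list item
        # itself would only empty the dict, not the dataset's data
        project.get_dataset(item["name"]).clear()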

To package that in a more polished format, you can take inspiration from the "Clear intermediate datasets" macro source code.
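As a starting point for that, here is a minimal sketch of the same logic wrapped in a reusable function with a dry-run switch. The function name `clear_datasets_by_tag` and the `dry_run` parameter are illustrative, not part of the Dataiku API; the sketch only assumes that the project handle exposes `list_datasets()` (returning dict-like items with "name" and "tags") and `get_dataset(name).clear()`.

```python
def clear_datasets_by_tag(project, tag, dry_run=True):
    """Clear every dataset of `project` that carries `tag`.

    Returns the names of the matching datasets. With dry_run=True
    (the default) nothing is cleared, so the selection can be reviewed first.
    """
    matched = []
    for item in project.list_datasets():
        # Items are dict-like; "tags" is treated as optional to be safe
        if tag in item.get("tags", []):
            matched.append(item["name"])
            if not dry_run:
                # Clear through the dataset handle, not the list item
                project.get_dataset(item["name"]).clear()
    return matched


# Hypothetical usage inside DSS (project key and tag are placeholders):
# import dataiku
# project = dataiku.api_client().get_project("MY_PROJECT_KEY")
# print(clear_datasets_by_tag(project, "MY_TAG"))            # preview only
# clear_datasets_by_tag(project, "MY_TAG", dry_run=False)    # actually clear
```

Running it in dry-run mode first is a cheap safeguard, since clearing a dataset cannot be undone short of rebuilding it.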

Best,

Harizo

AshleyW
Dataiker
Status changed to: Not Planned

Hi @rnorm ,

HarizoR's quick start is a good one ! We'll mark this one as Not Planned.

Best,

Ashley