Add "clear dataset data by tag" macro similar to "delete datasets by tag"

Hi there !

So I've been working on my own macro to do this, and with our upgrade from v7 to v9 I saw that a new built-in macro is available that allows users to delete datasets using tags.

How about adding a similar macro to clear the data of datasets using tags ? The "Clear intermediate datasets" macro is fine for simple projects, but its options are limited for more complex, multi-level projects (for example, ones with multi-source data prep, training, deployment and prediction). It would be quite useful for users who work in a big data environment.

Tell me what you think !



Developer Advocate

Hi Pierre,

It would definitely be possible for you to create such a macro ! You can retrieve the tagging information using the Dataset API, apply some filtering and then clear the relevant Dataset(s). The code would look like this:

import dataiku

client = dataiku.api_client()
project = client.get_project("MY_PROJECT_KEY")

# Clear the data of every dataset in the project that carries the tag
for dataset in project.list_datasets():
    if "MY_TAG" in dataset.get("tags", []):
        print("Clearing dataset {}".format(dataset["name"]))
        project.get_dataset(dataset["name"]).clear()


To package that in a more polished form, you can take inspiration from the "Clear intermediate datasets" macro source code.
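If it helps, here is a minimal sketch of how the filtering could be separated from the clearing, so the same logic can be reused inside a macro. The function names `select_by_tags` and `clear_by_tags`, and the `match_all` and `dry_run` options, are hypothetical and not part of any built-in macro. The sketch only assumes the project object exposes `list_datasets()` and `get_dataset(name).clear()` as in the snippet above.

```python
def select_by_tags(items, tags, match_all=False):
    """Return the names of dataset list items carrying any (or all) of `tags`.

    `items` is expected to be a list of dict-like entries with a "name" key
    and an optional "tags" key, as returned by project.list_datasets().
    """
    wanted = set(tags)
    if not wanted:
        # Guard: an empty tag list must never select every dataset.
        return []
    names = []
    for item in items:
        have = set(item.get("tags") or [])
        hit = wanted <= have if match_all else bool(wanted & have)
        if hit:
            names.append(item["name"])
    return names


def clear_by_tags(project, tags, match_all=False, dry_run=True):
    """Clear the data of the datasets matching `tags`; return their names.

    With dry_run=True (the default) nothing is cleared, so the selection
    can be reviewed before any data is removed.
    """
    names = select_by_tags(project.list_datasets(), tags, match_all)
    if not dry_run:
        for name in names:
            project.get_dataset(name).clear()
    return names
```

With the `project` handle from the snippet above, `clear_by_tags(project, ["MY_TAG"])` would only list the matching datasets, and passing `dry_run=False` would actually clear them. Defaulting to a dry run is a deliberate choice here, since clearing data cannot be undone.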




Status changed to: Not Planned

Hi @rnorm ,

HarizoR's quick start is a good one ! We'll mark this one as Not Planned.