Add "clear dataset data by tag" macro similar to "delete datasets by tag"

Rejected · Last Updated

Hi there !

So I've been working on my own macro to do this, and with our upgrade from v7 to v9 I saw that a new built-in macro is available that allows users to delete datasets using tags.

How about adding a similar macro to clear the data of datasets using tags? The "clear intermediate datasets" macro is fine for simple projects, but its options are limited for more complex, multi-level projects (for example, one with multi-source data prep, training, deployment and prediction). It would be quite useful for users who work in a big data environment.

Tell me what you think !

Cheers,

Pierre

Comments

  • Dataiker, Alpha Tester, Registered Posts: 138 Dataiker
    edited July 2024

    Hi Pierre,

    It would definitely be possible for you to create such a macro! You can retrieve each dataset's tags through the Dataset API, filter on the tag you want, and then clear the matching dataset(s). The code would look like this:

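    import dataiku

    client = dataiku.api_client()
    project = client.get_project("MY_PROJECT_KEY")

    # list_datasets() returns one summary item per dataset, including its tags
    for item in project.list_datasets():
        if "MY_TAG" in item["tags"]:
            print("Clearing dataset {}".format(item["name"]))
            # The list item itself has no clear() method, so fetch a dataset handle first
            project.get_dataset(item["name"]).clear()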
    

    To insert that in a more polished format, you can take inspiration from the "Clear intermediate datasets" macro source code.
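    As a minimal sketch of how that logic could be split up before wrapping it in a macro, the tag filtering can be kept separate from the clearing, with a dry-run mode to preview what would be cleared. The names `select_datasets_by_tag`, `clear_datasets_by_tag` and the `dry_run` parameter are hypothetical, and the sketch assumes each item returned by `project.list_datasets()` behaves like a dict with `"name"` and `"tags"` keys:

```python
def select_datasets_by_tag(dataset_items, tag):
    """Return the names of the listed datasets whose tags contain `tag`.

    Assumption: each item is dict-like with "name" and "tags" keys.
    """
    return [
        item["name"]
        for item in dataset_items
        if tag in (item.get("tags") or [])
    ]


def clear_datasets_by_tag(project, tag, dry_run=True):
    """Clear every dataset of `project` carrying `tag`.

    `project` is expected to expose list_datasets() and get_dataset(name),
    as a Dataiku project handle does. With dry_run=True nothing is cleared;
    the matching names are only returned for review.
    """
    names = select_datasets_by_tag(project.list_datasets(), tag)
    for name in names:
        if not dry_run:
            project.get_dataset(name).clear()
    return names
```

    Calling `clear_datasets_by_tag(project, "MY_TAG")` first and checking the returned names before re-running with `dry_run=False` may help avoid clearing more than intended.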

    Best,

    Harizo

  • Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 165 Dataiker

    Hi @rnorm,

    HarizoR's quick start is a good one ! We'll mark this one as Not Planned.

    Best,

    Ashley
