Data Set Name Alias

VMaus
VMaus Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 6 Partner

Problem: Data set names aren't user friendly in hindsight

Example: My data set is named GreatData. I add a prep step and its output defaults to "GreatData_prepared", which is fine. Later, though, I decide this is the "final data set" that others should use, and I'd like a more intuitive name. I understand that changing data set names is not recommended.

Solution: Can we have an alias name for data sets? Then I could create an alias for this data set called "User_Demographics" or "Final_GreatData" or "Dashboard_GreatData" etc. Then allow for an alias name view in the flow?
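
To make the request concrete, here is a minimal sketch of what an alias layer could look like. It is purely illustrative: `DatasetAliases` and its methods are hypothetical names, not part of DSS or its API, and the sketch only shows the lookup behaviour the idea describes (several friendly names resolving to one real data set name).

```python
class DatasetAliases:
    """Hypothetical alias layer: maps friendly names onto real data set names."""

    def __init__(self):
        self._alias_to_name = {}

    def add(self, alias, dataset_name):
        # Refuse to silently repoint an existing alias at a different data set.
        existing = self._alias_to_name.get(alias)
        if existing is not None and existing != dataset_name:
            raise ValueError(f"alias {alias!r} already points to {existing!r}")
        self._alias_to_name[alias] = dataset_name

    def resolve(self, name_or_alias):
        # Unknown names pass through unchanged, so real names keep working.
        return self._alias_to_name.get(name_or_alias, name_or_alias)

    def aliases_for(self, dataset_name):
        # One data set may carry several aliases; return them in sorted order.
        return sorted(a for a, n in self._alias_to_name.items() if n == dataset_name)


aliases = DatasetAliases()
aliases.add("User_Demographics", "GreatData_prepared")
aliases.add("Final_GreatData", "GreatData_prepared")

print(aliases.resolve("User_Demographics"))
print(aliases.aliases_for("GreatData_prepared"))
```

Because the real name never changes, nothing that already references "GreatData_prepared" would need to be touched.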

7 votes

Released

Comments

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 319 Neuron

    Hi @VMaus, I actually rename datasets regularly and haven't had a problem yet. I do need to change references manually in the associated SQL Script and Python recipes, but after I do that all seems fine. I agree that the first name I select often isn't what I ultimately want, and the renaming is worth it to me because more accurate, descriptive names make it easier for me and others to understand the flow later.

    Ideally renaming datasets would be a fully supported operation.

    Marlan
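
The manual step mentioned above (updating references inside code recipes after a rename) can be sketched as a plain text substitution. This is only an illustration under assumptions: `rename_in_code` is a hypothetical helper, not a DSS function, and it works on a recipe's source as a string. The one subtle point it shows is matching whole identifiers, so that renaming "GreatData" does not also rewrite "GreatData_prepared".

```python
import re


def rename_in_code(code, old_name, new_name):
    """Replace whole-identifier occurrences of old_name.

    Returns a tuple (new_code, number_of_replacements).
    """
    # The lookarounds reject matches that are part of a longer identifier,
    # e.g. "GreatData" inside "GreatData_prepared".
    pattern = re.compile(
        r"(?<![A-Za-z0-9_])" + re.escape(old_name) + r"(?![A-Za-z0-9_])"
    )
    # A function replacement avoids backslash escapes in new_name being interpreted.
    return pattern.subn(lambda match: new_name, code)


# Example recipe source held as text (illustrative only).
recipe_code = (
    'src = dataiku.Dataset("GreatData")\n'
    'df = dataiku.Dataset("GreatData_prepared").get_dataframe()\n'
)

new_code, count = rename_in_code(recipe_code, "GreatData_prepared", "User_Demographics")
print(count)
print(new_code)
```

A review of the changed lines is still advisable, since a purely textual match cannot tell a dataset reference from an unrelated variable or string that happens to share the name.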

  • VMaus
    VMaus Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 6 Partner

    Thanks for that feedback, @Marlan. I've been too afraid to change the names and end up adding descriptions to help with this.

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    @Marlan,

    Years ago, when I first started using DSS (around V4 or V5), I made Swiss cheese out of a project by renaming a dataset and could never recover it. Since then I've avoided changing names.

    I agree: default names never capture the actual idea of the dataset. By the time you are ready to move the project to production or turn it over to someone else, the dataset names never really make any sense.

    I have been known to walk through parts of projects connecting new datasets with better names to existing steps, then running the step, and then connecting the next visual step to the newly created dataset with a better name. However, that is painful in all kinds of ways.

    Based on your comments, and with 4 or 5 versions' worth of bugs having been cleaned up in DSS, I might try to rename datasets again.

    Being able to refactor the names of DSS project elements in all areas of the system would be a great help for reusability and discoverability.

  • Marlan
    Marlan Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Dataiku Frontrunner Awards 2021 Participant, Neuron 2023 Posts: 319 Neuron

    Hi @tgb417, note that I've only ever renamed datasets whose associated recipes are SQL Script or Python recipes. Renaming might be riskier with visual recipes. Just wanted to share that caveat.

    Marlan

  • Ashley
    Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 161 Dataiker

    Thanks for your idea, @VMaus. It meets the criteria for submission; we'll reach out should we require more information.

    If you’re reading this post and think that being able to easily rename datasets would be a great capability to add to DSS, be sure to kudos the original post! Feel free to leave a comment in the discussion about how this capability would help you or your team.

    Take care,
    Ashley

  • natejgardner
    natejgardner Neuron, Registered, Neuron 2022, Neuron 2023 Posts: 151 Neuron
    edited July 17

    I frequently overflow the length limit for table names in my database. While I really appreciate Dataiku's design decision to semantically name tables, which is definitely better for non-Dataiku users in our data environment than just naming them with a hash, I'm working today with a dataset named

    MASTSCHD_1_copy_by_LINE_NUM_stacked_by_LINE_NUM_joined_filtered_by_model_joined_by_load_min_min_joined_prepared

    I think an alias like this would be useful, especially if it can be inherited by downstream datasets in lieu of the underlying name. That said, ultimately, a safe rename would be even better for my use-case. I've usually been able to get away with renaming a dataset before it has references, but I've also run into issues depending on where the dataset is used.

    But Dataiku has really good reference tracking: in every aspect of the UI I can think of, there's already a dedicated field listing the input datasets, and the rename feature already searches these automatically and rewires everything, with the exception so far of code recipes. I wonder if a safe rename feature could be implemented to make the search exhaustive, allowing us to just rename any dataset no matter how many references there are. Downstream datasets could also be renamed automatically, making it quick to clean up large projects. Ideally, even the underlying tables, which currently are not renamed by the rename feature, would also be renamed on the next build. Even broken flows (which, though an anti-pattern, I've occasionally needed), where I've imported a managed dataset I generated elsewhere in my project directly as though it were unmanaged, could have their references automatically updated with a more powerful renaming feature.

    Also, an alias feature could be really useful for condensed references downstream: it'd be nice to set an alias upstream so the full name of a dataset is descriptive, but then refer to it in an abbreviated way downstream, especially in the names of other datasets, similar to the pattern that's already common with SQL aliasing.
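
The "exhaustive rename" described above can be pictured on a toy model of a flow. The sketch below is an assumption-laden illustration, not how DSS implements renaming: the flow is a plain dictionary of recipes with `inputs` and `outputs` lists, and `rename_dataset` plus the recipe and dataset names are hypothetical. It rewires every reference to one dataset and refuses a name that is already in use.

```python
def rename_dataset(flow, old_name, new_name):
    """Return a copy of the flow with every reference to old_name rewired."""
    in_use = {
        name
        for io in flow.values()
        for key in ("inputs", "outputs")
        for name in io[key]
    }
    if old_name not in in_use:
        raise KeyError(f"no recipe references {old_name!r}")
    if new_name in in_use:
        raise ValueError(f"{new_name!r} is already used in this flow")

    # Build a new structure so the original flow is left untouched.
    return {
        recipe: {
            key: [new_name if name == old_name else name for name in io[key]]
            for key in ("inputs", "outputs")
        }
        for recipe, io in flow.items()
    }


# Toy flow: recipe name -> datasets read and written (all names illustrative).
flow = {
    "compute_joined": {
        "inputs": ["MASTSCHD_1", "loads"],
        "outputs": ["MASTSCHD_1_joined"],
    },
    "compute_prepared": {
        "inputs": ["MASTSCHD_1_joined"],
        "outputs": ["MASTSCHD_1_joined_prepared"],
    },
}

renamed = rename_dataset(flow, "MASTSCHD_1_joined", "schedule_with_loads")
print(renamed["compute_prepared"]["inputs"])
```

Note that only exact matches are rewired here: the downstream "MASTSCHD_1_joined_prepared" keeps its name, so cascading renames to downstream datasets or underlying tables would be a separate step.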

  • Ashley
    Ashley Dataiker, Alpha Tester, Dataiku DSS Core Designer, Registered, Product Ideas Manager Posts: 161 Dataiker

    Congrats - we're adding this to our roadmap! While timelines are always tricky, we'll let you know how it's progressing as updates are available.

    If you've kudoed the post or added some comments about your particular use case, we may reach out to get some feedback.

    Take care,
    Ashley

  • VMaus
    VMaus Partner, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 6 Partner

    Great news! Thanks @AshleyW!

  • Katie
    Katie Dataiker, Registered, Product Ideas Manager Posts: 106 Dataiker

    Hello all,

    Apologies for the delayed update here. As you may already be aware, as of version 11.2 renaming datasets is a supported operation, available directly from the right panel of datasets. See the release notes for details.

    Let us know if you have any questions!

    Katie

  • tgb417
    tgb417 Dataiku DSS Core Designer, Dataiku DSS & SQL, Dataiku DSS ML Practitioner, Dataiku DSS Core Concepts, Neuron 2020, Neuron, Registered, Dataiku Frontrunner Awards 2021 Finalist, Neuron 2021, Neuron 2022, Frontrunner 2022 Finalist, Frontrunner 2022 Winner, Dataiku Frontrunner Awards 2021 Participant, Frontrunner 2022 Participant, Neuron 2023 Posts: 1,598 Neuron

    This has been a real help to me. Thank you Dataiku team.
