Prompt for commit message and descriptions by default

Dataiku has a somewhat hidden feature that lets users enter a custom commit message when saving via the dropdown menu. In the spirit of collaboration, this would ideally be the default (or at least configurable as the default in project settings), so that on every save the user is prompted to describe their changes.

Dataset descriptions have also been underutilized at my company. I think that's largely because it's easy to neglect them when datasets are first created.

Ideally, projects could also be configured to prompt users to describe what a recipe or dataset is for upon creation. It may also be helpful to highlight undocumented datasets in some way to show they need descriptions.
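To make the "highlight undocumented datasets" idea concrete, here is a rough sketch of the check I have in mind. It is plain Python over made-up metadata, not the Dataiku API: the `undocumented` helper and the `name` and `description` keys are assumptions for illustration only.

```python
from typing import Iterable


def undocumented(datasets: Iterable[dict]) -> list[str]:
    """Return names of datasets whose description is missing or blank."""
    return [
        d["name"]
        for d in datasets
        # Treat a missing key, None, "" and whitespace-only text as undocumented.
        if not (d.get("description") or "").strip()
    ]


# Hypothetical metadata, for illustration only.
example = [
    {"name": "orders_raw", "description": "Daily export from the order system"},
    {"name": "my_dataset_prepared_joined_stacked", "description": ""},
    {"name": "customers_clean"},
]

print(undocumented(example))
```

A list like this could drive whatever visual cue makes sense, such as a badge on the dataset in the Flow.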

4 Comments

@natejgardner ,

I don't know about you or others, but I find that I don't really know what a dataset is going to end up being about when I first create it. I wouldn't want to be required to know such things and enter them in a way that I could not easily change over time.

For example, one of the challenges I have found with DSS is that dataset names are fixed when you create the dataset. When I get a month or two into a project, dataset names end up being inaccurate. Or worse, confusing. Changing dataset names is hard in DSS.

I like the idea of prompting for documentation. However, insisting on a well-thought-out explanation of an object in DSS and then not making it easy to change would make my life in data science worse rather than better.

Thoughts?

--Tom

I agree: often the purpose of a dataset is unclear until a recipe has been completely configured. I think the current workflow is part of the reason it's difficult to name datasets meaningfully (and why we end up with projects full of names like "my_dataset_prepared_joined_stacked"). Currently, the user is prompted to create and name the output dataset before any transformation has been done. Suppose instead that the workflow started with the recipe design screen, users could save recipes as drafts (with a commit message on each save), and naming and describing the output datasets was the last step before deploying to the Flow. Names and descriptions would then become much more meaningful and thoughtful, hopefully without interrupting the workflow. Since the system really discourages renames after dataset creation, I think it's really important for both the naming and description steps to happen after a user knows what a dataset is going to look like.

That said, I do think the description prompt and the commit message prompt should each be optional, as separate project-level settings (but probably enabled by default).

AshleyW
Dataiker

Hi @natejgardner ,

Thanks for the suggested idea. In reading your comments, I think there are two ideas there that are also worth submitting related to how we help people document their work:

  1. Reorder the steps for recipe creation: this would be to make it easier for people to accurately name datasets and to add descriptions once they've defined what the recipe will do.
  2. Ability to draft and deploy recipes to the Flow

Feel free to submit those two as well.

Best,

Ashley

AshleyW
Dataiker

Thanks for your idea, @natejgardner 

Your idea meets the criteria for submission; we'll reach out should we require more information.

If you’re reading this and would also like to see the idea in a future Dataiku DSS release, be sure to kudos the original post!

Status changed to: In the Backlog
