Ability to remove items from a bundle (notebooks, flow zones, etc.)

A bundle is a snapshot of a project. Bundles are used to deploy a project from the Design Node (the "dev" environment) to the Automation Node (the "prod" environment).

Bundles automatically include some elements of the project, while other elements (typically data-heavy items) are excluded by default but can be added. See this image from the Dataiku Academy for a recap:

[Image: bundle.jpg]

It would be much appreciated if users could choose to remove some of the items that are currently added automatically.

I am typically thinking of:

  • notebooks:
    • Notebooks are by design an exploration environment
    • The notebook area in Dataiku tends to get very messy as soon as several users collaborate (there is no way to tidy up this space with folders), which further discourages users from relying on notebooks in production (who would want their project to depend on a notebook hidden under dozens of other users' unknown notebooks?!)
    • All in all, at least in my experience, notebooks rarely end up in production, unlike the flow, which materializes the data science pipeline. I would be happy not to "stage" them in the bundle snapshot.
  • flow zones:
    • When developing a new feature or exploring a new idea, one would typically do that in a new flow zone.
    • That flow zone could remain in a "dev" stage for a long time before being ready for "production" (or before being abandoned and deleted).
    • It would be great if one or more flow zones that aren't ready for production could optionally be removed from the bundle snapshot.
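
To make the request concrete, here is a minimal sketch of the selection behavior I have in mind. It is plain Python, not the Dataiku API: the `ProjectItem` type, the `select_bundle_items` helper and the item names are all hypothetical and only illustrate the idea of opting out by item type or by flow zone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProjectItem:
    """Hypothetical stand-in for anything a bundle could contain."""
    name: str
    kind: str                   # e.g. "recipe", "dataset", "notebook"
    zone: Optional[str] = None  # flow zone, if the item lives in the flow


def select_bundle_items(items, excluded_kinds=frozenset(), excluded_zones=frozenset()):
    """Keep every item except those whose kind or flow zone was opted out."""
    return [
        item for item in items
        if item.kind not in excluded_kinds and item.zone not in excluded_zones
    ]


items = [
    ProjectItem("prepare_sales", "recipe", zone="main"),
    ProjectItem("try_new_model", "recipe", zone="dev_experiments"),
    ProjectItem("scratchpad", "notebook"),
]
kept = select_bundle_items(
    items,
    excluded_kinds={"notebook"},
    excluded_zones={"dev_experiments"},
)
print([item.name for item in kept])
```

The default (nothing excluded) would keep today's behavior, so the option would be purely additive.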

 

4 Comments
Turribeach
Level 6

Duplicate of this idea: https://community.dataiku.com/t5/Product-Ideas/Exclude-certain-portions-of-the-flow-in-bundle/idi-p/...

Some comments on your points: 

  • "Notebooks are by design an exploration environment" / "notebooks rarely end up in production": Not at all! If you use Python recipes and edit them in a notebook, you will have lots of Python notebooks attached to recipes, which you will want deployed to Production. I do agree that there are cases where you have exploratory notebooks that you don't want deployed to Production, though. In our case, we end up having to delete some of these exploratory notebooks because we get import errors when the connections are missing. We do not use connection remapping, as this is dangerous. Say you have a notebook that uses SQL to delete all records from a table you are playing with in a Development database. Would you want that connection remapped to Production? Certainly not! So we use generic connection names that point to different databases depending on the node the code is deployed to, and in certain cases we allow specific connections to point to specific database environments. In any case, I support the idea of being able to exclude items from the bundle.
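
A minimal sketch of the generic connection name idea, in plain Python rather than the Dataiku API. The node names, connection names, database names and the `resolve_connection` helper are all hypothetical; the point is only that a name with no entry on a node fails loudly instead of being silently redirected.

```python
# Hypothetical per-node settings: the same generic name points to a
# different database on each node. Dev-only connections simply have no
# entry on the automation node.
CONNECTIONS_BY_NODE = {
    "design": {"warehouse": "dev_db", "sandbox": "dev_scratch_db"},
    "automation": {"warehouse": "prod_db"},
}


def resolve_connection(node, generic_name):
    """Return the database behind a generic connection name on a given node."""
    try:
        return CONNECTIONS_BY_NODE[node][generic_name]
    except KeyError:
        raise LookupError(
            f"connection {generic_name!r} is not defined on node {node!r}"
        ) from None


print(resolve_connection("design", "warehouse"))
print(resolve_connection("automation", "warehouse"))
```

With this layout, an exploratory notebook that uses the `sandbox` connection raises an error on the automation node, which is the kind of import error mentioned above.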
fsergot
Dataiker
Status changed to: Needs Info

Hello,

I have been thinking about this idea, and I am not sure how to process it.

As for notebooks, I tend to say that their presence is not an issue in itself. Adding a capability to remove them, and then asking users to actually remove them, seems like too much here compared to just ignoring them.

Regarding flow zones, this is an interesting idea, which has also been raised in another request. I will record it. Note that it is not trivial, as flow zones are more of a presentation layer than a proper content split. There are potentially many other objects that use datasets or even recipes but are not covered by flow zones (such as dashboard insights, scenarios, webapps, etc.).
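
As a rough illustration of why this is not trivial, here is a small Python sketch. It does not use the Dataiku API; the `find_dangling_references` helper and all object and dataset names are hypothetical. It flags objects outside the flow that would still reference datasets from an excluded zone.

```python
def find_dangling_references(dataset_zone, object_inputs, excluded_zones):
    """Map each object to the datasets it uses that sit in an excluded zone."""
    dangling = {}
    for obj, datasets in object_inputs.items():
        missing = sorted(d for d in datasets if dataset_zone.get(d) in excluded_zones)
        if missing:
            dangling[obj] = missing
    return dangling


# Hypothetical project: which zone each dataset lives in ...
dataset_zone = {"sales_clean": "main", "sales_experiment": "dev"}
# ... and which datasets each non-flow object reads.
object_inputs = {
    "scenario:nightly_build": ["sales_clean"],
    "dashboard:kpis": ["sales_clean", "sales_experiment"],
}

print(find_dangling_references(dataset_zone, object_inputs, excluded_zones={"dev"}))
```

Any object reported this way would either have to be excluded as well or would break once the bundle is activated.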

tanguy
Level 4

@fsergot: as suggested by @Turribeach, adding the ability to select or deselect flow zones in a bundle would indeed be the biggest improvement from my point of view. The rest is nice to have.

@CoreyS: sorry for initiating a duplicate thread. Can you merge it with Turribeach's idea?

MichaelG
Community Manager
Status changed to: Gathering Input