Problems with bundles and plugins in Dataiku migration
Povilas
Hi.
We are migrating to Dataiku 5.0.2. To do that, we are exporting all the projects from our current instance and then importing them into the new instance. We have a couple of problems/questions:
- Some of our projects have bundles. However, after export-import I cannot see the bundles anymore. Can you suggest a way to export-import bundles as well?
- We have some custom plugins created. Is there any way to export-import them? Or should we recreate each plugin?
- Also we have some R/Python code environments. Is there a way to export-import them?
Thanks!
Answers
-
Hi Povilas, in order to migrate an existing Dataiku installation from one instance to another, we advise migrating the 'data directory'. This way there is no need for manual ad-hoc migration actions for each project, code environment, configuration, etc. This is documented in more detail at https://doc.dataiku.com/dss/latest/installation/migrations.html#migrating-the-data-directory. Hope it helps, Alexandre
-
Hi Alex. Thank you! Just to make sure, will this solve our bundles problem too?
-
Hi, this procedure is strictly for migrating projects in a DSS Design node, not Automation node. Is that what you want to achieve? Or do you want to do that from a Design node to an Automation node?
-
No, we are migrating from one design node to another design node. The only difference is that the new node will be on Dataiku 5.0.
-
Migrating the data directory only works between instances running the same Dataiku version. We recommend first migrating the data directory to another instance with the same version, and then performing the DSS version upgrade. Hope that clarifies the matter.
-
Well, we already have an upgraded instance.
However, let's get back to bundles. If copying the "data directory" would solve the bundles issue (let's say from one instance to another, both with the same DSS version), then those bundles should be stored somewhere in the "data directory". Am I correct? If yes, where exactly are they stored?
-
Hi Povilas, "bundles" is a term used only for automation nodes. Are you still referring to a situation with design nodes only? So, to clarify, have you already upgraded the origin design node to the same version as the destination design node?
-
Hi. Yes, I am talking only about design nodes. I don't know how bundles are related to the automation node, but we used bundles for archiving our projects in the design node. For example, we have a project example_project and we want to significantly upgrade/renew this project. However, we don't want to lose our old project or to create "duplicates" like project example_project_v2. So we created a new bundle for a project every time we wanted to make such a big upgrade/renewal. This was our way of archiving projects, and we could restore a project at any time to any older version (or bundle).
So basically we now have that history in our design node and we want to be able to transfer it to the other instance. Is that possible? I believe it should be; those bundles should be stored somewhere.
-
Maybe I already found it. There is a directory '/data/DATA_DIR/bundle_activation_backups/'. I see the bundles here. Maybe it is enough to simply copy this one to the new instance?
-
A slightly off-topic question, but still... Does DSS 5.0 contain any new features regarding source control, git and versioning? I cannot find anything among the major updates.
-
To clarify the terminology: bundles relate to moving a project from design to automation. For saving a project in a design node we use the term project export, not bundles. Both are very similar artifacts, but with different usage and goals. In both cases, you can only move the artifact between nodes of the same version. In your case, if you want to archive versions of projects in the design node, we recommend performing regular project exports. If you want to migrate an instance, then migrating the data directory is the recommended way.
-
The path you mention (bundle_activation_backups) has no link with the project export or the design node migration.
-
Finally, regarding git integration, we plan major new features in DSS 5.1 (upcoming). In general, we advise checking our release notes for more details: https://doc.dataiku.com/dss/latest/release_notes/index.html
-
Project export is not an option for archiving, for a couple of reasons. First of all, the backup has to be stored somewhere outside DSS, which is not convenient. The export file could be lost, you need a place to store it, everyone needs access to it, and so on. Secondly, you cannot import a project export without creating a new project (correct me if I am wrong), so you end up with duplicate projects. Of course, you can delete the old project and then import a different one, but that is still not convenient.
Bundles were an option for this because you create a bundle and just switch between bundles when you need to change version. Not as nice as simple git branches, but still relatively OK. But as we see, it is OK only until the time you need to migrate...
I understand that "bundle_activation_backups has no link with the project export or the design node migration", as you said. But are you sure it does not contain all the info about the bundles? I see all the recipes, datasets and other stuff inside those folders... Of course, I don't know what would happen if I just copied that to the new instance. Unfortunately I don't have an ssh connection to our servers to try it now.
-
To answer your points:
1. Project exports can be stored anywhere using the Dataiku Python API or the CLI.
2a. Could you detail your current process of using bundles, step-by-step? Normally bundles can only deal with automation nodes.
2b. What would be the ideal process for you to manage versions of a project in the design node?
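As a rough illustration of point 1, here is a minimal sketch of archiving a project export to a timestamped file. It assumes the `dataikuapi` Python client, whose `DSSClient.get_project()` and `DSSProject.export_to_file()` calls are used as I understand them; their availability on a given DSS version should be checked. The function name, the archive directory and the file naming scheme are hypothetical.

```python
import os
from datetime import datetime, timezone


def archive_project_export(client, project_key, archive_dir, now=None):
    """Export one project to a timestamped zip file under archive_dir.

    `client` is assumed to behave like dataikuapi.DSSClient: it must offer
    get_project(key), and the returned object must offer export_to_file(path).
    """
    now = now or datetime.now(timezone.utc)
    os.makedirs(archive_dir, exist_ok=True)

    # Hypothetical naming scheme: <PROJECT_KEY>_<UTC timestamp>.zip
    filename = "{}_{}.zip".format(project_key, now.strftime("%Y%m%dT%H%M%SZ"))
    path = os.path.join(archive_dir, filename)

    # Ask DSS to write the project export archive to the chosen path.
    client.get_project(project_key).export_to_file(path)
    return path
```

With a real instance, `client` would be created with something like `dataikuapi.DSSClient(host, api_key)` (placeholder host and key), and `archive_dir` could point to shared storage so that the archive is not tied to one person's machine.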
-
1. That does not solve the export-import inconvenience. You either have duplicate projects or you have to delete projects. I mentioned that in my previous comment.
2a. Yes, I can. Go to the project. At the top (in black) you see the project name and buttons to go to the flow, notebooks and others. Further along you can see Bundles. Here you create a new bundle that saves the current version of your project. After that you can create more bundles and restore (revert) to a previous one. I don't want to go into every detail, but I think it is quite straightforward. Everything happens in this Bundles section.
2b. The ideal process is nothing special: typical usage of a source control system. Creating a branch for each new version and reverting to a specific one when required.
-
Thanks, that is interesting. I will have a chat with our product team and gather some thoughts on the right way to manage versioning of projects in the design node. Having said that, for your question of how to migrate design node instance, that does not change our recommendation to migrate the whole data directory.
-
I understand that. However I am afraid we don't have this possibility anymore.
By the way, I have one more question about migration. I saw, and tried, that it is possible to export and import custom code envs. Is it possible to export the default R code env? Or maybe to make a copy of it and export that copy?
-
One more question: how do I specify an R package version in an R code env? I tried to specify it according to the example: "xgboost","0.6-4". However, in "Actually installed packages" I see "xgboost","0.71.2". Do you have any idea why this could happen?
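A mismatch like this can be checked mechanically. The following is a minimal sketch, assuming both the requested list and the "Actually installed packages" list can be copied out as text in the `"name","version"` form quoted above; the function names are hypothetical. Versions are normalized by replacing `-` with `.`, since R treats `0.6-4` and `0.6.4` as the same version.

```python
import csv


def parse_package_lines(text):
    """Parse lines such as "xgboost","0.6-4" into {name: version or None}."""
    packages = {}
    for row in csv.reader(text.splitlines(), skipinitialspace=True):
        row = [cell.strip() for cell in row]
        if not row or not row[0]:
            continue  # skip blank lines
        version = row[1] if len(row) > 1 and row[1] else None
        packages[row[0]] = version
    return packages


def normalize(version):
    """Treat '-' and '.' as equivalent separators, as R version comparison does."""
    return None if version is None else version.replace("-", ".")


def find_version_mismatches(requested_text, installed_text):
    """Return (name, requested, installed) for every pinned package that differs."""
    requested = parse_package_lines(requested_text)
    installed = parse_package_lines(installed_text)
    mismatches = []
    for name, wanted in sorted(requested.items()):
        if wanted is None:
            continue  # no version pin was requested for this package
        actual = installed.get(name)
        if normalize(actual) != normalize(wanted):
            mismatches.append((name, wanted, actual))
    return mismatches


# The two values reported in the post above.
requested = '"xgboost","0.6-4"'
installed = '"xgboost","0.71.2"'
for name, wanted, actual in find_version_mismatches(requested, installed):
    print("{}: requested {}, installed {}".format(name, wanted, actual))
```

The sketch only reports differences; it does not explain why the code env picked another version.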
-
Hi, it is not possible to get an older package version using the R code environment UI. Unfortunately, that's the way install.packages works. Alternatively, you could use install_version from the devtools package, as explained in https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages. To do so, you would need to manually execute the code in an R notebook or recipe using the specific code env.
-
Yes, you are correct about install.packages and install_version. I used install_version to install the older version.
However, regarding managing libraries in Dataiku code envs: you provide an example of how to select an older version. It is here: https://doc.dataiku.com/dss/latest/code-envs/operations-r.html#manage-packages
But that example does not work properly. By "properly" I mean like install_version: for that function you specify an exact version (let's say x.y.z) and it installs version x.y.z. In your interface there is an option to do this (that is how it is described), but it does not work. At first I thought that Dataiku installs the latest version available, but later I tried some more examples and found that this is not true. I don't know how it selects a version... It is a shame, because either your functionality is not working or the documentation is misleading (or even incorrect).
-
Thanks for the feedback, I have logged this so we can improve our documentation.