Hi Povilas, To migrate an existing Dataiku installation from one instance to another, we advise migrating the 'data directory'. This way there is no need for manual ad-hoc migration steps for each project, code environment, configuration, etc. This is documented in more detail at https://doc.dataiku.com/dss/latest/installation/migrations.html#migrating-the-data-directory. Hope it helps, Alexandre
Migrating the data directory only works between instances running the same Dataiku version. We recommend first migrating the data directory to another instance with the same version, and then performing the DSS version upgrade. Hope that clarifies the matter.
However, let's get back to bundles. If copying the "data directory" would solve the bundles issue (let's say from one instance to another, both with the same DSS version), then those bundles must be stored somewhere in the "data directory". Am I correct? If yes, where exactly are they stored?
Hi Povilas, "bundles" is a term used only in relation to automation nodes. Are you still referring to a situation with design nodes only? And to clarify, have you already upgraded the origin design node to the same version as the destination design node?
Hi. Yes, I am talking only about design nodes. I don't know how bundles relate to automation nodes, but we used bundles to archive our projects in the design node. For example, we have a project example_project and we want to significantly upgrade/renew it. However, we don't want to lose the old project or create "duplicates" like a project example_project_v2. So we created a new bundle for the project every time we wanted to make such a big upgrade/renewal. This was our way of archiving projects, and it let us restore a project at any time to any older version (or bundle).
So basically we now have that history in our design node and we want to be able to transfer it to another instance. Is that possible? I believe it should be, since those bundles must be stored somewhere.
To clarify the terminology: bundles relate to moving a project from a design node to an automation node. For saving a project in a design node we use the term project export, not bundles. Both are very similar artifacts, but with different usages and goals. In both cases, you can only move the artifact between nodes of the same version. In your case, if you want to archive versions of projects in the design node, we recommend performing regular project exports. If you want to migrate an instance, then migrating the data directory is the recommended way.
Finally, regarding Git integration, we plan major new features in DSS 5.1 (upcoming). In general, we advise checking our release notes for more details: https://doc.dataiku.com/dss/latest/release_notes/index.html
Project export is not an option for archiving, for a couple of reasons. First of all, the backup has to be stored somewhere outside DSS, which is not convenient. The export file could be lost, you need a place to store it, everyone needs access to it, and so on. Secondly, you cannot import a bundle without creating a new project (correct me if I am wrong), so you end up with duplicate projects. Of course, you can delete the old project and then import a different one, but that is still not convenient.
Bundles were an option for this because you create a bundle and just switch between bundles when you need to change version. Not as nice as simple Git branching, but it was still relatively OK. But as we see, it is only OK until the time you need to migrate...
I understand that "bundle_activation_backups has no link with the project export or the design node migration", as you said. But are you sure it does not contain all the info about the bundles? I see all the recipes, datasets and other stuff inside those folders... Of course, I don't know what would happen if I just copied that to the new instance. Unfortunately, I don't have an SSH connection to our servers to try it now.
To answer your points: 1. Project exports can be stored anywhere using the Dataiku Python API or the CLI. 2a. Could you detail your current process of using bundles, step by step? Normally bundles only apply to automation nodes. 2b. What would be the ideal process for you to manage versions of a project in the design node?
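For point 1, here is a minimal sketch of what a scripted export could look like with the dataikuapi Python client. It assumes the client's DSSProject.export_to_file(path) method; the host, API key, project key, archive directory, and the archive_project helper itself are placeholders made up for illustration, so please check them against the API documentation for your DSS version.

```python
import os
from datetime import datetime


def archive_project(project, project_key, archive_dir, now=None):
    """Export a project to a timestamped zip file and return the file path.

    `project` is expected to be a dataikuapi DSSProject handle; this sketch
    relies only on its export_to_file(path) method.
    """
    now = now or datetime.now()
    # Create the archive directory if it does not exist yet.
    os.makedirs(archive_dir, exist_ok=True)
    # One file per export, e.g. EXAMPLE_PROJECT_20181105_093000.zip
    filename = "{}_{}.zip".format(project_key, now.strftime("%Y%m%d_%H%M%S"))
    path = os.path.join(archive_dir, filename)
    project.export_to_file(path)
    return path


# Example usage (all values below are placeholders, not real settings):
# import dataikuapi
# client = dataikuapi.DSSClient("http://dss-host:11200", "YOUR_API_KEY")
# project = client.get_project("EXAMPLE_PROJECT")
# archive_project(project, "EXAMPLE_PROJECT", "/shared/dss_archives")
```

Writing each export under a timestamped name in a shared location keeps older versions side by side, though restoring one still goes through a project import.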
1. That does not solve the export-import inconvenience. You either have duplicate projects or you have to delete projects. I mentioned that in my previous comment.
2a. Yes, I can. Go to the project. At the top (in black) you see the project name and buttons to go to the flow, notebooks and others. Further along you can see Bundles. Here you create a new bundle that saves the current version of your project. After that you can create more bundles and restore (revert to) previous ones. I don't want to go into every detail, but I think it is quite straightforward. Everything happens in this Bundles section.
2b. The ideal process is nothing special, just typical usage of a source control system: creating a branch for each new version and reverting to a specific one when required.
Thanks, that is interesting. I will have a chat with our product team and gather some thoughts on the right way to manage versioning of projects in the design node. Having said that, regarding your question of how to migrate a design node instance, this does not change our recommendation to migrate the whole data directory.
I understand that. However I am afraid we don't have this possibility anymore.
By the way, I have one more question about migration. I saw (and tried) that it is possible to export and import custom code envs. Is it possible to export the default R code env? Or maybe make a copy of it and export that copy?
One more question: how do I specify an R package version in an R code env? I tried to specify it according to the example: "xgboost","0.6-4". However, under "Actually installed packages" I see "xgboost","0.71.2". Do you have any idea why this could happen?
Hi, it is not possible to get an older package version using the R code environment UI. Unfortunately, that's the way install.packages works. Alternatively, you could use install_version from the devtools package, as explained in https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages. To do so, you would need to manually execute the code in an R notebook or recipe using the specific code env.