Improve scenario history: make it useful for comparing step changes
As we know, DSS has project version control built in (Git-based), and a lighter version of it, known as “History”, is available for every recipe and editable object.
In History we can inspect each commit and compare commits easily. This seems to work well for any kind of recipe, whatever it contains: queries, code, or files (.json/.ipynb). As with Git, you can read any change line by line…
So my question is: why is it that when we use the History function on a scenario, each step object systematically shows up as a single line in the commit, with all the differences packed into that one line?
This becomes a problem for custom Python steps: the history simply shows that the step's .json file differs from one code change to the next.
Here is an example to illustrate.
I just added a single space to one line of a custom step's code, and this is what I get in the history of the scenario in question.
Picture 1 attached, history_sc_change;
→ You can see that the commit reports only one changed file, and it bluntly shows the overwritten .json file.
Picture 2 attached, sc_change_json_1caract_delta ;
→ You can see the “compare” action between these two versions, which differ by a single character in the code. The diff shows the two JSON versions on lines 4 and 5, with the entire Python code of the “script” field treated as one changed line.
I'm aware that this is because the scenario history only summarizes changes to the scenario's structure, i.e. the steps added to or removed from its file. However, it would be (really) interesting to be able to see the changes to the underlying functions/code. In other words, to have a compare at step level, on the underlying “scripts”.
Best,
Comments
Turribeach Dataiku DSS Core Designer, Neuron, Dataiku DSS Adv Designer, Registered, Neuron 2023 Posts: 2,175 Neuron
While this is a sensible and valid request, it is not an easy change: it affects not only Scenario script steps but every part of the product where code recipes can be used. The root of the problem is Dataiku's decision to store Python, R, SQL, etc. code embedded in a JSON file. I wish Dataiku would change this, but I suspect it's probably too late now and this change will never happen. One option you have is to store your Python scripts in an external Git repo and deploy them via the Python API. This gives you full visibility of code changes and full Git version history, at the cost of having to deploy the code and develop outside Dataiku.
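To illustrate why the history diff collapses to one line, and what a step-level compare could look like, here is a minimal sketch using only the Python standard library. It is not Dataiku's implementation: the JSON layout (a top-level "steps" list where each step has a "name" and a "params" object with a "script" string) is an assumption made for the example, and the function names are hypothetical. Because the whole script is a single JSON string value, any edit changes one line of the file; extracting that string and diffing it separately gives a normal line-by-line diff.

```python
import difflib
import json


def extract_step_scripts(scenario_json: str) -> dict:
    """Map step name -> embedded script text.

    Assumed (hypothetical) layout: {"steps": [{"name": ..., "params": {"script": ...}}]}
    """
    doc = json.loads(scenario_json)
    scripts = {}
    for index, step in enumerate(doc.get("steps", [])):
        script = step.get("params", {}).get("script")
        if script is not None:
            # Fall back to a positional name when the step has no "name" key.
            scripts[step.get("name", f"step_{index}")] = script
    return scripts


def diff_step_scripts(old_json: str, new_json: str) -> str:
    """Return a unified diff of the embedded scripts, one section per changed step."""
    old_scripts = extract_step_scripts(old_json)
    new_scripts = extract_step_scripts(new_json)
    lines = []
    for name in sorted(set(old_scripts) | set(new_scripts)):
        lines.extend(
            difflib.unified_diff(
                old_scripts.get(name, "").splitlines(),
                new_scripts.get(name, "").splitlines(),
                fromfile=f"{name} (before)",
                tofile=f"{name} (after)",
                lineterm="",
            )
        )
    return "\n".join(lines)


if __name__ == "__main__":
    # Two versions of a scenario file that differ by one added space in the script.
    before = json.dumps(
        {"steps": [{"name": "custom_step", "params": {"script": "a = 1\nb = 2\n"}}]}
    )
    after = json.dumps(
        {"steps": [{"name": "custom_step", "params": {"script": "a = 1\nb =  2\n"}}]}
    )
    print(diff_step_scripts(before, after))
```

Run against two exported versions of a scenario file, this prints only the script lines that changed instead of the whole JSON value. The same idea applies if the scripts are kept in an external Git repo: once the code lives in plain .py files, Git's own diff does this for you.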