How to update a development/custom algorithm plugin to the latest version in a Visual Analysis?

rmios Registered Posts: 19 ✭✭✭✭
edited July 16 in Using Dataiku

Hello everyone,

I am developing a custom plugin to be used in a DSS Visual Analysis. After changing a plugin parameter, I incremented the version number of the plugin in the plugin.json file. However, even though the appearance in the model design is updated (analysis -> models -> Design -> Algorithms -> Custom algo ...), the model gives an error due to old/obsolete parameters being passed:

TypeError: __init__() got an unexpected keyword argument 'num_features'

This parameter does not exist anymore in the latest version, and I can also see this message in the log file:

[2020/11/11-17:39:05.301] [MRT-20907] [WARN] [dku.ml.plugins]  - Using a plugin algorithm ('Custom algo visual-vtux_model') for which version has changed. It was created with version '0.0.1' and now it's '0.0.2'

Unfortunately I am unable to find any "update plugin to version X" button. How can this be solved?

On a side note: if this is intentional design to allow tracking plugin updates throughout the model experiments/sessions, shouldn't the plugin parameter UI also be kept unchanged? The UI already shows the new parameter, and the old parameter (here: num_features) can no longer be seen or changed.

Best Answer

  • Nicolas_Servel Dataiker Posts: 37 Dataiker
    Answer ✓

    Hello,

    This is a known limitation of plugin algorithms. You cannot use different, incompatible versions of a plugin at the same time.

    If you create a new model, it should work.

    More detailed explanation

    In your case:

    * your model was created with version 0.0.1, so its saved parameters follow the parameter organization of that version

    * you then updated the version to 0.0.2. DSS now has only this version installed, so in the UI you see the new parameters, and when you train, it uses the new code, hence the error

    If you create a new model, it will be initialized with the correct version and parameters, so training should work.

    Models trained with old (and incompatible) plugin versions will also not work, because DSS will use the new definition of the model when deserializing them for scoring/retraining.

    Hope this helps,

    Best regards,

Answers

  • tim-wright Partner, L2 Designer, Snowflake Advanced, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 77 Partner
    edited July 17

    Edited Response

    @rmios I was in the process of editing my response when @Nicolas_Servel responded... apparently it is not possible. @Nicolas_Servel, is the approach I took valid in any situation? If it is not, I will remove it. If it can be of help in another circumstance, I'd prefer to leave it here so someone else might stumble upon it.

    -------------------------------------------------------------------------------------------------------------------------

    Original response

    @rmios If I understand you correctly, you created a new version of the plugin (which now takes a different set of parameters than before). When you upgraded the plugin on your server, existing projects that were using the prior version of the plugin started generating errors.

    I suspect that the DSS metadata around your plugin was not changed even though your plugin was (so DSS probably still has a reference to the previously defined variables). If you have a small number of existing uses of the plugin, you can probably go in, delete the existing ones and recreate them. If the existing plugin recipes are tied into a larger number of scenarios, steps, automations, etc., and you don't want to delete them, I think you can edit them with the APIs.

    While there may be a much "simpler" way to do this (others may chime in), I have whipped up something with Python and the Dataiku APIs. Note that I put this together for a visual recipe plugin (not an ML algorithm plugin). I tested it locally and it appears to work (at least it does reset the parameters for the plugin in the front end, but I was using the same version of the plugin, not a new version). I'd be curious whether it works across different versions of the plugin.

    General Approach:

    1. Find usages of the plugin in DSS (first function)

    2. Remove specific parameters (for you it'd be ['num_features']) from a specific projectKey and recipeId (second function)

    3. Combine 1 and 2 to correct specific (or all) references to a particular plugin (this is the main function that you would call; 1 and 2 are just helper functions called here)

    ***I STRONGLY SUGGEST YOU DUPLICATE AN EXISTING PROJECT WHERE YOU HAVE THE ISSUE AND THEN TRY RUNNING THE FUNCTION (OR SLIGHT MODIFICATION) ON THAT ONE DUPLICATED PROJECT ONLY FIRST***

    import dataiku
    from collections import defaultdict

    client = dataiku.api_client()

    def get_plugin_recipe_usages(pluginId):
        """
        Helper method that takes a pluginId and returns a python dictionary where the
        key is a DSS projectKey and the value is a list of recipes in that project
        where pluginId is used.
        """
        plugin = client.get_plugin(pluginId)
        usages = plugin.list_usages().get_raw()['usages']
        uses = defaultdict(list)
        for usage in usages:
            if usage['objectType'] == 'RECIPE':
                uses[usage['projectKey']].append(usage['objectId'])
        return uses

    def remove_plugin_parameters_from_recipe(projectKey, recipeId, params_to_delete):
        """
        Helper method that takes a projectKey, a recipeId and a list of params_to_delete.
        It uses the Dataiku APIs to look at the existing parameters of the recipe and
        removes the ones that are in the params_to_delete list. If the params_to_delete
        values do not exist on the recipe, nothing happens. The method prints out any
        parameters that are removed.
        """
        project = client.get_project(projectKey)
        recipe = project.get_recipe(recipeId)
        settings = recipe.get_settings()
        data = settings.data

        for param in params_to_delete:
            # pop() returns None when the parameter is absent, so missing params are skipped
            removed = data['recipe']['params']['customConfig'].pop(param, None)
            if removed is not None:
                print("Removing param {} = {} from projectKey = {}, recipeId = {}".format(param, removed, projectKey, recipeId))

        # Save once, after all requested parameters have been removed
        settings.data = data
        settings.save()

    def remove_custom_plugin_input_parameter(pluginId='some-plugin', projectKey='TEMPLATE', params_to_delete=['parameter1']):
        """
        Alters any current uses of the pluginId by removing params_to_delete from their metadata.

        * if you pass 'all_projects' as the projectKey, it will do this for all projects on the instance
        """
        # Get the usages of the plugin across all projects
        uses = get_plugin_recipe_usages(pluginId)

        # If the projectKey arg is not 'all_projects', restrict uses to that single project
        # (and to nothing at all if the plugin is not used in that project)
        if projectKey != 'all_projects':
            uses = {projectKey: uses[projectKey]} if projectKey in uses else {}

        for projectKey in uses.keys():
            print('Altering recipe parameters for project {}'.format(projectKey))
            for recipeId in uses[projectKey]:   # iterate over the plugin recipes in that project
                remove_plugin_parameters_from_recipe(projectKey=projectKey, recipeId=recipeId, params_to_delete=params_to_delete)
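    For example, a hypothetical call (the plugin id, project key and parameter name below are placeholders for your own values) might look like this:

    # Hypothetical usage: remove the obsolete 'num_features' parameter from every
    # recipe using the plugin 'my-custom-plugin' in one duplicated test project only
    remove_custom_plugin_input_parameter(
        pluginId='my-custom-plugin',
        projectKey='DUPLICATED_TEST_PROJECT',
        params_to_delete=['num_features'],
    )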

    You should be able to run this from a code notebook in DSS to fix the issue. FWIW, I suspect this WILL NOT take care of the warning about version numbers.

    Let me know if this makes sense to you or if I completely misunderstood your issue.

  • tim-wright Partner, L2 Designer, Snowflake Advanced, Neuron 2020, Registered, Neuron 2021, Neuron 2022 Posts: 77 Partner
    edited July 17

    @Nicolas_Servel would something like what I have below work for @rmios's specific example? If not for his algorithm plugin, would it work for other types of plugin recipes (I developed it assuming a visual recipe for transforming data)? Manually recreating the recipes would probably be fine, but it could get tedious if there are many usages and/or if the recipes are referenced in many scenarios.

    My thought was that DSS probably has a reference in its metadata to the original set of model parameters (which includes `num_features`). I thought maybe we could use the API layer to programmatically update the parameters in DSS (which may not be visible to the plugin in the front end).

    1. Find usages of the plugin in DSS (helper function)

    2. Update the metadata "settings" to remove any parameters that existed in the earlier plugin version but no longer do (helper function)

    3. Main function integrating 1 and 2 across possibly all uses of the plugin at one time.

    import dataiku
    from collections import defaultdict

    client = dataiku.api_client()

    def get_plugin_recipe_usages(pluginId):
        """
        Helper method that takes a pluginId and returns a python dictionary where the
        key is a DSS projectKey and the value is a list of recipes in that project
        where pluginId is used.
        """
        plugin = client.get_plugin(pluginId)
        usages = plugin.list_usages().get_raw()['usages']
        uses = defaultdict(list)
        for usage in usages:
            if usage['objectType'] == 'RECIPE':
                uses[usage['projectKey']].append(usage['objectId'])
        return uses

    def remove_plugin_parameters_from_recipe(projectKey, recipeId, params_to_delete):
        """
        Helper method that takes a projectKey, a recipeId and a list of params_to_delete.
        It uses the Dataiku APIs to look at the existing parameters of the recipe and
        removes the ones that are in the params_to_delete list. If the params_to_delete
        values do not exist on the recipe, nothing happens. The method prints out any
        parameters that are removed.
        """
        project = client.get_project(projectKey)
        recipe = project.get_recipe(recipeId)
        settings = recipe.get_settings()
        data = settings.data

        for param in params_to_delete:
            # pop() returns None when the parameter is absent, so missing params are skipped
            removed = data['recipe']['params']['customConfig'].pop(param, None)
            if removed is not None:
                print("Removing param {} = {} from projectKey = {}, recipeId = {}".format(param, removed, projectKey, recipeId))

        # Save once, after all requested parameters have been removed
        settings.data = data
        settings.save()

    def remove_custom_plugin_input_parameter(pluginId='some-plugin', projectKey='TEMPLATE', params_to_delete=['parameter1']):
        """
                    MAIN FUNCTION
        Alters any current uses of the pluginId by removing params_to_delete from their metadata.

        * if you pass 'all_projects' as the projectKey, it will do this for all projects on the instance
        """
        # Get the usages of the plugin across all projects
        uses = get_plugin_recipe_usages(pluginId)

        # If the projectKey arg is not 'all_projects', restrict uses to that single project
        # (and to nothing at all if the plugin is not used in that project)
        if projectKey != 'all_projects':
            uses = {projectKey: uses[projectKey]} if projectKey in uses else {}

        for projectKey in uses.keys():
            print('Altering recipe parameters for project {}'.format(projectKey))
            for recipeId in uses[projectKey]:   # iterate over the plugin recipes in that project
                remove_plugin_parameters_from_recipe(projectKey=projectKey, recipeId=recipeId, params_to_delete=params_to_delete)

    @Nicolas_Servel If this approach is valid (even if not specifically for this use), I might choose to leave it here for someone else to stumble onto. If it is invalid in all cases, I will remove it for clarity.

  • rmios Registered Posts: 19 ✭✭✭✭

    Thanks for your answers! It would be interesting to know whether your proposed solution violates any design specifications and could potentially break other internal assumptions of DSS. In the meantime, I am fine with creating a new model.

    Are there any plans to remove this known limitation in the future?

  • Nicolas_Servel Dataiker Posts: 37 Dataiker

    Hello @rmios,

    The main limitation here is not being able to have multiple versions of the same plugin installed at the same time on one DSS instance. Making this possible is not planned at the moment.

    That being said, a few remarks:

    * I am assuming you are currently building your plugin, i.e. playing with new parameters, trying it on toy data and making a lot of changes. At this stage, it is expected that you have to throw everything away when you make a breaking change and start over in a new analysis.

    * Once an actual version of your plugin has been released, i.e. some users are using it for real, best practices recommend not introducing breaking changes. This means trying as much as possible to make old code still work with the new parameters when you work on a new version. If you manage to do that, old models continue to work, while newly created ones can leverage the improvements of the new version.

    @tim-wright your solution seems to work for plugin recipes (I did not try the code myself, though), but it would be complex to reproduce for ML models. When a model has completed training, it is serialized (pickled) and reused afterwards for scoring. So it is not only a matter of parameters, but also of the code of the actual model.

    And again, your script looks like a migration script for when there are breaking changes; it is much less risky to build the plugin avoiding breaking changes in the first place.

    Hope this helps

  • rmios Registered Posts: 19 ✭✭✭✭

    Thanks, I understand. The only concern is that old trained models would lose reproducibility, would they not? Meaning, if I update the plugin and its configuration is backwards-compatible while the internal algorithm changes (e.g. some internal state/seed), the resulting model would differ from the original result without me noticing, except for maybe a score difference.

    I think it's not really an issue if you know that this is happening though.

  • Nicolas_Servel Dataiker Posts: 37 Dataiker

    The best approach would be for the behaviour to stay the same for old models.

    I do not know the complexity of your algorithm, but rather than removing the parameter, you could, for example, change its default value to None in the Python code, as sketched below:

    * If some value is passed (old model), you behave as the old version did

    * If no value is passed, you use the new approach
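
    A minimal sketch of that pattern, assuming a hypothetical custom estimator (the class and parameter names are illustrative, not taken from the actual plugin):

    class MyCustomAlgo:
        def __init__(self, num_features=None, new_param=10):
            # 'num_features' is deprecated but still accepted, so models saved with
            # version 0.0.1 can still be deserialized and retrained/scored
            if num_features is not None:
                # Old model: keep behaving exactly like version 0.0.1
                self.effective_features = num_features
            else:
                # New model: use the version 0.0.2 behaviour driven by 'new_param'
                self.effective_features = new_param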

    This can lead to complex code that is difficult to maintain. That is why building a stable API, or a software library in general, is a complex topic.
