How to fetch all Dataiku Plugins available into a dataset

Turribeach

I wanted a way to see all the plugins in one place, as it is not easy to keep track of which plugins are available, when they get updated, and when new ones come out. I knew DSS fetched the list of plugins from somewhere. I could have asked Dataiku Support whether they would tell me, but instead I armed myself with the excellent Proxyman, intercepted the SSL traffic and caught the URL that DSS uses to fetch the plugins:

https://update.dataiku.com/dss/11/plugins/list.json

The plugins URL is version specific, so for v12 it uses 12 in the URL. It doesn't work on v10 or below, so I suspect this is a new URL used by v11 and above only. The URL returned a nice JSON document, which I turned into two datasets with a little bit of Python: plugins and plugin_releases.
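As a minimal sketch of the version-specific URL described above, the helper below builds the list URL from a DSS version string. The function name `plugins_list_url` is made up for illustration; the URL pattern and the v11 cut-off are only what was observed in this post, not a documented Dataiku API.

```python
# Hypothetical helper (not part of any Dataiku API): builds the unpublished
# plugin list URL observed in the proxy capture for a given DSS version.
BASE_URL = "https://update.dataiku.com/dss/{major}/plugins/list.json"


def plugins_list_url(dss_version: str) -> str:
    """Return the plugin list URL for a DSS version string such as '12.3.1'."""
    major = int(str(dss_version).split(".")[0])
    if major < 11:
        # Observation from this post: the URL does not work on v10 or below
        raise ValueError("The plugin list URL only seems to exist for DSS v11 and above")
    return BASE_URL.format(major=major)


print(plugins_list_url("11.4.0"))
print(plugins_list_url("12.3.1"))
```

Only the major version ends up in the URL, so minor and patch numbers are dropped before formatting.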


[Screenshots: the plugin_releases and plugins datasets]

I was wondering how to share this project with others, but I just realised it will be much easier to share the Python recipe, which will stay here forever and will not depend on any file sharing tools, so here it goes:

 

 

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
import os
import requests, json
from sys import platform

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Specify your CA bundle if your OS supports it
if platform == "linux" or platform == "linux2":
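    # linux: this is the RHEL/CentOS-style path; Debian and Ubuntu
    # normally use /etc/ssl/certs/ca-certificates.crt instead
    os.environ["REQUESTS_CA_BUNDLE"] = '/etc/ssl/certs/ca-bundle.crt'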
elif platform == "darwin":
    # OS X
    print("Use pip to install certifi")
elif platform == "win32":
    # Windows
    print("Use pip to install certifi")

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
client = dataiku.api_client()
dataiku_version = client.get_instance_info().raw['dssVersion'].split(".")[0]

if int(dataiku_version) < 11:
    raise Exception('This only works for Dataiku v11 and above')

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
dataiku_plugins_url = f'https://update.dataiku.com/dss/{dataiku_version}/plugins/list.json'
headers = {'Content-Type': 'application/json'}

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
status_code = 0
status_reason = ''
status_text = ''
    
try:
    response = requests.get(dataiku_plugins_url, headers=headers, verify=True, timeout=(1, 3))
    
    status_code = response.status_code
    status_reason = response.reason
    status_text = str(status_code) + ' - ' + str(status_reason)

    # Raise an exception if the response status code is not successful
    response.raise_for_status()
    
except requests.exceptions.BaseHTTPError:
    error_text = "Base HTTP Error: " + status_text
except requests.exceptions.HTTPError:
    error_text = "HTTP Error: " + status_text
except requests.exceptions.Timeout:
    error_text = "The request timed out"
except requests.exceptions.ConnectionError:
    error_text = "Connection Error"
except requests.exceptions.RequestException:
    error_text = "Unknown error occurred: " + str(status_code)    
    
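# If the request was successful
if status_code == 200:
    # Parse the response as JSON
    json_object = response.json()
    
    # Debug: Print the whole JSON object
    # print(json.dumps(json_object, indent=4))
else:
    # Stop here: the cells below need json_object, which is only set on success
    raise RuntimeError("Could not fetch the plugin list: " + (status_text or "no HTTP response received"))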

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
df_plugins = pd.DataFrame(columns=['ID', 'Label', 'Description', 'Author', 'Icon', 'Size', 'Store_Version', 'URL', 'Download_URL', 'Support_Level', 'License_Info', 'Downloadable', 'tutorials', 'sampleProjects', 'javaPreparationProcessors', 'javaFormulaFunctions', 'customDatasets', 
                                   'customCodeRecipes', 'customPythonProbes', 'customPythonChecks', 'customSQLProbes', 'customFormats', 'customExporters', 'customPythonSteps', 'customPythonTriggers', 'customRunnables', 'customWebApps', 'customFSProviders', 'customDialects', 
                                   'customJythonProcessors', 'customPythonClusters', 'customParameterSets', 'customFields', 'customJavaPolicyHooks', 'customWebAppExpositions', 'customPythonPredictionAlgos', 'customStandardWebAppTemplates', 'customBokehWebAppTemplates', 
                                   'customShinyWebAppTemplates', 'customRMarkdownReportTemplates', 'customPreBuiltNotebookTemplates', 'customPythonNotebookTemplates', 'customRNotebookTemplates', 'customScalaNotebookTemplates', 'customPreBuiltDatasetNotebookTemplates', 
                                   'customPythonDatasetNotebookTemplates', 'customRDatasetNotebookTemplates', 'customScalaDatasetNotebookTemplates'])

df_plugin_releases = pd.DataFrame(columns=['ID', 'Label', 'Version', 'Release_Date_Time', 'Release_Notes'])

for item in json_object['items']:

    for release in item['revisions']:
        plugin_release_record = pd.DataFrame.from_dict({'ID': [item['id']], 'Label': [item['meta'].get('label', '')], 'Version': [release.get('version', '')], 'Release_Date_Time': [release.get('releaseTime', '')], 'Release_Notes': [release.get('releaseNotes', '')]})
        df_plugin_releases = pd.concat([df_plugin_releases, plugin_release_record], ignore_index=True, sort=False)
    
    plugin_record = pd.DataFrame.from_dict({'ID': [item['id']], 'Label': [item['meta'].get('label', '')], 'Description': [item['meta'].get('description', '')], 'Author': [item['meta'].get('author', '')], 'Icon': [item['meta'].get('icon', '')], 'Size': [item['size']],
                                         'Store_Version': [item['storeVersion']], 'URL': [item['meta'].get('url', '')], 'Download_URL': [item['downloadURL']], 'Support_Level': [item['meta'].get('supportLevel', '')], 'License_Info': [item['meta'].get('licenseInfo', '')],
                                         'Downloadable': [item['storeFlags'].get('downloadable', '')], 'tutorials': [len(item['content']['tutorials'])], 'sampleProjects': [len(item['content']['sampleProjects'])], 'javaPreparationProcessors': [len(item['content']['javaPreparationProcessors'])],
                                         'javaFormulaFunctions': [len(item['content']['javaFormulaFunctions'])], 'customDatasets': [len(item['content']['customDatasets'])], 'customCodeRecipes': [len(item['content']['customCodeRecipes'])],
                                         'customPythonProbes': [len(item['content']['customPythonProbes'])], 'customPythonChecks': [len(item['content']['customPythonChecks'])], 'customSQLProbes': [len(item['content']['customSQLProbes'])], 'customFormats': [len(item['content']['customFormats'])],
                                         'customExporters': [len(item['content']['customExporters'])], 'customPythonSteps': [len(item['content']['customPythonSteps'])], 'customPythonTriggers': [len(item['content']['customPythonTriggers'])], 'customRunnables': [len(item['content']['customRunnables'])],
                                         'customWebApps': [len(item['content']['customWebApps'])], 'customFSProviders': [len(item['content']['customFSProviders'])], 'customDialects': [len(item['content']['customDialects'])], 'customJythonProcessors': [len(item['content']['customJythonProcessors'])],
                                         'customPythonClusters': [len(item['content']['customPythonClusters'])], 'customParameterSets': [len(item['content']['customParameterSets'])], 'customFields': [len(item['content']['customFields'])],
                                         'customJavaPolicyHooks': [len(item['content']['customJavaPolicyHooks'])], 'customWebAppExpositions': [len(item['content']['customWebAppExpositions'])], 'customPythonPredictionAlgos': [len(item['content']['customPythonPredictionAlgos'])],
                                         'customStandardWebAppTemplates': [len(item['content']['customStandardWebAppTemplates'])], 'customBokehWebAppTemplates': [len(item['content']['customBokehWebAppTemplates'])], 'customShinyWebAppTemplates': [len(item['content']['customShinyWebAppTemplates'])],
                                         'customRMarkdownReportTemplates': [len(item['content']['customRMarkdownReportTemplates'])], 'customPreBuiltNotebookTemplates': [len(item['content']['customPreBuiltNotebookTemplates'])],
                                         'customPythonNotebookTemplates': [len(item['content']['customPythonNotebookTemplates'])], 'customRNotebookTemplates': [len(item['content']['customRNotebookTemplates'])], 'customScalaNotebookTemplates': [len(item['content']['customScalaNotebookTemplates'])],
                                         'customPreBuiltDatasetNotebookTemplates': [len(item['content']['customPreBuiltDatasetNotebookTemplates'])], 'customPythonDatasetNotebookTemplates': [len(item['content']['customPythonDatasetNotebookTemplates'])],
                                         'customRDatasetNotebookTemplates': [len(item['content']['customRDatasetNotebookTemplates'])], 'customScalaDatasetNotebookTemplates': [len(item['content']['customScalaDatasetNotebookTemplates'])]})

    df_plugins = pd.concat([df_plugins, plugin_record], ignore_index=True, sort=False)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
df_plugin_releases['Release_Date_Time'] = pd.to_datetime(df_plugin_releases['Release_Date_Time'],unit='ms')

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Recipe outputs
plugins = dataiku.Dataset("plugins")
plugins.write_with_schema(df_plugins)
plugin_releases = dataiku.Dataset("plugin_releases")
plugin_releases.write_with_schema(df_plugin_releases)

 

 

To add this to a project, create a Python recipe, set two outputs named plugins and plugin_releases, and click Create Recipe. Run it and you will have the two new datasets populated. Now you have an easy way to explore Dataiku plugins and see when they get changed or released. Obviously the plugins URL has not been formally published by Dataiku, but considering every DSS v11 and v12 instance is using this URL I would think it's pretty safe to use, even if unsupported. Also, if this project breaks it is not the end of the world: we are not trying to predict anything here, it's an information tool.
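Since the JSON layout is unpublished and could change, a more defensive way to flatten one plugin entry may be worth having. The following is a sketch only, assuming the field names the recipe above reads (`id`, `meta`, `storeVersion`, `content`); the function name `flatten_plugin` and the sample item are made up for illustration.

```python
def flatten_plugin(item: dict) -> dict:
    """Flatten one entry of json_object['items'] into a single flat row."""
    meta = item.get("meta", {})
    row = {
        "ID": item.get("id", ""),
        "Label": meta.get("label", ""),
        "Author": meta.get("author", ""),
        "Store_Version": item.get("storeVersion", ""),
    }
    # Count every component type that happens to be present, rather than
    # hard-coding the key list, so a missing or new key does not raise KeyError
    for component_type, components in item.get("content", {}).items():
        row[component_type] = len(components or [])
    return row


# Made-up sample item, shaped like the fields the recipe reads
sample_item = {
    "id": "example-plugin",
    "meta": {"label": "Example plugin", "author": "Someone"},
    "storeVersion": "1.0.0",
    "content": {"customDatasets": [{}, {}], "customCodeRecipes": [{}]},
}
print(flatten_plugin(sample_item))
```

A list of such rows can be passed straight to `pd.DataFrame(rows)`, which also avoids calling `pd.concat` once per plugin.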

In our case I think I am going to build a scenario to check for new plugin releases daily or weekly, and then post a notification on a Teams channel so our users and I get notified when new plugin versions are released.
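The comparison step of such a scenario could look roughly like the sketch below: keep the releases from the previous run, then report the (ID, Version) pairs that are new in the current run. The function name `find_new_releases` and the sample rows are hypothetical; posting to Teams is left out because it depends on how notifications are set up on your instance.

```python
def find_new_releases(previous_rows, current_rows):
    """Return rows of current_rows whose (ID, Version) pair is not in previous_rows."""
    seen = {(row["ID"], row["Version"]) for row in previous_rows}
    return [row for row in current_rows if (row["ID"], row["Version"]) not in seen]


# Made-up rows using the column names of the plugin_releases dataset
previous = [
    {"ID": "example-plugin", "Version": "1.0.0"},
]
current = [
    {"ID": "example-plugin", "Version": "1.0.0"},
    {"ID": "example-plugin", "Version": "1.1.0"},
    {"ID": "other-plugin", "Version": "0.1.0"},
]

for release in find_new_releases(previous, current):
    print(f"New release: {release['ID']} {release['Version']}")
```

With pandas, `df.to_dict('records')` turns a DataFrame such as df_plugin_releases into rows of this shape.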

Hope it helps!

AlexT
Dataiker

Thanks for sharing, @Turribeach!
In case anyone runs into this in the future, the URL is slightly different for DSS 12:
https://update.dataiku.com/dss/12/plugins/list.json
