How to fetch all available Dataiku plugins into a dataset
I wanted a way to see all plugins in one place, as it is not easy to keep track of which plugins are available, when they get updated and when new ones come out. I knew DSS fetched the list of plugins from somewhere. I could have asked Dataiku Support, but instead I armed myself with the excellent Proxyman, intercepted the SSL traffic and caught the URL that DSS uses to fetch the plugins:
https://update.dataiku.com/dss/11/plugins/list.json
The plugins URL is version specific, so v12 uses 12 in the URL. It doesn't work on v10 or below, so I suspect this is a new URL used by v11 and above only. The URL returns a nice JSON document, from which a little bit of Python produces two datasets: plugins and plugin_releases.
I was wondering how to share this project with others, but I just realised it will be much easier to share the Python recipe, which will be here forever and will not depend on any file sharing tools, so here it goes:
# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
import os
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd
import requests, json
from sys import platform

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Specify your CA bundle if your OS supports it
if platform == "linux" or platform == "linux2":
    # linux (RHEL/CentOS path; adjust for your distribution)
    os.environ["REQUESTS_CA_BUNDLE"] = '/etc/ssl/certs/ca-bundle.crt'
elif platform == "darwin":
    # OS X
    print("Use pip to install certifi")
elif platform == "win32":
    # Windows
    print("Use pip to install certifi")

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
client = dataiku.api_client()
dataiku_version = client.get_instance_info().raw['dssVersion'].split(".")[0]
if int(dataiku_version) < 11:
    raise Exception('This only works for Dataiku v11 and above')

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
dataiku_plugins_url = f'https://update.dataiku.com/dss/{dataiku_version}/plugins/list.json'
headers = {'Content-Type': 'application/json'}

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
status_code = 0
status_reason = ''
status_text = ''
error_text = ''
try:
    response = requests.get(dataiku_plugins_url, headers=headers, verify=True, timeout=(1, 3))
    status_code = response.status_code
    status_reason = response.reason
    status_text = str(status_code) + ' - ' + str(status_reason)
    # Raise an exception if the response status code is not successful
    response.raise_for_status()
except requests.exceptions.BaseHTTPError:
    error_text = "Base HTTP Error: " + status_text
except requests.exceptions.HTTPError:
    error_text = "HTTP Error: " + status_text
except requests.exceptions.Timeout:
    error_text = "The request timed out"
except requests.exceptions.ConnectionError:
    error_text = "Connection Error"
except requests.exceptions.RequestException:
    error_text = "Unknown error occurred: " + str(status_code)

# Stop here if the request failed, otherwise json_object would be undefined below
if status_code != 200:
    raise Exception(error_text or 'Unexpected response: ' + status_text)

# Parse the response as JSON
json_object = response.json()
# Debug: Print the whole JSON object
# print(json.dumps(json_object, indent=4))

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Descriptive columns of the plugins dataset
plugin_columns = ['ID', 'Label', 'Description', 'Author', 'Icon', 'Size', 'Store_Version', 'URL',
                  'Download_URL', 'Support_Level', 'License_Info', 'Downloadable']

# Component types under item['content']; each becomes a column holding the number of components
content_keys = ['tutorials', 'sampleProjects', 'javaPreparationProcessors', 'javaFormulaFunctions',
                'customDatasets', 'customCodeRecipes', 'customPythonProbes', 'customPythonChecks',
                'customSQLProbes', 'customFormats', 'customExporters', 'customPythonSteps',
                'customPythonTriggers', 'customRunnables', 'customWebApps', 'customFSProviders',
                'customDialects', 'customJythonProcessors', 'customPythonClusters',
                'customParameterSets', 'customFields', 'customJavaPolicyHooks',
                'customWebAppExpositions', 'customPythonPredictionAlgos',
                'customStandardWebAppTemplates', 'customBokehWebAppTemplates',
                'customShinyWebAppTemplates', 'customRMarkdownReportTemplates',
                'customPreBuiltNotebookTemplates', 'customPythonNotebookTemplates',
                'customRNotebookTemplates', 'customScalaNotebookTemplates',
                'customPreBuiltDatasetNotebookTemplates', 'customPythonDatasetNotebookTemplates',
                'customRDatasetNotebookTemplates', 'customScalaDatasetNotebookTemplates']

release_columns = ['ID', 'Label', 'Version', 'Release_Date_Time', 'Release_Notes']

# Collect plain dicts and build each DataFrame once, instead of calling pd.concat per row
plugin_rows = []
release_rows = []
for item in json_object['items']:
    meta = item['meta']

    # One row per released version of the plugin
    for release in item['revisions']:
        release_rows.append({'ID': item['id'],
                             'Label': meta.get('label', ''),
                             'Version': release.get('version', ''),
                             'Release_Date_Time': release.get('releaseTime', ''),
                             'Release_Notes': release.get('releaseNotes', '')})

    # One row per plugin
    plugin_row = {'ID': item['id'],
                  'Label': meta.get('label', ''),
                  'Description': meta.get('description', ''),
                  'Author': meta.get('author', ''),
                  'Icon': meta.get('icon', ''),
                  'Size': item['size'],
                  'Store_Version': item['storeVersion'],
                  'URL': meta.get('url', ''),
                  'Download_URL': item['downloadURL'],
                  'Support_Level': meta.get('supportLevel', ''),
                  'License_Info': meta.get('licenseInfo', ''),
                  'Downloadable': item['storeFlags'].get('downloadable', '')}
    for key in content_keys:
        plugin_row[key] = len(item['content'][key])
    plugin_rows.append(plugin_row)

df_plugins = pd.DataFrame(plugin_rows, columns=plugin_columns + content_keys)
df_plugin_releases = pd.DataFrame(release_rows, columns=release_columns)

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Release times arrive as epoch milliseconds
df_plugin_releases['Release_Date_Time'] = pd.to_datetime(df_plugin_releases['Release_Date_Time'], unit='ms')

# -------------------------------------------------------------------------------- NOTEBOOK-CELL: CODE
# Recipe outputs
plugins = dataiku.Dataset("plugins")
plugins.write_with_schema(df_plugins)
plugin_releases = dataiku.Dataset("plugin_releases")
plugin_releases.write_with_schema(df_plugin_releases)
To add this to a project, add a Python recipe, set two outputs named plugins and plugin_releases, and click Create Recipe. Run it and you will have the two new datasets populated. Now you have an easy way to explore Dataiku plugins and see when they get changed or released. Obviously the plugins URL has not been formally published by Dataiku, but considering every DSS v11 and v12 instance is using this URL I would think it's pretty safe to use, even if unsupported. Also, if this project breaks it's not the end of the world: we are not trying to predict anything here, it's an information tool.
In our case I think I am going to build a scenario to check for new plugin releases daily or weekly, and then post a notification on a Teams channel so our users and I get notified when new plugin versions are released.
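A minimal sketch of the comparison such a scenario could run, assuming a snapshot of plugin_releases from the previous run is kept somewhere. The function names `find_new_releases` and `format_notification` and the sample rows are hypothetical; only the `ID`, `Label` and `Version` column names come from the recipe. Posting the resulting text to Teams is left out, since that depends on how your channel is set up.

```python
import pandas as pd


def find_new_releases(previous: pd.DataFrame, current: pd.DataFrame) -> pd.DataFrame:
    """Return the rows of `current` whose (ID, Version) pair is not in `previous`."""
    keys = ["ID", "Version"]
    # A left merge with indicator=True marks rows found only in `current` as 'left_only'
    merged = current.merge(
        previous[keys].drop_duplicates(), on=keys, how="left", indicator=True
    )
    new_rows = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
    return new_rows.reset_index(drop=True)


def format_notification(new_releases: pd.DataFrame) -> str:
    """Build a plain-text message listing the new releases."""
    if new_releases.empty:
        return "No new plugin releases."
    lines = [f"- {row.Label} {row.Version}" for row in new_releases.itertuples(index=False)]
    return "New plugin releases:\n" + "\n".join(lines)


# Hypothetical sample data standing in for two runs of the recipe
previous = pd.DataFrame({
    "ID": ["plugin-a", "plugin-b"],
    "Label": ["Plugin A", "Plugin B"],
    "Version": ["1.0.0", "2.1.0"],
})
current = pd.DataFrame({
    "ID": ["plugin-a", "plugin-a", "plugin-b"],
    "Label": ["Plugin A", "Plugin A", "Plugin B"],
    "Version": ["1.0.0", "1.1.0", "2.1.0"],
})

new_releases = find_new_releases(previous, current)
message = format_notification(new_releases)
print(message)
```

Comparing on the (ID, Version) pair rather than on the release time means a plugin whose metadata is re-published without a version bump would not trigger a notification, which is probably what you want for a low-noise channel.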
Hope it helps!
Answers
Alexandru Dataiker, Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 1,225 Dataiker
Thanks for sharing, @Turribeach!
In case anyone runs into this in the future, the URL is slightly different for DSS 12:
https://update.dataiku.com/dss/12/plugins/list.json
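As a small sketch of how the URL follows the major version, the helper below builds it from a full DSS version string. The name `plugins_list_url` and the sample version strings are hypothetical, and only the v11 and v12 URLs are confirmed in this thread; that later major versions follow the same pattern is an assumption.

```python
def plugins_list_url(dss_version: str) -> str:
    """Build the plugin list URL from a DSS version string such as '12.0.0'."""
    major = int(dss_version.split(".")[0])
    if major < 11:
        # The thread reports that the URL does not work on v10 or below
        raise ValueError("The plugin list URL is only known to work for DSS v11 and above")
    # Assumption: the major version is the only part of the URL that changes
    return f"https://update.dataiku.com/dss/{major}/plugins/list.json"


url_11 = plugins_list_url("11.4.2")
url_12 = plugins_list_url("12.0.0")
print(url_11)
print(url_12)
```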