General Discussion
- Hello everyone! So, I am working on a project where we have a web app in Bokeh and an ML pipeline in the flow section of Dataiku. So, the application has several input fields which should be supplied to the m…

Last answer by pmasiphelps:
Hi,
Here's some Python code using the API that assumes your flow starts with an "Uploaded File" type dataset. It also assumes you've already written the code to read in an uploaded file from the webapp user; the snippet then places this file in the initial dataset of your flow.
```python
import dataiku
from dataiku import pandasutils as pdu
import pandas as pd

client = dataiku.api_client()
proj = client.get_project("PROJECT_KEY")
ds = proj.get_dataset("UPLOADED_FILE_DATASET_NAME")

# clear any existing file from this dataset
ds.clear()

upload_file_path = "your_path_here"
with open(upload_file_path, "rb") as f:
    ds.uploaded_add_file(f, upload_file_path)
```
Then, assuming you've created a scenario in your project that runs your ML pipeline, you can run this scenario via another API call in your webapp.
```python
scenario = proj.get_scenario("SCENARIO_ID")
trigger_fire = scenario.run()
```
There are a number of variants for running scenarios via the API - doc here: https://doc.dataiku.com/dss/latest/python-api/scenarios.html#run-a-scenario
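For instance, if the webapp needs to block until the pipeline finishes rather than just firing the trigger, the `run_and_wait` variant from the docs above can be wrapped in a small helper. This is a sketch: the helper name and the `SCENARIO_ID` placeholder are illustrative, and `project` is the same `DSSProject` handle obtained earlier via `client.get_project(...)`.

```python
def run_pipeline_and_wait(project, scenario_id="SCENARIO_ID"):
    """Trigger a scenario and block until the run completes.

    `project` is a dataikuapi DSSProject handle. run_and_wait()
    raises if the run does not succeed (unless no_fail=True is
    passed), so reaching the return means the pipeline finished.
    """
    scenario = project.get_scenario(scenario_id)
    # Blocks the caller until the scenario run is done
    run = scenario.run_and_wait()
    return run
```

Because this blocks the webapp backend, for long pipelines you may prefer the plain `scenario.run()` shown above plus periodic polling from the frontend.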
If using Bokeh, the way to load the latest trained model from the flow would indeed be via the API.
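Inside a DSS webapp backend (Bokeh included), a minimal sketch of that lookup might look like the following. The function name and `SAVED_MODEL_ID` placeholder are illustrative; `dataiku.Model(...).get_predictor()` resolves to the saved model's currently active version, so retrained models are picked up automatically.

```python
def predict_with_latest_model(input_df, model_id="SAVED_MODEL_ID"):
    """Score a pandas DataFrame with the flow's saved model.

    Assumes this runs inside DSS, where the `dataiku` internal
    package is available to the webapp backend.
    """
    import dataiku  # available inside a DSS webapp backend

    # Resolves to the active (latest deployed) version of the model
    model = dataiku.Model(model_id)
    predictor = model.get_predictor()
    # predict() returns the input rows with prediction columns added
    return predictor.predict(input_df)
```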
If you're interested in developing applications using flow components (replacing files in datasets, running scenarios, pulling ML model info) without having to code them yourself using the APIs - project applications would be a great thing to check out. Here's some hands-on tutorials: https://academy.dataiku.com/dataiku-applications-tutorials-open
Best,
Pat
- Hello! We want to import multiple projects, and while importing into the new environment we would like to bulk-replace the connection from "file system" to "HDFS". Standard Export - Import only allows to choos…

Last answer by Manuel:
Even if there are multiple projects, at the project level it is fairly easy to change the connection in bulk:
- In your flow,
- (bottom left) View > Connections > connection > Select all checked
- (bottom right) Other Actions > Change Connection
See the image below.
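If repeating the UI steps above across many imported projects becomes tedious, the same change can be scripted per project via the public API. This is a sketch under two assumptions: `project` is a dataikuapi `DSSProject` handle, and the dataset params carry a top-level `connection` field, which holds for filesystem/HDFS-style managed datasets; the function name and connection names are illustrative.

```python
def bulk_change_connection(project, old_conn, new_conn):
    """Point every dataset on `old_conn` at `new_conn` instead.

    Iterates the project's datasets and rewrites the `connection`
    entry in the raw dataset params where it matches.
    """
    for item in project.list_datasets():
        dataset = project.get_dataset(item["name"])
        settings = dataset.get_settings()
        params = settings.get_raw_params()
        # Only touch datasets that actually sit on the old connection
        if params.get("connection") == old_conn:
            params["connection"] = new_conn
            settings.save()
```

Note that changing the connection does not move existing data; you would still rebuild the affected datasets (e.g. via a scenario) after the switch.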
I hope this helps.
- I have a PySpark recipe which reads a dataset and extracts a column based on the first index (first row). In a scenario where the input dataset partition is empty, it throws a normal error: 'index out of …

Last answer by