What is the difference between the Flow and Analysis steps? Are the Analysis steps applied in the 'Web Service' before the transformations, like having a preprocessing function before a sklearn.pipeline()?
def preprocessing(data):
    # do some preprocessing... but not transforms...
    return data
Hi,

The flow is the main interface for the current project. The analysis and notebooks are for experimentation and prototyping. Once one of these prototypes is ready, you 'deploy' the model to the flow or 'convert' the notebook to a recipe. Working this way lets you keep the flow clean and concise while running many experiments at the same time. In my experience, it makes collaboration on the same project much easier.

Regarding the analysis steps before a visual machine learning model, you are correct. They are pipelined along with the model when you deploy it in real time in an API service.

Note that if you use visual machine learning models, we try to optimize the whole scoring pipeline by running it in Java, which makes it faster than Python. More details at: https://doc.dataiku.com/dss/latest/machine-learning/scoring-engines.html.

Hope it helps,
Alexandre
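To make the "pipelined along with the model" point concrete, here is a minimal plain-Python sketch of the idea. It is only an analogy, not how DSS implements scoring: the `ScoringPipeline` class, the two preparation steps, and `toy_model` are all hypothetical names invented for illustration. The point is that the caller sends raw records, and the preparation steps run inside the deployed object before the model scores them.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]


class ScoringPipeline:
    """Hypothetical stand-in for a deployed model: preparation steps plus a model."""

    def __init__(self,
                 steps: List[Callable[[Record], Record]],
                 model: Callable[[Record], float]) -> None:
        self.steps = steps
        self.model = model

    def predict(self, record: Record) -> float:
        # Apply every preparation step in order, then score the prepared record.
        for step in self.steps:
            record = step(record)
        return self.model(record)


def fill_missing_age(record: Record) -> Record:
    # Hypothetical preparation step: use a default when "age" is absent.
    prepared = dict(record)  # copy so the caller's record is not mutated
    prepared.setdefault("age", 30.0)
    return prepared


def add_age_squared(record: Record) -> Record:
    # Hypothetical preparation step: derive a new feature from an existing one.
    prepared = dict(record)
    prepared["age_squared"] = prepared["age"] ** 2
    return prepared


def toy_model(record: Record) -> float:
    # Hypothetical model: a fixed linear score over the prepared features.
    return 0.5 * record["age"] + 0.01 * record["age_squared"]


# The preparation steps are bundled with the model, so the API caller
# never has to run them separately.
pipeline = ScoringPipeline([fill_missing_age, add_age_squared], toy_model)
score = pipeline.predict({"income": 1000.0})  # "age" is filled in by the pipeline
print(score)
```

In this sketch the preparation functions play the role of the analysis steps: they travel with the model, so there is no separate preprocessing function to call in front of the service.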