I am looking for ways to extract the exact train/test/validation sets used in visual ML. By this I mean not only the data splits, but also the datasets including all new features created by the data processing in visual ML. For example, if dummy encoding is used for a text column, I would like the train/test/validation sets to include all the additional columns created before the datasets are passed to an ML model.
While exporting the model as a notebook exports some of the steps required to create these sets, it does not include all of them.
The use case is carrying out additional model stress testing/explainability analysis for which we are using custom code. For this analysis, in addition to the ML model, we also require the exact train/test/validation sets that were passed to the ML model.
Exporting the preprocessed train/test data is not currently possible in Dataiku, but we are looking into it. We'll take note of this post and will keep you posted if future product improvements can help solve your issue.
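For the dummy-encoding example in the question, one possible stopgap is to rebuild the encoded columns by hand in custom code. The sketch below is only an illustration using pandas: it does not reproduce the preprocessing that visual ML applies internally, and the function name `dummy_encode_splits`, the column names, and the toy data are all hypothetical. It encodes the train set first and then aligns the test set to the train columns, so both sets end up with the same features.

```python
import pandas as pd


def dummy_encode_splits(train, test, column):
    """Dummy-encode `column` in both splits, using the train set's categories.

    Hypothetical helper: a manual approximation, not Dataiku's own preprocessing.
    """
    train_enc = pd.get_dummies(train, columns=[column], prefix=column)
    test_enc = pd.get_dummies(test, columns=[column], prefix=column)
    # Align the test set to the train columns: categories seen only in test
    # are dropped, categories missing from test are added and filled with 0.
    test_enc = test_enc.reindex(columns=train_enc.columns, fill_value=0)
    return train_enc, test_enc


# Toy data standing in for the real dataset (assumed column names).
df = pd.DataFrame({
    "color": ["red", "blue", "red", "green", "blue", "red"],
    "amount": [1.0, 2.5, 3.0, 4.2, 5.1, 6.3],
    "target": [0, 1, 0, 1, 1, 0],
})

# A fixed positional split keeps the example deterministic.
train, test = df.iloc[:4], df.iloc[4:]
train_enc, test_enc = dummy_encode_splits(train, test, "color")
print(list(train_enc.columns))
```

The resulting sets will match what the model was trained on only if the split and every preprocessing step are replicated identically, so it is worth comparing the column list against the model's feature list before relying on them.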