
How to apply the same transformations to test and train without duplicating the flow?

Are filters applied in the visual analysis deployed together with the model, so that the same filters are applied at prediction time?

If not, how can this typical pipeline behaviour be achieved?
Dataiker
Hi,

I recommend using a stack recipe to combine train and test into a single dataset. In the stack recipe, you can add a new column that records each row's origin ("train" or "test"). You then have a single transformation pipeline up to the ML model, where you specify the train/test split using filters on that origin column.
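
To make the idea concrete, here is a minimal sketch in plain Python, outside DSS. It only illustrates the logic (stack, tag the origin, transform once, split on the origin); it is not the stack recipe itself, and the rows, the column names, and the `stack_with_origin` and `prepare` helpers are all hypothetical.

```python
# Hypothetical rows standing in for the train and test datasets.
train_rows = [{"age": "34", "income": "52000"}, {"age": "51", "income": "61000"}]
test_rows = [{"age": "29", "income": "48000"}]


def stack_with_origin(train, test, origin_column="origin"):
    """Concatenate both datasets, tagging each row with where it came from."""
    stacked = [{**row, origin_column: "train"} for row in train]
    stacked += [{**row, origin_column: "test"} for row in test]
    return stacked


def prepare(row):
    """The single shared transformation: parse numbers, add a derived feature."""
    age = int(row["age"])
    income = float(row["income"])
    return {**row, "age": age, "income": income, "income_per_year_of_age": income / age}


stacked = stack_with_origin(train_rows, test_rows)
prepared = [prepare(row) for row in stacked]

# Split back on the origin column, as the filters in the model's
# train/test settings would.
train_prepared = [row for row in prepared if row["origin"] == "train"]
test_prepared = [row for row in prepared if row["origin"] == "test"]
```

Because `prepare` runs once on the stacked rows, train and test cannot drift apart: any change to the transformation applies to both.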

Happy to provide more details if needed,

Alex
Dataiker
Alternatively, if possible, do the data preparation at the end, in the script attached to the model. The script is then packaged with the model.
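
As a rough analogy in plain Python (not how DSS implements it), packaging the preparation with the model means every call to fit or predict goes through the same preparation step. The `PreparedModel` and `MeanThresholdModel` classes and the `prepare` function below are hypothetical, and the toy model ignores the targets.

```python
class PreparedModel:
    """Bundles a preparation step with a model so both are always applied together."""

    def __init__(self, prepare, model):
        self.prepare = prepare
        self.model = model

    def fit(self, rows, targets):
        self.model.fit([self.prepare(row) for row in rows], targets)
        return self

    def predict(self, rows):
        # The same preparation runs at prediction time, with no duplicated flow.
        return self.model.predict([self.prepare(row) for row in rows])


class MeanThresholdModel:
    """Toy model: predicts 1 when the feature exceeds the training mean."""

    def fit(self, features, targets):
        # Targets are unused here; a real model would learn from them.
        self.threshold = sum(features) / len(features)

    def predict(self, features):
        return [1 if value > self.threshold else 0 for value in features]


def prepare(row):
    """Hypothetical preparation: parse income and rescale to thousands."""
    return float(row["income"]) / 1000.0


bundle = PreparedModel(prepare, MeanThresholdModel()).fit(
    [{"income": "40000"}, {"income": "60000"}], [0, 1]
)
predictions = bundle.predict([{"income": "70000"}, {"income": "30000"}])
```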
Mattsco