How to apply the same transformations to test and train without duplicating the flow?

tjh
Are filters applied in the visual analysis deployed together with the model, so that the same filters are also applied at prediction time?

If not, how can this typical pipeline behaviour be achieved?
Alex_Combessie
Dataiker Alumni
Hi,

I recommend using a stack recipe to put both train and test in the same dataset. In the stack recipe, you can add a new column recording each row's origin ("train" or "test"). You then have a single transformation pipeline up to the ML model, where you specify the train/test split by filtering on the origin column defined at the beginning.
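To illustrate the idea outside of DSS, here is a minimal plain-Python sketch, assuming row-wise data held as lists of dicts. It does not use the Dataiku API; `stack`, `prepare`, and `split_by_origin` are hypothetical helpers standing in for the stack recipe, the shared preparation steps, and the origin filter applied just before the model.

```python
# Hypothetical rows; in DSS these would be the train and test datasets.
train = [{"age": "34", "city": " Paris "}, {"age": "", "city": "lyon"}]
test = [{"age": "51", "city": "NICE"}]


def stack(train_rows, test_rows):
    """Concatenate both datasets, tagging each row with its origin (like a stack recipe)."""
    stacked = []
    for origin, rows in (("train", train_rows), ("test", test_rows)):
        for row in rows:
            stacked.append({**row, "origin": origin})
    return stacked


def prepare(row):
    """Single, shared preparation step applied to every row regardless of origin."""
    return {
        "age": int(row["age"]) if row["age"] else None,  # empty string -> missing
        "city": row["city"].strip().lower(),             # normalise spacing and case
        "origin": row["origin"],                         # keep the tag for the split
    }


def split_by_origin(rows):
    """Recover train/test by filtering on the origin column, just before the model."""
    train_out = [r for r in rows if r["origin"] == "train"]
    test_out = [r for r in rows if r["origin"] == "test"]
    return train_out, test_out


prepared = [prepare(r) for r in stack(train, test)]
train_prepared, test_prepared = split_by_origin(prepared)
```

This works cleanly for row-wise steps like the ones above. Steps that learn statistics from the data (for example, imputing with a column mean) would also see the test rows if computed on the stacked dataset, so those are better left to the model's own preprocessing.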

Happy to provide more details if needed,

Alex
Mattsco
Dataiker
Alternatively, if possible, do the data preparation at the end, in the script attached to the model. It will then be packaged with the model.
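The general principle behind this suggestion is that the preparation code travels with the model, so training and scoring always go through the same steps. A minimal plain-Python sketch of that principle follows; it is not the DSS model script API, and `PreparedModel`, `MeanByCity`, and `prepare` are hypothetical names used only for illustration.

```python
class PreparedModel:
    """Hypothetical wrapper bundling a preparation step with a model."""

    def __init__(self, prepare, model):
        self.prepare = prepare  # applied at both training and scoring time
        self.model = model

    def fit(self, rows, targets):
        self.model.fit([self.prepare(r) for r in rows], targets)
        return self

    def predict(self, rows):
        # Raw rows are prepared exactly as they were during training.
        return self.model.predict([self.prepare(r) for r in rows])


class MeanByCity:
    """Toy model (hypothetical): predicts the average target seen for each city."""

    def fit(self, rows, targets):
        sums, counts = {}, {}
        for row, y in zip(rows, targets):
            sums[row["city"]] = sums.get(row["city"], 0.0) + y
            counts[row["city"]] = counts.get(row["city"], 0) + 1
        self.means = {c: sums[c] / counts[c] for c in sums}
        self.default = sum(targets) / len(targets)  # fallback for unseen cities

    def predict(self, rows):
        return [self.means.get(r["city"], self.default) for r in rows]


def prepare(row):
    """Shared preparation: normalise the city name."""
    return {"city": row["city"].strip().lower()}


model = PreparedModel(prepare, MeanByCity()).fit(
    [{"city": " Paris "}, {"city": "PARIS"}, {"city": "lyon"}], [10.0, 20.0, 4.0]
)
preds = model.predict([{"city": "paris"}, {"city": " Lyon"}, {"city": "Nice"}])
```

Because scoring only ever calls `PreparedModel.predict`, there is no separate copy of the preparation logic to keep in sync with the training flow.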
Mattsco