How to apply the same transformations to test and train without duplicating the flow?

tjh Registered Posts: 20 ✭✭✭✭
Are filters applied in the visual analysis also deployed together with the model, so that the same filters are applied at prediction time?

If not, how can this typical pipeline behaviour be accomplished?


  • Alex_Combessie
    Alex_Combessie Alpha Tester, Dataiker Alumni Posts: 539 ✭✭✭✭✭✭✭✭✭

    I recommend using a stack recipe to get both train and test into the same dataset. In the stack recipe, you can add a new column specifying the origin of each row ("train" or "test"). You then have a single transformation pipeline up to the ML model, where you specify the train/test split using filters on the origin column defined at the beginning.

    Happy to provide more details if needed,
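
For readers who find it easier to follow in code, the stack-then-split idea above can be sketched in plain Python. This is only an illustration of the logic, not Dataiku's API: in DSS the same steps would be done with visual recipes, and the row contents, column names, and the `stack` and `prepare` helpers below are all hypothetical.

```python
# Hypothetical rows; the column names are illustrative only.
train_rows = [{"age": "34", "income": "52000"}, {"age": "51", "income": "61000"}]
test_rows = [{"age": "29", "income": "48000"}]


def stack(train, test):
    """Concatenate both sets, tagging each row with its origin."""
    stacked = [dict(row, origin="train") for row in train]
    stacked += [dict(row, origin="test") for row in test]
    return stacked


def prepare(row):
    """The single shared transformation step (here: cast strings to numbers)."""
    out = dict(row)
    out["age"] = int(row["age"])
    out["income_k"] = float(row["income"]) / 1000.0
    del out["income"]
    return out


# One pipeline is applied to the stacked data, so train and test cannot drift apart.
prepared = [prepare(row) for row in stack(train_rows, test_rows)]

# Split again at the very end by filtering on the origin column.
train_prepared = [r for r in prepared if r["origin"] == "train"]
test_prepared = [r for r in prepared if r["origin"] == "test"]
```

Because the transformation is written once and the split happens last, any change to `prepare` automatically applies to both sets.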

  • Mattsco
    Mattsco Administrator, Dataiker Posts: 125 Administrator
    Or, if possible, do the data preparation at the end, in the script attached to the model. It will then be packaged with the model.
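
As a rough analogy for what "packaged with the model" means, here is a minimal plain-Python sketch in which the preparation step is stored inside the model object, so it runs both at training and at prediction time. It does not reflect how DSS implements this internally; the `PackagedModel` class, the `prepare` function, and the toy one-feature regression are assumptions made purely for illustration.

```python
def prepare(raw):
    """Hypothetical preparation step: parse a raw string and rescale it."""
    return float(raw.strip()) / 1000.0


class PackagedModel:
    """Toy model that carries its own preparation step."""

    def __init__(self, prepare_fn):
        self.prepare_fn = prepare_fn
        self.slope = None

    def fit(self, raw_inputs, targets):
        # Preparation runs inside fit, on the raw training inputs.
        xs = [self.prepare_fn(r) for r in raw_inputs]
        # Least-squares slope for a line through the origin.
        self.slope = sum(x * y for x, y in zip(xs, targets)) / sum(x * x for x in xs)
        return self

    def predict(self, raw_inputs):
        # The same preparation runs again on raw inputs at prediction time.
        return [self.slope * self.prepare_fn(r) for r in raw_inputs]


model = PackagedModel(prepare).fit(["1000", "2000 ", "3000"], [2.0, 4.0, 6.0])
predictions = model.predict([" 4000"])
```

The caller only ever passes raw inputs, so there is no second copy of the preparation logic to keep in sync.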