Is there a way to split a dataset based on a column value?

UserBird
UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
Hello all,

I have a dataset that contains many rows for each event, with the event's ID stored in the "event_id" column. There are of course many events in the dataset.

Is there a way to split this dataset more easily than manually defining the output datasets in the split visual recipe? There are hundreds of events, so doing it by hand would be painful, or at least time-consuming.

I am using DSS 2.2.2



Thanks in advance!

Answers

  • jereze
    jereze Alpha Tester, Dataiker Alumni Posts: 190 ✭✭✭✭✭✭✭✭

    Hi Alex,

    There is not really a better way to do this than with the split recipe. If you want one dataset per event, you have to create those datasets anyway. You could perhaps create them with the DSS API, but that is still not ideal.

    The best option would be to change your strategy: keep a single dataset and partition it on the event_id column. To learn more, you can read "Working with partitions" and "Repartitioning a non-partitioned dataset".

    I hope that helps,

    Jeremy
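
To make the idea of grouping by event_id concrete, here is a minimal sketch in plain Python. It does not use the DSS API or the split recipe, and it is not how DSS implements partitioning. The function name `split_by_column` and the sample rows are hypothetical; only the `event_id` column name comes from the question. It simply shows a single collection of rows being organized by the value of one column, without declaring one output per event by hand.

```python
from collections import defaultdict


def split_by_column(rows, column):
    """Group row dicts by the value found in `column`.

    Returns a dict mapping each distinct value to the list of rows
    that carry it, preserving the original row order within each group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


# Hypothetical sample data (assumption): three rows, two distinct events.
rows = [
    {"event_id": "A", "value": 1},
    {"event_id": "B", "value": 2},
    {"event_id": "A", "value": 3},
]

parts = split_by_column(rows, "event_id")

# One group per distinct event_id, discovered from the data itself.
for event_id, event_rows in sorted(parts.items()):
    print(event_id, len(event_rows))
```

The groups are derived from the values present in the data, so adding a new event requires no extra configuration. That is the property that makes a single dataset keyed on event_id easier to maintain than hundreds of separately defined outputs.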
