There is currently no way to do that in a visual preparation recipe*, because a visual recipe works more or less row by row and cannot operate on a full column (it is designed for big data).
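A minimal pandas sketch (not Dataiku code) of why processing that only sees part of the data cannot deduplicate on its own: a duplicate is only detected when both copies reach the same worker. The two-chunk split below is an illustrative assumption standing in for however a recipe divides its work.

```python
import pandas as pd


def dedupe_chunk(chunk: pd.DataFrame) -> pd.DataFrame:
    # A worker only sees its own chunk, so it can only drop duplicates inside it.
    return chunk.drop_duplicates()


df = pd.DataFrame({"a": [1, 2, 1, 3], "b": ["x", "y", "x", "z"]})

# Hypothetical split into two independent chunks; the repeated (1, "x") row
# ends up in different chunks.
chunks = [df.iloc[:2], df.iloc[2:]]
chunked_result = pd.concat([dedupe_chunk(c) for c in chunks], ignore_index=True)

# A global pass sees every row at once and catches the duplicate.
global_result = df.drop_duplicates().reset_index(drop=True)
```

Here `chunked_result` keeps 4 rows while `global_result` keeps 3, which is why deduplication needs a step that sees the whole dataset.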
It's possible to do so in a visual GROUP recipe: click “Show mass actions”, select all columns, then click “use as grouping keys”. If the CSV is very big, I suggest synchronizing it to a SQL database first.
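The Group recipe trick works because grouping on every column, with no aggregation, leaves exactly one output row per distinct input row. A small sketch of that equivalence in pandas and in SQL (in-memory SQLite); the data, column names and table name are illustrative assumptions, and this is not the query Dataiku itself generates.

```python
import sqlite3

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 1, 3], "b": ["x", "y", "x", "z"]})
cols = list(df.columns)

# Group-recipe analogue: every column is a grouping key, no aggregations.
# The group keys themselves are the distinct rows.
grouped = df.groupby(cols).size().index.to_frame(index=False)

# Reference result, sorted the same way as groupby's default key order.
deduped = df.drop_duplicates().sort_values(cols).reset_index(drop=True)

# SQL analogue: GROUP BY on all columns (equivalent to SELECT DISTINCT a, b).
conn = sqlite3.connect(":memory:")
df.to_sql("input_dataset", conn, index=False)
sql_rows = conn.execute(
    "SELECT a, b FROM input_dataset GROUP BY a, b ORDER BY a, b"
).fetchall()
conn.close()
```

All three approaches return the same three distinct rows; on a SQL connection the grouping runs inside the database rather than in memory.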
You can also do so in coding recipes:
* There is actually one way to do it in a visual preparation recipe, using a custom Python function, but it will not always work (it breaks when the recipe runs multi-threaded), so I would not recommend this trick:
I hope that helps,
In a Python recipe, I would do:
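# -*- coding: utf-8 -*-
import dataiku
import pandas as pd

# Recipe inputs
df = dataiku.Dataset("input_dataset").get_dataframe()

# Drop duplicate rows, using all columns to compare for duplicates
df = df.drop_duplicates()

# Recipe outputs (write_with_schema writes the dataframe; it returns nothing useful)
dataiku.Dataset("output_dataset").write_with_schema(df)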
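If only some columns should define what counts as a duplicate, pandas' `drop_duplicates` also accepts `subset` and `keep`. A standalone sketch with made-up data (the `id` and `value` columns are hypothetical), independent of the Dataiku dataset API:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "value": ["a", "a", "b", "c"],
})

# Default: all columns are compared, so only exact duplicate rows are dropped.
all_cols = df.drop_duplicates()

# Compare on "id" only; keep="last" keeps the last occurrence of each id.
by_id = df.drop_duplicates(subset=["id"], keep="last")
```

`all_cols` keeps rows 0, 2 and 3, while `by_id` keeps one row per `id` (rows 1 and 3). The default `keep="first"` would keep rows 0 and 2 instead.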