Hi Simon,
Unfortunately, this is not built into Dataiku (yet), i.e. it is not something you can do without coding.
This is definitely something we are considering currently, and it might become available soon.
Especially if your dataset is large and/or unsorted, the best way to do that would indeed be a SQL window function with PARTITION BY. A SQL query recipe with something like:
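-- my_table is a placeholder for the table behind your input dataset
SELECT category, numberdata, RANK() OVER (PARTITION BY category ORDER BY numberdata ASC) AS category_rank FROM my_table;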
would do the trick.
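To see what that query computes, here is a minimal plain-Python sketch of the same ranking, run on made-up sample rows. The function name rank_within_category and the sample values are only illustrative, not anything from Dataiku. Note that, like SQL's RANK(), tied values share a rank and the next rank is skipped.

```python
from itertools import groupby


def rank_within_category(rows):
    """Mimic RANK() OVER (PARTITION BY category ORDER BY numberdata ASC)."""
    # Sort by category first so groupby sees each category as one run,
    # then by numberdata so ranks follow ascending order.
    ordered = sorted(rows, key=lambda r: (r["category"], r["numberdata"]))
    ranked = []
    for _, group in groupby(ordered, key=lambda r: r["category"]):
        previous_value = None
        rank = 0
        for position, row in enumerate(group, start=1):
            # A new value takes its position as rank; a tie keeps the previous rank.
            if row["numberdata"] != previous_value:
                rank = position
                previous_value = row["numberdata"]
            ranked.append({**row, "rank": rank})
    return ranked


# Illustrative sample data (assumed, not from a real dataset)
sample = [
    {"category": "a", "numberdata": 10},
    {"category": "b", "numberdata": 7},
    {"category": "a", "numberdata": 5},
    {"category": "a", "numberdata": 10},
]
result = rank_within_category(sample)
for row in result:
    print(row)
```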
If your dataset is already ordered (for example, it comes from a single file), you can also use the visual data preparation with a custom Python processor. Something like:
# State kept between rows; this only works if rows arrive already sorted
current_category = None
current_rank = 0

# Modify the process function to fit your needs
def process(row):
    global current_category, current_rank
    if current_category is None or row["category"] != current_category:
        # New category seen: restart the counter
        current_rank = 1
        current_category = row["category"]
    else:
        current_rank += 1
    row["rank"] = current_rank
    # In row mode, the processor must return the modified row
    return row
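If you want to try this counting logic outside Dataiku first, here is a small sketch that wraps the same idea in a closure instead of globals and runs it over a few made-up rows. The name make_ranker and the sample rows are assumptions for illustration. One thing it makes visible: this approach numbers rows one by one within each category, so tied values get different ranks (closer to ROW_NUMBER() than to RANK() in SQL).

```python
def make_ranker():
    """Return a process(row) function that numbers rows within each category.

    Assumes rows arrive already grouped by category and in the desired order.
    """
    state = {"category": None, "rank": 0}

    def process(row):
        if state["category"] is None or row["category"] != state["category"]:
            # New category seen: restart the counter
            state["category"] = row["category"]
            state["rank"] = 1
        else:
            state["rank"] += 1
        row["rank"] = state["rank"]
        return row

    return process


# Illustrative rows, already sorted by category then numberdata (assumed data)
rows = [
    {"category": "a", "numberdata": 5},
    {"category": "a", "numberdata": 10},
    {"category": "a", "numberdata": 10},
    {"category": "b", "numberdata": 7},
]

process = make_ranker()
ranked = [process(dict(r)) for r in rows]  # copy each row so the input is untouched
for row in ranked:
    print(row)
```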
Hope this helps,