Generate sequence, restart at 1 each time a given column changes

UserBird
UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
Hi,

If I have data like this:

A 1
A 5
A 7
B 1
B 10
B 20



I want to generate a third column with a sequence:

A 1 1
A 5 2
A 7 3
B 1 1
B 10 2
B 20 3

How can I do it in DSS (other than R, Python or SQL)?

Answers

  • Clément_Stenac
    Clément_Stenac Dataiker, Dataiku DSS Core Designer, Registered Posts: 753 Dataiker
    edited July 17

    Hi Simon,



    Unfortunately, this is not a feature that is builtin in Dataiku (yet), ie not something that you can do without coding.

    This is definitely something we are considering currently, and it might become available soon.



    Especially if your dataset is large and/or unsorted, the best way to do that would indeed be to use SQL partitioning. A SQL query recipe with something like:


    SELECT category, numberdata, RANK() OVER (PARTITION BY category ORDER BY numberdata ASC);



    would do the trick.



    If your dataset is already ordered (for example, it's a file, only one file), you can also use the visual data preparation with a custom Python processor. Something like:


    current_category = None
    current_rank = 0

    # Modify the process function to fit your needs
    def process(row):
    global current_category, current_rank

    if current_category is None or row["category"] != current_category:
    # New category seen
    current_rank = 1
    current_category =row["category"]
    else:
    current_rank += 1

    row["rank"] = current_rank





    Hope this helps,

  • Simon
    Simon Registered Posts: 2 ✭✭✭✭
    Thanks. I would add to this using sequence from R (using datatables as an example)
    DT.dt[, seq_var := sequence(.N), by = "A"])

    To SQL I would also add that I should more probably use DENSE_RANK or even ROW_NUMBER if the SQL supports its.
Setup Info
    Tags
      Help me…