This website uses cookies. By browsing this website, you consent to the use of cookies. Learn more.

Turn on suggestions

Auto-suggest helps you quickly narrow down your search results by suggesting possible matches as you type.

Showing results for

- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page

Highlighted
######

Hi,

If I have data like this:

A 1

A 5

A 7

B 1

B 10

B 20

I want to generate a third column with a sequence:

A 1 1

A 5 2

A 7 3

B 1 1

B 10 2

B 20 3

How can I do it in DSS (other than R, Python or SQL)?

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Generate sequence, restart at 1 each time a given column changes

If I have data like this:

A 1

A 5

A 7

B 1

B 10

B 20

I want to generate a third column with a sequence:

A 1 1

A 5 2

A 7 3

B 1 1

B 10 2

B 20 3

How can I do it in DSS (other than R, Python or SQL)?

2 Replies

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Re: Generate sequence, restart at 1 each time a given column changes

Hi Simon,

Unfortunately, this is not a feature that is builtin in Dataiku (yet), ie not something that you can do without coding.

This is definitely something we are considering currently, and it might become available soon.

Especially if your dataset is large and/or unsorted, the best way to do that would indeed be to use SQL partitioning. A SQL query recipe with something like:

SELECT category, numberdata, RANK() OVER (PARTITION BY category ORDER BY numberdata ASC);

would do the trick.

If your dataset is already ordered (for example, it's a file, only one file), you can also use the visual data preparation with a custom Python processor. Something like:

current_category = None

current_rank = 0

# Modify the process function to fit your needs

def process(row):

global current_category, current_rank

if current_category is None or row["category"] != current_category:

# New category seen

current_rank = 1

current_category =row["category"]

else:

current_rank += 1

row["rank"] = current_rank

Hope this helps,

- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Email to a Friend
- Report Inappropriate Content

Re: Generate sequence, restart at 1 each time a given column changes

DT.dt[, seq_var := sequence(.N), by = "A"])

To SQL I would also add that I should more probably use DENSE_RANK or even ROW_NUMBER if the SQL supports its.