topic Generate sequence, restart at 1 each time a given column changes in Using Dataiku DSS
https://community.dataiku.com/t5/UsingDataikuDSS/Generatesequencerestartat1eachtimeagivencolumnchanges/mp/323#M49
Hi,<BR /><BR />If I have data like this:<BR /><BR />A 1<BR />A 5<BR />A 7<BR />B 1<BR />B 10<BR />B 20<BR /><BR /> <BR /><BR />I want to generate a third column with a sequence:<BR /><BR />A 1 1<BR />A 5 2<BR />A 7 3<BR />B 1 1<BR />B 10 2<BR />B 20 3<BR /><BR />How can I do it in DSS (other than R, Python or SQL)?
Thu, 28 May 2015 16:01:50 GMT
UserBird
20150528T16:01:50Z

Re: Generate sequence, restart at 1 each time a given column changes
https://community.dataiku.com/t5/UsingDataikuDSS/Generatesequencerestartat1eachtimeagivencolumnchanges/mp/324#M50
<P>Hi Simon,<BR /><BR />Unfortunately, this is not built into Dataiku (yet), i.e. it is not something you can do without coding.<BR /><BR />We are definitely considering it, and it might become available soon.<BR /><BR />Especially if your dataset is large and/or unsorted, the best way to do this would indeed be SQL partitioning with a window function. A SQL query recipe with something like:<BR /><BR /> </P><BR /><BR /><PRE><BR />-- input_dataset is a placeholder for your table name<BR />SELECT category, numberdata,<BR />       RANK() OVER (PARTITION BY category ORDER BY numberdata ASC) AS seq<BR />FROM input_dataset;<BR /></PRE><BR /><BR /><P><BR /><BR />would do the trick.<BR /><BR />If your dataset is already ordered (for example, it is a single file), you can also use the visual data preparation with a custom Python processor. Something like:<BR /><BR /> </P><BR /><BR /><PRE><BR />current_category = None<BR />current_rank = 0<BR /><BR /># Modify the process function to fit your needs<BR />def process(row):<BR />    global current_category, current_rank<BR /><BR />    if current_category is None or row["category"] != current_category:<BR />        # New category seen: restart the sequence at 1<BR />        current_rank = 1<BR />        current_category = row["category"]<BR />    else:<BR />        # Same category as the previous row: keep counting<BR />        current_rank += 1<BR /><BR />    row["rank"] = current_rank<BR />    return row<BR /></PRE><BR /><BR /><P><BR /><BR />Hope this helps,</P>
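For anyone who wants to try the running-counter idea outside DSS, here is a minimal standalone sketch in plain Python. It is not the DSS processor API: the helper name add_sequence and the list-of-dicts input are assumptions made for illustration, and the rows are the sample data from the question. Like the processor above, it restarts the count whenever the key changes from one row to the next, so it only gives the expected result on input that is already ordered by category.

```python
from itertools import groupby
from operator import itemgetter


def add_sequence(rows, key="category", out="rank"):
    """Number rows 1..n within each run of consecutive equal `key` values.

    `add_sequence` is a hypothetical helper, not part of DSS.
    """
    numbered = []
    # groupby only groups *consecutive* rows, so the input must be sorted by key
    for _, group in groupby(rows, key=itemgetter(key)):
        for position, row in enumerate(group, start=1):
            numbered.append({**row, out: position})
    return numbered


# Sample data from the original question
rows = [
    {"category": "A", "numberdata": 1},
    {"category": "A", "numberdata": 5},
    {"category": "A", "numberdata": 7},
    {"category": "B", "numberdata": 1},
    {"category": "B", "numberdata": 10},
    {"category": "B", "numberdata": 20},
]

result = add_sequence(rows)
for row in result:
    print(row["category"], row["numberdata"], row["rank"])
```

If the input is not sorted, sorting it first by (category, numberdata) before calling the helper gives the same numbering as the SQL window approach.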
Thu, 28 May 2015 16:44:57 GMT
Clément_Stenac
20150528T16:44:57Z

Re: Generate sequence, restart at 1 each time a given column changes
https://community.dataiku.com/t5/UsingDataikuDSS/Generatesequencerestartat1eachtimeagivencolumnchanges/mp/325#M51
Thanks. I would add to this using sequence from R (using data.table as an example):<BR />DT.dt[, seq_var := sequence(.N), by = "A"]<BR /><BR />For the SQL approach, I would add that I should probably use DENSE_RANK, or even ROW_NUMBER, if the SQL dialect supports it.
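To make the difference between the three SQL functions concrete, here is a small pure-Python sketch that mimics their numbering within a single partition. The function name number_partition and the sample values (with a deliberate tie on 5) are assumptions for illustration only. It reflects the standard SQL semantics: ROW_NUMBER always increments, RANK gives tied values the same number and then skips ahead, and DENSE_RANK gives tied values the same number without skipping.

```python
def number_partition(values):
    """Return (value, row_number, rank, dense_rank) for one partition.

    `number_partition` is a hypothetical helper that imitates the SQL
    window functions with ORDER BY value ASC.
    """
    out = []
    rank = 0
    dense_rank = 0
    previous = object()  # sentinel that never equals a real value
    for row_number, value in enumerate(sorted(values), start=1):
        if value != previous:
            rank = row_number      # RANK jumps to the current row position
            dense_rank += 1        # DENSE_RANK advances by exactly one
            previous = value
        out.append((value, row_number, rank, dense_rank))
    return out


numbered = number_partition([1, 5, 5, 7])
for value, row_number, rank, dense_rank in numbered:
    print(value, row_number, rank, dense_rank)
```

With no ties in the ordering column, as in the sample data from the question, all three give the same 1, 2, 3 sequence. When ties are possible, only ROW_NUMBER guarantees a gap-free sequence with no repeated numbers.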
Thu, 28 May 2015 17:51:37 GMT
Simon
20150528T17:51:37Z