Does "Simplify text" in the Prepare recipe have a bug?
Hello community,
Does the sorting option in the "Simplify text" step of the Prepare recipe have a bug? Sorting is not giving me the expected result.
I used this example:
live poultry ducks breeding ducklings
After applying the "stem words", "clear stop words", and "sort words alphabetically" options, I get this output:
breed duckl duck live poultri
But that is not the expected result. The expected result is:
breed duck duckl live poultri (duck should sort first, then duckl)
Can anyone help me understand how this sorting happens? Are "stem words", "clear stop words", and "sort words alphabetically" applied sequentially in that order? If yes, why is it not giving the expected result? Or is the data sorted first, with stemming and stop-word clearing happening afterwards?
Could anyone answer this as soon as possible?
Thanks in advance for investing time on this!!
Answers
Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer, Registered Posts: 317 Dataiker
Hi @jp1,
Thank you for reporting the sorting issue you found when using the "Simplify text" processor with "sort words alphabetically" enabled. I can indeed reproduce the issue you are reporting. I will pass this along to our engineering team for further investigation.
In the meantime, adding a second "Simplify text" processor step with only "sort words alphabetically" enabled seems to lead to the expected results, so I would suggest adding the processor twice for your use case.
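For anyone trying to reason about the behaviour, here is a minimal Python sketch of one possible explanation: the words are sorted before they are stemmed. This is only a hypothesis that happens to match the reported output, not a confirmed description of how the processor works internally. The `STEMS` lookup table and the function names are made up for illustration; the stems themselves are copied from the output in the question.

```python
# Stems copied from the output reported in the question. This lookup table
# is a stand-in for the real stemmer, which is not reproduced here.
STEMS = {
    "live": "live",
    "poultry": "poultri",
    "ducks": "duck",
    "breeding": "breed",
    "ducklings": "duckl",
}


def stem(word):
    """Return the stem from the lookup table, or the word unchanged."""
    return STEMS.get(word, word)


def sort_then_stem(text):
    """Hypothetical order: sort the original words, then stem each one."""
    return " ".join(stem(word) for word in sorted(text.split()))


def stem_then_sort(text):
    """Order the question expected: stem each word, then sort the stems."""
    return " ".join(sorted(stem(word) for word in text.split()))


text = "live poultry ducks breeding ducklings"
observed = sort_then_stem(text)
expected = stem_then_sort(text)
print(observed)  # breed duckl duck live poultri
print(expected)  # breed duck duckl live poultri
```

Under this assumption "ducklings" sorts before "ducks" (because "l" comes before "s"), and stemming afterwards leaves "duckl" ahead of "duck". A second sort pass over the already-stemmed words restores the expected order, which would be consistent with the two-step workaround.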
Thanks,
Sarina