Does the Simplify text step in the Prepare recipe have a bug?

jp1 Dataiku DSS Core Designer, Dataiku DSS ML Practitioner, Dataiku DSS Adv Designer, Registered Posts: 8

Hello community,

Does the sorting option in the Simplify text step of the Prepare recipe have a bug? Sorting is not giving the expected result.
Take this example:

live poultry ducks breeding ducklings

After applying the Stem words, Clear stop words, and Sort words alphabetically options, I get this output:

breed duckl duck live poultri

but that is not the expected result. The expected result is:

breed duck duckl live poultri

("duck" should sort first, followed by "duckl".)

Can anyone explain how this sorting works? Are Stem words, Clear stop words, and Sort words alphabetically applied sequentially, in that order? If so, why is the result not as expected? Or is the data sorted first, with stemming and stop-word removal applied afterwards?
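One way to see what may be happening is to replay the example in both orders. The sketch below assumes, without confirmation from the thread, that the original words are sorted before they are stemmed. `TOY_STEMS` is a hypothetical lookup table that only copies the stems shown in this post; it is not a real stemmer. Stop-word removal is left out because the reported output keeps all five words, so it removed nothing here.

```python
# Hypothetical stand-in for a stemmer: maps each example word to the stem
# reported in the post. NOT a real stemming algorithm.
TOY_STEMS = {
    "live": "live",
    "poultry": "poultri",
    "ducks": "duck",
    "breeding": "breed",
    "ducklings": "duckl",
}


def stem(words):
    # Replace each word by its toy stem (unknown words pass through unchanged).
    return [TOY_STEMS.get(w, w) for w in words]


def sort_alpha(words):
    # Plain code-point sort, as Python's sorted() does.
    return sorted(words)


def simplify(text, steps):
    # Apply the given steps in order to the whitespace-split words.
    words = text.split()
    for step in steps:
        words = step(words)
    return " ".join(words)


text = "live poultry ducks breeding ducklings"

# Assumed order: sort the raw words, then stem them.
# "ducklings" < "ducks" because 'l' < 's', so "duckl" ends up before "duck".
print(simplify(text, [sort_alpha, stem]))  # breed duckl duck live poultri

# Stem first, then sort: "duck" is a prefix of "duckl", so it comes first.
print(simplify(text, [stem, sort_alpha]))  # breed duck duckl live poultri
```

Under this assumption, sorting before stemming reproduces the reported output exactly, while stemming before sorting gives the expected one.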

Could anyone answer this as soon as possible?

Thanks in advance for your time on this!

Answers

  • Sarina
    Sarina Dataiker, Dataiku DSS Core Designer, Dataiku DSS Adv Designer Posts: 315 Dataiker

    Hi @jp1,

    Thank you for reporting the sorting issue you found when using the "Simplify text" processor with "sort words alphabetically". I can indeed reproduce the issue you are reporting. I will pass this along to our engineering team for further investigation.

    In the meantime, adding a second "Simplify text" processor step with "sort words alphabetically" seems to give the expected results, so I would suggest adding the processor twice for your use case:

    (Attached screenshot, 2023-12-15: Prepare recipe with the "Simplify text" processor added twice)


    Thanks,
    Sarina 
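The workaround in the answer above works because a second sort runs on words that are already stemmed. A minimal sketch of that second pass, assuming it simply re-sorts whitespace-separated words; `sort_words` is a hypothetical helper, not a DSS function:

```python
def sort_words(text):
    # Split on whitespace and re-join the words in ascending code-point order.
    return " ".join(sorted(text.split()))


# Output of the first Simplify text step, as reported in the question.
first_pass = "breed duckl duck live poultri"

second_pass = sort_words(first_pass)
print(second_pass)  # breed duck duckl live poultri
```

Python's `sorted` compares by Unicode code point, so for text with uppercase or accented letters DSS's ordering may differ; for this lowercase example both agree with the expected result. Sorting is idempotent, so adding the second step cannot disturb text that was already correctly sorted.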
