Transform String processor in a Prepare recipe not removing leading whitespace

Solved!
silviutofan
Dataiker
I'm getting some unusual behaviour from a "Transform String" processor in a Prepare recipe (shaker). When applied to a string column, it does not remove leading/trailing whitespace. When I encode the text to UTF-8 (via the shaker), I can see that a non-breaking space is encoded there.

Is this behaviour intended? How can we work around it?
silviutofan
Dataiker
Author

This is the expected behavior: the Trim processor uses the classic definition of whitespace, not the extended Unicode definition (which covers the many different characters that can represent a space, such as the non-breaking space).

To remove it, you have to use a Find/Replace with a regular expression.

Replace:

\p{Zs}*$

By nothing

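This means "any number of characters of the Zs (space separator) Unicode character class immediately followed by the end of the string ($)". That pattern only handles trailing whitespace; to strip leading whitespace as well, add a second replacement of ^\p{Zs}* (anchored at the start of the string) by nothing. To learn more about Unicode character classes, see https://en.wikipedia.org/wiki/Unicode_character_property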
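
For anyone who wants to check the behaviour outside DSS, here is a minimal Python sketch of the same idea. It is not how the processor is implemented. `classic_trim` and `strip_zs` are hypothetical helper names, and "classic" trimming is assumed here to mean ASCII whitespace only. The standard `re` module does not, as far as I know, support `\p{Zs}`, so the sketch checks the Unicode category with `unicodedata` instead.

```python
import unicodedata

NBSP = "\u00a0"  # NO-BREAK SPACE, Unicode general category Zs


def classic_trim(text: str) -> str:
    # Assumption: a "classic" trim only knows about ASCII whitespace.
    return text.strip(" \t\n\r\f\v")


def strip_zs(text: str) -> str:
    # Drop leading and trailing characters whose Unicode category is Zs,
    # i.e. the effect of replacing ^\p{Zs}* and \p{Zs}*$ with nothing.
    start, end = 0, len(text)
    while start < end and unicodedata.category(text[start]) == "Zs":
        start += 1
    while end > start and unicodedata.category(text[end - 1]) == "Zs":
        end -= 1
    return text[start:end]


value = NBSP + "hello world" + NBSP

print(value.encode("utf-8"))      # the NBSP appears as the bytes C2 A0
print(repr(classic_trim(value)))  # NBSP survives the ASCII-only trim
print(repr(strip_zs(value)))      # NBSP removed at both ends
```

The space inside "hello world" is left alone because only the two ends of the string are examined, which mirrors the ^ and $ anchors in the regular expressions.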
