Normalize text without lowercase

UserBird
Dataiker
The "Simplify text" processor (and a handful of others) has a "Normalize text" option that "transforms to lowercase, removes accents and performs Unicode normalization (Café -> cafe)." Has anyone figured out a way to remove accents and perform Unicode normalization without changing the case? Café -> Cafe
Clément_Stenac
Hi,

This is not possible with the Simplify text processor. You could do it with a custom Python processor.
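
A minimal sketch of what such a Python function could look like, assuming the interpreter that runs the processor provides the standard `unicodedata` module (this is not confirmed for every DSS executor). The function name `strip_accents` is hypothetical, and wiring it into the processor's entry point is left out:

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_accents(text):
    """Remove accents from a unicode string without changing its case."""
    # NFKD splits each accented character into a base letter
    # followed by one or more combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep every character that is not a combining mark.
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents(u"Café"))
```

Letters that have no Unicode decomposition, such as Ł or Ø, pass through unchanged with this approach, and NFKD also folds compatibility characters (for example ² becomes 2).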
UserBird
Dataiker
Author
Is a "custom Python processor" different from the "Python function" processor? I couldn't get the latter to work with the unicodedata module as described here: https://stackoverflow.com/a/16467505/612166. I suspect that's a limitation of the Jython executor?

What normalization form does DSS use, anyhow?
Mattsco
Dataiker
Hi,
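You can do a character-by-character replacement in the Python processor with these two tables, where each character in spec is replaced by the character at the same position in norm: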
spec = u"²ÀÁÂÃÄÅàáâãäåĀāĂăĄąÇçĆćĈĉĊċČčÐðĎďĐđÈÉÊËèéêëĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħÌÍÎÏìíîïĨĩĪīĬĭĮįİıĴĵĶķĸĹĺĻļĽľĿŀŁłÑñŃńŅņŇňŉŊŋÒÓÔÕÖØòóôõöøŌōŎŏŐőŔŕŖŗŘřŚśŜŝŞşŠšſŢţŤťŦŧÙÚÛÜùúûüŨũŪūŬŭŮůŰűŲųŴŵÝýÿŶŷŸŹźŻżŽž"
norm = u"2AAAAAAaaaaaaAaAaAaCcCcCcCcCcDdDdDdEEEEeeeeEeEeEeEeEeGgGgGgGgHhHhIIIIiiiiIiIiIiIiIiJjKkkLlLlLlLlLlNnNnNnNnnNnOOOOOOooooooOoOoOoRrRrRrSsSsSsSssTtTtTtUUUUuuuuUuUuUuUuUuUuWwYyyYyYZzZzZz"
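
One way the two tables could be applied is sketched below. It uses a shortened excerpt of `spec` and `norm` so that the block runs on its own; the helper name `replace_accents` is hypothetical and not part of the original reply. The tables are turned into a code point mapping and passed to `translate`, which leaves every unlisted character, and therefore the case, untouched:

```python
# -*- coding: utf-8 -*-
# Shortened excerpt of the two tables (illustration only);
# the full strings can be substituted as long as they have equal length.
spec = u"ÀÁÂàáâÇçÈÉèéÑñ"
norm = u"AAAaaaCcEEeeNn"

# Each character in spec must line up with its replacement in norm.
assert len(spec) == len(norm)

# Map the code point of each accented character to its plain replacement.
table = dict(zip(map(ord, spec), norm))


def replace_accents(text):
    """Replace listed accented characters; leave everything else as is."""
    return text.translate(table)


print(replace_accents(u"Café"))
```

Because the mapping is explicit, it does not depend on `unicodedata` and can cover letters such as Ł or Ø that Unicode decomposition leaves alone.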