Normalize text without lowercase

Dataiker
The "Simplify text" processor (and a handful of others) has a "Normalize text" option that "transforms to lowercase, removes accents and performs Unicode normalization (Café -> cafe)." Has anyone figured out a way to remove accents and perform Unicode normalization without changing the case? For example: Café -> Cafe
Dataiker
Hi,

This is not possible via the Simplify text processor. You could do it with a custom Python processor.
Dataiker
Hi,
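You can do a character-by-character replace in the Python processor with these two strings, where each character in spec is replaced by the character at the same position in norm: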
spec = u"²ÀÁÂÃÄÅàáâãäåĀāĂăĄąÇçĆćĈĉĊċČčÐðĎďĐđÈÉÊËèéêëĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħÌÍÎÏìíîïĨĩĪīĬĭĮįİıĴĵĶķĸĹĺĻļĽľĿŀŁłÑñŃńŅņŇňŉŊŋÒÓÔÕÖØòóôõöøŌōŎŏŐőŔŕŖŗŘřŚśŜŝŞşŠšſŢţŤťŦŧÙÚÛÜùúûüŨũŪūŬŭŮůŰűŲųŴŵÝýÿŶŷŸŹźŻżŽž"
norm = u"2AAAAAAaaaaaaAaAaAaCcCcCcCcCcDdDdDdEEEEeeeeEeEeEeEeEeGgGgGgGgHhHhIIIIiiiiIiIiIiIiIiJjKkkLlLlLlLlLlNnNnNnNnnNnOOOOOOooooooOoOoOoRrRrRrSsSsSsSssTtTtTtUUUUuuuuUuUuUuUuUuUuWwYyyYyYZzZzZz"
Mattsco
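For anyone wondering how a pair of strings like spec and norm could actually be applied, here is a minimal sketch. It builds a translation table from the two strings and applies it with translate(), which leaves case untouched. The strings below are a short excerpt for illustration only; the full strings would be substituted. The process(row) wrapper and the column name "name" are assumptions about how the processor is configured, not something confirmed in this thread.

```python
# Short illustrative excerpt of the two strings; substitute the full ones.
spec = u"ÀÁÂÃÄÅàáâãäåÇçÈÉÊËèéêë"
norm = u"AAAAAAaaaaaaCcEEEEeeee"

# The mapping is positional, so a length mismatch (for example after a
# copy-paste that mangled a character) would silently shift every later
# replacement. Fail early instead.
assert len(spec) == len(norm), "spec and norm must have the same length"

# Map each code point in spec to the replacement character in norm.
table = dict(zip(map(ord, spec), norm))


def replace_accents(text):
    """Replace accented characters using the table; case is preserved."""
    return text.translate(table)


def process(row):
    # Hypothetical wrapper: assumes the processor passes each row as a
    # dict-like object and that the text lives in a column called "name".
    return replace_accents(row["name"])
```

With this excerpt, replace_accents(u"Café") gives u"Cafe". Characters that are not in spec pass through unchanged.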
Dataiker
Author
Is a "custom Python processor" different from just the "Python function" processor? I couldn't get the latter to work with the unicodedata module as described here: https://stackoverflow.com/a/16467505/612166. I suspect that's a limitation of the Jython executor?

What normalization form does DSS use, anyhow?
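For reference, this is roughly what the unicodedata approach looks like in plain CPython, as a sketch only. It decomposes the text with NFKD and then drops the combining marks, so case is preserved. It is not copied from the linked answer, NFKD is my choice of form here rather than anything DSS is known to use, and whether it runs inside the Python function processor is exactly the open question above.

```python
import unicodedata


def strip_accents(text):
    """Remove accents without changing case.

    NFKD splits a character such as "é" into a base letter "e" plus a
    combining accent; the combining marks are then filtered out.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_accents(u"Café"))  # Cafe
```

One caveat: letters such as Ø, Ł and Đ have no Unicode decomposition, so this function leaves them as they are. A lookup table is still needed if those should become O, L and D.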