Is there a way to count specific types of characters in a text cell?

UserBird Dataiker, Alpha Tester Posts: 535 Dataiker
I am trying to enrich a dataset containing product names and descriptions, and for certain columns I would like to extract the number of words, the number of capitalized and non-capitalized letters, and the number of digits.

Is there any way to do this easily?

Best Answer

  • Thomas
    Thomas Dataiker Alumni Posts: 19 ✭✭✭✭✭
    Answer ✓

    Hi Vincent,

    One way to do this is to use a Custom Python Script in Analyze, which lets you implement your own logic. For example, to count specific kinds of characters in the 'name' column, you could do the following:

    import json

    def process(row):

        # Initialize counters
        _uppers = 0
        _lowers = 0
        _commas = 0
        _digits = 0

        for character in row['name']:
            if character.isupper():  # check for uppercase values
                _uppers += 1
            if character.islower():  # check for lowercase values
                _lowers += 1
            if character == ',':  # check for commas
                _commas += 1
            if character.isdigit():  # check for numbers
                _digits += 1

        return json.dumps({
            'count_uppercase_values': _uppers,
            'count_lowercase_values': _lowers,
            'count_commas': _commas,
            'count_digits': _digits,
        })
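The question also asked about word counts, which the script above does not cover. A minimal sketch of one way to add them, assuming that a "word" is any whitespace-separated token and that a "capitalized word" is one whose first character is uppercase. The helper name count_features and the two extra output keys are hypothetical, and treating an empty cell as an empty string is also an assumption:

```python
import json


def count_features(text):
    """Return word and character counts for one string (None is treated as empty)."""
    text = text or ""
    words = text.split()  # assumption: words are whitespace-separated tokens
    return {
        "count_words": len(words),
        "count_capitalized_words": sum(1 for w in words if w[0].isupper()),
        "count_uppercase_values": sum(1 for c in text if c.isupper()),
        "count_lowercase_values": sum(1 for c in text if c.islower()),
        "count_digits": sum(1 for c in text if c.isdigit()),
    }


def process(row):
    # 'name' is the column used in the script above
    return json.dumps(count_features(row["name"]))
```

Keeping the counting in a separate helper makes it possible to try the logic on a few sample strings in a plain Python session before pasting it into the processor.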

    The nice thing is that you can output as many counts as you want and pass the result to a Flatten JSON processor to create one column per count.
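To illustrate what flattening means here, the following plain-Python sketch expands a JSON string stored in one column into one key per field. It is not Dataiku's implementation: the function flatten_cell, the 'counts' column name, the sample values, and the column_key naming pattern are all assumptions made for the example, and the processor's actual output column names may differ.

```python
import json


def flatten_cell(row, column, separator="_"):
    """Copy `row` and add one entry per field of the JSON object stored in `column`."""
    flattened = dict(row)
    for key, value in json.loads(row[column]).items():
        # Hypothetical naming scheme: <column><separator><field>
        flattened[f"{column}{separator}{key}"] = value
    return flattened


# Sample row as it might look after the Python step (values are made up)
row = {
    "name": "Blue Widget 2000",
    "counts": '{"count_uppercase_values": 2, "count_digits": 4}',
}
print(flatten_cell(row, "counts"))
```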
