Is there a way to count specific types of characters in a text cell?

UserBird Dataiker
I am trying to enrich a dataset containing product names and descriptions, and I would like to extract counts from certain columns: the number of words, uppercase letters, lowercase letters, and digits.

Is there any way to do this easily?

Accepted Solutions
Thomas Dataiker
Re: Is there a way to count specific types of characters in a text cell?

Hi Vincent, 



One way to do it is to use a Custom Python Script in Analyze, which lets you implement whatever counting logic you need. For example, to count specific kinds of characters in a string, you could do the following:




import json

def process(row):
    # Initialize counters
    _uppers = 0
    _lowers = 0
    _commas = 0
    _digits = 0

    # Inspect each character of the 'name' column
    for character in row['name']:
        if character.isupper():  # uppercase letter
            _uppers += 1
        if character.islower():  # lowercase letter
            _lowers += 1
        if character == ',':  # comma
            _commas += 1
        if character.isdigit():  # digit
            _digits += 1

    # Return all counts as a single JSON string
    return json.dumps({
        'count_uppercase_values': _uppers,
        'count_lowercase_values': _lowers,
        'count_commas': _commas,
        'count_digits': _digits,
    })


The cool thing is that you can output as many counts as you want and pass the resulting JSON to a Flatten JSON processor, which creates one column per count.
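The original question also asks for a word count, which the script above does not cover. A minimal sketch of one way to add it, assuming words are whitespace-delimited and that an empty cell may arrive as None or an empty string (both are assumptions, not confirmed behaviour). The helper name count_character_types and the count_words key are hypothetical; the 'name' column comes from the example above.

```python
import json


def count_character_types(text):
    """Count words, uppercase letters, lowercase letters, and digits in text."""
    # Assumption: a missing or empty cell counts as zero everywhere.
    text = text or ''
    return {
        # Assumption: a "word" is any whitespace-delimited token.
        'count_words': len(text.split()),
        'count_uppercase_values': sum(1 for c in text if c.isupper()),
        'count_lowercase_values': sum(1 for c in text if c.islower()),
        'count_digits': sum(1 for c in text if c.isdigit()),
    }


def process(row):
    # 'name' is the column used in the example above; change it to your column.
    return json.dumps(count_character_types(row['name']))
```

For instance, count_character_types("Blue Widget 42, XL") returns 4 words, 4 uppercase letters, 8 lowercase letters, and 2 digits. Note that punctuation attached to a token ("42,") stays part of that word under this definition, so a stricter tokenizer may be needed for descriptions with heavy punctuation.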
