Hi all, I need to mask some columns of personal data while retaining the first 2 and last 2 letters.
Please see my mock data
NAME | ADDRESS | EMAIL | CONTACT NO |
Emily Brown | 21 Annabelle Street New Jersey | emily.brown@gmail.com | |
Amit Balakrishnan Junior | Blk 999, Hougang Str 99 #01-221 Singapore 221999 | Amit_Bala@hotmail.sg | 91234567 |
Farhan Bin Musa | No 24, Jalan Segamat Selangor | F.musa@temp.com.my;farhanm@yahoo.com; | +60 11 12345678 |
Kit Ng | Blk 2 Tingkat 7 unit 02 Kawasan Perumahan Jalan Bukit Jalil Malaysia | kit-ng@src.com;ng_yee_long@gmail.com; | 020 700 11111 |
Alexander Bartholomew Desdemona | 2 Kitten St QLD | admin@sugar.com | +61400111222 |
Operating system used: ios
Hi,
To expand on the suggestions from @tgb417: you can indeed use a formula with a regex to achieve this.
To break it down: I concatenate the first 2 characters, then use a regex to replace every character in the middle, and finally append the last 2 characters, located using the string length, to get the final string.
The regex uses \w, so only word characters (letters, digits and underscores) are replaced. If you also want to replace spaces or punctuation, adapt the regex in the replace call. Here ip_address_country is the column name from my example; substitute your own column.
concat(slice(ip_address_country,0,2),replace(slice(ip_address_country,2,length(ip_address_country)-2),/\w/,"*"),slice(ip_address_country,(length(ip_address_country)-2),length(ip_address_country)))
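For readers who prefer to check the logic outside the formula editor, here is a minimal Python sketch of the same three-part idea (first 2 characters, regex-masked middle, last 2 characters). The function name mask_middle and the rule that strings of 4 characters or fewer are returned unchanged are assumptions for illustration; they are not part of the formula above.

```python
import re

def mask_middle(value: str, pattern: str = r"\w") -> str:
    """Keep the first 2 and last 2 characters; mask regex matches in between."""
    if len(value) <= 4:
        # Assumption: nothing sits between the first 2 and last 2 characters,
        # so short values are returned unchanged.
        return value
    middle = value[2:-2]
    # re.sub replaces every match of the pattern in the middle section.
    return value[:2] + re.sub(pattern, "*", middle) + value[-2:]

print(mask_middle("Emily Brown"))               # Em*** ***wn
print(mask_middle("Emily Brown", pattern=r"."))  # Em*******wn
```

With the default \w pattern the space survives, which mirrors the remark about adapting the regex; passing a pattern of . masks every character in the middle.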
Hope that helps.
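import dataiku
import pandas as pd

def mask_value(s):
    # Keep the first 2 and last 2 characters; values of 4 characters or fewer are left unchanged
    return s[:2] + "*" * (len(s) - 4) + s[-2:] if len(s) > 4 else s

def mask_emails(cell):
    # Pass missing values through untouched
    if pd.isna(cell):
        return cell
    # Mask each ';'-separated address on its own, skipping empty pieces left by a trailing ';'
    return ";".join(mask_value(s) for s in str(cell).split(";") if s)

# Read recipe inputs
DATAmask = dataiku.Dataset("DATAmask")
df = DATAmask.get_dataframe()

# Apply the mask to every row of the email column
df["email"] = df["email"].apply(mask_emails)

# Write recipe outputs
Maskdata = dataiku.Dataset("Maskdata")
Maskdata.write_with_schema(df)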
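For the third row of the mock data (F.musa@temp.com.my;farhanm@yahoo.com;), the output of this will be F.**************my;fa*************om (the trailing ";" is dropped).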
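Since the original question asks to mask several columns, not just the email one, here is a small pure-Python sketch of a helper that could be reused per column. The names mask_value and mask_cell and the optional sep parameter are hypothetical, chosen for illustration; non-string cells (such as missing values) are assumed to pass through unchanged.

```python
def mask_value(s: str) -> str:
    """Keep the first 2 and last 2 characters and star out everything between."""
    if len(s) <= 4:
        # Assumption: values this short have no middle to mask.
        return s
    return s[:2] + "*" * (len(s) - 4) + s[-2:]

def mask_cell(cell, sep=None):
    """Mask one cell; if sep is given, mask each separated piece on its own."""
    if not isinstance(cell, str):
        # None, NaN and other non-string values are returned as-is.
        return cell
    if sep is None:
        return mask_value(cell)
    # Empty pieces (for example from a trailing separator) are skipped.
    return sep.join(mask_value(p) for p in cell.split(sep) if p)

print(mask_cell("Kit Ng"))
print(mask_cell("kit-ng@src.com;ng_yee_long@gmail.com;", sep=";"))
```

In a pandas-based recipe this could be applied column by column, for example df["NAME"] = df["NAME"].apply(mask_cell) and df["email"] = df["email"].apply(mask_cell, sep=";"), since Series.apply forwards extra keyword arguments to the function.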
So there are two ways I might consider doing something like this.
Hi AlexT, I am extremely grateful and really appreciate your explanation.
Will try it out!!
Kindest regards
Aminmin
Hi Tom, thank you for taking the time to reply to me.
I will read through your suggestion and try it together with AlexT's input.
Appreciate your help
Kindest regards
Aminmin
Dear Catalina S, my apologies for the late reply and thank you for your kind guidance.
Regards
Aminmin