
Extract text from html stored in column

UserBird
Dataiker
How would one extract the text and strip all the HTML? parseHTML() gives me just the HTML back, and htmlText() gives me the HTML as text (no brackets).
1 Reply
cperdigou
Dataiker
The object functions of the formula language have some more advanced capabilities; see https://doc.dataiku.com/dss/latest/advanced/formula.html?highlight=parsehtml#object-functions

To go further, you will need to use code. The easiest option is Python, where the BeautifulSoup package will help.
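As a rough illustration of the BeautifulSoup approach, here is a minimal sketch. The helper name `html_to_text` and the sample HTML are made up for this example, and it assumes the `beautifulsoup4` package is installed in the code environment.

```python
from bs4 import BeautifulSoup


def html_to_text(html):
    """Return the visible text of an HTML fragment, with tags removed.

    Hypothetical helper for illustration; not part of Dataiku or BeautifulSoup.
    """
    if not isinstance(html, str):
        # Empty cells (None, NaN) are returned as an empty string.
        return ""
    soup = BeautifulSoup(html, "html.parser")
    # Drop script and style elements, whose contents are not visible text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join text nodes with a space, then collapse runs of whitespace.
    return " ".join(soup.get_text(separator=" ").split())


sample = "<p>Hello <b>world</b></p><p>Second paragraph</p>"
print(html_to_text(sample))  # Hello world Second paragraph
```

In a Python recipe, a function like this could be applied to a pandas column, for example `df["text"] = df["html"].apply(html_to_text)`, where `html` and `text` are placeholder column names.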