Extract text from HTML stored in a column
UserBird
How would one extract the text and strip all the HTML? parseHTML() just gives me the HTML back, and htmlText() gives me the HTML as text (without the brackets).
Answers
The object functions of the formula language have some more advanced capabilities: https://doc.dataiku.com/dss/latest/advanced/formula.html?highlight=parsehtml#object-functions
To do better than that you will need to use code. The easiest option is Python, where the BeautifulSoup package will help you.
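For anyone landing here, a minimal sketch of the BeautifulSoup route might look like the following. It assumes the bs4 package is installed; the function name strip_html and the sample rows are made up for illustration and are not part of any Dataiku API.

```python
from bs4 import BeautifulSoup


def strip_html(value):
    """Return the visible text of an HTML fragment, or "" for empty input."""
    if not value:
        return ""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(value, "html.parser")
    # Drop script and style elements, whose contents are not visible text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join text nodes with single spaces and trim whitespace around each one.
    return soup.get_text(separator=" ", strip=True)


# Hypothetical sample values standing in for the HTML column.
rows = ["<p>Hello <b>world</b></p>", "<div>Plain <i>text</i> only</div>", None]
cleaned = [strip_html(r) for r in rows]
print(cleaned)
```

In a Python recipe the same function could probably be applied to a pandas column with something like df["html_col"].apply(strip_html), where html_col is a placeholder for your own column name.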
htmlText(parseHtml(field to parse)) worked for me, where "field to parse" is the column holding the HTML.