Text extraction with plugins

pranauv
Level 1

Hi,

We are working on a proof of concept (POC) in which we want to extract specific information from sentences in our text data. For example, we need to get the number of hours worked by each employee.
For example:
Person 1 says: Hi, I have worked for 40 hrs last week.
Person 2 says: Hi, I was on leave for 2 days and so I have worked for 24 hours.

From this text, I would like to get "40 hrs" and "24 hours" as output so that I can aggregate the total number of hours worked.

Can you give us an idea of how to extract the exact value regardless of how the sentence is phrased? Also, can we achieve this with the NLP plugins, or is there another way?

tgb417
Neuron

@pranauv,

DSS has a number extractor that is part of its visual recipes.

[Screenshot: Visual recipe number extractor]
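For comparison, here is a rough Python sketch of plain number extraction using a regular expression (an illustration only, not the DSS processor itself). On the second sentence it returns the 2 from "2 days" alongside the 24, because it has no notion of what each number measures.

```python
import re

# Matches integers and simple decimals such as "40" or "7.5".
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def extract_numbers(text):
    """Return every number in the text, regardless of what it measures."""
    return [float(m) for m in NUMBER.findall(text)]

sentences = [
    "Hi, I have worked for 40 hrs last week.",
    "Hi, I was on leave for 2 days and so I have worked for 24 hours.",
]

for s in sentences:
    print(extract_numbers(s))
# [40.0]
# [2.0, 24.0]
```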

However, your example appears to need more than number extraction: some understanding of time units (days, hours), or even some understanding of the language itself.
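One way to add some unit awareness without NLP is a regular expression that only accepts a number followed by an hour unit. This is a minimal sketch; the unit spellings it accepts (`hours`, `hrs`, `h` and their singulars) are assumptions and would need extending for real data.

```python
import re

# A number followed by an hour unit: "40 hrs", "24 hours", "8h", "1 hr".
# The accepted spellings are an assumption; extend them for your data.
HOURS = re.compile(r"(\d+(?:\.\d+)?)\s*(?:hours?|hrs?|h)\b", re.IGNORECASE)

def extract_hours(text):
    """Return only the numbers that carry an hour unit."""
    return [float(value) for value in HOURS.findall(text)]

sentences = [
    "Hi, I have worked for 40 hrs last week.",
    "Hi, I was on leave for 2 days and so I have worked for 24 hours.",
]

per_person = [extract_hours(s) for s in sentences]
total = sum(sum(values) for values in per_person)
print(per_person)  # [[40.0], [24.0]]
print(total)       # 64.0
```

Because "2 days" has no hour unit, it is skipped. The pattern would still miscount a sentence that states the leave itself in hours.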

I'm wondering whether there is a Python library or R package designed to extract time values from free text.

I found datefinder on GitHub for dates, and the article "2 PACKAGES FOR EXTRACTING DATES FROM A STRING OF TEXT IN PYTHON" looks interesting. Dataiku code recipes let you integrate snippets of Python and R code into your flows.
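Since a Python code recipe runs ordinary Python, the extraction logic itself can be a plain function. The sketch below leaves out how the recipe reads and writes its datasets. It uses a small, hypothetical unit table so that minutes are converted to hours as well; the column names `person` and `comment`, and the third sentence, are made up for illustration.

```python
import re

# Hypothetical table of unit spellings, expressed in minutes so the
# conversion to hours is a single division by 60. Days are left out on
# purpose: turning "2 days" into hours needs an assumed workday length.
UNIT_TO_MINUTES = {
    "hours": 60, "hour": 60, "hrs": 60, "hr": 60, "h": 60,
    "minutes": 1, "minute": 1, "mins": 1, "min": 1,
}

# Longest spellings first so the alternation prefers "hours" over "h".
UNITS = "|".join(sorted(UNIT_TO_MINUTES, key=len, reverse=True))
DURATION = re.compile(rf"(\d+(?:\.\d+)?)\s*({UNITS})\b", re.IGNORECASE)

def hours_in(text):
    """Add up every duration in the text, converted to hours."""
    minutes = sum(float(value) * UNIT_TO_MINUTES[unit.lower()]
                  for value, unit in DURATION.findall(text))
    return minutes / 60

# Rows as a code recipe might process them (hypothetical columns).
rows = [
    {"person": "Person 1",
     "comment": "Hi, I have worked for 40 hrs last week."},
    {"person": "Person 2",
     "comment": "Hi, I was on leave for 2 days and so I have worked for 24 hours."},
    {"person": "Person 3",  # made-up example
     "comment": "I worked 7 hours and 30 minutes on Monday."},
]

for row in rows:
    row["hours_worked"] = hours_in(row["comment"])
    print(f'{row["person"]}: {row["hours_worked"]}')
```

Keeping the unit table as data rather than baking spellings into the regex makes it easy to add variants found in the real text without touching the matching logic.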

However, date extraction may not fit your use case. You may actually want a time-duration finder and, given your second example, some NLP understanding.
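As a crude stand-in for that language understanding, a rule-based heuristic can count an hour amount only when the verb "worked" precedes it in the same clause. This is not NLP; the clause boundaries and the single verb "worked" are assumptions, and a proper NLP approach (for example with a library such as spaCy) would cope better with phrasings these rules do not anticipate. The third sentence is made up to show a case where plain unit matching would overcount.

```python
import re

# Assumed clause boundaries: commas, semicolons, sentence-ending periods
# (a "." not followed by a digit, so "7.5" stays intact), "and", "but".
CLAUSE_SPLIT = re.compile(r"[,;]|\.(?!\d)|\band\b|\bbut\b", re.IGNORECASE)

# "worked", then any non-digit text, then a number with an hour unit.
WORKED_HOURS = re.compile(
    r"\bworked\b\D*?(\d+(?:\.\d+)?)\s*(?:hours?|hrs?|h)\b", re.IGNORECASE
)

def hours_worked(text):
    """Sum hour amounts that follow "worked" within the same clause."""
    total = 0.0
    for clause in CLAUSE_SPLIT.split(text):
        # Each "worked" contributes at most one amount: the first number
        # after it, and only if that number carries an hour unit.
        for value in WORKED_HOURS.findall(clause):
            total += float(value)
    return total

sentences = [
    "Hi, I have worked for 40 hrs last week.",
    "Hi, I was on leave for 2 days and so I have worked for 24 hours.",
    "I was on leave for 16 hours but worked 24 hours.",  # made-up example
]

for s in sentences:
    print(hours_worked(s))
# 40.0
# 24.0
# 24.0
```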

I'm wondering whether any NLP folks who know tools such as spaCy and other ML libraries well enough could comment.

--Tom