We are working on a proof of concept (POC) in which we want to extract specific information from sentences in our text data. For example, we need to get the number of hours worked by each employee:
Person 1 says: Hi, I have worked for 40 hrs last week.
Person 2 says: Hi, I was on leave for 2 days and so I have worked for 24 hours.
From this text input I would like to get "40 hrs" and "24 hours" as output, so that I can aggregate the total number of hours they worked.
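A minimal sketch of the simplest case, assuming the hours always appear as a number followed by "hr", "hrs", "hour" or "hours": a regular expression from Python's standard `re` module pulls those numbers out of each message, and the results are summed. The names `HOURS_PATTERN` and `extract_hours` are hypothetical, chosen for illustration.

```python
import re

# Hypothetical pattern: a number (optionally with decimals) followed by an hour unit.
HOURS_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(?:hrs?|hours?)\b", re.IGNORECASE)


def extract_hours(text):
    """Return every number in `text` that is immediately followed by an hour unit."""
    return [float(value) for value in HOURS_PATTERN.findall(text)]


messages = [
    "Hi, I have worked for 40 hrs last week.",
    "Hi, I was on leave for 2 days and so I have worked for 24 hours.",
]

per_person = [extract_hours(m) for m in messages]
total_hours = sum(sum(hours) for hours in per_person)

print(per_person)   # [[40.0], [24.0]]
print(total_hours)  # 64.0
```

This only works while people stick to "number + hour unit"; any other phrasing is silently missed.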
Can you give us an idea of how to extract this exact content regardless of how the sentence is worded, and also let us know whether we can achieve this with NLP plugins or whether there is another way?
DSS has a number extractor that is part of the visual recipes.
However, your example seems to need more than just number extraction: you need some understanding of time units (days, hours, ...) and possibly even some understanding of language.
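To illustrate why plain number extraction falls short, here is a small sketch (standard-library `re` only; all names are hypothetical) that compares pulling out every number with capturing each number together with the time unit that follows it. It is not how the DSS number extractor works internally, just an analogous regex.

```python
import re

# Naive approach: every number in the text, regardless of what it refers to.
NUMBER_PATTERN = re.compile(r"\d+(?:\.\d+)?")

# Unit-aware approach: a number plus the time unit right after it.
DURATION_PATTERN = re.compile(
    r"(\d+(?:\.\d+)?)\s*(minutes?|mins?|hours?|hrs?|days?|weeks?)\b",
    re.IGNORECASE,
)

# Map every spelling the pattern accepts onto one canonical unit name.
UNIT_ALIASES = {
    "minute": "minute", "minutes": "minute", "min": "minute", "mins": "minute",
    "hour": "hour", "hours": "hour", "hr": "hour", "hrs": "hour",
    "day": "day", "days": "day",
    "week": "week", "weeks": "week",
}


def extract_numbers(text):
    return [float(n) for n in NUMBER_PATTERN.findall(text)]


def extract_durations(text):
    """Return (value, unit) pairs with the unit normalised to a singular, lowercase name."""
    return [
        (float(value), UNIT_ALIASES[unit.lower()])
        for value, unit in DURATION_PATTERN.findall(text)
    ]


text = "Hi, I was on leave for 2 days and so I have worked for 24 hours."
print(extract_numbers(text))    # [2.0, 24.0]
print(extract_durations(text))  # [(2.0, 'day'), (24.0, 'hour')]
```

Knowing the unit tells you that "2" is days and "24" is hours, but not which of them is time worked; that still needs context.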
I'm wondering whether there is a Python library or R package designed to extract time values from free text.
I found datefinder on GitHub for dates, and the article "2 PACKAGES FOR EXTRACTING DATES FROM A STRING OF TEXT IN PYTHON" looks interesting. Dataiku can use code recipes to integrate snippets of Python and R code into flows.
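As a rough sketch of what such a Python code recipe could do, assuming the messages arrive as a pandas DataFrame with hypothetical `person` and `text` columns: apply a regex-based extractor to each row, then aggregate per person. The DSS dataset read and write steps are left out; a small in-memory DataFrame stands in for the recipe's input.

```python
import re

import pandas as pd

# Hypothetical pattern: a number followed by "hr", "hrs", "hour" or "hours".
HOURS_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(?:hrs?|hours?)\b", re.IGNORECASE)


def hours_in(text):
    """Sum every '<number> hr(s)/hour(s)' mention in one message; 0.0 if there is none."""
    return sum((float(v) for v in HOURS_PATTERN.findall(text)), 0.0)


# Stand-in for the recipe's input dataset (column names are assumptions).
df = pd.DataFrame(
    {
        "person": ["Person 1", "Person 2"],
        "text": [
            "Hi, I have worked for 40 hrs last week.",
            "Hi, I was on leave for 2 days and so I have worked for 24 hours.",
        ],
    }
)

# One extracted value per row, then a per-person total and an overall total.
df["hours_worked"] = df["text"].apply(hours_in)
totals = df.groupby("person")["hours_worked"].sum()

print(totals)
print(df["hours_worked"].sum())  # 64.0
```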
However, date extraction may not be the right fit for your use case. You may actually want a time finder and, given your second example, some NLP understanding.
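A crude stand-in for that "understanding" is a hand-written rule that only counts hours tied to the verb "worked", so durations describing leave are skipped. This is a regex heuristic, not real NLP; the last two messages below are made-up inputs to show where it helps and where it breaks.

```python
import re

# Heuristic "time finder": keep an hour count only when it directly follows
# "worked" (optionally "worked for").
WORKED_PATTERN = re.compile(
    r"\bworked\s+(?:for\s+)?(\d+(?:\.\d+)?)\s*(?:hrs?|hours?)\b",
    re.IGNORECASE,
)


def hours_worked(text):
    """Return the hour counts tied to 'worked' in `text`, or [] when the phrasing differs."""
    return [float(v) for v in WORKED_PATTERN.findall(text)]


messages = [
    "Hi, I have worked for 40 hrs last week.",
    "Hi, I was on leave for 2 days and so I have worked for 24 hours.",
    "I was on leave for 16 hours and worked 24 hours.",  # leave hours are ignored
    "I put in 30 hours this week.",  # different phrasing: the rule misses it
]

for m in messages:
    print(hours_worked(m))
# [40.0]
# [24.0]
# [24.0]
# []
```

The last case is exactly where a rule-based approach runs out and a proper NLP tool would be needed.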
I'm wondering whether any NLP folks here know enough about spaCy and other ML tools to be able to comment.