Question Answering

deeplearnyogi
Level 2

However, my dataset includes the questions and answers and I want the model to learn the answer based on the question.

Do you have any advice on developing a model that takes the question as input and uses the answer as the target, even without a body of context?

4 Replies
Alex_Combessie
Dataiker Alumni

Hi,

There are many code libraries for NLP, and specifically for question answering. Any of them can be used in DSS, as long as they are written in Python, R or Scala. In fact, using shell recipes, you can call any executable installed on your server.

Among the libraries that can be useful, I would personally list:

- TensorFlow (https://tfhub.dev/s?module-type=text-question-answering&subtype=module,placeholder)

- PyTorch (https://pytorch.org/tutorials/beginner/deep_learning_nlp_tutorial.html)

- Hugging Face (https://huggingface.co/transformers/usage.html#extractive-question-answering)

This list is not exhaustive, as this is a very dynamic area of development in the research / open source community.

The general approach is to start with one of the many pretrained models for question answering and fine-tune it on your dataset.
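As a rough illustration for the case without a context passage, here is a minimal sketch of how question/answer pairs could be turned into input/target pairs before fine-tuning a text-to-text model. The `question: ` prefix, the `input_text` / `target_text` field names, the helper functions and the placeholder rows are all hypothetical; the actual format depends on the pretrained model you pick.

```python
import random


def build_examples(rows, prefix="question: "):
    """Turn (question, answer) rows into input/target dicts, skipping incomplete rows."""
    examples = []
    for question, answer in rows:
        question = (question or "").strip()
        answer = (answer or "").strip()
        if question and answer:
            # The question is the model input, the answer is the training target.
            examples.append({"input_text": prefix + question, "target_text": answer})
    return examples


def train_validation_split(examples, validation_fraction=0.2, seed=0):
    """Shuffle deterministically and hold out a fraction for validation."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Placeholder rows standing in for a real question/answer dataset.
rows = [(f"Question {i}?", f"Answer {i}") for i in range(1, 6)]
rows += [("", "answer without a question"), ("Question without an answer?", None)]

examples = build_examples(rows)
train, validation = train_validation_split(examples)
```

The resulting `train` and `validation` lists can then be tokenized and passed to whichever fine-tuning loop the chosen library provides.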

Hope it helps,

Alex

CoreyS
Dataiker Alumni

You may find this NLP resource helpful: the first Online Event, "NLP Used for Prediction".

Hopefully this helps as well!

deeplearnyogi
Level 2
Author

I downloaded the pre-trained BERT large uncased model into my TensorFlow environment and was able to generate answers from a question and a paragraph. However, my question is about fine-tuning BERT on my own datasets so that it can learn from them. How do I load a pre-trained model into DSS so I can integrate it with the datasets?

I tried to fine-tune with a dataset and got the following error:

ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[6,16,100,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node model/tf_bert_model_1/bert/encoder/layer_._8/attention/self/Softmax (defined at /home/vinhdiesal/anaconda3/lib/python3.7/site-packages/transformers/modeling_tf_bert.py:251) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Is this a memory problem with my GPU? I have an RTX 2080 Ti Founders Edition, which has 11 GB of RAM.
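For a rough sense of scale, the arithmetic below estimates the size of the tensor named in the error, assuming `float` means 4-byte float32. It also gives a back-of-the-envelope figure for the training state of BERT large, assuming about 340 million parameters and an Adam-style optimizer that keeps weights, gradients and two moment buffers in float32. Both assumptions are mine and are not confirmed by the log.

```python
from math import prod


def tensor_bytes(shape, bytes_per_element=4):
    """Size of a dense tensor in bytes (4 bytes per element assumes float32)."""
    return prod(shape) * bytes_per_element


# Shape reported in the OOM message: [6, 16, 100, 100].
attention_shape = (6, 16, 100, 100)
attention_mib = tensor_bytes(attention_shape) / 2**20

# Assumption: ~340M parameters, each with 4 float32 copies
# (weights, gradients, and two optimizer moment buffers).
BERT_LARGE_PARAMS = 340_000_000
training_state_gib = BERT_LARGE_PARAMS * 4 * 4 / 2**30

print(f"attention tensor: {attention_mib:.2f} MiB")
print(f"weights + gradients + optimizer state: {training_state_gib:.2f} GiB")
```

Under these assumptions the tensor that failed to allocate is only a few MiB, which suggests the GPU was already almost full when it was requested, and that the fixed training state alone takes a large share of the 11 GB before any activations are stored.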

Alex_Combessie
Dataiker Alumni

Hi,

Could you please detail where the error is happening? If this is a recipe, please attach a job diagnosis (https://doc.dataiku.com/dss/latest/troubleshooting/problems/job-fails.html).

If this is a Machine Learning visual analysis, please attach the training log (https://doc.dataiku.com/dss/latest/troubleshooting/problems/ml-train-fails.html).

Best regards,

Alex
