
Question Answering

Level 2

However, my dataset includes only the questions and answers, and I want the model to learn the answer based on the question.

Do you have any advice on developing models that take the question as input and the answer as the target, even without a body of context?

4 Replies
Dataiker

Hi,

There are many code libraries for NLP, and for question answering specifically. Any of them can be used in DSS, as long as they are written in Python, R, or Scala. In fact, using shell recipes, you can call any executable installed on your server.

Among the libraries which can be useful, I would personally list:

- TensorFlow (https://tfhub.dev/s?module-type=text-question-answering&subtype=module,placeholder)

- PyTorch (https://pytorch.org/tutorials/beginner/deep_learning_nlp_tutorial.html)

- Hugging Face Transformers (https://huggingface.co/transformers/usage.html#extractive-question-answering)

This list is not exhaustive, as this is a very dynamic area of development in the research / open source community.

The general approach is to start with one of the many pretrained question-answering models and fine-tune it on your dataset.
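As an illustrative sketch of the data-preparation step before fine-tuning (the field names and file layout below are assumptions, not a schema required by any particular library), question/answer pairs can be serialized one JSON object per line, a format many fine-tuning scripts can consume:

```python
import json

# Toy (question, answer) pairs standing in for the real dataset (illustrative only).
qa_pairs = [
    ("What is DSS?", "Dataiku Data Science Studio."),
    ("Which languages can recipes use?", "Python, R, or Scala."),
]

def format_example(question, answer):
    """Build one training record; the keys are an illustrative
    convention, not a requirement of any specific library."""
    return {"question": question, "answer": answer}

def to_jsonl(pairs):
    """Serialize records as JSON lines: one object per line, which is a
    common input format for sequence-to-sequence fine-tuning scripts."""
    return "\n".join(json.dumps(format_example(q, a)) for q, a in pairs)

print(to_jsonl(qa_pairs))
```

Since there is no supporting paragraph, a generative (sequence-to-sequence) setup, where the question is the input text and the answer is the target text, is usually a better fit than extractive QA, which expects a context passage to point into.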

Hope it helps,

Alex

Community Manager

On the NLP side, you may also find our first Online Event helpful: NLP Used for Prediction.

Hopefully this helps as well!

Don't forget to mark as "Accepted Solution" when someone provides the correct answer to your question.
Level 2
Author

I was able to download a pre-trained BERT-large uncased model into the TensorFlow environment and generate answers from a question and a paragraph. My question, however, is about fine-tuning BERT on my own dataset so it can learn from it: how do I load a pre-trained model into DSS so I can integrate it with my datasets?

I tried to fine-tune it on a dataset and I get the following error:

ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted:  OOM when allocating tensor with shape[6,16,100,100] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node model/tf_bert_model_1/bert/encoder/layer_._8/attention/self/Softmax (defined at /home/vinhdiesal/anaconda3/lib/python3.7/site-packages/transformers/modeling_tf_bert.py:251) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Is this a memory problem with my GPU? I have an RTX 2080 Ti Founders Edition, which has 11 GB of RAM.
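For what it's worth, the shape in the OOM message above is an attention-score tensor (batch × heads × seq_len × seq_len), so its memory grows linearly with the batch size and quadratically with the sequence length. A rough back-of-the-envelope check (float32 assumed; this is a sanity calculation, not a definitive diagnosis):

```python
def attention_bytes(batch, heads, seq_len, dtype_bytes=4):
    """Size in bytes of one attention-score tensor of shape
    [batch, heads, seq_len, seq_len] (float32 by default)."""
    return batch * heads * seq_len * seq_len * dtype_bytes

# The tensor from the error log: shape [6, 16, 100, 100], float32.
one_tensor = attention_bytes(6, 16, 100)
print(one_tensor)  # 3840000 bytes, i.e. roughly 3.7 MB

# BERT-large has 24 such layers, and the forward + backward passes keep
# many of these tensors (plus much larger activation and optimizer
# buffers) alive at once, which is how even an 11 GB card can run out
# of memory during fine-tuning.
```

The usual first remedies are reducing the batch size (linear savings) or the maximum sequence length (quadratic savings on the attention scores).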

Dataiker

Hi,

Could you please detail where the error is happening? If this is a recipe, please attach a job diagnosis (https://doc.dataiku.com/dss/latest/troubleshooting/problems/job-fails.html).

If this is a Machine Learning visual analysis, please attach the training log (https://doc.dataiku.com/dss/latest/troubleshooting/problems/ml-train-fails.html).

Best regards,

Alex
