It would be very useful to know why a model classified a particular text into a particular class (or classes), e.g., that the model made this decision because of such-and-such tokens, words, or phrases. See the attachment.
The challenge would be to support this feature even when I use, e.g., Universal Sentence Encoder for feature extraction and an SVM for classification, where the classifier does not have access to the tokens seen during training.
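To make the request concrete, here is a minimal sketch of one model-agnostic way this could work: remove one token at a time, re-run the whole pipeline on the shortened text, and report how much the target-class probability drops. It only needs a function from raw text to class probabilities, so it does not matter that the classifier itself never sees tokens. The names `occlusion_importance` and `toy_predict_proba` are hypothetical; the toy scorer is just a stand-in so the example runs, and in practice it would be replaced by a wrapper around the sentence encoder and the SVM. Libraries such as LIME use a more refined version of the same perturbation idea.

```python
import math
from typing import Callable, List, Sequence, Tuple


def occlusion_importance(
    text: str,
    predict_proba: Callable[[str], Sequence[float]],
    target_class: int,
) -> List[Tuple[str, float]]:
    """Score each whitespace token by the drop in target-class
    probability when that token is removed from the text."""
    tokens = text.split()
    baseline = predict_proba(text)[target_class]
    scores = []
    for i, token in enumerate(tokens):
        # Rebuild the text without token i and query the black box again.
        reduced = " ".join(tokens[:i] + tokens[i + 1:])
        drop = baseline - predict_proba(reduced)[target_class]
        scores.append((token, drop))
    return scores


def toy_predict_proba(text: str) -> List[float]:
    """Hypothetical stand-in for 'sentence encoder + SVM'.
    Any function mapping a string to class probabilities would do."""
    positive_words = {"great", "excellent"}
    negative_words = {"terrible", "boring"}
    words = text.lower().split()
    score = sum(w in positive_words for w in words) - sum(
        w in negative_words for w in words
    )
    p_pos = 1.0 / (1.0 + math.exp(-score))
    return [1.0 - p_pos, p_pos]  # [P(negative), P(positive)]


if __name__ == "__main__":
    result = occlusion_importance(
        "the plot was great", toy_predict_proba, target_class=1
    )
    for token, drop in result:
        print(f"{token:>8s}  {drop:+.3f}")
```

A large positive drop means the token pushed the prediction toward the target class. This costs one extra prediction per token and ignores interactions between tokens, so it is a starting point rather than a full explanation method.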
