Zero-Shot Text Classification

Stojancho Tudjarski
Netcetera Tech Blog
3 min read · Aug 30, 2020


Image by Gerd Altmann from Pixabay

Introduction: The Stone-Age AI Era

In the beginning, in the AI stone-age era, you would take several thousand observations and train an old-style fully-connected neural network. And, if you were lucky, you would end up with something useful. That era of training our own neural networks from scratch ended a long time ago, measured in AI-speed time frames. Next step: transfer learning.

Transfer Learning

Several years ago, the general availability of Internet-connected machines and the cloud made it possible to gain access to massive amounts of data. This data can be used for training neural networks for various purposes. And that is exactly what happened. Over the last several years, the trend has been toward transfer learning: you take a general-purpose neural network trained on several hundred thousand or even millions of observations, then fine-tune it for a more specific task using only several hundred or thousand task-related observations.

Example: http://image-net.org/ and a neural network for image classification trained on 14,197,122 images.

Using an ImageNet neural network as a starting point and fine-tuning it with several tens of images per person, it was possible several years ago to train a face recognition network with an accuracy of approximately 75%. I know, not that wow. When I ran this network against my own image, it classified me correctly, with a top-ranked probability of 0.33. The second-ranked class was rhododendron, with a probability of 0.31. That means the network wasn't quite sure what it was talking about. But, OK, that was a long time ago.
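To make the idea concrete, here is a minimal sketch of this kind of fine-tuning, assuming PyTorch and torchvision; the number of persons is a hypothetical placeholder, not the exact setup described above.

import torch.nn as nn
from torchvision import models

num_persons = 10  # hypothetical: how many people we want to recognize

# Start from a network pre-trained on ImageNet's 14M+ images
model = models.resnet18(pretrained=True)

# Freeze the general-purpose feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final classifier with one sized for our task; only this
# small layer is trained with the few images we have per person
model.fc = nn.Linear(model.fc.in_features, num_persons)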

One-Shot Learning

The next logical step in the AI (r)evolution was one-shot learning. We take one well-trained neural network, use its already-gained knowledge, and fine-tune it with a single image per person. That opens the possibility of implementing a face recognition application that recognizes the faces of the persons whose images we used for fine-tuning, with an accuracy of almost 90%. That was something … OK, it made a few mistakes, but those were situations where we had an image of a person with a beard for training and then tried to recognize the same person without the beard. I would make the same mistake.

The keyword here is FaceNet.
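In the FaceNet approach, recognition boils down to comparing embeddings: a new face is matched to the single reference image whose embedding lies closest. Here is a minimal sketch of that idea; embed stands for any pretrained face-embedding network and, like the threshold value, is a hypothetical placeholder.

import numpy as np

def recognize(embed, references, query_image, threshold=0.8):
    # references: {person_name: reference_image} — one image per person
    query_vec = embed(query_image)
    best_name, best_dist = None, float('inf')
    for name, image in references.items():
        dist = np.linalg.norm(embed(image) - query_vec)
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject matches that are too far away in embedding space
    return best_name if best_dist < threshold else None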

Zero-Shot Learning

And finally, a few months ago, I read about zero-shot learning for the first time. Zero-shot learning is the approach where a neural network is forced to classify examples into classes it was never trained on.

That is possible in NLP thanks to last year's huge breakthrough: BERT. Simplified, it is a general-purpose language model trained on a massive amount of text and available pre-trained for various languages. Now, Hugging Face has made it possible to use this kind of model for text classification in a zero-shot way.

That works because such a well-trained, general-purpose language model is capable of connecting the texts that are subject to classification with classes described by one or a few words. That information is simply encoded in its weights (a LOT of decimal numbers).

from transformers import pipeline

classifier = pipeline('zero-shot-classification',
                      model='facebook/bart-large-mnli')

# Hypothetical example inputs: the text to classify and the candidate labels
sequence = 'One-shot learning made face recognition practical.'
labels = ['machine learning', 'politics', 'sports']
hypothesis_template = 'This text is about {}.'  # the template used in this demo

result = classifier(sequence, labels,
                    hypothesis_template=hypothesis_template,
                    multi_class=False)
# result is a dict: {'sequence': …, 'labels': …, 'scores': …}
print(result)
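Under the hood, the pipeline turns each candidate label into a natural language inference hypothesis using the template (e.g. "This text is about machine learning.") and scores how strongly the sequence entails it; the labels come back sorted by score. Setting multi_class=True scores each label independently, so several labels can score high at once.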

Enjoy playing with it. I did 🙂
