Interacting with a language model

This short video shows a user interacting with a Large Language Model (LLM), in this case the famous GPT-3. This LLM is a generative model: it can produce novel texts based on a prompt, i.e. a description of the requested task. Generative models are so good, especially in the fluidity and naturalness of their texts, that their output is generally not distinguishable from human-generated texts, at least by untrained evaluators (see here an interesting study on evaluation). Generative models are the BIG thing in artificial intelligence in 2022.

For this video I wrote a simple command-line interface that lets me access GPT-3 comfortably and interactively from my computer. Once this is possible, you can integrate it into larger applications. Anybody with a couple of hours of Python practice can do it. Or, for demo purposes, you can simply head to the GPT-3 webpage and use its easy-to-use UI.
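To give a rough idea of how little code this takes, here is a minimal sketch of such a command-line loop using the OpenAI Python library (the pre-1.0 interface available in 2022). The model name, the sampling parameters, and the assumption that an API key is stored in the OPENAI_API_KEY environment variable are illustrative choices, not a description of the exact tool used in the video.

```python
# Minimal command-line loop for querying GPT-3 via the OpenAI Python
# library (pre-1.0 interface). Assumes OPENAI_API_KEY is set in the
# environment; model name and parameters are illustrative.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def complete(prompt: str) -> str:
    """Send a prompt to GPT-3 and return the generated text."""
    response = openai.Completion.create(
        model="text-davinci-003",  # a GPT-3 completion model
        prompt=prompt,
        max_tokens=256,            # cap the length of the answer
        temperature=0.7,           # some variety, not too random
    )
    return response.choices[0].text.strip()

if __name__ == "__main__":
    while True:
        prompt = input("prompt> ")
        if not prompt:
            break
        print(complete(prompt))
```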

I asked the model to perform some advanced linguistic tasks. In particular:

  • write a short blog post about the role of translation in society

  • write 10 tweets about how to become a translator

  • translate a sentence between two languages

Given these amazing results, it is worth understanding what Large Language Models really are, how they work, and what their limitations are. Let's have a look.

The most interesting thing to know about language models is that they are not software programmed to do anything specific. Rather, they are a mathematical representation of language, learned by absorbing an incredible amount of text in one or more languages. "Mathematical representation" means that a language model is, in principle, nothing more than word statistics. It essentially works on the surface of the language, without knowing what those words really mean in the real world. This is a subtle but important distinction.
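To make "word statistics" concrete, here is a toy bigram model: it predicts each next word purely from counts of which word followed which in the text it has seen. The tiny corpus is made up for the example, and real LLMs use neural networks trained on billions of words rather than raw counts, but the surface-statistics principle is the same.

```python
# A toy illustration of "language as word statistics": a bigram model
# that generates text purely from co-occurrence counts, with no notion
# of what the words mean. The corpus is invented for the example.
import random
from collections import defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each other word.
counts = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def next_word(word: str) -> str:
    """Sample the next word in proportion to how often it followed `word`."""
    followers = counts[word]
    words, weights = zip(*followers.items())
    return random.choices(words, weights=weights)[0]

# Generate text: pure surface statistics, no grounding in the world.
word = "the"
print(word, end="")
for _ in range(8):
    word = next_word(word)
    print(" " + word, end="")
print()
```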

Large Language Models are complex systems, and complex systems can exhibit emergent behaviors, i.e. non-obvious new abilities that were not intentionally taught to the model. For example, many think that consciousness is an emergent phenomenon (an epiphenomenon) of the biological activities that take place in the human brain (i.e. the brain has not been 'programmed' to generate consciousness). Similarly, LLMs demonstrate capabilities beyond what you would expect given the data they were trained on. To be clear, while LLMs are fascinating, they are (for now) not conscious (read this article about a Google scientist who claims a bot is sentient). Yet they demonstrate skills that they were not directly trained to learn. That's remarkable.

Take the ability to translate: unlike traditional machine translation engines, which are specifically trained to perform this task using parallel data, i.e. texts and their translations, LLMs are not explicitly trained for this type of activity. Yet, if you ask them for a translation, they simply do it (see my video).

Even more interesting, LLMs don't expect clear commands to perform such tasks. Instead, they use what we call prompts. Prompts are not codified commands like you would expect for traditional software. They are rather tasks expressed in natural language, and there aren't even clear restrictions on how you can express them.
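To illustrate, here is a sketch of the same translation task phrased three different ways. The wording of the prompts and the model name are my own examples, using the same pre-1.0 OpenAI library as in the earlier sketch; any of the three phrasings will typically elicit a translation.

```python
# The same task, three different natural-language phrasings: there is
# no fixed syntax, the model infers the task from the prompt itself.
# Prompts and model name are illustrative.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

prompts = [
    "Translate into Italian: The weather is nice today.",
    "How do you say 'The weather is nice today' in Italian?",
    "English: The weather is nice today.\nItalian:",
]

for prompt in prompts:
    response = openai.Completion.create(
        model="text-davinci-003", prompt=prompt, max_tokens=60
    )
    print(response.choices[0].text.strip())
```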

While the results shown in the video are remarkable in terms of text quality and adequacy, there is no doubt that LLMs have many limitations. Since language models do not really know much about the real world, they can be factually incorrect and say things that don't make sense and are potentially harmful (see for example the infamous message Alexa said to a kid). They can exhibit issues like gender and racial bias, use of inappropriate language, etc. For these reasons, an 'unfiltered' use of such systems is neither possible nor recommended. Their output needs to be filtered by computational means (see for example how Cicero, Meta's latest AI system that achieves spectacular results in the game of Diplomacy, uses filters to avoid nonsensical dialogue, or how the latest ChatGPT answers follow-up questions, admits its mistakes, challenges incorrect premises, and rejects inappropriate requests) or by human-in-the-loop editing (see this article on the augmented copy editor).
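As a rough idea of what computational filtering can look like, here is a minimal sketch that checks a generation with OpenAI's Moderation endpoint (pre-1.0 library) before showing it to the user. The fallback message and the overall design are illustrative assumptions, not how Cicero or ChatGPT actually filter their output.

```python
# A minimal sketch of output filtering: ask a moderation model whether
# a generated text is safe before passing it on. Fallback text and
# design are illustrative, not a real production filter.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

def safe_output(generated_text: str) -> str:
    """Return the text only if the moderation model does not flag it."""
    result = openai.Moderation.create(input=generated_text)
    if result["results"][0]["flagged"]:
        return "[output withheld by content filter]"
    return generated_text
```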

The usefulness of Large Language Models lies in the ability to integrate them into larger applications to perform complex tasks involving language. Many such applications are still waiting to be invented.