Since generative AI catapulted into the mainstream at the end of 2022, many people have gained a basic understanding of the technology and how it uses natural language to help us interact more easily with computers. Some of us can even throw around buzzwords like “prompts” and “machine learning” over coffee with friends.
But as AI continues to evolve, so does its lexicon. Do you know the difference between large and small language models? Or what the “GPT” stands for in ChatGPT? Or what a RAG has to do with cleaning up fabrications? We’re here to help with a next-level breakdown of AI terms to get you up to speed.
Reasoning/planning
Computers using AI can now solve problems and accomplish tasks by employing patterns they’ve learned from historical data to make sense of information, something akin to reasoning. The most advanced systems are showing the ability to go a step further, tackling increasingly complex problems by creating plans: devising a sequence of actions to reach an objective.
Training/inference
Creating and using an AI system involves two steps: training and inference. Training is sort of like an AI system’s education, when it is fed a dataset and learns to perform tasks or make predictions based on that data. For example, it might be given a list of prices for homes recently sold in a neighborhood, along with the number of bedrooms and bathrooms in each and a multitude of other variables. During training, the system adjusts its internal parameters, which are values that determine how much weight to give each of those factors in influencing pricing. Inference is when it uses those learned patterns and parameters to come up with a price prediction for a new home about to go on the market.
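To make those two steps concrete, here’s a minimal sketch in Python. The home data is invented for illustration, and real systems use far more variables and far more sophisticated models, but the shape is the same: training fits the parameters, inference applies them to a new home.

```python
import numpy as np

# Hypothetical training data: each row is a recently sold home,
# described by [bedrooms, bathrooms, square feet].
features = np.array([
    [3, 2, 1500],
    [4, 3, 2200],
    [2, 1, 900],
    [5, 3, 2800],
], dtype=float)
prices = np.array([300_000, 450_000, 180_000, 560_000], dtype=float)

# Training: learn a weight (parameter) for each factor, plus a bias term,
# by minimizing squared error on the known sales.
X = np.hstack([features, np.ones((len(features), 1))])  # append bias column
weights, *_ = np.linalg.lstsq(X, prices, rcond=None)

# Inference: apply the learned parameters to a home about to go on the market.
new_home = np.array([3, 2, 1600, 1], dtype=float)  # trailing 1 matches the bias column
print(f"Predicted price: ${new_home @ weights:,.0f}")
```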
SLM/small language model
Small language models, or SLMs, are pocket-sized versions of large language models (LLMs). They both use machine learning techniques to help them recognize patterns and relationships so they can produce realistic, natural language responses. But while LLMs are enormous and need a hefty dose of computational power and memory, SLMs such as Phi-3 are trained on smaller, curated datasets and have fewer parameters, so are more compact and can even be used offline, without an internet connection. That makes them great for apps on devices like a laptop or phone.
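As a rough sketch, here’s how you might run a small model on your own machine with the open-source Hugging Face transformers library. The model name and options shown are one possibility, not the only way to do it; after the initial download, generation runs entirely on your own hardware.

```python
from transformers import pipeline

# Load a small language model locally; once the weights are downloaded,
# text generation runs on your own device, no internet connection needed.
generator = pipeline("text-generation", model="microsoft/Phi-3-mini-4k-instruct")

result = generator("Write a haiku about laptops:", max_new_tokens=40)
print(result[0]["generated_text"])
```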
Grounding
Generative AI systems can compose stories, poems and jokes, as well as answer research questions. But sometimes they have trouble separating fact from fiction, or their training data is outdated, and then they can give inaccurate responses referred to as hallucinations. Developers work to help AI interact with the real world accurately through the process of grounding, which is when they connect and anchor the model to real-world data and tangible examples to improve accuracy and produce more contextually relevant and personalized output.
Retrieval Augmented Generation (RAG)
When developers give an AI system access to a grounding source to help it be more accurate and current, they use a method called Retrieval Augmented Generation, or RAG. The RAG pattern saves time and resources by adding extra knowledge without having to retrain the AI program. For instance, if you’ve got a clothing company and want to create a chatbot that can answer questions specific to your merchandise, you can use the RAG pattern over your product catalog to help customers find the perfect green sweater from your store.
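Here’s a toy sketch of that pattern in Python. The catalog is made up, and the retrieval step is deliberately simple (real systems typically search with embeddings rather than word overlap), but it shows the flow: retrieve relevant knowledge, then fold it into the prompt instead of retraining the model.

```python
import re

# Toy RAG sketch. The catalog is hypothetical; a real system would send
# the final prompt to a language model instead of printing it.
CATALOG = [
    "Forest-green wool sweater, sizes S-XL, $49",
    "Navy cotton cardigan, sizes M-XL, $39",
    "Emerald-green cashmere sweater, sizes S-L, $129",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by how many words they share with the query."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]

question = "Do you have a green sweater in medium?"
context = "\n".join(retrieve(question, CATALOG))

# Augmentation: the extra knowledge goes into the prompt, no retraining needed.
prompt = f"Answer using only this catalog:\n{context}\n\nCustomer: {question}"
print(prompt)
```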
Orchestration
AI programs have a lot on their plate as they process people’s requests. The orchestration layer is what steers them through all their tasks in the right order to get to the best response. If you ask Microsoft Copilot who Ada Lovelace is, for example, and then ask it when she was born, the AI’s orchestrator stores the chat history and uses it to work out that the “she” in your follow-up query refers to Lovelace. The orchestration layer can also follow a RAG pattern by searching the internet for fresh information to add to the context and help the model come up with a better answer.
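A bare-bones sketch of that idea, with a hypothetical stand-in for the actual model call:

```python
# Sketch of an orchestration layer that stores chat history so follow-up
# questions make sense. call_model is a hypothetical stand-in for the LLM.
def call_model(prompt: str) -> str:
    return f"[model answer for: {prompt!r}]"

class Orchestrator:
    def __init__(self) -> None:
        self.history: list[str] = []

    def ask(self, question: str) -> str:
        # Pass prior turns along so the model can resolve references like "she".
        prompt = "\n".join(self.history + [f"User: {question}"])
        answer = call_model(prompt)
        self.history += [f"User: {question}", f"Assistant: {answer}"]
        return answer

bot = Orchestrator()
bot.ask("Who was Ada Lovelace?")
print(bot.ask("When was she born?"))  # the stored history supplies who "she" is
```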
Memory
Today’s AI models don’t technically have memory. But AI programs can have orchestrated instructions that help them “remember” information by following specific steps with every transaction, such as temporarily storing previous questions and answers in a chat and then including that context in the current request to the model, or using grounding data from the RAG pattern to make sure the response has the most current information. Developers are experimenting with the orchestration layer to help AI systems know whether they need to temporarily remember a breakdown of steps (short-term memory, like jotting a reminder on a sticky note) or whether it would be useful to remember something for a longer period of time by storing it in a more permanent location.
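Here’s one way that two-tier idea might look in code. The design is a hypothetical sketch, not how any particular product stores memory:

```python
import json
import pathlib

# Hypothetical two-tier memory an orchestrator might manage.
short_term: list[str] = []                 # the sticky note: this chat only
LONG_TERM = pathlib.Path("memory.json")    # the more permanent location

def remember(fact: str, durable: bool = False) -> None:
    if durable:
        facts = json.loads(LONG_TERM.read_text()) if LONG_TERM.exists() else []
        LONG_TERM.write_text(json.dumps(facts + [fact]))
    else:
        short_term.append(fact)

remember("User just asked about Ada Lovelace")            # gone after the chat
remember("User prefers answers in French", durable=True)  # persists across chats
```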
Transformer models and diffusion models
People have been teaching AI systems to understand and generate language for decades, but one of the breakthroughs that accelerated recent progress was the transformer model. Among generative AI models, transformers are the ones that understand context and nuance best and fastest. They’re eloquent storytellers, paying attention to patterns in data and weighing the importance of different inputs, which helps them quickly predict what comes next and generate text. A transformer’s claim to fame is that it’s the T in ChatGPT, which stands for Generative Pre-trained Transformer. Diffusion models, generally used for image creation, add a twist by taking a more gradual and methodical journey: they start from random noise and refine the pixels step by step until they form the picture asked for in a prompt. Diffusion models keep making small changes until they create something that works.
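For the curious, the “paying attention” part has a compact mathematical core called scaled dot-product attention. Here’s a toy version, with random numbers standing in for word representations:

```python
import numpy as np

# Toy scaled dot-product attention: three "words", each a 4-number vector.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))  # queries: what each word is looking for
K = rng.normal(size=(3, 4))  # keys: what each word offers
V = rng.normal(size=(3, 4))  # values: the information to mix together

scores = Q @ K.T / np.sqrt(4)  # compare every word with every other word
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
output = weights @ V           # each position becomes a weighted mix of inputs

print(weights.round(2))  # each row sums to 1: attention across the sequence
```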
Frontier models
Frontier models are large-scale systems that push the boundaries of AI and can perform a wide variety of tasks with new, broader capabilities. They can be so advanced that they sometimes surprise us with what they’re able to accomplish. Tech companies including Microsoft formed a Frontier Model Forum to share knowledge, set safety standards and help everyone understand these powerful AI programs to ensure safe and responsible development.
GPU
A GPU, which stands for graphics processing unit, is basically a turbocharged calculator. GPUs were originally designed to smooth out fancy graphics in video games, and now they’re the muscle cars of computing. The chips have lots of tiny cores, or networks of circuits and transistors, that tackle math problems together, a technique called parallel processing. Since that’s basically what AI is (solving tons of calculations at massive scale to be able to communicate in human language and recognize images or sounds), GPUs are indispensable for AI tools, for both training and inference. In fact, today’s most advanced models are trained using enormous clusters of interconnected GPUs, sometimes numbering tens of thousands spread across giant data centers, like those Microsoft has in Azure, which are among the most powerful computers ever built.
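You can get a feel for why doing many calculations together matters with a toy comparison on an ordinary CPU: the same matrix multiplication done one number at a time versus handed to optimized, parallel-friendly routines. A GPU takes this same idea much further.

```python
import time
import numpy as np

# The same matrix multiplication, one number at a time vs. handed to
# optimized routines built for doing many calculations together.
n = 200
A, B = np.random.rand(n, n), np.random.rand(n, n)

start = time.perf_counter()
slow = [[sum(A[i, k] * B[k, j] for k in range(n)) for j in range(n)]
        for i in range(n)]
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
fast = A @ B
batched_seconds = time.perf_counter() - start

print(f"one-at-a-time: {loop_seconds:.2f}s, batched: {batched_seconds:.4f}s")
```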