Back in the early 1990s, I worked in a large telecoms research lab, as part of the Advanced Local Loop group. Our problem domain was the “last mile”—getting services to people’s homes. One of my research areas involved thinking about what might happen when the network shift from analog to digital services was complete.
I spent a great deal of time in the lab’s library, contemplating what computing would look like in a future of universal bandwidth. One of the concepts that fascinated me was ubiquitous computing, where computers disappear into the background and software agents become our proxies, interacting with network services on our behalf. That idea inspired work at Apple, IBM, General Magic, and many other companies.
One of the pioneers of the software agent concept was MIT professor Pattie Maes. Her work crossed the boundaries between networking, programming, and artificial intelligence, and focused on two related ideas: intelligent agents and autonomous agents. These were adaptive programs that could find and extract information for users and change their behavior while doing so.
It has taken the software industry more than 30 years to catch up with that pioneering research, but with a mix of transformer-based large language models (LLMs) and adaptive orchestration pipelines, we’re finally able to start delivering on those ambitious original ideas.
Semantic Kernel as an agent framework
Microsoft’s Semantic Kernel team is building on OpenAI’s Assistant model to deliver one kind of intelligent agent, along with a set of tools to manage calling multiple functions. They’re also providing a way to manage the messages sent to and from the OpenAI API, and to use plugins to integrate general-purpose chat with grounded, data-driven integrations using retrieval-augmented generation (RAG).
The team is starting to go beyond the original LangChain-like orchestration model with the recent 1.0.1 release and is now thinking of Semantic Kernel as a runtime for a contextual conversation. That requires much more management of the conversation and prompt history. All interactions will go through the chat function, with Semantic Kernel managing both inputs and outputs.
There’s a lot going on here. First, we’re seeing a movement towards an AI stack. Microsoft’s Copilot model is perhaps best thought of as an implementation of a modern agent stack, building on the company’s investment in AI-ready infrastructure (for inference as well as training) and its library of foundation models, all the way up to support for plugins that work across Microsoft’s and OpenAI’s platforms.
The role of Semantic Kernel plugins
One key aspect of the recent updates to Semantic Kernel simplifies building chat-based LLM user interfaces: there’s no longer any need to explicitly manage conversation histories. That’s now handled by Semantic Kernel as soon as you define the AI services your application will use. The result is code that’s a lot easier to understand, abstracted away from the underlying model.
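To make that concrete, here’s a minimal sketch of the idea using the Python semantic_kernel package. Class and method names have shifted between releases, so treat the details below as approximate rather than definitive:

```python
# A minimal sketch of Semantic Kernel's conversation management in Python.
# Names follow the 1.x semantic_kernel package but may differ by release.
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory


async def main() -> None:
    kernel = Kernel()

    # Define the AI service the kernel will use for chat.
    chat = OpenAIChatCompletion(ai_model_id="gpt-4", api_key="sk-...")
    kernel.add_service(chat)

    # The running conversation lives in a ChatHistory object, so the
    # application doesn't hand-roll prompt history management.
    history = ChatHistory()
    history.add_user_message("What can you do for me?")

    reply = await chat.get_chat_message_content(
        chat_history=history,
        settings=OpenAIChatPromptExecutionSettings(),
    )
    history.add_message(reply)
    print(reply)


asyncio.run(main())
```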
By managing conversation state for you, Semantic Kernel becomes the context manager for your agent. Next it needs a way of interacting with external tools, which is where plugins come in, adding LLM-friendly descriptions to your methods. There’s no need to do more than add this metadata to your code. Once it’s there, a chat can trigger actions via an API, such as turning up the heat using a smart home platform like Home Assistant.
When you add a plugin to the Semantic Kernel kernel object, it becomes available for chat-based orchestration. The underlying LLM provides the language understanding necessary to run the action associated with the most likely plugin description. That ensures that users running your agent don’t need to be tediously accurate. A plugin description “Set the room temperature” could be triggered by “Make the room warmer” or “Set the room to 17C.” Both indicate the same intent and instruct Semantic Kernel to call the appropriate method.
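Here’s a hedged sketch of that flow in Python, using a hypothetical thermostat plugin. The decorator and annotations follow the semantic_kernel package, but the automatic function-calling settings have changed across releases, so take the specifics as assumptions:

```python
# A sketch of the plugin model: a hypothetical smart-home plugin whose
# method description lets the LLM map "Make the room warmer" onto a call.
# The function-calling settings below have changed between semantic_kernel
# releases, so treat import paths and option names as assumptions.
import asyncio
from typing import Annotated

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.function_choice_behavior import FunctionChoiceBehavior
from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory
from semantic_kernel.functions import kernel_function


class ThermostatPlugin:
    """Hypothetical wrapper around a smart home API such as Home Assistant."""

    @kernel_function(description="Set the room temperature to a target value in Celsius")
    def set_room_temperature(
        self, temperature: Annotated[float, "Target temperature in degrees Celsius"]
    ) -> Annotated[str, "Confirmation message"]:
        # A real plugin would call the smart home platform's API here.
        return f"Room temperature set to {temperature}C"


async def main() -> None:
    kernel = Kernel()
    chat = OpenAIChatCompletion(service_id="chat", ai_model_id="gpt-4", api_key="sk-...")
    kernel.add_service(chat)

    # Registering the plugin makes its described functions available
    # for chat-based orchestration.
    kernel.add_plugin(ThermostatPlugin(), plugin_name="Thermostat")

    # Let the model pick and invoke whichever registered function best
    # matches the user's intent.
    settings = OpenAIChatPromptExecutionSettings(
        function_choice_behavior=FunctionChoiceBehavior.Auto()
    )

    history = ChatHistory()
    history.add_user_message("Make the room warmer")
    reply = await chat.get_chat_message_content(
        chat_history=history, settings=settings, kernel=kernel
    )
    print(reply)


asyncio.run(main())
```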
Alternatively, you will be able to use OpenAI plugins, support for which is currently experimental. These plugins use OpenAPI specifications to access external APIs, with calls tied to semantic descriptions. The semantic description of an API call allows OpenAI’s LLMs to make the appropriate call based on the content of a prompt. Semantic Kernel can manage the overall context and chain calls to a series of APIs, using its own plugins and OpenAI plugins. Semantic Kernel can even mix models and use them alongside its own semantic memory, using vector searches to ground the LLM in real-world data.
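The OpenAPI route looks something like the following sketch; the plugin name and document URL are hypothetical, and the import method may be named differently in your Semantic Kernel version:

```python
# A sketch of grounding the kernel in an external API via its OpenAPI
# description. The plugin name and URL are hypothetical, and the method
# name may differ between semantic_kernel releases.
from semantic_kernel import Kernel

kernel = Kernel()

# Each operation in the OpenAPI document becomes a callable function,
# with its description available to the LLM for intent matching.
kernel.add_plugin_from_openapi(
    plugin_name="weather",
    openapi_document_path="https://example.com/weather/openapi.json",
)
```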
Microsoft’s work here takes the language capabilities of an LLM and wraps them in the context of the user, data, and API. This is where it becomes possible to start calling Semantic Kernel a tool for constructing intelligent agents, as it uses your prompts and user chats to dynamically orchestrate queries against data sources and internet-hosted resources.
Can LLM-based agents be autonomous?
Another set of Semantic Kernel functions begins to implement a form of autonomy. This is where things get really interesting, because by managing context our Semantic Kernel agent can select the appropriate plugins from its current library to deliver answers.
Here we can take advantage of Semantic Kernel’s planners to create a workflow. The recently released Handlebars planner can dynamically generate an orchestration that includes loops and conditional statements. When a user creates a task in a chat, the planner builds an orchestration from those instructions, calling plugins as needed to complete the task. Semantic Kernel draws only on the plugins defined in your kernel code, using a prompt that constrains the planner to that set.
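The Handlebars planner itself ships with the .NET release; the Python package offers a comparable function-calling planner. The sketch below assumes that planner’s class name and invoke signature, which may differ in your version:

```python
# A hedged sketch of planner-driven orchestration in Python. The planner
# class, its options, and the invoke signature are assumptions that may
# not match every semantic_kernel release.
import asyncio

from semantic_kernel import Kernel
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.planners import FunctionCallingStepwisePlanner


async def main() -> None:
    kernel = Kernel()
    kernel.add_service(
        OpenAIChatCompletion(service_id="chat", ai_model_id="gpt-4", api_key="sk-...")
    )
    # Plugins registered on the kernel (such as the thermostat sketch
    # above) are the only tools the planner is allowed to call.
    planner = FunctionCallingStepwisePlanner(service_id="chat")

    # The planner turns the user's task into a sequence of plugin calls.
    result = await planner.invoke(
        kernel, "Warm the living room and confirm the new temperature"
    )
    print(result.final_answer)


asyncio.run(main())
```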
There are issues with code that operates autonomously. How can you be sure that it remains grounded, and avoids inaccuracies and errors? One option is to work with the Prompt Flow tool in Azure AI Studio to build a test framework that evaluates the accuracy of your planners and plugins. It’s able to use a large array of benchmark data to determine how your agent works with different user inputs. You may need to generate synthetic queries to get enough data, using an LLM to produce the initial requests.
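Generating those synthetic queries can itself be a small LLM task. The following sketch is one illustrative, entirely hypothetical way to do it with the same chat service:

```python
# A hedged sketch of generating synthetic test queries with an LLM, which
# could then feed an evaluation pipeline such as Prompt Flow. The prompt
# and model choice are illustrative assumptions.
import asyncio

from semantic_kernel.connectors.ai.open_ai import (
    OpenAIChatCompletion,
    OpenAIChatPromptExecutionSettings,
)
from semantic_kernel.contents import ChatHistory


async def generate_test_queries(n: int = 20) -> list[str]:
    chat = OpenAIChatCompletion(ai_model_id="gpt-4", api_key="sk-...")
    history = ChatHistory()
    history.add_user_message(
        f"Generate {n} varied ways a user might ask a smart-home agent "
        "to change a room's temperature, one per line."
    )
    reply = await chat.get_chat_message_content(
        chat_history=history, settings=OpenAIChatPromptExecutionSettings()
    )
    # Split the model's reply into one candidate query per line.
    return [line.strip() for line in str(reply).splitlines() if line.strip()]


print(asyncio.run(generate_test_queries()))
```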
Microsoft’s Copilots are an example of intelligent agents in action, and it’s good to see the Semantic Kernel team using the term. With more than 30 years of research into software agents, there’s a lot of experience that can be mined to evaluate and improve the results of Semantic Kernel orchestration, and to guide developers in building out the user experiences and framings that these agents can offer.
Intelligent agents 30 years on
It’s important to note that Semantic Kernel’s agent model differs from the original agent concept in one significant way: You are not sending out intelligent code to run queries on remote platforms. But in the last 30 years or so, we’ve seen a major revolution in distributed application development that has changed much of what’s needed to support agent technologies.
The result of this new approach to development is that there’s no longer any need to run untrusted, arbitrary code on remote servers. Instead, we can take advantage of APIs and cloud resources to treat an agent as an orchestrated workflow spanning distributed systems. Further, that agent can intelligently reorganize that orchestration based on previous and current operations. Modern microservices are an ideal platform for this, building on service-oriented architecture concepts with self-documenting OpenAPI and GraphQL descriptions.
This seems to be the model that Semantic Kernel is adopting, by providing a framework to host those dynamic workflows. Mixing API calls, vector searches, and OpenAI plugins with a relatively simple programmatic scaffolding gives you a way to construct a modern alternative to the original agent premise, which relied on sending code to run on other people’s systems. After all, how could we distinguish benign agents from malware? In 1994 computer viruses were a rare occurrence, and network attacks were the stuff of science fiction.
Today we can use OpenAPI definitions to teach LLMs how to query and extract data from trusted APIs. All of the code needed to make those connections is delivered by the underlying AI: All you need is a prompt and a user question. Semantic Kernel provides the prompts, and delivers the answers in natural language, in context with the original question.
You can think of this as a modern approach to realizing those early agent concepts, running code in one place in the cloud, rather than on many different systems. Using APIs reduces the load on the systems that provide information to the agent and makes the process more secure.
As these technologies evolve, it’s important not to treat them as something totally new. This is the result of decades of research work, work that’s finally meeting its intended users. There’s a lot in that research that could help us deliver reliable, user-friendly, intelligent agents that serve as our proxies in the next-generation network—much as those initial researchers intended back in the 1990s.