Microsoft Agent Framework: Multi-Turn Conversation

https://www.pexels.com/photo/smiling-woman-talking-via-laptop-in-kitchen-4049992/
https://www.pexels.com/photo/smiling-woman-talking-via-laptop-in-kitchen-4049992/

This blog explores how to build a multi-turn conversation agent using the Microsoft Agent Framework (MAF), with the ability to remember and resume past conversations.

A typical deployment of a conversational agent involves handling multiple exchanges with users. The is a front-end interface that captures user inputs and displays the agent's responses. Behind the scenes, the agent processes these inputs, generates responses, and maintains the context of the conversation. For every turn in the conversation, the agent needs to remember what has been discussed so far to provide coherent and contextually relevant replies. To achieve this, we can leverage MAF's capabilities to serialize and deserialize the conversation state.


The serialization of state is required because a new instance of the agent is created for each turn in the conversation. By serializing the state after each response, we can capture the entire context of the conversation up to that point. When a user sends a follow-up message, we can deserialize the saved state to restore the conversation context, allowing the agent to respond appropriately based on the previous exchanges.

Let look at how to implement this using MAF. We will create a custom chat message store that can serialize and deserialize the conversation state. This store will be used by the agent to maintain the context across multiple turns.

import asyncio
from typing import Any, Sequence

from agent_framework import ChatMessage, ChatMessageStoreProtocol

from maf_workflow.hosting import container
from maf_workflow.protocols.i_azure_open_ai_chat_client_service import (
    IAzureOpenAIChatClientService,
)

chat_client = container[IAzureOpenAIChatClientService].get_client()
SYSTEM_PROMPT = (
    "You are a helpful assistant that helps to answer questions around generativeAI."
)


class ChatMessageStore(ChatMessageStoreProtocol):
    # for convenience, we just store the messages in memory.
    # the proper way is to store it in redis.
    def __init__(self):
        self.messages: list[ChatMessage] = []

    # this is called every time when the agent run
    # see the code below `agent.run(...)`
    async def list_messages(self) -> list[ChatMessage]:
        return self.messages

    # this is called before and after response generation by the agent
    async def add_messages(self, messages: Sequence[ChatMessage]) -> None:
        for msg in messages:
            self.messages.append(msg)

    # this is called when we restore the thread context.
    # `resume_thread = await agent.deserialize_thread(chat_data)`
    @classmethod
    async def deserialize(
        cls, serialized_store_state: Any, **kwargs: Any
    ) -> "ChatMessageStore":
        instance = ChatMessageStore()
        instance.messages = [
            ChatMessage.from_dict(msg_dict)
            for msg_dict in serialized_store_state.get("messages", [])
        ]
        return instance

    async def update_from_state(
        self, serialized_store_state: Any, **kwargs: Any
    ) -> None:
        self.messages += serialized_store_state.messages

    # to serialize the thread
    async def serialize(self, **kwargs: Any) -> Any:
        return {"messages": self.messages}


async def multi_turns():
    agent = chat_client.create_agent(
        system_prompt=SYSTEM_PROMPT,
        chat_message_store_factory=lambda: ChatMessageStore(),
    )
    thread1 = agent.get_new_thread()  # let's create a conversation thread

    message = input("User: ")
    while message.lower() not in ["exit", "quit"]:
        response = await agent.run(message, thread=thread1)
        print(f"Assistant: {response}")
        message = input("User: ")

    chat_data = await thread1.serialize()  # serialize the conversation thread

    # resume the conversation
    # deserialize function is called
    resume_thread = await agent.deserialize_thread(chat_data)

    print("\nResuming the conversation...\n")
    message = input("User: ")
    while message.lower() not in ["exit", "quit"]:
        response = await agent.run(message, thread=resume_thread)
        print(f"Assistant: {response}")
        message = input("User: ")


if __name__ == "__main__":
    asyncio.run(multi_turns())
```
You can found the source code in https://github.com/dennisseah/maf-workflow

As framework are evolving and improving, implementation becomes simpler. :-)

To help you understand better, I have recorded a video where I step through the code. Please watch it in full screen and there are no audio.



 



Comments