*Photo: [Pexels](https://www.pexels.com/photo/close-up-photo-of-a-person-meditating-4908576/)*
Most chat agents are amnesiacs. Each session starts from a blank slate, and any context the user shared yesterday (preferences, decisions, half-finished plans) has to be re-supplied or stuffed back into the prompt. The new Azure AI Foundry Memory Store offers a cleaner answer: a managed, scoped, searchable memory that your agent can read from and write to as a first-class capability.
This post walks through what the Memory Store gives you, how it behaves in practice, and the patterns that worked well while wiring it into a real agent built on the Microsoft Agent Framework.
> [!NOTE]
> The Memory Store is exposed via `AIProjectClient.beta.memory_stores`. The "beta" namespace is a reminder that the surface area can still evolve, but the core operations (create, update, search, delete) are stable enough to build on.
## What is the Foundry Memory Store?

You create a store from a definition that names the models it relies on and the behaviors it enables:
```python
from azure.ai.projects import AIProjectClient
from azure.ai.projects.models import (
    MemoryStoreDefaultDefinition,
    MemoryStoreDefaultOptions,
)
from azure.identity import DefaultAzureCredential

definition = MemoryStoreDefaultDefinition(
    chat_model="gpt-4o-mini",                  # used for summarization
    embedding_model="text-embedding-3-large",  # used for semantic search
    options=MemoryStoreDefaultOptions(
        chat_summary_enabled=True,
        user_profile_enabled=True,
    ),
)

project_client = AIProjectClient(
    endpoint="<your-foundry-project-endpoint>",
    credential=DefaultAzureCredential(),
)

project_client.beta.memory_stores.create(
    name="memory-prod",
    definition=definition,
    description="Production chat memory",
)
```
- `chat_summary_enabled` lets the store compress long conversations into salient summaries instead of replaying every turn back to your agent.
- `user_profile_enabled` lets the store accumulate longer-lived facts about a user across sessions (preferences, recurring topics, working context) separately from raw turn-by-turn dialogue.
## Memory is scoped, and that scope is where the power is
Every read and write into the store carries a scope. The scope is just a string, but it is the lever that lets a single memory store serve very different access patterns:
- `scope="user-123"` for per-user memory in a chat assistant.
- `scope="tenant-acme"` for per-tenant memory in a multi-tenant SaaS.
- `scope="project-orion"` for per-project memory in a planning agent.
- `scope="user-123/project-orion"` for composite scopes that combine the two.
Because the scope is supplied at call time rather than baked into the store's definition, you do not need a separate store per user or tenant. One store, many scopes, and isolation is enforced on the read/write path:
```python
project_client.beta.memory_stores.begin_update_memories(
    name="memory-prod",
    scope=f"user-{user_id}",
    items=turns,  # list of message params
    update_delay=0,
).result()
```
## Memory is searchable, and search is fast

Recall is a single scoped call against the store, and the results can be folded straight into the agent's instructions:
```python
results = project_client.beta.memory_stores.search_memories(
    name="memory-prod",
    scope=f"user-{user_id}",
    items="What has the user said about their deployment region?",
)
recalled = [m.memory_item.content for m in results.memories]

instructions = SYSTEM_PROMPT
if recalled:
    instructions += (
        "\n\nThe following is recalled context about the user from prior "
        "conversations. Treat these as established facts and use them to "
        "interpret short or ambiguous follow-up questions in continuity "
        "with what was previously discussed:\n"
        + "\n".join(f"- {m}" for m in recalled)
    )

agent = Agent(chat_client, instructions=instructions)
```
## Updates are slower, and that is fine
`search_memories` is fast; `begin_update_memories` is, as the name implies, a long-running operation. The store is doing real work on your behalf, including summarizing turns, extracting profile facts, and embedding content, and that takes longer than a synchronous in-memory append would.

- Do not block the user on the write. After the agent has produced its response and you have shown it to the user, hand the new turns to the store and let the update happen in the background. The next turn can still read the latest snapshot the store has finalized; you do not need this update to land before responding again.
- Batch where it makes sense. You can hand the store both the user message and the assistant response in a single update call, which is both cheaper and more semantically coherent than two separate writes:
```python
memory_store.update_memory_store(
    name=MEM_STORE_NAME,
    scope=user_id,
    items=[
        EasyInputMessageParam(content=user_msg, role="user", type="message"),
        EasyInputMessageParam(content=assistant_msg, role="assistant", type="message"),
    ],
)
```
Treat `update_memory_store` like a write to a search index, not like an in-memory list append. Fire-and-forget after each turn, and rely on `search_memories` to surface what the model needs.

## Putting it together: a thin wrapper
The demo app wraps these calls in a small class that returns `None` on search failures so the agent degrades gracefully:

```python
class FoundryMemoryStore:
    def create_memory_store(self, name, description, ignore_if_exists=False): ...
    def update_memory_store(self, name, scope, items): ...
    def search_memories(self, name, scope, query) -> list[str] | None: ...
    def delete_memory_store(self, name): ...
```
- On user input, call `search_memories(scope=user_id, query=user_msg)`.
- Build the agent with recalled memory injected into its instructions.
- Stream the response back to the user.
- Call `update_memory_store(scope=user_id, items=[user_msg, assistant_msg])` and move on.
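Those four steps can be sketched as a single turn handler. `agent_factory` and the agent's `.run` method here are hypothetical stand-ins for the Agent Framework pieces, and `MEM_STORE_NAME` / `SYSTEM_PROMPT` are the constants assumed earlier:

```python
MEM_STORE_NAME = "memory-prod"
SYSTEM_PROMPT = "You are a helpful assistant."

def handle_turn(memory_store, agent_factory, user_id: str, user_msg: str) -> str:
    # 1. Recall: search the user's scope for relevant prior context.
    recalled = memory_store.search_memories(
        MEM_STORE_NAME, scope=user_id, query=user_msg
    )

    # 2. Inject: fold recalled items into the agent's instructions.
    instructions = SYSTEM_PROMPT
    if recalled:
        instructions += "\n\nRecalled context:\n" + "\n".join(
            f"- {m}" for m in recalled
        )

    # 3. Respond: build the agent and produce the reply.
    reply = agent_factory(instructions).run(user_msg)

    # 4. Record: write both sides of the turn back under the same scope.
    memory_store.update_memory_store(
        MEM_STORE_NAME, scope=user_id, items=[user_msg, reply]
    )
    return reply
```

Because both the search and the write carry the same `scope`, the isolation guarantee falls out of the call pattern rather than anything the handler has to enforce itself.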
## Why this matters
- Decide on the natural isolation boundary first (user, tenant, or project) and bake that into every read and write from day one.
- Read on every turn and write asynchronously. Search is cheap, updates are not. Design the latency budget around that.
- Let the store do the summarization. Enable `chat_summary_enabled` and `user_profile_enabled` and stop hand-rolling that logic in your prompt.
## Demo
- Two users, `jondoe` and `maryann`. Our intention is to show the memory store maintaining separate scopes for each user, so that the agent can recall context relevant to the current user without leaking information between them.
- Two large language models, `gpt-5-mini` and `Mistral-Large-3`, so that we can show the agent being swapped between different LLMs while still leveraging the same memory store for context recall.
- We start with user `jondoe` and the agent is `gpt-5-mini`. Since this is the beginning of the conversation, the memory store for `jondoe` is empty, so the agent has no prior context to recall. `jondoe` sends the message "What is the pricing model for Azure Blob Storage?". The agent responds, and the user and assistant messages from this turn are written to the memory store under the scope for `jondoe`.
- We switch to user `maryann`; the agent remains `gpt-5-mini`. Since this is the first turn for `maryann`, the memory store for her scope is empty as well, so the agent again has no prior context to recall. This demonstrates that the memory store correctly isolates context by user scope. `maryann` sends the message "Can I have Python 3.13 runtime for my Azure Functions app?". The agent responds, and the turn is written to the memory store under her scope.
- We switch back to `jondoe`. This time, the memory store for `jondoe` contains the previous turn where he asked about the pricing model for Azure Blob Storage. We switch the agent to `Mistral-Large-3` and send a new message from `jondoe`, "What is the pricing model for Azure Cosmos DB?". The agent responds, and the turn is written to the memory store under his scope.
- We switch back to `maryann`. At this point, her memory store contains only the previous turn about the Python runtime for Azure Functions. The agent can now respond to `maryann` with context from her earlier turn without ever seeing anything from `jondoe`'s conversation.
- We switch back to `jondoe`. His memory store now contains both of his previous turns: the questions about the pricing models for Azure Blob Storage and Azure Cosmos DB. This demonstrates that the memory store accumulates context for each user over multiple turns, across different underlying LLMs.
- Lastly, `jondoe` asks "What are the resources that I enquired previously?". The agent looks into the memory store under the scope for `jondoe`, retrieves the previous turns, and responds with the previously asked questions and answers.
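The scoping behavior in that walkthrough reduces to a toy model. This is an in-memory stand-in, not the real store (no summarization, no semantic search), just per-scope isolation:

```python
class InMemoryScopedStore:
    """Toy stand-in for the Memory Store, illustrating scope isolation only."""

    def __init__(self):
        self._memories: dict[str, list[str]] = {}

    def update(self, scope: str, item: str) -> None:
        self._memories.setdefault(scope, []).append(item)

    def search(self, scope: str) -> list[str]:
        return self._memories.get(scope, [])


store = InMemoryScopedStore()
store.update("jondoe", "asked about Azure Blob Storage pricing")
store.update("maryann", "asked about Python 3.13 on Azure Functions")
store.update("jondoe", "asked about Azure Cosmos DB pricing")
```

After those three writes, `store.search("jondoe")` returns both of his turns while `store.search("maryann")` returns only hers, regardless of which model is answering on any given turn.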