Antigravity Keeps Forgetting Context? Add Persistent Memory

You're 45 minutes into a debugging session with Antigravity. You've explained the architecture, walked through the failing test, identified the root cause, and started implementing the fix. Then Antigravity suggests a change that contradicts everything you discussed 20 minutes ago. It's referencing a pattern you explicitly ruled out. It's forgotten the constraint you mentioned three times.
This is the context degradation problem, and it gets worse the longer you work. The tool that's supposed to help you becomes less helpful with every message.
Every AI coding assistant suffers from this to some degree. But in Antigravity, the problem is particularly frustrating because the tool positions itself as a deep reasoning engine — and deep reasoning requires remembering what was already reasoned about.
The Context Degradation Problem
Context degradation is not a bug. It's a physics constraint. Every AI assistant operates within a finite context window — a fixed number of tokens it can hold in working memory at any given moment. For most models powering Antigravity, that's somewhere between 128K and 200K tokens.
That sounds like a lot until you do the math. A typical coding session generates 2,000-4,000 tokens per exchange (your message plus the AI's response). A 45-minute session with active back-and-forth easily generates 30-50 exchanges, consuming 60,000-200,000 tokens. Once you hit the ceiling, something has to give.
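The budget math above can be sketched in a few lines. All numbers are illustrative midpoints from the ranges in this article, not measurements:

```typescript
// Back-of-envelope session token budget, using the figures above.
const tokensPerExchange = 3_000; // midpoint of the 2,000-4,000 range
const exchanges = 40;            // midpoint of 30-50 for an active session
const contextWindow = 128_000;   // smaller end of typical model windows

const sessionTokens = tokensPerExchange * exchanges; // 120,000 tokens
const headroom = contextWindow - sessionTokens;      // only 8,000 left

console.log({ sessionTokens, headroom });
```

At these midpoints, a single busy session nearly fills a 128K window on its own, which is why something from the beginning of the conversation has to give.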
What gives is your earlier context. The system compresses, summarizes, or outright drops messages from the beginning of the conversation to make room for new ones. That detailed architecture explanation you provided at minute 5? Compressed to a one-line summary by minute 30. Gone entirely by minute 45.
How Compression Destroys Nuance
The compression isn't random — it's algorithmically prioritized. Recent messages are kept in full. Older messages are summarized. The problem is that summaries lose the nuance that makes context valuable.
Your original message: "The auth service uses a custom JWT implementation because we needed to support both RS256 and ES256 signatures, and the standard library only handles RS256. The custom implementation is in `crypto/jwt.ts` and it has a known issue with token refresh where the signature algorithm isn't preserved across refresh cycles."
The compressed version: "Auth service uses custom JWT."
That summary is technically accurate and practically useless. The critical details — the dual-algorithm requirement, the specific file location, the known refresh bug — are exactly the information the AI needs to avoid suggesting the wrong fix. And they're exactly the information that gets compressed away.
The Paperweight Effect
Here's the cruel irony: the more you use Antigravity in a session, the less useful it becomes.
In the first 10 minutes, the AI has full context of everything discussed. Suggestions are accurate, edits are consistent, and the tool feels genuinely productive. You're moving fast.
By minute 30, the AI has lost the beginning of the conversation. It starts making suggestions that conflict with earlier decisions. You spend time re-explaining constraints you already covered.
By minute 60, you're fighting the tool more than collaborating with it. Every third suggestion requires correction. You've re-explained your architecture twice. The session that started productive has become a drag.
Developers report losing 20-35% of productive time in extended AI sessions to context re-explanation and correcting degradation-induced errors. On a 3-hour session, that's roughly 36-63 minutes of wasted effort — time spent telling the AI what it already knew an hour ago.
This is the paperweight effect: the tool gets heavier (less useful) the longer you carry it. Eventually the cost of maintaining the AI's understanding exceeds the benefit of its suggestions.
Google's Knowledge Base: A Partial Solution
Antigravity's Knowledge Base feature attempts to address context degradation by learning patterns from your coding behavior over time. It observes your edits, notes your conventions, and builds a persistent understanding of your project.
This helps with certain classes of context loss:
- Code style preferences — indentation, naming conventions, import ordering
- Common patterns — how you typically structure components, handle errors, write tests
- Project-specific terminology — what you mean by "service," "handler," "repository"
But Knowledge Base has fundamental limitations that prevent it from solving the core problem.
It learns patterns, not facts. Knowledge Base can learn that you prefer `async/await` over `.then()` chains. It cannot learn that your database migration in PR #247 changed the `users` table schema and three services need updating.
It has no structural understanding. Knowledge Base doesn't know your dependency graph. It can't tell Antigravity that changing `PaymentService` affects `OrderProcessor`, `InvoiceGenerator`, and `RefundHandler`. Structural relationships require explicit graph analysis, not pattern learning.
It's slow to update. Knowledge Base learns from repeated observations. If you change your authentication approach in a single commit, Knowledge Base won't reflect that change until it's observed the new pattern enough times. Meanwhile, it may still suggest the old approach.
It can learn wrong patterns. If you wrote a workaround three months ago that you've since replaced, Knowledge Base may have learned the workaround as a "preferred pattern." There's no mechanism to tell it "this was temporary, stop suggesting it."
What Persistent Memory Actually Means
Persistent memory is fundamentally different from both context windows and learned patterns. It's the ability to store specific observations, decisions, and facts that survive session boundaries and remain accurate over time.
A persistent memory system has three properties that context windows and pattern learning lack:
Explicit storage. Information is deliberately captured, not inferred from behavior. When you note "we switched from MongoDB to PostgreSQL in January," that fact is stored as-is — not as a pattern probability.
Session independence. Memories persist across conversation sessions. When you start a new chat tomorrow, the AI already knows about today's decisions without re-explanation. No more "as I mentioned earlier" when there is no "earlier."
Verifiability. Stored memories can be inspected, searched, and validated. You can check what the AI "remembers" and correct inaccuracies. Unlike pattern learning, which is opaque, persistent memory is transparent.
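A minimal sketch makes the three properties concrete. The record shape and store below are illustrative, not vexp's actual schema:

```typescript
// Minimal sketch of a persistent observation record and store.
// Field names are assumptions for illustration, not a real API.
interface Observation {
  id: string;
  text: string;      // explicit storage: the fact, verbatim
  symbol?: string;   // optional link to a code symbol
  sessionId: string; // session independence: survives session boundaries
  createdAt: string; // ISO timestamp, so age can later be judged
}

class MemoryStore {
  private records: Observation[] = [];

  save(obs: Observation): void {
    this.records.push(obs); // stored as-is, not as a pattern probability
  }

  // Verifiability: memories can be inspected, searched, and corrected.
  search(query: string): Observation[] {
    const q = query.toLowerCase();
    return this.records.filter((o) => o.text.toLowerCase().includes(q));
  }
}

const store = new MemoryStore();
store.save({
  id: "obs-1",
  text: "We switched from MongoDB to PostgreSQL in January",
  sessionId: "session-2024-01-15",
  createdAt: "2024-01-15T10:00:00Z",
});
console.log(store.search("postgresql").length); // 1
```

Note what pattern learning cannot do here: the January migration is a single fact, stored once, retrievable verbatim in any future session.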
How External Memory Systems Work
The most effective persistent memory implementations share a common architecture: they link observations to the code graph rather than storing them as free-floating text.
Code-Graph-Linked Observations
Instead of storing "the auth service has a refresh bug," a graph-linked memory stores the observation *attached to the specific symbol* — `refreshToken()` in `crypto/jwt.ts`. When the AI later encounters code that depends on `refreshToken()`, the observation surfaces automatically.
This is powerful because relevance is structural, not keyword-based. If you're working on `SessionManager` and it calls `refreshToken()`, the memory about the refresh bug appears — even though "SessionManager" and "refresh bug" share no keywords.
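The structural-relevance idea can be sketched as a walk over a call graph. The toy graph below mirrors the article's scenario; the function and data shapes are assumptions, not vexp's implementation:

```typescript
// Sketch of structural (not keyword) relevance: observations attach to
// symbols and surface for any symbol that depends on them.
const calls: Record<string, string[]> = {
  SessionManager: ["refreshToken"],
  refreshToken: [],
};

const observations: Record<string, string[]> = {
  refreshToken: [
    "Known bug: signature algorithm not preserved across refresh cycles",
  ],
};

// Collect observations for a symbol and everything it transitively calls.
function relevantObservations(
  symbol: string,
  seen = new Set<string>()
): string[] {
  if (seen.has(symbol)) return [];
  seen.add(symbol);
  const own = observations[symbol] ?? [];
  const fromDeps = (calls[symbol] ?? []).flatMap((dep) =>
    relevantObservations(dep, seen)
  );
  return [...own, ...fromDeps];
}

// "SessionManager" shares no keywords with the bug note, yet it surfaces:
console.log(relevantObservations("SessionManager"));
```

A keyword index would miss this entirely; the match comes from the edge `SessionManager -> refreshToken`, not from shared words.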
Cross-Session Search
Persistent memory systems maintain a searchable index of all observations across all sessions. When you start a new session and ask about authentication, the system retrieves every observation related to auth — from this session, last week's session, and the session from three months ago where you documented the token refresh edge case.
This eliminates the "cold start" problem where every new session begins with zero context about past work. The AI starts informed because it has access to the accumulated knowledge from every previous session.
Staleness Detection
Code changes. Observations about code can become stale. A robust memory system detects when the code a memory is attached to has changed and flags the observation as potentially outdated.
If you stored an observation about `refreshToken()` and then refactored that function, the memory system notes the discrepancy. Instead of confidently surfacing stale context, it either suppresses the observation or presents it with a staleness warning. This prevents the "learned wrong patterns" problem that plagues Knowledge Base approaches.
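One common way to implement this check is a content hash captured alongside the observation; if the symbol's current source hashes differently, the memory is flagged. This is a sketch of that technique, not necessarily how vexp detects staleness:

```typescript
import { createHash } from "node:crypto";

// Staleness detection sketch: an observation stores a hash of the code
// it describes; a mismatch with the current source flags it as stale.
function hashOf(source: string): string {
  return createHash("sha256").update(source).digest("hex");
}

interface LinkedObservation {
  text: string;
  codeHash: string; // hash of the symbol's source at capture time
}

function isStale(obs: LinkedObservation, currentSource: string): boolean {
  return obs.codeHash !== hashOf(currentSource);
}

const original = "function refreshToken() { /* v1 */ }";
const obs: LinkedObservation = {
  text: "refreshToken drops the signature algorithm on refresh",
  codeHash: hashOf(original),
};

const refactored = "function refreshToken() { /* v2, rewritten */ }";
console.log(isStale(obs, original));   // false: code unchanged
console.log(isStale(obs, refactored)); // true: suppress or warn
```

A stale flag doesn't mean the observation is wrong, only that the code it describes has moved; surfacing it with a warning lets the developer decide.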
How vexp Adds Persistent Memory to Antigravity
vexp's session memory system integrates with Antigravity through MCP (Model Context Protocol), adding all three persistent memory capabilities.
Automatic Observation Capture
During coding sessions, vexp automatically captures observations: decisions made, bugs found, architectural constraints discussed. These observations are linked to specific symbols in the code graph. You don't need to manually document anything — the system captures context as it happens.
You can also save explicit observations using `save_observation` when you want to record something specific: "This endpoint must return within 200ms because the mobile client has a hard timeout."
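Since the integration runs over MCP, an explicit save would arrive as a standard `tools/call` request. The `tools/call` method is part of the MCP spec, but the argument shape below is an assumption for illustration, not vexp's documented schema:

```typescript
// Hypothetical MCP tools/call payload invoking save_observation.
// The tool name comes from the article; the "arguments" shape is assumed.
const request = {
  jsonrpc: "2.0",
  id: 1,
  method: "tools/call",
  params: {
    name: "save_observation",
    arguments: {
      text:
        "This endpoint must return within 200ms because the mobile " +
        "client has a hard timeout.",
    },
  },
};

console.log(JSON.stringify(request, null, 2));
```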
Cross-Session Retrieval
When you start a new Antigravity session, vexp's `run_pipeline` tool automatically retrieves relevant memories from all previous sessions. Ask about the payment flow, and vexp surfaces every observation about payment-related code — including the note from two weeks ago about the Stripe webhook race condition you never fully resolved.
This means your first message in a new session gets context-rich responses. No warm-up period, no re-explanation, no "let me tell you about our architecture" prelude.
Memory Search
Need to find a specific decision or observation? `search_memory` queries across all sessions:
- "Why did we choose PostgreSQL over MongoDB?" — surfaces the session where that decision was discussed
- "What's the known issue with token refresh?" — finds the observation linked to `refreshToken()`
- "What did we decide about the caching strategy?" — retrieves the architectural decision from three sessions ago
This is organizational knowledge management at the individual developer level. Every decision, every constraint, every "we tried X and it didn't work because Y" is preserved and searchable.
Practical Workflow for Maintaining Context
Session Start
Begin every Antigravity session by letting vexp load relevant memories. Your first prompt should describe the task, and vexp will automatically surface past observations related to that area of the codebase.
Instead of spending 5-10 minutes re-explaining your architecture, you get immediate context-aware responses.
During the Session
As you work, vexp captures observations automatically. When you make a decision ("let's use optimistic locking instead of pessimistic"), it's linked to the relevant code and available in future sessions.
For critical decisions you want to ensure are captured, add an explicit observation: "We chose optimistic locking for the order table because write conflicts are rare and read throughput is the priority."
Session End
No action needed. All observations are persisted automatically. When you start a new session tomorrow — even if it's about a different part of the codebase — vexp has the full history available.
Long-Running Projects
Over weeks and months, vexp accumulates a rich memory of your project's evolution. New team members using vexp can benefit from observations captured by others (on team plans). The AI doesn't just know the current code — it knows *why* the code is the way it is.
Beyond the Context Window
The context window is a hardware constraint. It's not going away anytime soon — even as windows expand to 1M+ tokens, coding sessions will eventually fill them. The solution isn't a bigger window. It's selective retrieval from persistent storage.
A 200K-token context window filled with structurally relevant context (dependency graphs, targeted memories, specific observations) outperforms a 1M-token window filled with raw conversation history. The window size matters less than what's in it.
Persistent memory transforms AI coding assistants from stateless tools into stateful collaborators. They remember what you've discussed, what you've decided, and why. They don't ask the same question twice. They don't contradict yesterday's decisions. They don't forget the constraint you explained 40 minutes ago.
The paperweight effect disappears. Instead of getting heavier with use, the tool gets lighter — more informed, more accurate, more valuable with every session.
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
Related Articles

Vibe Coding Is Fun Until the Bill Arrives: Token Optimization Guide
Vibe coding with AI is addictive but expensive. Freestyle prompting without context management burns tokens 3-5x faster than structured workflows.

Windsurf Credits Running Out? How to Use Fewer Tokens Per Task
Windsurf credits deplete fast because the AI processes too much irrelevant context. Reduce what it needs to read and your credits last 2-3x longer.

Antigravity Knowledge Base: How the IDE Learns (And Where It Falls Short)
Antigravity's knowledge base feature learns your codebase over time. But it misses dependency relationships and cross-file connections that matter most.