Code Indexing for AI Agents: Embeddings vs Dependency Graphs vs RAG

Nicola·May 22, 2026


Your AI coding agent is only as good as its ability to find the right code. On a 10K-line project, it can brute-force — read every file, hold the whole codebase in context. On a 200K-line monorepo, brute-force fails. The context window fills with noise, the model hallucinates nonexistent functions, and each task burns $2-5 in tokens because the agent reads 40 files to modify 3.

Code indexing solves this by pre-organizing the codebase so the agent can find relevant code without reading everything. But not all indexing approaches are equal. Embeddings, dependency graphs, and RAG pipelines each have fundamentally different trade-offs in accuracy, speed, maintenance cost, and token efficiency.

Choosing the wrong indexing strategy costs you 30-60% more tokens and measurably degrades output quality. Here's how each approach works, where it excels, and where it breaks down.

Why Code Indexing Matters

Without indexing, an AI agent navigating a large codebase does the equivalent of reading every book in a library to answer a question about one chapter. It searches filenames, greps for keywords, reads directory trees, and explores file by file until it finds what it needs.

This exploration is expensive. With Claude Code on API billing, a single task that triggers 15-20 file reads burns $1.50-3.00 in tokens, and 60-70% of those reads are irrelevant files the agent checked and discarded. On Cursor Pro, the same exploration consumes fast requests that could be spent on actual code generation.

Indexing is the library catalog. It tells the agent where to look before it starts reading, cutting exploration to near zero and keeping token waste to a minimum.

The question isn't whether to index. It's how.

Three Indexing Approaches Explained

Embeddings: Vector Representations of Code

Embedding-based indexing converts code into high-dimensional vectors using a neural encoder model. Each chunk of code (typically a function, class, or fixed-size block) becomes a point in vector space where semantically similar code is geometrically close.

How it works:

Parse the codebase into chunks (by function, class, or fixed-size blocks with overlap)
Run each chunk through an embedding model (e.g., OpenAI `text-embedding-3-large`, Voyage Code 3)
Store vectors in a vector database (Pinecone, Qdrant, Chroma, or a local FAISS index)
At query time, embed the user's task description, find the k nearest vectors, return those chunks

The intuition: Code that does similar things produces similar vectors. A query about "user authentication" retrieves code chunks related to authentication, even if they use different variable names or terminology.
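The four steps can be sketched in a few lines. This is a toy under loud assumptions: a sparse bag-of-words vector stands in for a real neural encoder such as `text-embedding-3-large`, a plain Python list stands in for the vector database, and `embed` and `top_k` are hypothetical names. Because the toy vector is purely lexical, it would not match "user login validation" to "credential verification" the way a trained encoder can; it only shows the chunk, embed, store, nearest-neighbour mechanics.

```python
import math
import re
from collections import Counter

def embed(text: str) -> dict[str, float]:
    """Toy stand-in for a neural encoder: L2-normalised bag of lowercase word tokens."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {tok: c / norm for tok, c in counts.items()}

def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(tok, 0.0) for tok, w in a.items())

def top_k(query: str, index: list[tuple[str, dict[str, float]]], k: int = 3) -> list[str]:
    """Embed the query and return the ids of the k most similar chunks."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]

# Steps 1-3: one chunk per function, embedded and kept in a plain list.
chunks = {
    "auth.verify_credentials": "def verify_credentials(user, password): return check_hash(user.hash, password)",
    "cache.get_or_set": "def get_or_set(key, compute): return store.get(key) or store.set(key, compute())",
    "billing.charge": "def charge(invoice): gateway.charge(invoice.total)",
}
index = [(chunk_id, embed(code)) for chunk_id, code in chunks.items()]

# Step 4: embed the task description and take the nearest chunk.
print(top_k("check user password", index, k=1))
```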

Dependency Graphs: Structural Relationship Maps

Dependency graph indexing parses code to extract symbols (functions, classes, types, modules) and their structural relationships (calls, imports, type references, inheritance). The result is a directed graph where nodes are code entities and edges are dependencies.

How it works:

Parse every file with a structural parser (tree-sitter, LSP, or language-specific AST tools)
Extract all symbols: function declarations, class definitions, type aliases, variable declarations
Resolve references: when function A calls function B, create an edge A → B
Build a graph with metadata (file location, symbol kind, scope, visibility)
At query time, identify entry-point symbols and traverse the graph outward along dependency edges

The intuition: Code relevance follows structural connections. If you're modifying a function, its callers, callees, and shared types are relevant — regardless of whether they look textually similar.
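For Python source, the standard library's `ast` module is enough to sketch the parse, extract, resolve, and traverse steps. Treat this as a simplified stand-in for tree-sitter or an LSP, not how any particular indexer works: it only handles top-level functions, resolves calls by bare name within a single file, and `build_call_graph` and `neighbourhood` are hypothetical names.

```python
import ast
from collections import deque

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the top-level functions it calls (edge A -> B)."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    graph: dict[str, set[str]] = {name: set() for name in defined}
    for fn in tree.body:
        if not isinstance(fn, ast.FunctionDef):
            continue
        for node in ast.walk(fn):
            # Only direct calls by bare name, e.g. `helper()`, are resolved here.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id in defined:
                graph[fn.name].add(node.func.id)
    return graph

def neighbourhood(graph: dict[str, set[str]], entry: str, max_depth: int = 2) -> dict[str, int]:
    """Walk outward from an entry symbol along callers and callees, bounded by depth."""
    undirected = {n: set(callees) for n, callees in graph.items()}
    for caller, callees in graph.items():
        for callee in callees:
            undirected[callee].add(caller)
    seen, queue = {entry: 0}, deque([entry])
    while queue:
        node = queue.popleft()
        if seen[node] == max_depth:
            continue
        for nxt in undirected[node]:
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    return seen

SOURCE = '''
def load_user(uid): return db_get(uid)
def db_get(key): return {}
def login(uid, pw): return check(load_user(uid), pw)
def check(user, pw): return True
def render_home(): return "home"
'''
graph = build_call_graph(SOURCE)
print(sorted(graph["login"]))  # ['check', 'load_user']
print(sorted(neighbourhood(graph, "load_user").items()))
```

The bounded depth is what keeps the result small: `render_home` shares no edges with `load_user`, so it never enters the context, however similar its text might look.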

RAG: Retrieval Pipelines Combining Search and Generation

RAG (Retrieval-Augmented Generation) is not an indexing method per se — it's a pipeline architecture that combines any retrieval method with generation. The retrieval component can use embeddings, keyword search, graph traversal, or any combination.

How it works:

Index the codebase using one or more retrieval methods
When a task arrives, retrieve relevant code chunks using the retrieval method
Assemble retrieved chunks into a context prompt
Feed context + task to the LLM for generation

The intuition: Separate retrieval from generation. Let specialized retrieval methods find the right code, then let the LLM focus on generating the right output.

The key insight is that RAG's quality depends entirely on its retrieval method. A RAG pipeline using vector retrieval inherits vector retrieval's limitations. A RAG pipeline using graph retrieval inherits graph retrieval's strengths. RAG is an architecture, not a solution — the retrieval method inside it determines the outcome.
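Because retrieval is a swappable component, the pipeline can be written against an interface and handed any retriever. A minimal sketch under stated assumptions: `Retriever`, `KeywordRetriever`, and `rag_answer` are hypothetical names, and `generate` is a placeholder for whatever LLM call you use. This is not any particular framework's API.

```python
import re
from typing import Callable, Protocol

class Retriever(Protocol):
    """Any retrieval method: vectors, keywords, graph traversal, or a mix."""
    def retrieve(self, task: str, k: int) -> list[str]: ...

def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

class KeywordRetriever:
    """Simplest possible retriever: rank chunks by how many task words they share."""
    def __init__(self, chunks: dict[str, str]):
        self.chunks = chunks

    def retrieve(self, task: str, k: int) -> list[str]:
        task_words = words(task)
        ranked = sorted(self.chunks, key=lambda cid: len(task_words & words(self.chunks[cid])), reverse=True)
        return ranked[:k]

def rag_answer(task: str, retriever: Retriever, chunks: dict[str, str],
               generate: Callable[[str], str], k: int = 2) -> str:
    # Retrieve: the only stage that decides what counts as relevant.
    ids = retriever.retrieve(task, k)
    # Assemble the retrieved chunks into a context prompt.
    context = "\n\n".join(f"# {cid}\n{chunks[cid]}" for cid in ids)
    # Generate: the model only ever sees what retrieval selected.
    return generate(f"{context}\n\nTask: {task}")

chunks = {"auth.py": "def login(user, password): ...", "cache.py": "def get(key): ..."}
echo = lambda prompt: prompt  # stand-in for an LLM call; returns the prompt so it can be inspected
print(rag_answer("fix login password bug", KeywordRetriever(chunks), chunks, echo, k=1))
```

Switching to vector or graph retrieval means passing a different object with a `retrieve` method. `rag_answer` itself does not change, which is why its output quality tracks the retriever's.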

Embeddings: Pros and Cons

Pros:

Semantic understanding: Captures conceptual similarity beyond keyword matching. "User login validation" can match code labeled "credential verification."
Language-agnostic: The same embedding model works across programming languages without language-specific parsers.
Natural-language queries: Handles queries like "how does caching work?" that don't map to specific symbols.
Mature ecosystem: Well-established vector databases, embedding models, and tooling.

Cons:

No structural awareness: Embeddings don't encode call relationships, type dependencies, or import graphs. Two structurally unrelated functions with similar variable names produce similar embeddings.
Chunk boundary problems: Functions split across chunks lose context. A class constructor in chunk 1 and the method with the bug in chunk 3 are retrieved independently, breaking the relationship.
Re-embedding cost: Every code change requires re-embedding the affected chunks. For active codebases with frequent commits, this creates index staleness or continuous re-embedding costs ($0.01-0.05 per 1M tokens, adding up across large codebases).
Scale degradation: As codebases grow, the embedding space gets crowded. Top-k retrieval captures a decreasing fraction of relevant code. Accuracy drops noticeably past 50K lines.
False positives: Similar-looking code is not necessarily relevant code. Twenty API handlers produce similar embeddings; retrieving 5 of them for a bug in one handler returns 4 irrelevant results.
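The chunk boundary problem is easy to reproduce with a fixed-size, overlapping line chunker like the one in step 1 of the embedding pipeline. A sketch with deliberately small sizes; `chunk_lines` and `chunks_containing` are hypothetical helpers. The constructor that sets `self.created` and the method that reads it land in different chunks, so retrieving one does not bring the other along.

```python
def chunk_lines(source: str, size: int, overlap: int) -> list[list[str]]:
    """Fixed-size line windows; consecutive windows share `overlap` lines."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    lines = source.splitlines()
    step = size - overlap
    # Stop once the remaining lines are already covered by the previous window's overlap.
    return [lines[i:i + size] for i in range(0, max(len(lines) - overlap, 1), step)]

def chunks_containing(chunks: list[list[str]], marker: str) -> list[int]:
    return [i for i, chunk in enumerate(chunks) if any(marker in line for line in chunk)]

SOURCE = """class Session:
    def __init__(self, ttl):
        self.ttl = ttl
        self.created = 0

    def touch(self):
        return None

    def is_expired(self, now):
        return now - self.created > self.ttl
"""

chunks = chunk_lines(SOURCE, size=4, overlap=1)
print(chunks_containing(chunks, "def __init__"), chunks_containing(chunks, "def is_expired"))  # [0] [2]
```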

Dependency Graphs: Pros and Cons

Pros:

Exact structural relevance: Retrieves code connected by actual dependencies — callers, callees, type references, imports. Every returned file has a structural reason for inclusion.
Fast incremental updates: Only re-parse files that changed. A single file edit updates the graph in milliseconds, not minutes. No re-embedding cost.
Scale-independent accuracy: Graph traversal accuracy doesn't degrade with codebase size. A 500K-line codebase retrieves with the same precision as a 10K-line project — the traversal depth is bounded, not the search space.
Impact analysis: "What breaks if I change this function?" is a direct graph query. Follow all incoming edges to find callers, follow type edges to find dependent types. Exact answer, zero false positives.
Token efficiency: Typical relevance ratio of 0.65-0.85 (65-85% of retrieved tokens are useful), compared to 0.15-0.30 for embeddings.
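Impact analysis is a reverse traversal: invert the call edges and collect everything that can reach the changed symbol. A minimal sketch over a hand-written adjacency map (an assumption; a real index would produce it from parsing), using a hypothetical `impacted_by` function. The answer is only as complete as the edges the parser found.

```python
from collections import deque

def impacted_by(calls: dict[str, set[str]], changed: str) -> set[str]:
    """Return every symbol that directly or transitively calls `changed`."""
    callers: dict[str, set[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    impacted: set[str] = set()
    queue = deque([changed])
    while queue:
        for caller in callers.get(queue.popleft(), ()):
            if caller not in impacted:
                impacted.add(caller)
                queue.append(caller)
    return impacted

calls = {
    "api.create_order": {"orders.validate", "orders.save"},
    "orders.save": {"db.insert"},
    "jobs.import_orders": {"orders.save"},
    "reports.daily": {"db.query"},
}
print(sorted(impacted_by(calls, "db.insert")))
# ['api.create_order', 'jobs.import_orders', 'orders.save']
```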

Cons:

Requires structural parsing: Each language needs a parser (tree-sitter grammar, LSP server, or custom AST parser). Supporting 30 languages means maintaining 30 parsers.
No semantic understanding: Can't handle fuzzy queries like "how does auth work here?" without first identifying specific symbols. The graph knows relationships, not concepts.
Dynamic dispatch blind spots: In languages with heavy reflection, dynamic dispatch, or runtime code generation, static parsing misses some edges. Python's `getattr()`, Java's reflection, and JavaScript's `eval()` create invisible dependencies.
Initial indexing time: First-time parsing of a large codebase takes 5-30 seconds depending on size. Subsequent updates are fast, but cold starts require patience.
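The dynamic dispatch blind spot shows up with any static extractor. This sketch uses Python's `ast` module as a stand-in for a static parser, with a hypothetical `static_calls` helper. The direct `repo.save()` call is found. The same method invoked through `getattr()` with a runtime string yields only a call to `getattr` itself, with no edge to the method actually run.

```python
import ast

def static_calls(source: str) -> set[str]:
    """Names of functions/methods called directly, as a static parser sees them."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            if isinstance(node.func, ast.Name):
                found.add(node.func.id)       # plain call: helper()
            elif isinstance(node.func, ast.Attribute):
                found.add(node.func.attr)     # method call: obj.method()
            # A call whose target is itself a call result, e.g. getattr(...)(), names nothing.
    return found

SOURCE = '''
def direct(repo):
    repo.save()

def dynamic(repo, action):
    getattr(repo, action)()   # the real target is only known at runtime
'''
print(sorted(static_calls(SOURCE)))  # ['getattr', 'save']
```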

RAG: Pros and Cons

Pros:

Flexible architecture: Swap retrieval methods without changing the generation pipeline. Start with keyword search, upgrade to graph retrieval later.
Composable: Combine multiple retrieval methods (vector + graph + keyword) in a single pipeline, using each where it's strongest.
Generation optimization: The pipeline can compress, rerank, and filter retrieved results before passing them to the LLM, optimizing token usage.
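The filtering stage can be as simple as deduplicating, sorting by score, and packing results into a fixed token budget. A sketch assuming candidates arrive as `(chunk_id, score, token_count)` tuples from any mix of retrievers. The greedy packing policy and the `pack_context` name are illustrative choices, not a standard.

```python
def pack_context(candidates: list[tuple[str, float, int]], budget: int) -> list[str]:
    """Keep the best-scoring copy of each chunk, then greedily fill the token budget."""
    best: dict[str, tuple[float, int]] = {}
    for chunk_id, score, tokens in candidates:
        if chunk_id not in best or score > best[chunk_id][0]:
            best[chunk_id] = (score, tokens)
    packed, used = [], 0
    for chunk_id, (score, tokens) in sorted(best.items(), key=lambda kv: kv[1][0], reverse=True):
        if used + tokens <= budget:  # skip chunks that would overflow; smaller ones may still fit
            packed.append(chunk_id)
            used += tokens
    return packed

candidates = [
    ("auth.py#login", 0.92, 800),      # from vector search
    ("auth.py#login", 0.88, 800),      # same chunk again, from keyword search
    ("models/user.py", 0.75, 1500),
    ("session.py", 0.70, 600),
    ("utils/strings.py", 0.40, 300),
]
print(pack_context(candidates, budget=2000))
```

Greedy packing is easy to reason about but not optimal. Here the 1,500-token `models/user.py` is dropped in favour of two smaller, lower-scored chunks, which is one of the configuration decisions that makes these pipelines fiddly.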

Cons:

Only as good as retrieval: A RAG pipeline with poor retrieval produces poor results regardless of generation quality. "RAG" is not a quality guarantee — it's a pipeline label.
Pipeline complexity: Multiple stages (retrieval, reranking, compression, generation) means more failure points, more latency, and more configuration surface.
Infrastructure overhead: Full RAG pipelines often require a vector database, an embedding API, a reranking model, and orchestration logic. This is significant infrastructure compared to a single-binary indexer.

Head-to-Head Comparison

| Dimension | Embeddings | Dependency Graphs | RAG |
|--------|-----------|-------------------|-------------------|
| Speed (query) | 50-200ms | 10-50ms | 100-500ms (multi-stage) |

The pattern is clear: for modification tasks (bug fixes, refactors, feature additions), dependency graphs win on accuracy, speed, and token efficiency. For exploration tasks (onboarding, architecture understanding, finding examples), embeddings have an edge. RAG inherits the characteristics of whichever retrieval method powers it.

Hybrid Approaches: Combining Graph and Embeddings

The most sophisticated indexing systems combine structural and semantic retrieval.

Hybrid retrieval: Use semantic search to identify candidate entry-point symbols (handling fuzzy queries), then switch to graph traversal to expand those entry points into a structurally complete context. This gives you semantic understanding for query interpretation and structural precision for context assembly.
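The seed-then-expand flow can be sketched by composing a similarity function with a bounded graph walk. Both inputs are assumptions for illustration: `similarity` is any query-to-symbol scorer (an embedding model in practice, crude word overlap here), `graph` is an undirected symbol adjacency map, and `hybrid_retrieve` is a hypothetical name.

```python
import re
from collections import deque
from typing import Callable

def word_overlap(query: str, text: str) -> float:
    """Crude stand-in for embedding similarity: Jaccard overlap of word sets."""
    a = set(re.findall(r"[a-z]+", query.lower()))
    b = set(re.findall(r"[a-z]+", text.lower()))
    return len(a & b) / len(a | b) if a | b else 0.0

def hybrid_retrieve(query: str, docs: dict[str, str], graph: dict[str, set[str]],
                    similarity: Callable[[str, str], float], seeds: int = 1, depth: int = 1) -> list[str]:
    # Semantic step: choose entry-point symbols for a fuzzy query.
    entry = sorted(docs, key=lambda s: similarity(query, docs[s]), reverse=True)[:seeds]
    # Structural step: expand each entry point along dependency edges.
    dist = {s: 0 for s in entry}
    queue = deque(entry)
    while queue:
        node = queue.popleft()
        if dist[node] < depth:
            for nxt in graph.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
    return sorted(dist, key=lambda s: (dist[s], s))

docs = {
    "verify_token": "check the auth token signature and expiry",
    "TokenClaims": "dataclass holding user id and expiry",
    "send_email": "send a notification email",
}
graph = {"verify_token": {"TokenClaims"}, "TokenClaims": {"verify_token"}}
print(hybrid_retrieve("how does auth token expiry work", docs, graph, word_overlap))
```

`TokenClaims` barely matches the query text, but it comes back anyway because it is one hop from the semantically chosen entry point.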

Ranked fusion: Run both retrieval methods in parallel, merge the results, and rerank by a combined score. Structural proximity dominates for files close to the entry point; semantic similarity fills gaps for files the graph doesn't reach.
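One way to express "structural proximity dominates near the entry point, semantic similarity fills gaps" is a weighted score where the structural term decays with hop distance. The weights, the `1 / (1 + hops)` decay, and the `fused_rank` name are assumptions chosen for illustration. Reciprocal rank fusion is a common alternative when the two scores are not on comparable scales.

```python
def fused_rank(hops: dict[str, int], semantic: dict[str, float],
               w_struct: float = 0.7, w_sem: float = 0.3) -> list[tuple[str, float]]:
    """Merge graph distances and semantic scores (0..1) into one ranking."""
    scores = {}
    for c in set(hops) | set(semantic):
        # Structural term: 1.0 at the entry point, decaying per hop; 0 if the graph never reached it.
        structural = 1.0 / (1 + hops[c]) if c in hops else 0.0
        scores[c] = w_struct * structural + w_sem * semantic.get(c, 0.0)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

hops = {"billing.charge": 0, "billing.Invoice": 1, "payments.Gateway": 2}
semantic = {"billing.charge": 0.81, "docs.refund_policy": 0.78, "payments.Gateway": 0.35}
for name, score in fused_rank(hops, semantic):
    print(f"{score:.3f}  {name}")
```

`docs.refund_policy` is unreachable in the graph but still makes the list on semantic score alone, ranked below everything the traversal found nearby.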

The practical trade-off: Hybrid approaches produce the best results but add complexity. For teams that can invest in the infrastructure, they're optimal. For teams that want simplicity, a well-implemented graph-based approach covers 80-90% of task types effectively.

How vexp Uses Dependency Graphs

vexp takes the graph-first approach, building dependency graphs with tree-sitter parsing across 30 programming languages. The indexing pipeline is deliberately simple.

Parsing: Tree-sitter grammars extract symbols and references from every file. No external API calls, no embedding costs, no vector database. The entire index lives in a local SQLite database alongside a graph structure.

Incremental updates: When a file changes, only that file is re-parsed. The graph edges originating from that file are updated. A typical incremental update completes in under 100ms — fast enough to stay current during active development.
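A file-scoped update like this can be sketched with Python's built-in `sqlite3`. The schema (`symbols` and `edges` tables tagged with the owning file) and the `reindex_file` helper are hypothetical, not vexp's actual storage format. The point is that an edit rewrites only the rows the changed file contributed.

```python
import sqlite3

def open_index() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE symbols (name TEXT NOT NULL, file TEXT NOT NULL, PRIMARY KEY (file, name));
        CREATE TABLE edges (src TEXT NOT NULL, dst TEXT NOT NULL, file TEXT NOT NULL);
        CREATE INDEX edges_by_file ON edges(file);
    """)
    return db

def reindex_file(db: sqlite3.Connection, file: str,
                 symbols: list[str], edges: list[tuple[str, str]]) -> None:
    """Replace everything one file contributed; rows from other files are untouched."""
    with db:  # one transaction: rolled back if any statement fails
        db.execute("DELETE FROM symbols WHERE file = ?", (file,))
        db.execute("DELETE FROM edges WHERE file = ?", (file,))
        db.executemany("INSERT INTO symbols VALUES (?, ?)", [(s, file) for s in symbols])
        db.executemany("INSERT INTO edges VALUES (?, ?, ?)", [(a, b, file) for a, b in edges])

db = open_index()
reindex_file(db, "auth.py", ["login", "check"], [("login", "check")])
reindex_file(db, "api.py", ["handler"], [("handler", "login")])
# auth.py is edited so login also calls audit: only auth.py's rows are rewritten.
reindex_file(db, "auth.py", ["login", "check", "audit"], [("login", "check"), ("login", "audit")])
print(db.execute("SELECT src, dst FROM edges ORDER BY src, dst").fetchall())
```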

Retrieval: Task descriptions are analyzed to identify entry-point symbols through a hybrid of keyword matching, semantic similarity, and graph centrality (PageRank). From those entry points, the graph is traversed outward along dependency edges. Results are ranked by structural proximity and centrality.
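PageRank over a symbol graph can be computed with a few lines of power iteration. This is generic PageRank, not vexp's scoring, which the description above does not detail. It assumes the graph is an adjacency map of call edges, uses the conventional 0.85 damping factor, and spreads rank from nodes without outgoing edges evenly.

```python
def pagerank(graph: dict[str, set[str]], damping: float = 0.85, iters: int = 50) -> dict[str, float]:
    """Power iteration; an edge A -> B passes a share of A's rank to B."""
    nodes = set(graph) | {dst for dsts in graph.values() for dst in dsts}
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes with no outgoing edges is redistributed evenly.
        dangling = sum(rank[v] for v in nodes if not graph.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, dsts in graph.items():
            for dst in dsts:
                new[dst] += damping * rank[src] / len(dsts)
        rank = new
    return rank

calls = {
    "handler_a": {"validate"}, "handler_b": {"validate"},
    "handler_c": {"validate", "log"}, "validate": {"log"},
}
ranks = pagerank(calls)
print(max(ranks, key=ranks.get))  # 'log': every call path funnels into it
```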

Result: Context capsules of 5-15 files with a relevance ratio of 0.65-0.85, delivered in a single MCP tool call. The typical token reduction compared to agent-driven exploration is 65-70%.
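Both metrics are simple ratios. The sketch below assumes per-file token counts and a record of which retrieved files the final change actually needed. The function names are hypothetical, and the figures in the usage lines are invented for illustration, not vexp measurements.

```python
def relevance_ratio(retrieved: dict[str, int], used: set[str]) -> float:
    """Fraction of retrieved tokens that belong to files the task actually needed."""
    total = sum(retrieved.values())
    useful = sum(tokens for f, tokens in retrieved.items() if f in used)
    return useful / total if total else 0.0

def token_reduction(baseline_tokens: int, indexed_tokens: int) -> float:
    """Share of tokens saved relative to agent-driven exploration."""
    return 1 - indexed_tokens / baseline_tokens

capsule = {"orders.py": 4000, "models.py": 2500, "db.py": 1500, "utils.py": 2000}
needed = {"orders.py", "models.py", "db.py"}
print(f"relevance ratio: {relevance_ratio(capsule, needed):.2f}")                 # 0.80
print(f"token reduction: {token_reduction(30_000, sum(capsule.values())):.0%}")   # 67%
```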

The graph approach trades semantic flexibility for structural precision. For the modification tasks that constitute 80%+ of professional development work, this trade-off is overwhelmingly favorable.

Choosing the Right Approach for Your Codebase

Choose embeddings if:

Your codebase uses languages without strong tree-sitter support
Most of your AI tasks are exploratory (understanding, not modifying)
You already have vector database infrastructure
Your codebase is under 50K lines (where scale degradation isn't an issue)

Choose dependency graphs if:

Your tasks are primarily modification (bugs, features, refactors)
Your codebase is large (50K+ lines) and growing
Token cost is a concern (graph retrieval wastes 3-5x fewer tokens)
You want zero external dependencies (no API calls, no vector databases)
You need impact analysis ("what breaks if I change this?")

Choose RAG with hybrid retrieval if:

You have the engineering resources to build and maintain a multi-stage pipeline
Your tasks are evenly split between exploration and modification
You need maximum flexibility across different query types

For most development teams, graph-based indexing provides the best cost-to-quality ratio. It handles the majority of tasks (modification) with high precision, requires minimal infrastructure, and scales without accuracy degradation. Semantic capabilities can be layered on top when needed, rather than serving as the foundation.

Frequently Asked Questions

What is code indexing and why do AI coding agents need it?

Code indexing pre-organizes a codebase so AI agents can find relevant code without reading every file. Without indexing, agents explore file by file, wasting 60-70% of tokens on irrelevant reads. Indexing acts as a catalog that directs the agent to the right files immediately, reducing token costs by 50-70% and improving output accuracy by eliminating noise from the context window.

What are the main differences between embeddings and dependency graphs for code indexing?

Embeddings convert code into vectors capturing semantic similarity — good for finding conceptually related code but blind to structural relationships like function calls and type dependencies. Dependency graphs map actual code relationships (calls, imports, type references) — excellent for structural relevance but unable to handle fuzzy natural-language queries. Embeddings degrade past 50K lines of code; dependency graphs maintain accuracy at any scale.

Which code indexing approach is most token-efficient for AI coding?

Dependency graph indexing is the most token-efficient, achieving a relevance ratio of 0.65-0.85 (only 15-35% token waste). Embedding-based approaches typically achieve 0.15-0.30 relevance ratios (70-85% waste). For a developer processing 500K tokens daily, graph-based indexing saves roughly $3-5/day in token costs compared to embedding-based retrieval.

Can I combine embeddings and dependency graphs for code retrieval?

Yes, hybrid approaches use semantic search to interpret fuzzy queries and identify candidate symbols, then switch to graph traversal for structurally complete context assembly. This combines semantic flexibility with structural precision. However, hybrid systems add complexity. For most teams, a well-implemented graph-based approach covers 80-90% of tasks effectively, and semantic capabilities can be layered on incrementally.

How does vexp's code indexing compare to vector-based RAG systems?

vexp uses dependency graph indexing with tree-sitter parsing across 30 languages. It produces context capsules of 5-15 files with 0.65-0.85 relevance ratios, compared to 0.15-0.30 for typical vector RAG. It requires no external APIs, no vector database, and no embedding costs — the entire index is a local file that updates incrementally in under 100ms. The trade-off is that vexp prioritizes structural precision over semantic exploration, which matches the modification-heavy nature of most professional development work.

Nicola

Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.
