Code Indexing for AI Agents: Embeddings vs Dependency Graphs vs RAG

Your AI coding agent is only as good as its ability to find the right code. On a 10K-line project, it can brute-force — read every file, hold the whole codebase in context. On a 200K-line monorepo, brute-force fails. The context window fills with noise, the model hallucinates nonexistent functions, and each task burns $2-5 in tokens because the agent reads 40 files to modify 3.
Code indexing solves this by pre-organizing the codebase so the agent can find relevant code without reading everything. But not all indexing approaches are equal. Embeddings, dependency graphs, and RAG pipelines each have fundamentally different trade-offs in accuracy, speed, maintenance cost, and token efficiency.
Choosing the wrong indexing strategy can cost you 30-60% more tokens and measurably degrade output quality. Here's how each approach works, where it excels, and where it breaks down.
Why Code Indexing Matters
Without indexing, an AI agent navigating a large codebase does the equivalent of reading every book in a library to answer a question about one chapter. It searches filenames, greps for keywords, reads directory trees, and explores file by file until it finds what it needs.
This exploration is expensive. With Claude Code billed through the API, a single task that triggers 15-20 file reads burns $1.50-3.00 in tokens, and 60-70% of those reads are files the agent checked and discarded as irrelevant. On Cursor Pro, the same exploration consumes fast requests that could be spent on actual code generation.
Indexing is the library catalog. It tells the agent where to look before it starts reading, reducing exploration to near-zero and token waste to the minimum.
The question isn't whether to index. It's how.
Three Indexing Approaches Explained
Embeddings: Vector Representations of Code
Embedding-based indexing converts code into high-dimensional vectors using a neural encoder model. Each chunk of code (typically a function, class, or fixed-size block) becomes a point in vector space where semantically similar code is geometrically close.
How it works:
- Parse the codebase into chunks (by function, class, or fixed-size blocks with overlap)
- Run each chunk through an embedding model (e.g., OpenAI `text-embedding-3-large`, Voyage Code 3)
- Store vectors in a vector database (Pinecone, Qdrant, Chroma, or a local FAISS index)
- At query time, embed the user's task description, find the k nearest vectors, return those chunks
The intuition: Code that does similar things produces similar vectors. A query about "user authentication" retrieves code chunks related to authentication, even if they use different variable names or terminology.
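The four steps above can be sketched in a few lines. This is an illustration of the pipeline's shape, not a production setup: `toy_embed` is a hypothetical stand-in that only counts words, where a real system would call an embedding model, and a plain dictionary stands in for the vector database. Because the stand-in is word-based, it does not show the semantic matching that a neural encoder provides.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words count.
    A real encoder would return a dense float vector instead."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def top_k(query: str, index: dict[str, Counter], k: int = 2) -> list[str]:
    """Embed the query and return the names of the k most similar chunks."""
    q = toy_embed(query)
    ranked = sorted(index, key=lambda name: cosine(q, index[name]), reverse=True)
    return ranked[:k]

# Steps 1-3: chunk by function, embed each chunk, store the vectors.
chunks = {
    "auth.login": "def login(user, password): verify password hash for user",
    "auth.logout": "def logout(session): invalidate user session token",
    "billing.charge": "def charge(card, amount): submit payment amount to gateway",
}
index = {name: toy_embed(src) for name, src in chunks.items()}

# Step 4: embed the task description and fetch the nearest chunks.
print(top_k("verify user password", index, k=2))
# ['auth.login', 'auth.logout']
```

Note that the ranking always returns k results, whether or not k relevant chunks exist. That property is the source of the false positives discussed later.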
Dependency Graphs: Structural Relationship Maps
Dependency graph indexing parses code to extract symbols (functions, classes, types, modules) and their structural relationships (calls, imports, type references, inheritance). The result is a directed graph where nodes are code entities and edges are dependencies.
How it works:
- Parse every file with a structural parser (tree-sitter, LSP, or language-specific AST tools)
- Extract all symbols: function declarations, class definitions, type aliases, variable declarations
- Resolve references: when function A calls function B, create an edge A → B
- Build a graph with metadata (file location, symbol kind, scope, visibility)
- At query time, identify entry-point symbols and traverse the graph outward along dependency edges
The intuition: Code relevance follows structural connections. If you're modifying a function, its callers, callees, and shared types are relevant — regardless of whether they look textually similar.
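As a minimal sketch of these steps, the example below uses Python's standard `ast` module in place of tree-sitter or an LSP server, and tracks only direct calls between top-level functions in one file. The sample source and the function names `build_call_graph` and `traverse` are illustrative assumptions. A real indexer would also resolve imports, types, and methods across files.

```python
import ast
from collections import deque

SOURCE = '''
def validate(token):
    return decode(token)

def decode(token):
    return token[::-1]

def handler(request):
    return validate(request)

def unrelated():
    return 42
'''

def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the plain names it calls directly."""
    graph: dict[str, set[str]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            graph[node.name] = {
                call.func.id
                for call in ast.walk(node)
                if isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            }
    return graph

def traverse(graph: dict[str, set[str]], entry: str, max_depth: int = 2) -> list[str]:
    """Breadth-first walk from an entry point along call edges, bounded by depth."""
    seen = {entry}
    order = [entry]
    queue = deque([(entry, 0)])
    while queue:
        symbol, depth = queue.popleft()
        if depth == max_depth:
            continue
        for callee in sorted(graph.get(symbol, ())):
            if callee in graph and callee not in seen:
                seen.add(callee)
                order.append(callee)
                queue.append((callee, depth + 1))
    return order

graph = build_call_graph(SOURCE)
print(traverse(graph, "handler"))
# ['handler', 'validate', 'decode']
```

`unrelated` never appears in the result because no edge leads to it from the entry point, however similar its text might look to the other functions.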
RAG: Retrieval Pipelines Combining Search and Generation
RAG (Retrieval-Augmented Generation) is not an indexing method per se — it's a pipeline architecture that combines any retrieval method with generation. The retrieval component can use embeddings, keyword search, graph traversal, or any combination.
How it works:
- Index the codebase using one or more retrieval methods
- When a task arrives, retrieve relevant code chunks using the retrieval method
- Assemble retrieved chunks into a context prompt
- Feed context + task to the LLM for generation
The intuition: Separate retrieval from generation. Let specialized retrieval methods find the right code, then let the LLM focus on generating the right output.
The key insight is that RAG's quality depends entirely on its retrieval method. A RAG pipeline using vector retrieval inherits vector retrieval's limitations. A RAG pipeline using graph retrieval inherits graph retrieval's strengths. RAG is an architecture, not a solution — the retrieval method inside it determines the outcome.
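The separation between retrieval and generation can be sketched as a function that accepts any retriever. Everything here is a hypothetical illustration: the `CODEBASE` dictionary, the two retrievers, and `build_prompt` are made-up names, and the final LLM call is left out so the example stays runnable without an API.

```python
import re
from typing import Callable

# A retriever maps a task description to an ordered list of file names.
Retriever = Callable[[str], list[str]]

CODEBASE = {
    "auth.py": "def login(user): return check(user)",
    "billing.py": "def charge(card): return gateway(card)",
}

def keyword_retriever(task: str) -> list[str]:
    """Return files that share at least one word with the task description."""
    words = set(re.findall(r"\w+", task.lower()))
    return [
        name
        for name, src in CODEBASE.items()
        if words & set(re.findall(r"\w+", src.lower()))
    ]

def fixed_retriever(task: str) -> list[str]:
    """Placeholder for a graph or vector retriever; always returns billing.py."""
    return ["billing.py"]

def build_prompt(task: str, retrieve: Retriever, max_chunks: int = 3) -> str:
    """Retrieve chunks, assemble them into context, and append the task."""
    names = retrieve(task)[:max_chunks]
    context = "\n\n".join(f"# {name}\n{CODEBASE[name]}" for name in names)
    return f"Context:\n{context}\n\nTask: {task}"

# Same pipeline, different retrieval method, different context.
print(build_prompt("fix the login bug", keyword_retriever))
print(build_prompt("fix the login bug", fixed_retriever))
```

The second prompt contains only billing code for a login task. The generation step would receive it unchanged, which is why retrieval quality sets the ceiling for the whole pipeline.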
Embeddings: Pros and Cons
Pros:
- Semantic understanding: Captures conceptual similarity beyond keyword matching. "User login validation" can match code labeled "credential verification."
- Language-agnostic: The same embedding model works across programming languages without language-specific parsers.
- Natural-language queries: Handles queries like "how does caching work?" that don't map to specific symbols.
- Mature ecosystem: Well-established vector databases, embedding models, and tooling.
Cons:
- No structural awareness: Embeddings don't encode call relationships, type dependencies, or import graphs. Two structurally unrelated functions with similar variable names produce similar embeddings.
- Chunk boundary problems: Functions split across chunks lose context. A class constructor in chunk 1 and the method with the bug in chunk 3 are retrieved independently, breaking the relationship.
- Re-embedding cost: Every code change requires re-embedding the affected chunks. For active codebases with frequent commits, this creates index staleness or continuous re-embedding costs ($0.01-0.05 per 1M tokens, adding up across large codebases).
- Scale degradation: As codebases grow, the embedding space gets crowded. Top-k retrieval captures a decreasing fraction of relevant code. Accuracy drops noticeably past 50K lines.
- False positives: Similar-looking code is not necessarily relevant code. Twenty API handlers produce similar embeddings; retrieving 5 of them for a bug in one handler returns 4 irrelevant results.
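The chunk boundary problem is easy to reproduce. The sketch below compares fixed-size line chunks with overlap against chunks cut on function boundaries, using Python's `ast` module. The `Cart` class, the chunk size of 4 lines, and the overlap of 1 line are arbitrary values chosen for the illustration.

```python
import ast

SOURCE = """class Cart:
    def __init__(self):
        self.items = []

    def add(self, item):
        self.items.append(item)

    def total(self):
        return sum(i.price for i in self.items)
"""

def fixed_chunks(source: str, size: int = 4, overlap: int = 1) -> list[str]:
    """Split source into windows of `size` lines, each sharing `overlap` lines."""
    lines = source.splitlines()
    step = size - overlap
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), step)]

def function_chunks(source: str) -> dict[str, str]:
    """Split source on function boundaries so each chunk is a complete function."""
    tree = ast.parse(source)
    return {
        node.name: ast.get_source_segment(source, node)
        for node in ast.walk(tree)
        if isinstance(node, ast.FunctionDef)
    }

windows = fixed_chunks(SOURCE)
print(len(windows))                      # 3
print("__init__" in windows[0])          # True
print("def total" in windows[0])         # False: constructor and total() are separated
print(sorted(function_chunks(SOURCE)))   # ['__init__', 'add', 'total']
```

Function-level chunks keep each method whole, but they still carry no record that `total` depends on the `items` list created in `__init__`. Preserving that link requires the structural information a dependency graph stores.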
Dependency Graphs: Pros and Cons
Pros:
- Exact structural relevance: Retrieves code connected by actual dependencies — callers, callees, type references, imports. Every returned file has a structural reason for inclusion.
- Fast incremental updates: Only re-parse files that changed. A single file edit updates the graph in milliseconds, not minutes. No re-embedding cost.
- Scale-independent accuracy: Graph traversal accuracy doesn't degrade with codebase size. A 500K-line codebase retrieves with the same precision as a 10K-line project, because each query walks a bounded number of hops from its entry points instead of searching the whole codebase.
- Impact analysis: "What breaks if I change this function?" is a direct graph query. Follow all incoming edges to find callers, follow type edges to find dependent types. Exact answer, zero false positives.
- Token efficiency: Typical relevance ratio of 0.65-0.85 (65-85% of retrieved tokens are useful), compared to 0.15-0.30 for embeddings.
Cons:
- Requires structural parsing: Each language needs a parser (tree-sitter grammar, LSP server, or custom AST parser). Supporting 30 languages means maintaining 30 parsers.
- No semantic understanding: Can't handle fuzzy queries like "how does auth work here?" without first identifying specific symbols. The graph knows relationships, not concepts.
- Dynamic dispatch blind spots: In languages with heavy reflection, dynamic dispatch, or runtime code generation, static parsing misses some edges. Python's `getattr()`, Java's reflection, and JavaScript's `eval()` create invisible dependencies.
- Initial indexing time: First-time parsing of a large codebase takes 5-30 seconds depending on size. Subsequent updates are fast, but cold starts require patience.
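The impact analysis query mentioned in the pros list amounts to walking the graph backwards. Here is a minimal sketch under the assumption that the call graph is already available as a dictionary of caller to callees. The symbol names are invented for the example.

```python
from collections import deque

# Hypothetical call graph: each key calls the symbols in its value set.
CALLS = {
    "api_handler": {"validate_order"},
    "batch_job": {"validate_order"},
    "validate_order": {"parse_price"},
    "parse_price": set(),
    "render_footer": set(),
}

def reverse_edges(calls: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert caller -> callee edges into callee -> callers."""
    callers: dict[str, set[str]] = {name: set() for name in calls}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    return callers

def impacted_by(symbol: str, calls: dict[str, set[str]]) -> set[str]:
    """Return every symbol that directly or transitively calls `symbol`."""
    callers = reverse_edges(calls)
    seen: set[str] = set()
    queue = deque([symbol])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

print(sorted(impacted_by("parse_price", CALLS)))
# ['api_handler', 'batch_job', 'validate_order']
```

The answer is only as complete as the edges in the graph. Calls made through reflection or `getattr()` would be absent from `CALLS`, so their callers would be missed here as well.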
RAG: Pros and Cons
Pros:
- Flexible architecture: Swap retrieval methods without changing the generation pipeline. Start with keyword search, upgrade to graph retrieval later.
- Composable: Combine multiple retrieval methods (vector + graph + keyword) in a single pipeline, using each where it's strongest.
- Generation optimization: The pipeline can compress, rerank, and filter retrieved results before passing them to the LLM, optimizing token usage.
Cons:
- Only as good as retrieval: A RAG pipeline with poor retrieval produces poor results regardless of generation quality. "RAG" is not a quality guarantee — it's a pipeline label.
- Pipeline complexity: Multiple stages (retrieval, reranking, compression, generation) means more failure points, more latency, and more configuration surface.
- Infrastructure overhead: Full RAG pipelines often require a vector database, an embedding API, a reranking model, and orchestration logic. This is significant infrastructure compared to a single-binary indexer.
Head-to-Head Comparison
| Metric | Embeddings | Dependency Graphs | RAG Pipeline |
|--------|-----------|-------------------|-------------------|
| Accuracy (structural tasks) | Low (0.15-0.30 relevance) | High (0.65-0.85 relevance) | Depends on retrieval method |
| Accuracy (exploratory tasks) | High | Low without semantic layer | High with vector retrieval |
| Speed (query) | 50-200ms | 10-50ms | 100-500ms (multi-stage) |
| Speed (incremental update) | Slow (re-embed chunks) | Fast (re-parse changed files) | Depends on indexing method |
| Maintenance cost | Embedding API costs, vector DB | Parser maintenance | All of the above |
| Token efficiency | Poor (70-85% waste) | Excellent (15-35% waste) | Depends on retrieval method |
| Infrastructure | Vector DB + embedding API | Local binary, no dependencies | Vector DB + embedding API + orchestration |
| Scale behavior | Degrades past 50K lines | Consistent at any scale | Depends on retrieval method |
The pattern is clear: for modification tasks (bug fixes, refactors, feature additions), dependency graphs win on accuracy, speed, and token efficiency. For exploration tasks (onboarding, architecture understanding, finding examples), embeddings have an edge. RAG inherits the characteristics of whichever retrieval method powers it.
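The relevance ratios in the table convert directly into token budgets. The arithmetic below is a rough sketch that assumes the ratio is simply useful tokens divided by retrieved tokens. The figure of 10,000 useful tokens per task is an arbitrary example value, not a measurement.

```python
import math

def wasted_share(relevance_ratio: float) -> float:
    """Fraction of retrieved tokens that are not useful."""
    return round(1.0 - relevance_ratio, 2)

def tokens_retrieved(useful_tokens: int, relevance_ratio: float) -> int:
    """Total tokens that must be retrieved to deliver `useful_tokens`."""
    return math.ceil(useful_tokens / relevance_ratio)

ranges = [("embeddings", 0.15, 0.30), ("dependency graph", 0.65, 0.85)]
for label, low, high in ranges:
    waste = f"{wasted_share(high):.0%}-{wasted_share(low):.0%}"
    budget = f"{tokens_retrieved(10_000, high):,}-{tokens_retrieved(10_000, low):,}"
    print(f"{label}: waste {waste}, tokens retrieved for 10,000 useful: {budget}")
```

Under these assumptions, delivering the same useful context takes roughly 33,000 to 67,000 retrieved tokens at a 0.15-0.30 ratio, against roughly 12,000 to 15,000 at a 0.65-0.85 ratio.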
Hybrid Approaches: Combining Graph and Embeddings
The most sophisticated indexing systems combine structural and semantic retrieval.
Hybrid retrieval: Use semantic search to identify candidate entry-point symbols (handling fuzzy queries), then switch to graph traversal to expand those entry points into a structurally complete context. This gives you semantic understanding for query interpretation and structural precision for context assembly.
Ranked fusion: Run both retrieval methods in parallel, merge the results, and rerank by a combined score. Structural proximity dominates for files close to the entry point; semantic similarity fills gaps for files the graph doesn't reach.
The practical tradeoff: Hybrid approaches produce the best results but add complexity. For teams that can invest in the infrastructure, they're optimal. For teams that want simplicity, a well-implemented graph-based approach covers 80-90% of task types effectively.
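One way to picture ranked fusion is a weighted sum of a structural score and a semantic score. The sketch below is an assumption-laden illustration: the `1 / (1 + distance)` proximity formula, the 0.7 graph weight, and the sample files and scores are all invented for the example, and real systems may use other schemes such as reciprocal rank fusion.

```python
def fuse(
    graph_distance: dict[str, int],
    semantic_score: dict[str, float],
    graph_weight: float = 0.7,
) -> list[str]:
    """Rank files by a weighted mix of structural proximity and semantic similarity.

    graph_distance: hops from the entry point (0 = the entry point's own file).
    semantic_score: similarity to the query, assumed to be in [0, 1].
    """
    combined: dict[str, float] = {}
    for name in set(graph_distance) | set(semantic_score):
        # Files the graph did not reach get no structural credit.
        structural = 1.0 / (1 + graph_distance[name]) if name in graph_distance else 0.0
        semantic = semantic_score.get(name, 0.0)
        combined[name] = graph_weight * structural + (1 - graph_weight) * semantic
    # Sort by score descending, then by name for a deterministic order.
    return sorted(combined, key=lambda name: (-combined[name], name))

graph_distance = {"orders.py": 0, "pricing.py": 1, "db.py": 2}
semantic_score = {"pricing.py": 0.4, "docs_orders.md": 0.9, "db.py": 0.1}

print(fuse(graph_distance, semantic_score))
# ['orders.py', 'pricing.py', 'docs_orders.md', 'db.py']
```

Files near the entry point rank first on structural proximity alone, while `docs_orders.md`, which the graph never reaches, still enters the result on the strength of its semantic score.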
How vexp Uses Dependency Graphs
vexp takes the graph-first approach, building dependency graphs with tree-sitter parsing across 30 programming languages. The indexing pipeline is deliberately simple.
Parsing: Tree-sitter grammars extract symbols and references from every file. No external API calls, no embedding costs, no vector database. The entire index lives in a local SQLite database alongside a graph structure.
Incremental updates: When a file changes, only that file is re-parsed. The graph edges originating from that file are updated. A typical incremental update completes in under 100ms — fast enough to stay current during active development.
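The general idea of an incremental update can be sketched as follows. This is not vexp's implementation: it is a generic illustration that uses Python's `ast` module instead of tree-sitter, an in-memory dictionary instead of SQLite, and a content hash to detect changes. The `IncrementalIndex` class and its fields are hypothetical.

```python
import ast
import hashlib

class IncrementalIndex:
    """Keeps call edges per file and re-parses a file only when its content changes."""

    def __init__(self) -> None:
        self.hashes: dict[str, str] = {}                  # path -> content hash
        self.edges: dict[str, set[tuple[str, str]]] = {}  # path -> (caller, callee) pairs
        self.parse_count = 0

    def update(self, path: str, source: str) -> bool:
        """Re-index one file. Returns False if the content is unchanged."""
        digest = hashlib.sha256(source.encode()).hexdigest()
        if self.hashes.get(path) == digest:
            return False
        self.hashes[path] = digest
        # Replace only the edges that originate from this file.
        self.edges[path] = self._extract_edges(source)
        self.parse_count += 1
        return True

    @staticmethod
    def _extract_edges(source: str) -> set[tuple[str, str]]:
        """Collect (caller, callee) pairs for direct calls in top-level functions."""
        edges: set[tuple[str, str]] = set()
        for node in ast.parse(source).body:
            if isinstance(node, ast.FunctionDef):
                for call in ast.walk(node):
                    if isinstance(call, ast.Call) and isinstance(call.func, ast.Name):
                        edges.add((node.name, call.func.id))
        return edges

index = IncrementalIndex()
index.update("a.py", "def f():\n    return g()\n")
index.update("b.py", "def g():\n    return 1\n")
index.update("a.py", "def f():\n    return g()\n")   # unchanged, skipped
index.update("a.py", "def f():\n    return h()\n")   # changed, only a.py re-parsed
print(index.parse_count, index.edges["a.py"])
# 3 {('f', 'h')}
```

The work per edit is proportional to the size of the changed file, not the size of the codebase, which is what keeps updates fast during active development.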
Retrieval: Task descriptions are analyzed to identify entry-point symbols through a hybrid of keyword matching, semantic similarity, and graph centrality (PageRank). From those entry points, the graph is traversed outward along dependency edges. Results are ranked by structural proximity and centrality.
Result: Context capsules of 5-15 files with a relevance ratio of 0.65-0.85, delivered in a single MCP tool call. The typical token reduction compared to agent-driven exploration is 65-70%.
The graph approach trades semantic flexibility for structural precision. For the modification tasks that constitute 80%+ of professional development work, this trade-off is overwhelmingly favorable.
Choosing the Right Approach for Your Codebase
Choose embeddings if:
- Your codebase uses languages without strong tree-sitter support
- Most of your AI tasks are exploratory (understanding, not modifying)
- You already have vector database infrastructure
- Your codebase is under 50K lines (where scale degradation isn't an issue)
Choose dependency graphs if:
- Your tasks are primarily modification (bugs, features, refactors)
- Your codebase is large (50K+ lines) and growing
- Token cost is a concern (graph retrieval wastes 3-5x fewer tokens)
- You want zero external dependencies (no API calls, no vector databases)
- You need impact analysis ("what breaks if I change this?")
Choose RAG with hybrid retrieval if:
- You have the engineering resources to build and maintain a multi-stage pipeline
- Your tasks are evenly split between exploration and modification
- You need maximum flexibility across different query types
For most development teams, graph-based indexing provides the best cost-to-quality ratio. It handles the majority of tasks (modification) with high precision, requires minimal infrastructure, and scales without accuracy degradation. Semantic capabilities can be layered on top when needed, rather than serving as the foundation.
Nicola
Developer and creator of vexp — a context engine for AI coding agents. I build tools that make AI coding assistants faster, cheaper, and actually useful on real codebases.