# RLM: A Claude Code Skill for Processing Massive Documents
I created this /rlm skill after seeing this post: https://x.com/godofprompt/status/2011850737354228039. I then read the paper it references, https://arxiv.org/pdf/2512.24601, to work out how to put the idea to use in Claude Code.
## The Context Window Problem
LLMs have a fundamental constraint: context windows. Even with 200K tokens, a large PDF can eat through that budget fast. Load a document, ask a question, and you've already burned most of your reasoning capacity.
>"The key insight is that long prompts should not be fed into the neural network directly but should instead be treated as part of the environment that the LLM can symbolically interact with."
The RLM paradigm flips the approach: instead of cramming documents into context, treat them as external environments the LLM can query programmatically.— Recursive Language Models, Zhang et al., MIT CSAIL
## The Architecture
```
┌─────────────────────────────────────────────────┐
│                  USER QUESTION                  │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                 ROOT RLM AGENT                  │
│  • Loads document as Python variable            │
│  • Examines structure (pages, lines, chars)     │
│  • Decides processing strategy                  │
└─────────────────────────────────────────────────┘
        │                │                │
        ▼                ▼                ▼
  ┌───────────┐    ┌───────────┐    ┌───────────┐
  │ Sub-Agent │    │ Sub-Agent │    │ Sub-Agent │
  │  Chunk 1  │    │  Chunk 2  │    │  Chunk N  │
  └───────────┘    └───────────┘    └───────────┘
        │                │                │
        └────────────────┼────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│               AGGREGATED RESPONSE               │
└─────────────────────────────────────────────────┘
```

The root agent never loads the full document into context. It uses Python to peek at metadata, search for patterns, and extract only relevant sections. For complex queries requiring full document analysis, it spawns parallel sub-agents that each process a chunk.
## Key Components
| Component | Purpose |
|---|---|
| rlm_loader.py | Python utilities for document manipulation |
| SKILL.md | Skill definition with workflow instructions |
| pdfplumber | PDF text extraction library |
| Task tool | Claude Code's sub-agent spawning mechanism |
## The Document Loader
The loader provides programmatic access without loading content into context:
```python
from rlm_loader import load_document

doc = load_document('/path/to/massive.pdf')

# Metadata only - doesn't load content into LLM context
doc.get_info()  # → {pages: 500, lines: 15000, chars: 2000000}

# Targeted extraction
doc.get_page(42)                   # Single page
doc.get_lines(100, 150)            # Line range
doc.search("revenue Q3")           # Keyword search with context
doc.search(r"\$\d+M", regex=True)  # Regex search

# Chunking strategies
doc.chunk_by_pages()      # For PDFs
doc.chunk_by_chars(4000)  # For any document
doc.chunk_by_lines(100)   # Line-based chunks
```

The critical insight: the LLM writes code that executes against the document, receiving only the results. The full document never enters the context window.
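For a sense of what sits behind `get_info()`, here is a minimal pdfplumber-backed sketch. The dict shape mirrors the call above, but this is not the skill's actual source:

```python
import pdfplumber

def get_info(path: str) -> dict:
    """Collect document stats without handing the text to the LLM."""
    with pdfplumber.open(path) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
        n_pages = len(pdf.pages)
    # Only this small dict is printed back into the model's context.
    return {"pages": n_pages, "lines": text.count("\n") + 1, "chars": len(text)}
```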
## Processing Strategies
The skill selects a strategy based on the question type:
### Targeted Search (Constant Cost)
For specific information lookup:
# User: "What was the Q3 revenue?"
doc.search("Q3 revenue")
# → Returns matches with surrounding context
# → LLM answers from small context snippet
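Under the hood, a keyword search is just a scan over raw text held in Python. A minimal sketch of how such a `search` could work; the `window` parameter and the result shape are assumptions, modeled on the example session later in this post:

```python
import re

def search(text: str, query: str, regex: bool = False,
           window: int = 80) -> list[dict]:
    """Return each match with `window` characters of surrounding context."""
    pattern = query if regex else re.escape(query)
    return [
        {
            "match": m.group(0),
            "position": m.start(),
            "context": text[max(0, m.start() - window):m.end() + window],
        }
        for m in re.finditer(pattern, text, re.IGNORECASE)
    ]
```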
### Section Read (Constant Cost)
For questions about specific parts:
# User: "Summarize chapter 5"
doc.getpage(45) # Chapter 5 starts on page 45
doc.getlines(2000, 2500) # Or by line range### Parallel Chunks (Linear Cost)
For aggregation tasks requiring full document analysis:
```python
chunks = doc.chunk_by_pages()
# Spawn sub-agents in parallel for each chunk
# Each sub-agent receives: chunk + specific question
# Root agent aggregates all responses
```
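Chunking itself is mechanical. A sketch of the character-based variant; the overlap is my own addition so that facts straddling a boundary survive:

```python
def chunk_by_chars(text: str, size: int = 4000,
                   overlap: int = 200) -> list[str]:
    """Split text into ~size-char pieces with a small overlap between them."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```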
### Recursive Decomposition (Linear to Quadratic)
For complex multi-hop reasoning:
```python
# Break question into sub-questions
# Process each (may involve further sub-agent calls)
# Synthesize from all findings
```
## Sub-Agent Protocol
Sub-agents are stateless. They receive a prompt like:
```
Given this excerpt from [DOCUMENT]:

[CHUNK CONTENT - 4000 chars]

Extract all mentions of financial metrics with their values.
Respond with structured JSON.
```
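Since each sub-agent is stateless, the root agent just instantiates that template once per chunk. A hypothetical helper (the names are mine, not the skill's):

```python
PROMPT_TEMPLATE = """Given this excerpt from {doc_name}:

{chunk}

{question}
Respond with structured JSON."""

def build_prompts(doc_name: str, chunks: list[str], question: str) -> list[str]:
    """One self-contained prompt per chunk; no shared state between agents."""
    return [PROMPT_TEMPLATE.format(doc_name=doc_name, chunk=c, question=question)
            for c in chunks]
```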
The root agent launches multiple sub-agents in a single response. Claude Code's Task tool runs them concurrently:
```python
# In SKILL.md workflow:
# 1. Chunk document
# 2. Launch N Task calls in parallel (single response)
# 3. Wait for all results
# 4. Aggregate into final answer
```
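The aggregation step is plain JSON merging on the root side. A sketch, assuming each sub-agent replies with a JSON array of findings (the schema is whatever the prompt requested):

```python
import json

def aggregate(responses: list[str]) -> list[dict]:
    """Merge sub-agent JSON replies, skipping any that failed to parse."""
    findings: list[dict] = []
    for raw in responses:
        try:
            findings.extend(json.loads(raw))
        except json.JSONDecodeError:
            continue  # one malformed reply shouldn't sink the whole batch
    return findings
```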
## Installation
Clone the skill to your Claude Code skills directory:
```bash
git clone https://github.com/joshideas/rlm ~/.claude/skills/rlm
cd ~/.claude/skills/rlm
python3 -m venv .venv
.venv/bin/pip install pdfplumber
```

The skill auto-activates when you invoke it with /rlm:

```
/rlm /path/to/document.pdf "What are the key findings?"
```

## Supported Formats
- PDF (via pdfplumber with page-aware chunking)
- Plain text (.txt, .log)
- Markdown (.md)
- Code files (.py, .js, .ts, etc.)
- Data formats (.json, .xml, .csv, .html)
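Dispatch across these formats plausibly keys on file extension, with only PDFs needing a real parser. A hypothetical sketch (the extension table is mine, not the skill's):

```python
from pathlib import Path

import pdfplumber

# Hypothetical extension table; the skill's real dispatch may differ.
TEXT_EXTENSIONS = {".txt", ".log", ".md", ".py", ".js", ".ts",
                   ".json", ".xml", ".csv", ".html"}

def read_text(path: str) -> str:
    """Load any supported format as plain text."""
    p = Path(path)
    if p.suffix.lower() == ".pdf":
        with pdfplumber.open(p) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if p.suffix.lower() in TEXT_EXTENSIONS:
        return p.read_text(encoding="utf-8", errors="replace")
    raise ValueError(f"Unsupported format: {p.suffix}")
```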
## Performance Characteristics
From the original paper's benchmarks:
| Metric | RLM | Baseline |
|---|---|---|
| BrowseComp+ accuracy | 91.33% | 70.47% (summarization) |
| OOLONG-Pairs F1 | 58.00% | 0.04% (base GPT-5) |
| Scale | 10M+ tokens | ~200K tokens |
| Cost | Comparable/lower | Standard |
## Example Session
User: /rlm ~/docs/annual-report-2025.pdf "What was the Q3 revenue?"
Claude: Let me load and examine this document.
```
$ ~/.claude/skills/rlm/.venv/bin/python ~/.claude/skills/rlm/rlm_loader.py \
    ~/docs/annual-report-2025.pdf info
{
  "file_type": "pdf",
  "total_pages": 247,
  "total_chars": 1248000,
  "preview": "Annual Report 2025\n\nTable of Contents..."
}
```
This is a 247-page PDF. Let me search for Q3 revenue information.
```
$ ~/.claude/skills/rlm/.venv/bin/python ~/.claude/skills/rlm/rlm_loader.py \
    ~/docs/annual-report-2025.pdf search "Q3 revenue"
[
  {
    "match": "Q3 revenue",
    "position": 145230,
    "context": "...strong performance in Q3 revenue reached $847M, representing a 23% increase year-over-year..."
  }
]
```
Found it. Let me get more context from that section.
```
$ ~/.claude/skills/rlm/.venv/bin/python ~/.claude/skills/rlm/rlm_loader.py \
    ~/docs/annual-report-2025.pdf lines 3200 3250
[Financial results section with full Q3 breakdown]
```
The Q3 2025 revenue was $847 million, up 23% year-over-year.
This was driven primarily by growth in the cloud services
division ($412M) and enterprise licensing ($289M).
## Why This Matters
Traditional RAG systems chunk documents into a vector database, then retrieve semantically similar chunks. This works for simple lookups but fails on:

- Aggregation queries ("count all X")
- Multi-hop reasoning ("compare X from section 2 with Y from section 7")
- Structure-dependent questions ("what's in the third table?")

RLM sidesteps these failures because the model runs code over the entire document instead of reasoning from a handful of retrieved chunks; see the sketch below.
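For instance, an aggregation query like "count every dollar amount" becomes an exact computation over the full text. A minimal illustration, assuming the text is available as a Python string (the regex is my own, purely illustrative):

```python
import re

# Exact under programmatic access; top-k retrieval would only ever
# count matches inside the chunks it happened to fetch.
def count_dollar_amounts(text: str) -> int:
    return len(re.findall(r"\$\d+(?:\.\d+)?\s?[MBK]?", text))
```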
## Limitations
- Requires Claude Code (uses the Task tool for sub-agents)
- PDF extraction quality depends on document structure
- Complex layouts (multi-column, tables) may need manual handling
- Sub-agent parallelism is subject to API rate limits
## Conclusion
The Recursive Language Model paradigm treats documents as environments rather than inputs. Instead of consuming context with raw text, the LLM writes code to navigate, search, and extract only what's needed.
For Claude Code users working with large documents, RLM provides a practical implementation of this approach. Install the skill, point it at your documents, and ask questions at any scale.
The skill is available at github.com/joshideas/rlm.