# RLM: A Claude Code Skill for Processing Massive Documents
I created this /rlm skill after seeing this post: https://x.com/godofprompt/status/2011850737354228039. I then read the paper it references, https://arxiv.org/pdf/2512.24601, to work out how to put the idea to use in Claude Code.
## The Context Window Problem
LLMs have a fundamental constraint: context windows. Even with 200K tokens, a large PDF can eat through that budget fast. Load a document, ask a question, and you've already burned most of your reasoning capacity.
>"The key insight is that long prompts should not be fed into the neural network directly but should instead be treated as part of the environment that the LLM can symbolically interact with."
The RLM paradigm flips the approach: instead of cramming documents into context, treat them as external environments the LLM can query programmatically.— Recursive Language Models, Zhang et al., MIT CSAIL
## The Architecture
```
┌─────────────────────────────────────────────────┐
│                  USER QUESTION                  │
└─────────────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│                 ROOT RLM AGENT                  │
│  • Loads document as Python variable            │
│  • Examines structure (pages, lines, chars)     │
│  • Decides processing strategy                  │
└─────────────────────────────────────────────────┘
        │                │                │
        ▼                ▼                ▼
  ┌───────────┐    ┌───────────┐    ┌───────────┐
  │ Sub-Agent │    │ Sub-Agent │    │ Sub-Agent │
  │  Chunk 1  │    │  Chunk 2  │    │  Chunk N  │
  └───────────┘    └───────────┘    └───────────┘
        │                │                │
        └────────────────┼────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────┐
│               AGGREGATED RESPONSE               │
└─────────────────────────────────────────────────┘
```

The root agent never loads the full document into context. It uses Python to peek at metadata, search for patterns, and extract only relevant sections. For complex queries requiring full document analysis, it spawns parallel sub-agents that each process a chunk.
## Key Components
| Component | Purpose |
|---|---|
| rlm_loader.py | Python utilities for document manipulation |
| SKILL.md | Skill definition with workflow instructions |
| pdfplumber | PDF text extraction library |
| Task tool | Claude Code's sub-agent spawning mechanism |
## The Document Loader
The loader provides programmatic access without loading content into context:
```python
from rlm_loader import load_document

doc = load_document('/path/to/massive.pdf')

# Metadata only - doesn't load content into LLM context
doc.get_info()  # → {pages: 500, lines: 15000, chars: 2000000}

# Targeted extraction
doc.get_page(42)                   # Single page
doc.get_lines(100, 150)            # Line range
doc.search("revenue Q3")           # Keyword search with context
doc.search(r"\$\d+M", regex=True)  # Regex search

# Chunking strategies
doc.chunk_by_pages()      # For PDFs
doc.chunk_by_chars(4000)  # For any document
doc.chunk_by_lines(100)   # Line-based chunks
```

The critical insight: the LLM writes code that executes against the document, receiving only the results. The full document never enters the context window.
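For a sense of what sits behind `get_info()`, here is a minimal pdfplumber-backed sketch. The dict shape mirrors the call above, but this is not the skill's actual source:

```python
import pdfplumber

def get_info(path: str) -> dict:
    """Collect document stats without handing the text to the LLM."""
    with pdfplumber.open(path) as pdf:
        text = "\n".join(page.extract_text() or "" for page in pdf.pages)
        n_pages = len(pdf.pages)
    # Only this small dict is printed back into the model's context.
    return {"pages": n_pages, "lines": text.count("\n") + 1, "chars": len(text)}
```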
## Processing Strategies
The skill selects a strategy based on the question type:
### Targeted Search (Constant Cost)
For specific information lookup:
# User: "What was the Q3 revenue?"
doc.search("Q3 revenue")
# → Returns matches with surrounding context
# → LLM answers from small context snippet
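Under the hood, a keyword search is just a scan over raw text held in Python. A minimal sketch of how such a `search` could work; the `window` parameter and the result shape are assumptions, modeled on the example session later in this post:

```python
import re

def search(text: str, query: str, regex: bool = False,
           window: int = 80) -> list[dict]:
    """Return each match with `window` characters of surrounding context."""
    pattern = query if regex else re.escape(query)
    return [
        {
            "match": m.group(0),
            "position": m.start(),
            "context": text[max(0, m.start() - window):m.end() + window],
        }
        for m in re.finditer(pattern, text, re.IGNORECASE)
    ]
```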
### Section Read (Constant Cost)
For questions about specific parts:
# User: "Summarize chapter 5"
doc.getpage(45) # Chapter 5 starts on page 45
doc.getlines(2000, 2500) # Or by line range### Parallel Chunks (Linear Cost)
For aggregation tasks requiring full document analysis:
```python
chunks = doc.chunk_by_pages()
# Spawn sub-agents in parallel for each chunk
# Each sub-agent receives: chunk + specific question
# Root agent aggregates all responses
```
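Chunking itself is mechanical. A sketch of the character-based variant; the overlap is my own addition so that facts straddling a boundary survive:

```python
def chunk_by_chars(text: str, size: int = 4000,
                   overlap: int = 200) -> list[str]:
    """Split text into ~size-char pieces with a small overlap between them."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks
```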
### Recursive Decomposition (Linear to Quadratic)
For complex multi-hop reasoning:
```python
# Break question into sub-questions
# Process each (may involve further sub-agent calls)
# Synthesize from all findings
```
## Sub-Agent Protocol
Sub-agents are stateless. They receive a prompt like:
```
Given this excerpt from [DOCUMENT]:

[CHUNK CONTENT - 4000 chars]

Extract all mentions of financial metrics with their values.
Respond with structured JSON.
```
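Since each sub-agent is stateless, the root agent just instantiates that template once per chunk. A hypothetical helper (the names are mine, not the skill's):

```python
PROMPT_TEMPLATE = """Given this excerpt from {doc_name}:

{chunk}

{question}
Respond with structured JSON."""

def build_prompts(doc_name: str, chunks: list[str], question: str) -> list[str]:
    """One self-contained prompt per chunk; no shared state between agents."""
    return [PROMPT_TEMPLATE.format(doc_name=doc_name, chunk=c, question=question)
            for c in chunks]
```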
The root agent launches multiple sub-agents in a single response. Claude Code's Task tool runs them concurrently:
```python
# In SKILL.md workflow:
# 1. Chunk document
# 2. Launch N Task calls in parallel (single response)
# 3. Wait for all results
# 4. Aggregate into final answer
```
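The aggregation step is plain JSON merging on the root side. A sketch, assuming each sub-agent replies with a JSON array of findings (the schema is whatever the prompt requested):

```python
import json

def aggregate(responses: list[str]) -> list[dict]:
    """Merge sub-agent JSON replies, skipping any that failed to parse."""
    findings: list[dict] = []
    for raw in responses:
        try:
            findings.extend(json.loads(raw))
        except json.JSONDecodeError:
            continue  # one malformed reply shouldn't sink the whole batch
    return findings
```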
## Installation
Clone the skill to your Claude Code skills directory:
```bash
git clone https://github.com/joshideas/rlm ~/.claude/skills/rlm
cd ~/.claude/skills/rlm
python3 -m venv .venv
.venv/bin/pip install pdfplumber
```

The skill auto-activates when you invoke it with /rlm:

```
/rlm /path/to/document.pdf "What are the key findings?"
```

## Supported Formats
- PDF (via pdfplumber with page-aware chunking)
- Plain text (.txt, .log)
- Markdown (.md)
- Code files (.py, .js, .ts, etc.)
- Data formats (.json, .xml, .csv, .html)
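Dispatch across these formats plausibly keys on file extension, with only PDFs needing a real parser. A hypothetical sketch (the extension table is mine, not the skill's):

```python
from pathlib import Path

import pdfplumber

# Hypothetical extension table; the skill's real dispatch may differ.
TEXT_EXTENSIONS = {".txt", ".log", ".md", ".py", ".js", ".ts",
                   ".json", ".xml", ".csv", ".html"}

def read_text(path: str) -> str:
    """Load any supported format as plain text."""
    p = Path(path)
    if p.suffix.lower() == ".pdf":
        with pdfplumber.open(p) as pdf:
            return "\n".join(page.extract_text() or "" for page in pdf.pages)
    if p.suffix.lower() in TEXT_EXTENSIONS:
        return p.read_text(encoding="utf-8", errors="replace")
    raise ValueError(f"Unsupported format: {p.suffix}")
```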
## Performance Characteristics
From the original paper's benchmarks:
| Metric | RLM | Baseline |
|---|---|---|
| BrowseComp+ accuracy | 91.33% | 70.47% (summarization) |
| OOLONG-Pairs F1 | 58.00% | 0.04% (base GPT-5) |
| Scale | 10M+ tokens | ~200K tokens |
| Cost | Comparable/lower | Standard |
## Example Session
User: /rlm ~/docs/annual-report-2025.pdf "What was the Q3 revenue?"
Claude: Let me load and examine this document.
```
$ ~/.claude/skills/rlm/.venv/bin/python ~/.claude/skills/rlm/rlm_loader.py \
    ~/docs/annual-report-2025.pdf info
{
  "file_type": "pdf",
  "total_pages": 247,
  "total_chars": 1248000,
  "preview": "Annual Report 2025\n\nTable of Contents..."
}
```
This is a 247-page PDF. Let me search for Q3 revenue information.
```
$ ~/.claude/skills/rlm/.venv/bin/python ~/.claude/skills/rlm/rlm_loader.py \
    ~/docs/annual-report-2025.pdf search "Q3 revenue"
[
  {
    "match": "Q3 revenue",
    "position": 145230,
    "context": "...strong performance in Q3 revenue reached $847M, representing a 23% increase year-over-year..."
  }
]
```
Found it. Let me get more context from that section.
```
$ ~/.claude/skills/rlm/.venv/bin/python ~/.claude/skills/rlm/rlm_loader.py \
    ~/docs/annual-report-2025.pdf lines 3200 3250
[Financial results section with full Q3 breakdown]
```
The Q3 2025 revenue was $847 million, up 23% year-over-year.
This was driven primarily by growth in the cloud services
division ($412M) and enterprise licensing ($289M).
## Why This Matters
Traditional RAG systems chunk documents into a vector database, then retrieve semantically similar chunks. This works for simple lookups but fails on:

- Aggregation queries ("count all X")
- Multi-hop reasoning ("compare X from section 2 with Y from section 7")
- Structure-dependent questions ("what's in the third table?")

RLM sidesteps these failures because the model runs code over the entire document instead of reasoning from a handful of retrieved chunks; see the sketch below.
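For instance, an aggregation query like "count every dollar amount" becomes an exact computation over the full text. A minimal illustration, assuming the text is available as a Python string (the regex is my own, purely illustrative):

```python
import re

# Exact under programmatic access; top-k retrieval would only ever
# count matches inside the chunks it happened to fetch.
def count_dollar_amounts(text: str) -> int:
    return len(re.findall(r"\$\d+(?:\.\d+)?\s?[MBK]?", text))
```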
## Limitations
- Requires Claude Code (uses the Task tool for sub-agents)
- PDF extraction quality depends on document structure
- Complex layouts (multi-column, tables) may need manual handling
- Sub-agent parallelism is subject to API rate limits
## Conclusion
The Recursive Language Model paradigm treats documents as environments rather than inputs. Instead of consuming context with raw text, the LLM writes code to navigate, search, and extract only what's needed.
For Claude Code users working with large documents, RLM provides a practical implementation of this approach. Install the skill, point it at your documents, and ask questions at any scale.
The skill is available at github.com/joshideas/rlm.