
# RLM: A Claude Code Skill for Processing Massive Documents

I created this /rlm skill after seeing this post:
https://x.com/godofprompt/status/2011850737354228039. I then read the paper it
references, https://arxiv.org/pdf/2512.24601, to work out how to apply the idea
in Claude Code.

## The Context Window Problem

LLMs have a fundamental constraint: context windows. Even with 200K tokens, a large PDF can eat through that budget fast. Load a document, ask a question, and you've already burned most of your reasoning capacity.

"The key insight is that long prompts should not be fed into the neural network directly but should instead be treated as part of the environment that the LLM can symbolically interact with."

>

Recursive Language Models, Zhang et al., MIT CSAIL

The RLM paradigm flips the approach: instead of cramming documents into context, treat them as external environments the LLM can query programmatically.

## The Architecture

```text
┌─────────────────────────────────────────────────────────────────┐
│                      USER QUESTION                              │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    ROOT RLM AGENT                               │
│  • Loads document as Python variable                            │
│  • Examines structure (pages, lines, chars)                     │
│  • Decides processing strategy                                  │
└─────────────────────────────────────────────────────────────────┘
           │                    │                    │
           ▼                    ▼                    ▼
    ┌───────────┐        ┌───────────┐        ┌───────────┐
    │ Sub-Agent │        │ Sub-Agent │        │ Sub-Agent │
    │ Chunk 1   │        │ Chunk 2   │        │ Chunk N   │
    └───────────┘        └───────────┘        └───────────┘
           │                    │                    │
           └────────────────────┼────────────────────┘
                                │
                                ▼
┌─────────────────────────────────────────────────────────────────┐
│                    AGGREGATED RESPONSE                          │
└─────────────────────────────────────────────────────────────────┘
```

The root agent never loads the full document into context. It uses Python to peek at metadata, search for patterns, and extract only relevant sections. For complex queries requiring full document analysis, it spawns parallel sub-agents that each process a chunk.
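
As a rough illustration, here is that routing modeled as plain Python, using the loader API described below. `summarize_hits` and `route` are hypothetical stand-ins of mine; in the real skill, the LLM itself does this reasoning step by step.

```python
# Sketch of the root agent's routing logic (illustrative only).
from rlm_loader import load_document

def summarize_hits(hits):
    # Stand-in: in the real skill the LLM reasons over these snippets.
    return "\n".join(h["context"] for h in hits)

def route(path, question):
    doc = load_document(path)
    print(doc.get_info())        # metadata only: pages, lines, chars

    hits = doc.search(question)  # cheap targeted lookup first
    if hits:
        return summarize_hits(hits)

    # No direct hit: chunk for full-document analysis by sub-agents
    return doc.chunk_by_pages()
```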

## Key Components

| Component | Purpose |
|---|---|
| `rlm_loader.py` | Python utilities for document manipulation |
| `SKILL.md` | Skill definition with workflow instructions |
| `pdfplumber` | PDF text extraction library |
| Task tool | Claude Code's sub-agent spawning mechanism |

## The Document Loader

The loader provides programmatic access without loading content into context:

```python
from rlm_loader import load_document

doc = load_document('/path/to/massive.pdf')

# Metadata only - doesn't load content into LLM context
doc.get_info()  # → {pages: 500, lines: 15000, chars: 2000000}

# Targeted extraction
doc.get_page(42)                   # Single page
doc.get_lines(100, 150)            # Line range
doc.search("revenue Q3")           # Keyword search with context
doc.search(r"\$\d+M", regex=True)  # Regex search

# Chunking strategies
doc.chunk_by_pages()      # For PDFs
doc.chunk_by_chars(4000)  # For any document
doc.chunk_by_lines(100)   # Line-based chunks
```

The critical insight: the LLM writes code that executes against the document, receiving only the results. The full document never enters the context window.
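
The internals of `rlm_loader` aren't reproduced here, but a keyword search that returns small context snippets (matching the `match`/`position`/`context` shape in the example session below) can be sketched in a few lines. Treat this as an approximation, not the actual implementation:

```python
import re

def search(text, query, window=80, regex=False):
    # Return each match with `window` chars of surrounding context,
    # so the LLM sees small snippets instead of the whole document.
    pattern = query if regex else re.escape(query)
    results = []
    for m in re.finditer(pattern, text):
        start = max(0, m.start() - window)
        end = min(len(text), m.end() + window)
        results.append({
            "match": m.group(0),
            "position": m.start(),
            "context": text[start:end],
        })
    return results
```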

## Processing Strategies

The skill selects strategy based on the question type:

### Targeted Search (Constant Cost)

For specific information lookup:

```python
# User: "What was the Q3 revenue?"
doc.search("Q3 revenue")
# → Returns matches with surrounding context
# → LLM answers from small context snippet
```

### Section Read (Constant Cost)

For questions about specific parts:

```python
# User: "Summarize chapter 5"
doc.get_page(45)           # Chapter 5 starts on page 45
doc.get_lines(2000, 2500)  # Or by line range
```

### Parallel Chunks (Linear Cost)

For aggregation tasks requiring full document analysis:

```python
chunks = doc.chunk_by_pages()

# Spawn sub-agents in parallel for each chunk
# Each sub-agent receives: chunk + specific question
# Root agent aggregates all responses
```
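
In the skill, Claude Code's Task tool does the actual spawning. But if you model each sub-agent as a function call, the fan-out/fan-in shape looks roughly like this; `ask_subagent` is a hypothetical stand-in for one Task invocation:

```python
from concurrent.futures import ThreadPoolExecutor

def map_over_chunks(chunks, question, ask_subagent):
    # Fan out: one sub-agent per chunk, all launched concurrently.
    with ThreadPoolExecutor(max_workers=8) as pool:
        results = pool.map(lambda c: ask_subagent(c, question), chunks)
    # Fan in: the root agent aggregates the per-chunk findings.
    return [r for r in results if r]
```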

### Recursive Decomposition (Linear to Quadratic)

For complex multi-hop reasoning:

```python
# Break question into sub-questions
# Process each (may involve further sub-agent calls)
# Synthesize from all findings
```
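
A sketch of the shape of that recursion, assuming a single `ask_llm(prompt)` helper stands in for a model call; decomposition and synthesis are both just prompts in this version:

```python
def answer_recursive(doc, question, ask_llm, depth=0, max_depth=2):
    # Base case: direct hits, or recursion budget exhausted.
    hits = doc.search(question)
    if hits or depth >= max_depth:
        context = "\n".join(h["context"] for h in hits)
        return ask_llm(f"Answer '{question}' using:\n{context}")

    # Otherwise split the question and recurse on each part.
    subs = ask_llm(f"List simpler sub-questions for: {question}")
    answers = [
        answer_recursive(doc, q, ask_llm, depth + 1)
        for q in subs.splitlines() if q.strip()
    ]
    return ask_llm(f"Synthesize an answer to '{question}' from:\n"
                   + "\n".join(answers))
```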

## Sub-Agent Protocol

Sub-agents are stateless. They receive a prompt like:

```text
Given this excerpt from [DOCUMENT]:

[CHUNK CONTENT - 4000 chars]

Extract all mentions of financial metrics with their values.
Respond with structured JSON.
```
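
Building that prompt is mechanical. A minimal sketch, where the template mirrors the one above and the function name is mine:

```python
SUBAGENT_PROMPT = """Given this excerpt from {doc_name}:

{chunk}

{instruction}
Respond with structured JSON."""

def build_subagent_prompt(doc_name, chunk, instruction):
    # Each sub-agent gets only its chunk: no shared state, no history.
    return SUBAGENT_PROMPT.format(
        doc_name=doc_name, chunk=chunk, instruction=instruction
    )
```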

The root agent launches multiple sub-agents in a single response. Claude Code's Task tool runs them concurrently:

```python
# In SKILL.md workflow:
# 1. Chunk document
# 2. Launch N Task calls in parallel (single response)
# 3. Wait for all results
# 4. Aggregate into final answer
```
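
Since each sub-agent replies with structured JSON, the aggregation step can be as simple as parsing and deduplicating. A sketch, assuming each reply is a JSON array of findings:

```python
import json

def aggregate_findings(raw_replies):
    # Merge per-chunk JSON arrays into one deduplicated list.
    findings, seen = [], set()
    for raw in raw_replies:
        for item in json.loads(raw):
            key = json.dumps(item, sort_keys=True)
            if key not in seen:
                seen.add(key)
                findings.append(item)
    return findings
```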

## Installation

Clone the skill to your Claude Code skills directory:

```bash
git clone https://github.com/joshideas/rlm ~/.claude/skills/rlm
cd ~/.claude/skills/rlm
python3 -m venv .venv
.venv/bin/pip install pdfplumber
```

The skill auto-activates when you invoke it with /rlm:

```bash
/rlm /path/to/document.pdf "What are the key findings?"
```

## Supported Formats

- PDF (via pdfplumber with page-aware chunking)
- Plain text (.txt, .log)
- Markdown (.md)
- Code files (.py, .js, .ts, etc.)
- Data formats (.json, .xml, .csv, .html)

## Performance Characteristics

From the original paper's benchmarks:

| Metric | RLM | Baseline |
|---|---|---|
| BrowseComp+ accuracy | 91.33% | 70.47% (summarization) |
| OOLONG-Pairs F1 | 58.00% | 0.04% (base GPT-5) |
| Scale | 10M+ tokens | ~200K tokens |
| Cost | Comparable or lower | Standard |

The key advantage is scale. RLMs can process documents 100x larger than context windows while maintaining accuracy.

## Example Session

```text
User: /rlm ~/docs/annual-report-2025.pdf "What was the Q3 revenue?"

Claude: Let me load and examine this document.

$ ~/.claude/skills/rlm/.venv/bin/python ~/.claude/skills/rlm/rlm_loader.py \
    ~/docs/annual-report-2025.pdf info

{
  "file_type": "pdf",
  "total_pages": 247,
  "total_chars": 1248000,
  "preview": "Annual Report 2025\n\nTable of Contents..."
}

This is a 247-page PDF. Let me search for Q3 revenue information.

$ ~/.claude/skills/rlm/.venv/bin/python ~/.claude/skills/rlm/rlm_loader.py \
    ~/docs/annual-report-2025.pdf search "Q3 revenue"

[
  {
    "match": "Q3 revenue",
    "position": 145230,
    "context": "...strong performance in Q3 revenue reached $847M,
                representing a 23% increase year-over-year..."
  }
]

Found it. Let me get more context from that section.

$ ~/.claude/skills/rlm/.venv/bin/python ~/.claude/skills/rlm/rlm_loader.py \
    ~/docs/annual-report-2025.pdf lines 3200 3250

[Financial results section with full Q3 breakdown]

The Q3 2025 revenue was $847 million, up 23% year-over-year.
This was driven primarily by growth in the cloud services
division ($412M) and enterprise licensing ($289M).
```

## Why This Matters

Traditional RAG systems chunk documents into a vector database, then retrieve semantically similar chunks. That works for simple lookups but fails on aggregation questions that span the whole document and on multi-hop questions whose supporting passages aren't semantically similar to the query.

RLMs preserve document structure and enable programmatic navigation. The LLM reasons about how to extract information, not just what information to retrieve.

## Limitations

Full-document strategies aren't free: parallel chunking scales linearly with document size, and recursive decomposition can run linear to quadratic. Sub-agents are also stateless, so each one sees only its own chunk; any cross-chunk context has to be passed explicitly by the root agent.

## Conclusion

The Recursive Language Model paradigm treats documents as environments rather than inputs. Instead of consuming context with raw text, the LLM writes code to navigate, search, and extract only what's needed.

For Claude Code users working with large documents, RLM provides a practical implementation of this approach. Install the skill, point it at your documents, and ask questions at any scale.

The skill is available at github.com/joshideas/rlm.