Scaling Drupal: Cron Jobs, Queue Workers, and Flat Tables
# Scaling Drupal: Cron Jobs, Queue Workers, and Flat Tables
When Drupal needs to handle 50,000+ records efficiently, standard patterns break down. This guide covers the architectural decisions that enable enterprise-scale data processing.
## The Flat Table Philosophy
Drupal's Entity API is powerful but slow at scale. The solution: flat tables that mirror entity data in query-optimized structures.
/
Flat table schema for order processing.
/
function mymoduleschema() {
$schema['orderflat'] = [
'fields' => [
'orderid' => ['type' => 'int', 'unsigned' => TRUE],
'customerid' => ['type' => 'int', 'unsigned' => TRUE],
'status' => ['type' => 'varchar', 'length' => 32],
'total' => ['type' => 'numeric', 'precision' => 10, 'scale' => 2],
'created' => ['type' => 'int'],
'processed' => ['type' => 'int', 'default' => 0],
],
'primary key' => ['orderid'],
'indexes' => [
'statuscreated' => ['status', 'created'],
'customerstatus' => ['customerid', 'status'],
],
];
return $schema;
}### Why Flat Tables Win
| Approach | Query Time (50K records) | Memory Usage |
|---|---|---|
| Entity API | 45+ seconds | 512MB+ |
| Views | 12-15 seconds | 256MB |
| Flat Table | 0.3 seconds | 8MB |
"Every JOIN you eliminate is a gift to your database."
## Queue Worker Architecture
Drupal's Queue API processes items asynchronously. Here's an optimized worker:
/
Processes orders in batches.
@QueueWorker(
id = "orderprocessor",
title = @Translation("Order Processor"),
cron = {"time" = 60}
)
/
class OrderProcessor extends QueueWorkerBase {
public function processItem($data) {
$order
id = $data['orderid'];
// Use direct database queries for speed
$connection = \Drupal::database();
$connection->update('order
flat')
->fields(['status' => 'processing', 'processed' => time()])
->condition('orderid', $orderid)
->execute();
// Perform actual processing
$this->executeOrderLogic($orderid);
$connection->update('orderflat')
->fields(['status' => 'completed'])
->condition('orderid', $orderid)
->execute();
}
}
## The Object Catalog Pattern
When entities reference complex objects, the Object Catalog provides efficient lookups:
class ObjectCatalog {
protected $cache = [];
public function get($type, $id) {
$key = "{$type}:{$id}";
if (!isset($this->cache[$key])) {
$this->cache[$key] = $this->load($type, $id);
}
return $this->cache[$key];
}
public function preload($type, array $ids) {
// Batch load to minimize queries
$missing = arraydiff($ids, arraykeys($this->cache));
if (!empty($missing)) {
$items = $this->loadMultiple($type, $missing);
foreach ($items as $id => $item) {
$this->cache["{$type}:{$id}"] = $item;
}
}
}
}
## Bypass Flags for Bulk Operations
Hooks and event subscribers add overhead. For bulk operations, bypass them:
function processbulkorders(array $orderids) {
// Disable entity hooks temporarily
$state = \Drupal::state();
$state->set('mymodule.bulkprocessing', TRUE);
try {
foreach (arraychunk($orderids, 100) as $batch) {
processorderbatch($batch);
}
} finally {
$state->set('mymodule.bulkprocessing', FALSE);
}
}
// In hook implementations:
function mymoduleentityupdate($entity) {
if (\Drupal::state()->get('mymodule.bulkprocessing')) {
return; // Skip during bulk operations
}
// Normal processing
}
## Cron Job Optimization
Default Drupal cron runs everything sequentially. Split heavy tasks:
/
Implements hookcron().
/
function mymodulecron() {
// Light tasks only - heavy processing goes to queues
$queue = \Drupal::queue('orderprocessor');
// Find unprocessed orders
$order
ids = \Drupal::database()
->select('orderflat', 'o')
->fields('o', ['orderid'])
->condition('status', 'pending')
->range(0, 1000) // Limit per cron run
->execute()
->fetchCol();
foreach ($orderids as $orderid) {
$queue->createItem(['orderid' => $orderid]);
}
}
## Performance Monitoring
Track these metrics in production:
- >Queue depth over time
- >Processing rate (items/minute)
- >Error rate by queue
- >Memory usage** per worker
function logqueuemetrics() {
$queues = ['orderprocessor', 'syncworker', 'emailsender'];
foreach ($queues as $queue
name) {
$queue = \Drupal::queue($queuename);
\Drupal::logger('metrics')->info('Queue @name: @count items', [
'@name' => $queuename,
'@count' => $queue->numberOfItems(),
]);
}
}## Conclusion
Scaling Drupal requires breaking free from the Entity API's convenience for raw performance. Flat tables, optimized queues, and smart caching transform a content management system into an enterprise data processor.
The patterns here handle 50,000+ records without breaking a sweat. Your mileage will vary based on hardware, but the architecture remains the same.