Scaling Drupal: Cron Jobs, Queue Workers, and Flat Tables
# Scaling Drupal: Cron Jobs, Queue Workers, and Flat Tables
When Drupal needs to handle 50,000+ records efficiently, standard patterns break down. This guide covers the architectural decisions that enable enterprise-scale data processing.
## The Flat Table Philosophy
Drupal's Entity API is powerful but slow at scale. The solution: flat tables that mirror entity data in query-optimized structures.
php/** * Flat table schema for order processing. */ function mymodule_schema() { $schema['order_flat'] = [ 'fields' => [ 'order_id' => ['type' => 'int', 'unsigned' => TRUE], 'customer_id' => ['type' => 'int', 'unsigned' => TRUE], 'status' => ['type' => 'varchar', 'length' => 32], 'total' => ['type' => 'numeric', 'precision' => 10, 'scale' => 2], 'created' => ['type' => 'int'], 'processed' => ['type' => 'int', 'default' => 0], ], 'primary key' => ['order_id'], 'indexes' => [ 'status_created' => ['status', 'created'], 'customer_status' => ['customer_id', 'status'], ], ]; return $schema; }
### Why Flat Tables Win
| Approach | Query Time (50K records) | Memory Usage |
|---|---|---|
| Entity API | 45+ seconds | 512MB+ |
| Views | 12-15 seconds | 256MB |
| Flat Table | 0.3 seconds | 8MB |
"Every JOIN you eliminate is a gift to your database."
## Queue Worker Architecture
Drupal's Queue API processes items asynchronously. Here's an optimized worker:
php/** * Processes orders in batches. * * @QueueWorker( * id = "order_processor", * title = @Translation("Order Processor"), * cron = {"time" = 60} * ) */ class OrderProcessor extends QueueWorkerBase { public function processItem($data) { $order_id = $data['order_id']; // Use direct database queries for speed $connection = \Drupal::database(); $connection->update('order_flat') ->fields(['status' => 'processing', 'processed' => time()]) ->condition('order_id', $order_id) ->execute(); // Perform actual processing $this->executeOrderLogic($order_id); $connection->update('order_flat') ->fields(['status' => 'completed']) ->condition('order_id', $order_id) ->execute(); } }
## The Object Catalog Pattern
When entities reference complex objects, the Object Catalog provides efficient lookups:
phpclass ObjectCatalog { protected $cache = []; public function get($type, $id) { $key = "{$type}:{$id}"; if (!isset($this->cache[$key])) { $this->cache[$key] = $this->load($type, $id); } return $this->cache[$key]; } public function preload($type, array $ids) { // Batch load to minimize queries $missing = array_diff($ids, array_keys($this->cache)); if (!empty($missing)) { $items = $this->loadMultiple($type, $missing); foreach ($items as $id => $item) { $this->cache["{$type}:{$id}"] = $item; } } } }
## Bypass Flags for Bulk Operations
Hooks and event subscribers add overhead. For bulk operations, bypass them:
phpfunction process_bulk_orders(array $order_ids) { // Disable entity hooks temporarily $state = \Drupal::state(); $state->set('mymodule.bulk_processing', TRUE); try { foreach (array_chunk($order_ids, 100) as $batch) { process_order_batch($batch); } } finally { $state->set('mymodule.bulk_processing', FALSE); } } // In hook implementations: function mymodule_entity_update($entity) { if (\Drupal::state()->get('mymodule.bulk_processing')) { return; // Skip during bulk operations } // Normal processing }
## Cron Job Optimization
Default Drupal cron runs everything sequentially. Split heavy tasks:
php/** * Implements hook_cron(). */ function mymodule_cron() { // Light tasks only - heavy processing goes to queues $queue = \Drupal::queue('order_processor'); // Find unprocessed orders $order_ids = \Drupal::database() ->select('order_flat', 'o') ->fields('o', ['order_id']) ->condition('status', 'pending') ->range(0, 1000) // Limit per cron run ->execute() ->fetchCol(); foreach ($order_ids as $order_id) { $queue->createItem(['order_id' => $order_id]); } }
## Performance Monitoring
Track these metrics in production:
- >Queue depth over time
- >Processing rate (items/minute)
- >Error rate by queue
- >Memory usage per worker
phpfunction log_queue_metrics() { $queues = ['order_processor', 'sync_worker', 'email_sender']; foreach ($queues as $queue_name) { $queue = \Drupal::queue($queue_name); \Drupal::logger('metrics')->info('Queue @name: @count items', [ '@name' => $queue_name, '@count' => $queue->numberOfItems(), ]); } }
## Conclusion
Scaling Drupal requires breaking free from the Entity API's convenience for raw performance. Flat tables, optimized queues, and smart caching transform a content management system into an enterprise data processor.
The patterns here handle 50,000+ records without breaking a sweat. Your mileage will vary based on hardware, but the architecture remains the same.