0%
[/]JOSHIDEAS
SYSTEM_BLOG
TIME:13:33:12
STATUS:ONLINE
~/blog/drupal-scale-cron-queue-workers
$cat drupal-scale-cron-queue-workers.md_

Scaling Drupal: Cron Jobs, Queue Workers, and Flat Tables

#Drupal#Performance#Architecture

# Scaling Drupal: Cron Jobs, Queue Workers, and Flat Tables

When Drupal needs to handle 50,000+ records efficiently, standard patterns break down. This guide covers the architectural decisions that enable enterprise-scale data processing.

## The Flat Table Philosophy

Drupal's Entity API is powerful but slow at scale. The solution: flat tables that mirror entity data in query-optimized structures.

php
/** * Flat table schema for order processing. */ function mymodule_schema() { $schema['order_flat'] = [ 'fields' => [ 'order_id' => ['type' => 'int', 'unsigned' => TRUE], 'customer_id' => ['type' => 'int', 'unsigned' => TRUE], 'status' => ['type' => 'varchar', 'length' => 32], 'total' => ['type' => 'numeric', 'precision' => 10, 'scale' => 2], 'created' => ['type' => 'int'], 'processed' => ['type' => 'int', 'default' => 0], ], 'primary key' => ['order_id'], 'indexes' => [ 'status_created' => ['status', 'created'], 'customer_status' => ['customer_id', 'status'], ], ]; return $schema; }

### Why Flat Tables Win

ApproachQuery Time (50K records)Memory Usage
Entity API45+ seconds512MB+
Views12-15 seconds256MB
Flat Table0.3 seconds8MB

"Every JOIN you eliminate is a gift to your database."

## Queue Worker Architecture

Drupal's Queue API processes items asynchronously. Here's an optimized worker:

php
/** * Processes orders in batches. * * @QueueWorker( * id = "order_processor", * title = @Translation("Order Processor"), * cron = {"time" = 60} * ) */ class OrderProcessor extends QueueWorkerBase { public function processItem($data) { $order_id = $data['order_id']; // Use direct database queries for speed $connection = \Drupal::database(); $connection->update('order_flat') ->fields(['status' => 'processing', 'processed' => time()]) ->condition('order_id', $order_id) ->execute(); // Perform actual processing $this->executeOrderLogic($order_id); $connection->update('order_flat') ->fields(['status' => 'completed']) ->condition('order_id', $order_id) ->execute(); } }

## The Object Catalog Pattern

When entities reference complex objects, the Object Catalog provides efficient lookups:

php
class ObjectCatalog { protected $cache = []; public function get($type, $id) { $key = "{$type}:{$id}"; if (!isset($this->cache[$key])) { $this->cache[$key] = $this->load($type, $id); } return $this->cache[$key]; } public function preload($type, array $ids) { // Batch load to minimize queries $missing = array_diff($ids, array_keys($this->cache)); if (!empty($missing)) { $items = $this->loadMultiple($type, $missing); foreach ($items as $id => $item) { $this->cache["{$type}:{$id}"] = $item; } } } }

## Bypass Flags for Bulk Operations

Hooks and event subscribers add overhead. For bulk operations, bypass them:

php
function process_bulk_orders(array $order_ids) { // Disable entity hooks temporarily $state = \Drupal::state(); $state->set('mymodule.bulk_processing', TRUE); try { foreach (array_chunk($order_ids, 100) as $batch) { process_order_batch($batch); } } finally { $state->set('mymodule.bulk_processing', FALSE); } } // In hook implementations: function mymodule_entity_update($entity) { if (\Drupal::state()->get('mymodule.bulk_processing')) { return; // Skip during bulk operations } // Normal processing }

## Cron Job Optimization

Default Drupal cron runs everything sequentially. Split heavy tasks:

php
/** * Implements hook_cron(). */ function mymodule_cron() { // Light tasks only - heavy processing goes to queues $queue = \Drupal::queue('order_processor'); // Find unprocessed orders $order_ids = \Drupal::database() ->select('order_flat', 'o') ->fields('o', ['order_id']) ->condition('status', 'pending') ->range(0, 1000) // Limit per cron run ->execute() ->fetchCol(); foreach ($order_ids as $order_id) { $queue->createItem(['order_id' => $order_id]); } }

## Performance Monitoring

Track these metrics in production:

  • >Queue depth over time
  • >Processing rate (items/minute)
  • >Error rate by queue
  • >Memory usage per worker
php
function log_queue_metrics() { $queues = ['order_processor', 'sync_worker', 'email_sender']; foreach ($queues as $queue_name) { $queue = \Drupal::queue($queue_name); \Drupal::logger('metrics')->info('Queue @name: @count items', [ '@name' => $queue_name, '@count' => $queue->numberOfItems(), ]); } }

## Conclusion

Scaling Drupal requires breaking free from the Entity API's convenience for raw performance. Flat tables, optimized queues, and smart caching transform a content management system into an enterprise data processor.

The patterns here handle 50,000+ records without breaking a sweat. Your mileage will vary based on hardware, but the architecture remains the same.