Cost-efficient video processing at scale
Processing large volumes of 360° video is compute-heavy, and costs scale quickly with volume. Here’s how we redesigned our pipeline to cut compute cost by roughly 30% while maintaining reliability and throughput.
Before: monolithic workers
We started with a single type of worker that handled decoding, processing, and encoding in one long job. That led to uneven utilization and forced us to over-provision for peaks.
After: staged pipeline
We split the pipeline into clear stages (ingest, decode, process, encode, publish) and sized each stage independently. Decode and encode are CPU-bound; our “process” stage is where we run the custom logic. By scaling each stage to its own bottleneck, we reduced idle time and could use smaller instance types where the work allowed.
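A minimal sketch of the staged layout, assuming a queue between stages and an independently sized worker pool per stage. The stage functions, worker counts, and clip names here are toy stand-ins, not our production values:

```python
# Staged pipeline sketch: each stage has its own input queue and its own
# pool of workers, so stages can be sized independently of one another.
import queue
import threading

SENTINEL = object()  # end-of-stream marker

def start_stage(n_workers, fn, in_q, out_q):
    """Run n_workers threads that pull from in_q, apply fn, push to out_q."""
    def worker():
        while True:
            item = in_q.get()
            if item is SENTINEL:
                in_q.put(SENTINEL)  # re-publish so sibling workers also stop
                return
            out_q.put(fn(item))
    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads

q_in, q_dec, q_proc, q_out = (queue.Queue() for _ in range(4))

# Size each stage to its own bottleneck, e.g. wider decode/encode pools.
pools = [
    start_stage(3, lambda c: f"decoded({c})",   q_in,   q_dec),
    start_stage(2, lambda f: f"processed({f})", q_dec,  q_proc),
    start_stage(3, lambda f: f"encoded({f})",   q_proc, q_out),
]

for clip in ["a.mp4", "b.mp4", "c.mp4"]:
    q_in.put(clip)
q_in.put(SENTINEL)

# Drain stage by stage: once a stage's workers finish, signal
# end-of-stream to the next stage's queue.
for threads, next_q in zip(pools, [q_dec, q_proc, q_out]):
    for t in threads:
        t.join()
    next_q.put(SENTINEL)

results = []
while True:
    item = q_out.get()
    if item is SENTINEL:
        break
    results.append(item)
```

The same shape maps onto separate instance groups in production: each queue becomes a message queue, and each pool autoscales on its own queue depth.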
Other wins
- Spot / preemptible instances for non-critical stages, with checkpointing and retries.
- Storage lifecycle: keep only the outputs and a short window of raw footage; archive or delete the rest.
- Batch sizing: tuning batch size per stage so we don’t over-allocate memory or leave cores idle.
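Checkpointing is what makes spot instances safe for us: a preempted worker should resume where it left off instead of redoing the whole job. A hedged sketch, assuming work is split into chunks and progress is recorded in a local JSON file (the file name and chunk model are illustrative):

```python
# Chunk-level checkpointing: record finished chunk ids durably so a
# restarted worker skips work that already completed before preemption.
import json
import os

CHECKPOINT = "job_checkpoint.json"

def load_done():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return set(json.load(f))
    return set()

def mark_done(done, chunk_id):
    done.add(chunk_id)
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, CHECKPOINT)  # atomic rename: never a torn checkpoint

def run_job(chunks, process_chunk):
    done = load_done()
    for chunk_id in chunks:
        if chunk_id in done:
            continue  # finished before a previous preemption
        process_chunk(chunk_id)
        mark_done(done, chunk_id)
```

Writing to a temp file and renaming means a preemption mid-write leaves the previous checkpoint intact, so retries are at worst one chunk of repeated work.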
We also improved observability (metrics and tracing) so we could see where time and cost were spent and iterate further.
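The simplest version of that observability is per-stage wall-clock accounting, which already shows which stage dominates cost. A minimal sketch, with illustrative stage names and sleeps standing in for real work; in production this would feed a metrics backend rather than an in-memory dict:

```python
# Accumulate wall-clock seconds per pipeline stage, then rank stages by
# where time (and therefore cost) is going.
import time
from collections import defaultdict
from contextlib import contextmanager

stage_seconds = defaultdict(float)

@contextmanager
def timed(stage):
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_seconds[stage] += time.perf_counter() - start

with timed("decode"):
    time.sleep(0.05)  # stand-in for decode work
with timed("process"):
    time.sleep(0.01)  # stand-in for custom processing

ranked = sorted(stage_seconds.items(), key=lambda kv: kv[1], reverse=True)
```

Multiplying each stage's accumulated time by its instance-type hourly rate turns the same data into a per-stage cost breakdown.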