Digest: Insights on ClickHouse Optimization at Cloudflare

Cloudflare's Billing Pipeline Hits Unexpected Slowdown

Cloudflare’s billing pipeline slowed down unexpectedly after a seemingly straightforward change: shifting to per-namespace data partitioning for retention. The change was meant to improve data organization, but it flooded the system with small data parts. The extra parts caused severe lock contention during query planning in ClickHouse, and queries slowed to a crawl. Cloudflare’s engineers found that the root cause was not just the number of parts but how ClickHouse managed access to the table’s part list: threads queued on the mutex guarding that list, which delayed query execution. What started as a structural refinement became a performance problem that required a close look at ClickHouse’s internals and a rethink of its concurrency controls.

Tracing Reveals Lock Contention and Mutex Waits

Tracing the slowdown started with digging into ClickHouse’s internal metrics and logs. Cloudflare’s engineers noticed a sharp rise in lock contention during query planning. The culprit: the system was spending excessive time waiting on mutexes guarding the table’s part list. This list grew substantially after the partitioning change, ballooning the number of data parts from a handful to several hundred per table.

Flame graphs painted a clear picture. Threads stalled repeatedly, blocked by exclusive locks that serialized access to the parts metadata. Every query had to acquire these locks to read the parts list, and with so many parts, the lock waits multiplied. The mutex guarding the list became a choke point, turning what should have been lightweight metadata reads into a bottleneck.

The team traced this back to the design of the part list structure and its synchronization strategy. Originally, exclusive locks ensured consistency but didn’t scale well when the number of parts exploded. The increased granularity of partitions for per-namespace retention inadvertently caused a spike in lock contention. This revelation shifted the focus from raw query execution to the underlying concurrency controls.

The engineers began exploring alternatives to reduce exclusive lock usage. They identified three main areas for optimization: replacing exclusive locks with shared locks where possible, cutting down on expensive vector copying during lock acquisition, and introducing binary search to speed up part lookups within the list.

Each optimization targeted the lock contention directly. Switching to shared locks allowed multiple readers to proceed in parallel, slashing mutex wait times. Eliminating vector copying reduced overhead inside the critical section. Binary search sped up access, shortening the time locks were held. This work not only restored query performance but also improved ClickHouse’s scalability in high-partition environments. The detailed tracing and flame graph analysis were crucial in pinpointing the exact lock contention patterns, guiding the team’s targeted interventions.
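To make the exclusive-versus-shared distinction concrete, here is a minimal C++ sketch using the standard `std::shared_mutex` (C++17). It is an illustration only, not ClickHouse’s actual code: the `PartList` class, its methods, and the part names are hypothetical. The point it shows is that writers still lock exclusively, while read-only callers such as query planners take the lock in shared mode and no longer serialize behind one another.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a table's part list (not ClickHouse's real class).
class PartList {
public:
    // Writers (inserts, merges) modify the list, so they lock exclusively.
    void addPart(std::string name) {
        assert(!name.empty());  // illustrative precondition
        std::unique_lock<std::shared_mutex> lock(mutex_);
        parts_.push_back(std::move(name));
    }

    // Readers (query planning) only inspect the list, so they lock in
    // shared mode: any number of readers can hold the lock at once.
    std::size_t partCount() const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        return parts_.size();
    }

private:
    mutable std::shared_mutex mutex_;  // mutable so const readers can lock it
    std::vector<std::string> parts_;
};
```

With a plain `std::mutex`, every `partCount()` call would wait for every other one. With the shared lock, readers only wait while a writer holds the lock.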

How Partitioning Changes Affected Query Performance

Cloudflare’s decision to shift to per-namespace retention meant changing how data was partitioned in ClickHouse. Instead of fewer, larger partitions, the system now handled many more, smaller ones. The change was intended to make data management more granular, but it had an unintended side effect: query performance took a hit.

More partitions mean more data parts for ClickHouse to track during query planning. Each query has to take the lock guarding the table’s part list to get a consistent view, and with far more parts behind that lock, contention surged. The query planner waited on mutexes far longer than before, causing noticeable slowdowns.

This wasn’t a simple scaling issue. The internal data structures that track parts became contention hotspots. Every additional partition added overhead, increasing the time spent acquiring and releasing locks, and the problem compounded quickly under heavy query load.

Cloudflare’s engineers had to dig into ClickHouse’s internals to understand the root cause. They traced the problem to two things: the way the system copied vectors of parts, and the way it handled locking. Both became bottlenecks under the new partitioning scheme. Without addressing them, the billing pipeline’s performance would have continued to degrade as data grew.

The case shows how a seemingly straightforward change in data partitioning can ripple through system internals and affect performance in unexpected ways. It also underscores the importance of aligning data architecture decisions with the underlying database mechanics, especially in high-throughput environments.
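The cost of copying a vector of parts while holding a lock can be illustrated with a small C++ sketch. The source does not say how the copy was actually removed, so the technique below is a swapped-in assumption: a copy-on-write snapshot, in which readers copy only a `std::shared_ptr` inside the critical section instead of the whole vector. The `PartSnapshots` class and the part names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

using Parts = std::vector<std::string>;

// Hypothetical holder that publishes immutable snapshots of the part list.
class PartSnapshots {
public:
    PartSnapshots() : current_(std::make_shared<Parts>()) {}

    // Reader path: O(1) inside the lock. Only the pointer is copied
    // (a reference-count bump), never the vector itself.
    std::shared_ptr<const Parts> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        assert(current_ != nullptr);  // invariant: a snapshot always exists
        return current_;
    }

    // Writer path: build a new vector and publish it. Writers pay for
    // the copy so that the far more frequent readers do not.
    void addPart(const std::string & name) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto next = std::make_shared<Parts>(*current_);
        next->push_back(name);
        current_ = std::move(next);
    }

private:
    mutable std::mutex mutex_;
    std::shared_ptr<const Parts> current_;
};
```

A reader that called `snapshot()` before a write keeps seeing its old, consistent list, so the time spent under the lock no longer grows with the number of parts. The trade-off is that each write copies the vector, which only pays off when reads heavily outnumber writes.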

Optimizations That Cut Latency and Boost Stability

The optimizations Cloudflare deployed did more than patch a performance hiccup: they changed how ClickHouse handles heavy workloads under fine-grained partitioning schemes. Switching from exclusive to shared locks reduced the lock contention that had been throttling query planning. This change alone cut wait times dramatically, because parallel queries could read the part list without blocking one another.

Removing unnecessary vector copying saved memory and CPU cycles. It is a subtle change, but in high-throughput environments, shaving microseconds off each operation adds up to a noticeable latency improvement. Introducing binary search to locate data parts replaced a linear scan that had grown painfully slow as partition counts ballooned, cutting lookup overhead for every query.

For Cloudflare, these refinements restored stability to a critical billing pipeline, so that data retention policies could run without choking the system. For other ClickHouse users, the enhancements mean more robust handling of large, fragmented datasets: operators with fine-grained partitions should see fewer surprise slowdowns and more predictable behavior under load.

The fixes also reinforce a broader lesson: changes to data layout need to be evaluated for their downstream effects on concurrency and data access patterns. Cloudflare’s methodical tracing and targeted optimizations offer a blueprint for others facing similar scaling challenges, and a reminder that even mature systems like ClickHouse benefit from iterative refinement as usage patterns evolve.

In practical terms, teams relying on ClickHouse for analytics or billing should monitor partition growth closely and consider these optimization strategies proactively. Because Cloudflare contributed the changes back to ClickHouse as open source, the improvements are available to anyone facing similar bottlenecks in large-scale, latency-sensitive applications.
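The move from a linear scan to binary search can be sketched in a few lines of C++. This is a minimal illustration under stated assumptions, not the actual patch: the `Part` struct, the `partsInPartition` function, and the part names are hypothetical, and the part list is assumed to be kept sorted by partition ID. The lookup uses the standard `std::lower_bound` and `std::upper_bound`, which take O(log n) comparisons instead of O(n).

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified part descriptor.
struct Part {
    std::string partition_id;
    std::string name;
};

using PartIt = std::vector<Part>::const_iterator;

// Returns the half-open range [first, last) of parts in one partition.
// Precondition: `parts` is sorted by partition_id.
std::pair<PartIt, PartIt> partsInPartition(const std::vector<Part> & parts,
                                           const std::string & partition_id) {
    // Debug-only sanity check; it is O(n) and is compiled out with NDEBUG.
    assert(std::is_sorted(parts.begin(), parts.end(),
        [](const Part & a, const Part & b) { return a.partition_id < b.partition_id; }));

    // First part whose partition_id is not less than the target.
    auto first = std::lower_bound(parts.begin(), parts.end(), partition_id,
        [](const Part & p, const std::string & id) { return p.partition_id < id; });
    // First part after `first` whose partition_id is greater than the target.
    auto last = std::upper_bound(first, parts.end(), partition_id,
        [](const std::string & id, const Part & p) { return id < p.partition_id; });
    return {first, last};
}
```

With several hundred parts per table, each lookup touches roughly ten elements rather than all of them, which also shortens the time any lock around the list is held.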

Lessons for Database Tuning at Scale

The Cloudflare experience underscores how seemingly straightforward schema changes can ripple into complex performance bottlenecks. Adding per-namespace partitions dramatically increased the number of data parts, which in turn exposed latent contention in ClickHouse’s query planning internals. It is a reminder that scaling database workloads often takes more than hardware or raw parallelism; subtle coordination costs between threads and internal data structures can dominate latency.

The way Cloudflare’s engineers dug into mutex wait patterns and lock granularity offers valuable lessons. Their shift from exclusive to shared locks and the replacement of linear scans with binary search sharply reduced overhead. These are not headline-grabbing architectural overhauls but careful, targeted optimizations that restored throughput without compromising correctness. They show the importance of profiling at the right level of detail and being willing to rethink assumptions baked into core data structures.

For those managing large-scale analytical databases, the next signals worth tracking involve how query engines handle metadata complexity as data volumes and partition counts grow. Will future ClickHouse versions introduce more adaptive locking strategies or lock-free data structures to mitigate these issues? How will other open source projects respond to similar scaling challenges? Cloudflare’s contributions back to ClickHouse hint at a collaborative path forward, but the pressure on query planners and schedulers will only intensify.

In the meantime, the practical takeaway is clear: database tuning at scale demands deep instrumentation, patience, and incremental refinement. The details matter: lock contention, data copying overhead, and search algorithms inside the engine are just as critical as indexing or query rewriting. Watching these micro-level signals will help teams avoid surprises when growth hits a tipping point.

Link to the original source

Article author

Mark Evans

Tech Enthusiast & AI Explorer

Mark is a seasoned technology writer with over two decades of experience. At 46, he focuses on testing and reviewing emerging AI tools, breaking down complex innovations into clear, actionable insights.
