Closed-Loop Binary Optimization: Integrating De-Identified Production Telemetry into the Build Lifecycle
Abstract
Modern performance optimization operates mainly on the binary emitted by the compiler. Profile-Guided Optimization (PGO) changes how optimizations are selected: rather than relying on compile-time heuristics, it chooses them from run-time profiles of the program. Static compilation cannot predict dynamic control flow, and cache behavior depends on the workload running on production machines. By measuring execution in production, the compiler learns how often hot paths run and what they demand of branch prediction, caches, and instruction scheduling. Instrumentation overhead on live services is reduced by a load-test infrastructure that replays copies of production traffic. Privacy-preserving de-identification pipelines sanitize sensitive user data while preserving query structure, so that optimizations in the data-management path remain possible. Continuous profiling keeps these optimizations effective as execution environments and workloads change. Autotuning, the search for the best compiler settings for a specific workload, is increasingly carried out with machine learning techniques. Deployed as standard production-grade infrastructure, binary optimization delivers economic value through better resource utilization and lower service latency. Feeding real-world telemetry back into the compiler toolchain creates a virtuous cycle of improvement for high-performance digital infrastructure.
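The idea of sanitizing user data while preserving query structure can be illustrated with a minimal sketch. It assumes the telemetry consists of SQL-like query strings and that sensitive values appear only as quoted strings or numeric literals. The function name `deidentify_query` and both regular expressions are hypothetical and chosen for illustration; they do not describe any specific pipeline.

```python
import re

# Hypothetical patterns (assumptions for this sketch):
# single-quoted string literals, with '' as an escaped quote,
# and standalone integer or decimal literals.
_STRING_LITERAL = re.compile(r"'(?:[^']|'')*'")
_NUMBER_LITERAL = re.compile(r"\b\d+(?:\.\d+)?\b")


def deidentify_query(sql: str) -> str:
    """Replace literal values with '?' while keeping keywords, tables, and columns."""
    # Strings go first so that digits inside quoted values are never
    # matched separately by the numeric pattern.
    without_strings = _STRING_LITERAL.sub("?", sql)
    return _NUMBER_LITERAL.sub("?", without_strings)


if __name__ == "__main__":
    raw = "SELECT name FROM users WHERE email = 'alice@example.com' AND age > 30"
    print(deidentify_query(raw))
```

In this sketch the shape of the query (tables, columns, predicates) survives, so a replayed workload would still reach the same query paths, while the literal values are dropped. A real pipeline would likely need a proper SQL parser and a review of identifiers that may themselves be sensitive.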