Qualcomm's Falkor CPU core for data centers is now official. Falkor is the custom CPU design inside the Qualcomm Centriq 2400 SoC, which will begin shipping commercially later this year. The SoC will come with 48 cores fabbed on a 10nm process. At the Hot Chips conference this week, the company will unveil more details of the Falkor core.
Key highlights of the Falkor core:
Fully custom core design:
Falkor was designed from the ground up specifically for the cloud datacenter server market. It is a 64-bit micro-architecture that is fully ARMv8-compliant.
Scalable building block:
The Falkor core duplex includes two custom Falkor CPUs, a shared L2 cache and a shared bus interface to the Qualcomm System Bus (QSB) ring interconnect. This modular building block serves as the foundation for Qualcomm's highly scalable 48-core Centriq 2400 SoC design.
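The duplex arrangement can be sketched in a few lines of Python. This is only an illustration of the arithmetic implied above (48 cores, two per duplex); the `Duplex` class, the helper names, and the idea that consecutive core IDs share a duplex are assumptions made for the example, not Qualcomm's actual core numbering.

```python
from dataclasses import dataclass

CORES_PER_DUPLEX = 2   # from the article: two Falkor CPUs per duplex
TOTAL_CORES = 48       # from the article: 48-core Centriq 2400


@dataclass(frozen=True)
class Duplex:
    """Hypothetical model of one duplex: cores sharing an L2 and a ring-bus port."""
    index: int
    cores: tuple


def build_duplexes(total_cores=TOTAL_CORES, per_duplex=CORES_PER_DUPLEX):
    """Group core IDs into duplexes (assumes consecutive IDs are paired)."""
    if total_cores % per_duplex:
        raise ValueError("core count must be a multiple of the duplex size")
    return [
        Duplex(i, tuple(range(i * per_duplex, (i + 1) * per_duplex)))
        for i in range(total_cores // per_duplex)
    ]


def shares_l2(core_a, core_b, per_duplex=CORES_PER_DUPLEX):
    """True if two cores sit in the same duplex and so share an L2 cache."""
    return core_a // per_duplex == core_b // per_duplex


duplexes = build_duplexes()
print(len(duplexes), "duplexes on the ring")
```

Under these assumptions the 48 cores form 24 duplexes, so the ring interconnect sees 24 core-side bus interfaces rather than 48.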
Designed for performance, optimized for power:
The Falkor CPU is designed to deliver high-end compute performance using a 4-issue, 8-dispatch heterogeneous pipeline. The pipeline is designed to optimize performance per unit of power, with variable-length pipelines that are tuned per function to maximize throughput and minimize idle hardware. Additionally, Falkor’s out-of-order and rename resources are sized to keep instruction retirement off the performance-critical path, so the multiple execution engines can be used freely. Other performance-critical elements of the micro-architecture, such as the branch prediction algorithms and the cache hierarchy, are described as state-of-the-art for today’s server-class processors. Power management was built into the design from day one. Mechanisms include independent p-state control for each of the CPUs and the L2; entry to and exit from low-power states controlled by hardware state machines for fast state transitions; and hardware state retention for power-collapsed sleep states with fast recovery.
Performance under memory-intensive workloads:
The Falkor CPU is designed to meet the demand for larger instruction footprints using a split instruction cache composed of a single-cycle, low-power 24KB L0 I-cache complementing its 64KB L1 I-cache. The two caches are managed exclusively, meaning a line is held in one or the other but not both, which provides a total of 88KB of low-latency I-cache. To optimize data performance where it’s critical, the core has a 32KB L1 D-cache with a 3-cycle load-use latency. The L1 D-cache is augmented by a multi-level hardware prefetch engine that dynamically adapts to system conditions.
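To see why exclusive management makes the two I-cache sizes add up (24KB + 64KB = 88KB), here is a toy Python model. It is a sketch, not Falkor's actual design: the 64-byte line size, the LRU replacement, and the promote/demote policy are all assumptions chosen for illustration, and the class name is hypothetical.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumption: the article does not state the line size


class ExclusiveICache:
    """Toy two-level exclusive cache: a line lives in L0 or L1, never both."""

    def __init__(self, l0_lines, l1_lines):
        self.l0 = OrderedDict()  # ordered oldest-first for LRU eviction
        self.l1 = OrderedDict()
        self.l0_lines = l0_lines
        self.l1_lines = l1_lines

    def access(self, line):
        if line in self.l0:
            self.l0.move_to_end(line)  # refresh LRU position
            return "L0 hit"
        if line in self.l1:
            del self.l1[line]  # promote: remove from L1 so only one copy exists
            result = "L1 hit"
        else:
            result = "miss"
        self._fill_l0(line)
        return result

    def _fill_l0(self, line):
        self.l0[line] = True
        if len(self.l0) > self.l0_lines:
            victim, _ = self.l0.popitem(last=False)
            self.l1[victim] = True  # demote the L0 victim into L1
            if len(self.l1) > self.l1_lines:
                self.l1.popitem(last=False)  # drop the oldest L1 line

    def resident_bytes(self):
        return (len(self.l0) + len(self.l1)) * LINE_BYTES


# 24KB L0 and 64KB L1, expressed in lines of the assumed size
cache = ExclusiveICache(24 * 1024 // LINE_BYTES, 64 * 1024 // LINE_BYTES)
for line in range(88 * 1024 // LINE_BYTES):
    cache.access(line)
print(cache.resident_bytes() // 1024, "KB of distinct instructions resident")
```

In an inclusive design the L0 contents would be duplicated in the L1, so the distinct capacity would be capped at 64KB; the exclusive policy is what lets the full 88KB hold unique lines.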
The Falkor CPU is ready for multi-tenant and other virtualized workloads with the full suite of ARM Exception Levels (EL0-EL3) and the TrustZone secure execution environment. Falkor supports the ARMv8 instruction extensions that accelerate the cryptographic transform and secure hash operations needed for efficient performance when running network security protocols such as HTTPS. Falkor also delivers the RAS mechanisms needed to keep a datacenter running, such as fault isolation, reporting, and handling techniques.
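On a Linux system running on an ARMv8 processor, the cryptographic extensions are reported as flags (`aes`, `pmull`, `sha1`, `sha2`) on the `Features` line of `/proc/cpuinfo`. The following Python sketch shows one way to check for them. The function name and the sample text are made up for the example; the sample is illustrative and is not captured from a Falkor system.

```python
# Flags Linux reports on arm64 for the ARMv8 cryptography extensions
CRYPTO_FLAGS = {"aes", "pmull", "sha1", "sha2"}


def crypto_features(cpuinfo_text):
    """Return the ARMv8 crypto flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the "Features" lines list the per-CPU capability flags
        if sep and key.strip().lower() == "features":
            found |= CRYPTO_FLAGS & set(value.split())
    return found


# Illustrative sample only, not real output from a Centriq 2400
SAMPLE = (
    "processor\t: 0\n"
    "Features\t: fp asimd evtstrm aes pmull sha1 sha2 crc32\n"
)

print(sorted(crypto_features(SAMPLE)))
```

On a real machine the same function can be fed the contents of `/proc/cpuinfo`, for example `crypto_features(open("/proc/cpuinfo").read())`.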
System on a chip:
The 48 Falkor CPUs are brought together in a fully integrated SoC, obviating the real estate, power, and cost of a separate chipset. The memory efficiency of the Falkor CPU is extended throughout the SoC with a high-bandwidth, low-latency ring interconnect extending out to its large L3 cache and multiple memory controllers, avoiding on-die NUMA effects. In addition, the memory subsystem includes shared resource management techniques such as L3 Quality of Service (QoS) extensions and effective memory bandwidth enhancement via in-line and transparent memory compression. The SoC also supports an on-die hardware-based immutable root of trust that authenticates firmware before the first line of firmware is ever executed.
More details about the Qualcomm Centriq 2400 SoC architecture and product specifications will be available in the coming months.