Comparison of ARMv8-A cores
This is a table comparing 64/32-bit ARMv8-A cores: microarchitectures that implement the AArch64 instruction set and its mandatory or optional extensions. Most chips also support 32-bit AArch32 for legacy applications; the Falkor data-center chip does not. All chips of this type have a floating-point unit (FPU) that improves on the one in older ARMv7 chips with NEON (SIMD). Some of these chips have coprocessors; the AppliedMicro Helix, for example, also includes cores from the older 32-bit architecture (ARMv7). Some of the chips are SoCs that combine ARM Cortex-A53 and ARM Cortex-A57 cores, such as the Samsung Exynos 7 Octa.
Table
Company | Core | Released | Revision | Decode | Pipeline depth | Out-of-order execution | Branch prediction | big.LITTLE role | Execution ports | Fab (in nm) | L1 cache Instr + Data (in KiB) | L2 cache | L3 cache | Core configurations | DMIPS/MHz |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ARM Holdings | Cortex-A32 (32-bit)[1] | ARMv8.0-A (only 32-bit) | ? | LITTLE | 28[2] | 8–32 + 8–32 | 0–1 MiB | No | 1–4+ | ||||||
Cortex-A35[3] | ARMv8.0-A | 2-wide[4] | 8 | No | Yes | LITTLE | ? | 28 / 16 / 14 / 10 | 8–64 + 8–64 | 0 / 128 KiB–1 MiB | No | 1–4+ | 1.78 | ||
Cortex-A53[5] | ARMv8.0-A | 2-wide | 8 | No | Conditional+ Indirect branch prediction | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 10 | 8–64 + 8–64 | 128 KiB–2 MiB | No | 1–4+ | 2.24 | ||
Cortex-A55[6] | ARMv8.2-A | 2-wide | 8 | No | big/LITTLE | 2 | 28 / 20 / 16 / 14 / 10 | 16–64 + 16–64 | 0–256 KiB/core | 0–4 MiB | 1–8+ | ? | |||
Cortex-A57 | ARMv8.0-A | 3-wide | 15 | Yes 8-wide dispatch | Two-level | big | 8 | 28 / 20 / 16[7] / 14 | 48 + 32 | 0.5–2 MiB | No | 1–4+ | 4.6 | ||
Cortex-A72[8] | ARMv8.0-A | 3-wide | 15 | Yes 8-wide dispatch | Two-level | big | 8 | 28 / 16 | 48 + 32 | 0.5–4 MiB | No | 1–4+ | 4.72 | ||
Cortex-A73[9] | ARMv8.0-A | 2-wide | 11–12 | Yes 7-wide dispatch | Two-level | big | 7 | 28 / 16 / 10 | 64 + 32/64 | 1–8 MiB | No | 1–4+ | ~6.35 | ||
Cortex-A75[6] | ARMv8.2-A | 3-wide | 11–13 | Yes 8-wide dispatch | Two-level | big | 8 | 28 / 16 / 10 | 64 + 64 | 256–512 KiB/core | 0–4 MiB | 1–8+ | ? | ||
Cortex-A76[10] | ARMv8.2-A | 4-wide | 11–13 | Yes 8-wide dispatch | Yes | big | 8 | 7 | 64 + 64 | 256–512 KiB/core | 1–4 MiB | 1–4 | ? | ||
Apple Inc. | Cyclone[11] | ARMv8.0-A | 6-wide[12] | 16[12] | Yes[12] | Yes | No | 9[12] | 28[13] | 64 + 64[12] | 1 MiB[12] | 4 MiB[12] | 2[14] | ? | |
Typhoon | ARMv8.0‑A | 6-wide[15] | 16[15] | Yes[15] | Yes | No | 9 | 20 | 64 + 64[12] | 1 MiB[15] | 4 MiB[12] | 2, 3 (A8X) | ? | ||
Twister | ARMv8.0‑A | 6-wide[15] | 16[15] | Yes[15] | Yes | No | 9 | 16 / 14 | 64 + 64[15] | 3 MiB[15] | 4 MiB[15] | 2 | ? | ||
Hurricane | ARMv8.0‑A | 7-wide[16] | 16 | Yes | Yes | "big" (in A10/A10X paired with "LITTLE" Zephyr cores) | 9 | 16 (A10), 10 (A10X) | 64 + 64[17] | 3 MiB[17] (A10), 8 MiB (A10X) | 4 MiB[17] (A10), No (A10X) | 2 + 2× Zephyr (A10), 3 + 3× Zephyr (A10X) | ? | ||
Monsoon | ARMv8.2‑A[18] | 7-wide | 16 | Yes | Yes | "big" (in Apple A11 paired with "LITTLE" Mistral cores) | 9 | 10 | 64 + 64[19] | 8 MiB | No | 2 + 4× Mistral | ? | ||
Vortex | ARMv8.3‑A[20] | 7-wide | 16 | Yes | Yes | "big" (in Apple A12 paired with "LITTLE" Tempest cores) | 9 | 7 | 128 + 128[19] | 8 MiB | No | 2 + 4× Tempest | ? | ||
Nvidia | Denver[21][22] | ARMv8‑A | 2-wide hardware decoder, up to 7-wide variable-length VLIW micro-ops | 13 | Not if the hardware decoder is in use. Can be provided by dynamic software translation into VLIW. | Direct + indirect branch prediction | No | 7 | 28 | 128 + 64 | 2 MiB | No | 2 | ? | |
Denver 2[23] | ARMv8‑A | ? | 13 | ? | ? | "Super" Nvidia's own implementation | ? | 16 | 128+64 | 2 MiB | No | 2 | ? | ||
Cavium | ThunderX[24][25] | ARMv8-A | 2-wide | ? | No | Two-level | ? | 28 | 78 + 32[26][27] | 16 MiB[26][27] | No | 8–16, 24–48 | ? | ||
ThunderX2[25] (ex. Broadcom Vulcan[28]) | May 2018[29] | ARMv8.1-A[30] | 8-wide "4 μops"[31][32] "quad-threaded" | ? | Yes[33] | Multi-level | ? | ? | 16[34] | 32 + 32 (data 8-way) | 256 KB per core[35] | 1 MB per core[35] | 16–32[35] | ? | |
AppliedMicro | Helix | ? | ? | ? | ? | ? | ? | ? | ? | 40 / 28 | 32 + 32 (per core; write-through w/parity)[36] | 256 KiB shared per core pair (with ECC) | 1 MiB/core | 2, 4, 8 | ? |
X-Gene | ? | 4-wide | 15 | Yes | ? | ? | ? | 40[37] | 8 MiB | 8 | 4.2 | ||||
X-Gene 2 | ? | 4-wide | 15 | Yes | ? | ? | ? | 28[38] | 8 MiB | 8 | 4.2 | ||||
X-Gene 3[38] | ? | ? | ? | ? | ? | ? | ? | 16 | ? | ? | 32 MiB | 32 | ? | ||
Qualcomm | Kryo | ARMv8-A | ? | ? | Yes | Two-level? | "big" or "LITTLE" Qualcomm's own similar implementation | ? | 14[39] | 32+32[40] | 0.5–1 MiB | 2, 4 | 6.3 | ||
Kryo 2XX | ARMv8-A | yes | 10 LPE[41] | ||||||||||||
Kryo 3XX | ARMv8.2-A | DynamIQ | 10 LPP[41] | 64+64[41] | 0.5 + 1 MiB | 2 MiB | 4+4 | ||||||||
Falkor[42][43] | 8 November 2017[44] | "ARMv8.1-A features";[43] AArch64 only (not 32-bit)[43] | 4-wide | 10–15 | Yes 8-wide dispatch | Yes | ? | 8 | 10 | 88[43] + 32 | 500 KiB | 1.25 MiB | 40–48 | ? | |
Samsung | M1/M2[45][46] | 2015 | ARMv8-A | 4-wide | 13[47] | Yes 9-wide dispatch[48] | Two-level | big | 8 | 14 / 10 | 64 + 32 | 2 MiB[49] | no | 4 | ? |
M3[50][47] | 2018 | ARMv8-A | 6-wide | 15 | Yes 12-wide dispatch | Two-level | big | 12 | 10 | Unknown | 512 KiB per core | 4096KB | 4 | ? | |
Because Dhrystone (the "D" in "DMIPS") is a synthetic benchmark developed in the 1980s, it is no longer representative of prevailing workloads; use the DMIPS/MHz figures with caution.
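The DMIPS/MHz column is a per-clock figure, so it only becomes a throughput estimate once it is multiplied by a clock frequency. The following is a minimal sketch of that arithmetic using a few values from the table; the function name `estimated_dmips` and the 1000 MHz clock are illustrative assumptions, not figures from any cited source, and the result carries the same caveats as Dhrystone itself.

```python
# DMIPS/MHz values copied from the table above (ARM Cortex rows only).
DMIPS_PER_MHZ = {
    "Cortex-A35": 1.78,
    "Cortex-A53": 2.24,
    "Cortex-A57": 4.6,
    "Cortex-A72": 4.72,
}


def estimated_dmips(core: str, clock_mhz: float) -> float:
    """Estimate single-core Dhrystone MIPS as (DMIPS/MHz) x clock in MHz.

    `clock_mhz` is a hypothetical operating frequency supplied by the
    caller; the table itself does not list clock speeds.
    """
    return DMIPS_PER_MHZ[core] * clock_mhz


if __name__ == "__main__":
    # Compare the cores at the same assumed 1000 MHz clock.
    for core in DMIPS_PER_MHZ:
        print(f"{core}: {estimated_dmips(core, 1000.0):.0f} DMIPS at 1000 MHz")
```

Comparing cores at an equal assumed clock isolates the per-clock difference, but shipping parts run at different frequencies, so such estimates should not be read as measured performance.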
References
- Frumusanu, Andrei (22 February 2016). "ARM Announces Cortex-A32 IoT and Embedded Processor". AnandTech. Retrieved 13 June 2016.
- "New Ultra-efficient ARM Cortex-A32 Processor Expands… - ARM". www.arm.com. Retrieved 2016-10-01.
- "Cortex-A35 Processor". ARM. ARM Ltd.
- Frumusanu, Andrei. "ARM Announces New Cortex-A35 CPU - Ultra-High Efficiency For Wearables & More".
- "Cortex-A53 Processor". ARM. ARM Ltd.
- Humrick, Matt (29 May 2017). "Exploring DynamIQ and ARM's New CPUs: Cortex-A75, Cortex-A55". AnandTech. Retrieved 29 May 2017.
- "TSMC Delivers First Fully Functional 16FinFET Networking Processor". TSMC. 25 September 2014. Retrieved 19 February 2015.
- Frumusanu, Andrei. "ARM Reveals Cortex-A72 Architecture Details". AnandTech. Retrieved 25 April 2015.
- Frumusanu, Andrei (29 May 2016). "The ARM Cortex A73 - Artemis Unveiled". AnandTech. Retrieved 31 May 2016.
- Frumusanu, Andrei (31 May 2018). "ARM Cortex-A76 CPU Unveiled". AnandTech. Retrieved 1 June 2018.
- Lal Shimpi, Anand (17 September 2013). "The iPhone 5s Review: The Move to 64-bit". AnandTech. Retrieved 3 July 2014.
- Lal Shimpi, Anand (31 March 2014). "Apple's Cyclone Microarchitecture Detailed". AnandTech. Retrieved 3 July 2014.
- Dixon-Warren, Sinjin (20 January 2014). "Samsung 28nm HKMG Inside the Apple A7". Chipworks. Retrieved 3 July 2014.
- Lal Shimpi, Anand (17 September 2013). "The iPhone 5s Review: A7 SoC Explained". AnandTech. Retrieved 3 July 2014.
- Ho, Joshua; Smith, Ryan (2 Nov 2015). "The Apple iPhone 6s and iPhone 6s Plus Review". AnandTech. Retrieved 13 Feb 2016.
- "Apple had shifted the microarchitecture in Hurricane (A10) from a 6-wide decode from to a 7-wide decode". AnandTech. October 5, 2018.
- "Apple A10 Fusion". system-on-a-chip.specout.com. Retrieved 2016-10-01.
- "Apple A11 New Instruction Set Extensions" (PDF). Apple Inc. June 8, 2018.
- "Measured and Estimated Cache Sizes". AnandTech. October 5, 2018.
- "Apple A12 Pointer Authentication Codes". Jonathan Levin, @Morpheus. September 12, 2018.
- Stam, Nick (11 August 2014). "Mile High Milestone: Tegra K1 "Denver" Will Be First 64-bit ARM Processor for Android". NVidia. Retrieved 11 August 2014.
- Gwennap, Linley. "Denver Uses Dynamic Translation to Outperform Mobile Rivals". The Linley Group. Retrieved 24 April 2015.
- Ho, Joshua (25 August 2016). "Hot Chips 2016: NVIDIA Discloses Tegra Parker Details". AnandTech. Retrieved 25 August 2016.
- De Gelas, Johan (16 December 2014). "ARM Challenging Intel in the Server Market". AnandTech. Retrieved 8 March 2017.
- De Gelas, Johan (15 June 2016). "Investigating the Cavium ThunderX". AnandTech. Retrieved 8 March 2017.
- "64-bit Cortex Platform To Take On x86 Servers In The Cloud". electronic design. 5 June 2014. Retrieved 7 February 2015.
- "ThunderX_CP™ Family of Workload Optimized Compute Processors" (PDF). Cavium. 2014. Retrieved 7 February 2015.
- "⚙ D30510 Vulcan is now ThunderX2T99". reviews.llvm.org.
- Kennedy, Patrick (7 May 2018). "Cavium ThunderX2 256 Thread Arm Platforms Hit General Availability". Retrieved 10 May 2018.
- "⚙ D21500 [AARCH64] Add support for Broadcom Vulcan". reviews.llvm.org.
- https://hpcuserforum.com/presentations/santafe2014/Broadcom%20Monday%20night.pdf
- "The Linley Group - Processor Conference 2013". www.linleygroup.com.
- "ThunderX2 ARM Processors- A Game Changing Family of Workload Optimized Processors for Data Center and Cloud Applications - Cavium". www.cavium.com.
- "Broadcom Announces Server-Class ARMv8-A Multi-Core Processor Architecture". Broadcom. 15 October 2013. Retrieved 11 August 2014.
- Kennedy, Patrick (9 May 2018). "Cavium ThunderX2 Review and Benchmarks a Real Arm Server Option". Serve the Home. Retrieved 10 May 2018.
- Ganesh T S (3 October 2014). "ARMv8 Goes Embedded with Applied Micro's HeliX SoCs". AnandTech. Retrieved 9 October 2014.
- Morgan, Timothy Prickett (12 August 2014). "Applied Micro Plots Out X-Gene ARM Server Future". Enterprisetech. Retrieved 9 October 2014.
- De Gelas, Johan (15 March 2017). "AppliedMicro's X-Gene 3 SoC Begins Sampling". AnandTech. Retrieved 15 March 2017.
- "Snapdragon 820 and Kryo CPU: heterogeneous computing and the role of custom compute". Qualcomm. 2 September 2015. Retrieved 6 September 2015.
- Smith, Ryan; Frumusanu, Andrei. "The Qualcomm Snapdragon 820 Performance Preview: Meet Kryo".
- Frumusanu, Andrei; Smith, Ryan. "The Snapdragon 845 Performance Preview: Setting the Stage for Flagship Android 2018". Retrieved 2018-06-11.
- Shilov, Anton (16 December 2016). "Qualcomm Demos 48-Core Centriq 2400 SoC in Action, Begins Sampling". AnandTech. Retrieved 8 March 2017.
  In 2015, Qualcomm teamed up with Xilinx and Mellanox to ensure that its server SoCs are compatible with FPGA-based accelerators and data-center connectivity solutions (the fruits of this partnership will likely emerge in 2018 at best).
- Cutress, Ian (20 August 2017). "Analyzing Falkor's Microarchitecture". AnandTech. Retrieved 21 August 2017.
  The CPU cores, code named Falkor, will be ARMv8.0 compliant although with ARMv8.1 features, allowing software to potentially seamlessly transition from other ARM environments (or need a recompile). The Centriq 2400 family is set to be AArch64 only, without support for AArch32: Qualcomm states that this saves some power and die area, but that they primarily chose this route because the ecosystems they are targeting have already migrated to 64-bit. Qualcomm’s Chris Bergen, Senior Director of Product Management for the Centriq 2400, stated that the majority of new and upcoming companies have started off with 64-bit as their base in the data center, and not even considering 32-bit, which is a reason for the AArch64-only choice here. [..] Micro-op cache / L0 I-cache with Way prediction [..] The L1 I-cache is 64KB, which is similar to other ARM architecture core designs, and also uses 64-byte lines but with an 8-way associativity. To software, as the L0 is transparent, the L1 I-cache will show as an 88KB cache.
- Shrout, Ryan (8 November 2017). "Qualcomm Centriq 2400 Arm-based Server Processor Begins Commercial Shipment". PC Per. Retrieved 8 November 2017.
- Frumusanu, Andrei. "Samsung Announces Exynos 8890 with Cat.12/13 Modem and Custom CPU".
- Ho, Joshua. "Hot Chips 2016: Exynos M1 Architecture Disclosed".
- Frumusanu, Andrei (23 January 2018). "The Samsung Exynos M3 - 6-wide Decode with 50%+ IPC Increase". AnandTech. Retrieved 25 January 2018.
- Frumusanu, Andrei. "Hot Chips 2016: Exynos M1 Architecture Disclosed". AnandTech. Retrieved 29 May 2017.
- "'Neural network' spotted deep inside Samsung's Galaxy S7 silicon brain".
- Howse, Brett; Frumusanu, Andrei (3 January 2018). "Samsung Announces New 9810 SoC: DynamiQ & 3rd Gen CPU". AnandTech. Retrieved 25 January 2018.