With the introduction of the Raspberry Pi 4, I dug into the announcement and datasheet while waiting for mine to arrive. Well, “datasheet” might be an exaggeration, but there is a list of the Raspberry Pi 4 tech specs available. That page gives a bulleted list of the main specs. In this post, I focus on these four:
- Broadcom BCM2711, Quad core Cortex-A72 (ARM v8) 64-bit SoC @ 1.5GHz
- Up to 4GB LPDDR4-2400 SDRAM (depending on model)
- Gigabit Ethernet
- 2 USB 3.0 ports; 2 USB 2.0 ports.
There is a lot of very cool stuff on this $35 board, or even on the 4 GB $55 board. (Amazing they have kept that price point!) Adding Bluetooth 5, H.265 and H.264 decoding, a USB-C connector (for power), and changing the SoC to a 28 nm process are all significant improvements.
But, being an engineer and a person on the Internet, I am required to nitpick and armchair-engineer this (now) released product (that I haven’t touched yet). Without doubt there will be a number of benchmarks posted along with the release of the Pi 4, and I expect many of those to benchmark only individual components. For example, I already saw graphs on the Gigabit Ethernet speed and USB 3.0 throughput, but not both at the same time. As you will see in my datasheet evaluation, using two USB devices at the same time has a performance penalty.
First, what is new with the SoC?
Broadcom, Quad, 28nm, blah blah blah
The Raspberry Pi 4’s CPU is touted as a massive step forward from the Pi 3 B+. Admittedly, I am not an ARM expert, so it is hard for me to judge just based on part numbers. From ARM’s website, here is a chart with the differences.
I only see two differences. The A72 supports an out-of-order pipeline, and its L1/L2 caches are larger than the A53’s. The Pi 4 also bumps the CPU clock to 1.5 GHz, which is 100 MHz faster than the Pi 3 B+. While I know these differences can result in performance improvements, it is not clear to me what kind. Hopefully, the 2-4X claim in the announcement post is valid.
Can anyone with experience on ARM architectures comment on the difference? (Without using generic terms like, “it is designed for performance!”)
Let’s not overlook the CPU’s main peripheral. Next, let’s look at the change in SDRAM.
The press release says Pi 4 triples the available memory bandwidth. The specs page says the board is using LPDDR4-2400. The cynic in me notes that it does not say the actual clock speed used. With DDR memory, you don’t have to run at the chip’s rated speed. (But why would you buy more expensive SDRAMs then?) For comparison, the Pi 3B+ used LPDDR2.
I am not surprised there is a jump past DDR3. There is not much of a market for DDR3 anymore, so prices are going up as available stock goes down.
Moving from DDR2 to DDR4 at the same clock rate does not result in a considerable performance boost. While there are some improvements, there are only so many ways to lay out massively parallel buses. So when it comes down to it, internally the difference between DDR generations is adding more multiplexers (inside the SDRAMs.) Generation updates allow for faster clock and data speeds. So the performance improvements come from increased data rate transfers.
I have not measured the data rate on the Pi 3B+, but LPDDR2 tops out at a data rate of only 1066 MT/s. I doubt the Pi 3B+ used the fastest LPDDR2 SDRAM available, so the bump in DDR clock rate can represent a significant performance improvement.
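As a back-of-the-envelope check on the bandwidth claim, peak theoretical bandwidth is just the data rate times the bus width. The sketch below is only an illustration under assumptions that the spec page does not confirm: a 32-bit memory bus on both boards, two hypothetical LPDDR2 data rates (800 and 1066 MT/s), and the LPDDR4 running at its full rated 2400 MT/s.

```python
def peak_bandwidth_gbytes(data_rate_mts: float, bus_width_bits: int = 32) -> float:
    """Peak theoretical bandwidth in GB/s: transfers per second times bytes per transfer."""
    return data_rate_mts * 1e6 * (bus_width_bits / 8) / 1e9


# Assumed data rates in MT/s. None of these are measured values.
candidates = {
    "LPDDR2 @ 800 (hypothetical)": 800,
    "LPDDR2 @ 1066 (fastest LPDDR2 grade)": 1066,
    "LPDDR4 @ 2400 (rated speed)": 2400,
}

lpddr4 = peak_bandwidth_gbytes(2400)
for label, rate in candidates.items():
    bandwidth = peak_bandwidth_gbytes(rate)
    # Ratio shows how much faster the rated LPDDR4 would be than this case.
    print(f"{label}: {bandwidth:.2f} GB/s (LPDDR4 is {lpddr4 / bandwidth:.2f}x)")
```

Under these assumptions, the ratio depends heavily on how fast the older memory actually ran, which is why the real clock speeds matter more than the part numbers.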
Alright, so I have mixed feelings about the processor side. I am really curious about I/O performance, which I look at next.
The Pi 3B+ offered a Gigabit Ethernet PHY. However, the USB connection to the PHY bottlenecked its performance to 300 Mb/s. That limitation is now gone on the Raspberry Pi 4.
The link between the CPU and Gigabit Ethernet is over an RGMII link.
RGMII stands for Reduced Gigabit Media-Independent Interface. With most serial interfaces, there are two major pieces: a media access controller (MAC) and a physical interface (PHY). The idea behind this arrangement is that the physical connection between two devices can change just by changing the PHY. There are standard buses to connect between a MAC and PHY which make interchangeability possible.
For example, in the case of the older Raspberry Pis, the Ethernet controller had no idea there was a USB interface between the port and the CPU.
With the Raspberry Pi 4, that connection is now a standard interface known as RGMII. It supports Gigabit speeds and full duplex operation. Assuming the CPU can ingest or generate data fast enough, the Pi 4 should be capable of Gigabit Ethernet speeds. I fully expect improvement over older Pi boards from the Gigabit interface.
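To put the difference in concrete terms, here is a minimal sketch comparing transfer times at the older board's roughly 300 Mb/s (megabits per second) ceiling and at the full 1000 Mb/s line rate. The 1 GB file size is a hypothetical example, and the calculation ignores protocol overhead, so real transfers would be somewhat slower in both cases.

```python
def transfer_seconds(size_bytes: float, link_mbps: float) -> float:
    """Seconds to move size_bytes over a link running at link_mbps (megabits per second)."""
    return size_bytes * 8 / (link_mbps * 1e6)


FILE_BYTES = 1e9  # hypothetical 1 GB file

pi3_seconds = transfer_seconds(FILE_BYTES, 300)    # USB-limited Ethernet
pi4_seconds = transfer_seconds(FILE_BYTES, 1000)   # ideal Gigabit line rate

print(f"300 Mb/s:  {pi3_seconds:.1f} s")
print(f"1000 Mb/s: {pi4_seconds:.1f} s")
```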
USB 3 (or is it?)
Like many others, I was excited to see that two of the Raspberry Pi 4’s USB ports are now USB 3.0! But, there is a snag that deflated my excitement. Before getting into their performance, let’s get on the same page about USB 3.0. Also known as SuperSpeed, USB 3.0 changed its physical layer from previous USB generations.
USB 2.0 and below used a single differential pair of wires in a half-duplex mode. That means data is sent and received using the same wires. Any system using half-duplex has a fundamental bandwidth limit since it is never possible to simultaneously send and receive information. Though, tricks like “bulk transfers” are used to push large data blocks in one direction at a time.
The USB 3.0 physical layer differs in two ways. First, it is a full duplex system, meaning, there are dedicated wires for TX and RX. Second, the bit rate runs at 5.0 Gb/s. The last info to know is that USB 3.0 shares the same physical layer as PCI-Express Gen2. As you go up in the protocol stack, USB 3.0 looks more like USB. Again, that is the benefit of separating MAC and PHY functions. Sharing the physical layer properties of PCIe-Gen 2.0 means that a single USB 3.0 device and a single PCIe-Gen 2.0 device have, roughly, the same bandwidth.
When looking at the Raspberry Pi 4, according to the announcement page, all four USB ports share a single PCIe Gen2 interface back to the SoC. I guess that there is a single USB hub controller near the physical ports. Curiously, the page says they share 4 Gb/s of bandwidth. I’m not sure if that is a misprint or if they meant throughput after the protocol overhead.
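One plausible explanation for the 4 Gb figure is line encoding. Both PCIe Gen2 and USB 3.0 use 8b/10b encoding, which sends 10 bits on the wire for every 8 bits of data. The sketch below works through that arithmetic, assuming the shared link is a single PCIe Gen2 lane; whether this is what the announcement page actually meant is a guess.

```python
def payload_rate_gbps(line_rate_gbps: float, data_bits: int = 8, coded_bits: int = 10) -> float:
    """Usable bit rate after line encoding (8b/10b by default)."""
    return line_rate_gbps * data_bits / coded_bits


# Assumption: one PCIe Gen2 lane signaling at 5.0 Gb/s.
pcie_gen2_x1 = payload_rate_gbps(5.0)
print(f"5.0 Gb/s line rate -> {pcie_gen2_x1:.1f} Gb/s after 8b/10b")
```

This only accounts for the encoding overhead. Packet headers and other protocol overhead would reduce the usable throughput further.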
Sharing that bandwidth means you do NOT get “USB 3” performance if you’re using more than one USB device. Before we get bent out of shape, let’s consider how many Raspberry Pi applications require two 5 Gb/s devices operating simultaneously. One example I can think of is video recording. You might want to use an HDMI to USB adapter for a camera and save the data to a USB flash drive. However, most HDMI adapters use H.264 encoding, so the bus does not saturate even for 1080p (30 or 60 fps).
What you’re going to find is that when you want to use multiple USB 3.0 devices, you need to consider the bandwidth required for each and how often they transfer data.
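That kind of budgeting can be sketched in a few lines. Everything in this example is hypothetical: the device names, their peak rates, and their duty cycles are made-up illustration values, and the 4000 Mb/s shared link is taken from the figure quoted in the announcement.

```python
from dataclasses import dataclass


@dataclass
class UsbDevice:
    name: str
    peak_mbps: float   # bandwidth needed while actively transferring
    duty_cycle: float  # fraction of time spent transferring, 0.0 to 1.0


SHARED_LINK_MBPS = 4000.0  # assumption: the shared figure from the announcement


def bus_budget(devices, link_mbps=SHARED_LINK_MBPS):
    """Return (average load, worst-case load, whether the worst case fits the link)."""
    average = sum(d.peak_mbps * d.duty_cycle for d in devices)
    worst = sum(d.peak_mbps for d in devices)
    return average, worst, worst <= link_mbps


# Hypothetical devices, not measurements.
devices = [
    UsbDevice("hdmi-capture", peak_mbps=50, duty_cycle=1.0),
    UsbDevice("usb3-ssd", peak_mbps=3200, duty_cycle=0.25),
]

average, worst, fits = bus_budget(devices)
print(f"average {average:.0f} Mb/s, worst case {worst:.0f} Mb/s, fits: {fits}")
```

Comparing the average against the worst case shows whether devices only collide occasionally or would contend for the link all the time.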
In this armchair engineering exercise, I might sound down on the Raspberry Pi 4. I am not. Again, I come back to the $35-55 price points. Without drawing up a comparison chart, I feel the jump from Pi 3 to Pi 4 is considerably higher than from Pi 2 to Pi 3. However, until I get my hands on one, I cannot make that statement with confidence. I have some software benchmarks in mind, and I am planning to poke at some of these high-speed buses with my high-bandwidth oscilloscope to see how the signal integrity looks. Make sure you follow or subscribe to know when that happens.
Any idea where to find a decent product spec – hardware and software – on the BCM2711? Broadcom does not have one on their web site.
Broadcom only provides their datasheets under NDA. So you need to contact a Broadcom sales rep.
This roadblock is one reason the BeagleBone is a better choice for a project you intend to turn into a product.
Great analysis – thanks!!! I especially enjoyed your thoughts on the four USB 3 ports.
For me, the biggest item is availability of a model that accepts 4 GByte of RAM. I have been trying to do AI at the edge and the model I was trying to run on the Raspberry Pi 3B just would not fit in 1 GByte.
I already purchased an NVIDIA Jetson Nano developer board which also has 4 GB RAM. I figure even if I have trouble using the attached GPU hardware, I can still implement my model on that board.
I know some have rumored an 8GB model could happen in the future. But from what I can see, the board should be able to accept a 16GB DRAM chip. Well, once one becomes available.
The increased L1 and L2 cache may impact performance more than the clock speed bump, at least for computationally intensive operations. Since the Pi is getting used for things like computer vision more and more these days, this is a huge improvement.
Interesting point. Would OpenCV-Python comparisons between a 3 B+ and 4 be appropriate then?
Yeah, I’d like to see that and maybe dlib Python stuff and some cc compiling of something big like OpenCV.