In this article, I am going to take a thorough look at an interesting piece of router/firewall hardware that I recently bought off AliExpress: A fanless Intel mini PC with six Gbit Ethernet ports that should make for an ideal router in any picky user’s home network.
If you don’t feel like spending too much time on my ramblings, you’re of course welcome to hop down to the tl;dr-style section at the very end.
If you query AliExpress for “mini PC”, you get back an avalanche of results. The storefront’s filtering capabilities also don’t make it easy on the potential buyer, and you will have to be ready and willing to wade through MANY colorful item descriptions and bundle choices if you are serious about finding the right fit for the purpose you have in mind. If you do, you will be rewarded with hardware designs that you just won’t be able to find anywhere in the EU’s retail market (except maybe on eBay’s or Amazon’s marketplace platforms, where dropshippers selling these probably source them from AliExpress/Alibaba themselves).
I have been on the lookout for a new home router/firewall appliance with Gbit Ethernet connectivity recently. I would like my new router to have enough horsepower to perform Smart Queue Management on the 300Mbps downstream link my ISP currently leases to me at home, and so I decided to give AliExpress’s x86 mini PC catalog a closer look. After a few short hours of virtual window shopping, I arrived at the EGLOBAL Top MiniPC shop. Their tagline reads OEM & ODM Mini PC Manufacturer, and they have an impressive lineup of small-ish PC hardware in store. Compact and silent media center machines, powerful yet tiny desktop PCs, network/firewall appliances - if you’re looking to build any of these, they’ll probably have you covered.
For my project, the EGLOBAL Fanless pfSense Intel 3865U Industrial Mini PC caught my eye. That is quite a mouthful, and I quickly came to call it the fwbox myself. I will be referring to the piece of hardware tested here by that moniker for the remainder of the review. The fwbox’s hardware highlights, according to the manufacturer, are as follows (reproduced verbatim from the seller’s item description at the time of purchase, 2020-12-15):
- Fanless system without cooling fan, no any noise, thicker and longer heat-sink fins for working as long as 7x24 hours durably.
- 6xIntel i201/i211 Gigabit RJ45 LANs and 2xRS232 COM ports, support Windows, Linux and pfSense OS.
- 1xmSATA SSD+1x2.5” SSD/HDD installation supported.
- One embedded SIM card slot onboard, 3G/4G/WiFi optional.
- Applied in Firewall, Network Server, Network Security, VPN Router etc.
- Full Aluminum Alloy Black color shell, Exquisite production craft on outside design with high quality, slim design 209x150x57MM.
- Low consumption save much energy and eco-friendly.
- Support Watchdog, AWAL, RTC, WOL, PXE, Reset functions.
As with many other AliExpress “mini PC” barebone offers, EGLOBAL lets the buyer choose the CPU (which is soldered on, so you cannot change it later without replacing the whole motherboard) and which optional storage and memory components, if any, they would like installed. I chose the 2GB RAM (PC3-12800 DDR3L SO-UDIMM) and 32GB SSD (mSATA-600) bundle, so that I would be able to start testing the machine in depth right away, without minor (or not-so-minor) DRAM compatibility troubles holding me back. For the CPU, I chose the Intel Celeron 3865U - a dual-core Kaby Lake (originally introduced in early 2017) mobile/laptop-class CPU clocked at up to 1800MHz that should easily be powerful enough to juggle everything my LAN and WAN would be throwing at it, with some headroom left for other tasks. I intend to put this 3865U to the test and see if this hunch actually holds true.
Although the manufacturer offers to pre-populate the miniPCIe slot with a suitable Wi-Fi card for a relatively small premium, I did not choose that option. There are only very few miniPCIe cards on the market that perform decently when operated in Access Point mode, especially if you want to offer 802.11ac (“Wi-Fi 5”) speeds. Apart from that, you would usually want or need two distinct radios to cover all your wireless networking needs: one for covering the 2.4GHz band, and another one for the 5GHz band. Since that would take two cards/radios/slots, there’s no way to cram that into the fwbox - and USB adapters are a can of worms of their own. So, in my view, sticking to an external AP that does those wireless things well is the better choice to make, and my unit’s miniPCIe slot will remain unpopulated for the time being.
The seller also advertises a three-year warranty, which I would imagine to be difficult to benefit from, due to the international shipping and export/import burdens involved. Therefore, I did not make any warranty concerns or promises part of my decision-making process when buying this mini PC.
With EGLOBAL shipping my device with mass storage pre-installed, they were nice enough to pre-load pfSense in a fairly recent release (2.4.4-p3) for me. However, since I am not a FreeBSD expert, I will be performing my testing on GNU/Linux-based distros and operating systems instead. Experienced BSD users should not expect any nasty surprises though - the hardware really isn’t exotic, and the fact that the manufacturer actively advertises pfSense as the “expected” (and pre-loaded) OS for the fwbox certainly implies good compatibility between hardware and that choice of OS.
When I purchased the fwbox, my desired configuration was offered by EGLOBAL for US$ 193.35/EUR 163.75. DHL charged another EUR 43.69 on top of that to have the fwbox delivered to my doorstep from China in less than two weeks (even though they presumably were drowning in shipments over Christmas), while also handling import turnover tax and tariffs and the like for me. I consider that fair, and the grand total of about EUR 210 is absolutely reasonable for this kind of device when compared with mini PC offers available here in Austria, like Zotac’s ZBOX et al.
The main feature and selling point of this specific mini PC variant, at least to me, is the six integrated Gbit-speed Ethernet ports. They’re all Intel i210-based and are connected to the system via PCIe (2.5GT/s each) individually - the Ethernet ports on the fwbox’s back are therefore not connected to a software-configurable, internal Ethernet switch like you would find on most SOHO routers or the like. As a consequence, if you want the fwbox to act mostly like one of these appliances, you will have to emulate whatever a switch does “in software”, on the main CPU. Luckily, Linux can do that both easily and efficiently, by configuring a bridge over as many Ethernet (and other) devices/links as you want.
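To make "emulating a switch in software" a little more concrete, here is a minimal Python sketch of the core idea behind a transparent bridge: learn which port each source MAC address lives behind, forward to that port when the destination is known, and flood otherwise. This is a toy model for illustration only, not the Linux kernel's bridge implementation; the class name, port names and MAC addresses are made up for the example.

```python
from typing import Dict, List


class LearningBridge:
    """Toy model of a transparent Ethernet bridge (not the kernel's code)."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = list(ports)
        # Forwarding database: source MAC address -> port it was last seen on.
        self.fdb: Dict[str, str] = {}

    def handle_frame(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        """Return the list of ports the frame is sent out on."""
        # Learn: remember which port the sender lives behind.
        self.fdb[src_mac] = in_port
        out_port = self.fdb.get(dst_mac)
        if out_port is None:
            # Unknown destination: flood to every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination sits on the same segment: nothing to forward.
            return []
        return [out_port]


# Hypothetical setup: five bridged ports and two hosts.
bridge = LearningBridge([f"eth{i}" for i in range(5)])
first = bridge.handle_frame("eth0", "aa:aa:aa:aa:aa:01", "aa:aa:aa:aa:aa:02")
reply = bridge.handle_frame("eth3", "aa:aa:aa:aa:aa:02", "aa:aa:aa:aa:aa:01")
print("first frame goes out on:", first)  # destination not yet learned: flooded
print("reply goes out on:", reply)        # sender of first frame was learned
```

A real bridge additionally ages out entries, handles broadcast/multicast and VLANs, and can run spanning tree; the sketch leaves all of that out.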
Most dedicated switching hardware comes with special offloading features and performance enhancements that give it an edge over general purpose networking solutions - but with the comparatively powerful host CPU the fwbox has at its heart, combined with the decent Ethernet adapters and drivers in play, it should be able to keep up in most (if not all) scenarios that involve up to six bridged Gbit links. On the flip side, we gain the unparalleled flexibility that Linux and other free OS offer to do as we please with all frames, packets, flows, and whatever else the fwbox will have to relay and route and proxy through it.
Apart from the obviously important embedded hardware components, there are more interesting details to explore. As you’ve already been able to glean from the specs quoted earlier, the fwbox comes with four USB-A (USB 3.0 SuperSpeed, 5Gbps) and two COM/Serial (1x9P, 1xRJ-45) ports on the front. All these are easily accessible and live right next to an HDMI port (wired up to the Intel HD Graphics 610 IGP, which makes for a very fast console framebuffer - and probably even a decent media center appliance…), if you ever need to plug in a display to diagnose potential problems on a VT. The front also features a neat-looking, upside-down power button, and a slightly sunk-in reset button, which seems practically impossible to trigger by accident.
On the back, we have the aforementioned six Ethernet ports, two status LEDs (red for power and green for mass storage activity - both far too bright, of course), the AC in, and two connectors for attaching Wi-Fi antennas, should your configuration need them. The two corresponding internal cables that connect these antennas to the Wi-Fi card dangle loosely on the inside, and are responsible for the audible rattling noises the fwbox emits whenever you tilt or shake it. If you’re not using them, you will want to stow them away carefully inside the fwbox, so that they cannot cause any kind of harm to the internals of your system.
The manufacturer advertises the case as “exquisite production craft on outside design with high quality”, and the fwbox lives up to whatever that phrase is supposed to convey. It really feels solid and well-crafted. The matte black, all-aluminum case has a certain charm to it in my eyes, and acts as a very effective heatsink to boot - the whole design is fanless, and completely inaudible under all operating conditions. Still, I wouldn’t want this little tank of a PC to have to try hiding in plain sight in the living room. Luckily, it doesn’t need to: Given its relatively small width, it fits perfectly well inside a 10” rack. If you go that route, however, you’ll have to be a little creative when mounting it (as it doesn’t include rails), and reserve 2U (2HE) due to its somewhat protruding heat dissipation fins.
Additional installation material includes antennas for use with a Wi-Fi module, alongside everything you need to drive an additional 2.5” SATA disk and attach it to the case’s solid bottom plate. For some, this might already be enough to make the fwbox into a NAS of sorts. The box also included an HDMI cable that I chose not to depict, and something I cannot quite make sense of that had also popped out of the very sturdily packed DHL shipping parcel.
The seller bundles a suitable power supply unit (LITE-ON PA-1600-01C-RoHS, 60W), and will also pack a power cord that works with whatever kind of plug the country you are ordering from requires. Mine did not exhibit any annoying properties: Neither did it get warm, nor was there any operating noise (like coil whine). In my book, that’s the best kind of PSU - the kind you forget even exists.
Peeking inside the case after loosening four screws in its bottom plate, we can see the SIM card slot the description mentions, as well as a standard SATA port and miniPCIe slot - both unpopulated. Furthermore, the pre-installed 32GB mSATA SSD (QUNION P20A MSATA Series) and DDR3L SO-DIMM (Samsung M471B5674QH0-YK0) are present in the expected locations, both easy to remove or replace. (Note that if you were to order an fwbox for yourself, I would not necessarily expect these exact parts to be included, even if you happen to choose a configuration identical to the one presented - the seller doesn’t specify these components in the offer, and will presumably install whatever they have available at the time.) The SSD delivered surprisingly decent sequential read speed for a SATA device of its size on a quick first look (>500Mbytes/s), but I did not investigate it any further.
Now that we have a clear picture of the physical underpinnings of what we are dealing with, let’s see what an operating system makes of all this. For testing, I will be using grml 2020.06 “Ausgehfuahangal” - a Debian-based live system that boots off of USB storage, so that I can leave the pfSense installation untouched for later inspection or use, if need be. I will also be performing tests using the current OpenWrt 19.07.5 release for x86_64 (BIOS/MBR, ext4 variant), booted from a 16GB USB2.0 thumb drive.
The motherboard’s UEFI-compliant firmware offers an impressive array of settings, far beyond what most “reputable” manufacturers of desktop or server boards expose in their firmware setup menus. One could certainly write a whole series of articles about the different options, but I’ll cut it short this way: Whatever you have seen in the BIOS/UEFI setup screen of a comparable system (excluding most overclocking features), you will find it in this one.
For my purposes, I was most interested in comfortably booting something other than the OS from the primary internal storage, and the AMI-based setup doesn’t disappoint - the user can choose to override boot options ad hoc, and both UEFI- and CSM/BIOS-based options are available. I prefer to boot grml via UEFI these days, which works flawlessly on the fwbox. Using the UEFI’s Compatibility Support Module for BIOS-compatible boot also works. I went on to give it a try with OpenWrt’s boot medium, and that also loaded GRUB and continued to boot flawlessly.
Both distros had proper console/VT output via HDMI during bootup, with keyboard (attached via USB) input working normally. All six Ethernet interfaces had their driver loaded and were brought up OK right out of the box, eager to detect a link.
Summing up, it should be very easy to make your favourite recent-ish spin of (GNU/)Linux fly on the fwbox - there’s no reason to expect the unexpected, all components are solidly compatible with today’s distro landscape.
Some things are best explained by showing the raw data - so to find out what the Linux kernel with all its drivers configured and loaded by the grml distribution makes of the fwbox, let’s have a look at helpful utilities and their diagnostic output. The fixed-width font listing is supposed to give you a preview of the content the individually linked files contain - they were screen-scraped from the output of the quoted commands. If you’ve ever used one of the mentioned tools, I hope you will feel right at home.
Here, we have a look at the system’s DMI data, the kernel debug ringbuffer shortly after booting the system, and the list of modules that the kernel and udev decided/tried to load for all the hardware detected in the system.
    # dmidecode 3.2
    Getting SMBIOS data from sysfs.

    [    0.000000] Linux version 5.6.0-2-amd64 (firstname.lastname@example.org) (gcc version 9.3.0 (Debian 9.3.0-13)) #1 SMP Debian 5.6.14-2 (2020-06-09)
    [    0.000000] Command line: BOOT_IMAGE=/boot/grml64full/vmlinuz apm=power-off boot=live nomce net.ifnames=0 findiso= toram=grml64-full.squashfs live-media-path=/live/grml64-full/ bootid=fc568f09-81be-4f50-b280-a6a7659b2561

    Module                  Size  Used by
    bluetooth             667648  0
A few years ago, the first serious CPU-level vulnerabilities with colorful names began rearing their ugly heads. At least since then, Linux’s /proc/cpuinfo contents have been very interesting, if only to see what hardware design mistakes had to be mitigated at the microcode/firmware/kernel level. Even though Kaby Lake-generation CPUs are not from that first generation of vulnerable hardware designs, they have enough architectural bugs to cause careful folk to worry.
Please note that grml does NOT include the latest Intel microcode patches, so some/all of the detected unmitigated vulnerabilities might have already been patched on a suitably updated system. What this output does tell us, however, is which microcode patches (if any) have been applied to the AMI firmware image that the motherboard was flashed with at the factory. Since I don’t really expect any forthcoming firmware upgrades for this no-name system board, the OS will have to take care of patching the CPU microcode - which usually is the better choice anyhow.
    processor : 0
    vendor_id : GenuineIntel
The fwbox does not come with any peripherals attached internally that use USB, but of course, there’s plenty of stuff connected via PCI Express.
    00:00.0 Host bridge : Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers [8086:5904] (rev 02)
        DeviceName: Onboard - Other
Modern Ethernet adapters are little embedded computers of their own, and some of them can do many things better or quicker than the host CPU - like checksumming various pieces of Layer 3 data. Linux offers `ethtool` to interact with Ethernet devices and their drivers to query and control some of these features, so we’re using it here to check the hardware features of the integrated Intel NICs. Note that querying all six adapters produces the same result each time, so I reproduce the output only once.
    Settings for eth0:
        Supported ports: [ TP ]
Also, Intel’s PROADMIN tools were able and willing to verify the integrated NICs as genuine, once I got them to work by booting Linux with the `iomem=relaxed` kernel command-line parameter.
The included mSATA disk isn’t something to get crazy about, but it does seem solid enough and certainly is plenty good for booting and housing a GNU/Linux system. It also offers enough diskspace to store logging data of the fwbox itself, and maybe even its peripheral systems, if that is your thing. If you want your fwbox to also act as a fileserver, you’re better off with a larger (and/or second, which even opens up software RAID possibilities) SATA and/or mSATA storage device.
    NAME  MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
    loop0   7:0    0 687.3M  1 loop /lib/live/mount/rootfs/grml64-full.squashfs
One of the most interesting aspects of any kind of hardware is its performance in typical usage and workload scenarios. For a router and firewall device, that will mostly boil down to cryptographic operations involved in network protocols, as well as bandwidth/throughput and latency for typical network protocols and operation.
Since we’re dealing with a PC-class device with an extraordinary cooling concept, we will want to make sure that even a continuously stressed system will not run into thermal problems that might cause it to throttle its speed or, even worse, crash it completely due to accumulating excess heat.
Finally, everyone sure wants a home router/firewall to be fast and featureful, but not at too steep a price in terms of operating costs - we’d expect this device to operate with at least acceptable energy efficiency, and in order to gauge that, we will measure its energy consumption in some usual (and also unusual, which should be interesting only on a theoretical level) scenarios.
So what’s the fwbox like when it’s just powered on, having booted a Linux kernel, otherwise mostly chillin’?
If it weren’t for the rather too bright LEDs, you wouldn’t be able to tell that the thing is even powered on: It doesn’t make a sound, and it also doesn’t get hot to the touch (even after hours of operation with practically zero airflow), which hints at an effective cooling design. My wattmeter reports 6.8W power consumption in this state with grml’s default boot options. Having one of the Ethernet interfaces light up with a Gbit link into my home LAN increases that reading to 7.2W. Each additional (up and configured, but relatively idle) link that I took online increased that reading by about 0.15W per link.
With OpenWrt 19.07.5 booted from USB and `powertop --auto-tune` having run once, the fwbox draws 4.7W without any Ethernet interfaces connected. Having five Ethernet links connected puts the wattmeter at 6.7W. Cable length and Energy Efficient Ethernet support might be playing a bit of a role in all this - but the numbers should give you a rough idea of what to expect in a typical home network environment.
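To put the idle wattmeter readings above into perspective, the following Python sketch converts them into energy used per year of continuous operation. The wattages are the ones quoted in the text; the electricity price is purely a placeholder assumption (and currency-agnostic), so substitute your own tariff.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts: float) -> float:
    """Energy used in one year of 24/7 operation at a constant draw."""
    return watts * HOURS_PER_YEAR / 1000.0


# Idle readings quoted in the text (wall power, in watts).
readings_w = {
    "grml, no link": 6.8,
    "grml, one Gbit link": 7.2,
    "OpenWrt + powertop, no link": 4.7,
    "OpenWrt + powertop, five links": 6.7,
}

# Assumption for illustration only: price per kWh in your local currency.
PRICE_PER_KWH = 0.20

for label, watts in readings_w.items():
    kwh = annual_kwh(watts)
    cost = kwh * PRICE_PER_KWH
    print(f"{label:32s} {watts:4.1f} W  {kwh:5.1f} kWh/year  ~{cost:5.2f}/year")
```

At 6.8W, that works out to roughly 60 kWh per year, so even a fairly expensive tariff keeps the running costs modest.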
In this nearly completely idle state, the Kaby Lake CPU’s internal sensor reads a comfy 36C via the coretemp driver at an ambient temperature of 24.7C (measured at ~30cm horizontal distance from the fwbox on the same surface).
Various tools can put heavy load on a CPU, but few are more effective at it than the venerable PRIME95. While there are benchmarks and stress test programs that can actually strain modern CPUs a tiny little bit more (like Intel’s MKL benchmark suite), PRIME95 is nice because it works equally well for AMD and Intel x86 processors and supports a wide range of operating systems. Also, it doesn’t require an expert to tune it to a specific environment or machine.
To check if a particular system is stable, or is prone to failure due to overheating or producing incorrect computation results (due to hardware errors with all kinds of possible underlying reasons), PRIME95 offers the innocuously named “torture test” mode. That initiates a never-ending (unless interrupted by a calculation/verification error, or a human operator telling it to stop) cycle of the number crunching used to test huge Mersenne numbers for primality, checking each result against known-good values and using highly optimized code to exploit all the processing power your CPU(s) can muster. Consequently, a lot of heat needs to be shed by the system under test. You can invoke that mode directly from your shell by running
Taking a look at the fwbox’s temperature readings after it had pegged away at the torture test for about four hours, we can dare to assume it has arrived at a steady-state temperature maximum that will exceed whatever real-world CPU usage would cause. With grml’s defaults for all tunables involved, `sensors` reports an unexciting 49C, while the ambient temperature clocks in at 26.0C (increased from before due to my wife’s preference for warm feet). The fwbox’s case is warm, but certainly not hot to the touch at this point. Not bad!
During many hours of stress testing under PRIME95 on grml, the reported maximum power consumption was 13.5W. PRIME95 did not produce a single test failure; any failure would have led me to dismiss the device outright.
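Because the ambient temperature differed between the idle and the stress measurements, it is fairer to compare the temperature rise over ambient than the raw sensor readings. The Python sketch below does that with the figures quoted in the text. The final "kelvin per watt" figure is only a crude indication under loud assumptions: it uses wall power rather than CPU package power, and it pairs the idle reading without a link (6.8W) with the maximum reading under load (13.5W).

```python
def rise_over_ambient(cpu_c: float, ambient_c: float) -> float:
    """Temperature difference between the CPU sensor and the room, in kelvin."""
    return cpu_c - ambient_c


# Figures quoted in the text (grml, default tunables).
idle_rise = rise_over_ambient(cpu_c=36.0, ambient_c=24.7)
load_rise = rise_over_ambient(cpu_c=49.0, ambient_c=26.0)

# Crude estimate: extra heating per extra watt drawn at the wall.
# Assumption: 6.8 W (idle, no link) vs. 13.5 W (maximum under PRIME95).
extra_watts = 13.5 - 6.8
kelvin_per_watt = (load_rise - idle_rise) / extra_watts

print(f"idle: {idle_rise:.1f} K above ambient")
print(f"load: {load_rise:.1f} K above ambient")
print(f"roughly {kelvin_per_watt:.2f} K per additional wall watt")
```

In other words, four hours of torture testing roughly doubled the rise over ambient, which still leaves a lot of thermal headroom.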
Unfortunately, PRIME95-based tests cannot be reproduced when running OpenWrt, since the precompiled binary seems to rely on eglibc-exclusive features/symbols, which OpenWrt doesn’t bring to the table.
PRIME95 also offers a benchmark mode, to compare performance amongst systems. You can invoke it from the interactive menu that starting up `mprime` offers, and it produces a textual performance report like the following:
    Your choice: [Main thread Dec 30 19:31] Starting worker.
    [Work thread Dec 30 19:31] Worker starting
    [Work thread Dec 30 19:31] Your timings will be written to the results.bench.txt file.
    [Work thread Dec 30 19:31] Compare your results to other computers at http://www.mersenne.org/report_benchmarks
    [Work thread Dec 30 19:31] Benchmarking multiple workers to measure the impact of memory bandwidth
    [Work thread Dec 30 19:31] Timing 2048K FFT, 2 cores, 1 worker. Average times: 16.71 ms. Total throughput: 59.83 iter/sec.
    [Work thread Dec 30 19:31] Timing 2048K FFT, 2 cores, 2 workers. Average times: 33.53, 33.53 ms. Total throughput: 59.64 iter/sec.
Full Output: fwbox_mprime_benchmark.txt
I will not try to interpret or compare these numbers to those produced by other systems, but only post them here as a reference for anyone interested.
Now let’s have a look at performance numbers more grounded in the real world, that will make sense to compare with other systems with similar fields of use. The tool in our belt for that purpose is one that is highly relevant to networking gear: OpenSSL. Most of the time when you do something with Transport Layer Security, it is involved. Software also uses its digest routines to protect against various forms of data tampering and/or corruption. Since it also includes an easy to use benchmark mode, we’ll use it to generate a nice table of throughput scores for various digest/checksum and cipher/encryption modes, as reproduced below.
First, let’s look at single-threaded (actually, single process) results for our selection of some relevant and common cryptographic primitives:
Benchmark script: openssl__bench
    OpenSSL 1.1.1g 21 Apr 2020
    built on: Tue Apr 21 19:45:21 2020 UTC
    options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
    compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-kZUcLs/openssl-1.1.1g=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
    The 'numbers' are in 1000s of bytes per second processed.
    type            16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    aes-128-cbc   395517.05k   659327.20k   676089.64k   679126.02k   679609.55k   679845.07k
    aes-128-xts   258929.64k   911888.16k  1897743.86k  2539918.59k  2816390.35k  2845410.10k
    aes-128-gcm   238319.51k   710595.19k  1332153.56k  1579320.06k  1662084.71k  1674276.86k
Full plaintext output — grml: fwbox_openssl_speed_n1.txt — OpenWrt: fwbox_openssl_speed_n1_openwrt.txt
Comparing the results from grml to the OpenSSL library as packaged in OpenWrt, there is a clear difference for low blocksizes in many cases (for reasons unclear to me), but mostly perfectly comparable results for the more meaningful, larger sizes. A few notable differences (nistp224, nistp521) emerge in the other direction, too, where the grml build provides starkly increased throughput. Also, OpenWrt’s OpenSSL build comes with fewer cipher suites/curves configured and supported than Debian’s/grml’s, but all relatively common and recommended choices are supported on both platforms.
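OpenSSL reports its figures in "1000s of bytes per second", which is awkward to relate to link speeds. As a small aid, the Python sketch below converts the single-process grml results for 16384-byte blocks (copied from the listing above) into Gbit/s. Keep in mind that this is raw cipher throughput in a tight loop; the throughput of an actual VPN or TLS connection depends on much more than that, so treat the comparison with a 1 Gbit/s link as a rough upper bound only.

```python
# Single-process results for 16384-byte blocks, in 1000s of bytes per second
# (values copied from the grml `openssl speed` preview in the text).
single_process_k = {
    "aes-128-cbc": 679845.07,
    "aes-128-xts": 2845410.10,
    "aes-128-gcm": 1674276.86,
}


def to_gbit_per_s(k_bytes_per_s: float) -> float:
    """Convert OpenSSL's '1000s of bytes per second' into Gbit/s."""
    return k_bytes_per_s * 1000 * 8 / 1e9


for cipher, k_value in single_process_k.items():
    print(f"{cipher}: {to_gbit_per_s(k_value):6.2f} Gbit/s")
```

Even the slowest of the three, aes-128-cbc, comes out at more than 5 Gbit/s on a single core.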
To give you an idea as to how these numbers stack up to some of my other systems at home, you may want to take a look at the output from OpenWrt 19.07.5 (ramips) on the D-Link DIR-860L, from Debian 10/Buster on an ASUS VivoMini UN45H NUC-like PC with Celeron N3000, and from Arch Linux on an AMD Ryzen 5 3600 (x86_64). I realize it would be optimal to compare systems with each other under identical conditions (i.e., running the same OS) - but unless someone expresses interest in that, I’ll skip producing those out of sheer laziness. The numbers should still be roughly comparable and not fluctuate too much between OpenSSL minor releases and compiler versions etc.
The fwbox features a dual core processor, so let’s also look at what it can do with two and four processes competing for compute resources. Theory would suggest near-linear speedup for the two-process scenario, and probably slightly reduced absolute performance (compared to the two-process case) for the four-process run - due to increased scheduling effort that translates into fewer CPU cycles available to crunch numbers.
Benchmark script: openssl__bench_multi
(Due to historical peculiarities of OpenSSL’s `speed` command in `-multi` mode, we have to do a bit of filtering and substitution to arrive at a format that’s friendly to the human eye. The first and only parameter to the script governs the number of parallel workers started by `openssl speed`, and defaults to
    Forked child 0
    Forked child 1
    OpenSSL 1.1.1g 21 Apr 2020
    built on: Tue Apr 21 19:45:21 2020 UTC
    options:bn(64,64) rc4(16x,int) des(int) aes(partial) blowfish(ptr)
    compiler: gcc -fPIC -pthread -m64 -Wa,--noexecstack -Wall -Wa,--noexecstack -g -O2 -fdebug-prefix-map=/build/openssl-kZUcLs/openssl-1.1.1g=. -fstack-protector-strong -Wformat -Werror=format-security -DOPENSSL_USE_NODELETE -DL_ENDIAN -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DRC4_ASM -DMD5_ASM -DAESNI_ASM -DVPAES_ASM -DGHASH_ASM -DECP_NISTZ256_ASM -DX25519_ASM -DPOLY1305_ASM -DNDEBUG -Wdate-time -D_FORTIFY_SOURCE=2
    aes-128-cbc   791157.31k  1315311.94k  1350659.05k  1357314.51k  1358941.80k  1359337.06k
    aes-128-xts   516037.97k  1813784.74k  3780489.78k  5072516.76k  5630643.81k  5690004.28k
    aes-128-gcm   555777.52k  1573437.77k  2796383.03k  3214518.99k  3340976.13k  3367434.65k
Full plaintext output — grml: fwbox_openssl_speed_n2.txt — OpenWrt: fwbox_openssl_speed_n2.txt
And finally, more for completeness’ sake, the run with four workers:
Full plaintext output — grml: fwbox_openssl_speed_n4.txt — OpenWrt: fwbox_openssl_speed_n4.txt
Comparing the results, we can observe that the theoretical musings - near-linear speedup for two, and sometimes a slight dent for four threads of parallel execution - mostly correspond to our empirical results. The fact that four workers actually beat out the two-worker configuration on a dual-core system in absolute throughput could maybe be explained by the scheduler treating other consumers of CPU time even more harshly with four selfish hogs around. Fairness is a tricky business.
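The "near-linear speedup" claim for two workers can be checked directly against the grml figures quoted earlier. The Python sketch below divides the two-worker throughput by the single-process throughput for 16384-byte blocks; the numbers are copied from the two listings in the text, and only these three ciphers are covered because they are the ones reproduced there.

```python
# Throughput for 16384-byte blocks, in 1000s of bytes per second,
# copied from the grml listings in the text.
one_worker = {
    "aes-128-cbc": 679845.07,
    "aes-128-xts": 2845410.10,
    "aes-128-gcm": 1674276.86,
}
two_workers = {
    "aes-128-cbc": 1359337.06,
    "aes-128-xts": 5690004.28,
    "aes-128-gcm": 3367434.65,
}

# Speedup factor of the two-worker run over the single-process run.
speedup = {name: two_workers[name] / one_worker[name] for name in one_worker}

for name, factor in speedup.items():
    print(f"{name}: {factor:.3f}x with two workers")
```

All three ratios land within about one percent of 2.0, which is as close to linear scaling across two cores as one can reasonably hope for.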
After having established what the fwbox can do compute-wise, we also want to take a look at its networking performance. We want to make sure (or at least extrapolate, since I don’t happen to have enough devices/interfaces around to actually walk that particular walk) that we could saturate all six Gbit Ethernet links when we have bridged them together, and that there’s enough free CPU capacity to perform other duties in that scenario. For these tests, the fwbox will be running OpenWrt 19.07.5 for x86_64, since that makes a number of network configuration tasks (esp. in regard to traffic shaping, which is the original reason for me to buy this thing!) easier. Also, I’ve grown very fond of OpenWrt over the past few years, and am a firm believer in distro diversity - there are few things worse for (IT) security than vast monocultures, after all.
To monitor the per-core and total CPU load, I will be using `mpstat` from the sysstat package.
For these tests, I will have hosts from my regular LAN connect directly to the Gbit interfaces the fwbox has to offer. The vanilla OpenWrt image configures eth0 as the LAN (actually, it sets up a bridge with only eth0 attached to it) and defaults to eth1 as the WAN interface. I shuffled those around a bit using luci’s very friendly web interface: In the end, eth5 became the WAN interface (configured as being DHCP-managed - OpenWrt’s default), and eth0 through eth4 were joined together onto the LAN bridge.
Next, I configured a unique (in my LAN) RFC 1918 IPv4 address range of 192.168.10.0/24 for OpenWrt’s built-in DHCP pool, and assigned 192.168.10.1 to the fwbox itself (on its own LAN-side bridge). This should ensure that there would be no problems with address collisions or a potential for routing confusion between my regular home network (192.168.1.0/24), and the temporary network the fwbox was going to produce for its clients. Then, I connected the designated WAN port into one of my home LAN’s ports, and started getting clients online in the fwbox network.
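The addressing plan above is easy to sanity-check with Python's standard `ipaddress` module. The sketch below merely confirms that the two networks named in the text do not overlap, that the fwbox's own address lies inside its DHCP network, and that the range is private address space; the variable names are my own.

```python
import ipaddress

# Networks and address as named in the text.
home_lan = ipaddress.ip_network("192.168.1.0/24")
fwbox_lan = ipaddress.ip_network("192.168.10.0/24")
fwbox_addr = ipaddress.ip_address("192.168.10.1")

# No overlap means no address collisions or ambiguous routes
# between the regular home LAN and the temporary test network.
networks_overlap = home_lan.overlaps(fwbox_lan)
addr_in_lan = fwbox_addr in fwbox_lan

print("networks overlap:", networks_overlap)
print("fwbox address inside its LAN:", addr_in_lan)
print("fwbox LAN is private address space:", fwbox_lan.is_private)
```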
After ensuring that both the fwbox and all its clients had proper network and Internet access (the latter via double NAT - but that is inconsequential for the planned tests), the setup work for performing actual throughput investigations is finished.
To measure the available bandwidth on Ethernet links, we’re going to use `iperf3` - it is a simple TCP and UDP load generator and benchmark tool that is universally available (you can get it for Windows, Android, GNU/Linux, …) and delivers results that are reliable and easy to grok.
Testing client-to-fwbox throughput, with `iperf3 -s` started on the fwbox and `iperf3 -c 192.168.10.1` on a connected client, did not yield any particularly interesting results - everything is just about as fast as TCP via Gbit Ethernet can get in my experience, topping out at between 940Mbps and 950Mbps of measured TCP throughput with window sizes well in excess of 1Mbyte. CPU load on the fwbox is in the low single-digit percent range (<=5%) for this workload, due to iperf3 having to generate and shuttle off the data required for the bandwidth test.
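The 940-950Mbps ceiling is no coincidence: it is about what is left of 1000Mbps once Ethernet framing and the IP/TCP headers are accounted for. The Python sketch below works through that arithmetic, assuming a standard 1500-byte MTU, IPv4 without options, and full-sized segments; the 12 bytes for the TCP timestamp option reflect a common default on Linux hosts, but whether it was in use during my tests is an assumption.

```python
LINE_RATE_MBPS = 1000.0      # Gigabit Ethernet line rate
MTU = 1500                   # bytes of IP packet per frame (assumed standard MTU)

# Per-frame Ethernet overhead on the wire, in bytes:
# preamble + start-of-frame delimiter (8), header (14), FCS (4), inter-frame gap (12).
ETH_OVERHEAD = 8 + 14 + 4 + 12

IPV4_HEADER = 20             # no IP options
TCP_HEADER = 20              # no TCP options
TCP_TIMESTAMPS = 12          # timestamp option incl. padding (assumed optional extra)


def tcp_goodput_mbps(tcp_option_bytes: int = 0) -> float:
    """Best-case TCP payload rate for full-sized segments on a Gbit link."""
    payload = MTU - IPV4_HEADER - TCP_HEADER - tcp_option_bytes
    bytes_on_wire = MTU + ETH_OVERHEAD
    return LINE_RATE_MBPS * payload / bytes_on_wire


print(f"without TCP options: {tcp_goodput_mbps():.1f} Mbps")
print(f"with TCP timestamps: {tcp_goodput_mbps(TCP_TIMESTAMPS):.1f} Mbps")
```

Both values fall inside the range measured with iperf3, so the link itself, rather than the fwbox, is the limiting factor in this test.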
One of my client machines - a Windows 10 desktop PC with an AMD Ryzen 5 1600 CPU and an Ethernet interface powered by a Realtek RTL8168/8111 integrated Gbit NIC - once again embarrassed itself by being unable to exceed ~710Mbps receive-directed TCP throughput, no matter the L2 peer or cable used. It was therefore disqualified from contributing any meaningful performance numbers to the results presented. All other clients performed up to expectations.
Gauging available bandwidth between two clients by running iperf3 in server mode on one, and in client mode on the other, yielded roughly the same throughput that we observed in the previous test: around 940-950Mbps. During that run, the fwbox registers a CPU load of about 5% - 7% due to Software IRQs, presumably caused by the kernel’s bridge code. (If OpenWrt packaged perf, we would be able to investigate this conveniently and in greater detail - but I consider wild speculation an acceptable substitute in this case.)
When trying to exercise the full-duplex Gigabit speed by having both the client and the server executing on both participating hosts, the observed throughput drops to around 720Mbps in both directions. Since most workloads don’t generate that kind of bi-directional traffic, I did not feel very inclined to research what exactly causes the slowdown in this scenario - for all I know, it might have been the two Windows 10 hosts I used to generate the load and measure speeds on.
Update (2021-01-15): This effect cannot be reproduced with two hosts running GNU/Linux as their OS and executing iperf tests in both directions simultaneously. It really looks like the client PCs’ Windows 10 TCP/networking stacks (or maybe even NICs/drivers) were to blame for the initially disappointing results. With the Linux kernel at the helm on both ends, we see a steady 920Mbps in one, and 930-940Mbps in the other direction, with 12% CPU consumed by SIRQs.
With an iperf3 server running on another host in my regular LAN (i.e., beyond the fwbox’s NAT), and one of the fwbox’s clients connecting to it via the (faux) WAN uplink, we can take a look at how the address rewriting and netfilter getting slightly involved affect the situation. Good news here, too: I measured north of 940Mbps consistent throughput with around 7% CPU load on the fwbox.
With all those bars cleared with flying colors, how about a workload that my current router actually cannot handle satisfactorily? That would be shaping a Gbit link’s bandwidth using the piece-of-cake shaping preset that OpenWrt’s luci-app-sqm (and friends - the scripts that are pulled in as dependencies and that actually configure the shaping policies via tc et al.) provides. My ramips-based router is CPU-bound at around 220Mbps on OpenWrt 19.07.5, and tops out at that downstream speed with one of its two cores 100% busy with Software IRQs.
After using luci to enable SQM for the fwbox’s Ethernet WAN link and setting the link’s RX and TX speeds to 970000kbps (which is just a wee bit less than what the link can deliver according to the previously established numbers), we get slightly north of 930Mbps throughput with about half a core pegged - so ~25-30% of the total available CPU time consumed, again by SIRQs. Power consumption during this run (five Ethernet links up, HDMI and keyboard disconnected) hovers at around 9W. Pretty amazing really, and there’s still ample headroom for the system to perform additional tasks!
Our last experiment shall involve synthesizing something close to the most demanding real-world-ish workloads, and then looking at the numbers again. For that purpose, we will have the bandwidth test involving the WAN interface with SQM enabled running, while having another two clients perform an iperf3-based bandwidth test in the fwbox’s LAN segment, and also have the fwbox itself execute three openssl benchmark workers (for aes-128-ecb) of its own. After having this run for an hour, and taking the readings from the wattmeter and the CPU thermal sensors, we arrive at two numbers: 45C max. CPU temperature (at 25.2C ambient) and 12W peak power consumption.
That is, for all practical purposes, the power and heat envelope that most if not all real-world deployments of the fwbox will not manage to exceed.
Total network bandwidth was within a few Mbps of the previously observed maximums, but if and only if the OpenSSL processes were running under nice, lowering their scheduling priority. If the OpenSSL workers were chugging along with their default niceness of 0, the bridge throughput was reduced to around 900Mbps, while the client-to-WAN throughput remained unaffected. Sorry, I don’t have a nifty theory to explain that.
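For reference, deprioritizing such workers only requires a different way of starting them. The sketch below merely builds the command line; note that the exact benchmark invocation (openssl speed -evp aes-128-ecb) and the niceness value of 19 are my assumptions for illustration, and the helper name niced is made up.

```python
import shlex


def niced(cmd, niceness=19):
    """Prefix a command so that it starts with an adjusted niceness."""
    if not -20 <= niceness <= 19:
        raise ValueError("niceness must be between -20 and 19")
    return ["nice", "-n", str(niceness), *cmd]


# Assumed benchmark invocation; not quoted from the test setup itself.
worker = ["openssl", "speed", "-evp", "aes-128-ecb"]

print(shlex.join(niced(worker)))
```

Starting each of the three workers through a wrapper like this leaves the kernel free to favor network processing whenever both compete for the same core.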
To sum up the (subjectively) most interesting data collected and provide a concise overview of my test results, I prepared a number of facts that should, I hope, provide a sufficient tl;dr-like section.
A few closing remarks on these results:
All measurements still had the pre-installed mSATA disk present, but not in use. One could save some more power by cutting either the USB boot drive that I used, or the mSATA drive, out of the equation. Completely diskless setups are also possible, but I wouldn’t want my router and DHCP provider to depend on another piece of networked equipment to work.
Putting the fwbox under extreme stress in both its CPU- and network-related functions at the same time reveals that under these (in my opinion rather unrealistic) conditions, the administrator has to decide which task is going to be more important - keeping the network’s bandwidth at or close to its usual maximum, or having other tasks that the fwbox is expected to perform be prioritized over that. Chances are you are never going to experience a workload that requires this trade-off to be made - but if you do, make sure to know when to apply scheduling hints to the busy non-network parts of your system, so that Linux will balance available resources in a way that aligns with your expectations and needs. It’s not hard to do, but it has to be done consciously, by implementing a policy of your own design.
Remember that you can always daisy-chain a suitable Ethernet switch to one of the fwbox’s ports to free up cycles for tasks other than bridging and juggling frames around - while losing some of the flexibility in that now externalized (and enlarged) part of your Ethernet network, of course. But at least with a general purpose device like the fwbox, you DO HAVE this kind of flexibility to trade off for something else in the first place!
Naturally. It’s an amazing little device that provides incredible value, and I am looking forward to making it the cornerstone of my home network pretty soon.
Copyright ©2021 Johannes Truschnigg
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.