Who Killed the Kernel?

Delivering outstanding performance with Solarflare EF_VI

High-performance algorithmic trading involves sophisticated software and hardware components operating in harmony to execute market operations effectively. Traders continually seek faster market data and order execution services with lower slippage, so they can price and place their orders more accurately. As volumes continue to rise, firms face ever-higher throughput while still needing to hold, or even lower, their latency targets to stay competitive.

FPGA and other hardware solutions have become popular for specific scenarios. However, Vela software allows clients to achieve low latency and high throughput in a flexible, cost-effective manner by combining commodity hardware with kernel bypass techniques and an ever-evolving set of software features.

The Linux kernel wasn't meant to handle mega message transfer rates, leading to kernel bypass…

Vela software runs on the latest generation of commodity servers without the need for custom hardware. This lowers costs and simplifies ongoing infrastructure management and support by leveraging widely available skill sets. However, commodity hardware is not without its disadvantages: the Linux kernel was not designed to meet current algorithmic trading demands of passing millions of messages per second between financial networks and trading applications at sub-microsecond latency. Performance suffers from context switching, since the OS must manage state as control passes from kernel space, where the network drivers reside, to user space, where the application runs. Additional cores don't help matters; throughput scales inversely with core count due to the overhead of multi-processor shared-memory management. For example, in tests, while a single core delivered close to 1.5 million packets per second, a dual-core system dropped to about one third of that figure, and it would take approximately 20 cores just to match the throughput of a single fully loaded core.

The most effective way to avoid overwhelming the kernel and overcome these challenges is to let the NIC take over the packet path and handle transfers directly in hardware, i.e., bypass the kernel. This leaves all other components intact and localizes message-transfer performance management to the NIC itself. Not only does this approach resolve the challenge above on standard hardware, it also addresses growing performance demands as market size and pace continue to rise.

Enter the Solarflare OpenOnload NIC…

OpenOnload is a Solarflare-developed, high-performance network stack that dramatically reduces latency and boosts throughput by bypassing the Linux kernel and accessing the NIC directly. It supports both the TCP and UDP transport protocols behind a standard API that requires no modifications to end-user applications. Solarflare's Ethernet Fabric Virtual Interface (EF_VI) is a proprietary library that allows even lower-level access to the NIC while utilizing standard NIC features in a consistent, novel way.

EF_VI delivers half-round-trip latency in the sub-microsecond range, an order of magnitude lower than typical kernel-stack latency. It performs network processing in user space, and because it bypasses the kernel it reduces the number of data copies and kernel context switches. Up to three million messages per second can be delivered on a single core.

Vela offers market data delivery with Solarflare NICs

The Vela stack comprises modular software components that provide access to all major asset classes and liquidity venues, enabling clients to execute latency-sensitive trading strategies and manage risk across multiple markets. In particular, the Vela Ticker Plant integrates Solarflare EF_VI to achieve the lowest possible latency and highest available throughput of any software-based market data solution. It uses virtual interfaces to establish the relationship between EF_VI resources and the lock-free inbound queue of the thread that handles the data on that interface. This, combined with flexible configuration of threads, DMA buffers, and small buffer pools, allows Vela to tune a deployment to the shape of the inbound data distribution. Deploying all of this on tuned commodity hardware reduces infrastructure costs, provides transparency and faster issue resolution, and leaves flexibility for future growth. In recent capacity testing against 48 OPRA lines, a single Vela Ticker Plant deployed on one 2U server sustained 15.6 million packets per second without any data loss, double today's production data rate.

In summary, by providing the agility to respond to both market developments and a fast-changing regulatory landscape, without the high costs of supporting bespoke hardware, the Vela Ticker Plant delivers the performance, functionality, and flexibility that today's algorithmic traders need to maintain a lead in a competitive marketplace.