Special Session: FPGA Networking

Fast, Flexible, and Intelligent Next-Generation Networks

by Muhammad Shahbaz (Purdue University, USA)

Maintaining strict security and performance objectives in next-generation cloud and edge networks demands that compute-intensive management and control decisions be made based on the current state of the entire network (e.g., topology, queue sizes, and link and server loads), and applied per packet at line rate, in a fast and intelligent way. Unfortunately, the dominant solutions available today are either fast-yet-dumb or slow-but-intelligent. Data-plane devices (e.g., switches and NICs) can react in nanoseconds to network conditions; however, they are designed for routing packets and have a constrained programming model (e.g., match-action tables, or MATs). Conversely, control-plane servers can make complicated data-driven (AI-based) decisions, yet the round trip between the controller and the device fundamentally limits their reaction speed. In this talk, I will show how my research helps bridge this gap between speed, flexibility, and intelligence. I will present a new data-plane architecture (Taurus) and a declarative programming framework (Homunculus) that, for the first time, allow network operators to execute per-packet data-driven decisions directly within the data-plane device at line rate. I will first focus on the design of Taurus and the accompanying parallel-patterns abstraction (MapReduce), which exploits data parallelism to efficiently execute common ML models (e.g., DNNs and SVMs). I will then demonstrate how, using training data, objectives, and constraints as inputs, Homunculus automatically converts the operator’s high-level policies into an efficient ML model to execute on the underlying data-plane device (e.g., Taurus). I will conclude with an overview of my broader research vision and future plans for realizing the full potential of fast, flexible, and intelligent next-generation networks.
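For readers unfamiliar with the constrained programming model the abstract mentions, a match-action table maps a packet header field to a fixed action. The following is a minimal Python sketch of that idea only; the field names and actions are hypothetical illustrations, not the interface of Taurus or any real switch.

```python
# Minimal sketch of a match-action table (MAT), the constrained
# data-plane programming model mentioned above. Field names and
# actions are illustrative, not any real switch API.

class MatchActionTable:
    def __init__(self, default_action):
        self.rules = {}                 # exact-match key -> action name
        self.default_action = default_action

    def add_rule(self, key, action):
        self.rules[key] = action

    def lookup(self, packet):
        # Match on one header field; fall back to the default action.
        key = packet.get("dst_ip")
        return self.rules.get(key, self.default_action)

# Example policy: forward known destinations, drop everything else.
mat = MatchActionTable(default_action="drop")
mat.add_rule("10.0.0.1", "forward_port_1")
mat.add_rule("10.0.0.2", "forward_port_2")
```

The point of the sketch is the rigidity: a MAT can only look up fixed fields against installed rules, which is why richer per-packet decisions (e.g., running a DNN or SVM) need a new architecture such as Taurus.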

Muhammad Shahbaz

Muhammad Shahbaz is a Kevin C. and Suzanne L. Kahn New Frontiers Assistant Professor in Computer Science at Purdue University. His research focuses on the design and development of domain-specific (performance) abstractions, compilers, and architectures for emerging workloads (including machine learning, self-driving networks, serverless compute, and 5G). Shahbaz received his Ph.D. and M.A. in Computer Science from Princeton University and his B.E. in Computer Engineering from the National University of Sciences and Technology (NUST). Before joining Purdue, Shahbaz worked as a postdoc at Stanford University and a Research Assistant at Georgia Tech and the University of Cambridge. Shahbaz has built open-source systems, including Pisces, SDX, and NetFPGA-10G, that are widely used in industry and academia. He received the NSF CAREER Award; Facebook, Google, and Intel Research Awards; the IETF/IRTF ANRP Prize; the ACM SOSR Systems Award; the APNet Best Paper Award; the Best of CAL Paper Award; the Internet2 Innovation Award; and an Outstanding Graduate Teaching Assistant Award.


FPGAs to Unlock In-Network Computing

by Suhaib Fahmy (KAUST, Saudi Arabia)

In-network computing is a concept that has been discussed for close to three decades, and it has gained renewed attention in recent years with the advent of programmable switches in the datacenter. Computing on packet flows directly within switch hardware can dramatically improve application performance due to the reduced overhead of traversing packet processing hierarchies. However, these programmable switches are restricted in terms of their computational capabilities and can only be deployed within datacenters. We argue that FPGAs are well suited to lightweight, high-performance network flow processing, as well as accelerating primitive functions, and can be deployed in a wide range of scenarios. We present a roadmap for enabling this capability that builds on existing approaches for virtualising FPGAs in the datacenter, adapted to a hostless setting. Through this, we advocate for a more meaningful deployment of in-network computing in the continuum from edge to cloud to support future applications in cognitive infrastructure.

Suhaib Fahmy

Suhaib Fahmy has been Associate Professor of Computer Science and Principal Investigator of the Accelerated Connected Computing Lab (ACCL) at KAUST since 2020. His research explores hardware acceleration and the integration of these accelerators within wider computing infrastructure. He graduated from Imperial College London with an MEng in 2003 and a PhD in 2008. He was previously Assistant Professor at NTU Singapore, and then Associate Professor, Reader, and Full Professor of Computer Engineering at the University of Warwick, UK. His research focuses on exploiting programmable hardware in a connected context, including virtualization, management, and algorithm acceleration. Dr Fahmy has received Best Paper Awards at FPT 2012, ACM TODAES 2019, and IEEE HPEC 2021, and the Community Award at FPL 2016. He sits on the ACM Technical Committee on FPGAs and is a Chartered Engineer and Fellow of the IET, as well as a Senior Member of the ACM and IEEE.


Enhancing Network Programmability by Transparently Executing eBPF Network Programs in FPGA SmartNICs

by Angelo Tulumello (Axbryd, Italy)

Packet processing alone consumes about one-third of data centers’ computing resources. Network processors aim to relieve the CPU from handling packet processing by offloading such workloads to specialized devices known as SmartNICs, DPUs, or IPUs. While these hardware solutions offer performance benefits, they are limited in flexibility and programmability compared to software, which is crucial for adapting to ever-changing networking needs.
In this talk, we will explore our research on porting software network programs written with eBPF to FPGA SmartNICs. eBPF is a software framework integrated into the Linux kernel that enables efficient and flexible network function development in Linux-based operating systems.
We will introduce two solutions we developed: (i) hXDP, a serial VLIW processor implemented on FPGA, specifically tailored to the eBPF Instruction Set Architecture, and (ii) eHDL, an HLS framework that translates eBPF programs into hardware pipelines for FPGA SmartNICs.
These solutions aim to combine the performance benefits of hardware with the flexibility and programmability of software, enabling more adaptable and efficient network processing.
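As background for the model that hXDP and eHDL offload: an eBPF/XDP program is written in restricted C, runs in the kernel, and returns a per-packet verdict (e.g., `XDP_PASS` or `XDP_DROP`). The sketch below mimics only that verdict model in plain Python for illustration; the filter policy and packet representation are hypothetical, and real eBPF code looks nothing like this.

```python
# Illustration of the XDP per-packet verdict model, in userspace
# Python rather than in-kernel restricted C. The verdict constants
# mirror the Linux xdp_action values; the blocked-port policy is a
# made-up example, not part of hXDP or eHDL.

XDP_DROP = 1
XDP_PASS = 2

BLOCKED_PORT = 9999  # hypothetical policy: drop traffic to this port

def xdp_filter(packet):
    """Return a verdict for one packet, as an XDP program would."""
    if packet.get("dst_port") == BLOCKED_PORT:
        return XDP_DROP
    return XDP_PASS
```

Because each packet is handled by one small, self-contained function with a bounded verdict, this model maps naturally onto either a tailored processor (hXDP) or a generated hardware pipeline (eHDL).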

Angelo Tulumello

Angelo Tulumello is a senior researcher at CNIT and co-founder of Axbryd. He received his PhD in Electronic Engineering from the University of Rome Tor Vergata in 2023. His research focuses on developing programmable network solutions and architectures to simplify the creation of efficient networking applications. He specializes in stateful packet processing and offloading high-level software programs onto hardware executors, such as FPGA SmartNICs and programmable switches. Angelo has contributed to several European research projects, including Horizon 2020 BEBA, Superfluidity, 5GPicture, and 5GMed.


Network Engineers meet FPGAs

by Salvatore Pontarelli (Università di Roma Tor Vergata, Italy)

Networks deliver unprecedented volumes of increasingly diverse traffic. Network devices must process billions of packets per second and provide a wide variety of network services, ranging from low-latency services for real-time traffic to very high-volume data exchange. FPGA devices seem able to fulfill the performance requirements of these network applications while also providing the needed level of programmability. Unfortunately, designing efficient, high-performance network applications using FPGAs is hard and time-consuming, particularly for network engineers, who are not used to thinking like hardware designers. In this talk, we first describe the main characteristics of FPGA-based network applications, and then describe some common design patterns used to implement network functions as FPGA-based high-speed network processing pipelines.
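One recurring pattern of the kind the abstract alludes to is the feed-forward packet pipeline: each stage performs one fixed task and hands the packet to the next stage, so all stages work on different packets concurrently. The Python sketch below only models the dataflow of that pattern (stage names and fields are illustrative, not from the talk); in hardware, each stage boundary would be a pipeline register and all stages would advance every clock cycle.

```python
# Sketch of the parse -> match -> act pipeline pattern common in
# FPGA packet processing. Generators model the stage-to-stage
# handoff; the field names and the rule table are made up.

def parse(packets):
    # Stage 1: turn raw input into structured headers.
    for raw in packets:
        yield {"dst_ip": raw[0], "payload": raw[1]}

def match(packets, table):
    # Stage 2: look up a forwarding decision per packet.
    for pkt in packets:
        pkt["action"] = table.get(pkt["dst_ip"], "drop")
        yield pkt

def act(packets):
    # Stage 3: apply the decision; dropped packets leave the pipeline.
    for pkt in packets:
        if pkt["action"] != "drop":
            yield pkt

table = {"10.0.0.1": "forward"}
stream = act(match(parse([("10.0.0.1", b"a"), ("10.0.0.9", b"b")]), table))
delivered = list(stream)
```

In software the stages run lazily one packet at a time; the hardware appeal of the same structure is that every stage is busy simultaneously, giving one packet per clock at line rate.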

Salvatore Pontarelli

Salvatore Pontarelli is an Associate Professor at the Department of Computer Science, University of Rome La Sapienza. Before joining Sapienza, he was a senior researcher with CNIT, the National Inter-University Consortium for Telecommunications. He has also held research positions at Telecom ParisTech, the University of Bristol, the Italian Space Agency, and the University of Rome Tor Vergata. His research focuses on the design of high-speed hardware architectures for programmable network devices. He has participated in research projects funded by public bodies (FP7, H2020) and private companies, and has collaborated with various network device manufacturers (Cisco and Mellanox Technologies). He has published more than 70 articles in archival journals and more than 80 contributions in proceedings of international conferences. He received the Jay Lepreau Best Paper Award at USENIX OSDI ’20 and a Distinguished Paper Award at ACM ASPLOS ’23. He holds 17 patents on software-defined networks and the optimization of exact-match, LPM (longest prefix matching), and wildcard matching; some have been implemented in the latest generation of NVIDIA Mellanox switch chips.


Advances in Accelerator-Driven Distributed Computation

by Ken O’Brien (AMD, Ireland)

In the last decade, ever more heterogeneous systems have been used to offload computation from the host CPU to accelerators, such as FPGAs and GPUs. In this talk, we will demonstrate the use and advantages of accelerator-driven communication to achieve lower-latency solutions compared to a host-centric approach. We will explore the ACCL library for distributed computing with AMD Alveo FPGAs, along with case studies from wave simulation and DLRM. Finally, we will give initial results on peer-to-peer communication between GPU and FPGA accelerators.

Ken O’Brien

Ken O’Brien is a senior member of technical staff at AMD Research and Advanced Development. He has worked in the areas of reduced precision machine learning, systems programming, and performance modelling on heterogeneous systems. He is currently researching network acceleration for AI/HPC workloads with FPGAs, DPUs and GPUs. He is co-chair of the H2RC workshop at SC. He holds a PhD in computer science from University College Dublin.


FPGA Network Success and Future

by Andrew Moore (University of Cambridge, United Kingdom)

Throughout the development of network hardware, the FPGA has been a centerpiece, providing a platform for development, prototyping, testing, evaluation, implementation, and deployment. Its inherent flexibility, balanced against rugged performance, has made the FPGA a core component of the network ecosystem. FPGA-enabled platforms such as NetFPGA have provided the underlying infrastructure for numerous other projects; from measurement to traffic identification, and from novel transport protocols to OpenFlow, NetFPGA has been the demonstration platform. More recently, the NetFPGA project has evolved to take advantage of new commodity FPGA-based adapters, enabling work at and beyond 100G; this in turn has allowed 100GbE implementations and demonstrations of the most advanced research projects. Yet despite this momentum, we are now facing a change in the role of the end-host adapter, and alongside it a change in the role of FPGA-based adapters too. Most successfully, FPGA platforms provided the pathway to the practical uptake of Software Defined Networking, and it is in the domain of end-host networking, offload systems, and smart NICs that I will outline where I believe FPGA-based networking has more potential for contributions. I will outline my own thoughts on where we are with FPGA-based systems in networking, particularly end-host networking. I don’t believe it will be a straightforward future, but I will highlight the roadblocks (and solutions) as I see them. Finally, I will discuss the role of the NetFPGA platform and similar systems in future research and education for networked systems.

Andrew Moore

Andrew W. Moore is the Professor of Networked Systems in the Department of Computer Science and Technology at the University of Cambridge, where he has most recently been for over 15 years. Prior to that, he founded the networking research group at Queen Mary, University of London, now operating in its 17th year, and previously he had co-founded two industrial research laboratories in Cambridge: Marconi (formerly GEC) and Intel Research. Over this period, he has made significant contributions to the measurement-based management of networked systems and constrained resources, including the joint design of the first 40GbE capture system and the first application of machine learning to application identification within Internet network traffic. Professor Moore has published over a hundred papers and advised a dozen successful PhDs. In 2015, with Professor Nick McKeown and Professor Robert Watson, he co-founded the NetFPGA C.I.C., intended to enable and encourage the use of open-source hardware platforms. He has been director of the NetFPGA project, now in its third decade, and has personally originated or contributed to over a dozen open-source software projects ranging across the operating-systems, networked-systems, measurement, and modelling domains.
