Secret Origins of Big Switch’s Floodlight Controller – Interview with David Erickson of Beacon Fame


Big Switch Networks has seen success with its Floodlight controller, which has been downloaded over 10,000 times, and we thought it would be important for the community to understand the origins of Floodlight: what its roots are and what its original creator was thinking when he built it. So we reached out to David Erickson at Stanford, who created Beacon, the Java-based OpenFlow controller that was subsequently forked to create Floodlight, and asked for an interview. David was kind enough to agree and was most gracious with his time. We hope you enjoy this interview, especially the part about the Cisco FrankenSwitch!

SDNCentral: How did you first get involved in OpenFlow and SDN? What intrigued you?

David: “I was very interested in the precursor work done by Martin Casado on SANE and Ethane. Then in November of 2007 I got an email from Nick McKeown that included a draft of what became the OpenFlow whitepaper, and I came up with some ideas and started thinking about what kind of innovation this new technology could enable.

In early 2008 the snowball started rolling down the hill and multiple things happened in parallel:

  • I began attending what became OpenFlow spec meetings
  • Nicira provided us with an early copy of NOX to start experimenting with (thanks guys!)
  • I had done some work with the NetFPGA in the past, and began building the software that drove the first hardware OpenFlow switch using the NetFPGA platform, the hardware component having been written by Jad Naous.
  • I decided to use NOX and OpenFlow in a class project where I fused network knowledge gathered by NOX together with virtualization knowledge and control using VMware’s vSphere. The idea was to have a wireless client communicating with a server residing in a VM, while optimizing latency and reliability. This was a success and evolved into a full research group level demo effort that I led in the summer of 2008, which won the best demo award at SIGCOMM. By this point I was definitely hooked on OpenFlow, interest-wise, and have continued to be heavily involved ever since.

BTW, as part of the SIGCOMM demo, I built a Cisco OpenFlow FrankenSwitch.”

SDNCentral: FrankenSwitch?! Our readers will want to hear more about this FrankenSwitch. What can you share?

David: “One of the goals for the SIGCOMM demo was to get hardware switches into the demo topology, since previously everything had been entirely software. We had some early hardware support and switches from HP, and Cisco had donated a Catalyst 6500 for us to use, but no OpenFlow software. I started an internship at Cisco in the middle of June, with the goal of getting something working for the demo two months later. After digging through a number of hardware specs for the supervisor available at that time, and for the various line cards, it became apparent that none of the existing hardware had the capabilities needed to support this demo. Further, even with reduced capability it would have been extremely difficult for me to get the OpenFlow reference software up and running in IOS in time, particularly since I was not embedded in the Cat 6k product team, and cold-calling people familiar with the platform to ask for their time was met with an understandable level of resistance.

I decided to try pursuing a different path, which was to use the 6K effectively as a big hardware port multiplier, and to do the actual OpenFlow forwarding inside an external Linux PC. The theory was that when packets ingressed into the 6K, they could be tagged with a Cisco header to indicate the port they arrived on; I could then get them into the PC, where a modified OpenFlow kernel datapath would interpret the Cisco header to determine the port the packet came in on, perform the OpenFlow forwarding, and send the resulting packet(s) back to the 6K with a similar header prepended indicating which port(s) they should egress on.

In practice this was a little more difficult. Packets entering the 6K on the front 1G ports were tagged with the Cisco header indicating the port, but if they were sent out any port other than a 10G port the header would be stripped. So I put a 10G NIC into the PC, but it turned out all the NICs I tried were interpreting the Cisco header on the front of the packets as Ethernet/IP, which of course it was not, and as a result would occasionally throw away packets containing specific bytes the NIC believed meant the packet was bad. This was frustrating, and the NIC manufacturers told me there was no way to disable the hardware offload engines causing this behavior. So I went back to the drawing board and found that I could take the 10G port coming out of the 6K, loop it back into another 10G port on the 6K, and hardwire all packets entering that port to egress a specific 1G port (without interpretation or modification), which would then be connected to a 1G link on the PC. Fortunately this worked, and the 1G NIC did not throw away the packets. So the final packet flow through this Cisco OpenFlow FrankenSwitch was:

Inbound:

In 1G port 1/[1-4]
Out 10G port 5/4*
In 10G port 5/5*
Out 1G port 1/12*
In Linux PC*

Outbound:

Out Linux PC*
In 1G Port 1/12*
Out 10G port 5/5*
In 10G port 5/4*
Out 1G port(s) 1/[1-4]

*- Packet encapsulated with the Cisco header
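The tag-strip/lookup/tag-prepend idea described above can be sketched as follows. This is purely illustrative: the real Cisco header format is proprietary, so the 4-byte port tag, the match on (ingress port, destination MAC), and all names here are hypothetical stand-ins, not the actual datapath code.

```python
# Illustrative sketch of the FrankenSwitch datapath idea: the external Linux
# PC strips a port tag on ingress, performs the forwarding lookup, and
# prepends a similar tag naming the egress port(s). The 4-byte tag layout
# is hypothetical -- the real Cisco header format is proprietary.

TAG_LEN = 4  # hypothetical: a 4-byte tag carrying the port number

def decap(frame: bytes) -> tuple[int, bytes]:
    """Strip the port tag; return (ingress_port, inner Ethernet frame)."""
    port = int.from_bytes(frame[:TAG_LEN], "big")
    return port, frame[TAG_LEN:]

def encap(port: int, frame: bytes) -> bytes:
    """Prepend a tag naming the egress port."""
    return port.to_bytes(TAG_LEN, "big") + frame

def forward(frame: bytes, flow_table: dict) -> list[bytes]:
    """Look up (ingress port, dst MAC); emit one tagged frame per egress port."""
    in_port, inner = decap(frame)
    dst_mac = inner[:6]  # destination MAC is the first 6 bytes of the frame
    out_ports = flow_table.get((in_port, dst_mac), [])
    return [encap(p, inner) for p in out_ports]
```

The 6K then acts only as the port multiplier: it tags on ingress and honors the tag on egress, while all forwarding decisions happen in the PC.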

I was pretty proud of this hack at the time, given the extremely tight time frame available to first figure out what could even be done, explore whether it was realistic, and then actually make something work, despite the fact that it wasn’t making full use of the actual hardware in the 6K. I tried to find a picture of this setup because I remember taking one, but unfortunately I couldn’t track it down.”

SDNCentral: That’s very interesting, thanks for sharing! You’re also well known as the creator of Beacon, which Big Switch has used as a stepping stone to Floodlight. Why didn’t you join Big Switch? Will you be joining Big Switch now that you are wrapping up your PhD?

David: “I was employee #3 at Big Switch (not counting the founders). I worked there part time from June 2010 until January of 2011. I just completed my oral defense a few weeks ago, so yes, I am wrapping up my PhD. Big Switch is high on my list of possible places I could land; they have an amazing group of people there, many of whom I consider to be great friends.”

SDNCentral: What are the differences between Beacon and Floodlight?

David: “In my mind there are two major differences. First, Floodlight is the base of Big Switch’s product; therefore they have significant internal resources committed to its usability, maintenance, and success. In contrast, I created Beacon as a tool to enable me to accomplish the research I wanted to do on top of it, as rapidly as possible. So the scope of the two, and the number of people involved in core development, is definitely different. Second is the licensing: Beacon uses a modified GPL license for its core, and Floodlight uses the Apache license. There are other less significant differences between the two in terms of run-time modularity, Java libraries used, and performance characteristics.”

SDNCentral: Do you plan to continue work on Beacon?

David: “I just released version 1.0.2 on 10/28, it was overdue and contains primarily backported fixes and improvements made while working on my research platform named Virtue.”

SDNCentral: What led you to the original Beacon design?

David: “For me it was actually a very practical need: I found I was spending a large fraction of my time solving problems that were language- and/or platform-related, time that I would rather spend creating interesting applications and logic on top of the platform. This led me to a design point that used a language, Java, that eliminated many of the hassles found in C/C++ (memory management, long compilation times, etc.), while also offering high performance and a very mature tool chain. Along the way I decided to also explore some new features, such as a controller that is fully multithreaded and has run-time code modularity.”

SDNCentral: The Routing, Topology and Learning Switch modules seem to be relatively large objects in the controller. Why are these special? Do you believe these elements to be critical to any OpenFlow controller?

David: “In the Learning Switch module’s case I think it is fairly simple, since it emulates the behavior of existing learning switches. The Topology and L2 Routing modules’ functionality is pretty standard across OpenFlow controllers. Beacon does not use spanning tree, so a layer 2 Routing module enables it to always send traffic on one of the shortest paths between destinations, if desired. To be able to compute the actual routes, some form of topology discovery is needed. NOX used periodic LLDP packets emitted from all ports to do topology discovery; this mechanism worked well and has been inherited (as far as I know) by all other OpenFlow controllers. These mechanisms work quite well in an OpenFlow-only network; however, they require further work to interact well with complex legacy networks.”
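The LLDP-based discovery David describes can be sketched in a few lines: the controller emits a probe out every (switch, port) pair, and when a neighboring switch punts the probe back to the controller, the pair of endpoints identifies a link. This is a minimal sketch of the mechanism, not any particular controller's API; the probe encoding and function names are illustrative (real LLDP frames carry this information in TLVs).

```python
# Sketch of controller-driven LLDP topology discovery: encode the sending
# (switch, port) in a probe, send it out every port, and when a probe
# arrives back via packet-in on another switch, record the link.

def make_probe(switch_id: int, port: int) -> bytes:
    """Build a probe payload encoding its origin (real LLDP uses TLVs)."""
    return b"LLDP" + switch_id.to_bytes(8, "big") + port.to_bytes(2, "big")

def handle_packet_in(links: set, rx_switch: int, rx_port: int, payload: bytes):
    """On packet-in, decode the probe origin and record the discovered link."""
    if not payload.startswith(b"LLDP"):
        return  # ordinary traffic, not a discovery probe
    tx_switch = int.from_bytes(payload[4:12], "big")
    tx_port = int.from_bytes(payload[12:14], "big")
    links.add(((tx_switch, tx_port), (rx_switch, rx_port)))
```

Repeating this periodically from all ports keeps the link set current, and a shortest-path computation over the resulting graph is what a layer 2 Routing module then uses.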

SDNCentral: What else have you been doing with Beacon lately? Any interesting tidbits you can share?

David: “I’ve been focusing on research that uses Beacon, more so than on Beacon itself. My PhD thesis is ‘Using Network Knowledge to Improve Workload Performance in Virtualized Data Centers’. The high-level idea is that today large Virtualized Data Centers (VDCs) are pretty static: once VMs are created and assigned to a server, they rarely move. If VDCs were more dynamic about moving VMs around, they could achieve goals such as minimizing power consumption by consolidating VMs onto as few servers as needed, or maximizing performance by load balancing VMs across all physical servers. Today there are products that do this on a small scale (think a single rack), from VMware and Citrix, typically used in smaller data centers/clusters. However, these products work only with the local resources that their hypervisors can measure, such as CPU/memory/NIC counters. Absent from this is the actual core network, which is a critical element if you look at more than just a single rack in isolation.

To understand how network knowledge can inform the VM to server mapping, and improve network workload performance, I built a platform that runs a workload while measuring all resource consumption, feeds this information into an optimization algorithm which produces a new mapping of VMs to physical servers, then runs the workload again using the new VM mapping. This enables a lot of research questions to be explored, such as:

  • How much faster does a particular workload run when it is optimized with network knowledge included?
  • Which algorithms produce the best VM-to-server mappings?
  • How long does it take these algorithms to run?
  • How do the properties of my network affect the ability of the workload to be optimized?”
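The "new mapping of VMs to physical servers" step in the loop above can be illustrated with a toy consolidation pass: first-fit-decreasing bin packing that places VMs onto as few servers as possible, matching the power-minimization goal David mentions. This is a stand-in under stated assumptions: Virtue's actual algorithms also weigh the network measurements, while this sketch considers CPU demand only, and all names are illustrative.

```python
# Toy stand-in for the VM-to-server mapping step: first-fit-decreasing
# bin packing that consolidates VMs onto as few servers as possible.
# Real placement (e.g. in Virtue) also factors in network measurements;
# this sketch uses a single CPU-demand dimension.

def consolidate(vm_cpu: dict, server_capacity: float) -> dict:
    """Map each VM name to a server index, using as few servers as possible."""
    placement = {}
    loads = []  # loads[i] = CPU demand already placed on server i
    # Place the largest VMs first (first-fit-decreasing heuristic).
    for vm in sorted(vm_cpu, key=vm_cpu.get, reverse=True):
        demand = vm_cpu[vm]
        for i, load in enumerate(loads):
            if load + demand <= server_capacity:
                loads[i] += demand  # fits on an already-used server
                placement[vm] = i
                break
        else:
            loads.append(demand)  # open a new server
            placement[vm] = len(loads) - 1
    return placement
```

In the platform's loop, the workload would run again under the placement such a step produces, and the measured speedup answers the first research question above.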

SDNCentral: What do you think about OpenFlow vs other elements of the SDN ecosystem? What about other protocols?

David: “I am biased, but I really like OpenFlow. Is it perfect? No. But it provides a nice base control protocol that enables you to do an awful lot of things today, and it has an ever increasing amount of flexibility and extensibility for organizations to try their own additions and extensions to the protocol. The hope is, and there is some evidence of this, that proven, generally useful additions will be rolled into the general protocol over time. I am unaware of any other protocols today with the same scope, features, and industry support.

As for other elements of the ecosystem, I am very excited about all aspects, hardware, controllers, and applications on top. I think it is wonderful that we are seeing rapid iteration and innovation at all of these layers simultaneously.”

SDNCentral: Any thoughts around the always controversial Northbound API?

David: “My current belief is that it is unlikely in the short term that there will be any sort of standardization here, for a couple of reasons. First, the space is still really new: the majority of ‘production’ controllers today are still in some form of trial, with limited deployment experience, and that experience is an important element in informing the API design. Second, the organizations building controllers today tend to also be building the apps, in a very vertical fashion, so they don’t have much incentive to enable their controller to be swapped out for another company’s controller.

Longer term, if we see strong growth and use of applications on top, then those applications may be able to exert the pressure needed on controller writers to standardize the layer they operate on. Alternatively, if one dominant controller platform emerges, it could become the de facto standard that other controllers are then effectively forced to implement.”

SDNCentral: What do you see as exciting changes in OpenFlow/SDN in the next 18 months?

David: “Great question. Just looking back at the last 18 months, the growth of the industry’s awareness of OpenFlow and SDN has been incredible. I think what I am most looking forward to is hearing more success stories from companies such as Google that have actively deployed SDN in a significant way, and their use cases. I am also hoping to see a few more SDN companies come out of stealth in the next 18 months and learn about their innovations, and hopeful that we will see more and newer hardware released that supports OpenFlow versions 1.3+.”

SDNCentral: Thank you for your time, David! And we wish you the best after graduation!
