Who is Responsible for my SDN Network?


David McNamee and Huawei

You’ve done it! You convinced management to adopt SDN, and you’re buying switches from Pica8, IBM, and Dell, controllers from Big Switch, and applications from the Big Switch ecosystem. You are so on top of things that you have even implemented your own custom traffic engineering application. Excellent! You’ve rolled it all out, and for the last month everything has been great. Then you start experiencing performance issues: utilization looks great, but the network seems slow. So let’s start troubleshooting the network and applications. Where do I start? Which tool do I use? Hmmmm. In the past, I always called my main hardware vendor for support, so I’ll call tech support for help … but now, who do I call?

The value proposed by SDN proponents is simplified management and the ability to plug in any vendor’s hardware for forwarding services. The network is no longer about proprietary hardware; it’s now about the flexibility of software and accelerating innovation. Has this truly lowered operating costs, or is it all hype? In a September blog post, “Does SDN Make my Network Management Look Fat”, John Strassner explored the challenges of network management. Beyond network management, troubleshooting and support services are two additional complex challenges.

One of the more distinctive problems created by the current definition of SDN, and demonstrated by the current state of the art, is the lack of standards and the relative youth of the various SDN offerings. Which version of OpenFlow does your switch support? What vendor-specific attributes have been added, and which optional attributes are or are not available on the various switches? Are the controllers interoperable? What features have been implemented on top of the controller? Will this work only in greenfield networks, or in hybrid networks as well? This complexity alone will make troubleshooting performance and transient problems very challenging. Now, what tools are available to help you?
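The compatibility questions above lend themselves to a simple inventory check. The following Python is a minimal sketch, assuming the operator records (by hand or from vendor documentation) which OpenFlow versions and optional features each switch supports; the `SwitchProfile` record, the `common_baseline` helper, and the switch and feature names are hypothetical, not part of any controller or vendor API. It finds the highest OpenFlow version shared by the controller and every switch, and which optional features can be relied on fleet-wide. It is an offline planning aid, not a model of OpenFlow’s own HELLO-based version negotiation.

```python
from dataclasses import dataclass, field


@dataclass
class SwitchProfile:
    """Hypothetical capability record an operator might keep for each switch."""
    name: str
    of_versions: set[str]  # OpenFlow versions the switch claims, e.g. {"1.0", "1.3"}
    optional_features: set[str] = field(default_factory=set)


def version_key(version: str) -> tuple[int, ...]:
    """Turn "1.3" into (1, 3) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def common_baseline(controller_versions: set[str], switches: list[SwitchProfile]):
    """Return (highest OpenFlow version shared by the controller and every switch,
    optional features every switch supports, per-switch missing features)."""
    shared_versions = set(controller_versions)
    for sw in switches:
        shared_versions &= sw.of_versions
    best = max(shared_versions, key=version_key) if shared_versions else None

    # Union of everything any switch offers, then narrow to what all switches offer.
    all_features: set[str] = set().union(*(sw.optional_features for sw in switches))
    common_features = set(all_features)
    for sw in switches:
        common_features &= sw.optional_features
    gaps = {sw.name: all_features - sw.optional_features for sw in switches}
    return best, common_features, gaps


# Illustrative fleet: switch names and feature labels are made up.
fleet = [
    SwitchProfile("vendor-a-tor", {"1.0", "1.3"}, {"feature-x", "feature-y"}),
    SwitchProfile("vendor-b-tor", {"1.0", "1.3"}, {"feature-x"}),
    SwitchProfile("vendor-c-agg", {"1.0"}, {"feature-x", "feature-y"}),
]
best, common, gaps = common_baseline({"1.0", "1.3"}, fleet)
print(best)                          # 1.0
print(sorted(common))                # ['feature-x']
print(sorted(gaps["vendor-b-tor"]))  # ['feature-y']
```

Any feature missing from the common set marks a place where an application may behave differently from switch to switch, which is exactly the kind of difference that makes troubleshooting harder.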

In existing networks, network administrators have a wealth of tools and technologies, developed over decades, that focus explicitly on analyzing network performance, troubleshooting and diagnosing network outages, and providing remediation. Administrators who want to stand on the bleeding edge of SDN adoption must be prepared to roll their own solutions and/or spend extra time finding and adapting tools from the myriad of SDN solutions sprouting like weeds.

This creates several challenges:

  1. There are no standards for controllers and northbound (NB) APIs, so it will be left to administrators to find offerings and, most likely, adapt them to the environment they have.
  2. Due to the lack of standards for controllers and NB APIs, there is a high likelihood that the tools you most need do not yet exist, and so they will have to be built in-house.
  3. While the CapEx budget may be reduced by “cheaper commodity hardware”, OpEx for developers will almost certainly grow. Does your department have the required budget? Can you afford an OpEx increase that may equal or exceed your CapEx reduction?
  4. More importantly, as these new tools are developed and integrated, can you afford the extra time and delay induced by having to wait for new and/or adapted tools to be created, then tested, and then integrated with your network? What about the risk to the stability of your network services?

These points bring us to the next major challenge for Operations: how services are supported. One of the biggest potential values of the SDN movement is faster innovation and adoption of network services. Whether or not you believe in the abstractions and functionality offered by OpenFlow, the trend of hardware vendors providing open APIs benefits end developers, who now have mechanisms to adapt networks to their specific needs. Oh, but be careful what you wish for!

While working at Cisco, I had many opportunities to meet with enterprise companies from around the world. One of our most powerful capabilities was a feature set called the Embedded Event Manager (EEM), which now plays a prominent role in the Cisco ONE offering. EEM was Cisco’s first major differentiator for device programmability. It is a powerful tool that enables operators to create custom solutions that identify and correct problems with network services; it can also augment Cisco’s feature sets with additional customer- and application-specific functionality. However, such power cuts both ways. Customers’ number one concern was not bugs in EEM; it was how much of the functionality in their developers’ custom scripts Cisco would support.

This concern stems from the difference between the programming models for applications and for networks. In the application world, tool and IDE vendors naturally support application development; in the networking world, this is typically not true. This is where I worry about the ability of the SDN initiative to make real breakthroughs. Until application developers can treat the network as “just another set of resources and services”, real innovation in network applications will remain slow, and those applications will remain difficult to develop.

Is anyone prepared to truly support broad adoption of custom features in production networks? Part of the problem is the level of abstraction at which these features are developed. It’s one thing to use EEM to fire off simple Tcl scripts for predefined IOS events: this is simple and straightforward (since nothing in IOS is changing), yet very powerful (since your program can now provide custom behavior triggered by IOS events). It is quite a different matter to change the behavior of IOS, or of entities that IOS directly depends on or uses. The impact of the former is easy to detect; the impact of the latter can vary widely with the complexity of the operations being performed. An additional challenge is synchronization: how do you reconcile changes made by network-integrated applications with changes made by other tools on other network devices?

Are tools available to diagnose custom scripts or programs across multi-vendor hardware and software solutions? Can those tools recognize that different commands, issued by different management applications to devices from different vendors, have the same (or similar) effect? If not, how do you build end-to-end services? How do you build and troubleshoot vendor-agnostic applications? Even if the NB API is equivalent, is the underlying behavior the same? Take the example of EEM: even though the same function is available in IOS, IOS XR, and NX-OS, its behavior is not always equivalent. A script written for a Cat6k will behave differently on a Nexus 7k or a GSR 12k. If three operating systems from one company cannot behave consistently, how will an equivalent application work across multi-vendor solutions? As an administrator, whom do you call in this environment about the applications you wish to create and use?

Many vendors are touting the release or availability of custom APIs. Often, the easiest step is putting the infrastructure in place to offer APIs that control different features. It is far more challenging to create tools that catch and validate changes to devices, and harder still to do the same for the business processes and support environments used to manage the devices that provide the network’s resources and services. As long as we are dreaming, we might as well throw in unified interfaces and APIs for compute, storage, and networking. This is the set of tools required to truly enable the adoption of custom network-aware applications. A faulty application that manages the network can have a far larger business impact than one confined to a server or server farm, because a network outage can affect all applications, as well as the resources and services those applications need.

Some vendors are starting to offer developer support services. They view these as new service revenue, and revenue growth is something they understand. But technical support organizations have historically been rooted in “box thinking”, and they have little history of dealing with high-complexity, low-volume problems. In these organizations, developer support (application creation) and network operations (application and network operations) are separate groups, and a call about a network problem will typically be routed to a team that knows how to troubleshoot and debug “standard configurations”.

In order to really help customers, certification processes will need to be put in place to help certify the “safety” of adding custom scripts, programs, and applications to the network. But one of the promises of the SDN vision is commodity hardware and software inside the network device, leading to a vendor-agnostic platform, right? So which vendors provide support and certification for multi-vendor software and hardware networks?

In the end, who will be responsible for your SDN network? For now, it appears that this will fall on the already overburdened and under-staffed IT shops. Are you ready?

Guest Blogger Disclaimer:

The views, opinions and positions expressed within these guest posts are those of the author alone and do not represent those of SDNCentral.com and SDNCentral, LLC. The accuracy, completeness and validity of any statements made within this article are not guaranteed. We accept no liability for any errors, omissions or representations. The copyright of this content belongs to the author and any liability with regards to infringement of intellectual property rights remains with them.


About the Author


Nearly 20 years in the IT, Networking, and Software Development Space running Enterprise and SP Operations…


  1. Sean Hafeez
    says:

    Dave,

    Your article highlights an area where Open SDN is driving innovation. As yet, standards for the Northbound APIs have not been fixed. This is a focus of work at the ONF. Robert Sherwood from Big Switch was just appointed Chair of the Architecture Working Group, which is focused on exactly this area. (And the open-source Floodlight controller already has a northbound REST API that network application developers are adopting.)

    An Open SDN gives you visibility into the whole network from a single control point, and it does so via an API that is simpler than a collection of disparate tools. Things you used to do device by device, you can now do from a central controller. Open SDNs deliver broad visibility and simplicity without taking anything away: you can still exec debug commands or run a trace on a switch the old-fashioned way if you want.

    And some folks might be more comfortable with that approach as we transition to SDN. The inertia behind existing tools, and the knowledge required to use them, remains important. It is also indicative of the problem with networks today: new protocol, new tool, new debug knowledge to acquire. Then there’s a new problem or a new feature, bringing another protocol, another tool, and more learning, with yet another problem on the horizon. Wash, rinse, repeat.

    OpenFlow changes all that. Some tools will no longer be needed and will go away. And guess what: you gain the ability to see every packet_in, packet_out, and flow_mod from a central point. The controller gives you that. It’s simple to use and simple to program. An Open SDN controller can even capture packets remotely and display a live decoded dump on your desktop. That saves time.

    Transitions like these are exciting and just a bit scary. I’ve been building and troubleshooting networks for over 20 years, and there were times when I worried about giving up my old tools. But I am over it. Tasks that used to burn hours now take 5 minutes, and I am working on large-scale networks. So my message to all the overburdened and under-staffed IT shops out there is this: spend a little time on Open SDN. You’ll see that it’s simple to learn how to work with a controller, and there are many advantages. After a little while, you will have a tough question to answer: “How are you going to spend your newly found free time?”

    Sean Hafeez
    TME
    Big Switch Networks, Inc.

  2. Dave McNamee says:

    Hi Sean

    Thanks for your response. Couple of key items:

    1. I’m not concerned with the transition to new tools. The first part of the blog raised the point that the new tools are not there yet. Anyone adopting SDN will have to be prepared to roll up their sleeves until the market stabilizes.

    2. What annoys me most when discussing SDN is when people respond with “Well, when we have this or that defined, the problem WILL be solved.” The discussion should be about how to do it in today’s environment.

    3. I don’t think having a central controller solves all problems for debugging issues in the network. When you say packet_in and packet_out, aren’t you just referring to the packets that get kicked to the controller because the flow tables don’t handle them? And if you truly can see “ALL” packets and flows, doesn’t the volume of information become overwhelming? Look at NetFlow/sFlow/IPFIX. They let you get lots of information from the network, and you can potentially see ALL of the network, but at what cost? There can be gigabytes of data coming from the network; you need tools to process that information.

    4. OpenFlow only provides information at the data plane, and mature solutions exist primarily for L2 (L3 is evolving). So what about the L4-7 services that work in conjunction with it? How do you debug the applications, built on top of non-standard NB APIs, that make up a network service? The market is young; the point is that these things need to evolve too.
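    The volume concern in point 3 is usually handled by aggregating flow data before anyone reads it. Below is a minimal Python sketch of one such reduction, a “top talkers” summary that sums bytes per source/destination pair and keeps only the heaviest pairs. The `FlowRecord` fields and the `top_talkers` helper are hypothetical simplifications; they do not follow the actual NetFlow, sFlow, or IPFIX record formats, and the addresses and byte counts are made up.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class FlowRecord(NamedTuple):
    """Simplified flow record; these fields are an assumption, not a NetFlow/IPFIX schema."""
    src: str
    dst: str
    dst_port: int
    byte_count: int


def top_talkers(records: Iterable[FlowRecord], n: int = 3) -> list[tuple[tuple[str, str], int]]:
    """Sum bytes per (src, dst) pair and return only the n heaviest pairs."""
    totals: Counter = Counter()
    for rec in records:
        totals[(rec.src, rec.dst)] += rec.byte_count
    return totals.most_common(n)


# Made-up sample records standing in for an exporter's output.
records = [
    FlowRecord("10.0.0.1", "10.0.0.9", 443, 1_200_000),
    FlowRecord("10.0.0.2", "10.0.0.9", 80, 300_000),
    FlowRecord("10.0.0.1", "10.0.0.9", 443, 800_000),
    FlowRecord("10.0.0.3", "10.0.0.7", 22, 5_000),
]
print(top_talkers(records, n=2))
# [(('10.0.0.1', '10.0.0.9'), 2000000), (('10.0.0.2', '10.0.0.9'), 300000)]
```

    Summaries like this trade detail for volume: they point an operator at where to look, but they cannot by themselves explain why an application or switch is misbehaving.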

    Everyone is focusing on commodity switches, or on the idea that if you build the controller, they will come. To transition networks to a new paradigm, the complete suite of functions an operator needs to manage their network must be available for SDN to succeed. I believe they will get there, but that is not the focus yet.

    Finally, you missed my point on programmability. Everyone is building APIs (RESTful or not), and most are building some level of development environment for those APIs. The question was about debugging custom applications written to those APIs or controllers. The value of SDN is improved programmability. Great! The challenge is support. When I add a new Deep Packet Inspection capability on top of my controller and there is a problem, do we assume it is my app? What if the app is actually exposing an issue in a switch or controller? The answer can’t simply be “turn off the app.” If the environment is a mix of vendors based on commodity plays, tied together with a controller and custom applications, who is the “one neck to choke” vendor that helps me with operational issues? How do I certify the safety of applications that can change the state of the network? We are no longer talking about simple APIs that are the equivalent of a smart CLI for changing configuration. We are talking about APIs that can potentially change the behavior of the network and its services.

    I find the promise of SDN exciting, and it will be very interesting if the industry executes to the promise, but that means thinking of usability and support now to make sure we get it right.
