Author Archive: Geoff Horne
Posts:
I want to start by saying that e3Switch did an incredible job when we called them at 4pm on a workday and, without hesitation or obligation, shipped us about $20,000 of equipment overnight (including hand-carrying a few units to a co-lo in Sunnyvale).
Why am I so impressed? Well, for a 36-hour period, from Wednesday lunchtime until late last night, the InteropNet had to pull off one of the most interesting challenges we’ve ever had. You see, we got onsite and found that none of our external DS3 circuits would work as designed.
There was a deathly silence in the room when the bad news was delivered, and we were now faced with the problem of how we were going to replace a critical component in three different locations around the USA.
The good news is that we not only fixed it, but we did it ahead of schedule and with no noticeable outages.
So, how did we do it? Well we pulled out the emergency manual and went through the motions :
First, you analyse the problem, make sure you have a good clear description of what exactly is broken and what equipment is affected.
Second, examine the impact, and how much time you have to get the problem fixed. We did that and came back with a 24-hour window to get at least one circuit working.
Third, we listed ALL the possible solutions that we could get done in that time window with the resources available. That became our new project.
Lastly, and this is the kicker, we implemented ALL of them in parallel. We didn’t have the time or luxury to try one option, see it fail and then try the other. Everyone just got on the phone or keyboard and got busy. The first solution to work would be the winner, and we didn’t care which one.
This may seem like overkill, but curiously, two of the solutions didn’t pan out, so we ended up running with our ‘C’ plan. Once that was in place we relaxed a little, got some sleep, then went back and debugged the failed attempts and prepped them in case our working system decided to get temperamental.
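The "run every option at once and keep the first one that works" approach can be sketched in a few lines of Python. This is an illustration only: `first_working` and the plan callables are hypothetical names, and each plan is assumed to return a result on success and raise an exception on failure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_working(plans):
    """Start every plan in parallel and return the first one that succeeds.

    `plans` maps a plan name to a zero-argument callable (hypothetical).
    Returns (winner_name, result, failures_seen_before_the_winner).
    """
    if not plans:
        raise ValueError("need at least one plan")
    failures = {}
    with ThreadPoolExecutor(max_workers=len(plans)) as pool:
        futures = {pool.submit(fn): name for name, fn in plans.items()}
        for done in as_completed(futures):
            name = futures[done]
            try:
                # Leaving the `with` block still waits for the slower plans
                # to finish; their outcome is simply ignored here.
                return name, done.result(), failures
            except Exception as exc:
                # Keep the error so the failed attempt can be debugged later.
                failures[name] = exc
    raise RuntimeError(f"no plan worked: {failures}")
```

Calling `first_working({"A": plan_a, "B": plan_b, "C": plan_c})` hands back whichever plan finished successfully first, plus the errors from any that failed before it, which is handy when you go back to debug them afterwards.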
“Divide and conquer”
Putting a network of this size together takes a team of roughly 100
people. This may seem like a lot, but you have to realise that
a) This is all brand new equipment that we’ve never plugged together before
b) We build the entire network in 5 days.
So for this to all come together requires an expert management team,
people who not only have experience with the InteropNet but networks in
general. And they aren’t always easy to find.
What you need are the kinds of brains who are not only experts, but
are also willing to take ownership of a component and see it through to
completion and have the drive to assemble what you require in 5 days or
less. (oh and they also have the skills to take it apart, ship it to
Vegas, and re-assemble it again). Someone once described us as ‘Internet
Carnies’.
But, if you did try to do this yourself, this is what you’d need :
Routing, This person should have the private numbers of people driving
routers inside the ISPs. Someone who understands BGP, OSPF and the rest
of the routing protocols.
Physical, This is someone who understands the entire cable plant and knows
how many pairs of fibre are free in any run. They are who you go to to
get a link patched.
Union Liaison, When you are working the floor you want someone who can bond
with the labor and get the teamsters to either do the right thing or look
the other way.
Off Show floor, This is your end point delivery expert. They know how
many hubs, and cables you will need to network a room. They are also the
keeper of your port counts.
Wireless, you’re going to need someone who understands RF placement, Radio
loading and the optimal way to get the coverage you need.
Network management, this network has to run at 101% uptime, so you want
someone who can own and design a monitoring and forensics system that
will let you know anything that looks out of place.
Infrastructure, Racks, space allocation, power and all the other things
that make up the hardware you are using. You’re going to want one person
to go to for decisions.
NOC Manager, when you have a team of 100 people you still need an
operational workspace. This person not only makes sure we have switches
on the desktop but plans all the spares, packing and related logistics
items.
Oh, and by the way, all of the above people are volunteers on this
network.
We just spent most of the evening doing failover testing. This was a fairly exhaustive set of processes where we checked that our A and B paths all worked correctly, our power was balanced and we have some semblance of redundancy.
Along the way we also managed to re-cable a lot of racks and clean up some fairly unpleasant cabling efforts. (There are a few people who have now been banned from touching the wiring plant)
We also re-labeled our PDU to breaker map correctly. We think there will be fewer complaints.
Tomorrow the crew starts to pack up.
While we are building the InteropNet I thought I’d take a bit of time to
describe what our core technologies are. What do we need to make a
network like this operational? Well over the years we’ve been able to
boil it down to a best practices network that does everything we need in
as simple a way as possible.
Whenever I explain it to someone, I usually break it down to these
components :
Data Services. This is the ISP and more. Circuits, SIP services, CO-LO
facilities and conference calling
Routing. Anything to do with layer 2 and layer 3 forwarding
Security and firewalls. This also includes IDS, Network access (802.1x)
and end client virus protection
VoIP. Soft Phones, hard phones, Skype gateways, Instant messaging, and
any other related presencing information.
Wireless. Because this network is not just for the exhibitors
Network Management. As pro-active as possible. If something is going to
break we want to know before it happens
Network analysis and forensics. These are all our Taps, repeaters and
traffic replicators. With the right design we can copy any conversation,
anywhere for analysis.
Datacenter infrastructure, Racks, UPS, power management and everything
required to hold this gear upright
Workstations and servers. These days we are virtual. But all our
services are load balanced, backed up and replicated
Out of Band Access. We build an entire parallel network that doesn’t
touch any of our routing infrastructure. On this we hang all our console
servers, PDUs and the management ports of all our equipment.
The team is now a week into the HotStage. They have a full simulation of the entire network and have verified connectivity. The next step is to go through multiple passes of testing and repairing things as they break.
With the InteropNet the main thing to focus on is not how the network operates, but how it breaks. If we know where the weak points are, we can work on them and put monitoring in place to make sure that if (some will say when) there is a failure, the team knows how to work around it with zero downtime.
Of course, along the way we’ve had to do some running repairs that we hadn’t expected :
Fixing our transportation system.
Encouraging things to fit.
And checking the packing.
NEMA, L6-20, L5-30 and other arcane words like those will usually pass through the average network engineer without leaving a trace. But in the InteropNet people are learning fast that some of the basic requirements can’t be ignored :
Devices need power, and the plugs had better fit.

Of course, this isn’t always the case, and this year we have had an unusually high number of ‘square peg, round hole’ issues. Time isn’t on our side with HotStage, so waiting another 5 days for someone to ship a replacement isn’t going to let us configure our equipment. So we often have to turn to less conventional methods to get our work done. If high voltage power scares you, you should probably look away now.

For those of you who are new to the InteropNet, or if you’re just wondering why we are doing all this, I should probably give you a bit of background.
What we’re doing here is “incorporating new technology, while stressing best practices”. This can be best described by these three key principles.
Innovate
Use the biggest, greatest newest stuff. We want to stay ahead of the curve, but we still have to be production ready. So we got our providers to give us the latest releases. After all, if they are out there selling you upgrades, we are going to show it to you in production.
Educate
No one is allowed to design or build something by themselves. During the course of this build everyone has done any or all of the following : Presented their design to their peers. Taught on how to do their design. Taught on how to use their infrastructure. Taught on how to troubleshoot their infrastructure or just waxed lyrical on a particular RFC.
Calculate
Lastly, we are taking all of the information we are generating and, instead of just dropping it in the bit bucket, we are going to use it to collate statistics on any and all values we care about (I think there is even a plan to see how many gallons of water we will consume during the install at Vegas). When someone later wants to know how the network behaved just after the first keynote, we want to be ready with an answer.
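Being "ready with an answer" boils down to keeping timestamped samples and asking for a window after an event. A minimal sketch in Python, assuming samples are stored as (timestamp, value) pairs; the function name and the idea of a single numeric metric are invented for illustration and are not a description of the actual InteropNet tooling.

```python
from datetime import datetime, timedelta
from statistics import mean


def average_after(samples, event_time, window):
    """Mean of the sample values recorded in [event_time, event_time + window).

    `samples` is an iterable of (datetime, float) pairs (assumed format).
    Returns None when no sample falls inside the window.
    """
    end = event_time + window
    values = [value for when, value in samples if event_time <= when < end]
    return mean(values) if values else None
```

For example, `average_after(samples, keynote_end, timedelta(minutes=30))` would give the average of whatever was being measured in the half hour after a keynote finished, where `keynote_end` is a `datetime` you supply.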
This week the InteropNet team begins building the network in preparation for moving it to the show in Las Vegas. We call this ‘Hot Stage’.
But it hasn’t been without hiccups. This weekend we had a skeleton crew of about 5 staging the racks and preparing the fibre backbone, so it should have been smooth sailing. In the first 24 hours we’ve already had to experience some interesting things.
- We put the ears on about 8 switches before I realised they were on the wrong way.
- We had to get rid of an old couch due to a ‘rodent issue’.
- We have 140 SFPs missing in transit.
- PG&E had blocked the street while they did some work, so we couldn’t get lunch.
- APC have been diligently working to replace the bad wheels on about 4 racks with only a hand ratchet because we can’t locate one of our tool chests (this is what happens when you keep moving a network around the country).
- And, in the last hour PG&E cut power to the street and blacked out the whole block. Well, at least our UPS got a test.
By the end of the week we should have about 50 engineers in a secret location in San Jose building something beautiful, powerful and portable. Stay tuned for updates.
This year the InteropNet will feature IPv6.
Essentially, we will be using a dual stack model with
router-advertisements. This will allow machines that are IPv6 enabled to
get an address, and to get to sites that are IPv6 reachable. We will
listen to DNS on both IPv4 and IPv6 so you can make your queries either
way.
The main reason we are doing this is so that we can gauge how many of you
out there are using IPv6 (either knowingly or not). We are going to track
how much traffic takes the IPv6 route or the IPv4 and so on, as well as
trying to identify things like “How much of the Internet is IPv6 reachable
today” and “How many clients out there are set up to use IPv6”.
We’re probably also going to do this on the wireless network.
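Measuring how much traffic goes each way can be sketched as a simple tally by address family. The following is a rough Python illustration, assuming flow records are available as (address, byte count) pairs; that record format and the function names are assumptions, and the addresses in the usage note are documentation-range examples, not real data.

```python
import ipaddress
from collections import Counter


def bytes_by_ip_version(flows):
    """Sum byte counts per IP version from (address, byte_count) pairs."""
    totals = Counter()
    for address, byte_count in flows:
        version = ipaddress.ip_address(address).version  # 4 or 6
        totals[f"IPv{version}"] += byte_count
    return totals


def ipv6_share(totals):
    """Fraction of all counted bytes that travelled over IPv6."""
    total = sum(totals.values())
    return totals["IPv6"] / total if total else 0.0
```

Feeding it something like `[("192.0.2.10", 600), ("2001:db8::1", 300)]` gives per-version byte totals, and `ipv6_share` turns those into a single fraction you can track over the course of the show.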
We’ve been a bit quiet about the Interop Network. This is because the team has been working on one of the fastest installs ever. They were given the hall on Sat Sept 13, and had to be operational for end users 24 hours later.
So it has been a lot of long hours and interesting times. As they started to assemble things they realized just how huge this network is. We re-shipped all the hardware we used for Las Vegas but apparently don’t have many spare parts left over. The network is also co-locating with the Web 2.0, so there are a lot more wireless users and people sending real-time updates to the internet (Twitter etc).
Last night the team came up for air and sent me a few interesting statistics.
- We have 25 volunteers and 40 vendor engineers.
- We have 4 x DS3s from the Internet.
- We have about 500 connections to the show floor.
- We have about 60 meeting rooms and conference areas.
- We have 90 active Access points.
- We are using 17 kW of power.
- The average temperature in a rack is 77 degrees.
- There are never enough sodas.
If you are attending the show, there are tours of the network on Wed and Thurs. Just stop by the helpdesk for details.

May 15th, 2009 | Geoff Horne