If the info has spelling and grammar errors, please remember that network engineers post here, live, without any executive review.
1/10/2012 2:39 PM San Jose - core router reboot. Emergency. We have noticed some odd routing issues; to properly clear and reset the tables we must reboot. Could result in some BGP flapping for up to 20 minutes. Some sites work, others do not...then it settles down and all work.
10/21/2011 7 AM San Jose - All Clear - As you know we have been monitoring carefully, with techs and spares standing by day & night. It has been more than 16 hours since an incident occurred. Looks like the tech team may have resolved this very intermittent hardware problem yesterday afternoon.
10/20/2011 3:45 PM San Jose - we have had a tech with spares on site all day...waiting for the intermittent issue to arise. It happened again. It is hard to track these down when they don't remain a hard failure. But we have multiple spares and the tech in the datacenter all day. Rest assured that if this persists, we will begin moving C.O. paths and rerouting customers to another core router.
10/20/2011 San Jose core router, 5 min outage plus BGP flap on some neighbors during repair. We may have replaced the wrong processor card yesterday (there are 2, master & slave, on the same bus)...Some customers experienced slowness and dropped packets...our continued monitoring and testing revealed we should replace the second processor too, before a hard failure occurs. Tech is standing by while we monitor.
10/19/2011 Slow repair (up/down during 1 hour) to minimize impact. Event is over - Memory issue reappeared in San Jose. Impact was multiple momentary up-downs and BGP flaps in San Jose due to the attempt to narrow down the memory issue using the spares. Default routes to other core router paths were used to help minimize impact during the transitions. Due to yesterday's event we had spares prepared with the correct code images and config files for this core router, and San Jose is now operating on a new processor card with the original slave card in place.
10/18/2011 5:25 PM - Event is over - minor impact - Less than 3 mins total. San Jose Core Router had a memory issue due to an older code version and the fact it had not been rebooted in more than 2 years. This required an emergency reboot. With a tech standing in front of the unit, we rebooted it. Down time 5 mins. Customers routing through San Jose Data Center were affected.
6/15/2011 9:30PM-10:30PM Los Angeles Area - Intermittent controller module problem in main core router. This has dual CPU blades. Master had flapping issues and had to be pulled. It would go bad and the watchdog would flip the slave to master...then the watchdog would find the original master to be good again and flip back, then that blade would fail again...Looping. Pulled bad CPU module. Problem resolved. New module will be installed this week and should not cause any issue when inserted. We now suspect this caused the 6/1/2011 incident and it took this long to fail harder.
6/1/2011 10:30PM-11:30PM Los Angeles Area - Event is Over. Power surge in one of our racks at One Wilshire caused a memory leak in the core router. Most customers in Southern California were affected. No events logged in any routers. Very unusual. All blades pass diagnostics. Event is over.


