MCMC looking into TM outage behind global Internet slowdown
By A. Asohan June 13, 2015
- Malaysian telco slapped with #WhoBrokeTheInternet hashtag
- Bad network configuration saw it accepting too much international traffic
THE Malaysian Communications and Multimedia Commission (MCMC) said it is looking into the issue of a Telekom Malaysia Bhd (TM) outage that reportedly caused a slowdown in Internet services across the globe.
The outage, which lasted at least two hours from 4:30pm (MYT) yesterday (June 12), affected not only users of TM’s own Streamyx and UniFi broadband services but is also suspected to have hit other Malaysian Internet service providers (ISPs) that use its network.
More importantly, it affected customers of Broomfield, Colorado-based multinational ISP Level 3 Communications, which provides core transport, IP (Internet Protocol), and other services to large carriers across the world.
The issue was wide-ranging enough to have generated comments from Internet users and technical experts on social media, using the hashtag #WhoBrokeTheInternet.
ComputerWeekly reported that Internet traffic was disrupted in France, Germany, Italy, the United Kingdom and the United States, and affected a range of websites including eBay and Yahoo Mail, and gaming services on PSN and Xbox Live.
In a tweet, Level 3 said, “Our network is currently experiencing service disruptions affecting customers in Europe, Middle-East and Africa, and globally.
“Our technicians have isolated the issue and are currently completing the necessary remedial works. We are starting to see normal service return,” it said on June 12.
Level 3 has an extensive network across the world, saying it serves customers in “more than 500 markets in over 60 countries across a global services platform anchored by owned fibre networks on three continents and connected by extensive undersea facilities.”
Meanwhile, TM posted a notice on its website on June 12, but did not fully explain what caused the outage or why it affected Internet traffic worldwide.
“We identified the root cause and our network team immediately took steps to optimise traffic flows, while we worked to restore connectivity to its expected level of performance. The services were restored at 6.30pm on the same day.
“We would like to clarify that during a network reconfiguration exercise, we had unintentionally updated traffic routing information which caused congestion and packet loss to our international connectivity.
“This had affected the Internet traffic flow for some of our customers and some international traffic routes,” it said in its statement.
As at press time, Level 3 and TM had not responded to queries from Digital News Asia (DNA). MCMC told DNA that it is looking into the issue and will revert by Monday (June 15).
Cross-border protocol issue
ComputerWeekly and several US media outlets reported that the issue was tied to TM’s use of the Border Gateway Protocol (BGP), the Internet’s core routing protocol, which lets providers tell each other which networks they can reach so that traffic can be routed through each other’s networks.
BGP makes routing decisions based on paths, network policies, or rule-sets configured by a network administrator, and in this case it appears that TM erred in configuring its network.
TM published a set of prefixes (blocks of IP addresses) that it said it could handle, and Level 3 unfortunately accepted them without apparent due diligence on its part.
The TM network thus began receiving traffic from Level 3 and became responsible for delivering those data packets to their intended destinations, even though it did not have the capacity to handle that volume. The resulting surge overwhelmed its international links.
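Why would a carrier send traffic down a longer path? One common reason, offered here as an assumption rather than a finding from the reports, is that ISPs often give routes learned from paying customers a higher "local preference" than routes learned from peers, and BGP checks local preference before AS-path length when choosing a best path. The Python sketch below models only those two steps of the decision; the preference values, prefix and AS numbers are hypothetical placeholders (documentation ranges), not TM’s or Level 3’s real configuration.

```python
from dataclasses import dataclass

# Hypothetical local-preference values: many ISPs prefer routes learned from
# paying customers over routes learned from peers (assumed numbers only).
LOCAL_PREF = {"customer": 200, "peer": 100}


@dataclass
class Route:
    prefix: str
    as_path: list[int]  # ASes the announcement passed through, nearest first
    learned_from: str   # relationship with the neighbour: "customer" or "peer"


def best_path(routes: list[Route]) -> Route:
    """Simplified BGP selection: highest local preference, then shortest AS path."""
    return max(routes, key=lambda r: (LOCAL_PREF[r.learned_from], -len(r.as_path)))


# Two candidate routes to the same documentation prefix (all ASNs are placeholders).
direct = Route("203.0.113.0/24", [64500, 64501], "peer")
leaked = Route("203.0.113.0/24", [64510, 64502, 64503, 64501], "customer")

chosen = best_path([direct, leaked])
print(chosen.learned_from, len(chosen.as_path))  # customer 4
```

Even though the leaked route is twice as long, the higher local preference wins, which is the kind of outcome BGPmon describes: traffic redirected along a longer route toward a network that could not carry it. Real routers apply several more tie-breakers (origin, MED, router ID and others) that this sketch omits.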
Vancouver-based BGPmon, which has developed a tool that monitors BGP routing information in real time, reported that “for about two hours traffic was being redirected toward Telekom Malaysia, which in many cases would have been a longer route and also caused Telekom Malaysia to be overwhelmed with traffic.
“As a result significant portions of traffic were dropped, latency increased and users worldwide experienced a slower Internet service,” BGPmon said.
In Internet terms, there was a ‘route leak.’ An Internet industry expert who preferred anonymity said this was “like someone publishing signboards on the road saying that this is the fastest and shortest way to Thailand, when it is actually the furthest.”
“There were two errors here: TM publishing the wrong signboards, and Level 3 accepting the routes without checking,” the expert said.
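The expert’s second point, accepting routes without checking, is what customer prefix filtering is meant to catch. A minimal Python sketch, assuming a provider keeps a hypothetical allow-list of address blocks each customer is entitled to announce: anything outside that list, such as routes the customer merely learned from another provider and passed on, is rejected. The function name `filter_announcements` and the documentation-range prefixes are illustrative only; real routers express this as vendor-specific prefix-list or route-policy configuration, often generated from Internet Routing Registry data.

```python
import ipaddress

# Hypothetical allow-list of blocks a customer is registered to announce
# (documentation ranges used as placeholders, not anyone's real allocations).
CUSTOMER_ALLOWED = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.0/24"),
]


def filter_announcements(announced: list[str], allowed=CUSTOMER_ALLOWED) -> tuple[list[str], list[str]]:
    """Split a customer's announcements into accepted and rejected prefixes.

    A prefix is accepted only if it falls inside one of the allowed blocks;
    everything else (for example, a leaked third-party route) is rejected.
    """
    accepted, rejected = [], []
    for text in announced:
        net = ipaddress.ip_network(text)
        # subnet_of() raises TypeError when mixing IPv4 and IPv6, so compare versions first.
        if any(net.version == block.version and net.subnet_of(block) for block in allowed):
            accepted.append(text)
        else:
            rejected.append(text)
    return accepted, rejected


# 192.0.2.0/24 stands in for a route the customer learned elsewhere and leaked.
ok, dropped = filter_announcements(["203.0.113.0/24", "192.0.2.0/24", "198.51.100.128/25"])
print(ok)       # ['203.0.113.0/24', '198.51.100.128/25']
print(dropped)  # ['192.0.2.0/24']
```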