All systems operational

Resolved
Outage of EVPN services due to problems with the underlying L2 Transport Provider

Started
March 06, 2021 at 3:00:00 PM GMT+0
Status
Resolved after 6 days
Affected
Services
EVPN
Upstreams & Peering
NL-ix

Impact

Degraded performance
  • Resolved
    March 12, 2021 at 8:09:31 AM GMT+0

    We found one last issue: mismatched MTU sizes on the transport path caused some MAC addresses not to be advertised correctly in the EVPN. This issue is now fixed as well, and all services have been checked by hand to make sure everything is working.

  • Monitoring
    March 11, 2021 at 9:35:00 AM GMT+0

    While troubleshooting the issue in our lab, we identified the bug behind it. The performance degradation is caused by a bug in the interaction between the Intel NICs in the EVPN node and the Juniper switch that terminates the cross-connect and provides the L2 service. For some reason, the LLDP daemon built into the Intel NIC causes the Juniper switch to learn faulty settings. Once the LLDP daemon on the NIC is disabled and the link is reset (by shutting it down and bringing it back up), the switch learns the correct values and performance is normal.

    The L2 Transport between the DCs has been in use since 10:35 CET, so the performance degradation is resolved. If you still encounter any issues, feel free to hit us up!

  • Identified
    March 10, 2021 at 7:50:03 PM GMT+0

    Sadly, there is no news regarding the L2 Transport Provider. They are still not answering our emails and do not respond to calls on their 24/7 NOC phone number. We're in contact with the sales teams of other partner companies to get another connection up and running as quickly as possible.

  • Identified
    March 09, 2021 at 8:00:00 PM GMT+0

    We were able to establish a temporary solution by routing the EVPN traffic through another link. The affected services should be back up and running. Please note that the performance and latency of the services are still impacted.

  • Identified
    March 08, 2021 at 8:50:00 AM GMT+0

    We've swapped our hardware on both ends of the link to rule out our own hardware as the reason the transport service is not coming up properly. Sadly, this did not change the situation. We're still struggling to get clear feedback from the L2 Transport Provider.

  • Identified
    March 07, 2021 at 7:00:00 PM GMT+0

    We have not yet received any feedback from the L2 Transport Provider. We're looking into other options to transport the affected network services to the customers.

  • Identified
    March 06, 2021 at 3:00:00 PM GMT+0

    We're struggling to bring the EVPN services back up due to problems with the underlying L2 Transport Provider. Because the services are being migrated from FRA3 to FRA2, the redundant path is not yet in place. We're waiting for the provider to respond to our incident and take measures on their end.
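
The MTU mismatch described in the final update (March 12) can be illustrated with a small check. This is only a minimal sketch under stated assumptions: the hop names, the MTU values, and the required size of 1600 bytes are hypothetical and were not taken from the actual transport path. The idea is simply that the usable MTU of a path is the smallest MTU of any hop, so a single hop configured too low can silently drop larger frames.

```python
from typing import List, NamedTuple


class Hop(NamedTuple):
    name: str  # hypothetical device or interface label
    mtu: int   # configured MTU in bytes


def path_mtu(path: List[Hop]) -> int:
    """The usable MTU of a path is the smallest MTU of any hop on it."""
    return min(hop.mtu for hop in path)


def find_mtu_mismatches(path: List[Hop], required_mtu: int) -> List[Hop]:
    """Return every hop whose MTU is below the size the service needs."""
    return [hop for hop in path if hop.mtu < required_mtu]


# Hypothetical example path; these values are assumptions for illustration.
example_path = [
    Hop("evpn-node-a", 9000),
    Hop("transport-switch", 1500),
    Hop("evpn-node-b", 9000),
]

too_small = find_mtu_mismatches(example_path, required_mtu=1600)
print("Path MTU:", path_mtu(example_path))
print("Hops below required MTU:", [hop.name for hop in too_small])
```

Running a comparison like this against the configured values of every hop is a quick way to spot the one device that limits the whole path.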