Mitigated – Networking reduced availability in East US

What happened?

Between 13:09 UTC and 18:51 UTC on 18 March 2025, a platform issue impacted a subset of Azure customers in the East US region. Customers may have experienced intermittent connectivity loss and increased network latency when sending traffic within, to, and from the East US region.

At 23:21 UTC on 18 March 2025, a second impact to network capacity occurred during the recovery of the underlying fiber. Customers may have experienced the same intermittent connectivity loss and increased latency when sending traffic within, to, and from the East US region.


What do we know so far?

We identified multiple fiber cuts affecting a subset of datacenters in the East US region at 13:09 UTC on 18 March 2025. The fiber cuts reduced capacity to those datacenters, increasing utilization of the remaining capacity serving them. At 13:55 UTC on 18 March 2025, we began mitigating the impact of the fiber cuts by load balancing traffic and restoring some of the impacted capacity; customers should have started to see service recover from this time. Traffic was fully restored by 18:51 UTC on 18 March 2025, and the issue was mitigated.

At 23:20 UTC on 18 March 2025, another impact was observed during the capacity repair process. This was due to a tooling failure during the recovery process that started adding traffic back into the network before the underlying capacity was ready. The impact was mitigated at 00:30 UTC on 19 March after we isolated the capacity impacted by the tooling failure.

At 01:52 UTC on 19 March, the underlying fiber cut was fully repaired. We continue working to test and restore all capacity to pre-incident levels.

Our telemetry data shows that the customer impact has been fully mitigated. We are continuing to monitor the situation during our capacity recovery process before confirming complete resolution of the incident.

An update will be provided in 3 hours, or as events warrant.

Networking issues impacting Azure Services in East US2

Summary of Impact: As early as 22:00 UTC on 08 Jan 2025, we noticed a partial impact to some Azure services in East US2 due to a configuration change in a regional networking service. The configuration change caused an inconsistent service state. This could have resulted in intermittent virtual machine connectivity issues, or in failures when allocating resources or communicating with resources in the region. Impacted services include Azure Databricks, Azure Container Apps, Azure Function Apps, Azure App Service, SQL Managed Instances, Azure Data Factory, Azure Container Instances, Power BI, VMSS, and PostgreSQL flexible servers, among others. Customers using resources with Private Endpoint NSG communicating with other services would also be impacted.

The impact is limited to a single zone in the East US2 region. No other regions are impacted by this issue.

Current Status:

As early as 22:00 UTC on 08 Jan 2025, service monitoring alerted us to a networking issue in East US2 impacting multiple services. During the investigation, we identified that a network configuration issue in one of the zones caused three of the Storage partitions to become unhealthy. As an immediate remediation measure, traffic was re-routed away from the impacted zone, which brought some relief to the non-zonal services and helped with newer allocations. However, services that sent zonal requests to the impacted zone continued to be unhealthy. Some of the impacted services initiated their own Disaster Recovery options to mitigate some of the impact.

Additional workstreams to rehydrate the impacted zone by bringing the impacted partitions back to a healthy state are proceeding as planned. To avoid any further impact, we validated the fix on one of the partitions before applying it more widely. That validation has completed successfully, and we are now working on applying the mitigation to all the unhealthy partitions. Once the mitigation is applied, we intend to complete additional validations before bringing the partitions online.

We do not have an ETA available at this time, but we expect to be able to share more details on our progress in the next update. We continue to advise customers to execute Disaster Recovery to expedite recovery of their impacted services. Customers that have already failed out of the region should not fail back until this incident is fully mitigated. The next update will be provided in 1 hour or as events warrant.

For customers impacted due to Private Link, a patch was applied, and we confirm dependent services should be available.

We have been able to confirm that customers whose Azure Databricks, App Services multi-tenant, Azure Function Apps, Logic Apps, and Azure Synapse resources were impacted should start seeing some recovery.
