问题描述
Deployment Azure Service Fabric 时,遇见了VMExtensionProvisioningError, 全文如下:
Deployment Name: 385A084F35B2BC55-xxxxOperation Id: 096CEDOperation: CreateCorrelation Id: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxxStatus: FailedError:Code: VMExtensionProvisioningErrorMessage: Multiple VM extensions failed to be provisioned on the VM.
Please see the VM extension instance view for other failures. The first extension failed due to the error:
VM Extension \'RunPowerShell_Frontdoor\' is marked as failed since it depends upon the VM Extension \'ServiceFabricNodeVmExt_vmNodeType1Name\' which has failed.
Help Link: https://aka.ms/ev2/errors/troubleshooting
问题解答
第一步:收集日志
需要远程连接(RDP)到SF的节点中, 在目录C:\\WindowsAzure\\Logs\\Plugins 中收集VM Extension的安装日志
第二步:分析日志
在Microsoft.Azure.ServiceFabric.ServiceFabricNode plugin 日志中,发现是由于在获取证书失败而出现的错误。
[09/09/2021 12:32:01.86] Executing: C:\\Packages\\Plugins\\Microsoft.Azure.ServiceFabric.ServiceFabricNode\\1.1.0.11\\ServiceFabricExtensionHandler.exe enable[09/09/2021 12:32:03.62] Execution Complete.######Execution Output:1> 2021-09-09T12:32:01.9973303Z: Information: Starting...1> 2021-09-09T12:32:01.9973303Z: Information: MachineName: Frontdoor0000001> 2021-09-09T12:32:01.9973303Z: Information: CurrentDirectory: C:\\Packages\\Plugins\\Microsoft.Azure.ServiceFabric.ServiceFabricNode\\1.1.0.111> 2021-09-09T12:32:01.9973303Z: Information: UserName: SYSTEM1> 2021-09-09T12:32:01.9973303Z: Information: Created mutex; this is the only instance of this process running1> 2021-09-09T12:32:02.3723445Z: Information: HandlerManifest Version: 11> 2021-09-09T12:32:02.4348520Z: Information: [HandlerEnvironment]1> 2021-09-09T12:32:02.4348520Z: Information: LogFolder: C:\\WindowsAzure\\Logs\\Plugins\\Microsoft.Azure.ServiceFabric.ServiceFabricNode\\1.1.0.111> 2021-09-09T12:32:02.4348520Z: Information: ConfigFolder: C:\\Packages\\Plugins\\Microsoft.Azure.ServiceFabric.ServiceFabricNode\\1.1.0.11\\RuntimeSettings1> 2021-09-09T12:32:02.4348520Z: Information: StatusFolder: C:\\Packages\\Plugins\\Microsoft.Azure.ServiceFabric.ServiceFabricNode\\1.1.0.11\\Status1> 2021-09-09T12:32:02.4348520Z: Information: HeartbeatFile: C:\\Packages\\Plugins\\Microsoft.Azure.ServiceFabric.ServiceFabricNode\\1.1.0.11\\Status\\HeartBeat.Json1> 2021-09-09T12:32:02.4348520Z: Information: DeploymentId: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx1> 2021-09-09T12:32:02.4348520Z: Information: RoleName: _Frontdoor_01> 2021-09-09T12:32:02.4348520Z: Information: Instance: _Frontdoor_01> 2021-09-09T12:32:02.5442327Z: Information: Opening config file: C:\\Packages\\Plugins\\Microsoft.Azure.ServiceFabric.ServiceFabricNode\\1.1.0.11\\RuntimeSettings\\1.settings1> 2021-09-09T12:32:02.5598642Z: Information: Config specified client cert FindBySubjectName \'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn\' (store name = My)1> 2021-09-09T12:32:02.5754780Z: Error: ERROR: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with \'FindBySubjectName\' \'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn\' in store \'My\' at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadClientCertificate(String findValue, String certStoreName, X509FindType findBy) at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadCertificateWrapper(CertificateSettings certificateSetting)1> 2021-09-09T12:32:02.5754780Z: Warning: Heartbeat: Not Ready: Cannot find certificate with \'FindBySubjectName\' \'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn\' in store \'My\'1> 2021-09-09T12:32:02.5754780Z: Error: Heartbeat file updated, with error msg1> 2021-09-09T12:32:02.5911044Z: Error: ERROR: Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with \'FindBySubjectName\' \'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn\' in store \'My\' at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadClientCertificate(String findValue, String certStoreName, X509FindType findBy) at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadCertificateWrapper(CertificateSettings certificateSetting) at Microsoft.Azure.ServiceFabric.Extension.Core.Common.PublicSettingsExtensions.AllowAccessToCerts(PublicSettings publicSettings, ICertificateUtility certificateUtility) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.ValidateDeployment(HandlerSettings settings) at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallService() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallServiceWithRetry() at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.Enable() at Microsoft.Azure.ServiceFabric.Extension.Handler.Program.Main(String[] args) in X:\\bt\\1162492\\repo\\src\\HandlerExe\\Program.cs:line 52Execution Error:Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with \'FindBySubjectName\' \'xxxxx.cluster.prod.ngpproxy.microsoftonline.cn\' in store \'My\'at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadClientCertificate(String findValue, String certStoreName, X509FindType findBy)at Microsoft.Azure.ServiceFabric.Extension.Core.CertificateUtility.LoadCertificateWrapper(CertificateSettings certificateSetting)at Microsoft.Azure.ServiceFabric.Extension.Core.Common.PublicSettingsExtensions.AllowAccessToCerts(PublicSettings publicSettings, ICertificateUtility certificateUtility)at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.ValidateDeployment(HandlerSettings settings)at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallService()at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.InstallServiceWithRetry()at Microsoft.Azure.ServiceFabric.Extension.Core.VMExtensionHandler.Enable()at Microsoft.Azure.ServiceFabric.Extension.Handler.Program.Main(String[] args) in X:\\bt\\1162492\\repo\\src\\HandlerExe\\Program.cs:line 52
最最关键的信息就是Microsoft.Azure.ServiceFabric.Extension.Core.AgentException: Cannot find certificate with \’FindBySubjectName\’ \’xxxxx.cluster.prod.ngpproxy.microsoftonline.cn\’ in store \’My\’ 。
第三步:继续查找不能找到证书的问题
在获取证书的日志中,发现SF是从KV(Key Vault: 密钥保管库)中获取的证书,而在第5步获取auth challenge时,从18:59分02秒 到 19:04分45秒,耗时超过了5 分钟。
- 2021-09-09 18:59:02: <info> [CertificateManager] Beginning refresh for: https://xxxxx-chinanorth-kv.vault.azure.cn/secrets/cluster
- 2021-09-09 18:59:02: <info> [WindowsCertificateStore] attempting to open store \’LocalMachine\\MY\’
- 2021-09-09 18:59:02: <debug> [WindowsCertificateStore] opening the \’LocalMachine\’ store..
- 2021-09-09 18:59:02: <debug> [WindowsCertificateStore] store opened successfully.
- 2021-09-09 18:59:02: <info> [KeyVaultClient] Getting new auth challenge
- 2021-09-09 19:04:45: <error> [CertificateManager] Refreshing \’https://xxxxx-chinanorth-kv.vault.azure.cn/secrets/cluster\’ failed with KeyVault exception: KeyVaultHttpClient.GetChallenge. – http_exception: what = \’WinHttpSendRequest: 12030: The connection with the server was terminated abnormall\’ code = \’windows:12030\’
- 2021-09-09 19:04:45: <info> [CertificateManager] Beginning refresh for:
由此可以推断,由于网络的延迟导致部署SF时,向Key Vault请求认证,由于请求超时而导致部署失败。
在多次重试重新部署SF,部署成功。