欢迎访问 生活随笔!

凯发k8官方网

当前位置: 凯发k8官方网 > 运维知识 > 数据库 >内容正文

数据库

sqlserver镜像状态异常排查 -凯发k8官方网

发布时间:2025/1/21 数据库 21 豆豆
凯发k8官方网 收集整理的这篇文章主要介绍了 sqlserver镜像状态异常排查 小编觉得挺不错的,现在分享给大家,帮大家做个参考.

title: sqlserver · case分析 · 镜像状态异常排查

author: 天铭

问题

用户实例的某个db一直处于synchronizing无法达到synchronized状态,用了很多修复方法并且进行了镜像重搭,但依旧没有达到正常同步态

排查

synchronizing态通常和redo queue或者send queue有关

主库的问题db当前的tps在3k左右

begin tran declare @value int declare @value2 int select @value=convert(int,cntr_value) from sys.dm_os_performance_counters where object_name like '****:database%%'and instance_name='businesscard' and counter_name like 'transactions/ sec%%' waitfor delay '00:00:01' select @value2=convert(int,cntr_value) from sys.dm_os_performance_counters where object_name like '****:database%%'and instance_name='businesscard' and counter_name like 'transactions/ sec%%' select @value2-@value commit tran

主库的发送队列

begin tran declare @value int declare @value2 int select @value=convert(int,cntr_value) from sys.dm_os_performance_counters where counter_name = 'log send queue kb' and instance_name='businesscard'; waitfor delay '00:00:01' select @value2=convert(int,cntr_value) from sys.dm_os_performance_counters where counter_name = 'log send queue kb' and instance_name='businesscard'; select @value*1./1024 as first_second_mb,@value2*1./1024 as second_second_mb,(@value2-@value)*1./1024 as diff_mb commit tran

备库的应用队列

begin tran declare @value int declare @value2 int select @value=convert(int,cntr_value) from sys.dm_os_performance_counters where counter_name = 'redo queue kb' and instance_name='businesscard' waitfor delay '00:00:01' select @value2=convert(int,cntr_value) from sys.dm_os_performance_counters where counter_name = 'redo queue kb' and instance_name='businesscard'; select @value*1./1024 as first_second_mb,@value2*1./1024 as second_second_mb,(@value2-@value)*1./1024 as diff_mb commit tran

备库的应用速度大概在 14mb

begin tran declare @value bigint declare @value2 bigint select @value=convert(bigint,cntr_value)*1./1024/1024 from sys.dm_os_performance_counters where counter_name = 'redo bytes/sec' and instance_name='businesscard'; waitfor delay '00:00:01' select @value2=convert(bigint,cntr_value)*1./1024/1024 from sys.dm_os_performance_counters where counter_name = 'redo bytes/sec' and instance_name='businesscard'; select (@value2-@value) as speed_mb commit tran

假设忽略主库新产生的日志,追上主库需要的时间 32min
select 27338.278320/14/60 =32.545569428566
如果再算上主库新产生的大概在40min左右

查看其它counter值

select cntr_value,* from sys.dm_os_performance_counters where counter_name in( 'log send flow control time (ms)','bytes sent/sec','log bytes sent/sec', 'log compressed bytes sent/sec','log harden time (ms)','log send flow control time (ms)', 'log send queue kb','mirrored write transactions/sec','pages sent/sec', 'send/receive ack time','sends/sec','transaction delay' ) and instance_name='businesscard';

send/receive ack time:
milliseconds that messages waited for acknowledgement from the partner, in the last second.
this counter is helpful in troubleshooting a problem that might be caused by a network bottleneck, such as unexplained failovers, a large send queue, or high transaction latency. in such cases, you can analyze the value of this counter to determine whether the network is causing the problem.

开始怀疑问题在网络上,该主机某个网络组建已经升级了,但是备机的组件未升级
等待确认的时间稳定在800ms左右,对比主备网络组件都升级的主机在100ms左右,这个值跟事务大小有关,目前发现有问题的地方可能是这里

begin tran declare @value bigint declare @value2 bigint select @value=convert(bigint,cntr_value) from sys.dm_os_performance_counters where counter_name = 'send/receive ack time' and instance_name='_total'; waitfor delay '00:00:01' select @value2=convert(bigint,cntr_value) from sys.dm_os_performance_counters where counter_name = 'send/receive ack time' and instance_name='_total'; select (@value2-@value) as 'send/receive ack time' commit tran

另外查看errorlog,时有告警镜像的状态应该是介于suspended和synchronizing之间,错误类似kb,2008r2 应该已经修复。

半小时后发送队列又变大了

备库的应用队列减小的也非常慢

考虑单点时间过久,只能重搭解决,但重搭后追日志还是出现此问题,怀疑可能有坏页或者其它未知情况,计划在维护时间做checkdb。

--- ---
checkdb发现没有坏页,说明数据是ok的,那么问题可能在日志了。

select log_reuse_wait,log_reuse_wait_desc,* from sys.databases where name='businesscard'

log_reuse_wait_desc 为 active_transaction

active_transaction:事务处于活动状态。
一个长时间运行的事务可能存在于日志备份的开头。在这种情况下,可能需要进行另一个日志备份才能释放空间。
事务被延迟(仅适用于 sql server 2005 enterprise edition 及更高版本)。“延迟的事务”是有效的活动事务,因为某些资源不可用,其回滚受阻。

dbcc opentran('businesscard')

15号的事务今天19号,导致中间的日志全是活动日志无法截断,跟用户确认kill掉后,再次查看活动日志

dbcc opentran('businesscard')

发现结果一直在变化,可以理解为活动日志一直在往前走(越来越少),再次备份后活动日志恢复到7g左右,重搭成功。

事实上这是很早以前处理的一个case,当时排查了很久才找到root cause但现在看来问题其实很简单,希望大家下次遇到可以很快处理好

总结

以上是凯发k8官方网为你收集整理的sqlserver镜像状态异常排查的全部内容,希望文章能够帮你解决所遇到的问题。

如果觉得凯发k8官方网网站内容还不错,欢迎将凯发k8官方网推荐给好友。

网站地图