Mysql5.7 的错误日志中最常见的note级别日志解释

在使用mysql5.7的时候，发现了不少在mysql5.6上不曾见过的日志，级别为note, 作者梳理了一下，最常见的note日志以下三种，下面我们来逐个解释。

第一种，Aborted connection . 如上图，信息如下：

2016-03-17T14:44:24.102542Z 59 [Note] Aborted connection 59 to db: \’unconnected\’ user: \’mha\’ host: \’197.xx.xx.xx\’ (Got an error reading communication packets)

2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: \’unconnected\’ user: \’mha\’ host: \’197.xx.xx.xx(Got an error reading communication packets)

2016-03-17T14:44:31.273897Z 61 [Note] Aborted connection 61 to db: \’unconnected\’ user: \’mha\’ host: \’197.xx.xx.xx(Got timeout reading communication packets)

上面两个是连接中断的错误信息，原因不一样，Got an error reading communication packets 的原因是因为网络等原因导致。Got timeout reading communication packets 这个错误的原因是会话的idle时间达到了数据库指定的timeout时间。

第二种：SLAVE多线程同步的信息

如图，信息如下：

2016-03-14T15:48:26.432150Z 73 [Note]Multi-threaded slave statistics for channel \’\’: seconds elapsed = 121; eventsassigned = 100374529; worker queues filled over overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1451875661700 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000

2016-03-14T15:50:28.398745Z 73 [Note]Multi-threaded slave statistics for channel \’\’: seconds elapsed = 122; eventsassigned = 100500481; worker queues filled over overrun level = 0; waited due aWorker queue full = 0; waited due the total size = 0; waited at clock conflicts= 1452001865500 waited (count) when Workers occupied = 3211993 waited whenWorkers occupied = 445032386000

我们通过源代码，找到下面一段，该段实现了上述日志的输出。

if ((my_now – rli->mts_last_online_stat)>=

mts_online_stat_period)

{

sql_print_information(\”Multi-threadedslave statistics%s: \”

\”seconds elapsed = %lu; \”

\”events assigned = %llu; \”

\”worker queues filled over overrun level = %lu;\”

\”waited due a Worker queue full = %lu; \”

\”waited due the total size = %lu; \”

\”waited at clock conflicts = %llu \”

\”waited(count) when Workers occupied = %lu \”

\”waited when Workers occupied = %llu\”,

rli->get_for_channel_str(),

static_cast<unsignedlong>

(my_now – rli->mts_last_online_stat),

rli->mts_events_assigned,

rli->mts_wq_overrun_cnt,

rli->mts_wq_overfill_cnt,

rli->wq_size_waits_cnt,

rli->mts_total_wait_overlap,

rli->mts_wq_no_underrun_cnt,

rli->mts_total_wait_worker_avail);

rli->mts_last_online_stat=my_now;

通过这一句(my_now – rli->mts_las8000t_online_stat), 以及最后一句rli->mts_last_online_stat=my_now; 可以得知， seconds elapsed 就是上一次统计跟这一次统计的时间间隔。

而mts_online_stat_period =120秒，硬代码，因此就是几乎每隔120秒，就有上述日志的输出。通过进一步查看原代码，初步了解上述日志信息的含义，如下：

events assigned：总共有多少个event被分配执行，计的是总数。

worker queues filled over overrun level：多线程同步中，worker 的私有队列长度超长的次数，计的是总数。

waited due a Worker queue full ：因为worker的队列超长而产生等待的次数，计的是总数。

waited due the total size ：超过最大size的次数，这个由参数slave_pending_jobs_size_max 指定。

waited at clock conflicts ：因为逻辑时间产生冲突的等待时间，单位是纳秒。

waited (count) when Workers occupied ：因为workder被占用而出现等待的次数。（总计值）。

waited when Workersoccupied ：因为workder被占用而出现等待的总时间，总计值，单位是纳秒。

第三种：page_cleaner线程的输出日志

如图，信息如下：

2016-03-24T17:45:28.005117Z 0 [Note] InnoDB:page_cleaner: 1000ms intended loop took 4750ms. The settings might not beoptimal. (flushed=1519 and evicted=0, during the time.)

查找源代码，发现是上面的日志由下面一段代码输出：

if (ret_sleep == OS_SYNC_TIME_EXCEEDED) {

ulint curr_time = ut_time_ms();

if (curr_time >next_loop_time + 3000) {

if (warn_count == 0) {

ib::info() << \”page_cleaner: 1000ms\”

\” intended loop took \”

<<1000 + curr_time

– next_loop_time

<<\”ms. The settings might not\”

\” be optimal. (flushed=\”

<<n_flushed_last

<<\” and evicted=\”

<<n_evicted

<<\”, during the time.)\”;

if (warn_interval >300) {

warn_interval= 600;

}else {

warn_interval*= 2;

}

warn_count= warn_interval;

} else {

–warn_count;

}

} else {

/* reset counter */

warn_interval= 1;

warn_count= 0;

}

next_loop_time= curr_time + 1000;

n_flushed_last= n_evicted = 0;

}

通过分析源代码，输出上述日志的条件是curr_time> next_loop_time + 3000 ，即比原定的循环时间next_loop_time多3000毫秒，而next_loop_time的标准时间是1000毫秒，即1秒钟做一次刷新页的操作。

loop took 4750ms ，即是这次刷新循环的实际经历时间。

后面还有一个(flushed=1519 and evicted=0,during the time.)这样的日志，即对应n_flushed_last与n_evicted 变量，而这两个变量又由n_flushed_list与n_flushed_lru赋值。

./storage/innobase/buf/buf0flu.cc:3322: n_flushed_last +=n_flushed_list;

./storage/innobase/buf/buf0flu.cc:3321: n_evicted += n_flushed_lru;

而n_flushed_list与n_flushed_lru赋值函数为pc_wait_finished，如下，我们来看看该函数的注释。

pc_wait_finished(&n_flushed_lru,&n_flushed_list);

/**

Wait until all flush requests are finished.

@param n_flushed_lru numberof pages flushed from the end of the LRU list.

@param n_flushed_list numberof pages flushed from the end of the

flush_list.

@return trueif all flush_list flushing batch were success. */

static

bool

pc_wait_finished(

ulint* n_flushed_lru,

ulint* n_flushed_list)

{

。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。。

}

通过源代码的注释，我们获知如下信息：

n_flushed_lru number of pages flushed from the end of the LRU list.

n_flushed_lru 这个值表示从lru 列表尾部刷新的页数，也就是日志中如evicted=0指标的所表示的值，如果该值不为零，则存在innodb buffer不够的嫌疑。

n_flushed_list 这个是从刷新列表中刷新的页数，也就是脏页数，也就是日志中flushed=1519 的值。

相关推荐

热门文章

热门标签

回顶部