今天配置了一下group replication,第一次配置,不熟悉,所以是边配边学边解决问题的过程,供未配置过但准备配置group replication的朋友参考。
大致过程如下:
需要在配置文件中增加以下参数。例如,对于192.168.153.153机器,设置my.cnf中的有关group replication的参数如下:
# for use gtid
gtid_mode=ON
enforce_gtid_consistency=ON
log_slave_updates=ON
#add for group replication
master_info_repository=TABLE
relay_log_info_repository=TABLE
binlog_checksum=NONE
transaction_write_set_extraction=XXHASH64
loose-group_replication_group_name=\”aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa\”
loose-group_replication_start_on_boot=on
loose-group_replication_local_address= \”192.168.153.153:33061\”
loose-group_replication_group_seeds= \”192.168.153.44:33061,192.168.153.153:33061,192.168.153.154:33061\”
loose-group_replication_bootstrap_group=off
loose-group_replication_ip_whitelist=\”192.168.153.44/24\”
对于192.168.153.154节点 则只需要将loose-group_replication_local_address设置为 \”192.168.153.154:33061\”,其他参数均相同.
对于192.168.153.44节点 则只需要将loose-group_replication_local_address设置为 \”192.168.153.44:33061\”,其他参数均相同.
第一次启动节点后,需要安装一个group_replicaiton 的plugin . 以及配置复制,进行初始配置。每个节点都需要执行如下两个命令:
INSTALL PLUGIN group_replication SONAME \’group_replication.so\’;
CHANGE MASTER TO MASTER_USER=\’rpl_user\’, MASTER_PASSWORD=\’rpl_pass\’ FOR CHANNEL \’group_replication_recovery\’;
这个复制跟普通的change master命令有区别,并不需要指定master是谁,但需要指定通道为\’group_replication_recovery\’。
配置完成后,启动集群中的第一个节点的group_replication时,需要设置boostrap参数。 其他的节点直接使用START GROUP_REPLICATION;即可。
SET GLOBAL group_replication_bootstrap_group=ON;
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF;
至此,整个集群配置完毕之后,当某个节点宕机之后,使用常规的mysql启动命令进行重启即可。无需再手动配置,节点重启后将自动加入复制集群。
对于配置中遇到一些错误:
2017-05-03T15:06:01.854288+08:00 0 [Warning] Plugin group_replication reported: \'[GCS] Connection attempt from IP address 197.3.153.44 refused. Address is no
t in the IP whitelist.\’
2017-05-03T15:06:01.854395+08:00 0 [ERROR] Plugin group_replication reported: \'[GCS] Error connecting to the local group communication engine instance.\’
错误摘要:来源IP没有在白名单列表中,所以连接拒绝。
原因: 是因为之前没有设置group_replication_ip_whitelist这个参数,默认值为127.0.0.1/8, 因为集群中的所有的节点为同一网段,因此在这里设置为\”192.168.153.44/24\”. 但实际上可以更严格,指定集群内的所有的IP地址。
2017-05-03T16:18:31.963751+08:00 12 [Note] Plugin group_replication reported: \’Establishing group recovery connection with a possible donor. Attempt 1/10\’
2017-05-03T16:18:31.988010+08:00 12 [Note] \’CHANGE MASTER TO FOR CHANNEL \’group_replication_recovery\’ executed\’. Previous state master_host=\’\’, master_port=
3358, master_log_file=\’\’, master_log_pos= 4, master_bind=\’\’. New state master_host=\’MYSQLTMP\’, master_port= 3358, master_log_file=\’\’, master_log_pos= 4, mast
er_bind=\’\’.
2017-05-03T16:18:32.037211+08:00 12 [Note] Plugin group_replication reported: \’Establishing connection to a group replication recovery donor f7e2c834-2a5f-11
e7-b622-005056aa17e6 at MYSQLTMP port: 3358.\’
2017-05-03T16:18:32.037714+08:00 14 [Note] Slave I/O thread: Start asynchronous replication to master \’@MYSQLTMP:3358\’ in log \’FIRST\’ at position 4
2017-05-03T16:18:32.037796+08:00 14 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore no
t recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the \’START SLAVE Syntax\’ in the MySQL Manual for more info
rmation.
2017-05-03T16:18:32.037824+08:00 14 [ERROR] Slave I/O for channel \’group_replication_recovery\’: Fatal error: Invalid (empty) username when attempting to
connect to the master server. Connection attempt terminated. Error_code: 1593
错误摘要:使用空账号去连接master server.
原因:没有配置同步账号跟密码,使用的是空密码进行同步。 需要为复制通道group_replication_recovery设置同步信息。命令如下。
CHANGE MASTER TO MASTER_USER=\’mysqlsync\’, MASTER_PASSWORD=\’mysqlsync_password\’ FOR CHANNEL \’group_replication_recovery\’;
2017-05-03T15:45:02.582665+08:00 25 [Note] Slave I/O thread: Start asynchronous replication to master \’mysqlsync@MYSQLTMP:3358\’ in log \’FIRST\’ at position 4
2017-05-03T15:45:02.582726+08:00 25 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore no
t recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the \’START SLAVE Syntax\’ in the MySQL Manual for more info
rmation.
2017-05-03T15:45:02.611559+08:00 25 [ERROR] Slave I/O for channel \’group_replication_recovery\’: error connecting to master \’mysqlsync@MYSQLTMP:3358\’ – retry-
time: 60 retries: 1, Error_code: 2005
错误摘要: 没能连接master,即primary节点。
原因:是没有识别到MYSQLTMP机器的IP地址信息, 需要将集群中的所有节点的hostname加入/etc/host文件中。