环境描述
自行下载rpm包本地安装或者wget创建PostgreSQL的源文件。本次记录为内网服务器,rpm本地安装,版本为PostgreSQL-9.6,经测试,9.6主从流复制可切换,9.4测试未果。
位于/share/rpms/psql-need/目录下。
机器规划
Master server
系统版本 | 机器配置 | 主机名 | IP |
---|---|---|---|
CentOS Linux release 7.4.1708 | 48C/256G/33TB | Bus-App-21 | 190.176.35.170 |
CentOS Linux release 7.4.1708 | 48C/256G/33TB | Bus-App-22 | 190.176.35.171 |
Master Server
以下命令在rpm包目录下执行。
# yum install ./postgresql96\*# mkdir -p /data/pgsql-9.6/archive# chown postgres.postgres /data/pgsql-9.6# chmod 0700 /data/pgsql-9.6# vim /usr/lib/systemd/system/postgresql-9.6.serviceEnvironment=pgsql-9.6=/data/pgsql-9.6/# /usr/pgsql-9.6/bin/initdb -D /data/pgsql-9.6# systemctl enable postgresql-9.6# systemctl start postgresq-9.6# passwd postgres# su postgres$ psql –U postgres=$ alter user postgres with password ‘Ur passwd’;=$\q# vim /data/pgsql-9.6/postgresql.conflisten_addresses = '*'max_connections = 500superuser_reserved_connections = 5shared_buffers = 128MBdynamic_shared_memory_type = posixwal_level = hot_standbysynchronous_commit = localarchive_mode = onarchive_command = 'cp %p /data/pgsql-9.6/archive/%f'max_wal_senders = 5# vim /data/pgsql-9.6/pg_hba.confhost replication postgres 127.0.0.1/32 md5host replication postgres 190.176.35.0/24 md5# systemctl restart postgresql-9.6
Slave Server
以下命令在rpm目录下执行。
# yum install postgresql96*# vim /usr/lib/systemd/system/postgresql-9.6.serviceEnvironment=pgsql-9.6=/data/pgsql-9.6/# mkdir -p /data/pgsql-9.6/archive# chown postgres.postgres /data/pgsql-9.6#chmod 0700 /data/pgsql-9.6# su postgres$ pg_basebackup -h 190.176.35.170 -U postgres -D /data/pgsql-9.6 -P --xlog$ vim /data/pgsql-9.6/postgresql.confhot_standby = onmax_connections = 510 #于主库数值max_wal_senders = 5$ cp /usr/pgsql-9.6/share/recovery.conf.sample /data/pgsql-9.6/$ mv /data/pgsql-9.6/recovery.conf.sample /data/pgsql-9.6/recovery.conf$ vim /data/pgsql-9.6/recovery.confrecovery_target_timeline = 'latest'standby_mode = onprimary_conninfo = 'host=主库’s IP port=5432 user=postgres password=密码'trigger_file = '/data/pgsql-9.6/trigger.pg$ exit# systemctl enable postgresql-9.6# systemctl start postgresql-9.6
测试
在Master所在服务器查看状态:
$ psql -c "select application_name,state,sync_priority,sync_state from pg_stat_replication"application_name | state | sync_priority | sync_state------------------+-----------+---------------+------------ Bus-App-22 | streaming | 1 | sync(1 row)
测试复制功能
Keepalived-1.2
自行安装软件并测试使用
配置文件配置exm如下
# cat /etc/keepalived/keepalived.confglobal_defs { smtp_server 127.0.0.1 smtp_connect_timeout 30 router_id pg}vrrp_script chk_pgsql { script "/etc/keepalived/scripts/pgsql_check.sh" interval 1 weight 2}vrrp_instance VI_1 { state MASTER #备机为BACKUP interface enp2s0f0 virtual_router_id 61 priority 100 #备机为80 nopreempt advert_int 1 authentication { auth_type PASS auth_pass 1111 } track_script { chk_pgsql } virtual_ipaddress { 190.176.35.210 }}
测试vip的浮动
Keepalived脚本
脚本路径可自己在keeplived.conf中定义,注意脚本权限。
# cat /etc/keepalived/scripts/pgsql_check.sh#!/bin/bash#判断pg是否活着A=`ps -C postmaster --no-header | wc -l`#判断vip浮到哪里B=`ip a | grep 190.176.35.210 | wc -l`#判断是否是从库处于等待的状态C=`ps -ef | grep postgres | grep 'startup process' | wc -l`#判断从库链接主库是否正常D=`ps -ef | grep postgres | grep 'receiver' | wc -l`#判断主库连接从库是否正常E=`ps -ef | grep postgres | grep 'sender' | wc -l`#如果pg死了,将消息写入日记并且关闭keepalivedif [ $A -eq 0 ];then echo "\`date "+%Y-%m-%d--%H:%M:%S"\` postgresql stop so vip stop " >> /etc/keepalived/log/check_pg.log systemctl stop keepalivedelse #判断出主挂了,vip浮到了从,创建trigger,pg备机进入recovery模式,赋予读写权限 if [ $B -eq 1 -a $C -eq 1 -a $D -eq 0 \];then su - postgres -c "touch /data/pgsql-9.6/[trigger.pg](http://trigger.pg/)" echo "`date "+%Y-%m-%d--%H:%M:%S"` standby promote " >> /etc/keepalived/log/check_pg.log fi #判断出自己是主并且和从失去联系 if [ $B -eq 1 -a $C -eq 0 -a $D -eq 0 -a $E -eq 0 ];then sleep 10 echo "`date "+%Y-%m-%d--%H:%M:%S"` can't find standby " >> /etc/keepalived/log/check_pg.log fifi
以下命令为手动指定keepalived配置文件,并启动的命令
# /usr/local/keepalived/sbin/keepalived -D -f /usr/local/keepalived/etc/keepalived/keepalived.conf
异步主从流复制主从切换与恢复MASTER
以下红色标记的命令为流复制主从切换,与恢复MASTER的操作
模拟关闭主库,演练MASTER服务宕机
# systemcrl stop postgresql-9.6
SLAVE上操作
在从库创建trigger(此步骤部署于高可用脚本中,触发自动推读写数据库切换)
$ touch /data/pgsql-9.6/trigger.pg
旧备库recovery.conf变为recovery.done,旧备库成为新主库
恢复操作
MASTER上操作
$ mv /data/pgsql-9.6/recovery.done /data/pgsql-9.6/recovery.conf
如第一次触发HA,则将recovery.done拷贝到MASTER,并修改以下内容
primary_conninfo = 'host=新主库’s IP port=5432 user=postgres password=密码'
如第一次出发HA,则修改MASTER配置文件,MASTER现在为新备库,所以postgersql.conf中的max_connections应该大于新主库
$ vim /data/pgsql-9.6/postgresql.confhot_standby = on #检查是否开启max_connections = (每次恢复都必须修改新备机的这条配置,并比新主机的大)
启动新备库与新备库的keepalived
# systemctl restart postgresql-9.6
以上操作完后应该主备流复制MASTER为备receiver,SLAVE为主sender。使用命令# ps -ef | grep postgres可查看
以下一条操作VIP从SLAVE跳至MASTER
# systemctl restart keepalived# ip a #查看vip应跳回MASTER
SLAVE上操作
关闭新主库postgresql服务
# systemctl stop postgresql-9.6$ mv /data/pgsql-9.6/recovery.done /data/pgsql-9.6/recovery.conf$ vim /data/pgsql-9.6/postgresql.confmax_connections = (每次恢复都必须修改新备机的这条配置,并比新主机的大)# systemctl start postgresql-9.6# systemctl start keepalived
以上操作完后应该主备流复制MASTER为备sender,SLAVE为主receiver。使用命令# ps -ef | grep postgres可查看
在MASTER上查看流复制状态
$ psqlpostgres=# select client_addr,state,sync_state from pg_stat_replication; client_addr | state | sync_state----------------+-----------+------------ (主库’s IP) | streaming | async(1 row)
MASTER为发送端,不存在startup process并存在sender进程判定为主库
$ ps -ef | grep postgrespostgres 378 1 0 11:32 pts/0 00:00:00 /usr/pgsql-9.6/bin/postgres -D /data/pgsql-9.6postgres 379 378 0 11:32 ? 00:00:00 postgres: logger process postgres 381 378 0 11:32 ? 00:00:00 postgres: checkpointer process postgres 382 378 0 11:32 ? 00:00:00 postgres: writer process postgres 383 378 0 11:32 ? 00:00:00 postgres: wal writer process postgres 384 378 0 11:32 ? 00:00:00 postgres: autovacuum launcher process postgres 385 378 0 11:32 ? 00:00:00 postgres: archiver process postgres 387 378 0 11:32 ? 00:00:00 postgres: stats collector process postgres 464 378 0 11:47 ? 00:00:00 postgres: wal sender process postgres 192.168.40.185(50096) streaming 0/90038A0root 1293 1258 0 14:13 pts/1 00:00:00 su postgrespostgres 1294 1293 0 14:13 pts/1 00:00:00 bashpostgres 1417 1294 0 14:38 pts/1 00:00:00 ps -efpostgres 1418 1294 0 14:38 pts/1 00:00:00 grep postgres
SLAVE为接收端,存在startup process与wal receiver进程并存则判定为备库
$ ps -ef | grep postgresroot 1293 1258 0 14:13 pts/1 00:00:00 su postgrespostgres 1294 1293 0 14:13 pts/1 00:00:00 bashpostgres 1448 1 0 14:43 pts/1 00:00:00 /usr/pgsql-9.6/bin/postgres -D /data/pgsql-9.6postgres 1449 1448 0 14:43 ? 00:00:00 postgres: logger process postgres 1450 1448 0 14:43 ? 00:00:00 postgres: startup process recovering 00000003000000000000000Apostgres 1451 1448 0 14:43 ? 00:00:00 postgres: checkpointer process postgres 1452 1448 0 14:43 ? 00:00:00 postgres: writer process postgres 1453 1448 0 14:43 ? 00:00:00 postgres: stats collector process postgres 1454 1448 0 14:43 ? 00:00:00 postgres: wal receiver process streaming 0/A001060postgres 1457 1294 0 14:44 pts/1 00:00:00 ps -efpostgres 1458 1294 0 14:44 pts/1 00:00:00 grep postgres
测试新主库读写
测试数据同步功能
注意事项
- 修改默认data目录时,要注意新目录权限必须为0700,属主属组必须为postgres。
- 务必设置postgres用户的数据库密码:postgres=# alter user postgres with password ‘Ur passwd’;,否则配置主从流复制,备机pg_basebackup会出现认证错误。
- 注意pg_hba.conf 文件的权限问题。
- psql: FATAL: the database system is starting up.修改配置文件hot_standby = on。
- Keepalived脚本中ps -C postmaster,如果是使用postgres用户启动则是ps -C postgres,其他用户为postmaster。