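Preface

Environment

Cluster node planning

A Redis cluster needs at least 6 nodes in total: 3 Master nodes and 3 Slave nodes, with each Master paired with 1 Slave. The node count grows with the number of Slaves per Master:

- 1 Master -> 1 Slave: the cluster needs 6 nodes (as shown in the figure)
- 1 Master -> 2 Slaves: the cluster needs 9 nodes, and so on (as shown in the figure)

| Name   | IP            | Port |
|--------|---------------|------|
| Master | 192.168.1.109 | 7001 |
| Master | 192.168.1.109 | 7002 |
| Master | 192.168.1.109 | 7003 |
| Slave  | 192.168.1.109 | 7004 |
| Slave  | 192.168.1.109 | 7005 |
| Slave  | 192.168.1.109 | 7006 |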
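The sizing rule above is simple arithmetic: every Master brings its Slaves along. A minimal sketch follows; `cluster_node_count` is a hypothetical helper name, not part of Redis, and it also rejects fewer than 3 Masters, since that is the minimum stated above.

```shell
# Print the total number of nodes for a cluster with the given number of
# Masters ($1) and Slaves per Master ($2). Hypothetical helper for illustration.
cluster_node_count() {
    if [ "$1" -lt 3 ]; then
        echo "a Redis cluster needs at least 3 Masters" >&2
        return 1
    fi
    echo $(( $1 * (1 + $2) ))
}

cluster_node_count 3 1   # prints 6
cluster_node_count 3 2   # prints 9
```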
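Redis cluster features

Advantages of a Redis cluster

- Decentralized architecture that serves requests in a distributed way. Data is partitioned by slot and stored across multiple Redis instances.
- Slaves act as standby replicas for failover, so the cluster recovers quickly.
- Failover is automatic: nodes exchange state over a gossip protocol, and a voting mechanism promotes a Slave to Master.
- Nodes can be added or removed online, which lowers hardware and operations costs and improves scalability and availability.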
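To make "partitioned by slot" concrete: Redis Cluster maps a key to one of 16384 slots by taking the CRC16 of the key modulo 16384. The sketch below reimplements that calculation in plain shell for illustration only. The function names `crc16_xmodem` and `key_slot` are made up here, the code assumes ASCII keys, and it ignores the `{...}` hash-tag rule that real clients apply.

```shell
# CRC16, XMODEM variant (polynomial 0x1021, initial value 0), of an ASCII string.
crc16_xmodem() {
    rest=$1
    crc=0
    while [ -n "$rest" ]; do
        char=${rest%"${rest#?}"}          # first character of the remainder
        rest=${rest#?}                    # drop that character
        byte=$(printf '%d' "'$char")      # numeric value of the character
        crc=$(( crc ^ (byte << 8) ))
        bit=0
        while [ "$bit" -lt 8 ]; do
            if [ $(( crc & 0x8000 )) -ne 0 ]; then
                crc=$(( ((crc << 1) ^ 0x1021) & 0xFFFF ))
            else
                crc=$(( (crc << 1) & 0xFFFF ))
            fi
            bit=$(( bit + 1 ))
        done
    done
    echo "$crc"
}

# Slot of a key: CRC16(key) mod 16384.
key_slot() {
    echo $(( $(crc16_xmodem "$1") % 16384 ))
}

key_slot foo   # prints 12182
```

The value for `foo` agrees with the slot that the cluster itself reports in the redirect message during the cluster test later in this post. On a live node, `CLUSTER KEYSLOT foo` returns the same number without any client-side arithmetic.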
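Disadvantages of a Redis cluster

- Clients are complex to implement: the driver must be a Smart Client that caches the slot mapping and refreshes it promptly. At the time of writing only JedisCluster is relatively mature, and its exception handling is still incomplete. Immature clients hurt application stability and raise development effort.
- A node that blocks for longer than `cluster-node-timeout` is judged to be offline, which triggers a failover that was not actually necessary. Sentinel mode has the same switching problem.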
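Building the Redis cluster

System initialization

Raise the listen backlog limit, allow memory overcommit, and disable transparent huge pages, then reboot so that the settings take effect:

    # echo "net.core.somaxconn = 1024" >> /etc/sysctl.conf
    # echo "vm.overcommit_memory = 1" >> /etc/sysctl.conf
    # echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.local
    # reboot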
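Because the commands above append with `>>`, running them a second time leaves duplicate lines in `/etc/sysctl.conf`. One way around that is sketched below, assuming a hypothetical helper named `ensure_line` that appends a line only when the file does not already contain it.

```shell
# Append a line ($1) to a file ($2) unless that exact line is already present.
# Hypothetical helper for illustration.
ensure_line() {
    grep -qxF -- "$1" "$2" 2>/dev/null || printf '%s\n' "$1" >> "$2"
}

# Intended usage, as root:
# ensure_line "net.core.somaxconn = 1024" /etc/sysctl.conf
# ensure_line "vm.overcommit_memory = 1" /etc/sysctl.conf
```

`grep -x` matches whole lines only and `-F` treats the text literally, so the dots in the setting name are not interpreted as regular-expression wildcards.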
Create the Redis user

    # groupadd redis
    # useradd -g redis redis -s /bin/false
Compiling and installing Redis

Every Redis release can be downloaded from the official website; this guide uses version 6.0.6.

    # apt-get install -y build-essential tcl pkg-config
    # wget http://download.redis.io/releases/redis-6.0.6.tar.gz
    # tar -xvf redis-6.0.6.tar.gz
    # cd redis-6.0.6
    # make
    # make install PREFIX=/usr/local/redis

    # ln -s /usr/local/redis/bin/redis-benchmark /usr/local/bin/redis-benchmark
    # ln -s /usr/local/redis/bin/redis-check-aof /usr/local/bin/redis-check-aof
    # ln -s /usr/local/redis/bin/redis-check-rdb /usr/local/bin/redis-check-rdb
    # ln -s /usr/local/redis/bin/redis-sentinel /usr/local/bin/redis-sentinel
    # ln -s /usr/local/redis/bin/redis-server /usr/local/bin/redis-server
    # ln -s /usr/local/redis/bin/redis-cli /usr/local/bin/redis-cli

    # cp redis.conf /usr/local/redis
    # mkdir -p /var/log/redis
    # chown -R redis:redis /var/log/redis
    # chown -R redis:redis /usr/local/redis
Edit the base Redis configuration. Several file names in it include the port number, so that the nodes can later be told apart by port.

    # vim /usr/local/redis/redis.conf

Set the following options (the `bind` line is left commented out):

    io-threads 2
    daemonize yes
    # bind 127.0.0.1
    protected-mode no
    masterauth 123456
    requirepass 123456
    dbfilename dump_6379.rdb
    pidfile /var/run/redis_6379.pid
    cluster-config-file nodes_6379.conf
    appendfilename "appendonly_6379.aof"
    logfile "/var/log/redis/redis_6379.log"
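Verify that Redis was installed successfully. The `redis` user was created with `/bin/false` as its login shell, so pass `-s /bin/bash` to `su`; a plain `su redis` would exit immediately.

    # su -s /bin/bash redis
    $ cd /usr/local/redis
    $ ./bin/redis-server redis.conf
    $ ps -aux | grep redis
    $ more /var/log/redis/redis_6379.log
    $ ./bin/redis-cli
    127.0.0.1:6379> auth 123456
    127.0.0.1:6379> shutdown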
Setting up the cluster

Create an installation directory for each cluster node and change every port-related setting (for example `port`, `pidfile`, `dbfilename`, `logfile`, `cluster-config-file`). Enable cluster support at the same time.

    # mkdir -p /usr/local/redis-cluster
    # cp -r /usr/local/redis /usr/local/redis-cluster/redis-7001
    # cp -r /usr/local/redis /usr/local/redis-cluster/redis-7002
    # cp -r /usr/local/redis /usr/local/redis-cluster/redis-7003
    # cp -r /usr/local/redis /usr/local/redis-cluster/redis-7004
    # cp -r /usr/local/redis /usr/local/redis-cluster/redis-7005
    # cp -r /usr/local/redis /usr/local/redis-cluster/redis-7006

    # sed -i "s/6379/7001/g" /usr/local/redis-cluster/redis-7001/redis.conf
    # sed -i "s/6379/7002/g" /usr/local/redis-cluster/redis-7002/redis.conf
    # sed -i "s/6379/7003/g" /usr/local/redis-cluster/redis-7003/redis.conf
    # sed -i "s/6379/7004/g" /usr/local/redis-cluster/redis-7004/redis.conf
    # sed -i "s/6379/7005/g" /usr/local/redis-cluster/redis-7005/redis.conf
    # sed -i "s/6379/7006/g" /usr/local/redis-cluster/redis-7006/redis.conf

    # sed -i "s/# cluster-enabled/cluster-enabled/g" $(find /usr/local/redis-cluster -type f -name "redis.conf")
    # sed -i "s/# cluster-config-file/cluster-config-file/g" $(find /usr/local/redis-cluster -type f -name "redis.conf")
    # sed -i "s/# cluster-node-timeout/cluster-node-timeout/g" $(find /usr/local/redis-cluster -type f -name "redis.conf")

    # chown -R redis:redis /usr/local/redis-cluster
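The per-node `cp` and `sed` lines follow one pattern, so they can be folded into a loop. The sketch below assumes a hypothetical helper, `render_node_conf`, that prints a rewritten copy of the base configuration instead of editing files in place, which makes the result easy to inspect before it is written.

```shell
# Print a copy of a base redis.conf ($1) with every occurrence of the default
# port 6379 replaced by the node port ($2) and the cluster options uncommented.
# Hypothetical helper for illustration.
render_node_conf() {
    sed -e "s/6379/$2/g" \
        -e 's/^# *cluster-enabled/cluster-enabled/' \
        -e 's/^# *cluster-config-file/cluster-config-file/' \
        -e 's/^# *cluster-node-timeout/cluster-node-timeout/' \
        "$1"
}

# Intended usage, as root, with the directory layout used in this post:
# for port in 7001 7002 7003 7004 7005 7006; do
#     cp -r /usr/local/redis "/usr/local/redis-cluster/redis-$port"
#     render_node_conf /usr/local/redis/redis.conf "$port" \
#         > "/usr/local/redis-cluster/redis-$port/redis.conf"
# done
```

As with the original commands, the blanket replacement of `6379` also rewrites any other occurrence of that number in the file, so it is worth reviewing the output once.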
Copy the Redis cluster management tool:

    # cd redis-6.0.6
    # cp src/redis-trib.rb /usr/local/redis-cluster
    # chown -R redis:redis /usr/local/redis-cluster/redis-trib.rb
Create a shell script that starts all cluster nodes in one go:

    # vim /usr/local/redis-cluster/start-cluster.sh

The script changes into each node directory in turn and starts that node with its own `redis.conf`:

    #!/bin/bash
    REDIS_CLUSTER_HOME=/usr/local/redis-cluster
    cd $REDIS_CLUSTER_HOME

    cd redis-7001
    ./bin/redis-server redis.conf
    cd ..
    cd redis-7002
    ./bin/redis-server redis.conf
    cd ..
    cd redis-7003
    ./bin/redis-server redis.conf
    cd ..
    cd redis-7004
    ./bin/redis-server redis.conf
    cd ..
    cd redis-7005
    ./bin/redis-server redis.conf
    cd ..
    cd redis-7006
    ./bin/redis-server redis.conf
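The start script repeats the same three lines for every node. A loop-based variant might look like the sketch below. `start_commands` is a hypothetical helper that only prints the commands (a dry run), so they can be reviewed first; the directory layout is the one assumed throughout this post.

```shell
# Print one start command per node. $1 is the cluster home directory and the
# remaining arguments are node ports. Hypothetical helper for illustration.
start_commands() {
    home=$1
    shift
    for port in "$@"; do
        printf '(cd %s/redis-%s && ./bin/redis-server redis.conf)\n' "$home" "$port"
    done
}

start_commands /usr/local/redis-cluster 7001 7002 7003 7004 7005 7006
```

Each printed line runs in its own subshell, so a failed `cd` for one node does not affect the next. Piping the output to `sh` would actually start the nodes.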
Make the shell script executable:

    # chmod +x /usr/local/redis-cluster/start-cluster.sh
    # chown -R redis:redis /usr/local/redis-cluster/start-cluster.sh
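Setting a cluster password

If the cluster nodes need a password, set both `requirepass` and `masterauth`, and give them the same value; otherwise authentication fails when a Master/Slave switch happens. Note that the password is applied differently depending on whether the cluster is built with `redis-trib.rb` or with `redis-cli`:

`redis-trib.rb`: do not configure a password before the cluster is built. Once it is built, run the following commands on each node in turn; no restart is needed.

    $ redis-cli -c -p 7001
    127.0.0.1:7001> config set masterauth 123456
    127.0.0.1:7001> config set requirepass 123456
    127.0.0.1:7001> config rewrite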
`redis-cli`: first configure the password, both `requirepass` and `masterauth`, in each node's `redis.conf`, then add the `-a password` option to the cluster creation command, where `password` is the password of the cluster nodes.

    masterauth 123456
    requirepass 123456

Then create the cluster:

    $ redis-cli -a 123456 --cluster create \
        192.168.1.109:7001 \
        192.168.1.109:7002 \
        192.168.1.109:7003 \
        192.168.1.109:7004 \
        192.168.1.109:7005 \
        192.168.1.109:7006 \
        --cluster-replicas 1
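The node list in the creation command is just one host combined with a run of ports, so it can be generated rather than typed. In the sketch below, `cluster_create_args` is a hypothetical helper that prints the arguments for `redis-cli`. The usage comment also assumes that your `redis-cli` build reads the password from the `REDISCLI_AUTH` environment variable, which keeps it out of the command line; check this against your version before relying on it.

```shell
# Print the redis-cli arguments that create a cluster from one host ($1),
# a Slave count per Master ($2), and a list of ports (remaining arguments).
# Hypothetical helper for illustration.
cluster_create_args() {
    host=$1
    replicas=$2
    shift 2
    args="--cluster create"
    for port in "$@"; do
        args="$args $host:$port"
    done
    echo "$args --cluster-replicas $replicas"
}

cluster_create_args 192.168.1.109 1 7001 7002 7003 7004 7005 7006

# Intended usage (the substitution is deliberately unquoted so that it splits
# into separate arguments):
# REDISCLI_AUTH=123456 redis-cli $(cluster_create_args 192.168.1.109 1 7001 7002 7003 7004 7005 7006)
```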
Building and starting the cluster

First run the shell script to start all Redis nodes. Never start Redis as root, because that creates a serious security risk. The `redis` user has `/bin/false` as its login shell, so switch to it with `su -s /bin/bash`.

    # su -s /bin/bash redis
    $ /usr/local/redis-cluster/start-cluster.sh
    $ ps -aux | grep redis
    redis 32641 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7001 [cluster]
    redis 32649 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7002 [cluster]
    redis 32657 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7003 [cluster]
    redis 20814 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7004 [cluster]
    redis 20822 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7005 [cluster]
    redis 20830 0.0 0.0 181880 7688 ? Ssl 21:33 0:00 ./bin/redis-server *:7006 [cluster]
If you build the cluster with `redis-trib.rb` under 6.0.6, the tool prints a notice telling you to use `redis-cli` instead and shows the equivalent command, so Ruby no longer has to be installed separately. The original `redis-trib.rb` invocation was:

    $ ./redis-trib.rb create --replicas 1 \
        192.168.1.109:7001 \
        192.168.1.109:7002 \
        192.168.1.109:7003 \
        192.168.1.109:7004 \
        192.168.1.109:7005 \
        192.168.1.109:7006
The suggested `redis-cli` equivalent is shown below. Building the cluster with `redis-cli` is recommended because it needs no extra Ruby installation.

    $ redis-cli -a 123456 --cluster create \
        192.168.1.109:7001 \
        192.168.1.109:7002 \
        192.168.1.109:7003 \
        192.168.1.109:7004 \
        192.168.1.109:7005 \
        192.168.1.109:7006 \
        --cluster-replicas 1
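The two commands are nearly identical in form and meaning. `--cluster-replicas 1` sets the Slave-to-Master ratio to 1, that is, one Slave per Master. By default the first three `ip:port` entries are treated as Masters and the remaining ones become their Slaves in order, although the final assignment is whatever `redis-cli` prints for confirmation. When a production cluster spans several servers, a Master and its Slave must not be deployed on the same server: high availability means the cluster keeps working when any single server fails, and a Slave on the same machine as its Master is lost together with it. Stagger the pairs across servers so that when a Master goes down, its Slave can take over and the cluster keeps running.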
After you run the creation command (it only needs to run once), Redis lists the proposed Master/Slave layout and asks for confirmation. Type `yes` to finish creating the cluster.

    >>> Performing hash slots allocation on 6 nodes...
    Master[0] -> Slots 0 - 5460
    Master[1] -> Slots 5461 - 10922
    Master[2] -> Slots 10923 - 16383
    Adding replica 192.168.1.109:7006 to 192.168.1.109:7001
    Adding replica 192.168.1.109:7003 to 192.168.1.109:7004
    Adding replica 192.168.1.109:7005 to 192.168.1.109:7002
    M: 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001
       slots:[0-5460] (5461 slots) master
    M: 283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002
       slots:[10923-16383] (5461 slots) master
    M: 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004
       slots:[5461-10922] (5462 slots) master
    S: cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003
       replicates 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4
    S: 1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005
       replicates 283abb498445ffd6206f24c451ac0b9fb7129383
    S: f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006
       replicates 225e37e5bb340467fb58b6f9d14cfb1893bf92d5
    Can I set the above configuration? (type 'yes' to accept):
    >>> Nodes configuration updated
    >>> Assign a different config epoch to each node
    >>> Sending CLUSTER MEET messages to join the cluster
    Waiting for the cluster to join
    .
    >>> Performing Cluster Check (using node 192.168.1.109:7001)
    M: 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001
       slots:[0-5460] (5461 slots) master
       1 additional replica(s)
    M: 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004
       slots:[5461-10922] (5462 slots) master
       1 additional replica(s)
    S: f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006
       slots: (0 slots) slave
       replicates 225e37e5bb340467fb58b6f9d14cfb1893bf92d5
    S: 1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005
       slots: (0 slots) slave
       replicates 283abb498445ffd6206f24c451ac0b9fb7129383
    M: 283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002
       slots:[10923-16383] (5461 slots) master
       1 additional replica(s)
    S: cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003
       slots: (0 slots) slave
       replicates 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
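Testing the cluster

Log in to one of the cluster nodes with the Redis client; the password has to be supplied at login. In the session below the key hashes to slot [12182], which is served by `192.168.1.109:7002`, so the client is redirected to `192.168.1.109:7002`, and the value that was just written is then read back from that node.

    $ redis-cli -c -p 7001 -a 123456
    127.0.0.1:7001> set foo hello
    -> Redirected to slot [12182] located at 192.168.1.109:7002
    OK
    192.168.1.109:7002> get foo
    "hello"
    192.168.1.109:7002>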
View the current cluster information:

    $ redis-cli -c -p 7001 -a 123456
    127.0.0.1:7001> cluster info
    cluster_state:ok
    cluster_slots_assigned:16384
    cluster_slots_ok:16384
    cluster_slots_pfail:0
    cluster_slots_fail:0
    cluster_known_nodes:6
    cluster_size:3
    cluster_current_epoch:7
    cluster_my_epoch:1
    cluster_stats_messages_ping_sent:3154
    cluster_stats_messages_pong_sent:3377
    cluster_stats_messages_fail_sent:4
    cluster_stats_messages_auth-ack_sent:1
    cluster_stats_messages_sent:6536
    cluster_stats_messages_ping_received:3372
    cluster_stats_messages_pong_received:3154
    cluster_stats_messages_meet_received:5
    cluster_stats_messages_auth-req_received:1
    cluster_stats_messages_received:6532
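For scripted health checks, the two fields that matter most in this output are `cluster_state` and `cluster_slots_assigned`. The sketch below assumes a hypothetical helper, `cluster_is_healthy`, that reads `cluster info` output on standard input and succeeds only when the state is `ok` and all 16384 slots are assigned.

```shell
# Succeed when CLUSTER INFO output on stdin reports state "ok" and all
# 16384 slots assigned. Hypothetical helper for illustration.
cluster_is_healthy() {
    awk -F: '
        { sub(/\r$/, "", $2) }     # the reply may use CRLF line endings
        $1 == "cluster_state"          { state = $2 }
        $1 == "cluster_slots_assigned" { assigned = $2 }
        END { exit !(state == "ok" && assigned == "16384") }
    '
}

# Intended usage:
# redis-cli -p 7001 -a 123456 cluster info | cluster_is_healthy && echo healthy
```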
Check the state of a specific node. In this capture `192.168.1.109:7003` is a Master and `192.168.1.109:7004` is its Slave, which suggests it was taken after the failover test described later.

    $ redis-cli --cluster check 192.168.1.109:7003 -a 123456
    Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
    192.168.1.109:7003 (cde86683...) -> 0 keys | 5462 slots | 1 slaves.
    192.168.1.109:7002 (283abb49...) -> 1 keys | 5461 slots | 1 slaves.
    192.168.1.109:7001 (225e37e5...) -> 0 keys | 5461 slots | 1 slaves.
    [OK] 1 keys in 3 masters.
    0.00 keys per slot on average.
    >>> Performing Cluster Check (using node 192.168.1.109:7003)
    M: cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003
       slots:[5461-10922] (5462 slots) master
       1 additional replica(s)
    S: 1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005
       slots: (0 slots) slave
       replicates 283abb498445ffd6206f24c451ac0b9fb7129383
    S: 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004
       slots: (0 slots) slave
       replicates cde86683e2d314fd52cf8708f78935c6648ea3c6
    M: 283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002
       slots:[10923-16383] (5461 slots) master
       1 additional replica(s)
    S: f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006
       slots: (0 slots) slave
       replicates 225e37e5bb340467fb58b6f9d14cfb1893bf92d5
    M: 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001
       slots:[0-5460] (5461 slots) master
       1 additional replica(s)
    [OK] All nodes agree about slots configuration.
    >>> Check for open slots...
    >>> Check slots coverage...
    [OK] All 16384 slots covered.
View information about all cluster nodes:

    $ redis-cli -c -p 7001 -a 123456
    127.0.0.1:7001> cluster nodes
    7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004@17004 master - 0 1616460018217 3 connected 5461-10922
    225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001@17001 myself,master - 0 1616460015000 1 connected 0-5460
    f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006@17006 slave 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 0 1616460018000 1 connected
    1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005@17005 slave 283abb498445ffd6206f24c451ac0b9fb7129383 0 1616460016000 2 connected
    283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002@17002 master - 0 1616460016000 2 connected 10923-16383
    cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003@17003 slave 7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 0 1616460017000 3 connected
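In the `cluster nodes` output each line starts with the node ID, followed by its address, its flags, and (for a Slave) the ID of its Master. Matching those IDs by eye is tedious, so the sketch below resolves them into addresses. `replica_map` is a hypothetical helper that reads the output on standard input.

```shell
# Print "slave_address -> master_address" for every Slave listed in
# CLUSTER NODES output on stdin. Hypothetical helper for illustration.
replica_map() {
    awk '
        {
            split($2, parts, "@")          # "ip:port@cport" -> "ip:port"
            addr[$1] = parts[1]
            if ($3 ~ /slave/) { replica[++n] = $1; master_of[$1] = $4 }
        }
        END {
            for (i = 1; i <= n; i++)
                print addr[replica[i]] " -> " addr[master_of[replica[i]]]
        }
    '
}

# Intended usage:
# redis-cli -p 7001 -a 123456 cluster nodes | replica_map
```

The mapping is printed in the `END` block because a Slave line can appear before the line of its Master, as it does for 7006 and 7001 in the listing above.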
Verify Master/Slave failover. The cluster information above shows that `192.168.1.109:7003` is the Slave of `192.168.1.109:7004`. Kill the process of the `192.168.1.109:7004` Master, then watch whether `192.168.1.109:7003` is elected as the new Master; if it is, failover works. The log of `192.168.1.109:7003` then looks like this:

    11970:S 21 Jul 2020 22:48:40.080 * Connecting to MASTER 192.168.1.109:7004
    11970:S 21 Jul 2020 22:48:40.080 * MASTER <-> REPLICA sync started
    11970:S 21 Jul 2020 22:48:40.081 # Error condition on socket for SYNC: Operation now in progress
    11970:S 21 Jul 2020 22:48:40.982 # Starting a failover election for epoch 7.
    11970:S 21 Jul 2020 22:48:40.985 # Failover election won: I'm the new master.
    11970:S 21 Jul 2020 22:48:40.985 # configEpoch set to 7 after successful failover
    11970:M 21 Jul 2020 22:48:40.985 * Discarding previously cached master state.
    11970:M 21 Jul 2020 22:48:40.985 # Setting secondary replication ID to 00c7b21f3980b471d3373792d9d61bedf7e424e6, valid up to offset: 2059. New replication ID is c9f299ab0a8124a56d76e0e8a458135893b45336
    11970:M 21 Jul 2020 22:48:40.985 # Cluster state changed: ok
Finally, restart the `192.168.1.109:7004` node. It rejoins the cluster as a Slave of `192.168.1.109:7003`:

    7a1229732ada6ae8d8eb51ae7b7cac6242a6f8d4 192.168.1.109:7004@17004 slave cde86683e2d314fd52cf8708f78935c6648ea3c6 0 1616461490000 7 connected
    225e37e5bb340467fb58b6f9d14cfb1893bf92d5 192.168.1.109:7001@17001 myself,master - 0 1616461492000 1 connected 0-5460
    f8a5d94e9928ed615514f23ddaabd259134af709 192.168.1.109:7006@17006 slave 225e37e5bb340467fb58b6f9d14cfb1893bf92d5 0 1616461492000 1 connected
    1f3f441d619ceeac55ae91015a3f46ede37352bb 192.168.1.109:7005@17005 slave 283abb498445ffd6206f24c451ac0b9fb7129383 0 1616461492010 2 connected
    283abb498445ffd6206f24c451ac0b9fb7129383 192.168.1.109:7002@17002 master - 0 1616461491000 2 connected 10923-16383
    cde86683e2d314fd52cf8708f78935c6648ea3c6 192.168.1.109:7003@17003 master - 0 1616461493010 7 connected 5461-10922
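To check the outcome of a failover from a script rather than by reading the whole listing, it is enough to look up the flags column for one address. A minimal sketch, assuming a hypothetical helper named `node_flags` that reads `cluster nodes` output on standard input:

```shell
# Print the flags (for example "master" or "slave") of the node whose
# "ip:port" address is $1, reading CLUSTER NODES output from stdin.
# Hypothetical helper for illustration.
node_flags() {
    awk -v target="$1" '
        { split($2, parts, "@"); if (parts[1] == target) print $3 }
    '
}

# Intended usage:
# redis-cli -p 7001 -a 123456 cluster nodes | node_flags 192.168.1.109:7003
```

With the listing above as input, `node_flags 192.168.1.109:7003` yields `master` and `node_flags 192.168.1.109:7004` yields `slave`.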
Rebuilding (reinitializing) the cluster

If the cluster stops working properly, you can try rebuilding it with the steps below. They delete all of Redis's RDB snapshot data, so be sure to back up your data first. The steps are: stop every Redis process, delete the RDB snapshots, the node configuration files, and the logs, restart the nodes, recreate the cluster, and check its state.

    $ pkill -9 redis

    $ find /usr/local/redis-cluster -type f -iname "dump*.rdb" | xargs rm -rf
    $ find /usr/local/redis-cluster -type f -iname "nodes_*.conf" | xargs rm -rf
    $ rm -rf /var/log/redis/*

    $ /usr/local/redis-cluster/start-cluster.sh

    $ redis-cli -a 123456 --cluster create \
        192.168.1.109:7001 \
        192.168.1.109:7002 \
        192.168.1.109:7003 \
        192.168.1.109:7004 \
        192.168.1.109:7005 \
        192.168.1.109:7006 \
        --cluster-replicas 1

    $ redis-cli -c -p 7001 -a 123456
    127.0.0.1:7001> cluster info
    127.0.0.1:7001> cluster nodes
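Since the rebuild wipes every snapshot, the backup step deserves its own command. The sketch below assumes a hypothetical helper, `backup_rdb_files`, that copies each `dump*.rdb` file into a backup directory before anything is deleted. The backup directory should sit outside the cluster directory so that the later `find ... | xargs rm` does not remove the copies.

```shell
# Copy every dump*.rdb file found under $1 into the backup directory $2.
# Each copy is prefixed with the name of the node directory it came from.
# Hypothetical helper for illustration.
backup_rdb_files() {
    src_dir=$1
    backup_dir=$2
    mkdir -p "$backup_dir" || return 1
    find "$src_dir" -type f -name 'dump*.rdb' | while IFS= read -r rdb; do
        node=$(basename "$(dirname "$rdb")")
        cp -p "$rdb" "$backup_dir/$node-$(basename "$rdb")" || exit 1
    done
}

# Intended usage, before running the rebuild steps (the target path is only
# an example):
# backup_rdb_files /usr/local/redis-cluster /var/backups/redis-rdb
```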