This commit adds comprehensive high availability, disaster recovery,
and automation capabilities for enterprise-grade deployment.
High Availability Features:
- Keepalived integration for Virtual IP (38.14.254.100)
- Automatic failover monitoring and recovery
- PostgreSQL streaming replication support
- Health check scripts with auto-restart
- State change notifications
Disaster Recovery:
- Complete system backup script (database, configs, Docker volumes)
- Automated backup with retention policies
- Recovery manifest with step-by-step instructions
- Off-site backup support (S3, rsync ready)
Automation Tools:
- auto-deploy-server.sh - Deploy to remote server from local
- auto-deploy-server.bat - Windows version with WSL/Git Bash support
- deploy-oneclick.sh - One-click deployment on fresh server
- docker-compose-full.yml - Complete containerized stack
Container Orchestration:
- Full Docker Compose setup with all services
- Service dependencies and health checks
- Persistent volumes for data
- Network isolation with dedicated network
- Production-ready configuration
Deployment Automation:
- Automated dependency installation
- Database initialization with tables and indexes
- Monitoring stack auto-deployment
- Service auto-start via systemd
- Firewall auto-configuration
- Cron job automation
New Services:
- moltbot-failover.service - Auto-recovery monitor
- moltbot-metrics.service - Metrics exporter (9101)
- moltbot-log-analyzer.service - Log aggregation (9102)
- keepalived.service - VIP management
Documentation:
- HIGH-AVAILABILITY.md - Complete HA and automation guide
Architecture Improvements:
- Virtual IP for transparent failover
- Health-based service routing
- Automated disaster recovery backups
- Zero-touch server deployment
- Complete container orchestration support
Service Ports:
- Database API: 18800
- Metrics Exporter: 9101
- Log Analyzer: 9102
- Virtual IP: 38.14.254.100
🤖 Generated with Claude Code
Co-Authored-By: Claude <noreply@anthropic.com>
🏗️ Moltbot High Availability and Automation Guide
Version: v2.2, last updated: 2026-01-29
📋 High Availability (HA) Architecture
Architecture Overview
          ┌───────────────────┐
          │    Virtual IP     │
          │  (38.14.254.100)  │
          └─────────┬─────────┘
                    │
       ┌────────────┴────────────┐
       │                         │
┌──────▼──────┐           ┌──────▼──────┐
│   Master    │           │   Backup    │
│   Server    │           │   Server    │
│             │           │             │
│   Gateway   │           │   Gateway   │
│ PostgreSQL  │           │ PostgreSQL  │
│ Monitoring  │           │ Monitoring  │
└─────────────┘           └─────────────┘
       │                         │
       └────────────┬────────────┘
                    │
       ┌────────────▼───────────┐
       │     Shared Storage     │
       │       (Optional)       │
       └────────────────────────┘
🚀 Quick Start
One-Click Deployment on a New Server
Run on a fresh server:
# Method 1: using curl
curl -fsSL https://raw.githubusercontent.com/flowerjunjie/moltbot/main/deploy-oneclick.sh | bash
# Method 2: using git
git clone https://github.com/flowerjunjie/moltbot.git /opt/moltbot
cd /opt/moltbot
bash deploy-oneclick.sh
Remote Server Deployment
Deploy from your local machine to a remote server:
# Linux/Mac
bash auto-deploy-server.sh root@192.168.1.100
# Windows
auto-deploy-server.bat root@192.168.1.100
🔧 High Availability Components
1. Keepalived (Virtual IP)
Purpose: automatic failover and virtual IP management
Install:
apt-get install keepalived
Configuration file: /etc/keepalived/keepalived.conf
vrrp_script chk_moltbot_gateway {
script "curl -f http://localhost:18789 || exit 1"
interval 2
weight 2
}
vrrp_instance VI_MOLTBOT {
state MASTER
interface eth0
virtual_router_id 51
priority 100
advert_int 1
authentication {
auth_type PASS
auth_pass moltbot2024
}
virtual_ipaddress {
38.14.254.100/24
}
track_script {
chk_moltbot_gateway
}
}
Check status:
systemctl status keepalived
ip addr show eth0 | grep 38.14.254.100
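The status check above can be wrapped in a small helper that reports whether the current node holds the virtual IP. This is a minimal sketch: the function names `holds_vip` and `vip_role` and the `VIP`/`IFACE` variables are hypothetical, with defaults taken from the Keepalived configuration shown earlier.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report whether this node currently holds the virtual IP.
# VIP and IFACE default to the values used in the Keepalived configuration.
VIP="${VIP:-38.14.254.100}"
IFACE="${IFACE:-eth0}"

# Reads `ip -o -4 addr show` style output on stdin; succeeds if the VIP is listed.
holds_vip() {
  local vip="$1"
  # Fixed-string match on "inet <vip>/" so 38.14.254.10 never matches 38.14.254.100.
  grep -qF "inet ${vip}/"
}

# Prints the role implied by VIP ownership on this node.
vip_role() {
  if ip -o -4 addr show dev "$IFACE" | holds_vip "$VIP"; then
    echo "MASTER (holds $VIP)"
  else
    echo "BACKUP (does not hold $VIP)"
  fi
}
```

Running `vip_role` on both nodes should show exactly one MASTER; two MASTER lines would suggest the nodes cannot see each other's VRRP advertisements.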
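2. Automatic Failover
Script: /usr/local/bin/moltbot-failover.sh
Features:
- Health check (every 10 seconds)
- Automatic restart of failed services
- Failure counting with a threshold
- Logging
Service: moltbot-failover.service
Enable: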
systemctl enable moltbot-failover
systemctl start moltbot-failover
View logs:
journalctl -u moltbot-failover -f
cat /var/log/moltbot-failover.log
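The behaviour listed for the failover monitor (periodic health check, failure counter, threshold, restart, logging) could be sketched as follows. This is not the contents of moltbot-failover.sh. The service name, health URL, threshold of 3, and all function names are assumptions chosen for illustration; only the 10-second interval and the log path come from this guide.

```bash
#!/usr/bin/env bash
# Sketch of a failover monitor loop. Names and defaults are assumptions.
SERVICE="${SERVICE:-moltbot-gateway}"
HEALTH_URL="${HEALTH_URL:-http://localhost:18789}"
INTERVAL="${INTERVAL:-10}"     # seconds between checks (the guide states 10)
THRESHOLD="${THRESHOLD:-3}"    # assumed: consecutive failures before a restart
LOG_FILE="${LOG_FILE:-/var/log/moltbot-failover.log}"

failures=0

# Append a timestamped line to the log file.
log() { printf '%s %s\n' "$(date '+%F %T')" "$*" >> "$LOG_FILE"; }

# Succeeds when the health URL answers without an HTTP error.
check_health() { curl -fsS --max-time 5 -o /dev/null "$HEALTH_URL"; }

restart_service() { systemctl restart "$SERVICE"; }

# One monitoring step: reset the counter on success, count the failure
# otherwise, and restart the service once the threshold is reached.
monitor_step() {
  if check_health; then
    failures=0
    return 0
  fi
  failures=$((failures + 1))
  log "health check failed for $SERVICE ($failures/$THRESHOLD)"
  if [ "$failures" -ge "$THRESHOLD" ]; then
    log "threshold reached, restarting $SERVICE"
    restart_service || log "restart of $SERVICE failed"
    failures=0
  fi
  return 1
}

# Run the loop only when asked to, so the functions can be sourced and tested.
if [ "${1:-}" = "--loop" ]; then
  while true; do
    monitor_step || true
    sleep "$INTERVAL"
  done
fi
```

Counting consecutive failures rather than restarting on the first failed check avoids restart storms caused by a single slow response.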
3. PostgreSQL Streaming Replication
Configuration: /etc/postgresql/14/main/conf.d/replication.conf
Set up the primary server:
-- Create the replication user
CREATE USER replicator WITH REPLICATION ENCRYPTED PASSWORD 'replicator_pass';
-- Create the replication slot
SELECT * FROM pg_create_physical_replication_slot('replica_slot');
Set up the standby server:
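# On the standby server; -R creates standby.signal and writes primary_conninfo (PostgreSQL 12+)
pg_basebackup -h master-server -D /var/lib/postgresql/data -P -U replicator --wal-method=stream -R
# recovery.conf and standby_mode were removed in PostgreSQL 12; set these in postgresql.conf instead
primary_conninfo = 'host=master-server port=5432 user=replicator'
primary_slot_name = 'replica_slot'
restore_command = 'cp /var/lib/postgresql/archive/%f %p'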
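Once replication is set up, it helps to confirm which role each node actually has. The sketch below relies on the built-in `pg_is_in_recovery()` function, which returns true on a standby and false on a primary. The `pg_role` function and the `PSQL` override variable are hypothetical names introduced here.

```bash
#!/usr/bin/env bash
# Sketch: report whether the local PostgreSQL instance is a primary or a standby.
# PSQL is an assumed override hook; by default the real psql client is used.
PSQL="${PSQL:-psql}"

pg_role() {
  local in_recovery
  # -A (unaligned) and -t (tuples only) make psql print just "t" or "f".
  in_recovery="$("$PSQL" -At -c 'SELECT pg_is_in_recovery();')" || return 2
  case "$in_recovery" in
    t) echo "standby" ;;
    f) echo "primary" ;;
    *) return 2 ;;
  esac
}
```

On the primary, `psql -c "SELECT client_addr, state FROM pg_stat_replication;"` should additionally list the standby with state `streaming`.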
4. Disaster Recovery Backup
Script: /usr/local/bin/moltbot-dr-backup.sh
Backup contents:
- Full PostgreSQL dump
- Configuration files
- Docker volume data
- System package list
- Firewall rules
Run a backup:
/usr/local/bin/moltbot-dr-backup.sh
Backup location: /opt/moltbot-backup/disaster-recovery/
Automatic backup: every Sunday at 3:00 AM
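The weekly schedule and a retention policy could be expressed as in the sketch below. The 30-day retention period, the `*.gz` file naming, and the `prune_backups` function are assumptions. This guide only states the backup directory and the Sunday 03:00 schedule.

```bash
#!/usr/bin/env bash
# Sketch: prune old disaster-recovery archives.
# RETENTION_DAYS and the *.gz naming are assumptions; only the directory comes from the guide.
BACKUP_DIR="${BACKUP_DIR:-/opt/moltbot-backup/disaster-recovery}"
RETENTION_DAYS="${RETENTION_DAYS:-30}"

# Delete archives in the given directory that are older than the given number of days.
prune_backups() {
  local dir="$1" days="$2"
  # -mtime +N matches files whose age in whole days is greater than N.
  find "$dir" -maxdepth 1 -type f -name '*.gz' -mtime "+$days" -print -delete
}

# Example crontab entry for "every Sunday at 03:00" (assumed to live in root's crontab):
# 0 3 * * 0 /usr/local/bin/moltbot-dr-backup.sh
```

Calling `prune_backups "$BACKUP_DIR" "$RETENTION_DAYS"` after each successful backup keeps disk usage bounded; the deleted paths are printed so they end up in the cron log.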
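🤖 Automation Tools
1. Automated Deployment Tool
Files: auto-deploy-server.sh (Linux) / auto-deploy-server.bat (Windows)
Features:
- Installs all dependencies automatically
- Configures the database
- Deploys the monitoring stack
- Sets up the firewall
- Configures scheduled tasks
Usage:
# Deploy to a new server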
bash auto-deploy-server.sh root@192.168.1.100
2. One-Click Deployment Script
File: deploy-oneclick.sh
Scenario: run on a fresh server
Usage:
# SSH into the server
ssh root@your-server
# Run the deployment
curl -fsSL https://raw.githubusercontent.com/flowerjunjie/moltbot/main/deploy-oneclick.sh | bash
Deployment time: about 5-10 minutes
3. Container Orchestration Support
File: docker-compose-full.yml
Included services:
- Moltbot Gateway
- Database API
- PostgreSQL
- Redis
- Prometheus
- Grafana
- Node Exporter
- Metrics Exporter
- Log Analyzer
- Nginx
Start:
docker-compose -f docker-compose-full.yml up -d
📊 Monitoring and Alerting
Service Ports
| Service | Port | Description |
|---|---|---|
| Database API | 18800 | REST API |
| Metrics | 9101 | Prometheus metrics |
| Log Analyzer | 9102 | Log analysis API |
| Prometheus | 9090 | Metrics collection |
| Grafana | 3000 | Visualization |
Health Check Endpoints
# Database API
curl http://localhost:18800/api/health
# Metrics
curl http://localhost:9101/metrics
# Log summary
curl http://localhost:9102/api/logs/summary
# Service status
curl http://localhost:18800/api/devices
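The individual checks above can be combined into one pass that prints a line per endpoint and returns a non-zero status if any of them fails. This is a minimal sketch; the `check_endpoints` function and the `ENDPOINTS` array are hypothetical names, and the URLs are the ones listed above.

```bash
#!/usr/bin/env bash
# Sketch: probe the health endpoints listed above and print one line per endpoint.
ENDPOINTS=(
  "http://localhost:18800/api/health"
  "http://localhost:9101/metrics"
  "http://localhost:9102/api/logs/summary"
)

# Probes every URL passed as an argument; returns 1 if at least one fails.
check_endpoints() {
  local url failed=0
  for url in "$@"; do
    # -f makes curl return non-zero on HTTP errors (status >= 400).
    if curl -fsS --max-time 5 -o /dev/null "$url"; then
      echo "OK   $url"
    else
      echo "FAIL $url"
      failed=1
    fi
  done
  return "$failed"
}

# Usage: check_endpoints "${ENDPOINTS[@]}"
```

Because the function's exit status reflects the overall result, it can be used directly in a cron job or as a systemd `ExecStartPre` check.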
🛠️ Maintenance Operations
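Routine Maintenance
Check service status:
# All Moltbot services (quoted so the shell does not expand the pattern)
systemctl status 'moltbot-*'
# Docker containers
docker ps
# Monitoring stack
cd /opt/moltbot-monitoring && docker-compose ps
View logs:
# Service logs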
journalctl -u moltbot-db-api -f
journalctl -u moltbot-failover -f
# Application logs
tail -f /var/log/moltbot-failover.log
Backup Operations
Manual backup:
# Database backup
/usr/local/bin/moltbot-backup-auto.sh
# Disaster recovery backup
/usr/local/bin/moltbot-dr-backup.sh
Restore the database:
# List backups
ls -lh /opt/moltbot-backup/database/daily/
# Restore the latest backup
gunzip -c /opt/moltbot-backup/database/daily/moltbot_latest.sql.gz | psql -d moltbot
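Before piping a dump into psql, it is worth checking that the archive is intact, since a truncated file would otherwise leave the database half restored. The sketch below uses `gzip -t` for that check. The function names `verify_backup` and `restore_backup` are hypothetical, and the default database name `moltbot` is taken from the restore command above.

```bash
#!/usr/bin/env bash
# Sketch: verify a compressed dump before piping it into psql.

# Succeeds only if the file exists, is non-empty, and is a valid gzip stream.
verify_backup() {
  local file="$1"
  [ -s "$file" ] || { echo "missing or empty: $file" >&2; return 1; }
  # gzip -t tests the compressed stream without writing any output.
  gzip -t "$file" 2>/dev/null || { echo "corrupt archive: $file" >&2; return 1; }
}

# Restores the dump into the given database (default: moltbot) after verification.
restore_backup() {
  local file="$1" db="${2:-moltbot}"
  verify_backup "$file" || return 1
  # ON_ERROR_STOP makes psql abort on the first SQL error instead of continuing.
  gunzip -c "$file" | psql -v ON_ERROR_STOP=1 -d "$db"
}
```

For example: `restore_backup /opt/moltbot-backup/database/daily/moltbot_latest.sql.gz`.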
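Troubleshooting
A service fails to start:
# Check whether the port is in use
netstat -tlnp | grep <port>
# Check the logs
journalctl -u <service> -n 50
# Restart the service
systemctl restart <service>
Keepalived issues:
# Check the configuration
keepalived -t
# View logs
journalctl -u keepalived -f
# Check the virtual IP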
ip addr show eth0
🔐 Security Configuration
Firewall Rules
View current rules:
iptables -L -n -v
Add a rule:
iptables -A INPUT -p tcp --dport 18789 -s 192.168.1.0/24 -j ACCEPT
netfilter-persistent save
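Running `iptables -A` repeatedly appends duplicate rules, which matters when the command is part of an automated deployment. One way to make it idempotent is to test for the rule with `iptables -C` first. This is a sketch only; `ensure_rule` and the `IPTABLES` override variable are hypothetical names.

```bash
#!/usr/bin/env bash
# Sketch: add a rule only if it is not already present.
# IPTABLES is an assumed override hook; by default the real iptables is used.
IPTABLES="${IPTABLES:-iptables}"

# Arguments: chain name followed by the rule specification.
ensure_rule() {
  # -C checks whether a matching rule exists; -A appends it otherwise.
  if "$IPTABLES" -C "$@" 2>/dev/null; then
    echo "rule already present"
  else
    "$IPTABLES" -A "$@" && echo "rule added"
  fi
}

# Usage (same rule as above):
# ensure_rule INPUT -p tcp --dport 18789 -s 192.168.1.0/24 -j ACCEPT
```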
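Security Recommendations
- Use key-based authentication: disable password login
- Configure fail2ban: protect against brute-force attacks
- Update regularly: apt-get update && apt-get upgrade
- Monitor logs: review them regularly for abnormal access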
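📈 Performance Optimization
System Tuning
Run the optimization script:
/usr/local/bin/moltbot-optimize.sh
Optimization items:
- Network parameter tuning
- PostgreSQL configuration tuning
- Docker resource limits
- Log rotation configuration
Performance Monitoring
View system metrics:
# CPU
top -bn1 | grep "Cpu(s)"
# Memory
free -h
# Disk
df -h
# Load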
cat /proc/loadavg
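🚨 Emergency Response
All Services Down
1. Check the server status
ping -c 4 <server-ip>
ssh root@<server-ip> "systemctl status 'moltbot-*'"
2. Start the critical services
systemctl start moltbot-db-api
systemctl start moltbot-gateway
3. Switch to the backup server (if HA is configured)
# The backup server is promoted to master automatically
# The virtual IP migrates automatically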
Database Corruption
1. Restore from backup
gunzip -c /opt/moltbot-backup/disaster-recovery/pg_all_*.sql.gz | psql
2. Check data integrity
psql -d moltbot -c "SELECT COUNT(*) FROM conversations;"
psql -d moltbot -c "SELECT COUNT(*) FROM devices;"
Network Problems
1. Check network connectivity
ping -c 4 8.8.8.8
traceroute 8.8.8.8
2. Check the firewall
iptables -L -n
ufw status
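📚 Related Documentation
- DEPLOYMENT-COMPLETE.md - Complete deployment guide
- EXTENSIONS.md - Extension features documentation
- ROADMAP.md - Feature roadmap
- docker-compose-full.yml - Container orchestration configuration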
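🎯 Best Practices
1. Test backup recovery regularly
- Test the disaster recovery procedure once a month
- Verify backup integrity
2. Monitoring and alerting
- Configure email or webhook alerts
- Set reasonable alert thresholds
3. Keep documentation up to date
- Record all configuration changes
- Maintain the operations manual
4. Capacity planning
- Monitor resource usage trends
- Plan capacity expansion ahead of time
🎉 High availability and automation setup complete!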