High Availability Deployment

High availability deployment is designed for finance, healthcare, energy, large manufacturing, corporate groups, organizations with offices across regions, and any scenario where business continuity is critical. The goal is not simply to buy more servers. It is to reduce single points of failure across the access entry, application services, database, search, cache, transcoding, object storage, backup and network links, and to make recovery and failover verifiable.

Active-standby topology: active-standby or active-active deployment reduces the impact of a single application server failure.

Suitable Scenarios

Scenario	Typical requirement
Core file platform	Upload, download, preview, approval, sharing and search cannot be interrupted for long
Multi-region offices	Users in different regions need stable access to the same file platform
Regulated industries	Need stronger backup, audit, permission and security policies, plus disaster recovery design
Large asset, drawing and video data	Storage grows continuously and needs object storage and expansion planning
Private AI knowledge base	Vector index, OCR, GPU or model services are required in addition to the file platform

Architecture Layers

High availability should be designed by service layer. A single larger server is not a substitute for redundancy.

Layer	Recommended design
Access entry	Use a load balancer, reverse proxy, dual network links or cloud load balancing for the HTTPS entry
Application services	Prepare at least primary and standby application nodes; larger deployments can use multiple replicas
Database	Use primary-replica, active-standby or database cluster; verify restore procedures regularly
Search service	Deploy full-text and advanced search independently of the application servers; use a cluster at larger scale
Cache and queue	Redis, task queues and async workers should be monitored and recoverable
Transcoding and preview	Office, CAD, video and image conversion can be deployed independently and scaled by workload
Object storage	Use single-server, active-standby or distributed object storage for file bodies
Backup and DR	Database, object storage, configuration and indexes need local backup, offsite backup or DR
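The layer-by-layer approach can be made concrete as a simple inventory check. The sketch below is a minimal illustration, not a BabelBird tool: the layer names and instance counts are hypothetical, and it treats any layer with fewer than two independent instances as a single point of failure.

```python
from typing import Dict, List


def single_points_of_failure(inventory: Dict[str, int]) -> List[str]:
    """Return, sorted by name, the layers with fewer than two independent instances."""
    return sorted(layer for layer, count in inventory.items() if count < 2)


# Hypothetical inventory: layer name -> number of independent instances
# able to serve that layer. Counts are illustrative only.
example = {
    "access_entry": 2,
    "application": 2,
    "database": 2,
    "search": 1,
    "cache_queue": 1,
    "transcoding": 1,
    "object_storage": 4,
}

if __name__ == "__main__":
    print(single_points_of_failure(example))
```

A count of two does not by itself remove a single point of failure: two application nodes behind one load balancer, or two database nodes on one shared disk, still depend on that shared component, so shared pieces belong in the inventory as layers too.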

Recommended Deployment Forms

Scale	Recommended form	Notes
100-1000 users	Dual application nodes, dual database nodes, independent search, optional transcoding, S3/OBS or shared storage	Balances cost and availability for smaller but interruption-sensitive enterprises
1000-5000 users standard HA	Primary/secondary application, primary/secondary database, independent search, independent transcoding, object storage	Suitable for most medium and large private deployments
1000-5000 users on K8S	3-5 application replicas, database HA, search cluster, Redis cluster, S3 object storage	Suitable for enterprises with a container platform and operations team
1000-5000 users + AI	Add AI/OCR/vector servers beyond the HA document platform	AI compute can use external compute platforms or private GPU servers
10k-50k users	Higher-spec active-standby applications, database, search, transcoding and independent storage	Requires dedicated assessment of concurrency, regions, bandwidth, file count and AI scope
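The deployment-forms table works as a first-pass lookup by user count. The function below is a hypothetical sketch of that lookup: the rule that the AI row takes precedence over the K8S row is an assumption, and user counts the table does not cover return None because they need a dedicated assessment.

```python
from typing import Optional


def recommended_form(users: int, on_k8s: bool = False, needs_ai: bool = False) -> Optional[str]:
    """Map a user count to a row label of the deployment-forms table.

    Returns None for counts the table does not cover (under 100,
    5,001-9,999, or over 50,000 users).
    """
    if 100 <= users <= 1000:
        return "100-1000 users"
    if 1000 < users <= 5000:
        # Assumption: AI needs take precedence over the container choice.
        if needs_ai:
            return "1000-5000 users + AI"
        if on_k8s:
            return "1000-5000 users on K8S"
        return "1000-5000 users standard HA"
    if 10_000 <= users <= 50_000:
        return "10k-50k users"
    return None
```

User count is only the starting point; concurrency, regions, bandwidth, file count and AI scope can move a deployment into a different row.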

Server Role Guidance

The following is a common HA role split. Final sizing depends on user count, concurrency, file volume, storage capacity, preview workload, AI modules and network conditions.

Role	Reference configuration	Purpose
Primary application server	16-core CPU, 64GB memory, 500GB SSD system disk	Web, APIs and main application services
Secondary application server	16-core CPU, 64GB memory, 500GB SSD system disk	Standby, active-active or application replica node
Primary database server	8-core CPU, 32GB memory, 500GB SSD system disk	Business data writes and transactions
Secondary database server	8-core CPU, 32GB memory, 500GB SSD system disk	Replication, backup and failover
Search server	8-core CPU, 32GB memory, 1TB SSD system disk	Full-text index, OCR text and advanced search
Transcoding server	8-core CPU, 16GB memory, 200GB system disk	Office, CAD, image and video preview conversion
Object storage server	16-core CPU, 64GB memory, 500GB SSD system disk plus data disks	File bodies through S3, NFS or distributed object storage
AI/OCR server	8-32 CPU cores, 32-128GB memory, GPU depending on model	Private OCR, vectors, knowledge bases and model inference
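Summing the reference configuration gives a rough total for budgeting hardware or virtual machines. This sketch copies the figures from the table, leaves out the AI/OCR server because its range depends on the model, and does not include object storage data disks.

```python
from typing import Dict, NamedTuple


class Role(NamedTuple):
    cpu_cores: int
    memory_gb: int
    system_disk_gb: int


# Reference figures from the server role table (AI/OCR server excluded).
REFERENCE_ROLES: Dict[str, Role] = {
    "primary_app": Role(16, 64, 500),
    "secondary_app": Role(16, 64, 500),
    "primary_db": Role(8, 32, 500),
    "secondary_db": Role(8, 32, 500),
    "search": Role(8, 32, 1000),
    "transcoding": Role(8, 16, 200),
    "object_storage": Role(16, 64, 500),
}


def totals(roles: Dict[str, Role]) -> Role:
    """Sum CPU cores, memory and system disk across all roles."""
    return Role(
        cpu_cores=sum(r.cpu_cores for r in roles.values()),
        memory_gb=sum(r.memory_gb for r in roles.values()),
        system_disk_gb=sum(r.system_disk_gb for r in roles.values()),
    )


if __name__ == "__main__":
    print(totals(REFERENCE_ROLES))
```

For this split the total is 80 CPU cores, 304GB of memory and 3,700GB of system disk, before object storage data disks and any AI/OCR capacity.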

K8S High Availability

Enterprises with an existing container platform can deploy on K8S. K8S is best for teams that already operate Ingress, StorageClass, observability, logs and image registries. It makes application replicas, resource quotas, rolling upgrades and failover easier to manage.

Workload	Suggested replicas	Resource reference
Application services	3-5 replicas	4-8 CPU cores and 16-32GB memory per replica
Database	2 replicas or external database	Primary-replica or active-standby with SSD storage
Search engine	3-node cluster	4-8 CPU cores, 16-32GB memory and SSD per node
Redis cache	3 nodes	Size memory by concurrency and task volume
Object storage	S3/OBS/OSS	Store file bodies in independent object storage, not in application pods
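Three-node counts for the search cluster and Redis are typical of components that elect a leader or master by majority vote. Assuming majority quorum (for example, master election among master-eligible search nodes, or a Redis Sentinel deployment; confirm this against the components actually deployed), the number of node failures a cluster can survive is:

```python
def tolerated_failures(nodes: int) -> int:
    """Nodes that can fail while a strict majority of `nodes` remains up."""
    if nodes < 1:
        raise ValueError("nodes must be >= 1")
    return (nodes - 1) // 2


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "nodes ->", tolerated_failures(n), "tolerated failures")
```

Under this assumption two nodes tolerate no failures and four tolerate no more than three, which is why odd node counts are the usual choice.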

Object Storage And File Availability

File-body availability is often more important than application availability: application servers can be rebuilt, but lost file data cannot be recovered. BabelBird supports S3-compatible object storage, cloud object storage, NFS, VM-mounted disks and self-built object storage. For HA deployments, object storage is recommended.

Distributed object storage erasure coding: object storage reduces disk or node failure risk through multiple nodes, checksums and erasure coding.

Confirm these points during design:

The ratio of usable to raw capacity, which depends on the erasure coding or redundancy strategy.
Storage node count, disk count per node, disk capacity and future expansion unit.
Network bandwidth, latency and isolation between object storage and application servers.
Whether active-standby object storage, offsite sync or object-storage-level DR is required.
Whether existing S3 storage, cloud OBS/OSS/COS or hyperconverged storage can be connected.
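For the capacity point above, usable capacity under a k+m erasure code (k data shards, m parity shards) is raw capacity multiplied by k/(k+m), and n-way replication is the special case k=1, m=n-1. A minimal sketch with hypothetical node, disk and shard counts:

```python
def raw_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float) -> float:
    """Total raw capacity across all storage nodes."""
    return nodes * disks_per_node * disk_tb


def usable_capacity_tb(raw_tb: float, data_shards: int, parity_shards: int) -> float:
    """Usable space under a k+m erasure code: raw * k / (k + m)."""
    if data_shards < 1 or parity_shards < 0:
        raise ValueError("need data_shards >= 1 and parity_shards >= 0")
    return raw_tb * data_shards / (data_shards + parity_shards)


if __name__ == "__main__":
    raw = raw_capacity_tb(nodes=4, disks_per_node=12, disk_tb=16)
    print("raw:", raw, "TB")
    print("EC 8+4:", usable_capacity_tb(raw, 8, 4), "TB")
    print("3-way replication:", usable_capacity_tb(raw, 1, 2), "TB")
```

With these example figures, 768TB raw leaves 512TB usable under 8+4 (any 4 shards of an object can be lost) and 256TB under 3-way replication. Real products also reserve space for metadata and rebalancing, so check the overhead of the specific storage system.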

For capacity, erasure coding, usable space and hardware preparation, see Object Storage And Erasure Coding.

Backup And Disaster Recovery

High availability reduces service interruption during failures. Backup and disaster recovery ensure data can be restored. They are not substitutes for each other. Even with active-standby or cluster deployment, an independent backup strategy is still required.

Offsite disaster recovery topology: offsite disaster recovery backs up the database, object storage and key configuration to a secondary site.

Data type	Recommendation
Database	Regular full and incremental backup plus restore drills
Object storage	Object storage sync, backup server or third-party backup system
Search index	Can be rebuilt from file bodies and the database, but rebuild time must be evaluated in advance
Configuration and certificates	Back up domain certificates, system configuration, license files, mail and integration settings
Logs and audit	Retain access logs, operation logs and security audit logs according to compliance requirements
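Restore drills should exercise the whole backup chain, not a single file. Assuming each incremental backup captures changes since the previous backup, restoring to a point in time needs the latest full backup at or before that time plus every later incremental up to it. The `Backup` type and `restore_chain` function are hypothetical illustrations:

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Backup:
    taken_at: datetime
    kind: str  # "full" or "incremental"


def restore_chain(backups: List[Backup], target: datetime) -> List[Backup]:
    """Backups needed to restore to `target`, oldest first.

    The chain is the latest full backup at or before `target` followed by
    every incremental taken after it, up to `target`.
    """
    eligible = sorted((b for b in backups if b.taken_at <= target),
                      key=lambda b: b.taken_at)
    full_positions = [i for i, b in enumerate(eligible) if b.kind == "full"]
    if not full_positions:
        raise LookupError("no full backup at or before the target time")
    # Everything after the last full backup is an incremental.
    return eligible[full_positions[-1]:]
```

Every backup in the returned chain must be present and readable; one missing incremental makes all later points unrecoverable, which is why a restore drill proves more than the existence of backup files.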

Overseas And Cross-Region Notes

Choose a cloud region, data center or object storage region close to primary users.
For public access, evaluate international bandwidth, CDN, DNS, certificates and cross-border link stability.
When using cloud object storage, keep it in the same region or private network as application servers where possible.
For cross-region DR, verify the quality of dedicated lines, VPN or cloud networks to avoid sync backlog.
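To check for sync backlog, compare the daily volume of changed data with what the link can carry in a day. This sketch assumes decimal units (1GB = 8,000 megabits) and a hypothetical link efficiency factor for protocol overhead and sharing; measured throughput on the target link is a better input than the nominal rate.

```python
MEGABITS_PER_GB = 8_000  # decimal units: 1 GB = 1,000 MB = 8,000 Mb


def sync_hours(daily_change_gb: float, link_mbps: float, efficiency: float = 0.7) -> float:
    """Hours needed to transfer one day's changes over the link.

    `efficiency` (0 < efficiency <= 1) is a hypothetical allowance for
    protocol overhead and link sharing.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("need link_mbps > 0 and 0 < efficiency <= 1")
    seconds = daily_change_gb * MEGABITS_PER_GB / (link_mbps * efficiency)
    return seconds / 3600


def backlog_grows(daily_change_gb: float, link_mbps: float, efficiency: float = 0.7) -> bool:
    """True if one day's changes take longer than one day to sync."""
    return sync_hours(daily_change_gb, link_mbps, efficiency) > 24
```

At 500GB of changes per day over a 100Mbps link at 70% efficiency, sync takes about 15.9 hours, leaving limited headroom for growth or catch-up after an outage. The same volume over 20Mbps, even at full efficiency, needs about 55.6 hours per day of changes, so the backlog would keep growing.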
Third-party SSO, email, SMS, AI API or license services should be tested from the target region in advance.

Pre-Launch Checklist

Verify that access can switch to the standby application node when the primary node fails.
Verify database replication, backup and restore procedures, not only backup file existence.
Verify read/write behavior when an object storage node, disk or network path fails.
Verify recovery of search, preview, transcoding, OCR, AI and automation queues.
Verify certificates, domains, load balancers, reverse proxies and internal/external access policies.
Verify that backups can restore a usable system in an isolated environment.
Monitor CPU, memory, disks, object storage, database, search, queues, certificate expiry and backup results.
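Certificate expiry is one of the monitoring items that is easiest to automate. A minimal sketch using Python's standard `ssl.cert_time_to_seconds`, which parses a `notAfter` string in the format returned by `SSLSocket.getpeercert()`; the 30-day warning threshold is an assumption, not a product default.

```python
import ssl
import time
from typing import Optional

SECONDS_PER_DAY = 86_400


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if expired).

    `not_after` looks like "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires_at - current) / SECONDS_PER_DAY


def needs_renewal(not_after: str, warn_days: float = 30, now: Optional[float] = None) -> bool:
    """True when the certificate expires within `warn_days` days."""
    return days_until_expiry(not_after, now) <= warn_days
```

In practice the `notAfter` value would come from each public and internal endpoint (load balancer, reverse proxy, object storage) so that one expiring certificate does not go unnoticed behind a healthy primary.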