BabelBirdBabelBird Docs

High Availability Deployment

High availability deployment is designed for finance, healthcare, energy, large manufacturing, group companies, cross-region offices and scenarios where business continuity matters. The goal is not simply to buy more servers; it is to reduce single points of failure across access entry, application services, database, search, cache, transcoding, object storage, backup and network links, and to make recovery and failover verifiable.

Active-standby topology
Active-standby or active-active deployment reduces the impact of a single application server failure.

Suitable Scenarios

Scenario Typical requirement
Core file platform Upload, download, preview, approval, sharing and search cannot be interrupted for long
Multi-region offices Users in different regions need stable access to the same file platform
Regulated industries Stronger backup, audit, permission, security policy and disaster recovery design
Large asset, drawing and video data Storage grows continuously and needs object storage and expansion planning
Private AI knowledge base Vector index, OCR, GPU or model services are required in addition to the file platform

Architecture Layers

High availability should be designed by service layer. A single larger server is not a substitute for redundancy.

Layer Recommended design
Access entry Use load balancer, reverse proxy, dual network links or cloud load balancing for HTTPS entry
Application services Prepare at least primary and standby application nodes; larger deployments can use multiple replicas
Database Use primary-replica, active-standby or database cluster; verify restore procedures regularly
Search service Deploy full-text and advanced search separately; use a cluster for larger scale
Cache and queue Redis, task queues and async workers should be monitored and recoverable
Transcoding and preview Office, CAD, video and image conversion can be deployed independently and scaled by workload
Object storage Use single-server, active-standby or distributed object storage for file bodies
Backup and DR Database, object storage, configuration and indexes need local backup, offsite backup or DR
Scale Recommended form Notes
100-1000 users Dual applications, dual databases, independent search, optional transcoding, S3/OBS or shared storage Balances cost and availability for smaller but interruption-sensitive enterprises
1000-5000 users standard HA Primary/secondary application, primary/secondary database, independent search, independent transcoding, object storage Suitable for most medium and large private deployments
1000-5000 users on K8S 3-5 application replicas, database HA, search cluster, Redis cluster, S3 object storage Suitable for enterprises with a container platform and operations team
1000-5000 users + AI Add AI/OCR/vector servers beyond the HA document platform AI compute can use external compute platforms or private GPU servers
10k-50k users Higher-spec active-standby applications, database, search, transcoding and independent storage Requires dedicated assessment of concurrency, regions, bandwidth, file count and AI scope

Server Role Guidance

The following is a common HA role split. Final sizing depends on user count, concurrency, file volume, storage capacity, preview workload, AI modules and network conditions.

Role Reference configuration Purpose
Primary application server 16-core CPU, 64GB memory, 500GB SSD system disk Web, APIs and main application services
Secondary application server 16-core CPU, 64GB memory, 500GB SSD system disk Standby, active-active or application replica node
Primary database server 8-core CPU, 32GB memory, 500GB SSD system disk Business data writes and transactions
Secondary database server 8-core CPU, 32GB memory, 500GB SSD system disk Replication, backup and failover
Search server 8-core CPU, 32GB memory, 1TB SSD system disk Full-text index, OCR text and advanced search
Transcoding server 8-core CPU, 16GB memory, 200GB system disk Office, CAD, image and video preview conversion
Object storage server 16-core CPU, 64GB memory, 500GB SSD system disk plus data disks File bodies through S3, NFS or distributed object storage
AI/OCR server 8-32 CPU cores, 32-128GB memory, GPU depending on model Private OCR, vectors, knowledge bases and model inference

K8S High Availability

Enterprises with an existing container platform can deploy on K8S. K8S is best for teams that already operate Ingress, StorageClass, observability, logs and image registries. It makes application replicas, resource quotas, rolling upgrades and failover easier to manage.

Workload Suggested replicas Resource reference
Application services 3-5 replicas 4-8 CPU cores and 16-32GB memory per replica
Database 2 replicas or external database Primary-replica or active-standby with SSD storage
Search engine 3-node cluster 4-8 CPU cores, 16-32GB memory and SSD per node
Redis cache 3 nodes Size memory by concurrency and task volume
Object storage S3/OBS/OSD Store file bodies in independent object storage, not application pods

Object Storage And File Availability

File-body availability is often more important than application availability because application servers can be rebuilt while file data cannot be lost. BabelBird supports S3-compatible object storage, cloud object storage, NFS, VM-mounted disks and self-built object storage. For HA deployments, object storage is recommended.

Distributed object storage erasure coding
Object storage reduces disk or node failure risk through multiple nodes, checksums and erasure coding.

Confirm these points during design:

  • Usable capacity versus raw capacity depends on erasure coding or redundancy strategy.
  • Storage node count, disk count per node, disk capacity and future expansion unit.
  • Network bandwidth, latency and isolation between object storage and application servers.
  • Whether active-standby object storage, offsite sync or object-storage-level DR is required.
  • Whether existing S3 storage, cloud OBS/OSS/COS or hyperconverged storage can be connected.

For capacity, erasure coding, usable space and hardware preparation, see Object Storage And Erasure Coding.

Backup And Disaster Recovery

High availability reduces service interruption during failures. Backup and disaster recovery ensure data can be restored. They are not substitutes for each other. Even with active-standby or cluster deployment, an independent backup strategy is still required.

Offsite disaster recovery topology
Offsite disaster recovery backs up database, object storage and key configuration to a secondary site.
Data type Recommendation
Database Regular full and incremental backup plus restore drills
Object storage Object storage sync, backup server or third-party backup system
Search index It can be rebuilt from files and database, but rebuild time must be evaluated
Configuration and certificates Back up domain certificates, system configuration, license files, mail and integration settings
Logs and audit Retain access logs, operation logs and security audit logs according to compliance requirements

Overseas And Cross-Region Notes

  • Choose a cloud region, data center or object storage region close to primary users.
  • For public access, evaluate international bandwidth, CDN, DNS, certificates and cross-border link stability.
  • When using cloud object storage, keep it in the same region or private network as application servers where possible.
  • Cross-region DR should verify dedicated lines, VPN or cloud network quality to avoid sync backlog.
  • Third-party SSO, email, SMS, AI API or license services should be tested from the target region in advance.

Pre-Launch Checklist

  1. Verify that access can switch to the standby application node when the primary node fails.
  2. Verify database replication, backup and restore procedures, not only backup file existence.
  3. Verify read/write behavior when an object storage node, disk or network path fails.
  4. Verify recovery of search, preview, transcoding, OCR, AI and automation queues.
  5. Verify certificates, domains, load balancers, reverse proxies and internal/external access policies.
  6. Verify that backups can restore a usable system in an isolated environment.
  7. Monitor CPU, memory, disks, object storage, database, search, queues, certificate expiry and backup results.
BabelBird capabilities may change by product version, licensed modules and deployment configuration; actual availability depends on the deployed environment and administrator settings.