Integriti High Availability

Overview

Integriti High Availability (HA) provides a fault-tolerant solution for mission-critical security environments. It allows multiple Integriti server nodes to operate simultaneously, so that if one server fails, controllers and clients automatically fail over to another available node without significant downtime.

Technical Details

Integriti Active Clustering

Unlike traditional primary/backup (active/passive) configurations, Integriti uses Active Clustering. All server nodes are considered equal and are always active.

  • Three Server Components: Each node runs the Application Server, Controller Server, and Integration Server services.
  • Inter-Node Communication: All controller and integration servers establish bi-directional TCP links on port 44000 to synchronize real-time communications.
  • Database Conduit: The shared SQL database acts as the conduit for disseminating connection details (IPs, hostnames, encryption keys) across the cluster.

Microsoft Failover Clustering (WSFC)

The solution leverages Windows Server Failover Clustering (WSFC) as the foundation for both the Application servers and the SQL database.

  • Capacity: Supports from 2 to 64 physical hosts in a single failover cluster.
  • Roles: Integriti servers typically run as Hyper-V virtual machines (VMs) within the WSFC. If a host crashes, the VM is automatically restarted on another surviving node.

SQL Database Clustering

The heart of the HA solution is a Microsoft SQL Cluster.

  • Active/Passive SQL: While Integriti services are active/active, the SQL instance itself typically runs in active/passive mode: one node handles the production workload and a passive node takes over upon failure.
  • Shared Storage: The shared storage is a single point of failure in the cluster and must be protected via hardware redundancy, backups, or disk mirroring. Corruption in the shared database affects all nodes.

Configuration / Programming

Licensing

To enable HA, the system requires:

  • Integriti Professional (v17.1 or later).
  • Additional Server Node (High Availability) License (Part no. 996965): one for each additional node added to the cluster.
  • Pairing: The license key cryptographically pairs the new node with the existing cluster to prevent unauthorized server attachments.

Multi-Server Deployment

  1. Initial Node: Install Integriti and create the database schema as per a standard installation.
  2. Additional Nodes: Install Integriti software on additional hardware and provide the database connection string for the shared SQL cluster.
  3. Registration: Each service connects to the SQL cluster and registers its connection details automatically.
  4. Manual Overrides: If servers are behind NAT or not directly routable, address information can be configured manually in the database to allow NAT traversal.
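
The registration and manual override steps can be pictured with a minimal Python sketch. It is illustrative only: the record layout, the names ServiceRecord, effective_address and peer_addresses, and the rule that a manual override takes precedence over the automatically registered address are all assumptions made for this example, not the actual Integriti database schema or behaviour.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ServiceRecord:
    """Hypothetical stand-in for one service's row in the shared database."""
    registered_address: str                 # written automatically when the service starts
    manual_override: Optional[str] = None   # set by an administrator, e.g. a NAT-side address


def effective_address(record: ServiceRecord) -> str:
    """Assumed rule: a manual override, when present, wins over the registered address."""
    return record.manual_override or record.registered_address


def peer_addresses(registry: Dict[str, ServiceRecord], own_name: str) -> Dict[str, str]:
    """Return the address each other service should be reached on."""
    return {
        name: effective_address(record)
        for name, record in registry.items()
        if name != own_name
    }


# Example data (hypothetical names and addresses).
registry = {
    "node-a": ServiceRecord("10.0.0.11"),
    "node-b": ServiceRecord("192.168.50.7", manual_override="203.0.113.20"),
}
peers_of_a = peer_addresses(registry, "node-a")
```

Here node-b sits behind NAT, so its peers would use the manually entered address rather than the private one it registered for itself.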

Controller & Client Failover

  • Controllers: Each Integriti controller can be configured with up to four prioritised static IP addresses for controller servers. It attempts to connect to these in sequence.
  • DNS Support: Alternatively, a single URL can be used, with the customer’s DNS server handling the redirection to available nodes.
  • Clients: Integriti software clients maintain a set of URLs for Application Servers and connect to the first one that responds.
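
The "try each address in priority order" behaviour described above can be sketched in Python. This is not controller firmware or Integriti client code; the function name connect_in_priority_order, the timeout value, and the injectable connector parameter are assumptions for illustration, and the port is left as a parameter because this article does not state which port controllers or clients use.

```python
import socket
from typing import Any, Callable, Optional, Sequence, Tuple

MAX_ADDRESSES = 4  # the article states up to four prioritised addresses per controller


def connect_in_priority_order(
    addresses: Sequence[str],
    port: int,
    timeout: float = 3.0,
    connector: Callable[..., Any] = socket.create_connection,
) -> Optional[Tuple[str, Any]]:
    """Try each server address in the order given.

    Returns (address, connection) for the first server that accepts the
    connection, or None if every address fails.
    """
    if len(addresses) > MAX_ADDRESSES:
        raise ValueError(f"at most {MAX_ADDRESSES} addresses are supported")
    for address in addresses:
        try:
            connection = connector((address, port), timeout)
        except OSError:
            # Unreachable, refused, or timed out: move on to the next priority.
            continue
        return address, connection
    return None
```

A caller would pass the configured server list, highest priority first, and re-run the function when the active connection drops. The connector argument exists only so the logic can be exercised without a live network.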

Split-Site / Disaster Recovery

  • Geoclusters: SQL clustering can be extended over geographic distances using geoclusters.
  • Requirements: Geoclustering needs a sophisticated setup and the involvement of storage vendors to manage synchronization across multiple disk arrays over geographic distances.
  • Failover Time: Failover typically takes from several seconds to a few minutes, depending on SQL service startup and database recovery times.

Troubleshooting

  • Inter-Node Sync: Ensure port 44000 is open between all Integriti server nodes for cluster communication.
  • Connectivity: Controllers and clients must be able to reach every server node in the cluster for failover to succeed.
  • DNS Propagation: If using DNS for failover, ensure that DNS changes propagate within the response time required by the site SLA.
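
The port and reachability checks above can be scripted. The following Python sketch tests whether a TCP connection can be opened to port 44000 on each node; the function names port_is_open and check_nodes and the timeout value are assumptions for this example, and the host names passed in would be the site's own. It shows only that the port accepts a TCP connection from the machine running the script, not that cluster synchronization is healthy.

```python
import socket
from typing import Dict, Iterable

CLUSTER_SYNC_PORT = 44000  # inter-node port named in this article


def port_is_open(host: str, port: int = CLUSTER_SYNC_PORT, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def check_nodes(hosts: Iterable[str], port: int = CLUSTER_SYNC_PORT,
                timeout: float = 2.0) -> Dict[str, bool]:
    """Check every host and report which ones accepted the connection."""
    return {host: port_is_open(host, port, timeout) for host in hosts}
```

Because firewalls can block traffic in one direction only, the check is most useful when run from each server node against all the others.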