Business Critical tier - Azure SQL Database and Azure SQL Managed Instance

APPLIES TO: Azure SQL Database, Azure SQL Managed Instance

Azure SQL Database and Azure SQL Managed Instance are both based on the SQL Server database engine architecture, adapted for the cloud environment to ensure 99.99% availability even in the event of infrastructure failures. Three architectural models are used:

  • General Purpose / Standard
  • Business Critical / Premium
  • Hyperscale

The Premium / Business Critical service tier model is based on a cluster of database engine processes. This architectural model relies on a quorum of database engine nodes always being available, and it has minimal performance impact on the workload even during maintenance activities. The Hyperscale service tier is currently available only for Azure SQL Database (not for SQL Managed Instance). It provides highly scalable storage and compute performance, using the Azure architecture to scale the storage and compute resources of a database in Azure SQL Database well beyond the limits of the General Purpose and Business Critical service tiers.

In Azure, the underlying operating system, drivers, and SQL Server database engine are transparently updated and patched with minimal downtime for end users.

Premium availability is enabled in the Premium and Business Critical service tiers and is intended for intensive workloads that cannot tolerate any performance impact from ongoing maintenance operations.

In the premium model, compute and storage are integrated on each node. High availability in this architectural model is achieved by replicating compute (the SQL Server database engine process) and storage (locally attached SSD) across a four-node cluster, using a technology similar to Always On availability groups in SQL Server.

The SQL Server database engine process and the underlying MDF and LDF files are placed on the same node with locally attached SSD storage, which gives the workload low latency. High availability is implemented using a technology similar to Always On availability groups in SQL Server. Each database is a cluster of database nodes, with one primary database that is accessible to customer workloads and three secondary processes that hold copies of the data. The primary node continuously pushes changes to the secondary nodes so that the data is available on the secondary replicas if the primary node fails for any reason. Failover is handled by the SQL Server database engine: one secondary replica becomes the primary node, and a new secondary replica is created to ensure that the cluster has enough nodes. The workload is automatically redirected to the new primary node.
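Because the workload is redirected to a new primary node during failover, client connections that were open at that moment may be dropped, and applications commonly retry them. The following is a minimal sketch of such a retry loop, not an official client pattern. `TransientConnectionError` and `run_with_retry` are hypothetical names introduced for illustration; a real application would catch whatever exception its database driver raises for a dropped connection.

```python
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for a driver's 'connection dropped' error."""


def run_with_retry(operation, attempts=5, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on transient errors.

    The delay doubles after each failed attempt: base_delay, 2*base_delay, ...
    The last failure is re-raised once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientConnectionError:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting. The attempt count and delays are illustrative values, not recommendations from the service.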

In addition, the Business Critical cluster has a built-in Read Scale-Out capability: a built-in read-only node, provided at no extra charge, that can run read-only queries (such as reports) without affecting the performance of the primary workload.
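Clients typically reach the read-only node by declaring read-only intent in the connection string. The sketch below builds an ODBC-style connection string with the `ApplicationIntent=ReadOnly` keyword. The helper name `build_connection_string`, the server and database names, and the driver version are assumptions for illustration, and authentication settings are deliberately left out.

```python
def build_connection_string(server, database, read_only=False):
    """Build an ODBC-style connection string for a SQL Server endpoint.

    When read_only is True, the ApplicationIntent=ReadOnly keyword is added
    so that the connection can be routed to a readable secondary replica.
    Credentials are intentionally omitted from this sketch.
    """
    parts = [
        "Driver={ODBC Driver 18 for SQL Server}",  # assumed driver version
        f"Server=tcp:{server},1433",
        f"Database={database}",
        "Encrypt=yes",
    ]
    if read_only:
        parts.append("ApplicationIntent=ReadOnly")
    return ";".join(parts) + ";"


# Hypothetical server and database names.
reporting = build_connection_string(
    "example-server.database.windows.net", "SalesDb", read_only=True
)
```

A reporting job would use the read-only string, while the transactional application keeps the default read-write string.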

When to choose this service tier

The Business Critical service tier is intended for applications that require low-latency responses from the underlying SSD storage (1 to 2 milliseconds on average), rapid recovery if the underlying infrastructure fails, or the ability to offload reports, analytics, and read-only queries to the free readable secondary replica of the primary database.

The main reasons to choose the Business Critical service tier instead of the General Purpose tier are:

  • Low I/O latency requirements: Workloads that need a fast response from the storage layer (1 to 2 milliseconds on average) should use the Business Critical tier.
  • Frequent communication between the application and the database: Applications that cannot take advantage of application-level caching or request batching, and that must send many SQL queries that need fast processing, are good candidates for the Business Critical tier.
  • Large number of updates: Insert, update, and delete operations modify data pages in memory (dirty pages), which must be saved to the data files by a CHECKPOINT operation. A database engine process crash or a database failover with a large number of dirty pages can increase recovery time in the General Purpose tier. Use the Business Critical tier if your workload causes many in-memory changes.
  • Long-running transactions that modify data: Transactions that stay open for a long time prevent the log file from being truncated, which can increase the log size and the number of virtual log files (VLFs). A large number of VLFs can slow down database recovery after a failover.
  • Reporting and analytics workloads that can be redirected to the free secondary read-only replica.
  • Greater resilience and faster recovery from failures: If the system fails, the database on the primary instance is disabled and one of the secondary replicas immediately becomes the new read-write primary database, ready to process queries. The database engine does not need to analyze and redo transactions from the log file or load all data into the memory buffer.
  • Advanced protection against data corruption: The Business Critical tier uses database replicas behind the scenes for business continuity, and the service also uses automatic page repair. This is the same technology that is used for SQL Server database mirroring and availability groups. If a replica cannot read a page because of a data integrity problem, a fresh copy of the page is retrieved from another replica and replaces the unreadable page, with no data loss or downtime for the customer. This functionality is available in the General Purpose tier if the database has a geo-secondary replica.
  • Higher availability: The Business Critical tier in a multi-availability-zone configuration guarantees 99.995% availability, compared with 99.99% for the General Purpose tier.
  • Fast geo-recovery: The Business Critical tier configured with geo-replication has a guaranteed recovery point objective (RPO) of 5 seconds and a recovery time objective (RTO) of 30 seconds for 100% of deployed hours.
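To put the availability figures above in concrete terms, the following sketch converts an availability percentage into the maximum downtime it allows over a period. The function name `max_downtime_minutes` and the 365-day year are assumptions for illustration; this is plain arithmetic, not a statement of how the service level agreement is measured.

```python
def max_downtime_minutes(availability_percent, period_days=365):
    """Return the downtime, in minutes, allowed by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * period_days * 24 * 60


general_purpose = max_downtime_minutes(99.99)      # about 52.6 minutes per year
business_critical = max_downtime_minutes(99.995)   # about 26.3 minutes per year
```

Moving from 99.99% to 99.995% halves the downtime budget, from roughly 53 minutes to roughly 26 minutes over a 365-day year.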

Next Steps