Everybody talks about high availability, and everybody wants it. And, indeed, options exist for high availability in PostgreSQL.
But what does “high” really mean? And how do you design for HA at scale, when you need to manage tens of thousands of services? This talk walks through a real-world scenario.
Aiven manages tens of thousands of services across multiple clouds.
Offering a fully managed service does not equate to just “running some software in the cloud”. It comes with a lot of strings attached, including managing customer expectations, allowing many internal operators to work on systems they didn’t design, and handling cloud provider quirks, sometimes all at once.
Hence, a high availability solution cannot be just that, highly available; it needs to be observable and easy to operate, provide fast and simple ways out of common and less common problems, and offer clear performance and reliability guarantees to customers.
This talk will provide insights into the challenges of operating such services at scale, and how we solved them with our HA implementation, leveraging Patroni and pgBackRest. Some of the topics that will be discussed: