Fault isolation boundaries are logical divisions within your infrastructure that are intended to contain the impact of component failure.
AWS provides several infrastructure-level fault isolation boundaries, including partitions, regions, availability zones (AZs), local zones, and accounts.
These boundaries are central to designing for resilience. AWS also uses them internally to roll out new capabilities safely. Changes are deployed to one boundary at a time, so a faulty change affects only a small number of users and can be rolled back quickly.
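The boundary-by-boundary rollout described above can be sketched as a loop over deployment "waves" that halts and rolls back at the first unhealthy wave. This is a minimal illustration under stated assumptions, not AWS's actual deployment tooling. The function `staged_rollout`, the `deploy`/`healthy`/`rollback` callbacks, and the wave names are hypothetical, and the bake time and alarm monitoring a real pipeline would use are collapsed into a single health check.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RolloutResult:
    deployed: List[str] = field(default_factory=list)     # waves that received the change
    rolled_back: List[str] = field(default_factory=list)  # waves reverted after a failure
    halted_at: Optional[str] = None                       # wave whose health check failed


def staged_rollout(
    waves: List[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
) -> RolloutResult:
    """Deploy to one fault isolation boundary at a time and stop at the first failure."""
    result = RolloutResult()
    for wave in waves:
        deploy(wave)
        result.deployed.append(wave)
        # A real pipeline would bake here and watch alarms. This sketch
        # collapses that into a single (hypothetical) health check.
        if not healthy(wave):
            result.halted_at = wave
            # Revert every wave that received the change, newest first. Waves
            # later in the list never saw it, which is what limits the blast radius.
            for done in reversed(result.deployed):
                rollback(done)
                result.rolled_back.append(done)
            break
    return result


# Illustrative waves: single AZs first, then another region (names are examples only).
waves = ["us-east-1a", "us-east-1b", "us-east-1c", "us-west-2"]
outcome = staged_rollout(
    waves,
    deploy=lambda w: None,
    healthy=lambda w: w != "us-east-1b",  # pretend the change breaks in the second wave
    rollback=lambda w: None,
)
print(outcome.halted_at)    # us-east-1b
print(outcome.rolled_back)  # ['us-east-1b', 'us-east-1a']
```

In this example, the last two waves never receive the faulty change. Their users are unaffected, and the rollback only has to touch the two waves that were already deployed.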
Regions are deliberately designed to be hundreds of miles apart so that a natural disaster affecting one region should not affect its neighbors.
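In contrast, the availability zones within a region are separated by only tens of miles. That is far enough apart to isolate them from localized events such as fires or floods, but close enough that data replication between them stays fast.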
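A rough calculation shows why distance matters for replication. Synchronous replication waits for the remote copy to acknowledge each write, so every write pays at least one network round trip. The sketch below computes a physical lower bound on that round-trip time. It assumes that light in optical fiber travels at about 200,000 km/s (roughly two-thirds of its speed in a vacuum). The 30-mile and 500-mile distances are illustrative and are not published AWS figures. Real paths are longer than the straight-line distance and add switching delay, so actual latencies are higher.

```python
KM_PER_MILE = 1.609344
FIBER_KM_PER_MS = 200.0  # assumption: ~200,000 km/s in fiber, expressed per millisecond


def min_rtt_ms(distance_miles: float) -> float:
    """Lower bound on round-trip time (ms) over a straight fiber run of this length."""
    distance_km = distance_miles * KM_PER_MILE
    return 2 * distance_km / FIBER_KM_PER_MS


for label, miles in [("AZs ~30 miles apart", 30), ("Regions ~500 miles apart", 500)]:
    print(f"{label}: >= {min_rtt_ms(miles):.2f} ms RTT")
# AZs ~30 miles apart: >= 0.48 ms RTT
# Regions ~500 miles apart: >= 8.05 ms RTT
```

Even this best-case bound is more than an order of magnitude larger at region-scale distances, and the cost applies to every synchronously replicated write.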
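AWS services are implemented with a division between a control plane, which handles the APIs that create, modify, and delete resources, and a data plane, which delivers the service's day-to-day functionality. Because data planes are designed to keep running without their control planes, this split enables a concept called static stability: a workload that keeps operating during a failure without needing to make any control-plane changes.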
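One way to picture static stability is a data-plane component that caches the last-known-good configuration it pulled from a control plane. It never blocks on the control plane while serving requests. The sketch below is hypothetical. The class `StaticallyStableRouter`, its `fetch_config` callback, and the routing data do not correspond to any real AWS API. The sketch also assumes that an empty configuration is a bad push and should be rejected instead of replacing working state.

```python
from typing import Callable, Dict, Optional


class StaticallyStableRouter:
    """Hypothetical data-plane router that keeps serving if its control plane fails."""

    def __init__(self, fetch_config: Callable[[], Dict[str, str]]):
        self._fetch_config = fetch_config              # control-plane call (may fail)
        self._config: Optional[Dict[str, str]] = None  # last-known-good config

    def refresh(self) -> bool:
        """Try to pull fresh config. Return True only if it was applied."""
        try:
            new_config = self._fetch_config()
        except Exception:
            # Control plane impaired: keep the existing config untouched.
            return False
        if not new_config:
            # Treat an empty config as a bad push rather than wiping working state.
            return False
        self._config = dict(new_config)
        return True

    def route(self, key: str) -> str:
        """Data-plane path: reads only local state, never calls the control plane."""
        if self._config is None:
            raise RuntimeError("no configuration loaded yet")
        return self._config[key]


calls = {"n": 0}


def flaky_control_plane() -> Dict[str, str]:
    # Succeeds once, then simulates a control-plane outage.
    calls["n"] += 1
    if calls["n"] > 1:
        raise ConnectionError("control plane unavailable")
    return {"orders": "10.0.1.15"}


router = StaticallyStableRouter(flaky_control_plane)
router.refresh()               # first fetch succeeds
router.refresh()               # second fetch fails; the cached config is kept
print(router.route("orders"))  # 10.0.1.15
```

The design choice is that a control-plane failure degrades only the ability to change configuration. Request handling continues with the state the component already has.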
Statistically speaking, the failure of an individual availability zone is more likely than the failure of an entire region.
Implementing statically stable workloads across multiple AZs in a single region is a good first step, and many AWS services, such as Elastic Load Balancing and Amazon EC2 Auto Scaling, make it easy to operate across several AZs simultaneously.
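A common way to make a multi-AZ workload statically stable is to pre-provision enough capacity in each AZ that losing one AZ still leaves enough to serve the full load. The workload then does not need to launch new capacity (a control-plane action) during the failure. The arithmetic below is a minimal sketch. The function name and the 12-instance requirement are illustrative assumptions.

```python
import math


def per_az_capacity(required: int, num_azs: int, tolerated_az_failures: int = 1) -> int:
    """Instances to pre-provision in each AZ so that `required` capacity
    remains after losing `tolerated_az_failures` AZs, with no scaling needed."""
    surviving = num_azs - tolerated_az_failures
    if surviving < 1:
        raise ValueError("need at least one surviving AZ")
    return math.ceil(required / surviving)


for azs in (2, 3, 4):
    per_az = per_az_capacity(required=12, num_azs=azs)
    print(f"{azs} AZs: {per_az} per AZ, {per_az * azs} total")
# 2 AZs: 12 per AZ, 24 total
# 3 AZs: 6 per AZ, 18 total
# 4 AZs: 4 per AZ, 16 total
```

Spreading across more AZs lowers the overprovisioning cost of static stability, from 200% of the required capacity with two AZs to about 133% with four.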
For the most critical workloads, a multi-region architecture might be necessary, but it is significantly more expensive to run and can require substantial refactoring.
The AWS whitepaper "AWS Fault Isolation Boundaries" provides deeper insight into how individual AWS services are built for resilience, along with some hidden gotchas you may not have considered.