The Amazon Redshift Data API simplifies access to Amazon Redshift data warehouses by providing a secure HTTP endpoint for running SQL statements. This removes the need to manage drivers, database connections, network configurations, and similar infrastructure.
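Here is a minimal sketch of what calling that endpoint looks like. The Python function below submits one statement through the AWS SDK for Python (boto3), polls until the statement completes, and returns the raw records. It assumes a `redshift-data` client. The function name `run_query` and the polling interval are our own illustrative choices and are not part of the Data API.

```python
import time
from typing import Any

# DescribeStatement statuses after which a statement will not change again.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "ABORTED"}


def run_query(client: Any, sql: str, poll_seconds: float = 0.5, **target: Any) -> list:
    """Run one SQL statement over the Data API and return its first page of records.

    `client` is assumed to be a boto3 "redshift-data" client. `target` holds the
    connection parameters, e.g. WorkgroupName and Database for Redshift Serverless.
    """
    # ExecuteStatement is asynchronous: it returns a statement Id right away.
    statement_id = client.execute_statement(Sql=sql, **target)["Id"]

    # Poll DescribeStatement until the statement reaches a terminal state.
    while True:
        desc = client.describe_statement(Id=statement_id)
        if desc["Status"] in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)

    if desc["Status"] != "FINISHED":
        raise RuntimeError(f"statement {desc['Status']}: {desc.get('Error', '')}")

    # Statements such as CREATE or INSERT produce no result set.
    if not desc.get("HasResultSet"):
        return []
    # Only the first page is returned; larger results carry a NextToken.
    return client.get_statement_result(Id=statement_id)["Records"]


# Example call (hypothetical workgroup and database names):
#   import boto3
#   records = run_query(boto3.client("redshift-data"), "SELECT 1",
#                       WorkgroupName="my-workgroup", Database="dev")
```

No driver or connection object is involved. Each step is a separate HTTPS request identified only by the statement `Id`.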
A recent improvement to the Data API is session reuse, which streamlines multi-step, stateful workloads made up of sequential queries. Consecutive statements can share one database session, so session reuse reduces pipeline complexity and makes more efficient use of database connections.
With Data API session reuse, a single long-lived session can serve the entire extract, transform, and load (ETL) process. Temporary tables created in one phase remain available to later phases. This was not possible before, because each Data API call ran in its own session and temporary tables are scoped to a session. Session reuse therefore simplifies ETL pipeline execution.
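The session-reuse mechanism can be sketched as follows, assuming a boto3 `redshift-data` client. The first `execute_statement` call passes `SessionKeepAliveSeconds`, and the `SessionId` from its response is passed on every later call. Our understanding is that later calls omit the workgroup or cluster, database, and credentials because the session already carries them. Check this against the Data API reference. The helper names and the 600-second default are illustrative assumptions.

```python
import time
from typing import Any, List, Optional, Tuple

# DescribeStatement statuses after which a statement will not change again.
TERMINAL_STATUSES = {"FINISHED", "FAILED", "ABORTED"}


def _wait_until_finished(client: Any, statement_id: str, poll_seconds: float) -> None:
    """Poll DescribeStatement until the statement ends; raise if it did not succeed."""
    while True:
        desc = client.describe_statement(Id=statement_id)
        status = desc["Status"]
        if status == "FINISHED":
            return
        if status in TERMINAL_STATUSES:
            raise RuntimeError(f"statement {status}: {desc.get('Error', '')}")
        time.sleep(poll_seconds)


def run_in_session(client: Any, statements: List[str], keep_alive_seconds: int = 600,
                   poll_seconds: float = 0.5,
                   **target: Any) -> Tuple[Optional[str], List[str]]:
    """Run statements in order inside one reusable Data API session.

    Returns the session Id and the statement Ids, in submission order.
    """
    session_id: Optional[str] = None
    statement_ids: List[str] = []
    for sql in statements:
        if session_id is None:
            # First statement: connect with the normal target parameters and ask
            # the Data API to keep the session open after the statement completes.
            resp = client.execute_statement(
                Sql=sql, SessionKeepAliveSeconds=keep_alive_seconds, **target)
            session_id = resp["SessionId"]
        else:
            # Later statements: send only the SessionId, so they run in the same
            # session and can see temporary tables created earlier.
            resp = client.execute_statement(Sql=sql, SessionId=session_id)
        # Wait before submitting the next statement so the steps run in order.
        _wait_until_finished(client, resp["Id"], poll_seconds)
        statement_ids.append(resp["Id"])
    return session_id, statement_ids
```

A call such as `run_in_session(client, ["CREATE TEMP TABLE t (x INT)", "INSERT INTO t VALUES (1)"], WorkgroupName="my-workgroup", Database="dev")` uses hypothetical names. It would create and populate the temporary table in a single session.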
The article explains in detail how to create, populate, and query temporary staging tables across the full data transformation workflow, all within one persistent Amazon Redshift database session.
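The create, populate, and query workflow might look like the following sketch, again assuming a boto3 `redshift-data` client. The tables `raw_orders` and `sales_summary`, their columns, and the phase list are hypothetical placeholders. Only `stage_orders` is temporary. It survives from phase to phase only because every statement after the first is sent with the same `SessionId`.

```python
import time
from typing import Any, List, Tuple

TERMINAL_STATUSES = {"FINISHED", "FAILED", "ABORTED"}

# Hypothetical ETL phases. raw_orders and sales_summary are placeholder tables;
# stage_orders is a temporary table that exists only inside this session.
ETL_PHASES: List[Tuple[str, str]] = [
    ("create staging table",
     "CREATE TEMP TABLE stage_orders "
     "(order_id BIGINT, customer_id BIGINT, amount DECIMAL(12,2), order_date DATE)"),
    ("extract",
     "INSERT INTO stage_orders "
     "SELECT order_id, customer_id, amount, order_date FROM raw_orders"),
    ("transform",
     "DELETE FROM stage_orders WHERE amount IS NULL OR amount < 0"),
    ("load",
     "INSERT INTO sales_summary (customer_id, order_date, total_amount) "
     "SELECT customer_id, order_date, SUM(amount) FROM stage_orders "
     "GROUP BY customer_id, order_date"),
    ("verify", "SELECT COUNT(*) FROM stage_orders"),
]


def run_etl(client: Any, keep_alive_seconds: int = 900, poll_seconds: float = 0.5,
            **target: Any) -> list:
    """Run every ETL phase in one Data API session; return the last phase's records."""
    session_id = None
    statement_id = None
    for phase, sql in ETL_PHASES:
        if session_id is None:
            # The first phase opens the session and keeps it alive between phases.
            params = {"SessionKeepAliveSeconds": keep_alive_seconds, **target}
        else:
            # Later phases reuse the session, so stage_orders is still visible.
            params = {"SessionId": session_id}
        resp = client.execute_statement(Sql=sql, **params)
        session_id = session_id or resp["SessionId"]
        statement_id = resp["Id"]
        # Each phase depends on the previous one, so wait for it to finish.
        while True:
            desc = client.describe_statement(Id=statement_id)
            if desc["Status"] in TERMINAL_STATUSES:
                break
            time.sleep(poll_seconds)
        if desc["Status"] != "FINISHED":
            raise RuntimeError(f"{phase} failed: {desc.get('Error', '')}")
    # The final "verify" phase is a SELECT, so it has a result set.
    return client.get_statement_result(Id=statement_id)["Records"]
```

A failed phase stops the run before any later phase reads a half-built staging table.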
The Data API can connect to both Amazon Redshift Serverless workgroups and provisioned clusters without requiring a persistent connection to the database.
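Choosing between a serverless and a provisioned target only changes which parameters accompany each request. The hypothetical helper below builds those parameters. It uses `WorkgroupName` for a Serverless workgroup and `ClusterIdentifier` for a provisioned cluster, always adds `Database`, and optionally adds `SecretArn` (AWS Secrets Manager) or `DbUser` (temporary credentials). Our understanding is that `DbUser` applies only to provisioned clusters, because Serverless derives the database user from the IAM identity. Treat these validation rules as assumptions to verify against the Data API reference.

```python
from typing import Optional


def data_api_target(database: str, *, workgroup_name: Optional[str] = None,
                    cluster_identifier: Optional[str] = None,
                    secret_arn: Optional[str] = None,
                    db_user: Optional[str] = None) -> dict:
    """Build ExecuteStatement target parameters for a serverless or provisioned database."""
    # Exactly one kind of target must be named.
    if (workgroup_name is None) == (cluster_identifier is None):
        raise ValueError("specify exactly one of workgroup_name or cluster_identifier")
    # Secrets Manager and temporary database-user credentials are alternatives.
    if secret_arn and db_user:
        raise ValueError("use either secret_arn or db_user, not both")
    # Assumption: DbUser is only meaningful for provisioned clusters.
    if db_user and workgroup_name:
        raise ValueError("db_user applies to provisioned clusters only")

    params = {"Database": database}
    if workgroup_name:
        params["WorkgroupName"] = workgroup_name
    else:
        params["ClusterIdentifier"] = cluster_identifier
    if secret_arn:
        params["SecretArn"] = secret_arn
    elif db_user:
        params["DbUser"] = db_user
    # With neither secret_arn nor db_user, the caller's IAM identity is used.
    return params
```

Every Data API call is an independent HTTPS request. The returned dictionary can therefore be unpacked with `**` into each `execute_statement` call, and nothing has to stay open between calls.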
Relevant use cases for the Data API include accessing Amazon Redshift from custom applications written in any programming language supported by the AWS SDK, building serverless data processing workflows, designing asynchronous web dashboards, and building and scheduling SQL scripts for ETL pipelines.
Best practices for the Data API include federating IAM credentials to the database, customizing IAM policies to provide fine-grained access, and limiting the data that clients retrieve through the API to 100 MB.
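The 100 MB guideline can be enforced on the client before any rows are downloaded. The sketch below assumes the statement has already finished. It reads the `ResultSize` (in bytes) that `describe_statement` reports and refuses to fetch anything larger. Otherwise it follows `NextToken` through `get_statement_result` and converts each typed field to a Python value. The function names, and reading 100 MB as `100 * 1024 * 1024` bytes, are our assumptions.

```python
from typing import Any, List

# The 100 MB limit from the text, interpreted here as binary megabytes (an assumption).
MAX_RESULT_BYTES = 100 * 1024 * 1024

# Typed value keys that a Data API Field may carry.
VALUE_KEYS = ("stringValue", "longValue", "doubleValue", "booleanValue", "blobValue")


def field_value(field: dict) -> Any:
    """Convert one Data API Field, e.g. {"longValue": 7}, into a plain Python value."""
    if field.get("isNull"):
        return None
    for key in VALUE_KEYS:
        if key in field:
            return field[key]
    raise ValueError(f"unrecognised field: {field}")


def fetch_rows(client: Any, statement_id: str,
               max_bytes: int = MAX_RESULT_BYTES) -> List[list]:
    """Fetch all rows of a finished statement, refusing results larger than max_bytes."""
    desc = client.describe_statement(Id=statement_id)
    if desc["Status"] != "FINISHED":
        raise RuntimeError(f"statement is {desc['Status']}, not FINISHED")
    # ResultSize is reported in bytes (-1 when unknown); check it before paging.
    size = desc.get("ResultSize", -1)
    if size > max_bytes:
        raise ValueError(f"result is {size} bytes, above the {max_bytes}-byte limit")

    rows: List[list] = []
    request = {"Id": statement_id}
    while True:
        page = client.get_statement_result(**request)
        rows.extend([field_value(f) for f in record] for record in page["Records"])
        token = page.get("NextToken")
        if not token:
            return rows
        # Ask for the next page using the token from the previous response.
        request["NextToken"] = token
```

The size check runs before the first page is requested. An oversized query therefore fails fast instead of partially downloading.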
The article demonstrates the newly launched session reuse functionality of the Amazon Redshift Data API in depth, together with these best practices.