DeNA Co., Ltd. combined Amazon Redshift Serverless and dbt to run data quality tests on sensitive healthcare and medical data up to 100 times faster.
To comply with its data policies, DeNA needed to process anonymised data and test it to confirm that it contained no invalid values and that no data had been lost.
Previously, DeNA ran Python batch jobs on EC2, which led to poor performance and high costs when dealing with large datasets.
To speed up these tests, DeNA chose Redshift Serverless and dbt for their scalability, low-cost serverless pricing, and cost-performance.
Redshift Serverless, as a data warehouse service, performs well on structured data, and dbt provides a SQL-first templating engine for repeatable and extensible data transformations.
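The SQL-first testing idea can be sketched as follows. This is a minimal illustration, not DeNA's implementation: it uses Python's built-in sqlite3 as a stand-in for Redshift Serverless, and the table names, column names, and test names are all hypothetical. It follows the dbt convention that a data test is a query selecting the rows that violate a rule, so the test passes when the query returns no rows.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; the schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_records (record_id INTEGER, age INTEGER);
    CREATE TABLE anonymised_records (record_id INTEGER, age_band TEXT);
    INSERT INTO source_records VALUES (1, 34), (2, 58), (3, 71);
    INSERT INTO anonymised_records VALUES (1, '30-39'), (2, '50-59'), (3, NULL);
""")

# Each test selects the rows that break a rule (hypothetical test names).
TESTS = {
    # Invalid value check: the anonymised column must never be NULL.
    "age_band_not_null": """
        SELECT record_id FROM anonymised_records WHERE age_band IS NULL
    """,
    # Data loss check: every source row must still exist after anonymisation.
    "no_rows_lost": """
        SELECT s.record_id
        FROM source_records AS s
        LEFT JOIN anonymised_records AS a ON a.record_id = s.record_id
        WHERE a.record_id IS NULL
    """,
}


def run_tests(connection, tests):
    """Run every test query and return {test name: list of failing rows}."""
    return {name: connection.execute(sql).fetchall() for name, sql in tests.items()}


results = run_tests(conn, TESTS)
for name, failures in results.items():
    status = "PASS" if not failures else f"FAIL ({len(failures)} rows)"
    print(f"{name}: {status}")
```

Because the checks are plain SQL, the warehouse does the scanning and only the failing rows travel back to the client, which is one plausible reason such tests scale better than row-by-row batch code.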
DeNA ran dbt on Amazon ECS with AWS Fargate, which is serverless and billed per use. Sensitive credentials stored in AWS Secrets Manager were passed to the containers through an ECS task execution IAM role.
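One way such a task could be described is sketched below. This is an assumption-laden example, not DeNA's configuration: the family name, image, ARNs, CPU and memory sizes, and the `DBT_PASSWORD` variable name are all hypothetical placeholders. It builds an ECS task definition as a plain Python dictionary, using the `secrets` field of a container definition so that ECS injects the Secrets Manager value as an environment variable at start-up.

```python
import json


def build_dbt_task_definition(image, execution_role_arn, secret_arn):
    """Build a Fargate task definition dict for a container that runs dbt.

    All arguments are placeholders supplied by the caller.
    """
    return {
        "family": "dbt-data-quality-tests",  # hypothetical family name
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",  # Fargate tasks use the awsvpc network mode
        "cpu": "512",      # illustrative sizing only
        "memory": "1024",  # illustrative sizing only
        # The execution role is what ECS uses to read the secret on start-up.
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": "dbt",
                "image": image,
                "command": ["dbt", "test"],
                "essential": True,
                # ECS resolves valueFrom and exposes it as an env variable,
                # so the credential never appears in the task definition.
                "secrets": [
                    {"name": "DBT_PASSWORD", "valueFrom": secret_arn},
                ],
            }
        ],
    }


# Hypothetical identifiers, for illustration only.
task_def = build_dbt_task_definition(
    image="123456789012.dkr.ecr.ap-northeast-1.amazonaws.com/dbt:latest",
    execution_role_arn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    secret_arn="arn:aws:secretsmanager:ap-northeast-1:123456789012:secret:dbt-password",
)
print(json.dumps(task_def, indent=2))
```

With boto3, a dictionary of this shape could be registered via `ecs_client.register_task_definition(**task_def)`, provided the execution role is allowed to call `secretsmanager:GetSecretValue` on the secret. Inside the container, a dbt profile could then read the value with `env_var('DBT_PASSWORD')`.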
Redshift Serverless was split into separate workgroups to isolate access, and database security features such as the GRANT command provided fine-grained control over access to data.
DeNA improved performance by up to 100x and cut costs by 90% by paying only while data quality tests run.
Standardising the technical stack on dbt and its data tests feature improved maintainability by eliminating the siloed knowledge that had built up around custom programs.
AWS serverless services almost entirely eliminated operational overhead for managing the workload infrastructure.