About downtime on 2020-03-09
What happened
- During March, we deployed many changes to our continuous integration and delivery system with the aim of improving environment reproducibility.
- On March 9, the scheduled job in charge of deploying a new Integrates version failed.
- We deploy Integrates from this system automatically in order to rotate our AWS keys. Because the scheduled deployment failed, the AWS keys in the previous version of Integrates (which customers were accessing on March 8) expired.
- As a result, our back-end could not communicate with AWS services such as DynamoDB (the database) and S3 (finding evidence), among others, which caused downtime in the service.
What we’ve done
- We detected the issue on March 9 and manually deployed a new version of Integrates with fresh AWS keys.
What’s the impact
- User and finding data were unavailable between 4 a.m. and 7 a.m. (Colombian time).
What we are doing to help
- We fixed the continuous delivery system and verified that it works as expected. We are investigating ways to avoid future occurrences of expired AWS keys in the system.
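
As an illustration only, and not a description of the fix actually adopted, one way to catch this failure mode earlier is a periodic check that flags keys approaching expiry, independent of whether the deployment job ran. The sketch below assumes a fixed key lifetime and warning margin. `KEY_LIFETIME`, `WARNING_MARGIN`, and `key_status` are hypothetical names and values, and the timestamps are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical values: the real key lifetime and alert margin are not
# stated in this report.
KEY_LIFETIME = timedelta(hours=24)
WARNING_MARGIN = timedelta(hours=6)


def key_status(issued_at: datetime, now: datetime) -> str:
    """Classify a key as 'ok', 'expiring_soon', or 'expired'."""
    expires_at = issued_at + KEY_LIFETIME
    if now >= expires_at:
        return "expired"
    if now >= expires_at - WARNING_MARGIN:
        return "expiring_soon"
    return "ok"


if __name__ == "__main__":
    # Illustrative timestamps only, not taken from the incident.
    issued = datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc)
    for hours in (1, 19, 25):
        print(hours, key_status(issued, issued + timedelta(hours=hours)))
```

A monitor that raises an alert on `expiring_soon` would leave time to rotate keys manually if the scheduled deployment has failed.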