July 11th 2022 10:12AM EST Downtime Analysis & Remediation
Cause: Live BE server stopped responding. Actions: RDS server was still responding, but showing 20 open connections. EC2 server was not responding: SSH unresponsive, Jenkins (HTTP) unresponsive. AWS reboot of the EC2 instance never finished (5m). AWS stop instance never finished (5m). AWS force stop took 3m+, eventually stopped. Instance manually restarted via AWS console. Forced PM2 restart […]
Exec Review: 06/28/2022 Sales Order Module Offline
Overview: The Sales Order module was originally built by the TP team as the “Purchase Order” module. It was extremely simplistic, built on poor data structures and API controllers, and did not meet the client's needs. In Q3 2021, a significant update to the Sales Order module was requested. The new features required notable changes […]
Apr 28 2022 9:35 – 10:35 AM Downtime Analysis
9:44AM EST: a report came in that tenants could not log in to production. The Live BE server on EC2 was non-responsive when trying to log in to Jenkins. Rebooted the EC2 instance after viewing the monitoring reports. At approximately 9:35AM, CPU spiked to near 100%. Started a rebuild process on Jenkins to force the API to do a […]