Exec Review : 06/28/2022 Sales Order Module Offline

Overview

The Sales Order module was built originally by the TP team as the “Purchase Order” module. It was extremely simplistic, built on poor data structures and API controllers, and did not meet the client needs. In Q3 2021 a significant update to the Sales Order module was requested. The new features required notable changes to the data structures and underlying code of this module.

One feature in particular, the “order status indicator” was not designed in a way that was scalable or performant. Concerns were noted over the possible performance issues of the architecture. After considering customer demands, the delivery timeline, and the expected level of use for the Sales Order module, the decision was made to publish the new features and put the module on the development schedule for a future refactor.

In late June 2022, the use of the Sales Order module increased, adding pressure to the non-performant code. This caused extensive delays with the user experience to the point that the API server could not get data back to the user before the 300 second browser time out on the front end application.

Cause

Poorly Performing SQL Queries

Some of the SQL queries that were designed for the sales order status indicator were poorly structured. One query in particular has come up in data analysis reports and has been cited numerous times as needed to be restructured. The query was designed using complex case/when statements that were then fed to coalesce statements. This query cannot be optimized by the PostgreSQL data engine and thus leads to sequential scans of the database. One of the scans is on the largest data table in the system, the inventories table. This one subquery, a single part of the overall data query, could take up to 20 seconds to complete on its own.

Lack Of Consideration Of Data Indices

Closely related to the poorly performing data queries was a lack of consideration of building proper data indexes. The status indicator query alone involves over a dozen relations and subqueries across multiple tables. Some of the tables are extensive such as the inventory table or production (batches) table. In order to quickly filter and sort data proper indexes are necessary.

None of the complex filters, such as limiting record sets by status plus tenant ID, had indexes created on the tables that needed to be filtered. As such asking for data such as “all completed batches for tenant #1” would take 10-20x longer to complete resulting in 3-5 second response times. Multiple that by a dozen similar queries and the performance is already in the “minute-plus” range. Not acceptable for user experience.

Poor Data Structures

New data structures were put in place to help support the sales order queries, among other features of the application. However, much of the existing core Sales Order data structures were left unchanged. Not enough time was available to create the newly requested features AND rewrite the entire existing Sales Order front end module , API controller, and data processors. In addition, the pressure to deliver the module quickly meant taking a simplistic approach to the data stores. Existing data structures were extended as the overhead to add meta and summary tables would have added 3-4 weeks of development time to the schedule. The same “revisit later and refactor” decision was applied to this situation.

No Front End UX Controls

Very little, if any, consideration has been taken across the OmniBlocks application to manage poor user behavior. The original application was designed with a “single user/single state” mindset. While some of this has been corrected, and extensive rewrite of the primary user interfaces is necessary for managing user interactions.

One of the actions that is not prevented in the UX is allowing users to “jump between modules” in the middle of a data intensive operation. This includes “reloading the same module”.

In the case of Sales Orders, the 2-minute-plus response time seems to have caused users to “assume something was wrong” and thus click on the “Sales Order” sidebar entry a second time hoping for a faster response. This sends a second data query to the server, starting a parallel set of data queries that can take several minutes to respond. This compounds the issue as the first query is still running on an identical data set. The propensity for these identical queries to get into lock/wait states on the data set is extremely high, slowing down BOTH queries significantly. By the time the user does this the third time the API server and database are nearly non-responsive with 10-minute-plus wait times.

Resolution

Actions Taken

Remove The Real Time Status Indicator

One of the requested features on the Sales Order initial listing was to add a “Status Indicator” to all Sales Orders. The data query that is performed to manage this in real-time is the main “poor performance” SQL query noted in the causes section.

The status indicator has been removed from the listing and now requires manual interaction to check a box and turn on that indicator. The code has also been updated to have the basic Sales Order records be listed first with the status indicator doing background queries and updating each entry on the table as results are returned. Each indicator can take up to a minute to render; This is not the best user experience but is an acceptable stop gap while proper “deeper fixes” are architected.

This change prevents the data intensive query from running by default, creating significantly less load on the data server.

Building Data Indexes

An analysis of the data query and subquery stack was performed to create a set of potential data index candidates. Over two dozen SQL queries were analyzed and nearly a dozen new database indexes were added to the system. These updates address the most time consuming queries related to the Sales Order module. Some queries returned a 20-fold increase in performance; in one case reducing a 30-second inquiry to sub 1-second.

Some of the indexes that were created have improved performance of other modules in the application.

More data query analysis and index analysis is needed across the application.

Remaining Areas of Concern

Unclear Standard For Sales Order Architecture

The Sales Order module has gone through several changes and feature request from clients. In order to perform the refactor noted below, time must be taken to interview the clients and combine their input with our internal knowledge of the Sales Order system. A proper needs and feature assessment must be performed so the development team can create the right architecture before performing the full Sales Order module refactor.

Much of the existing data structure are based on legacy knowledge and is likely incorrect based on the original development team’s interpretation of what was needed. That team didn’t document anything and simply started creating data structures and code. The current team has limited input on what the original request and feature set was. As such decisions are made to keep legacy elements intact, much of which may not be needed.

A clear requirements assessment will significantly improve the results of a refactor of this module.

Sales Order Refactoring

The current patches noted above are temporary stop-gap measures meant to get the system functional as quickly as possible. They are considered “band aids” for a much larger endeavor, the Sales Order Refactor.

This includes addressing the current data structures, their related API endpoint controllers and data processors. It also includes reworking much of the existing complex data queries. An extensive subsystem of metadata tables needs to be created to avoid doing real-time data status calculations across multiple tables; instead triggers during data changes need to be employed to augment stored calculations for given data sets.

The entire Sales Order module has been on the refactor list since the initial “additional features” request was made in Q2/Q3 2021. It has continually been pushed down the list below nearly all other modules including the Production Refactor (HoFSM), Formulas Refactor, Inventory Refactor, and even the addition of a new QC module needed for better FDA compliance. The decision to rank it below these modules was due to the fact that “nobody is using it that much”; Which apparently changed in late June.

Despite the new focus on the Sales Order module, this probably remains a lesser used module compared to the risks of not finishing the Production systems refactor or getting the Field subsystem of the QC module finished (which ties into phase 3 of the production refactor).

Bottom line — more resources are needed to address the known potential “fragile, non-performance, and non-scalable” pieces of the application in a timely fashion.

Lack Of Proper Data Indexes

Several efforts have been made to address slow performing data queries over the past 18 months. This has been addressed in an ad-hoc fashion, only analyzing queries and potential indexes that improve performance when the UX is negatively impacted. A concerted effort needs to be taken to review the bulk of SQL queries present the API controller and data processors and build proper indexes to better support the queries.

Recommended Actions

Full Sales Order Assessment

As noted above, perform a proper needs assessment and get buy-in from all parties. Involve the R&D team from the beginning and engage in customer conversations. Use this information to create the right data architecture, derive the proper data processors, API endpoint controllers, and eventually a new front end User Experience.

Resource Estimate: (1) Sales/Customer Service Rep x 1 week , (1) Full Stack Engineer x 1 week, (1) Backend/DB engineer x 1 week.

Sales Order Refactor

As noted above, a complete refactor is necessary to fix the underlying core problems with the Sales Order module that remain. Data structure changes, query changes, adding meta tables. This is a significant “lift” for the dev team and needs to be on the radar.

Resource Estimate: (1) Full Stack Engineer x 4 weeks, (1) Front End Engineer x 2 weeks

Employ Modern Data Management Techniques

The complexity of the Sales Order inquiries sheds light on the necessity and benefits of employing modern data management and access techniques. The current OmniBlocks application is somewhat “monolithic” with all tenants and all data store in several large data tables. While the tables are not excessively large by today’s standards, they should be better managed. Two active clients are already generating 60GiB of data in 6 months (though this has somewhat been remedied by smarter audit logs, slowing the velocity to a couple GiB/Month).

Adding views to create subsets of data, considering sharding the database, and offloading some data sets such as the Audit Log to a NoSQL or low-velocity data store in a separate database & SQL server are some of the options that need to be considered.

Resource Estimate: (1) DB Admin or highly qualified BE engineer for 4+ weeks.

Tagged downtime analysis, executive report, OmniBlocks®