KT Audit Products Feature
General Overview
The audit products feature enables the operations team to easily validate recent changes to product status. The main problem today is that products are made unavailable (for example, due to a stock shortage) and people then forget to make them available again, which leads to lost sales. For detailed information on the business problem, please check [Opportunity] Ability to audit per store available and unavailable products (DOP).
The main challenge in this feature is that product availability management is a complex process: the required data is distributed across different data sources (Sanity CMS and DynamoDB) and requires filtering large data sets, especially in the BK production environment.
Our chosen approach to this problem was to create a Redis cache, where we keep the data centralized and enable fast access for report generation. In the next sections we’ll dive deeper into each component of this feature; for a UI/UX overview, please check the Figma discovery page.
Initial Cache Load
As previously stated, part of the data needed to generate the audit reports comes from Sanity, and part comes from the DynamoDB menu-data table. The initial load is triggered by the AuditProductsReport query (query resolver code) the first time a user tries to access the audit products report for a market, and the query can be polled until the report is ready.
AuditProductsReport
Query Response:
{
"data": {
"auditProductsReport": {
"isReportReady": BOOLEAN, // Whether the report has finished loading
"report": [AUDIT_PRODUCTS_ROW] // Report data. Returns an empty list while loading
}
}
}
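As a rough illustration of the front-end side, the polling loop below keeps calling the query until isReportReady is true. It is a sketch under assumptions: fetchReport stands in for the actual GraphQL call, and pollAuditProductsReport, the 2-second interval and the 30-attempt limit are hypothetical choices, not values taken from the DOP front-end.

```typescript
// Shape of the AuditProductsReport payload (row type left generic).
interface AuditProductsReportResult<Row> {
  isReportReady: boolean; // false while the cache load is still running
  report: Row[]; // empty list while loading
}

const defaultSleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: calls fetchReport until the report is ready or
// maxAttempts is reached. fetchReport stands in for the GraphQL query.
async function pollAuditProductsReport<Row>(
  fetchReport: () => Promise<AuditProductsReportResult<Row>>,
  intervalMs = 2000,
  maxAttempts = 30,
  sleep: (ms: number) => Promise<void> = defaultSleep,
): Promise<Row[]> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await fetchReport();
    if (result.isReportReady) {
      return result.report;
    }
    // Still loading: wait before the next attempt (no wait after the last one).
    if (attempt < maxAttempts) {
      await sleep(intervalMs);
    }
  }
  throw new Error(`Report not ready after ${maxAttempts} attempts`);
}
```

Injecting sleep keeps the loop easy to test without real timers.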
Given that the Sanity data is relatively small, it is cached directly by the AuditProductsReport query, but this is not the case for the menu-data table. To avoid exceeding the Lambda execution time limit and to improve load performance, the AuditProductsReport query splits the menu-data load into batches of 20 restaurants and loads them in parallel using SNS and the load-audit-products-cache lambda (lambda handler code).
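The batching step can be pictured as splitting the market's store IDs into chunks of 20 and publishing one SNS message per chunk. This is a minimal sketch, not the resolver code: the message shape (market, batchId, storeIds), the batch ID format and the fanOutMenuLoad helper are hypothetical, and the publish function is injected instead of calling the AWS SDK so the example stays self-contained.

```typescript
const BATCH_SIZE = 20; // restaurants per load-audit-products-cache invocation

// Hypothetical SNS message body consumed by the load lambda.
interface LoadBatchMessage {
  market: string;
  batchId: string;
  storeIds: string[];
}

// Split items into consecutive chunks of at most `size` elements.
function chunk<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Build one message per batch and hand each one to `publish`
// (in production this would wrap an SNS publish call).
async function fanOutMenuLoad(
  market: string,
  storeIds: string[],
  publish: (message: LoadBatchMessage) => Promise<void>,
): Promise<LoadBatchMessage[]> {
  const messages = chunk(storeIds, BATCH_SIZE).map((ids, index) => ({
    market,
    batchId: `${market}-batch-${index}`,
    storeIds: ids,
  }));
  // Publish all batches concurrently so the lambdas run in parallel.
  await Promise.all(messages.map((message) => publish(message)));
  return messages;
}
```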
The load process keeps track of its current status using 3 different records in Redis:
Record | Description
sanityCache:info | Status of the Sanity cache
menuCache:info | Status of the menu-data table cache
lambdaLoadStack | Status of the parallel lambdas in execution
Concerning the lambdaLoadStack: for every restaurant batch created, the AuditProductsReport query adds the batch ID to this stack, and the load-audit-products-cache lambda that processes the batch then removes it. If the stack is empty after removing the current ID, the lambda also updates menuCache:info to mark the load as completed.
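The completion check can be sketched with an in-memory stand-in for the two Redis records. This is an illustration under assumptions: the MenuLoadTracker class and the LOADING/COMPLETED status values are hypothetical, and the logic only works if the query pushes every batch ID before publishing any batch; otherwise a fast lambda could find the stack empty too early.

```typescript
type LoadStatus = "LOADING" | "COMPLETED"; // hypothetical status values

// In-memory stand-in for the Redis records used during the initial load.
class MenuLoadTracker {
  private readonly pendingBatches: string[] = []; // plays the role of lambdaLoadStack
  menuCacheStatus: LoadStatus = "LOADING"; // plays the role of menuCache:info

  // Called by the AuditProductsReport query for every batch it creates.
  pushBatch(batchId: string): void {
    this.pendingBatches.push(batchId);
  }

  // Called by load-audit-products-cache after processing its batch.
  // Returns true when this call is the one that completes the load.
  completeBatch(batchId: string): boolean {
    const index = this.pendingBatches.indexOf(batchId);
    if (index !== -1) {
      this.pendingBatches.splice(index, 1);
    }
    if (this.pendingBatches.length === 0 && this.menuCacheStatus !== "COMPLETED") {
      this.menuCacheStatus = "COMPLETED";
      return true;
    }
    return false;
  }
}
```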
Generating Reports
Since we need to apply multiple filters to a large dataset and display the most recently updated data first, we opted to use Redis sorted sets to filter and paginate these reports. Besides caching the Sanity and DynamoDB data, during the initial load we also create a Redis sorted set for each filter value we intend to apply in the report, and at query time we use the zunionstore (union of sorted sets), zinterstore (intersection of sorted sets) and zrevrange operations to produce the result. The sets are ordered by date of last update. Find below a visual representation of the cache data structure.
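To make the zunionstore / zinterstore / zrevrange combination concrete, the sketch below reimplements their semantics in memory, with each sorted set mapping a menu-cache item ID to a score assumed to be the last-update timestamp. It uses AGGREGATE MAX semantics (an option both ZUNIONSTORE and ZINTERSTORE accept) so an item's score stays equal to its timestamp; whether the actual resolver uses MAX or the default SUM is not covered here. The helper names are hypothetical.

```typescript
// A sorted set modeled as member -> score (score = last-update timestamp, assumed).
type SortedSet = Map<string, number>;

// Union with AGGREGATE MAX semantics.
function zunion(sets: SortedSet[]): SortedSet {
  const result: SortedSet = new Map();
  sets.forEach((set) =>
    set.forEach((score, member) => {
      const current = result.get(member);
      result.set(member, current === undefined ? score : Math.max(current, score));
    }),
  );
  return result;
}

// Intersection with AGGREGATE MAX semantics: keep members present in every set.
function zinter(sets: SortedSet[]): SortedSet {
  const result: SortedSet = new Map();
  if (sets.length === 0) return result;
  const [first, ...rest] = sets;
  first.forEach((score, member) => {
    const scores = rest.map((other) => other.get(member));
    if (scores.every((s) => s !== undefined)) {
      result.set(member, Math.max(score, ...(scores as number[])));
    }
  });
  return result;
}

// ZREVRANGE start stop: highest score first, inclusive stop index,
// ties broken by descending member order, as Redis does.
function zrevrange(set: SortedSet, start: number, stop: number): string[] {
  return Array.from(set.entries())
    .sort((a, b) => (b[1] !== a[1] ? b[1] - a[1] : a[0] > b[0] ? -1 : a[0] < b[0] ? 1 : 0))
    .slice(start, stop + 1)
    .map((entry) => entry[0]);
}
```

For example, unioning two store sets, intersecting with an availability set and reading indexes 0 to 100 returns the matching items newest first.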
Also, find below an example of the AuditProductsReport query input applying all possible filters to the report and paginating indexes 0 to 100.
Only the Store IDs filter can receive more than one value, which requires the union of the store ID sorted sets. All other filters accept only a single value.
{
"filters": {
"storeIds": [
"2222"
],
"section": "Euroking",
"availability": true,
"channel": "whitelabel",
"serviceMode": "pickup",
"type": "Item"
},
"pagination": {
"start": 0,
"end": 100
}
}
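The filter input above can be translated into the command sequence described in the previous section. The sketch below only builds the commands as argument arrays and never talks to Redis; buildReportCommands and every key name in it (audit:{market}:store:{id}, the tmp keys, and so on) are a hypothetical naming scheme, not the one used by the resolver.

```typescript
// Filter input of the AuditProductsReport query (all fields optional except storeIds).
interface AuditFilters {
  storeIds: string[];
  section?: string;
  availability?: boolean;
  channel?: string;
  serviceMode?: string;
  type?: string;
}

interface Pagination {
  start: number;
  end: number;
}

// Builds the Redis commands (as argument arrays) for one report page.
// Key names are hypothetical.
function buildReportCommands(market: string, filters: AuditFilters, page: Pagination): string[][] {
  const prefix = `audit:${market}`;
  const storesKey = `${prefix}:tmp:stores`;
  const resultKey = `${prefix}:tmp:result`;

  // 1. Store IDs is the only multi-value filter: union its sets.
  const storeKeys = filters.storeIds.map((id) => `${prefix}:store:${id}`);
  const commands: string[][] = [["ZUNIONSTORE", storesKey, String(storeKeys.length), ...storeKeys]];

  // 2. Each single-value filter that is set adds one more set to intersect.
  const filterKeys: string[] = [storesKey];
  if (filters.section !== undefined) filterKeys.push(`${prefix}:section:${filters.section}`);
  if (filters.availability !== undefined) filterKeys.push(`${prefix}:availability:${filters.availability}`);
  if (filters.channel !== undefined) filterKeys.push(`${prefix}:channel:${filters.channel}`);
  if (filters.serviceMode !== undefined) filterKeys.push(`${prefix}:serviceMode:${filters.serviceMode}`);
  if (filters.type !== undefined) filterKeys.push(`${prefix}:type:${filters.type}`);
  commands.push(["ZINTERSTORE", resultKey, String(filterKeys.length), ...filterKeys]);

  // 3. Most recently updated first, inclusive index range.
  commands.push(["ZREVRANGE", resultKey, String(page.start), String(page.end)]);
  return commands;
}
```

Unioning the store sets first keeps the intersection at one set per active filter, however many stores are selected. A real implementation would also need per-request temporary keys (or an expiry on them) so concurrent queries do not overwrite each other's intermediate results.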
Redis Cache Update
Once again, we use a different cache update strategy for each of our two data sources.
DynamoDB menu-data Table
For the menu-data table, we use a DynamoDB stream to trigger the load-audit-products-menu-sync lambda (lambda handler code) every time a product status is updated. This keeps the reports up to date with the most recent availability changes.
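With a NEW_AND_OLD_IMAGES stream view type, each stream record carries the item before and after the update, which is enough to detect an availability change and move the item between sorted sets. The sketch below is not the load-audit-products-menu-sync handler: it declares minimal local types instead of importing @types/aws-lambda, assumes hypothetical id and isAvailable attributes, and updates an in-memory map instead of Redis. A complete handler would presumably also refresh the item's score in its other filter sets so the most-recently-updated ordering stays correct.

```typescript
// Minimal subset of the DynamoDB stream event shape (typed attribute values, e.g. { BOOL: true }).
type AttributeValue = { S?: string; N?: string; BOOL?: boolean };
type Image = Record<string, AttributeValue>;

interface StreamRecord {
  eventName: "INSERT" | "MODIFY" | "REMOVE";
  dynamodb: {
    ApproximateCreationDateTime?: number; // seconds since epoch
    OldImage?: Image;
    NewImage?: Image;
  };
}

interface StreamEvent {
  Records: StreamRecord[];
}

// Stand-in for the availability sorted sets: "true"/"false" -> (item id -> last-update ms).
type AvailabilitySets = Record<"true" | "false", Map<string, number>>;

const setKey = (available: boolean): "true" | "false" => (available ? "true" : "false");

// Hypothetical sync logic: moves an item between availability sets when the flag changes.
// Returns the number of items that changed.
function syncAvailability(event: StreamEvent, sets: AvailabilitySets): number {
  let changed = 0;
  for (const record of event.Records) {
    if (record.eventName !== "MODIFY") continue;
    const { OldImage, NewImage, ApproximateCreationDateTime } = record.dynamodb;
    const id = NewImage?.id?.S;
    const before = OldImage?.isAvailable?.BOOL;
    const after = NewImage?.isAvailable?.BOOL;
    if (id === undefined || after === undefined || before === after) continue;

    const updatedAt = (ApproximateCreationDateTime ?? 0) * 1000;
    // Equivalent of ZREM on the old set and ZADD on the new one.
    sets[setKey(!after)].delete(id);
    sets[setKey(after)].set(id, updatedAt);
    changed++;
  }
  return changed;
}
```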
Sanity CMS
In Sanity’s case, there is no equivalent of a DynamoDB stream to provide real-time cache updates. Fortunately, the data that comes from Sanity (sections, name, internal name) changes much less frequently than availability. Therefore, we configured the Sanity cache to expire after 24 hours. If an AuditProductsReport query is triggered after the cache has expired, it reloads only the Sanity cache.
IMPORTANT: The Section and Type fields are special because they come from Sanity but can also be used as filters. This means that a change to a Section (a Sanity document's Type cannot change) must be reflected in the section filter sorted sets.
However, there is no simple way to track which menu-data cache items are associated with which products, given that the sorted sets are indexed by the menu-cache item ID. Therefore, we calculate and store the sha256 of the Sanity sections data, and if it ever changes, the current market data is dropped and a new cache load is triggered from scratch.
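The two Sanity refresh rules can be combined into a single decision, sketched below with Node's built-in crypto module and assuming the section hash is compared when the expired Sanity cache is reloaded. The state shape, decideSanityRefresh and the decision names are hypothetical. Hashing JSON.stringify output also assumes the sections data is serialized in a stable key and array order; otherwise identical data could produce a different hash and trigger an unnecessary full reload.

```typescript
import { createHash } from "crypto";

const SANITY_TTL_MS = 24 * 60 * 60 * 1000; // Sanity cache expires after 24 hours

type RefreshDecision = "NONE" | "RELOAD_SANITY" | "FULL_RELOAD";

// Hypothetical snapshot of what is stored alongside the Sanity cache.
interface SanityCacheState {
  loadedAt: number; // epoch ms when the Sanity cache was last loaded
  sectionsHash: string; // sha256 of the sections data at that time
}

function sha256(data: unknown): string {
  return createHash("sha256").update(JSON.stringify(data)).digest("hex");
}

// Decide what to do when AuditProductsReport runs. `fetchSections` stands in
// for the Sanity query and is only called once the cache has expired.
function decideSanityRefresh(
  state: SanityCacheState,
  now: number,
  fetchSections: () => unknown,
): RefreshDecision {
  if (now - state.loadedAt < SANITY_TTL_MS) {
    return "NONE"; // cache still valid
  }
  const freshHash = sha256(fetchSections());
  // Changed sections make the section sorted sets stale: drop the market and reload everything.
  return freshHash === state.sectionsHash ? "RELOAD_SANITY" : "FULL_RELOAD";
}
```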
Export Reports in CSV format
Similar to the regular reports, the CSV report can be generated using the ExportAuditProductsCsv query (resolver code here). Once again, the front-end should start polling and wait until report generation has finished; once it has, the back-end returns an S3 pre-signed URL for the file download.
ExportAuditProductsCsv
Query Response:
{
"data": {
"exportAuditProductsCsv": {
"downloadUrl": STRING, // URL for the report download
"isReportReady": BOOLEAN // Whether the report generation has finished
}
}
}
Using S3 as an intermediary step is necessary due to the large size of the dataset, which would easily exceed the API Gateway payload limit. Similar to the initial load, report generation is also split across multiple lambdas to improve performance. Find below a diagram detailing this process:
Once again, the query splits the report into batches that are sent to SNS and trigger the load-audit-products-csv-handler lambda (lambda handler code), which uploads that batch’s partial report to S3. When the last lambda executes, it also creates a reports/${reportId}/finished.txt file in S3, signaling that all batches have been processed. This S3 event triggers the load-audit-products-merge-csv lambda (lambda code here), which in turn will:
Merge all partial reports
Upload the final csv to S3
Generate the pre-signed URL
Update the report status record in Redis with the download URL
At this point, the front-end polling receives the download URL, stops polling and starts the file download.
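The merge step of load-audit-products-merge-csv can be illustrated with plain strings. In this sketch, which leaves out the S3 reads and writes and the pre-signed URL generation, each partial report is assumed to start with the same header row and to contain no line breaks inside quoted fields; mergePartialCsvs and the batchIndex field are hypothetical. The partials are concatenated in batch order and the header is kept once.

```typescript
// Hypothetical partial report: batch index plus the CSV text uploaded by one csv-handler lambda.
interface PartialReport {
  batchIndex: number;
  csv: string;
}

// Merge partial CSV reports into one file, keeping a single header row.
function mergePartialCsvs(parts: PartialReport[]): string {
  const ordered = parts.slice().sort((a, b) => a.batchIndex - b.batchIndex);
  let header: string | undefined;
  const rows: string[] = [];

  for (const part of ordered) {
    // Assumes no quoted field contains a line break.
    const lines = part.csv.split(/\r?\n/).filter((line) => line.length > 0);
    if (lines.length === 0) continue;
    const [partHeader, ...partRows] = lines;
    if (header === undefined) {
      header = partHeader;
    } else if (partHeader !== header) {
      throw new Error(`Batch ${part.batchIndex} has a different header`);
    }
    rows.push(...partRows);
  }
  return header === undefined ? "" : [header, ...rows].join("\n") + "\n";
}
```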
Feature Toggles
The only feature toggle used to control the release of this feature is enable-audit-products. It controls whether the Review Restaurants button is shown to the user; without it, the report is inaccessible.
FT Off
FT On
Deployment
Environment
To track which environments this feature is currently deployed to, check the tags of this commit (the last fix commit of the audit products feature). As of now (31/03/2025) it is deployed in the DEV environment, and it should reach Staging, QA and PROD in the coming weeks.
Regions
Since deploying this feature requires provisioning an ElastiCache cluster, which generates extra infrastructure costs, we limited its deployment to Iberia only (EUW3 AWS region). To enable it for new markets, the following changes are required:
Serverless
Update the enable filter for the lambda deployments:
Terraform
Include the elasticache_node_type parameter for the desired region in Terraform. Example: live/prod/bk/main.tf
Known Issues
1. Search by Name
The operations team also requested the ability to search products by name, but we did not have enough time to finish it.
The free version of Redis (we use the open-source Valkey engine) does not include full-text search, so the simplest solution would be to load the entire cache into memory and perform a partial string match on the name field.
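The proposed in-memory workaround could look like the sketch below. AuditRow is a hypothetical subset of AUDIT_PRODUCTS_ROW, and the accent-insensitive matching is an extra design choice (useful for Spanish and Portuguese product names), not a stated requirement. Every search scans all cached rows, which is the trade-off of not having full-text search in the cache.

```typescript
// Hypothetical subset of an AUDIT_PRODUCTS_ROW.
interface AuditRow {
  id: string;
  name: string;
  updatedAt: number;
}

// Lower-case and strip diacritics so "cafe" also matches "Café".
function normalize(text: string): string {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Partial, case- and accent-insensitive match on the name field.
// Input order (most recently updated first) is preserved.
function searchByName(rows: AuditRow[], query: string): AuditRow[] {
  const needle = normalize(query.trim());
  if (needle.length === 0) return rows;
  return rows.filter((row) => normalize(row.name).includes(needle));
}
```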
2. Generating the report for a restaurant, then going back
Steps to reproduce
Select a restaurant on the Restaurant Page
Click Review Restaurant
Click Cancel to go back
Select another restaurant
Click Review Restaurant
Expected result
The report for the second restaurant loads
Actual result
The Review Restaurant page still shows the report for the previous restaurant. Also, any filters selected in the previous report are still applied to the current one.
Improvements
1. Improve loading messages
Currently, while the report is loading, it only displays the word "loading" instead of the standard DOP loading indicator or another UX signifier.
The same happens when loading the next page of the infinite scroll:
2. Add a Clear Filters button
There should be a Clear Filters button above the filters that resets all filters to their initial state, according to the Figma design.
Tests
The test plan is documented here: https://rbictg.atlassian.net/wiki/x/nACbUgE
KT video
Part1:
Part2:
Part3:
Part4: