Digital Guest Care Ticket Data Ingestion
| Jira Initiatives | TBC |
| --- | --- |
| Project Status | Draft |
| Created On | Jul 8, 2024 |
| Key Business Stakeholders | @Siobhan Van Der Kley @Jonathan Brown @Victoria Quan @Rayand Ramlal |
| Engineering | @Anirudha Porwal |
| Due Date | TBD |
Objective
Digital Business and Guest Care want to access the guest care ticket data coming through the Digital platforms. This will allow end users to query the data for routine and ad hoc descriptive reporting, and to build dashboards that give key stakeholders and leadership team members self-service insights into guest care performance. An example use case is identifying large swings in ticket volume linked to specific restaurants, allowing Digital to direct Field Team resources to those restaurants to investigate potential operational challenges. Lastly, it is envisioned that ML modelling (e.g. NLP) will be applied to this data to extract sentiment and surface leading indicators. To that end, this set of requirements focuses on ingesting this data into Digital’s Databricks environment.
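To illustrate the restaurant-swing use case, the sketch below computes week-over-week ticket volume per restaurant in PySpark. The target table name (loyalty.guest_care.tickets_with_feedback) and the 50% swing threshold are placeholders, not confirmed values.

```python
# Illustrative only: week-over-week ticket volume per restaurant.
# Table name and the 50% swing threshold are assumptions, to be confirmed.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

tickets = spark.table("loyalty.guest_care.tickets_with_feedback")  # hypothetical name

weekly = (
    tickets
    .withColumn("week", F.date_trunc("week", F.col("Create_Time")))
    .groupBy("Store_ID", "week")
    .agg(F.count("Ticket_ID").alias("ticket_count"))
)

# Flag stores whose weekly ticket volume jumps by more than 50% versus the prior week.
w = Window.partitionBy("Store_ID").orderBy("week")
swings = (
    weekly
    .withColumn("prev_count", F.lag("ticket_count").over(w))
    .withColumn("pct_change", (F.col("ticket_count") - F.col("prev_count")) / F.col("prev_count"))
    .filter(F.col("pct_change") > 0.5)
)
swings.show()
```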
Requirements
The following are the requirements from Business and Digital Guest Care:
Data
All unmasked data fields from Snowflake table BRAND_TH.ZENDESK.TICKETS_WITH_FEEDBACK (sample: )
For CAN & US
For all periods available (at minimum from January 1, 2020, inclusive)
Data types should be preprocessed, where applicable, prior to ingestion (a hedged ingestion sketch follows this list)
Critical columns include: Store_ID, Ticket_ID, Create_Time, Update_Time, Order_Time, Country_Code, Status, User_ID, Service_Mode, Case_Level_Tags, Severity, Ticket_Tags, New_Customer_Inquiry_Field, Form_Category, Agent_Comments, Duration_since_Last_update, Time_Spent_Total, Subject, Comment, Ticket_URL, Complaint_Type, Is_deleted [Non-exhaustive - TBC based on availability of metadata]
Clear indicators of critical timestamps (e.g. ticket create time, modification times, etc.)
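A minimal ingestion sketch for the requirements above, assuming the Snowflake Spark connector available in Databricks. The connection settings, the secret scope/key, and the exact Country_Code values are placeholders to be confirmed.

```python
# Minimal sketch: read BRAND_TH.ZENDESK.TICKETS_WITH_FEEDBACK from Snowflake and
# preprocess data types before landing. Connection details are placeholders.
from pyspark.sql import functions as F

sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",  # placeholder
    "sfUser": "<service_user>",                   # placeholder
    "sfPassword": dbutils.secrets.get("guest-care", "snowflake-password"),  # hypothetical secret scope/key
    "sfDatabase": "BRAND_TH",
    "sfSchema": "ZENDESK",
    "sfWarehouse": "<warehouse>",                 # placeholder
}

raw = (
    spark.read.format("snowflake")
    .options(**sf_options)
    .option("dbtable", "TICKETS_WITH_FEEDBACK")
    .load()
)

# Cast the critical timestamps explicitly and restrict to CAN & US from Jan 1, 2020.
tickets = (
    raw
    .withColumn("Create_Time", F.to_timestamp("Create_Time"))
    .withColumn("Update_Time", F.to_timestamp("Update_Time"))
    .withColumn("Order_Time", F.to_timestamp("Order_Time"))
    .filter(F.col("Country_Code").isin("CA", "US"))   # actual code values TBC (e.g. "CAN")
    .filter(F.col("Create_Time") >= "2020-01-01")
)
```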
Location
Dataset should be replicated into an appropriate schema within the Loyalty catalog in Databricks
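Continuing the ingestion sketch above, the dataset could be landed under the Loyalty catalog as follows; the schema and table names (guest_care, tickets_with_feedback) are placeholders pending confirmation.

```python
# Placeholder schema/table names under the Loyalty catalog; continues the sketch above.
spark.sql("CREATE SCHEMA IF NOT EXISTS loyalty.guest_care")

(
    tickets.write
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("loyalty.guest_care.tickets_with_feedback")
)
```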
Frequency
Data should be refreshed at least daily, with the prior full day’s data available by 07:00 EST on the immediately following day
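As a rough illustration of the cadence, a Databricks job schedule along these lines would run the ingestion daily ahead of the 07:00 EST availability cutoff; the exact run time (05:00 here) is an assumption.

```python
# Assumed schedule block for a Databricks job: daily at 05:00 America/New_York,
# leaving buffer before the 07:00 EST availability cutoff. Run time is an assumption.
schedule = {
    "quartz_cron_expression": "0 0 5 * * ?",
    "timezone_id": "America/New_York",
    "pause_status": "UNPAUSED",
}
```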
Metadata
Metadata should be provided alongside the table, indicating the definitions (and limitations, where applicable) of the critical columns, to begin with
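One hedged option is to attach the definitions as column comments on the landed table so they travel with the data; the table name and the definition text below are placeholders to be confirmed with Guest Care.

```python
# Illustrative only: attach column definitions as comments on the landed table.
# Table name and definition text are placeholders, to be confirmed with Guest Care.
column_definitions = {
    "Ticket_ID": "Unique identifier of the Zendesk ticket (definition TBC)",
    "Store_ID": "Restaurant the ticket is linked to (definition TBC)",
    "Create_Time": "Timestamp the ticket was created (definition TBC)",
}

for column, definition in column_definitions.items():
    spark.sql(
        f"ALTER TABLE loyalty.guest_care.tickets_with_feedback "
        f"ALTER COLUMN {column} COMMENT '{definition}'"
    )
```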
Risks
The risk of ingesting PII via free text within the data should be assessed prior to ingestion (see the scan sketch after this list)
Potential data lags: an assessment should be made, prior to ingestion, to determine whether any data points arrive at non-standard frequencies
Project timeline overruns
Key columns not populating consistently, e.g. loyalty IDs (User_ID - ?), Store_ID, etc.
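For the PII risk in particular, a rough pre-ingestion scan along these lines could size the exposure in the free-text columns; the regexes and the table reference are illustrative only.

```python
# Rough PII exposure scan over the free-text columns: count rows matching simple
# email/phone patterns. Patterns are illustrative, not an exhaustive PII check.
from pyspark.sql import functions as F

FREE_TEXT_COLUMNS = ["Subject", "Comment", "Agent_Comments"]
EMAIL_PATTERN = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
PHONE_PATTERN = r"\+?\d[\d\s().-]{8,}\d"

sample = spark.table("loyalty.guest_care.tickets_with_feedback")  # or the raw Snowflake extract

pii_counts = sample.select(
    *[F.sum(F.col(c).rlike(EMAIL_PATTERN).cast("int")).alias(f"{c}_email_hits")
      for c in FREE_TEXT_COLUMNS],
    *[F.sum(F.col(c).rlike(PHONE_PATTERN).cast("int")).alias(f"{c}_phone_hits")
      for c in FREE_TEXT_COLUMNS],
)
pii_counts.show()
```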