Digital Guest Care Ticket Data Ingestion
| Jira Initiatives | TBC |
| --- | --- |
| Project Status | Draft |
| Created On | Jul 8, 2024 |
| Key Business Stakeholders | @Siobhan Van Der Kley @Jonathan Brown @Victoria Quan @Rayand Ramlal |
| Engineering | @Anirudha Porwal |
| Due Date | TBD |
Objective
Digital Business and Guest Care want to access the guest care ticket data coming through the Digital platforms. This will allow end users to query the data for routine and ad hoc descriptive reporting, and to build dashboards that give key stakeholders and leadership team members self-service insights into guest care performance. An example use case is identifying large swings in ticket volume linked to specific restaurants, allowing Digital to direct Field Team resources to those restaurants to investigate potential operational challenges. Lastly, it is envisioned that ML modelling (e.g. NLP) will be applied to this data to extract sentiment and surface leading indicators. To that end, this set of requirements focuses on ingesting this data into Digital’s Databricks environment.
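To illustrate the restaurant-swing use case, the sketch below computes week-over-week ticket volume per restaurant in PySpark. The target table name (loyalty.guest_care.tickets_with_feedback) and the 50% swing threshold are placeholders, not confirmed values.

```python
# Illustrative only: week-over-week ticket volume per restaurant.
# Table name and the 50% swing threshold are assumptions, to be confirmed.
from pyspark.sql import functions as F
from pyspark.sql.window import Window

tickets = spark.table("loyalty.guest_care.tickets_with_feedback")  # hypothetical name

weekly = (
    tickets
    .withColumn("week", F.date_trunc("week", F.col("Create_Time")))
    .groupBy("Store_ID", "week")
    .agg(F.count("Ticket_ID").alias("ticket_count"))
)

# Flag stores whose weekly ticket volume jumps by more than 50% versus the prior week.
w = Window.partitionBy("Store_ID").orderBy("week")
swings = (
    weekly
    .withColumn("prev_count", F.lag("ticket_count").over(w))
    .withColumn("pct_change", (F.col("ticket_count") - F.col("prev_count")) / F.col("prev_count"))
    .filter(F.col("pct_change") > 0.5)
)
swings.show()
```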
Requirements
The following are the requirements from Business and Digital Guest Care:
Data
All unmasked data fields from Snowflake table BRAND_TH.ZENDESK.TICKETS_WITH_FEEDBACK (sample: )
For CAN & US
For all periods available (at minimum from January 1, 2020, inclusive)
Data types should be preprocessed, where applicable, prior to ingestion (a hedged ingestion sketch follows this list)
Critical columns include: Store_ID, Ticket_ID, Create_Time, Update_Time, Order_Time, Country_Code, Status, User_ID, Service_Mode, Case_Level_Tags, Severity, Ticket_Tags, New_Customer_Inquiry_Field, Form_Category, Agent_Comments, Duration_since_Last_update, Time_Spent_Total, Subject, Comment, Ticket_URL, Complaint_Type, Is_deleted [Non-exhaustive - TBC based on availability of metadata]
Clear indicators of critical timestamps (e.g. ticket create time, modification times, etc.)
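A minimal ingestion sketch for the requirements above, assuming the Snowflake Spark connector available in Databricks. The connection settings, the secret scope/key, and the exact Country_Code values are placeholders to be confirmed.

```python
# Minimal sketch: read BRAND_TH.ZENDESK.TICKETS_WITH_FEEDBACK from Snowflake and
# preprocess data types before landing. Connection details are placeholders.
from pyspark.sql import functions as F

sf_options = {
    "sfUrl": "<account>.snowflakecomputing.com",  # placeholder
    "sfUser": "<service_user>",                   # placeholder
    "sfPassword": dbutils.secrets.get("guest-care", "snowflake-password"),  # hypothetical secret scope/key
    "sfDatabase": "BRAND_TH",
    "sfSchema": "ZENDESK",
    "sfWarehouse": "<warehouse>",                 # placeholder
}

raw = (
    spark.read.format("snowflake")
    .options(**sf_options)
    .option("dbtable", "TICKETS_WITH_FEEDBACK")
    .load()
)

# Cast the critical timestamps explicitly and restrict to CAN & US from Jan 1, 2020.
tickets = (
    raw
    .withColumn("Create_Time", F.to_timestamp("Create_Time"))
    .withColumn("Update_Time", F.to_timestamp("Update_Time"))
    .withColumn("Order_Time", F.to_timestamp("Order_Time"))
    .filter(F.col("Country_Code").isin("CA", "US"))   # actual code values TBC (e.g. "CAN")
    .filter(F.col("Create_Time") >= "2020-01-01")
)
```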
Location
Dataset should be replicated into an appropriate schema within the Loyalty catalog in Databricks
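Continuing the ingestion sketch above, the dataset could be landed under the Loyalty catalog as follows; the schema and table names (guest_care, tickets_with_feedback) are placeholders pending confirmation.

```python
# Placeholder schema/table names under the Loyalty catalog; continues the sketch above.
spark.sql("CREATE SCHEMA IF NOT EXISTS loyalty.guest_care")

(
    tickets.write
    .mode("overwrite")
    .option("overwriteSchema", "true")
    .saveAsTable("loyalty.guest_care.tickets_with_feedback")
)
```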
Frequency
Data should be refreshed at least daily, with the prior full day’s data available by 07:00 EST on the immediately following day
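As a rough illustration of the cadence, a Databricks job schedule along these lines would run the ingestion daily ahead of the 07:00 EST availability cutoff; the exact run time (05:00 here) is an assumption.

```python
# Assumed schedule block for a Databricks job: daily at 05:00 America/New_York,
# leaving buffer before the 07:00 EST availability cutoff. Run time is an assumption.
schedule = {
    "quartz_cron_expression": "0 0 5 * * ?",
    "timezone_id": "America/New_York",
    "pause_status": "UNPAUSED",
}
```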
Metadata
Metadata should be provided alongside the table, indicating the definitions (and limitations, where applicable) of the critical columns, to begin with
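One hedged option is to attach the definitions as column comments on the landed table so they travel with the data; the table name and the definition text below are placeholders to be confirmed with Guest Care.

```python
# Illustrative only: attach column definitions as comments on the landed table.
# Table name and definition text are placeholders, to be confirmed with Guest Care.
column_definitions = {
    "Ticket_ID": "Unique identifier of the Zendesk ticket (definition TBC)",
    "Store_ID": "Restaurant the ticket is linked to (definition TBC)",
    "Create_Time": "Timestamp the ticket was created (definition TBC)",
}

for column, definition in column_definitions.items():
    spark.sql(
        f"ALTER TABLE loyalty.guest_care.tickets_with_feedback "
        f"ALTER COLUMN {column} COMMENT '{definition}'"
    )
```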
Risks
The risk of ingesting PII via free text within the data should be assessed prior to ingestion (see the scan sketch after this list)
Potential data lags: an assessment should be made, prior to ingestion, to determine whether any data points arrive at non-standard frequencies
Project timeline overruns
Key columns not populating consistently, e.g. loyalty IDs (User_ID - ?), Store_ID, etc.
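For the PII risk in particular, a rough pre-ingestion scan along these lines could size the exposure in the free-text columns; the regexes and the table reference are illustrative only.

```python
# Rough PII exposure scan over the free-text columns: count rows matching simple
# email/phone patterns. Patterns are illustrative, not an exhaustive PII check.
from pyspark.sql import functions as F

FREE_TEXT_COLUMNS = ["Subject", "Comment", "Agent_Comments"]
EMAIL_PATTERN = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
PHONE_PATTERN = r"\+?\d[\d\s().-]{8,}\d"

sample = spark.table("loyalty.guest_care.tickets_with_feedback")  # or the raw Snowflake extract

pii_counts = sample.select(
    *[F.sum(F.col(c).rlike(EMAIL_PATTERN).cast("int")).alias(f"{c}_email_hits")
      for c in FREE_TEXT_COLUMNS],
    *[F.sum(F.col(c).rlike(PHONE_PATTERN).cast("int")).alias(f"{c}_phone_hits")
      for c in FREE_TEXT_COLUMNS],
)
pii_counts.show()
```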