Research Support Service Level Agreement (SLA) Policy for High-Performance Computing (HPC) Environments
Last updated 11/8/2024*
*If this policy is more than 365 days old, contact the policy owner for the current version.
Contents
Version Control
Purpose
Parties Involved
Scope of Services
Service Availability
Support Hours
Incident Response and Resolution Times
User Responsibilities
Performance Metrics
Review and Reporting
Contact Information
Version Control
Version | Author | Approver | Notes
1.0 | J Buenger |  | Initial policy creation
Purpose
This Service Level Agreement (SLA) defines the scope of services, performance expectations, and support levels that the University of Chicago Booth Information Technology (IT) HPC Support Team provides to researchers using the High-Performance Computing (HPC) resources.
Scope of Services
The following services are covered under this SLA:
Service Availability
HPC resources will be available to users as follows:
Support Hours
The Booth IT HPC Support Team will provide support during the following hours:
Incident Response and Resolution Times
The following response and resolution times are established for incidents based on their priority level:
Priority Level | Description | Response Time | Resolution Time
Critical | Cluster-wide outages and critical failures | Within 1 hour (24x7) | Within 4 hours (24x7)
High | Major service degradation or job failures | Within 1 business day (24x7) | 
Medium | General issues affecting individual users | Within 1 business day (M-F) | Within 22 business hours (M-F)
Low | Minor issues, requests for information | Within 2 business days (M-F) | Within 44 business hours (M-F)
User Responsibilities
Users of the HPC resources are expected to:
Performance Metrics
Performance of the HPC support services will be measured using the following metrics:
Review and Reporting
The SLA will be reviewed annually by the Booth IT leadership team. The review will evaluate performance metrics and user feedback and identify any necessary adjustments to the SLA.
Contact Information
For support and assistance, users can contact the Booth IT HPC Support Team as follows: