Shift-left OCP System Evaluation with Virtual Client library (VC) for Azure
Terminology
Term | Definition |
VC | Virtual Client library for Azure, an open-source repository of workloads and industry benchmarks by Microsoft. |
Shift-Left | Enabling quality evaluations of the hardware systems during early stages of their development lifecycle. |
EV | Engineering version, used in the context of the hardware system maturity. |
DV | Development version, used in the context of the hardware system maturity. |
PV | Production version, used in the context of the hardware system maturity. |
Workloads | Customer representative executables or applications |
Overview
Virtual Client library (VC) for Azure is an open-source, standardized automation library of industry benchmarks and cloud customer workloads from Microsoft [1]. It is a key platform used by 7+ engineering groups at Microsoft to evaluate the impact of software, firmware, and hardware changes on the Azure customer experience and on Azure infrastructure, informing deployment decisions that follow Safe Deployment Practices (SDP) [2].
This document outlines potential opportunities to leverage the VC platform to shift-left quality evaluation of OCP systems by evaluating their “cloud readiness” against the OCP specifications during the early phases of the development lifecycle. It also explores opportunities to leverage the VC platform for OCP Test and Validation initiatives [3].
Virtual Client
Virtual Client (VC) is an open-source platform from Microsoft that supports 40 workloads and benchmarks (and counting) spanning x64 and ARM64 architectures, Windows and Linux operating systems, and host and virtualized runtimes. VC offers run-time dependency management, extensible monitoring capabilities, multiple configurations for workloads and benchmarks, a common schema for analysis needs, and an off-the-shelf, robust data engineering pipeline built on Azure Platform as a Service (PaaS) offerings. With these capabilities, VC can evaluate various OCP systems (e.g., server, networking, firmware, storage) for performance, reliability, and security benchmarking.
This platform can scale from one benchtop system to thousands of systems in datacenters, and its capabilities continue to grow through the open-source codebase curated by Microsoft engineers and through engineering contributions from the industry.
Figure 1 VC engineering stack
VC uses the concept of profiles to author and support multiple configurations of workloads and benchmarks. VC profiles are JSON configuration files that specify the compile-time and run-time behavior of the underlying benchmarks and workloads, for example, compiler version, architecture, and compiler flags. Once deployed, the VC executable uses these profiles to create the run-time environment, fetch and load dependencies, trigger monitoring tools, and establish local and remote connections.
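To make the idea of a profile concrete, the following is a minimal sketch of a JSON profile being serialized, loaded, and sanity-checked before use. The section names (`Description`, `Parameters`, `Dependencies`, `Actions`, `Monitors`), all values, and the `validate_profile` helper are illustrative assumptions, not the actual VC profile schema; consult the VC documentation for the real format.

```python
import json
from typing import Dict, List

# Hypothetical profile: every key and value below is an illustrative
# assumption, loosely modeled on the description in the text.
profile = {
    "Description": "Example disk I/O profile (illustrative only)",
    "Parameters": {
        "Architecture": "x64",
        "CompilerVersion": "10",
        "CompilerFlags": "-O2",
    },
    "Dependencies": [{"Name": "example-benchmark-package"}],
    "Actions": [{"Name": "example-benchmark", "Parameters": {"DurationSeconds": 60}}],
    "Monitors": [{"Name": "example-perf-counters"}],
}

REQUIRED_SECTIONS = ("Description", "Parameters", "Dependencies", "Actions", "Monitors")


def validate_profile(doc: Dict) -> List[str]:
    """Return a list of problems; an empty list means the profile looks usable."""
    problems = [f"missing section: {name}" for name in REQUIRED_SECTIONS if name not in doc]
    if not doc.get("Actions"):
        problems.append("profile defines no actions to run")
    return problems


# Round-trip through JSON text, as a profile stored on disk would be.
text = json.dumps(profile, indent=2)
loaded = json.loads(text)
print(validate_profile(loaded))
```

Keeping compile-time settings, dependencies, actions, and monitors in one declarative file is what lets the same executable run many workload configurations without code changes.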
Applications
The VC platform can be leveraged to shift-left performance, reliability, and security benchmarking evaluations for OCP systems (e.g., server, networking, storage) following OCP specifications.
Another important consideration for using the VC platform is its robust data engineering pipeline. The VC platform abstracts emitted metrics, monitors, performance counters, telemetry events, and unstructured logs using a common schema with a documented data dictionary. This allows OCP projects to track and compare performance and reliability benchmarking across comparable systems and system versions. The VC data engineering infrastructure streamlines A/B comparisons and generates portable, reproducible metrics. It helps expedite OCP specification validations using the OCP Test-and-Validation project schema.
By default, this data is persisted to the local filesystem. VC can also use Azure Platform as a Service (PaaS) offerings (e.g., Azure Storage, Azure Event Hubs, Azure Data Explorer) for off-the-shelf data analysis needs. This pipeline focuses on reporting, analysis, and insights rather than execution.
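The value of a common schema for A/B comparisons can be sketched as follows. The record fields (`system`, `workload`, `metric`, `value`), the sample numbers, and the `ab_compare` helper are hypothetical and chosen only for illustration; they are not the VC data dictionary.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical metric records in a flat common schema (assumed field names,
# made-up values used purely to exercise the comparison).
records = [
    {"system": "A", "workload": "fio", "metric": "read_iops", "value": 1000.0},
    {"system": "A", "workload": "fio", "metric": "read_iops", "value": 1100.0},
    {"system": "B", "workload": "fio", "metric": "read_iops", "value": 1200.0},
    {"system": "B", "workload": "fio", "metric": "read_iops", "value": 1320.0},
]


def ab_compare(rows: List[dict], baseline: str, candidate: str) -> Dict[Tuple[str, str], float]:
    """Percent change of the candidate's mean over the baseline's mean, per (workload, metric)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["workload"], row["metric"], row["system"])].append(row["value"])

    deltas = {}
    for workload, metric, system in list(grouped):
        if system != baseline:
            continue
        candidate_values = grouped.get((workload, metric, candidate))
        if not candidate_values:
            continue  # metric was not collected on the candidate system
        base_mean = mean(grouped[(workload, metric, baseline)])
        if base_mean == 0:
            continue  # percent change is undefined for a zero baseline
        deltas[(workload, metric)] = (mean(candidate_values) - base_mean) / base_mean * 100.0
    return deltas


for key, pct in ab_compare(records, "A", "B").items():
    print(key, f"{pct:+.1f}%")
```

Because every workload emits records with the same fields, one comparison routine covers all benchmarks instead of requiring a parser per tool.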
Figure 2 High level data engineering pipeline
Figure 3 Example of unified data collection schema
Analysis
Based on the early analysis, the following OCP specifications can be supported with minor changes to the VC profiles.
OCP Specification | OCP Projects | VC Value-Add | Capability Readiness | OCP Contributor |
NVMe Cloud SSD Specification | Storage | Automated Testing | Yes (FIO, DiskSpd) | Microsoft, Meta |
NVMe Cloud HDD Specification | Storage | Automated Testing | Yes (FIO, DiskSpd) | Microsoft, Seagate, Western Digital |
Hyperscale NVMe Boot SSD Specification | Storage | Automated Testing | Yes (FIO, DiskSpd) | Meta, Google |
Datacenter NVMe® SSD Specification | Storage | Automated Testing | Yes (FIO, DiskSpd) | Microsoft, Meta, HPE, Dell |
Base Specification for Immersion Fluids | Cooling | Power Monitoring, Automated Workload | Yes (ipmiutil, SPECpower) | Intel |
Shasta HW System Specifications | Rack | Power/Fan/Temperature Monitoring | Yes (ipmiutil, SPECpower) | Microsoft |
OCP Accelerator Module Design Specification | Accelerator | Power/Fan/Temperature Monitoring | Yes (ipmiutil, FPGAstress) | Meta, Microsoft, Baidu |
Test and Validation Enablement Initiative | Validation | Automated Testing | Yes | OCP |
High Performance Computing - Incubation | HPC | Automated Testing | Yes (HPCG, HPLinpack, LAPACK etc.) | OCP |
NIC 3.0 | Networking | Automated Testing | Yes (NTttcp, sockperf etc.) | OCP |
Composable Memory | Memory | Automated Testing | Yes (SPECjbb, LMBench etc.) | OCP |
Regional Project Community - China Mainland | AI | Automated Testing | Yes (Superbench, MLPerf etc.) | OCP-China |
Opportunities
The following OCP projects could leverage the VC platform capabilities for performance, reliability, and security benchmarking.
OCP Projects | VC Capabilities |
 | Ipmiutil data in parallel with SPECpower or other simulated workloads |
 | Standard fault injection or stress workloads |
 | Wide range of networking benchmarks |
 | Ipmiutil data in parallel with SPECpower or other simulated workloads |
 | Firmware automation and qualification. |
 | Security qualification like the tests we are running for ACC |
 | HPC benchmarks like HPCG, LAPACK |
 | IO workloads like FIO, DiskSpd, and database workloads |
 | GPU benchmarks like Superbench and MLPerf |
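Several of the capabilities listed above involve collecting sensor data in parallel with a running workload. A minimal sketch of that pattern follows, assuming a background sampling thread. The `read_power_watts` function is a stand-in that returns a fixed value rather than querying real hardware, `simulated_workload` is a placeholder for an actual benchmark, and none of this reflects how VC implements its monitors.

```python
import threading
import time
from typing import Callable, List, Tuple


def read_power_watts() -> float:
    """Stand-in for a real sensor query (hypothetical); returns a fixed value."""
    return 250.0


def run_with_monitor(workload: Callable[[], int],
                     read_sensor: Callable[[], float],
                     interval_s: float = 0.01) -> Tuple[int, List[Tuple[float, float]]]:
    """Run the workload while a background thread samples the sensor."""
    samples: List[Tuple[float, float]] = []
    stop = threading.Event()

    def sampler() -> None:
        while not stop.is_set():
            samples.append((time.monotonic(), read_sensor()))
            stop.wait(interval_s)  # sleep, but wake immediately when stopped

    thread = threading.Thread(target=sampler, daemon=True)
    thread.start()
    try:
        result = workload()
    finally:
        stop.set()      # always stop sampling, even if the workload fails
        thread.join()
    return result, samples


def simulated_workload() -> int:
    """Placeholder for a real benchmark: a short sleep plus some arithmetic."""
    time.sleep(0.05)
    return sum(i * i for i in range(10_000))


result, samples = run_with_monitor(simulated_workload, read_power_watts)
print(f"workload result={result}, collected {len(samples)} sensor samples")
```

Timestamping each sample lets power, fan, or temperature readings be aligned with workload phases during later analysis.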
FAQ
What is Virtual Client library for Azure?
- It is a standardized, collaborative, and open-sourced platform of workloads and industry benchmarks. It is an outcome of work, ideas, and expertise of engineers across Azure coming together to standardize and solve cloud-scale system readiness problems.
- The Virtual Client platform supports 40+ workloads and cloud customer representative benchmarks focusing on CPU, GPU, disk I/O, network/web performance, memory, SQL/database, compression, encryption, Java, resiliency, and machine learning model training and inference.
- At a high level, the Virtual Client platform orchestrates workload execution, system monitoring, and dependencies, and provides metrics, telemetry, and monitoring with a standardized common schema.
- Virtual Client can run as a stand-alone executable on a single machine, or across hundreds of machines as an end-to-end solution.
What are the platforms supported by Virtual Client?
The Virtual Client platform abstracts platform and runtime requirements using “profiles.” It supports Linux and Windows operating systems, x64 and ARM64 architectures, and host and guest (VM) platforms.
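One way to picture this abstraction is a lookup from the detected operating system and architecture to a platform-specific configuration. The sketch below is illustrative only: the identifier format (for example `linux-x64`), the alias tables, and the `select_profile_variant` helper are assumptions, not VC's internal logic.

```python
import platform
from typing import Dict, Optional

# Common spellings reported by platform.system() / platform.machine().
_OS_ALIASES = {"linux": "linux", "windows": "win"}
_ARCH_ALIASES = {"x86_64": "x64", "amd64": "x64", "aarch64": "arm64", "arm64": "arm64"}


def platform_id(system: str, machine: str) -> str:
    """Normalize OS and CPU names into an '<os>-<arch>' identifier (assumed format)."""
    os_name = _OS_ALIASES.get(system.lower())
    arch = _ARCH_ALIASES.get(machine.lower())
    if os_name is None or arch is None:
        raise ValueError(f"unsupported platform: {system}/{machine}")
    return f"{os_name}-{arch}"


def select_profile_variant(variants: Dict[str, str],
                           system: Optional[str] = None,
                           machine: Optional[str] = None) -> str:
    """Pick the variant for the given platform, defaulting to the local machine."""
    key = platform_id(system or platform.system(), machine or platform.machine())
    return variants[key]


# Hypothetical per-platform settings for one workload.
variants = {
    "linux-x64": "gcc build, x64 binaries",
    "linux-arm64": "gcc build, arm64 binaries",
    "win-x64": "prebuilt x64 binaries",
    "win-arm64": "prebuilt arm64 binaries",
}
print(select_profile_variant(variants, "Linux", "aarch64"))
```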
How can I use / contribute to the Virtual Client library for Azure?
Virtual Client is open source, and the repository is actively maintained and curated by the Azure engineering team. The repository is well documented, with step-by-step instructions for onboarding Virtual Client using Azure PaaS offerings (Event Hubs, Azure Data Explorer etc.).
Getting Started: https://microsoft.github.io/VirtualClient
GitHub Repo: https://github.com/microsoft/VirtualClient
Where can I find the list of workloads supported by Virtual Client?
- The Virtual Client platform supports 40+ workloads, and the list is growing with each production release curated by the Azure engineering team.
- Workloads | Virtual Client (microsoft.github.io)
References
[1] What is Virtual Client? https://aka.ms/VirtualClient
[2] Azure Safe Deployment Practices (SDP): Advancing safe deployment practices | Azure Blog | Microsoft Azure
[3] OCP Test and Validation Enablement Initiative: https://www.opencompute.org/wiki/OCP_Test_and_Validation_Enablement_Initiative