Phison HCI

Phison Hyper-Converged Infrastructure (HCI) software is the core architecture of the next-generation AI Data Platform. It integrates compute, storage, GPU resources, and an AI management platform to give enterprises a one-stop AI infrastructure solution.

Zero-Downtime Operations

Built-in health monitoring and automatic failover redirect traffic within seconds when a node goes offline, enabling rolling maintenance with zero service interruption.
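
The failover behavior described above can be illustrated with a minimal sketch. It is not Phison's implementation: the `Node` and `Router` names, the 3-second heartbeat timeout, and the round-robin policy are all assumptions chosen for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # seconds, on the same clock as `now`


class Router:
    """Hypothetical round-robin router that skips nodes with stale heartbeats."""

    def __init__(self, nodes, timeout_s=3.0):
        self.nodes = list(nodes)
        self.timeout_s = timeout_s  # assumed threshold for treating a node as offline
        self._counter = itertools.count()

    def healthy(self, now):
        # A node is healthy if it reported a heartbeat within the timeout window.
        return [n for n in self.nodes if now - n.last_heartbeat <= self.timeout_s]

    def route(self, now):
        candidates = self.healthy(now)
        if not candidates:
            raise RuntimeError("no healthy nodes available")
        return candidates[next(self._counter) % len(candidates)].name


nodes = [Node("node-a", 100.0), Node("node-b", 100.0), Node("node-c", 100.0)]
router = Router(nodes)
before = {router.route(now=101.0) for _ in range(3)}

# node-b stops sending heartbeats (for example, during rolling maintenance),
# while node-a and node-c keep reporting.
nodes[0].last_heartbeat = 110.0
nodes[2].last_heartbeat = 110.0
after = {router.route(now=110.5) for _ in range(4)}
```

In this sketch `before` contains all three nodes, while `after` contains only `node-a` and `node-c`: requests stop reaching the stale node as soon as its heartbeat ages past the timeout.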

More Usable GPU Capacity

Reuse previously generated KV cache instead of rebuilding it, freeing GPUs for additional workloads.

Faster Time to First Token (TTFT) vs. Recompute

Retrieve cached context from the VRAM, DRAM, or SSD tier instead of recomputing prefill from scratch.
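
As a rough illustration of the lookup order described above, the following sketch checks each tier from fastest to slowest and only falls back to prefill on a full miss. The `TieredKVCache` class, its method names, and the promotion policy are hypothetical; the source does not describe the actual data structures.

```python
TIERS = ["VRAM", "DRAM", "SSD"]  # fastest to slowest, as named in the text


class TieredKVCache:
    """Hypothetical tiered cache keyed by prompt prefix (no capacity limits)."""

    def __init__(self):
        self.tiers = {tier: {} for tier in TIERS}

    def put(self, tier, prefix_key, kv_blob):
        self.tiers[tier][prefix_key] = kv_blob

    def get_or_compute(self, prefix_key, prefill):
        """Return (kv_blob, source); source is the tier that hit, or "recompute"."""
        for tier in TIERS:
            blob = self.tiers[tier].get(prefix_key)
            if blob is not None:
                # Assumed policy: promote the hit so the next lookup is served from VRAM.
                self.tiers["VRAM"][prefix_key] = blob
                return blob, tier
        blob = prefill(prefix_key)  # full miss: pay the prefill cost once
        self.tiers["VRAM"][prefix_key] = blob
        return blob, "recompute"


prefill_calls = []


def fake_prefill(key):
    # Stand-in for real prefill computation; records how often it runs.
    prefill_calls.append(key)
    return b"kv:" + key.encode()


cache = TieredKVCache()
cache.put("SSD", "system-prompt-v1", b"kv")
first = cache.get_or_compute("system-prompt-v1", fake_prefill)
second = cache.get_or_compute("system-prompt-v1", fake_prefill)
third = cache.get_or_compute("new-doc", fake_prefill)
```

Only the third lookup triggers `fake_prefill`; the first two are served from the SSD tier and then from VRAM after promotion.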

Enterprise AI Infrastructure Challenges

Enterprises deploying private AI typically face the following fundamental challenges.

Persistently Low GPU Utilization

Whole-card or passthrough deployments often reach only 20–30% GPU utilization, leaving compute idle and hardware ROI very low.

Limited LLM Context Length

GPU HBM often cannot hold the KV cache that large models need, so inference over long documents and long conversations slows sharply or fails to complete.
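
A back-of-the-envelope calculation shows why the KV cache outgrows HBM. The formula below is the standard size estimate for a transformer KV cache; the model shape (80 layers, 8 KV heads with grouped-query attention, head dimension 128, FP16 values) is an assumption modeled on a Llama-3-70B-style configuration, not a figure taken from this page.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    # Factor of 2 covers both the key tensor and the value tensor per layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * bytes_per_value * seq_len * batch_size)


# Assumed shape: 80 layers, 8 KV heads, head_dim 128, FP16 (2 bytes per value).
per_token = kv_cache_bytes(80, 8, 128, seq_len=1)
full_context = kv_cache_bytes(80, 8, 128, seq_len=128 * 1024)

print(per_token / 1024)        # 320.0 (KiB per token)
print(full_context / 1024**3)  # 40.0 (GiB for one 128K-token sequence)
```

Under these assumptions a single 128K-token sequence needs about 40 GiB of KV cache on top of the model weights, and the requirement grows linearly with the number of concurrent sequences.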

Lengthy Deployment Cycles

From procurement and networking to container platforms and model launch, traditional workflows often take weeks or months, slowing AI innovation.

Fragmented Multi-System Management

Compute, storage, network, containers, and monitoring are managed separately, so IT teams work across five or more systems, which raises labor costs and the risk of configuration drift.

What Core Technologies Does Phison Own?

Phison HCI builds on three self-developed technologies: vGPU partitioning and time-sharing, aiDAPTIV Cache Memory tiering from HBM to NVMe, and multi-node tensor/pipeline-parallel scale-out. Together they eliminate GPU idle waste, extend effective memory across the cluster, and run large-model inference at production scale.

vGPU Partitioning + Time-Sharing

A single GPU is split into vGPU instances, with compute and memory allocated on demand. Multiple models or tenants time-share the same card with QoS isolation, and quotas adjust dynamically at peak load.
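
One way to picture dynamic quota adjustment is as a weighted split of a fixed number of time slices. The sketch below is an assumption-laden illustration, not the product's scheduler: the `allocate_slices` function, the tenant names, and the weights are hypothetical, and memory partitioning and QoS enforcement are not modeled.

```python
def allocate_slices(weights, total_slices=100):
    """Split `total_slices` among tenants in proportion to their weights.

    Uses the largest-remainder method so the integer shares always sum
    to `total_slices`.
    """
    total_weight = sum(weights.values())
    exact = {t: total_slices * w / total_weight for t, w in weights.items()}
    shares = {t: int(x) for t, x in exact.items()}
    leftover = total_slices - sum(shares.values())
    # Give the remaining slices to the tenants with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda t: exact[t] - shares[t], reverse=True)
    for tenant in by_remainder[:leftover]:
        shares[tenant] += 1
    return shares


# Hypothetical tenants sharing one card under normal load...
normal = allocate_slices({"chatbot": 2, "embedding": 1, "batch": 1})
# ...and after the chatbot's weight is raised during a traffic peak.
peak = allocate_slices({"chatbot": 6, "embedding": 1, "batch": 1})
```

Raising one tenant's weight shifts slices toward it while the total stays fixed, which is the essence of adjusting quotas at peak load on a shared card.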

KV Cache Extension Across Multiple Nodes

aiDAPTIV Cache Memory tiers the cache from GPU HBM down to NVMe SSD, expanding effective memory by 10× or more. The KV cache is shared across nodes, so prefill results can be reused by multiple decode nodes, supporting inference at 128K+ tokens without running out of memory (OOM).

Multi-Node Scale-Out Architecture

Tensor parallelism and pipeline parallelism split 70B–405B models across nodes. Cross-node KV cache sharing cuts inter-node traffic, and throughput scales linearly as nodes are added.
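
The two forms of parallelism divide a model along different axes: pipeline parallelism assigns contiguous groups of layers to stages, and tensor parallelism shards each layer's weights across the GPUs within a stage. The sketch below is a simplified illustration under stated assumptions (even sharding, FP16 weights, no accounting for activations, embeddings, or KV cache); the function names and the 80-layer, 70B example are hypothetical.

```python
def pipeline_stages(num_layers, num_stages):
    """Split layer indices into contiguous, near-equal pipeline stages."""
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for stage in range(num_stages):
        size = base + (1 if stage < extra else 0)  # earlier stages absorb the remainder
        stages.append(range(start, start + size))
        start += size
    return stages


def weights_gib_per_gpu(params_billion, tp_degree, pp_degree, bytes_per_param=2):
    """Approximate weight memory per GPU, assuming weights shard evenly."""
    total_bytes = params_billion * 1e9 * bytes_per_param
    return total_bytes / (tp_degree * pp_degree) / 1024**3


stages = pipeline_stages(num_layers=80, num_stages=3)
stage_sizes = [len(s) for s in stages]  # [27, 27, 26]

# Assumed layout: 2 pipeline stages (nodes), 8-way tensor parallel within each.
per_gpu = weights_gib_per_gpu(params_billion=70, tp_degree=8, pp_degree=2)
```

With these assumed numbers each of the 16 GPUs holds roughly 8 GiB of weights, which leaves the rest of its memory for activations and KV cache.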

Phison HCI Architecture

Through software-hardware integration and modular design, enterprises can quickly deploy AI workstations, private AI, AI agents, RAG, AI inference, and edge AI applications, lowering adoption barriers and accelerating AI rollout.

User Surfaces

Platform Services

AI Platform

On-prem model upload
OCI Artifacts support
Rapid model deployment
Performance monitoring

Service Deployment & Management

Scheduling & orchestration
Backend services

Compute Resources

CPU
RAM
GPU & vGPU
SSDs

Cluster
Container
VM
Multi-Tenancy
Access Control
Audit
Cost
Image
Monitoring

Benefits

Accelerated Deployment

One-click deployment

GPU Optimization

vGPU time-slicing & sharing

Multi-Tenant Isolation

Resource & permission segregation

Full-Stack Observability

Real-time monitoring

Cost analytics

Phison HCI Software

Software Deployment & Container Image Management
CPU / GPU / VM Resource Management
vGPU
Security & Access Control
Application Marketplace
Storage Management
Cost Management
Monitoring, Observability & Alerting

Unified Management Console

Phison hyper-converged software unifies heterogeneous GPU, XPU, storage, and VM resources under one control plane. Kubernetes, VMs, AI inference, and monitoring are managed in a single console, which consolidates mixed hardware and maximizes AI infrastructure ROI without switching tools.

Application Hub

Browse and install Helm charts

Chart	Version	App Version	Repository
vllm	0.6.2	v0.6.2	otterscale-charts
llm-inference	1.2.0	1.2.0	otterscale-charts
prometheus-stack	55.5.0	2.47.0	community-charts
rook-ceph	1.14.2	v1.14.2	rook-release
kubevirt	0.59.0	v1.0.0	kubevirt-charts

Model Status

Monitor LLM inference health, latency, and GPU allocation in real time.

Overview (production, last 1h)

Model: Llama-3-70B

KV Cache Usage: 91.2% (max across replicas)
Queue Depth (waiting requests)
Success Rate: 99.7% (request success)
Time to First Token (P95 / P99 latency): 142 ms
Per Output Token (P95 / P99 latency): 18 ms
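
For readers unfamiliar with the P95 / P99 figures shown in the panel, the sketch below computes them from raw latency samples using Python's standard library. The sample data is synthetic, and nothing here reflects how the console itself aggregates metrics.

```python
import statistics


def latency_percentiles(samples_ms):
    """Return the 95th and 99th percentile of a list of latency samples."""
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p95": cuts[94], "p99": cuts[98]}


# Synthetic time-to-first-token samples: 1 ms, 2 ms, ..., 100 ms.
samples = list(range(1, 101))
result = latency_percentiles(samples)
```

P95 is the latency that 95% of requests stay under, so it exposes tail behavior that an average would hide; P99 does the same for the slowest 1% of requests.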

Pods

Name	Namespace	Ready	Status	Restarts	Age
llm-inference-7f8b9c	ai-prod	2/2	Running	0	2d
vllm-worker-0	ai-prod	1/1	Running	1	2d
embedding-svc-4a2c	ai-prod	1/1	Running	0	5h
rag-indexer-batch	ai-prod	0/1	Pending	0	12m
prometheus-server-0	monitoring	2/2	Running	0	14d
grafana-7b4d89	monitoring	1/1	Running	0	14d

Deployments

Name	Namespace	Ready	Available	Age
llm-inference	ai-prod	2/2	2	2d
vllm-worker	ai-prod	1/1	1	2d
embedding-svc	ai-prod	1/1	1	5h
prometheus-server	monitoring	2/2	2	14d
grafana	monitoring	1/1	1	14d

Object Bucket Claims

Name	Namespace	Storage Class	Status	Age
model-artifacts	ai-prod	ceph-rbd	Bound	14d
training-data	ai-prod	ceph-rgw	Bound	7d
backup-snapshots	kube-system	ceph-rbd	Bound	30d
rag-documents	ai-prod	ceph-rgw	Pending	2h

Services

Name	Namespace	Type	Cluster IP	Age
llm-inference-svc	ai-prod	ClusterIP	10.96.1.42	2d
vllm-worker-svc	ai-prod	ClusterIP	10.96.2.18	2d
embedding-svc	ai-prod	ClusterIP	10.96.3.55	5h
prometheus-server	monitoring	ClusterIP	10.96.8.10	14d

Core Technology Benefits

Measurable performance and efficiency gains powered by Phison HCI's proprietary technologies.

Lower Inference Cost

Reduce idle resources and directly lower the per-token inference cost.
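
The link between idle resources and per-token cost can be made concrete with simple arithmetic. In the sketch below, the GPU hourly cost, the peak throughput, and the 75% target utilization are purely hypothetical inputs; only the 25% starting point is chosen to fall within the 20–30% utilization range cited earlier on this page.

```python
def cost_per_million_tokens(gpu_hour_cost, peak_tokens_per_second, utilization):
    """Cost of generating one million tokens on a GPU billed by the hour."""
    tokens_per_hour = peak_tokens_per_second * 3600 * utilization
    return gpu_hour_cost / tokens_per_hour * 1_000_000


# Hypothetical inputs: 2.0 currency units per GPU-hour, 1000 tokens/s at full load.
low_util = cost_per_million_tokens(2.0, 1000, utilization=0.25)
high_util = cost_per_million_tokens(2.0, 1000, utilization=0.75)
ratio = low_util / high_util  # how much more each token costs at low utilization
```

Because the hourly cost is fixed, cost per token is inversely proportional to utilization: in this example, tripling utilization cuts the per-token cost to one third.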

Linear Scale-Out

Add new nodes to increase throughput linearly without redeploying the model.

Higher Concurrency

Combined with vGPU partitioning, a single host can serve more concurrent requests.

vGPU Resource Partitioning Technology

A single GPU can be divided into multiple virtual GPU instances, allowing different workloads, such as training, inference, and batch processing, to share the same card. The Phison HCI platform supports GPU virtualization and resource partitioning, so a single GPU can be dynamically allocated to multiple AI tasks or users. This eliminates idle GPU waste and enables fine-grained resource scheduling.

Core value

Maximizes GPU utilization

Lowers AI adoption costs

Enables multiple workloads to run in parallel

Supports secure multi-tenant isolation

Applicable scenarios

Shared AI workstations

Multi-department AI development

AI inference service platforms

GPU resource pool management

Multi-Node KV Cache Expansion Technology

Phison's self-developed KV cache expansion technology uses high-speed NVMe storage as an extension of GPU HBM. It addresses the context-length limits of large models and supports a shared cache across multiple nodes, significantly reducing pressure on GPU memory and improving large-model inference efficiency.

Core technical features

GPU / DRAM / SSD / Remote SSD hierarchical caching architecture

Dynamic KV cache expansion

Support for long-context inference

Shared cache resources across multiple nodes
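
The hierarchical architecture listed above implies that blocks which no longer fit in a fast tier move to a slower one rather than being discarded. The following sketch illustrates that idea with a simple oldest-first spill policy. The `SpillingCache` class, the per-tier capacities (counted in blocks), and the eviction policy are assumptions for illustration only.

```python
from collections import OrderedDict

TIER_ORDER = ["GPU", "DRAM", "SSD", "Remote SSD"]  # fastest to slowest


class SpillingCache:
    """Hypothetical four-tier cache that demotes the oldest block on overflow."""

    def __init__(self, capacities):
        self.capacities = capacities  # max number of blocks per tier
        self.tiers = {name: OrderedDict() for name in TIER_ORDER}

    def insert(self, key, block, tier_index=0):
        if tier_index >= len(TIER_ORDER):
            return  # past the last tier: the block is dropped
        name = TIER_ORDER[tier_index]
        tier = self.tiers[name]
        tier[key] = block
        tier.move_to_end(key)
        while len(tier) > self.capacities[name]:
            # Demote the oldest block in this tier to the next slower tier.
            old_key, old_block = tier.popitem(last=False)
            self.insert(old_key, old_block, tier_index + 1)

    def locate(self, key):
        for name in TIER_ORDER:
            if key in self.tiers[name]:
                return name
        return None


cache = SpillingCache({"GPU": 2, "DRAM": 2, "SSD": 2, "Remote SSD": 2})
for i in range(5):
    cache.insert(f"b{i}", b"kv-block")

placement = {f"b{i}": cache.locate(f"b{i}") for i in range(5)}
```

After five inserts the two newest blocks sit in the GPU tier, the next two in DRAM, and the oldest has spilled to SSD, so no block has been lost and none needs to be recomputed.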

Technical benefits

Improves model inference throughput

Reduces GPU memory bottlenecks

Reduces the need to purchase high-end GPUs

Improves overall GPU utilization
