Zero-Downtime Operations
Built-in health monitoring and automatic failover redirect traffic within seconds when a node goes offline, enabling rolling maintenance with zero service interruption.
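A minimal sketch of heartbeat-based failover routing, to make the idea above concrete. It assumes a node counts as offline after a few seconds without a heartbeat. The `Node` class, the `route` function, and the 3-second timeout are hypothetical illustrations, not Phison HCI's actual mechanism or API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    last_heartbeat: float  # seconds, taken from a monotonic clock


def healthy_nodes(nodes, now, timeout_s=3.0):
    """Return the nodes whose last heartbeat falls inside the timeout window."""
    return [n for n in nodes if now - n.last_heartbeat <= timeout_s]


def route(request_id, nodes, now, timeout_s=3.0):
    """Round-robin a request over the healthy nodes only (hypothetical policy)."""
    alive = healthy_nodes(nodes, now, timeout_s)
    if not alive:
        raise RuntimeError("no healthy nodes available")
    return alive[request_id % len(alive)].name


nodes = [Node("node-a", 100.0), Node("node-b", 100.0), Node("node-c", 95.0)]
# At t = 101 s, node-c has been silent for 6 s, so traffic skips it.
targets = [route(i, nodes, now=101.0) for i in range(4)]
```

Rolling maintenance follows the same pattern in this sketch: a node taken down for an upgrade stops sending heartbeats, drops out of the healthy set, and rejoins once its heartbeats resume.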
Phison Hyper-Converged Infrastructure (HCI) software is the core architecture of the next-generation AI Data Platform. It integrates compute, storage, GPU resources, and an AI management platform to give enterprises a one-stop AI infrastructure solution.
Reuse previously generated KV cache instead of rebuilding it, freeing GPUs for additional workloads.
Retrieve cached context from VRAM, DRAM, or SSD tiers instead of recomputing prefill from scratch.
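The tiered lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the aiDAPTIV implementation: the `TieredKVCache` class, the tier names, and the promote-on-hit policy are all hypothetical, and plain dictionaries stand in for real VRAM, DRAM, and SSD storage.

```python
from typing import Callable, Dict, Optional, Tuple

TIERS = ("vram", "dram", "ssd")  # ordered fastest to slowest


class TieredKVCache:
    """Toy tiered cache: in-memory dicts stand in for the real storage tiers."""

    def __init__(self) -> None:
        self.tiers: Dict[str, Dict[str, bytes]] = {t: {} for t in TIERS}

    def put(self, tier: str, key: str, value: bytes) -> None:
        self.tiers[tier][key] = value

    def lookup(self, key: str) -> Optional[Tuple[str, bytes]]:
        """Search the tiers from fastest to slowest; return (tier, value) or None."""
        for tier in TIERS:
            if key in self.tiers[tier]:
                return tier, self.tiers[tier][key]
        return None

    def get_or_prefill(self, key: str, prefill: Callable[[], bytes]) -> Tuple[str, bytes]:
        """Reuse cached KV blocks if any tier holds them; otherwise run prefill."""
        hit = self.lookup(key)
        if hit is not None:
            tier, value = hit
            if tier != "vram":
                self.tiers["vram"][key] = value  # promote a copy (assumed policy)
            return tier, value
        value = prefill()  # cache miss: pay the full prefill cost once
        self.tiers["vram"][key] = value
        return "miss", value


cache = TieredKVCache()
cache.put("ssd", "prompt-hash-1", b"kv-blocks")
prefill_calls = []


def fake_prefill() -> bytes:
    prefill_calls.append(1)
    return b"fresh-kv"


first = cache.get_or_prefill("prompt-hash-1", fake_prefill)   # served from SSD
second = cache.get_or_prefill("prompt-hash-1", fake_prefill)  # served from VRAM
third = cache.get_or_prefill("prompt-hash-2", fake_prefill)   # miss, runs prefill
```

In this sketch only the third request runs prefill. The first two are served from cache, which is the GPU time the text describes as freed for other workloads.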
Enterprises deploying private AI without Phison HCI must first overcome the following fundamental challenges.
Full-card or passthrough deployment often reaches only 20–30% GPU utilization, leaving most compute idle and hardware ROI extremely low.
GPU HBM often cannot hold the KV cache that large models need; inference over long documents and conversations slows sharply or fails to complete.
From procurement and networking to container platforms and model launch, traditional workflows often take weeks or months, slowing AI innovation.
Compute, storage, networking, containers, and monitoring are managed separately, so IT teams operate across five or more systems, raising labor costs and the risk of configuration drift.
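To show why KV cache outgrows GPU HBM, the standard sizing formula for a transformer's KV cache can be worked through as a short calculation. The model shape used here (80 layers, 8 KV heads, head dimension 128, 16-bit values) is an illustrative assumption for a 70B-class model with grouped-query attention; it is not a figure from Phison.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_elem=2, batch=1):
    """KV cache size in bytes; the leading 2 counts keys and values separately."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem * batch


# Assumed (hypothetical) model shape: 80 layers, 8 KV heads, head_dim 128, fp16.
per_token = kv_cache_bytes(80, 8, 128, seq_len=1)
full_context = kv_cache_bytes(80, 8, 128, seq_len=128 * 1024)
full_context_gib = full_context / 2**30

print(f"{per_token / 1024:.0f} KiB per token")
print(f"{full_context_gib:.0f} GiB for one 128K-token sequence")
```

Under these assumptions, one 128K-token sequence needs about 40 GiB of KV cache on top of the model weights, which explains why long-context requests run out of memory when HBM is the only tier.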
Phison HCI builds on three self-developed technologies: vGPU partitioning and time-sharing, aiDAPTIV Cache Memory tiering from HBM to NVMe, and multi-node tensor/pipeline-parallel scale-out. Together they eliminate idle GPU waste, extend effective memory across the cluster, and run large-model inference at production scale.
Split a single GPU into vGPU instances with on-demand compute and memory allocation. Multiple models or tenants time-share the same card with QoS isolation, and quotas adjust dynamically at peak load.
aiDAPTIV Cache Memory tiers cache from GPU HBM down to NVMe SSD, expanding effective memory by more than 10×. KV cache is shared across nodes so that prefill results can be reused by multiple decode nodes, supporting inference over 128K+ tokens without out-of-memory (OOM) errors.
Tensor parallelism combined with pipeline parallelism splits 70B–405B models across nodes. Cross-node KV cache sharing cuts inter-node traffic, and throughput scales linearly with each node added.
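As a rough illustration of how the two parallelism schemes divide a model, the sketch below assigns contiguous layer ranges to pipeline stages and attention-head ranges to tensor-parallel ranks. The function names and the 80-layer, 64-head shape are hypothetical; real schedulers also balance memory and communication, which this sketch ignores.

```python
def pipeline_stages(num_layers, num_stages):
    """Split layers into contiguous, near-equal stages (pipeline parallelism)."""
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for stage in range(num_stages):
        size = base + (1 if stage < extra else 0)  # earlier stages absorb the remainder
        stages.append(range(start, start + size))
        start += size
    return stages


def tensor_shard(num_heads, tp_degree, rank):
    """Return the attention heads owned by one rank (tensor parallelism)."""
    if num_heads % tp_degree != 0:
        raise ValueError("num_heads must be divisible by tp_degree")
    per_rank = num_heads // tp_degree
    return range(rank * per_rank, (rank + 1) * per_rank)


# Assumed shape: 80 layers over 3 pipeline stages, 64 heads over 4 ranks.
stages = pipeline_stages(80, 3)
heads_rank1 = tensor_shard(64, 4, rank=1)
```

Each pipeline stage holds only its own layers, and each tensor-parallel rank holds only its own heads, so the memory needed per GPU shrinks as nodes are added.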
Through software-hardware integration and modular design, enterprises can quickly deploy AI workstations, Private AI, AI agents, RAG, AI inference, and Edge AI applications, lowering adoption barriers and accelerating AI implementation.
User Surfaces
Platform Services
AI Platform
Service Deployment & Management
Compute Resources
Benefits
Accelerated Deployment
One-click deployment
GPU Optimization
vGPU time-slicing and sharing
Multi-Tenant Isolation
Resource and permission segregation
Full-Stack Observability
Real-time monitoring
Cost analytics
Phison HCI Software
Phison hyper-converged software unifies heterogeneous GPU, XPU, storage, and VM resources under one control plane. Manage Kubernetes, VMs, AI inference, and monitoring from a single console to integrate mixed hardware and maximize AI infrastructure ROI without switching tools.
Measurable performance and efficiency gains powered by Phison HCI's proprietary technologies.
Reduce idle resources and directly lower the per-token inference cost.
Add new nodes to increase throughput linearly without redeploying the model.
Combined with vGPU partitioning, a single host can serve more concurrent requests.
A single GPU can be divided into multiple virtual GPU instances, allowing different workloads, such as training, inference, and batch processing, to share the same card. The Phison HCI platform supports GPU virtualization and resource partitioning, so a single GPU can be dynamically allocated to multiple AI tasks or users. This eliminates idle GPU waste and enables fine-grained resource scheduling.
Maximizes GPU utilization
Lowers AI adoption costs
Enables multiple workloads to run in parallel
Supports secure multi-tenant isolation
Shared AI workstations
Multi-department AI development
AI inference service platforms
GPU resource pool management
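The GPU partitioning described above can be pictured as quota bookkeeping on a shared card. The sketch below models memory quotas only and assumes an 80 GiB GPU. The `GPU` class and its methods are hypothetical and do not represent the Phison HCI API, and time-slicing and QoS enforcement are out of scope.

```python
from dataclasses import dataclass, field


@dataclass
class GPU:
    memory_gib: int
    allocations: dict = field(default_factory=dict)  # tenant name -> GiB reserved

    def free_gib(self) -> int:
        return self.memory_gib - sum(self.allocations.values())

    def allocate(self, tenant: str, gib: int) -> bool:
        """Reserve memory for a tenant; refuse instead of oversubscribing."""
        if gib <= 0:
            raise ValueError("gib must be positive")
        if tenant in self.allocations:
            raise ValueError(f"{tenant} already holds an allocation")
        if gib > self.free_gib():
            return False
        self.allocations[tenant] = gib
        return True

    def release(self, tenant: str) -> None:
        """Return a tenant's quota to the pool so other workloads can use it."""
        self.allocations.pop(tenant, None)


gpu = GPU(memory_gib=80)
ok_training = gpu.allocate("training", 40)
ok_inference = gpu.allocate("inference", 24)
ok_batch_first = gpu.allocate("batch", 24)   # only 16 GiB free, so refused
gpu.release("training")
ok_batch_second = gpu.allocate("batch", 24)  # fits once training has released
```

The refused request followed by a successful retry is what dynamic allocation means in this sketch: capacity freed by one workload becomes available to another without dedicating a whole card to either.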
Phison's self-developed KV cache expansion technology uses high-speed NVMe storage as an extension of GPU HBM. It addresses the context-length limitations of large models and supports shared cache across multiple nodes, significantly reducing GPU VRAM pressure and improving large-model inference efficiency.
GPU / DRAM / SSD / Remote SSD hierarchical caching architecture
Dynamic KV cache expansion
Support for long-context inference
Shared cache resources across multiple nodes
Improves model inference throughput
Reduces GPU memory bottlenecks
Reduces the need to purchase high-end GPUs
Improves overall GPU utilization