Gridview HPC Suite

Gridview HPC Suite is an integrated monitoring, management and job scheduling software platform for HPC. Gridview is built from several pluggable functional modules. It dynamically monitors both the overall and the detailed status of the computing center and the cluster. It also provides comprehensive cluster management, real-time and historical alerting, and powerful job scheduling.

Everything Under Control

Functional modules

Computing Center Visualization

  • Interactive 3D visualization of operational status
  • Precise problem alerting and localization

Cluster Monitoring

  • Comprehensive monitoring of rack, chassis, server, storage, switch, etc. with hundreds of monitoring items

Performance Analysis

  • Heat maps that intuitively display node status and performance
  • Performance bottlenecks and idle resources are easy to find
  • All historical data can be exported or analyzed online

Asset Management

  • Unified management of all assets, with auto-discovery and visualization
  • Flexible classification by asset type or location

Cluster Management

  • A rich set of cluster management tools
  • Rapid batch system deployment and “one-click” cluster configuration

Alert Management

  • Alerts shown in the physical view
  • Notifications by email, SMS, etc.
  • Flexible alert policies and historical alert analysis

Gridview offers powerful and flexible scheduling policies, fault tolerance, easy-to-use application Web Portals and a clear accounting system. Together, these features greatly improve the management efficiency and utilization of high performance computers.

Integrated Job Submission, Scheduling and Management

Scheduling

  • Job scheduling policies: fair share, reservation, backfill, multilevel preemption, dynamic job priority, exclusive mode, GPU/MIC support, etc.
  • Checkpointing/restart, suspend/resume
  • Killing of residual and illegal processes

Accounting

  • Flexible rate setting for user, group, queue, CPU, memory, software license, etc.
  • Prepaid and postpaid mode
  • Online bills that can be exported to other formats

Workflow

  • Real-time monitoring of jobs and resources
  • User-defined workflows with job ordering and dependencies
  • Monitoring and control of workflows, such as terminate, suspend, delete, rerun, etc.

Web Portal

  • Application deployment, publishing and subscription
  • Dozens of predefined application portals, including ABAQUS, ANSYS, CFX, Fluent, LS-DYNA, etc.
  • Customization and API support