Authors:

Eli Mesika

Gilad Chaplik

Feature pages are design documents that developers have created while collaborating on oVirt.

Most of them are outdated, but provide historical design context.

They are not user documentation and should not be treated as such.

Documentation is available here.

Detailed PM Health Check

Power Management Health Check

Summary

The requirement is to add a periodic health check of all Hosts with configured power management (PM). The scheduled job will periodically (once an hour by default) send a status command to all PM-enabled Hosts and raise alerts for failed operations.

Owner

Feature owner: Eli Mesika (emesika)
Engine Component owner: Eli Mesika (emesika)
QA Owner: Pavel Stehlik (pstehlik)
Email: emesika@redhat.com

Current status

Target Release: 3.5
Status: Design
Last updated date: May 3, 2014

Detailed Description

Add a class PmHealthCheckManager to handle the scheduled check. This class will:

    Read the related configuration values (see Configuration) and, if the feature is enabled, read the
    PMHealthCheckIntervalInSec configuration variable.
    Create the Quartz job in its initialize() method, which will be called from backend::initialize().
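The two responsibilities above (read the configuration, then create the periodic job from initialize()) can be sketched as follows. This is a minimal, hypothetical sketch: the design calls for a Quartz job, but here the JDK's ScheduledExecutorService stands in for Quartz so the example is self-contained, and the constructor parameters stand in for the engine's configuration lookup. The page does not say whether the first check runs immediately; the sketch assumes it runs after one interval.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of PmHealthCheckManager. A JDK ScheduledExecutorService
// stands in for the Quartz scheduler that the design actually calls for.
class PmHealthCheckManager {
    private final boolean enabled;       // value of PMHealthCheckEnabled
    private final int intervalInSec;     // value of PMHealthCheckIntervalInSec
    private final Runnable healthCheck;  // one cycle over all PM-enabled hosts (placeholder)
    private ScheduledExecutorService scheduler;
    private ScheduledFuture<?> job;

    PmHealthCheckManager(boolean enabled, int intervalInSec, Runnable healthCheck) {
        if (intervalInSec <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.enabled = enabled;
        this.intervalInSec = intervalInSec;
        this.healthCheck = healthCheck;
    }

    // Intended to be called once from the backend's initialize() method.
    void initialize() {
        if (!enabled) {
            return; // feature disabled: no job is created
        }
        scheduler = Executors.newSingleThreadScheduledExecutor();
        // First run after one interval, then every interval. The cycle should catch
        // its own exceptions: an uncaught one suppresses all later runs.
        job = scheduler.scheduleAtFixedRate(
                healthCheck, intervalInSec, intervalInSec, TimeUnit.SECONDS);
    }

    boolean isScheduled() {
        return job != null && !job.isCancelled();
    }

    void shutdown() {
        if (job != null) {
            job.cancel(false); // stop future runs, let a running cycle finish
        }
        if (scheduler != null) {
            scheduler.shutdownNow();
        }
    }
}
```

With scheduleAtFixedRate, a cycle that takes longer than the interval delays the next run rather than overlapping it, which fits the Enforcement requirement.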

CRUD

N/A

DAO

N/A

Metadata

N/A

Configuration

The following configuration variables will be added to vdc_options:

    PMHealthCheckEnabled (boolean, false by default) - Enables/disables the PM Health Check scheduled job
    PMHealthCheckIntervalInSec (int, default 3600) - The interval, in seconds, at which the PM Health Check operation is scheduled

These configuration values should be exposed to the engine-config tool.
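The two options and their defaults could be read as in the following sketch. It assumes, hypothetically, that the raw vdc_options values are available as a map from option name to string value; the real engine reads them through its own configuration layer. The class name is illustrative, and falling back to the defaults on missing or malformed values is a choice made by the sketch, not something this page specifies.

```java
import java.util.Map;

// Hypothetical holder for the two PM Health Check options, using the documented defaults.
final class PmHealthCheckConfig {
    static final boolean DEFAULT_ENABLED = false;   // PMHealthCheckEnabled default
    static final int DEFAULT_INTERVAL_SEC = 3600;   // PMHealthCheckIntervalInSec default

    final boolean enabled;
    final int intervalInSec;

    private PmHealthCheckConfig(boolean enabled, int intervalInSec) {
        this.enabled = enabled;
        this.intervalInSec = intervalInSec;
    }

    // options: option name -> raw string value (assumed shape of the vdc_options data)
    static PmHealthCheckConfig from(Map<String, String> options) {
        String rawEnabled = options.get("PMHealthCheckEnabled");
        boolean enabled = rawEnabled == null
                ? DEFAULT_ENABLED
                : Boolean.parseBoolean(rawEnabled.trim()); // case-insensitive "true"

        int interval = DEFAULT_INTERVAL_SEC;
        String rawInterval = options.get("PMHealthCheckIntervalInSec");
        if (rawInterval != null) {
            try {
                int parsed = Integer.parseInt(rawInterval.trim());
                if (parsed > 0) {
                    interval = parsed; // ignore zero or negative intervals
                }
            } catch (NumberFormatException e) {
                // malformed value: keep the default
            }
        }
        return new PmHealthCheckConfig(enabled, interval);
    }
}
```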

Business Logic

If enabled, the PmHealthCheckManager will create a Quartz job that runs every PMHealthCheckIntervalInSec seconds and does the following:

    Search for all Hosts with defined and enabled power management.
    For each Host:
        If the Host has only a Primary card, send a status command to this card. If the command fails,
        an alert is generated; if it succeeds, any active alert for this Host is removed.
        If the Host has Primary and Secondary cards:
           For sequential devices, both cards are tested, and only a warning alert is generated if one
           card is OK and the other fails.
           For concurrent devices, both cards are tested, and an alert is generated no matter which card fails.
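The per-Host decision above can be condensed into a small function. This is a hypothetical sketch: the class and enum names, and the use of a null secondary result to mean "no Secondary card configured", are illustrative assumptions. The page does not say what happens when both sequential cards fail; the sketch assumes a full alert in that case. A NONE result corresponds to removing any active alert for the Host.

```java
// Hypothetical outcome of one Host's PM status check.
enum AlertLevel { NONE, WARNING, ALERT }

final class PmHealthCheckEvaluator {
    // primaryOk:   result of the status command on the Primary card
    // secondaryOk: result on the Secondary card, or null if no Secondary card exists
    // concurrent:  true for concurrent devices, false for sequential devices
    static AlertLevel evaluate(boolean primaryOk, Boolean secondaryOk, boolean concurrent) {
        if (secondaryOk == null) {
            // Primary card only: alert on failure, clear any active alert on success
            return primaryOk ? AlertLevel.NONE : AlertLevel.ALERT;
        }
        if (primaryOk && secondaryOk) {
            return AlertLevel.NONE; // both cards answered
        }
        if (concurrent) {
            return AlertLevel.ALERT; // concurrent: any failing card raises an alert
        }
        // Sequential: one OK and one failed -> warning only
        if (primaryOk || secondaryOk) {
            return AlertLevel.WARNING;
        }
        return AlertLevel.ALERT; // assumption: both sequential cards failed
    }
}
```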

API

N/A

User Experience

N/A

Installation/Upgrade

New configuration values will be installed (see Configuration)

User work-flows

Users may see alerts generated by the PM Health Check job listed with other PM alerts generated by the system. In 3.5, users may be able to clear those alerts like any other alert on the system (see Dismiss Alerts).

Enforcement

The code should ensure that a PM Health Check cycle cannot start while another cycle is active. This matters because, during a general electricity failure or shutdown, looping over all hosts and waiting for their communication timeouts may take a long time.
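One way to honor this rule is a non-blocking guard around each cycle, sketched below with a hypothetical class name. If a previous cycle is still running (for example, still waiting on timeouts from unreachable hosts), the new trigger is skipped instead of queued. When the job runs under Quartz, annotating the job class with Quartz's @DisallowConcurrentExecution is an alternative that gives a comparable effect.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard ensuring at most one PM Health Check cycle runs at a time.
final class PmHealthCheckCycleGuard {
    private final AtomicBoolean running = new AtomicBoolean(false);

    // Runs the cycle unless another one is active.
    // Returns true if the cycle ran, false if it was skipped.
    boolean runIfIdle(Runnable cycle) {
        // Atomically flip false -> true; fails if a cycle is already in progress
        if (!running.compareAndSet(false, true)) {
            return false;
        }
        try {
            cycle.run();
            return true;
        } finally {
            running.set(false); // release even if the cycle throws
        }
    }
}
```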

Affected oVirt projects

See RFE

Documentation / External references

Features/PMHealthCheck

Future Directions

N/A

Open Issues

N/A