Most of them are outdated, but provide historical design context.
They are not user documentation and should not be treated as such.
Documentation is available here.
NUMA and Virtual NUMA
Summary
This feature allows enterprise customers to provision large guests for their traditional scale-up enterprise workloads and expect low overhead due to virtualization.
- Query target host’s NUMA topology
- NUMA bindings of guest resources (vCPUs & memory)
- Virtual NUMA topology
You may also refer to the simple feature page.
Owner
- Name: Jason Liao (JasonLiao), Bruce Shi (BruceShi)
- Email: chuan.liao@hp.com, xiao-lei.shi@hp.com
- IRC: jasonliao, bruceshi @ #ovirt (irc.oftc.net)
Current status
- Target Release: oVirt 3.5
- Status: design
- Last updated: 25 Mar 2014
This is the detailed design page for NUMA and Virtual NUMA
Data flow diagram
Interface & data structure
Interface between VDSM and libvirt
- I-1.1 Host’s NUMA node index and CPU id of each NUMA node
- I-1.2 Host’s NUMA node memory information, including total and free memory
- I-1.5 Configuration of the VM’s memory allocation mode and of the NUMA nodes its memory comes from
- I-1.6 Configuration of VM’s virtual NUMA topology
- I-1.1 Seek the host NUMA nodes information by using the getCapabilities API in libvirt:
    <capabilities>
      …
      <host>
        ...
        <topology>
          <cells num='1'>
            <cell id='0'>
              <cpus num='2'>
                <cpu id='0'/>
                <cpu id='1'/>
              </cpus>
            </cell>
          </cells>
        </topology>
        …
      </host>
      …
    </capabilities>
- I-1.2 Seek the host NUMA nodes memory information by using the getMemoryStats API in libvirt. The data format of the value returned by the API is:
    { total: int, free: int }
- I-1.5 Create a new function appendNumaTune in the VDSM vm module to write the VM numatune configuration into the libvirt domain XML, following the format below:
    <domain>
      ...
      <numatune>
        <memory mode='interleave' nodeset='0-1'/>
      </numatune>
      …
    </domain>
- I-1.6 Modify function appendCpu in the VDSM vm module to write the VM virtual NUMA topology configuration into the libvirt domain XML, following the format below:
    <cpu>
      ...
      <numa>
        <cell cpus='0-7' memory='10485760'/>
        <cell cpus='8-15' memory='10485760'/>
      </numa>
      ...
    </cpu>
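As a rough illustration of I-1.1, I-1.5 and I-1.6, the sketch below reads the host topology out of a capabilities document and builds the numatune and numa elements with Python's standard xml.etree.ElementTree. The function names parse_numa_cells, build_numatune and build_guest_numa are hypothetical helpers, not the VDSM functions named above, and the sample XML is the I-1.1 format with the elided parts dropped.

```python
import xml.etree.ElementTree as ET

# Sample taken from the I-1.1 format; elided sections are simply omitted here.
CAPS_XML = """
<capabilities>
  <host>
    <topology>
      <cells num='1'>
        <cell id='0'>
          <cpus num='2'>
            <cpu id='0'/>
            <cpu id='1'/>
          </cpus>
        </cell>
      </cells>
    </topology>
  </host>
</capabilities>
"""


def parse_numa_cells(caps_xml):
    """Map each host NUMA node index to the list of CPU ids it contains."""
    root = ET.fromstring(caps_xml)
    cells = {}
    for cell in root.findall('./host/topology/cells/cell'):
        cpu_ids = [int(cpu.get('id')) for cpu in cell.findall('./cpus/cpu')]
        cells[int(cell.get('id'))] = cpu_ids
    return cells


def build_numatune(mode, nodeset):
    """Build a <numatune> element in the I-1.5 format (hypothetical helper)."""
    numatune = ET.Element('numatune')
    ET.SubElement(numatune, 'memory', mode=mode, nodeset=nodeset)
    return numatune


def build_guest_numa(guest_cells):
    """Build a <numa> element in the I-1.6 format from dicts of string values."""
    numa = ET.Element('numa')
    for cell in guest_cells:
        ET.SubElement(numa, 'cell', cpus=cell['cpus'], memory=cell['memory'])
    return numa
```

The returned elements can be serialized with ET.tostring(element, encoding='unicode') or appended to a larger domain tree; how VDSM actually assembles its domain XML is not covered by this sketch.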
Interface between VDSM and Host
- I-1.3 Statistics data of each host CPU core which include %usr (%usr+%nice), %sys and %idle.
- I-1.4 Data structure to be provided to MOM component
- I-1.7 NUMA distances captured from a command
- I-1.8 Automatic NUMA balancing on host
- I-1.3 Sample host CPU statistics data from /proc/stat; the whole data format is shown below. We will use columns 1 to 5 (the CPU label followed by the user, nice, system and idle counters) to calculate CPU statistics data on the engine side:
    $ cat /proc/stat
    cpu 268492078 16093 132943706 6545294629 19023496 898 138160 0 57789592
    cpu0 62042038 3012 52198814 1638619972 2438624 4 12068 0 16721375
    cpu1 62779520 2733 25830756 1647361083 6001324 1 34617 0 16341547
    cpu2 77892630 5788 32963856 1610093241 8367287 889 80447 0 8205583
    cpu3 65777888 4559 21950279 1649220333 2216260 4 11027 0 16521086
- I-1.4 Data structure provided to the MOM component
  MOM uses the VDSM HypervisorInterface. It calls the API.py Global.getCapabilities function to get host NUMA topology data:
    'autoNumaBalancing': int
    'numaNodeDistance': {'<nodeIndex>': [int], ...}
    'numaNodes': {'<nodeIndex>': {'cpus': [int], 'totalMemory': 'str'}, …}
  and the API.py Global.getStats function to get host NUMA statistics data:
    'numaNodeMemFree': {'<nodeIndex>': {'memFree': 'str', 'memPercent': int}, …}
    'cpuStatistics': {'<cpuId>': {'nodeIndex': int, 'cpuSys': 'str', 'cpuIdle': 'str', 'cpuUser': 'str'}, …}
- I-1.7 The libvirt API does not support getting NUMA distance information, so we use the numactl command to get it:
    $ numactl -H
    node distances:
    node   0   1
      0:  10  20
      1:  20  10
- I-1.8 On kernels that have the automatic NUMA balancing feature, use the command sysctl -a | grep numa_balancing to check whether automatic NUMA balancing is turned on or off:
    $ sysctl -a | grep numa_balancing
    kernel.numa_balancing = 1
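The two text formats above (/proc/stat in I-1.3 and the numactl distance table in I-1.7) could be parsed along the following lines. This is a minimal sketch, not VDSM code: the function names are hypothetical, and the percentages are computed only from the user, nice, system and idle counters, which is an assumption based on the columns the design says it will use.

```python
def parse_proc_stat(text):
    """Return {cpu_id: (user, nice, system, idle)} for each per-core 'cpuN' line."""
    counters = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the aggregate 'cpu' line and every non-CPU line (intr, ctxt, ...).
        if not fields or not fields[0].startswith('cpu') or not fields[0][3:].isdigit():
            continue
        counters[int(fields[0][3:])] = tuple(int(v) for v in fields[1:5])
    return counters


def cpu_percentages(prev, curr):
    """Compute %usr (user + nice), %sys and %idle between two samples of one core."""
    d_user, d_nice, d_sys, d_idle = (c - p for p, c in zip(prev, curr))
    total = d_user + d_nice + d_sys + d_idle
    if total == 0:
        # No ticks elapsed between the samples; report zeros rather than divide by zero.
        return {'cpuUser': 0.0, 'cpuSys': 0.0, 'cpuIdle': 0.0}
    return {
        'cpuUser': 100.0 * (d_user + d_nice) / total,
        'cpuSys': 100.0 * d_sys / total,
        'cpuIdle': 100.0 * d_idle / total,
    }


def parse_numa_distances(text):
    """Extract {node_index: [distances]} from the 'node distances:' table of numactl -H."""
    distances = {}
    in_table = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith('node distances:'):
            in_table = True
            continue
        # Table rows look like '0:  10  20'; the column header row has no colon.
        if in_table and ':' in stripped:
            index, values = stripped.split(':', 1)
            distances[int(index)] = [int(v) for v in values.split()]
    return distances
```

The counters in /proc/stat are cumulative, so percentages only make sense as a difference between two samples taken some interval apart. The design reports these values as strings; converting the floats returned here is left out of the sketch.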
Interface between VDSM and engine core
- I-2.1 Report whether the host supports automatic NUMA balancing, the NUMA node distances, and the NUMA node information (NUMA node index, CPU ids and total memory) from VDSM to engine core
- I-2.2 Report host NUMA node memory information (free memory and used memory percentage) and per-CPU statistics (system, idle and user CPU percentage) from VDSM to engine core
- I-2.3 Configuration of the VM’s numatune and virtual NUMA topology, sent from engine core to VDSM
- I-2.1 Transfer data format of host NUMA nodes information
    'autoNumaBalancing': int
    'numaDistances': {'<nodeIndex>': [int], ...}
    'numaNodes': {'<nodeIndex>': {'cpus': [int], 'totalMemory': 'str'}, …}
- I-2.2 Transfer data format of host CPU statistics and NUMA nodes memory information
    'numaNodeMemFree': {'<nodeIndex>': {'memFree': 'str', 'memPercent': int}, …}
    'cpuStatistics': {'<cpuId>': {'numaNodeIndex': int, 'cpuSys': 'str', 'cpuIdle': 'str', 'cpuUser': 'str'}, …}
- I-2.3 Transfer data format for setting VM numatune and virtual NUMA topology
    'numaTune': {'mode': 'str', 'nodeset': 'str'}
    'guestNumaNodes': [{'cpus': 'str', 'memory': 'str'}, …]
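To make the I-2.3 format concrete, here is a small sketch that assembles the numaTune and guestNumaNodes entries as plain Python dictionaries. The helper name build_vm_numa_params is hypothetical, and the mode check is an assumption: it accepts only the strict, interleave and preferred memory modes of libvirt's numatune element, which this design does not itself enumerate.

```python
# Assumption: these are the numatune memory modes considered valid here.
ALLOWED_MODES = ('strict', 'interleave', 'preferred')


def build_vm_numa_params(mode, nodeset, guest_nodes):
    """Assemble the I-2.3 entries.

    guest_nodes is a list of (cpus, memory) pairs, both given as strings,
    for example ('0-7', '10485760').
    """
    if mode not in ALLOWED_MODES:
        raise ValueError('unsupported numatune mode: %r' % (mode,))
    return {
        'numaTune': {'mode': mode, 'nodeset': nodeset},
        'guestNumaNodes': [
            {'cpus': cpus, 'memory': memory} for cpus, memory in guest_nodes
        ],
    }


params = build_vm_numa_params(
    'interleave', '0-1', [('0-7', '10485760'), ('8-15', '10485760')]
)
```

The example values mirror the numatune and numa cell samples given for the VDSM and libvirt interface; how these entries are merged into the rest of the VM creation parameters is outside this sketch.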
Interface between engine core and database (schema)
- I-3.1 Schema modification of the vds_dynamic table to include the host’s NUMA node count and automatic NUMA balancing status.
- I-3.2 Add table vds_cpu_statistics to include host CPU statistics information (system, user, idle CPU time and used CPU percentage).
- I-3.3 Schema modification of the vm_static table to include the numatune mode configuration and virtual NUMA node count.
- I-3.4 Add table numa_node to include host/VM NUMA node information (node index, total memory, CPU count of each node) and statistics information (system, user, idle CPU time, used CPU percentage, free memory and used memory percentage).
- I-3.5 Add table vm_vds_numa_node_map to include the configuration of VM virtual NUMA nodes pinned to host NUMA nodes (this is a nested relationship table that stores the mapping between VM NUMA nodes and host NUMA nodes, which are all in table numa_node).
- I-3.6 Add table numa_node_cpu_map to include the CPU information that each host/VM NUMA node contains.
- I-3.7 Add table numa_node_distance to include the distance information between the NUMA nodes.
The above interfaces are defined with database design diagram
- Related database scripts change:
  - Add numa_sp.sql to include the stored procedures which handle the operations in tables numa_node, numa_node_cpu_map, vm_vds_numa_node_map and numa_node_distance. It will provide stored procedures to insert, update and delete data, and various query functions.
  - Modify vds_sp.sql to add stored procedures which handle the operations in table vds_cpu_statistics, including insert, update, delete and various query functions.
  - Modify the functions InsertVdsDynamic and UpdateVdsDynamic in vds_sp.sql to add new columns auto_numa_banlancing and vds_numa_node_count.
  - Modify the functions InsertVmStatic and UpdateVmStatic in vms_sp.sql to add two new columns numatune_mode and vm_numa_node_count.
  - Modify create_views.sql to add new columns numatune_mode and vm_numa_node_count in views vms and vms_with_tags; add new columns auto_numa_banlancing and vds_numa_node_count in views vds and vds_with_tags.
  - Modify create_views.sql to add new views: vds_numa_node_view, which joins vds_dynamic and numa_node, and vm_numa_node_view, which joins vm_static and numa_node.
  - Modify upgrade/post_upgrade/0010_add_object_column_white_list_table.sql to add new columns auto_numa_banlancing and vds_numa_node_count.
  - Add one script under upgrade/ to create tables numa_node, vds_cpu_statistics, vm_vds_numa_node_map, numa_node_cpu_map and numa_node_distance, and to add columns in tables vds_dynamic and vm_static.
  - Create the following indexes:
    - Index on column vm_or_vds_guid of table numa_node
    - Index on column vds_id of table vds_cpu_statistics
    - Index on column numa_node_id of table numa_node_cpu_map
    - Index on column numa_node_id of table numa_node_distance
    - Indexes on each of the columns vm_numa_node_id and vds_numa_node_id of table vm_vds_numa_node_map
- Related DAO change:
  - Add NumaNodeDAO and the related implementation to provide data save, update, delete and various queries in tables numa_node, numa_node_cpu_map, vm_vds_numa_node_map and numa_node_distance. Add NumaNodeDAOTest for NumaNodeDAO as well.
  - Add VdsCpuStatisticsDao and the related implementation to provide data save, update, delete and various queries in table vds_cpu_statistics. Add VdsCpuStatisticsDAOTest for VdsCpuStatisticsDAO as well.
  - Modify VdsDynamicDAODbFacadeImpl and VdsDAODbFacadeImpl to add the mapping of new columns auto_numa_banlancing and vds_numa_node_count. Run VdsDynamicDAOTest to verify the modification.
  - Modify VmStaticDAODbFacadeImpl and VmDAODbFacadeImpl to add the mapping of new columns numatune_mode and numa_node_count. Run VmStaticDAOTest to verify the modification.
- Related search engine change
Currently, we plan to provide the search functions below for the NUMA feature; each field supports the numeric relations “>”, “<”, “>=”, “<=”, “=”, “!=”.
- Search hosts with the below NUMA related fields:
- NUMA node number
- NUMA node cpu count
- NUMA node total memory
- NUMA node memory usage
- NUMA node cpu usage
- Search vms with the below NUMA related fields:
- NUMA tune mode
- Virtual NUMA node number
- Virtual NUMA node vcpu count
- Virtual NUMA node total memory
NUMA tune mode supports the enum value relation; the other fields support the numeric relations.
We will make the following modifications:
- Modify org.ovirt.engine.core.searchbackend.SearchObjects to add new entry NUMANODES.
- Add org.ovirt.engine.core.searchbackend.NumaNodeConditionFieldAutoCompleter to provide auto-completion for NUMA node related filters.
- Modify org.ovirt.engine.core.searchbackend.SearchObjectAutoCompleter to add new joins: one is HOST joins NUMANODES on vds_id, the other is VM joins NUMANODES on vm_guid.
  - Add new entries in entitySearchInfo accordingly. NUMANODES will use the newly added views vds_numa_node_view and vm_numa_node_view.
- Modify org.ovirt.engine.core.searchbackend.VdsCrossRefAutoCompleter to add auto complete entry NUMANODES.
- Cascade-delete
  - When a user removes a virtual NUMA node, the related rows in tables numa_node_cpu_map, vm_vds_numa_node_map, numa_node_distance (maybe in the future; currently there is no distance information for virtual NUMA nodes) and numa_node should be removed as well.
  - When a user removes a VM, all the virtual NUMA nodes of this VM should be removed, following the item above for the cascade-delete.
  - When a user removes a host, the related rows in tables numa_node_cpu_map, vm_vds_numa_node_map, numa_node_distance, numa_node and vds_cpu_statistics should be removed as well.
Interface and data structure in engine core
- Entities
  - VDS has many VdsNumaNode objects in dynamic data (collected from VDS capability).
  - VdsNumaNode is the core entity for host NUMA topology; it links one statistics object, VdsNumaNodeStatistics, which contains some real-time data (free memory, NUMA node CPU usage, etc.).
  - VM has many VmNumaNode objects in dynamic data (configured by user).
  - VmNumaNode is the core entity for VM NUMA topology.
  - NumaTuneMode is the memory tune mode (configured by user).
  - VdsNumaNode has a one-to-many relationship with VmNumaNode.
  - VdsNumaNode.cpuIds links with CpuStatistics.cpuId, to look at the usage of each CPU inside a NUMA node.
- Action & Query
  - GetVdsNumaNodeByVdsId, GetVmNumaNodeByVmId, GetVmNumaNodeByVdsNumaNodeId, GetCpuStatsByVdsId use the same parameters IdQueryParameters
  - AddVmNumaNode, UpdateVmNumaNode, RemoveVmNumaNode use the same parameters VmNumaNodeParameters to manage virtual NUMA nodes in a VM
  - SetNumaTuneMode uses parameters NumaTuneModeParameters to set the NUMA tuning mode for a VM
  - GetVdsNumaNodeByVdsId will return List<VdsNumaNode>
  - GetVmNumaNodeByVmId, GetVmNumaNodeByVdsNumaNodeId will return List<VmNumaNode>
  - GetVmNumaNodeByVdsNumaNodeId will query the VmNumaNodes under the VdsNumaNode
  - GetCpuStatsByVdsId will return List<CpuStatistics>
  - When VmNumaNodeParameters.vdsNumaNodeId is set to null, the VmNumaNode is unassigned (not pinned to any host NUMA node).
Interface and data structure in ovirt scheduler
Add NUMA filter and weight module to oVirt’s scheduler, and add those to all cluster policies (inc. user defined).
- NUMA Filter
  - Fetches the (scheduled) VM virtual NUMA nodes.
  - Fetches all virtual NUMA nodes topology (CPU count, total memory).
  - Fetches all hosts NUMA nodes topology (CPU count, total memory).
  - Removes all hosts that don’t meet the matched NUMA nodes topology
    - for positive, host NUMA node’s CPU count > virtual NUMA node’s CPU count
    - for positive, host NUMA node’s total memory > virtual NUMA node’s total memory
- NUMA Weight Module
  - Fetches the (scheduled) VM virtual NUMA nodes.
  - Fetches all virtual NUMA nodes topology (CPU count, total memory, NUMA distance).
  - Fetches all hosts NUMA nodes topology and statistics (CPU usage, free memory).
  - Scores the hosts according to each NUMA node’s score
    - for positive, in case a VM of the group is running on a certain host, give all other hosts a higher weight.
    - for positive, give the host a higher weight if the host NUMA node’s CPU is used up.
    - for positive, give the host a higher weight if the host NUMA node’s memory is used up.
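The NUMA filter steps above can be sketched as follows. This is only an illustration under stated assumptions: the NumaNode class and function names are hypothetical, and since the design does not say how virtual nodes are matched to host nodes, the sketch assumes a host passes when every virtual NUMA node is strictly smaller, in both CPU count and total memory, than at least one host NUMA node.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class NumaNode:
    cpu_count: int
    total_memory: int  # unit is not stated in the design; any consistent unit works


def host_fits(host_nodes: List[NumaNode], vm_nodes: List[NumaNode]) -> bool:
    """True if each virtual node is strictly smaller than some host node.

    Assumption: several virtual nodes may match the same host node; the
    design does not say whether the matching has to be exclusive.
    """
    return all(
        any(
            h.cpu_count > v.cpu_count and h.total_memory > v.total_memory
            for h in host_nodes
        )
        for v in vm_nodes
    )


def numa_filter(hosts: Dict[str, List[NumaNode]], vm_nodes: List[NumaNode]) -> List[str]:
    """Return the names of the hosts that remain after the NUMA filter."""
    return [name for name, nodes in hosts.items() if host_fits(nodes, vm_nodes)]


hosts = {
    'host-a': [NumaNode(8, 16384), NumaNode(8, 16384)],
    'host-b': [NumaNode(2, 4096)],
}
remaining = numa_filter(hosts, [NumaNode(4, 8192)])
```

With the sample data, host-b is dropped because its only NUMA node has fewer CPUs and less memory than the virtual node. The weight module is not sketched because its scoring rules are not specified in enough detail.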
Scheduler generates virtual NUMA topology
To be continued …
Interface and data structure in restful API
host NUMA sub-collection
/api/hosts/{host:id}/numanodes/
- Supported actions
- GET returns a list of host NUMA nodes. (using query GetVdsNumaNodeByVdsId)
host NUMA resource
/api/hosts/{host:id}/numanodes/{numa:id}
- Supported actions
- GET returns a specific NUMA node information: CPU list, total memory, map of distance with other nodes. (using VdsNumaNode properties)
host NUMA statistics
/api/hosts/{host:id}/numanodes/{numa:id}/statistics
- Supported actions
- GET returns a specific NUMA node statistics data: CPU usage, free memory. (using VdsNumaNode property NumaNodeStatistics)
vm virtual NUMA sub-collection
/api/vms/{vm:id}/numanodes
- Supported actions:
- GET returns a list of VM virtual NUMA nodes. (using query GetVmNumaNodeByVmId)
- POST attach a new virtual NUMA node on VM. (using action AddVmNumaNode)
vm virtual NUMA resource
/api/vms/{vm:id}/numanodes/{vnuma:id}
- Supported actions:
- GET returns a specific virtual NUMA node information, CPU list, total memory, pin to host NUMA nodes. (using VmNumaNode properties)
- PUT updates a virtual NUMA node configured on the VM. (using action UpdateVmNumaNode)
- DELETE removes a virtual NUMA node from the VM. (using action DeleteVmNumaNode)
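For quick reference, the resource paths and supported actions listed above can be collected in a small helper. The function and dictionary names are hypothetical and belong to no oVirt SDK; the sketch only builds path strings and sends no requests, since the request and response bodies are not defined in this design.

```python
from urllib.parse import quote


def _seg(value):
    """Percent-encode one path segment so an id cannot inject extra slashes."""
    return quote(str(value), safe='')


def host_numa_nodes(host_id):
    return '/api/hosts/%s/numanodes/' % _seg(host_id)


def host_numa_node(host_id, numa_id):
    return '/api/hosts/%s/numanodes/%s' % (_seg(host_id), _seg(numa_id))


def host_numa_node_statistics(host_id, numa_id):
    return host_numa_node(host_id, numa_id) + '/statistics'


def vm_numa_nodes(vm_id):
    return '/api/vms/%s/numanodes' % _seg(vm_id)


def vm_numa_node(vm_id, vnuma_id):
    return '/api/vms/%s/numanodes/%s' % (_seg(vm_id), _seg(vnuma_id))


# HTTP methods per resource, as listed under "Supported actions" above.
SUPPORTED_METHODS = {
    'host_numa_nodes': ('GET',),
    'host_numa_node': ('GET',),
    'host_numa_node_statistics': ('GET',),
    'vm_numa_nodes': ('GET', 'POST'),
    'vm_numa_node': ('GET', 'PUT', 'DELETE'),
}
```

A client would prefix these paths with the engine's base URL and add authentication, neither of which is described on this page.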