vRealize Operations Manager 7.0 now available for download!

vROps 7.0 has dropped and is now available for download. Some cherry-picked new features are:

  • Enhanced User Interface

The new release will make vRealize Operations even simpler to use, featuring an updated use case and persona-based ‘Quick Start’ dashboard.

  • Simplified Dashboard Creation and Sharing

In vRealize Operations 7.0, we’ve simplified the dashboard creation process, adding an intuitive canvas and multiple out of the box widgets to improve the user experience. Dashboard sharing and embedding will also become easier using smart links without requiring login, improving the cross-team collaboration and reporting for users.

  • Workload Right-sizing to avoid performance bottlenecks and reclaim over-allocated resources
  • Built-in vSphere config & compliance: PCI, HIPAA, DISA, FISMA, ISO, CIS
  • Ability to extend to the entire data center and cloud with updated management packs for Storage, vRO, Kubernetes, Federation etc.
  • vSAN performance, capacity, and troubleshooting, including support for stretched clusters, and access through the vRealize Operations plug-in in vCenter

Looks as if 7.0 should finally resolve some of the bizarre omissions in that car-crash of a release, 6.7.
Time to get labbing!
vROps 7.0 What’s New – at vmware.com
vROps 7.0 Download – at vmware.com
 

Poor performance of highly specced VMs – vNUMA!

The scenario:

Database server with 10 vCPUs and 192 GB RAM
Physical host with 2 sockets of 10 cores and 256 GB RAM total (128 GB per socket)
The customer was reporting poor performance of their database server, and an initial browse of the configuration seemed to show it had been set up well: multiple paravirtual SCSI adapters, enough CPU/RAM for the workload, storage performing well, and so on. Yet CPU usage was almost constantly at 100%, and adding more cores did not help.
Eventually part of the problem was spotted – CPU Hot Add was enabled. While this is a useful feature for smaller VMs, letting them start small and grow as the workload grows, on VMs with more than 8 vCPUs it disables vNUMA and can lead to poor performance.

Why this is a problem

vNUMA is a technology introduced in vSphere 5.0 and improved upon in 6.5. It presents an optimal NUMA layout to the guest OS, which means the OS can place data in memory on its local physical NUMA node – faster than reaching across to memory attached to the physical processor the VM is not scheduled on.

The Catch

vNUMA topology is calculated from vCPU count only – memory is not taken into account. This is fine if you are using less memory than is attached to a single processor in your host, as the vNUMA presentation will still be correct.
So on our example host above, a 10 vCPU VM with 128 GB RAM would be presented a single vNUMA node, as it fits within a single physical NUMA node. If we were to increase that to 12 vCPUs, it would present 2 vNUMA nodes, as it exceeds the number of cores on one physical processor. As long as the memory also fits within the bounds of a single physical NUMA node, we do not have to worry.
However, our database server has 10 vCPUs and 192 GB RAM. Simply disabling CPU Hot Add and letting vNUMA take over will present 1 vNUMA node, on the basis of the VM having only 10 vCPUs. Because vNUMA has not taken memory into account, the VM has more memory than exists in a single physical NUMA node (128 GB). That single vNUMA node performs poorly because the VM's memory actually spans two physical NUMA nodes.
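To reason about any of this, it helps to know what the host's physical NUMA layout actually looks like. A minimal PowerCLI sketch (the host name esx01.lab.local is a placeholder) to pull the node count and the approximate memory per node:

# Hedged sketch: inspect a host's physical NUMA layout with PowerCLI.
# "esx01.lab.local" is a placeholder host name - substitute your own.
$vmhost = Get-VMHost -Name "esx01.lab.local"

# NumaInfo is part of the host hardware information exposed via the vSphere API
$numaNodes = $vmhost.ExtensionData.Hardware.NumaInfo.NumNodes
$gbPerNode = [math]::Round($vmhost.MemoryTotalGB / $numaNodes)

"{0}: {1} NUMA node(s), roughly {2} GB RAM per node" -f $vmhost.Name, $numaNodes, $gbPerNode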

The Solution

In this instance the solution is simple: configure the vCPU socket layout to match your physical CPU socket layout. Here, we configure 2 sockets of 5 vCPUs each, which presents 2 NUMA nodes to the guest OS.
With this configuration, using exactly the same allocated resources but with more optimal memory placement, CPU usage dropped from a constant 100% to around 70-80%. A nice saving without having to allocate more!
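If you prefer to make the change from PowerCLI rather than the vSphere client, a minimal sketch along these lines should work (assuming a recent PowerCLI with the -CoresPerSocket parameter on Set-VM; the VM name DB01 is a placeholder, and the VM must be powered off for these changes):

# Hedged PowerCLI sketch - "DB01" is a placeholder VM name
$vm = Get-VM -Name "DB01"

# Disable CPU Hot Add so vNUMA is exposed to the guest again
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.CpuHotAddEnabled = $false
$vm.ExtensionData.ReconfigVM($spec)

# Present 2 sockets x 5 cores to mirror the physical 2-socket, 10-core host
Set-VM -VM $vm -NumCpu 10 -CoresPerSocket 5 -Confirm:$false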

Takeaways:

  • On large (>8 vCPU) VMs – don’t enable CPU Hot Add (a quick audit sketch for finding offenders follows this list)
  • On VMs with more memory than a single numa node or more vCPU cores than on a single processor – manually set your vCPU socket configuration to match the number of sockets in your host system.
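
To find VMs that trip over the first point, a rough PowerCLI audit along these lines can help (purely a sketch; adjust the vCPU threshold to suit your environment):

# Hedged sketch: list VMs with more than 8 vCPUs that have CPU Hot Add enabled
Get-VM | Where-Object {
    $_.NumCpu -gt 8 -and $_.ExtensionData.Config.CpuHotAddEnabled
} | Select-Object Name, NumCpu, MemoryGB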

CPU incompatible adding host to cluster after patching for Spectre/Meltdown

Interesting problem here. A customer was adding some new hosts to an existing cluster but got this error:

Move host into cluster
The host’s CPU hardware should support the cluster’s current Enhanced vMotion Compatibility mode, but some of the necessary CPU features are missing from the host. Check the host’s BIOS configuration to ensure that no necessary features are disabled (such as XD, VT, AES, or PCLMULQDQ for Intel, or NX for AMD).

Usually, this would be due to different CPU hardware, or CPU features in the UEFI not being enabled to match the existing hosts.
In this case, the hardware and UEFI settings were the same. It was discovered that, as part of QA testing, the new hosts had been updated to the current patch level, which includes CPU microcode updates for Spectre/Meltdown.
This changes the CPU features the host exposes and causes a problem. While hosts with differing patch levels can coexist within the same cluster for the purposes of a rolling upgrade (and vCenter will only enable the new mitigations once all hosts have been updated), you cannot add a NEW host that already carries this microcode to a cluster whose existing hosts have not yet been updated with it.
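A quick way to spot this kind of mismatch before attempting the move is to compare the cluster's EVC mode and the build numbers of its member hosts. A hedged PowerCLI sketch, with Cluster01 as a placeholder cluster name:

# Hedged sketch: show the cluster's EVC mode and each member host's build
Get-Cluster -Name "Cluster01" | Select-Object Name, EVCMode

Get-Cluster -Name "Cluster01" | Get-VMHost |
    Select-Object Name, Version, Build | Sort-Object Build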

SOLUTION

In this instance the solution was simple: use the host rollback option to revert to the previous build level, which matched the other hosts in the cluster and did not expose the differing CPU features introduced by the Spectre/Meltdown microcode.
Reboot the host and, at the ESXi boot screen, press SHIFT+R.
You will be presented with this warning:

Current hypervisor will permanently be replaced
with build: X.X.X-XXXXXX. Are you sure? [y/n]

Press Y to revert to the previous build.
You can read more about this process in VMware KB 1033604.
 
Alternatively, you can fully patch the cluster before adding in the new hosts.
 

ESXi 6.0 hosts 'No host data available'

For months now, many vSphere 6.0 users have had no hardware information populated from their ESXi 6.0 hosts.
The good news is that this now has a patch. ESXi-6.0.0-20180704001-standard contains the following little nugget of goodness in the patch notes:

  • System identification information consists of asset tags, service tags, and OEM strings. In earlier releases, this information comes from the Common Information Model (CIM) service, but in ESXi600-201807401, it comes directly from the SMBIOS.
If you have been suffering this bug for many months, this fix should be included in your next patching round.
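Once patched, a quick PowerCLI check that each 6.0 host is on the new build (and that basic hardware identification is being reported again) might look something like this sketch:

# Hedged sketch: confirm which build each 6.0 host is running
# and that manufacturer/model details are being reported
Get-VMHost | Where-Object { $_.Version -eq "6.0.0" } |
    Select-Object Name, Version, Build, Manufacturer, Model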
 
Drop a comment if this resolves your issue.

PSA: vSphere 5.5 end of life imminent!


Quick reminder that vSphere 5.5 reaches end-of-life this month – September 19th.
That old workhorse of a platform that many customers are still running for a plethora of reasons is finally reaching end of life. That means no more updates and no more support from VMware: you will be limited to the self-help portal, with no new hardware support, no server/client/guest OS updates, and no new security patches or bug fixes.
 

What are my options?

Quite simple here: you need to upgrade. There is no direct path from 5.5 to the latest and greatest vSphere 6.7, so you will need to upgrade to either 6.0 or 6.5 first.
Even if your 5.5-era hardware will only support ESXi 6.0, I would recommend at least updating vCenter to 6.5 – not least so you have access to the beautiful vCenter HTML5 client. An up-level vCenter is always compatible with down-level ESXi.
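Before planning the jump, it is worth taking a quick inventory of what you are actually running. A minimal PowerCLI sketch:

# Hedged sketch: list vCenter and host versions to help plan the upgrade path
$global:DefaultVIServer | Select-Object Name, Version, Build      # the connected vCenter
Get-VMHost | Select-Object Name, Version, Build | Sort-Object Version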
Further reading: VMware Upgrade Center

ESXi Hosts Losing Connectivity to VMFS Datastores

A customer's environment was losing access to VMFS datastores on a regular basis.
Events in the log showed this:

Lost access to volume xxx due to connectivity issues. Recovery attempt is in progress and outcome will be reported shortly.

Further information can be gleaned from the hostd log. SSH onto the host and use the following command; in this case it showed constant connect and disconnect events being written to the log in real time.

tail -f /var/log/hostd.log | grep "'Vimsvc.ha-eventmgr'"

A look into vmkernel.log showed that locks were being generated:

2018-07-10T01:10:27.499Z cpu32:33604)NMP: nmp_ThrottleLogForDevice:3333: Cmd 0x2a (0x43be1b769a40, 38961) to dev "naa.60050768010000002000000000012345" on path "vmhba1:C0:T4:L2" Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0. Act:NONE
2018-07-10T01:10:27.499Z cpu32:33604)ScsiDeviceIO: 2613: Cmd(0x43be1b769a40) 0x2a, CmdSN 0x8000005e from world 38961 to dev "naa.60050768010000002000000000012345" failed H:0x0 D:0x2 P:0x0 Valid sense data: 0x6 0x29 0x0.
2018-07-10T01:10:27.499Z cpu32:33604)ScsiCore: 1609: Power-on Reset occurred on naa.60050768010000002000000000012345
2018-07-10T01:10:35.036Z cpu25:32848)ScsiDeviceIO: 2595: Cmd(0x43be15960ac0) 0x2a, CmdSN 0x8000000f from world 38961 to dev "naa.60050768010000002000000000012345" failed H:0x8 D:0x0 P:0x0
2018-07-10T01:10:35.036Z cpu25:32848)ScsiDeviceIO: 2595: Cmd(0x43be197f48c0) 0x2a, CmdSN 0xfffffa8002110f80 from world 40502 to dev "naa.60050768010000002000000000012345" failed H:0x8 D:0x0 P:0x0
2018-07-10T01:10:35.036Z cpu25:32848)ScsiDeviceIO: 2595: Cmd(0x43be19c300c0) 0x2a, CmdSN 0xfffffa8002089940 from world 40502 to dev "naa.60050768010000002000000000012345" failed H:0x8 D:0x0 P:0x0
2018-07-10T01:10:35.036Z cpu25:32848)ScsiDeviceIO: 2595: Cmd(0x43be1b4f6340) 0x2a, CmdSN 0x8000007d from world 38961 to dev "naa.60050768010000002000000000012345" failed H:0x8 D:0x0 P:0x0
2018-07-10T01:10:35.036Z cpu25:32848)ScsiDeviceIO: 2595: Cmd(0x43be1b75a300) 0x2a, CmdSN 0x1aa2c from world 32814 to dev "naa.60050768010000002000000000012345" failed H:0x8 D:0x0 P:0x0
2018-07-10T01:10:37.844Z cpu21:32874)HBX: 283: 'DS1234': HB at offset 4075520 - Reclaimed heartbeat [Timeout]:
2018-07-10T01:10:37.844Z cpu21:32874)  [HB state abcdef02 offset 4075520 gen 29 stampUS 3975149635 uuid 5b43f84b-2341c5c8-32b6-90e2baf4630c jrnl  drv 14.61 lockImpl 3]
2018-07-10T01:10:37.847Z cpu21:32874)FS3Misc: 1759: Long VMFS rsv time on 'DS1234' (held for 2714 msecs). # R: 1, # W: 1 bytesXfer: 2 sectors
2018-07-10T01:12:12.584Z cpu8:38859)etherswitch: L2Sec_EnforcePortCompliance:152: client APP1421.eth0 requested promiscuous mode on port 0x6000006, disallowed by vswitch policy
2018-07-10T01:12:12.584Z cpu8:38859)etherswitch: L2Sec_EnforcePortCompliance:152: client APP1421.eth0 requested promiscuous mode on port 0x6000006, disallowed by vswitch policy
2018-07-10T01:24:08.185Z cpu39:35449 opID=9b64b9f9)World: 15554: VC opID 8de99fb1-d856-4a71-ab08-5501dfffc500-7011-ngc-d5-67-78e8 maps to vmkernel opID 9b64b9f9
2018-07-10T01:24:08.185Z cpu39:35449 opID=9b64b9f9)DLX: 3876: vol 'DS1234', lock at 63432704: [Req mode 1] Checking livenes

Solution

It turned out that this started occurring when some new hosts were added to the cluster without ATS disabled. The existing hosts had ATS disabled due to a prior storage incompatibility (since resolved, but the setting was never re-enabled), whereas the host profiles for the new hosts left ATS enabled because the storage no longer suffered from the incompatibility. The cluster therefore ended up with mixed ATS heartbeat settings against the same datastores.
In this situation, we could enable ATS on the old hosts now that the storage supported it:

esxcli system settings advanced set -i 1 -o /VMFS3/UseATSForHBOnVMFS5

Or, if preferred, disable ATS on the new hosts to match the setting on the old ones:

esxcli system settings advanced set -i 0 -o /VMFS3/UseATSForHBOnVMFS5

 
Just ensure all hosts in the cluster are using the same ATS settings!
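A quick way to check where every host currently stands is to query the setting across the cluster. A hedged PowerCLI sketch (the cluster name Cluster01 is a placeholder):

# Hedged sketch: report the ATS-for-heartbeat setting on every host in the cluster
Get-Cluster -Name "Cluster01" | Get-VMHost | ForEach-Object {
    $ats = Get-AdvancedSetting -Entity $_ -Name "VMFS3.UseATSForHBOnVMFS5"
    [PSCustomObject]@{ Host = $_.Name; UseATSForHB = $ats.Value }
}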
Further reading: KB2113956

Yet another virtualisation Blog

Thanks for joining me!
I started this blog primarily because, in my meanderings through the world of virtualisation as a consultant, the most valuable resources I've found have oftentimes been the blogs of other kind souls who have documented their own trials and tribulations for the benefit of us all. Along the way I have encountered many problems that were not quite so well documented, so this is my attempt to give a little back to the community.
I am an IT consultant with over 13 years in the industry, focussed on all things virtualisation, servers, storage & cloud.
 

Good company in a journey makes the way seem shorter. — Izaak Walton
