Consolidating disks – unable to access file since it is locked

Had a VM which had been flagged as requiring disk consolidation.
Attempting consolidation failed with the error ‘Unable to access file since it is locked’.

The error stack showed the following: msg.fileio.lock

Solution:

Storage vMotion the disk to another datastore and reattempt consolidation; the move clears the locks and the consolidation works. Nice and quick one, although not obvious.
Alternatively, you could find the host that holds the lock on the file and restart hostd on it, but depending on the environment the Storage vMotion approach can be a lot faster.
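If you do want to hunt down the lock rather than move the disk, here is a rough sketch (the VMDK path is illustrative; substitute the file reported in the error):

# vmkfstools -D /vmfs/volumes/Datastore/VM/VM-000001.vmdk

The owner field in the output ends in a MAC address belonging to the host holding the lock. On that host, restarting the management agent should release a stale lock:

# /etc/init.d/hostd restart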

Module ‘MonitorLoop’ power on failed error – unable to power on VM

Came across this amazingly obtuse error today while trying to deploy a new VM:
Module ‘MonitorLoop’ power on failed
The VM would not power on. I was surprised to find that, despite the datastore having some free space, this error actually indicates there is not enough space to inflate the swap file (it is 0 KB while the VM is powered off, and at power-on it must grow to its full size: the VM's configured memory minus any memory reservation).
You can see this by going to the details of the task and digging into the error stack.

Move the VM to a datastore with more space, or increase the free space on the datastore, and the VM will power on successfully.
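If you want to confirm the shortfall before moving anything, free space is quick to check from the host shell:

# esxcli storage filesystem list

Compare the Free column for the relevant volume against the swap size the VM needs at power-on.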

vSphere 6.7 Update 3 released!

vSphere 6.7 Update 3 has now been released. Among the new features:

Ability to change vCenter Server PNID/Hostname

PNID (the Primary Network Identifier of vCenter Server) is the host name of a vCenter, and 6.7 Update 3 now supports changing it post-deployment. This will be great news to anyone who has had to rename a vCenter or change its domain on prior releases and found it required a whole rebuild.


Read more about this feature at this blog @ vmware.com

Support for multiple NVIDIA® vGPUs

VMware vSphere 6.7 Update 3 introduces support for multiple NVIDIA GRID virtual GPUs (vGPUs) per virtual machine, enabling more graphics- and compute-intensive workloads to run on vSphere. You can configure up to four NVIDIA vGPUs connected to one virtual machine.

AMD EPYC™ Generation 2 support

vSphere 6.7 U3 is compatible with the 2nd generation of AMD EPYC™ processors.

Dynamic DNS support

With vSphere 6.7 Update 3, Dynamic DNS is now supported! vCenter can dynamically update its IP information in DNS, another manual job saved.

Driver Enhancements

Enhancements to VMXNET3: guest encapsulation offload and UDP and ESP RSS support are added to the Enhanced Networking Stack (ENS). Checksum calculations are offloaded from encapsulated packets to the virtual device emulation, and you can run RSS on UDP and ESP packets on demand. The feature requires a corresponding VMXNET3 v4 driver in the guest.
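If you want to check which VMXNET3 driver version a Linux guest is actually running (the interface name ens192 is an assumption; yours may differ):

# ethtool -i ens192

The driver and version lines in the output show whether the guest already has a v4-capable VMXNET3 driver.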

Various driver updates ship with 6.7 Update 3. The ixgben driver adds queue pairing to optimize CPU efficiency, and the bnxtnet driver adds support for Broadcom 100 GbE network adapters and multi-RSS feeds. These are just some highlights; the following drivers are all updated:

  • VMware nvme
  • Microchip smartpqi
  • Marvell qlnativefc
  • Broadcom lpfc/brcmfcoe
  • Broadcom lsi_msgpt2
  • Broadcom lsi_msgpt35
  • Broadcom lsi_msgpt3
  • Broadcom lsi_mr3
  • Intel i40en
  • Intel ixgben
  • Cisco nenic
  • Broadcom bnxtnet

More information about this update @vmware.com

Release Notes:
ESXi 6.7 Update 3
vCenter 6.7 Update 3
vSAN 6.7 Update 3

Enable CDP advertising – help the network team help themselves!

A customer of mine requested help in documenting which switch ports were connected to which ESXi hosts. Rather than simply documenting this, which may get out of date if not maintained, I suggested we enable CDP advertising at the vSwitch level so the network team can obtain this information themselves on an ongoing basis.

By default, vSwitches come with CDP enabled in listen mode only: they can detect information about the switches they are connected to, but do not relay information about themselves back to those switches.

Method

To configure advertising on a standard vSwitch, SSH onto the host and run the following, changing the vSwitch name to the relevant one:

# esxcli network vswitch standard set -v vSwitch0 -c both
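To confirm it took effect, list the vSwitch settings and check that the CDP Status field now shows both:

# esxcli network vswitch standard list -v vSwitch0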

If running distributed switches, you can do this in the GUI of the web client. Select your distributed vSwitch, then Manage > Settings > Properties and click Edit.

Under Discovery Protocol, change Operation to Both; the vSwitch will then both listen for CDP info from the physical switch and advertise its own CDP info.
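Once advertising is on, the network team can map ports to hosts straight from a Cisco switch, for example:

switch# show cdp neighbors

Each ESXi host shows up as a neighbour, along with which vmnic sits on which switch port, so the documentation effectively maintains itself.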

Unable to delete datastore – filesystem is busy

I had noticed that a customer had been building ESXi hosts, but the local datastore on each host was being created as VMFS5 instead of VMFS6.

No problem – just delete the local datastore (assuming no VMs have been built on it) and recreate it as VMFS6, right? Not so simple. Attempting to delete the datastore threw up this error:


Cannot remove datastore ‘Datastore Name’ because file system is busy. Correct the problem and retry the operation.

So whilst this occurred because the local datastore had been created with the older VMFS version, it could apply to any datastore you need to delete. We needed to find out what could be writing to the datastore and preventing its deletion. Some things to check:

Dumpfiles

There is likely a dump file set up on the host. Run the following command to check:

 # esxcli system coredump file list 

If it lists a dump file configured on the local disk, run this command to deactivate and remove it:

 # esxcli system coredump file remove --force

If the datastore being deleted is a shared datastore, run the following command to find the owner of the file:

# vmkfstools -D /vmfs/volumes/Datastore/vmkdump/123456789101.dumpfile

The output will look like:

 Lock [type 10c00001 offset 200392704 v 10, hb offset 3875328 gen 3, mode 1, owner 52ebd042-43b191f0-0173-012345678910 mtime 250]

The last part of that owner ID corresponds to the MAC address of vmnic0 on the owning host, e.g. 01:23:45:67:89:10.

On that host, run the esxcli coredump commands above to deactivate and delete the dump file.

Once you have deleted the datastore, run the following command to recreate the dump file elsewhere:

# esxcli system coredump file add -d datastore_name
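One thing to watch (an assumption based on the esxcli coredump namespace rather than something I hit here): adding the file does not always activate it. Explicitly enabling the new dump file looks like:

# esxcli system coredump file set --smart --enable true

The --smart option lets the host pick the best available dump file automatically.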

Scratch Location

Browse to ESXi > Configure > System > Advanced System Settings and find the setting ScratchConfig.CurrentScratchLocation, which shows where scratch currently points. If it points at the datastore you are trying to delete, change ScratchConfig.ConfiguredScratchLocation to something like /tmp and reboot the host.
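If you prefer the shell to the GUI, the same change can be made with vim-cmd; a sketch, with /tmp/scratch as an illustrative path (a reboot is still required afterwards):

# vim-cmd hostsvc/advopt/update ScratchConfig.ConfiguredScratchLocation string /tmp/scratch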

You can then delete the problematic datastore. Remember to go back and change the scratch location to the new datastore.

vSphere 6.7 U1 GA released! Release notes & downloads

VMware have finally bestowed upon us their latest release of vSphere 6.7 – Update 1!
This brings with it some rather welcome new features and quality-of-life tweaks:

  • Migrate vCenter with Embedded PSC *between* vSphere domains, retaining data such as tags & licences.
  • vCenter can provide relevant links to KB articles
  • Burst filter to protect vCenter from identical alert flooding
  • HTML5 client now fully featured including new simplified workflows for VCHA
  • vCenter converge tool to migrate from external PSCs to easier-to-manage embedded PSCs
  • Provides upgrade path from 6.5 U2 to 6.7

And many more
vCenter Server 6.7 U1 :
Release Notes
Download
ESXi 6.7 U1:
Release Notes
Download
PowerCLI 11:
What’s New
Download

CPU incompatible adding host to cluster after patching for Spectre/Meltdown

Interesting problem here. Customer was adding some new hosts to an existing cluster, but got this error:

Move host into cluster
The host’s CPU hardware should support the cluster’s current Enhanced vMotion Compatibility mode, but some of the necessary CPU features are missing from the host. Check the host’s BIOS configuration to ensure that no necessary features are disabled (such as XD, VT, AES, or PCLMULQDQ for Intel, or NX for AMD).

Usually this would be due to different CPU hardware, or CPU features not being enabled in the UEFI to match the existing hosts.
In this case the hardware and UEFI settings were identical. It turned out that, as part of QA testing, the new hosts had been updated to the current patch level, which includes the CPU microcode updates for Spectre/Meltdown.
This changes the CPU features the hosts present, which causes the problem. While hosts at differing patch levels can coexist within the same cluster for the purposes of a rolling upgrade (vCenter only enables the fixes once all hosts have been updated), you cannot add NEW hosts carrying this microcode to the cluster until the existing hosts have been updated with it.
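When you hit this error it is worth comparing the cluster's EVC baseline with what each host reports; a quick PowerCLI sketch (the cluster name is illustrative):

Get-Cluster 'Cluster01' | Select-Object Name, EVCMode
Get-VMHost | Select-Object Name, MaxEVCMode

In this case the named baselines matched and the difference was down to the microcode-exposed CPU features, but the check quickly rules out an ordinary EVC mismatch.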

SOLUTION

In this instance it was a simple solution: use the host rollback option to revert to the previous build level, which matched the other hosts in the cluster and did not expose the differing CPU features introduced by the Spectre/Meltdown microcode.
Reboot the host and, at the ESXi boot screen, press SHIFT+R.
You will be presented with this warning:

Current hypervisor will permanently be replaced
with build: X.X.X-XXXXXX. Are you sure? [y/n]

Press Y to revert to the previous build.
You can read more about this process at VMware KB 1033604.
Alternatively, you can fully patch the cluster before adding in the new hosts.

ESXi 6.0 hosts ‘No host data available’

For months now, many vSphere 6.0 users have had no hardware info populated from their ESXi 6.0 hosts.
The good news is this now has a patch. ESXi-6.0.0-20180704001-standard contains the following little nugget of goodness in the patch notes:

  • System identification information consists of asset tags, service tags, and OEM strings. In earlier releases, this information comes from the Common Information Model (CIM) service, but in ESXi600-201807401, it comes directly from the SMBIOS.
If you have been suffering this bug for many months, the fix should be included in your next patching round.
Drop a comment if this resolves your issue.