5 Awesome Tips to Troubleshoot vMotion

In Why IT Admins Adore vMotion, I described the main types of vSphere live migration: vMotion, Storage vMotion and Enhanced vMotion. In this post, I’ll go over 5 tips to help you troubleshoot vMotion issues on the fly.

How does vMotion work?


A vMotion event can be initiated by a user, by vCenter Server itself or via an API call. There are 8 stages to a vMotion event as illustrated in Fig. 1.

Figure 1 – A vMotion task lifecycle

Although the above steps are self-explanatory, there’s much more going on under the hood. Let’s just say that vMotion is a marvelous piece of engineering. Here’s a video by VMware explaining vMotion and how to use it.

How to fix and prevent vMotion issues


If your environment is not adequately prepared for vMotion, you could come across issues at any stage of a vMotion task. To begin with, revisit the requirements and make sure your environment is compliant.

This KB article provides a complete list. I picked a salient few, as described next.

1 – Is the vMotion VMkernel option enabled?

Every vMotion-capable host must, by definition, have the vMotion option enabled on at least one VMkernel adapter; otherwise, a VM will simply fail to migrate to any host where the option has not been enabled.

In the following example (Fig. 2), I disabled vMotion on the VMkernel adapter of one of my hosts. If I try to migrate a VM to this host, the vMotion task fails immediately.

Figure 2 – The error returned when a VM is migrated to a host that is not set up for vMotion

Of course, this is just for illustrative purposes, but in large environments you can easily overlook enabling the setting on one or more hosts unless you’re automating or monitoring the process.
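
As a rough sketch of what such a check could look like from the ESXi shell, the functions below report whether a given VMkernel adapter carries the vMotion tag. This assumes an ESXi release that exposes VMkernel interface tags through esxcli; the function names vmk_has_vmotion and report_vmotion_tag are made up for illustration.

```sh
# Hypothetical helper: succeeds if the given VMkernel adapter is tagged for vMotion.
# Relies on `esxcli network ip interface tag get`, which prints a "Tags:" line.
vmk_has_vmotion() {
    esxcli network ip interface tag get -i "$1" | grep -qi 'vmotion'
}

# Hypothetical helper: print a one-line status for an adapter.
report_vmotion_tag() {
    if vmk_has_vmotion "$1"; then
        echo "$1: vMotion enabled"
    else
        echo "$1: vMotion NOT enabled"
    fi
}
```

On a host, you would call report_vmotion_tag vmk0. If the tag turns out to be missing, esxcli network ip interface tag add -i vmk0 -t VMotion should enable it without going through the client.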

Figure 3 – Enabling the vMotion option on a VMkernel adapter

2 – Check your advanced settings

Another common error you might come across is the infamous vMotion fails at 10% (1013150). Although you’re more likely to see this on older versions of ESXi / vCenter Server, I did come across it a couple of times when using vSphere 6.5. Regardless, if you’re experiencing vMotion issues, make sure that the migrate.enabled advanced setting is set to 1 on all ESXi hosts.

Backup software will sometimes set this value to 0 (disabled) to ensure that backup jobs complete successfully by preventing a VM that is being backed up from vMotioning to another host. Occasionally, an unplanned network or storage outage prevents the backup software from rolling the setting back to its original value, leaving the VM in vMotion limbo.
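
To check the value from the ESXi shell rather than the client, something along these lines should work. It is only a sketch: it assumes the setting is exposed to esxcli under the path /Migrate/Enabled and that the listing contains an "Int Value:" line, and both function names are illustrative.

```sh
# Hypothetical helper: print the current integer value of Migrate.Enabled.
migrate_enabled_value() {
    esxcli system settings advanced list -o /Migrate/Enabled |
        awk -F': *' '/^ *Int Value:/ {print $2}'
}

# Hypothetical helper: set the value back to 1 only if it is not already 1.
ensure_migrate_enabled() {
    if [ "$(migrate_enabled_value)" != "1" ]; then
        echo "Migrate.Enabled is not 1, re-enabling"
        esxcli system settings advanced set -o /Migrate/Enabled -i 1
    fi
}
```

Running ensure_migrate_enabled on each host leaves a correct setting untouched and only writes when the value has been left at 0.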

Figure 4 – Ensuring the migrate.enabled advanced setting is set to 1

3 – Run network diagnostics at the VMkernel level

VMkernel network connectivity can be another problem area, resulting in timeouts or outright failed migrations. When testing network connectivity between hosts, you will want to test from a VMkernel perspective, all the more so if your hosts are running multiple VMkernel adapters.

Two ESXi shell commands help you do this: vmkping and nc. vmkping sends ICMP traffic to a destination host through a VMkernel adapter’s TCP/IP stack, rather than through the host’s physical interface TCP/IP stack as a normal ping utility would.

In my environment, I’ve set all my hosts to use vmk0 for vMotion. In the next example, I am verifying that host 192.168.16.69 is reachable from all the VMkernel adapters on the ESXi host from which I’m running the command. The -I parameter tells vmkping which VMkernel adapter to use for the connectivity test.

Additionally, I’ve used the nc (netcat) command to ensure that the source host can connect to the vMotion network port (8000) on the destination host.
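
For reference, the same two checks can be wrapped in a small shell function. This is a sketch rather than the exact commands in the screenshot: the function name and the ping count of 3 are illustrative choices, while the address, vmk0 and port 8000 come from the example described above.

```sh
# Hypothetical helper: test vMotion connectivity from this host to a destination.
# Usage: check_vmotion_path <dest-ip> <vmk> [<vmk> ...]
check_vmotion_path() {
    dest="$1"; shift
    rc=0
    # Ping the destination through each VMkernel adapter given on the command line.
    for vmk in "$@"; do
        if vmkping -I "$vmk" -c 3 "$dest" >/dev/null 2>&1; then
            echo "$vmk -> $dest: ping ok"
        else
            echo "$vmk -> $dest: ping FAILED"
            rc=1
        fi
    done
    # Probe the vMotion port (8000) on the destination without sending data.
    if nc -z "$dest" 8000 >/dev/null 2>&1; then
        echo "$dest:8000 reachable"
    else
        echo "$dest:8000 NOT reachable"
        rc=1
    fi
    return "$rc"
}
```

Calling check_vmotion_path 192.168.16.69 vmk0 from the source host pings the destination through vmk0 and then probes port 8000. A non-zero exit status means at least one of the checks failed.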

Figure 5 – Performing network diagnostics using vmkping and nc from ESXi’s shell

4 – Dismount unused ISOs

If a virtual machine has a mounted ISO image residing on storage that is not accessible to the ESXi host you want to migrate the VM to, vMotion will fail with the following remote backing (1003780) error.

Figure 6 – The error returned when you try and migrate a VM with a mounted ISO image

This is another common issue and an easy one to fix. Simply unmount the image from the CD/DVD drive in the VM’s settings, as shown next.

Figure 7 – Unmounting an ISO image from a VM to allow it to migrate

Alternatively, untick the Connected option next to the CD/DVD device and answer Yes to the eject question.

Figure 8 – Confirming a user initiated media disconnect

5 – Is your time in sync?

Ensure that the clocks on your ESXi hosts are kept in sync by specifying a common NTP source. Time drift per se does not influence vMotion, but there is a side effect. When a VM is migrated to a host with an out-of-sync clock, the VM’s guest OS clock will adjust accordingly. This means that if the time on ESXi is off, so will be the time on your VM. Not a good thing if you’re migrating something like an AD domain controller which acts as a PDC for the whole domain.

You are probably familiar with the Synchronize guest time with host VMware Tools option. What you may not know is that, even when the option is disabled, there are instances where a VM will still synchronize its clock with that of the ESXi host. Events that trigger this behavior include taking snapshots, restarting vmtools and, in our case, vMotioning a VM.

To completely disable time synchronization at the VM level, the lines listed next have to be included in the VM’s configuration file as per this KB article.

Option                            Value
tools.syncTime                    0
time.synchronize.continue         0
time.synchronize.restore          0
time.synchronize.resume.disk      0
time.synchronize.shrink           0
time.synchronize.tools.startup    0
time.synchronize.tools.enable     0
time.synchronize.resume.host      0

I use the following PowerCLI one-liner to quickly retrieve the current time on all the hosts managed by the vCenter Server instances I’m connected to.

# Print each host's name followed by its current clock, converted to local time
foreach ($esx in (Get-VMHost)) {$esx.Name + " -> " + (Get-View $esx.ExtensionData.ConfigManager.DateTimeSystem).QueryDateTime().ToLocalTime()}

Figure 10 – A PowerCLI command that retrieves the time setting on vCenter managed ESXi hosts

The following links describe the symptoms and possible solutions to other common and not so common vMotion issues.

 

Conclusion


We’ve seen how a few tips and checks go a long way towards ensuring optimal vMotion functionality and preventing some of the common vMotion issues you might bump into. In the vSphere Networking Basics Part 1 and Part 2 posts, I go into some depth on how to set up VMkernels and how service-specific TCP/IP stacks can improve performance, so do have a look before you leave.
