Top 10 Critical Skills Every vSphere System Administrator Should Know

Good evening, good morning or good afternoon, wherever you may be. This post is another one I’ve been thinking about for quite a while, so I’m delighted to bring it to you. Over the years, people have asked me, “How exactly do you improve your skills on VMware?” or “What things should I work on?” I answer that question a lot, both when I’m consulting and in my training classes. So here we are. I want to outline 10 critical skills that I feel anybody working with VMware, and specifically the vSphere suite, should know. It’s hard for me to rank them by importance since I think they’re all equally important. So, yes, they are in no particular order: you should master all of them!

Let’s get started!

1. Being able to explain how vSphere interacts with CPUs, memory, networks, and storage

vSphere can act funny at times, and there’s no doubt that understanding how VMs interact with the host will benefit you. When you find yourself troubleshooting early on a weekend morning, a good grasp of what is going on will help you.

The number of vCPUs you can assign to a VM depends, first and foremost, on how many logical processors the host has available. That number is determined by the number of physical CPUs (sockets), the number of cores per physical CPU, and whether hyper-threading is supported and enabled on the CPU. Nowadays hyper-threading is the norm; just be sure it’s enabled in the BIOS.
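Here’s the arithmetic as a quick Python sketch. It assumes hyper-threading exposes two logical processors per core, and it ignores the per-VM vCPU maximum, which varies by vSphere version and VM hardware version. The function names are my own, not part of any VMware API.

```python
def logical_processors(sockets: int, cores_per_socket: int, hyperthreading: bool) -> int:
    """Number of logical processors the ESXi scheduler can use."""
    threads_per_core = 2 if hyperthreading else 1  # assumption: 2 threads per HT core
    return sockets * cores_per_socket * threads_per_core


def vcpu_request_fits(requested_vcpus: int, host_logical: int) -> bool:
    """A VM cannot be given more vCPUs than the host has logical processors."""
    return 0 < requested_vcpus <= host_logical


# Example: a 2-socket host with 12 cores per socket and hyper-threading enabled.
host = logical_processors(sockets=2, cores_per_socket=12, hyperthreading=True)
print(host)                          # 48
print(vcpu_request_fits(64, host))   # False
```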

When it comes to provisioning VMs, don’t start drooling over your massive ESXi host and stack it with VMs that are larger than they need to be. This practice will come back to bite you later. The biggest issue with assigning too many vCPUs to a VM is CPU scheduling. Unlike memory, which is allocated directly to VMs and is (in most cases) not shared, CPU is a shared resource: vCPUs must wait in line to be scheduled by the hypervisor, which has to find a free physical core or thread to handle each request. Read this article on virtual machine sizing to get started.
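A rough way to keep an eye on this is the ratio of vCPUs allocated across all VMs on a host to its physical cores. Below is a minimal sketch; the inventory is made up, and the 3:1 warning threshold is an arbitrary example rather than a VMware recommendation, since acceptable ratios depend heavily on the workload.

```python
from typing import Dict


def vcpu_to_core_ratio(vm_vcpus: Dict[str, int], physical_cores: int) -> float:
    """Total allocated vCPUs divided by the host's physical cores."""
    if physical_cores <= 0:
        raise ValueError("physical_cores must be positive")
    return sum(vm_vcpus.values()) / physical_cores


# Hypothetical inventory on a host with 24 physical cores.
vms = {"sql01": 8, "web01": 4, "web02": 4, "app01": 16, "dc01": 2}
ratio = vcpu_to_core_ratio(vms, physical_cores=24)

WARN_RATIO = 3.0  # example threshold only; tune it for your workloads
print(f"{ratio:.2f}:1")  # 1.42:1
if ratio > WARN_RATIO:
    print("High vCPU overcommit: expect more CPU scheduling contention")
```

A low ratio doesn’t guarantee good performance, though; a single VM with far more vCPUs than it needs can still end up waiting on the scheduler.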

With memory, we have techniques such as TPS (Transparent Page Sharing), ballooning, and swapping. Memory is generally harder to troubleshoot because there are multiple layers of memory involved. Also, when you power on a VM, it consumes datastore space for its swap (.vswp) file, which ESXi sizes at the VM’s configured memory minus its memory reservation. If you’re wondering where your datastore space is going, I’d start looking there. It’s often forgotten.
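Because the .vswp file is sized at configured memory minus reservation, it’s easy to estimate how much datastore space powered-on VMs are quietly using. A minimal sketch with a made-up inventory; the `VM` class and helper functions are hypothetical, and the estimate ignores smaller overhead files such as the VMX swap file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VM:
    name: str
    memory_gb: float        # configured memory
    reservation_gb: float   # memory reservation
    powered_on: bool


def vswp_size_gb(vm: VM) -> float:
    """Swap file size = configured memory - reservation (0 if fully reserved)."""
    return max(vm.memory_gb - vm.reservation_gb, 0.0)


def total_swap_gb(vms: List[VM]) -> float:
    # Only powered-on VMs have a .vswp file on the datastore.
    return sum(vswp_size_gb(vm) for vm in vms if vm.powered_on)


inventory = [
    VM("sql01", 64, 32, True),
    VM("web01", 8, 0, True),
    VM("test01", 16, 0, False),
]
print(total_swap_gb(inventory))  # 40.0
```

Fully reserving a VM’s memory shrinks its .vswp file to zero, at the cost of that memory no longer being available to other VMs.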

From a network perspective, there are different types of virtual network adapters, and some, like VMXNET3, perform much better than others. Keep in mind that virtual machine traffic ultimately leaves the host through physical network adapters, which can become overloaded, so load balancing is key. If you want to know more, I’d highly recommend checking out our Networking Basics series.
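With the default “Route based on originating virtual port” teaming policy on a standard vSwitch, each VM’s virtual port uses one active uplink, so a few busy VMs landing on the same NIC can saturate it while another sits nearly idle. The sketch below models that. The port-to-uplink mapping (port ID modulo the number of uplinks) is a simplifying assumption, not ESXi’s exact selection logic, and the traffic figures are invented.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def uplink_load(ports: List[Tuple[int, float]], uplinks: List[str]) -> Dict[str, float]:
    """Sum traffic (Gbit/s) per uplink, pinning each virtual port to one uplink.

    ASSUMPTION: port_id % len(uplinks) stands in for ESXi's real selection logic.
    """
    load: Dict[str, float] = defaultdict(float)
    for port_id, gbps in ports:
        load[uplinks[port_id % len(uplinks)]] += gbps
    return dict(load)


def overloaded(load: Dict[str, float], link_speed_gbps: float, limit: float = 0.8) -> List[str]:
    """Return uplinks running above `limit` (e.g. 80%) of their link speed."""
    return sorted(nic for nic, gbps in load.items() if gbps > link_speed_gbps * limit)


# Hypothetical VM ports: (virtual port ID, average throughput in Gbit/s).
ports = [(10, 4.0), (11, 0.5), (12, 5.0), (13, 0.2)]
load = uplink_load(ports, ["vmnic0", "vmnic1"])
print(load)                    # {'vmnic0': 9.0, 'vmnic1': 0.7}
print(overloaded(load, 10.0))  # ['vmnic0']
```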

When we think of storage, we need to consider read and write patterns. Is your workload read-heavy or write-heavy? Make sure you understand that IOPS (I/O operations per second) is a key metric that can’t be taken lightly. Generally, your performance will suffer if you don’t anticipate your future needs and grow your disk performance and capacity accordingly. This is a huge reason I’m a fan of vSAN.
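To see why the read/write mix matters, here’s a common rule of thumb for traditional RAID storage: multiply write IOPS by the RAID write penalty (typically 2 for RAID 1/10, 4 for RAID 5 and 6 for RAID 6) before adding reads. The sketch applies it; treat the penalties as approximations, since controller cache and array-specific features change the real numbers.

```python
# Typical RAID write penalties (rule of thumb, not vendor-specific figures).
WRITE_PENALTY = {"RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(frontend_iops: int, read_pct: int, raid: str) -> float:
    """Estimate disk-level IOPS: reads + writes * RAID write penalty.

    Rule of thumb only; controller cache and array features are ignored.
    """
    if not 0 <= read_pct <= 100:
        raise ValueError("read_pct must be between 0 and 100")
    reads = frontend_iops * read_pct / 100
    writes = frontend_iops * (100 - read_pct) / 100
    return reads + writes * WRITE_PENALTY[raid]


# The same 5,000 IOPS workload on RAID 5, read-heavy vs write-heavy.
print(backend_iops(5000, 80, "RAID5"))  # 8000.0
print(backend_iops(5000, 30, "RAID5"))  # 15500.0
```

The write-heavy workload needs nearly twice the back-end IOPS of the read-heavy one, which is exactly why you want to know your pattern before you buy or grow storage.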

2. Use performance charts to view and improve performance

Performance charts are fantastic if you know how to use them. Know what a counter is, and know that counters are the key to pinpointing your problem. Overview performance charts are not the answer; they are only useful for a quick glance at an environment that hasn’t been having any problems. When you are troubleshooting, switch to the advanced charts and choose the specific counters you need. Want to learn more? Here’s an article that outlines some great ways to use performance charts.
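Some counters need converting before they make sense. CPU Ready, for example, is shown as milliseconds summed over each sample interval, and the conversion VMware documents is ready_ms / (interval_seconds × 1000) × 100. Here’s a small sketch of it. The interval table reflects vCenter’s chart ranges; dividing by the vCPU count to get a per-vCPU average for VM-level values is my own addition.

```python
# Sample interval (seconds) for each vCenter performance chart time range.
INTERVAL_SECONDS = {"realtime": 20, "day": 300, "week": 1800, "month": 7200, "year": 86400}


def cpu_ready_percent(ready_ms: float, chart: str, num_vcpus: int = 1) -> float:
    """Convert a CPU Ready summation value (ms) into a percentage.

    For a VM-level value that sums all vCPUs, pass num_vcpus to get the
    average per-vCPU percentage.
    """
    interval_ms = INTERVAL_SECONDS[chart] * 1000
    return ready_ms * 100 / (interval_ms * num_vcpus)


# 1,000 ms of ready time in one 20-second real-time sample on a 1-vCPU VM:
print(cpu_ready_percent(1000, "realtime"))               # 5.0
# 4,000 ms for a 4-vCPU VM averages out to the same 5% per vCPU:
print(cpu_ready_percent(4000, "realtime", num_vcpus=4))  # 5.0
```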

3. Understanding the response of vSphere HA when an ESXi host fails

When an ESXi host crashes, where do your VMs go? You need to understand that, and you need to know why shared storage is so critically important. To put it bluntly, if your virtual machines are NOT on shared storage, their files live only on the failed host, so HA has nothing it can restart elsewhere. It might seem obvious to some, but I’ve seen it all: some environments have HA turned on without ever moving their VMs to a shared datastore. If the host goes down, that’s a bad day.
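A toy model makes the point: only VMs whose files sit on a datastore that a surviving host can also reach are candidates for an HA restart. The classes and logic below are hypothetical and ignore admission control, restart priorities and host capacity.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class VM:
    name: str
    host: str
    datastore: str


def restartable(vms: List[VM], failed_host: str,
                host_datastores: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Split the failed host's VMs into those HA could restart and those stranded."""
    survivors = [h for h in host_datastores if h != failed_host]
    result: Dict[str, List[str]] = {"restart": [], "stranded": []}
    for vm in vms:
        if vm.host != failed_host:
            continue
        # A VM can restart elsewhere only if another host sees its datastore.
        if any(vm.datastore in host_datastores[h] for h in survivors):
            result["restart"].append(vm.name)
        else:
            result["stranded"].append(vm.name)
    return result


datastores = {
    "esx01": {"shared-ds01", "esx01-local"},
    "esx02": {"shared-ds01", "esx02-local"},
}
vms = [VM("web01", "esx01", "shared-ds01"), VM("db01", "esx01", "esx01-local")]
print(restartable(vms, "esx01", datastores))
# {'restart': ['web01'], 'stranded': ['db01']}
```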

When you enable HA, the FDM (Fault Domain Manager) agent is configured on each ESXi host in the cluster. A cluster contains a single master host, and that master tells the other hosts in the cluster exactly what to do when a host fails. If the master host fails, the remaining hosts hold an election and choose a new master. Learn more about HA setup and configuration.
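As VMware’s vSphere Availability documentation describes it, the host that can access the greatest number of datastores wins the election, and ties go to the host with the lexically highest managed object ID (MOID). Here’s a simplified sketch of that rule; the host MOIDs and datastores are made up, and the real election also involves network heartbeats and timing that this ignores.

```python
from typing import Dict, Set


def elect_master(candidates: Dict[str, Set[str]]) -> str:
    """Pick the HA master from {host MOID: datastores it can access}.

    Most datastores wins; ties go to the lexically highest MOID (a plain
    string comparison, so "host-9" beats "host-12").
    """
    return max(candidates, key=lambda moid: (len(candidates[moid]), moid))


# Suppose the old master has failed; the surviving hosts hold an election.
survivors = {
    "host-9": {"ds01", "ds02"},
    "host-12": {"ds01", "ds02"},
    "host-15": {"ds01"},
}
print(elect_master(survivors))  # host-9
```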

4. Use templates and cloning to deploy new virtual machines

Templates and clones are a perfect fit for any environment, even small ones! I see a lot of people who are new to vSphere manually set up each VM. It’s a huge waste of time. Shorten your deployment times by automating your builds. The best part of templates is that you get an identical build every time. We’ve got a great guide on getting started with templates.
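Conceptually, a template is a golden configuration that every clone copies, with only a handful of per-VM settings (name, IP address) customized afterwards. The sketch below models that idea with hypothetical Python classes and values; it does not call the vSphere API, where cloning and guest customization are handled by vCenter itself.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class VMConfig:
    name: str
    vcpus: int
    memory_gb: int
    os: str
    ip: str = "dhcp"


def deploy_from_template(template: VMConfig, names: List[str],
                         ips: List[str]) -> List[VMConfig]:
    """Clone the template once per name, customizing only name and IP."""
    if len(names) != len(ips):
        raise ValueError("need one IP per VM name")
    return [replace(template, name=n, ip=ip) for n, ip in zip(names, ips)]


golden = VMConfig(name="tmpl-win2019", vcpus=2, memory_gb=8, os="Windows Server 2019")
vms = deploy_from_template(golden, ["web01", "web02"], ["10.0.10.11", "10.0.10.12"])
for vm in vms:
    print(vm)
```

Because every clone starts from the same frozen template, the only differences between builds are the ones you explicitly customize.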

5. Know the difference between vSwitch and Port Group configurations

Know the difference between a vSwitch and a port group. I often hear these terms misused or used interchangeably.

A vSphere Virtual Switch allows a number of virtual machines connected to it to communicate with one another, pretty much like their physical counterparts would when connected to a physical switch. In addition, vSwitches bridge virtual networks to physical ones using the ESXi host’s network cards. It helps to think of the physical NIC associated with a vSwitch as being like the uplink port on a physical switch.

A port group, as the name implies, is a grouping of switch ports. By applying a network policy to a port group, you can enforce security and traffic shaping rules. Additionally, if you have VLANs set up on your physical switches, you can assign a VLAN ID to a port group so that any VM connected to it resides on that VLAN.
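The relationship is easier to see as a data model: a vSwitch owns the uplinks, port groups live on a vSwitch and carry the VLAN ID and policies, and VMs connect to a port group rather than to the vSwitch directly. The classes and names below are a hypothetical illustration, not the vSphere API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PortGroup:
    name: str
    vlan_id: int = 0  # 0 = the standard vSwitch does no VLAN tagging
    vms: List[str] = field(default_factory=list)


@dataclass
class VSwitch:
    name: str
    uplinks: List[str]  # physical NICs, like a physical switch's uplink ports
    port_groups: Dict[str, PortGroup] = field(default_factory=dict)

    def add_port_group(self, name: str, vlan_id: int = 0) -> PortGroup:
        pg = PortGroup(name, vlan_id)
        self.port_groups[name] = pg
        return pg

    def connect_vm(self, vm: str, port_group: str) -> None:
        # VMs attach to a port group; the port group decides the VLAN.
        self.port_groups[port_group].vms.append(vm)

    def vlan_of(self, vm: str) -> int:
        for pg in self.port_groups.values():
            if vm in pg.vms:
                return pg.vlan_id
        raise KeyError(vm)


vs = VSwitch("vSwitch1", uplinks=["vmnic1", "vmnic2"])
vs.add_port_group("Prod-VLAN10", vlan_id=10)
vs.add_port_group("Dev-VLAN20", vlan_id=20)
vs.connect_vm("web01", "Prod-VLAN10")
print(vs.vlan_of("web01"))  # 10
```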

6. Understand the upgrade process

You don’t need to know every specific aspect of each version (they’re all different), but you should be confident tackling an upgrade.

Ahh, the good old upgrade. We’ll all have to do it at some point. The worst thing anybody can do is build an environment and just let it sit and rot away. It makes me cringe when I get a call or email asking for an upgrade from vSphere 5.0 to vSphere 6.7 as if it’s nothing; there isn’t even a direct upgrade path between those two versions.

Upgrading is a huge undertaking, but knowing the process makes it manageable. In a basic environment, say a vCenter Server, a few ESXi hosts and some shared storage, start with the vCenter upgrade. Be sure your hosts are not so old that they won’t be supported once the upgrade is complete. You can check the compatibility matrix here to verify that.

After vCenter is upgraded, move on to your ESXi hosts. Build a baseline in Update Manager and remediate, preferably at the cluster level so DRS can move VMs off each host and you avoid downtime throughout the process. Next, upgrade VMware Tools inside the VMs, followed by the virtual machine hardware. You might also want to upgrade your VMFS datastores to a newer version; I generally do that last. A rough flowchart of the process is below.

vSphere upgrade process

That, in a nutshell, is the upgrade process, and it has been that way for many versions.
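The ordering can be written down as a checklist that refuses to run a step before its prerequisites. A minimal sketch; the step names are hypothetical labels. VMware KB 1010675 (quoted in the comments below) says VMware Tools should be upgraded before the virtual hardware, so this sketch uses that order.

```python
from typing import List

# Upgrade order for a basic environment (vCenter, a few hosts, shared storage).
UPGRADE_ORDER = [
    "vcenter",       # 1. vCenter Server first
    "esxi_hosts",    # 2. ESXi hosts, remediated per cluster with DRS
    "vmware_tools",  # 3. VMware Tools inside the guests
    "vm_hardware",   # 4. Virtual hardware version (only if you need it)
    "vmfs",          # 5. VMFS datastores, last
]


def next_step(completed: List[str]) -> str:
    """Return the next step, refusing to accept steps done out of order."""
    expected = UPGRADE_ORDER[: len(completed)]
    if completed != expected:
        raise ValueError(f"out of order: did {completed}, expected {expected}")
    if len(completed) == len(UPGRADE_ORDER):
        return "done"
    return UPGRADE_ORDER[len(completed)]


print(next_step(["vcenter"]))  # esxi_hosts
```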

7. Know how to troubleshoot vMotion issues

vMotion can fail for a few reasons. Sometimes you just need to move a VM from one host to another quickly and it won’t budge. I’ve listed the top 3 reasons below, but I highly recommend reading our guide to troubleshooting vMotion, which covers a few others.

  1. The VMkernel port is not configured properly.
  2. An ISO on a host’s local disk is still mounted to the VM.
  3. NTP issues among the hosts.

Those are the first things I check myself. The good news is that VMware does warn you about ISOs, so that one is generally easy to troubleshoot. Just disconnect the ISO file from the virtual CD-ROM!
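Those three checks are easy to script as a pre-flight list. The sketch below uses hypothetical host and VM records rather than real API objects, and the 5-second clock-skew threshold is an arbitrary example, not a VMware limit.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    vmotion_vmk: Optional[str]  # VMkernel port with vMotion enabled, or None
    clock_offset_s: float       # offset from the NTP reference, in seconds


@dataclass
class VM:
    name: str
    cdrom_iso: Optional[str]    # e.g. "[esx01-local] iso/setup.iso"
    iso_on_shared_storage: bool


def vmotion_precheck(vm: VM, src: Host, dst: Host, max_skew_s: float = 5.0) -> List[str]:
    """Return a list of likely vMotion blockers (empty list = none found)."""
    problems = []
    for host in (src, dst):
        if host.vmotion_vmk is None:
            problems.append(f"{host.name}: no VMkernel port enabled for vMotion")
    if vm.cdrom_iso and not vm.iso_on_shared_storage:
        problems.append(f"{vm.name}: ISO on a local datastore, disconnect the CD-ROM")
    if abs(src.clock_offset_s - dst.clock_offset_s) > max_skew_s:
        problems.append("clock skew between hosts: check NTP")
    return problems


src = Host("esx01", "vmk1", 0.2)
dst = Host("esx02", None, 42.0)
vm = VM("web01", "[esx01-local] iso/setup.iso", False)
for issue in vmotion_precheck(vm, src, dst):
    print(issue)
```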

8. Create and manage alarms

vCenter provides a set of default alarms that monitor the operations of vSphere inventory objects. It’s important to find a sweet spot so you’re only notified when something you really want to know about is happening; I do believe you can suffer from overload if you’re getting notifications about anything and everything. A few alarms I always recommend are datastore usage, CPU and memory usage, and host failure in an HA cluster. DRS can help mitigate CPU and memory usage alarms, but if a host fails you definitely want to know. There is nothing worse than coming in on a Monday morning with no idea that a host went down until Sally from the Accounting department asks why you rebooted her machine! Check out our post on setting up vSphere alarms.
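Each vCenter alarm can have a warning (yellow) and a critical (red) condition, so finding the sweet spot is largely a matter of choosing those two thresholds sensibly and only acting on what isn’t green. A minimal sketch of that evaluation; the 75%/85% values are example thresholds for illustration, so adjust them for your environment.

```python
from typing import Dict


def alarm_state(usage_pct: float, warning: float = 75.0, critical: float = 85.0) -> str:
    """Map a usage percentage to a vCenter-style alarm color."""
    if usage_pct >= critical:
        return "red"
    if usage_pct >= warning:
        return "yellow"
    return "green"


def notify(datastores: Dict[str, float]) -> Dict[str, str]:
    """Only return datastores that are not green, to avoid notification overload."""
    states = {ds: alarm_state(pct) for ds, pct in datastores.items()}
    return {ds: s for ds, s in states.items() if s != "green"}


print(notify({"ds01": 62.0, "ds02": 78.5, "ds03": 91.0}))
# {'ds02': 'yellow', 'ds03': 'red'}
```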

9. Understand the benefits of distributed switches

Distributed switches are often overlooked because they are only available with Enterprise Plus licensing. A distributed switch centralizes the network configuration of managed ESXi hosts. Unless standard switches are required for a specific reason, a vDS removes the need to create a standard switch on every ESXi host.

vSphere Distributed Switches add a suite of other features, including private VLANs, LACP (Link Aggregation Control Protocol) to boost NIC teaming, port mirroring, port state monitoring, inbound traffic shaping, traffic filtering, Network I/O Control, NetFlow for traffic analysis (one of my favorites), and even vDS configuration backup and restore. Knowing when to implement a distributed switch vs. a standard switch is a must-know skill.
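The practical benefit of centralization shows up as configuration drift. With standard switches, every host carries its own copy of each port group, and a single typo on one host can break networking for any VM that vMotions onto it. Here’s a hypothetical sketch that flags port groups whose VLAN ID differs between hosts; with a vDS, the port group is defined once in vCenter, so there is only one definition to get right.

```python
from typing import Dict


def vlan_drift(hosts: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Find port groups whose VLAN ID is not identical on every host.

    hosts maps host name -> {port group name: VLAN ID} for its standard vSwitches.
    A port group missing from a host is reported with VLAN ID -1.
    """
    all_pgs = {pg for pgs in hosts.values() for pg in pgs}
    drift: Dict[str, Dict[str, int]] = {}
    for pg in sorted(all_pgs):
        per_host = {h: pgs.get(pg, -1) for h, pgs in hosts.items()}
        if len(set(per_host.values())) > 1:
            drift[pg] = per_host
    return drift


standard_switch_config = {
    "esx01": {"Prod": 10, "Dev": 20},
    "esx02": {"Prod": 10, "Dev": 21},  # typo on one host
    "esx03": {"Prod": 10},             # port group never created
}
print(vlan_drift(standard_switch_config))
# {'Dev': {'esx01': 20, 'esx02': 21, 'esx03': -1}}
```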

10. And finally – understand the benefits of HCI: vSAN and NSX

vSAN and HCI are not going anywhere. I’ll happily admit that I need to rack up more knowledge around NSX; it’s my weakest area in the stack. The skills to work with these tools will be an absolute requirement within 5-10 years, and legacy storage and networking will be a thing of the past. It’s funny, because others have told me not to jump on the HCI bandwagon, and I’ll admit I was skeptical when it first came out, but it’s proven itself over the years. vSAN is very impressive, and the skills to manage it are going to be key. It makes sense for most SMBs as well. If you want to read more, check out our interview in which 9 vExperts share their opinions on vSAN and hyper-convergence.
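One vSAN skill worth building early is estimating raw capacity from a storage policy, because protection is paid for in disk space. The multipliers below are the commonly cited ones for vSAN’s original architecture: RAID-1 mirroring stores FTT+1 full copies, RAID-5 erasure coding with FTT=1 uses about 1.33x, and RAID-6 with FTT=2 uses 1.5x. The sketch ignores slack space, deduplication/compression and metadata overhead, so treat its output as a rough estimate only.

```python
from fractions import Fraction

# Raw-capacity multiplier per (layout, failures to tolerate).
OVERHEAD = {
    ("RAID1", 1): Fraction(2),     # 2 full copies
    ("RAID1", 2): Fraction(3),     # 3 full copies
    ("RAID5", 1): Fraction(4, 3),  # 3 data + 1 parity
    ("RAID6", 2): Fraction(3, 2),  # 4 data + 2 parity
}


def raw_capacity_gb(usable_gb: int, layout: str, ftt: int) -> float:
    """Raw vSAN capacity consumed for a given usable size (rough estimate)."""
    try:
        factor = OVERHEAD[(layout, ftt)]
    except KeyError:
        raise ValueError(f"unsupported policy: {layout} with FTT={ftt}") from None
    return float(usable_gb * factor)


for layout, ftt in [("RAID1", 1), ("RAID5", 1), ("RAID6", 2)]:
    print(layout, f"FTT={ftt}", raw_capacity_gb(1200, layout, ftt))
# RAID1 FTT=1 2400.0
# RAID5 FTT=1 1600.0
# RAID6 FTT=2 1800.0
```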

Wrap Up

I hope this post helps you decide where to spend some time ramping up skills. I enjoyed getting it all “on paper” as well. If you think of anything you want us to cover, let us know. I’m always looking for ideas!

Any skills you feel we missed? Anything you think should be added? Let us know in the comments section below!

3 thoughts on "Top 10 Critical Skills Every vSphere System Administrator Should Know"

  • Mike Pagán says:

    I think this is a really good article and I’ll be sharing it with my team at work.

    One item I would like to comment on is the vSphere upgrade order. I have read in the past that VMware Tools should be upgraded before VMware Hardware Compatibility level. I don’t know if this is me carrying over a process from older versions or not but I did find the KB article below that states “Note: Upgrading VMware Tools must be done before upgrading the virtual hardware except for the guests running Linux distributions or FreeBSD releases that have vendor supported open-vm-tools installed in the guest.”

    I thought that this was worth mentioning.

    Source: https://kb.vmware.com/s/article/1010675

  • adrian says:

    Excellent article! But yes, as Mike already mentioned, the correct upgrade order would be vCenter -> ESXi -> VMware Tools -> VM Hardware

    • Luke Orellana says:

      Hi Adrian,

      The order really depends on the system. Also, VMware recommends not upgrading the hardware version at all unless you need specific features in the new version.