Oh noes! I’ve lost my vCenter appliance root and/or grub password, halp!

I recently encountered a situation where an issue with a vCenter Server Appliance 6.0 required logging into the shell as the “root” user, but either the password was recorded incorrectly, or the password which was set was typed incorrectly (twice).  Regardless, it was not possible to log in as root, nor was the grub password known (most likely the same password as root when the appliance was initially configured), so we were stuck between a rock and a hard place.


VMware has a KB article that details how to reset the VCSA root password.  Unfortunately, that procedure requires entering the grub boot loader password to edit the boot entry, so it was kind of a “chicken and egg” scenario.  Luckily, I found a blog post on UnixArena.com that detailed using a Red Hat Enterprise Linux ISO to boot into recovery and gain access to a file that allows you to bypass the grub password, which in turn allowed me to change the root password.  However, once the root password was changed, the grub boot loader was still unprotected by a password, which is no bueno.  With some assistance from VMware Support, I was able to set a new grub boot loader password on the VCSA and all was good with the world again.

This post aggregates information from several different sources, and I’ve added in some material of my own to tie the whole process together and make it a little easier to follow.  Thanks to UnixArena.com, VMware Support, and Tecmint.com for the resources.

Now, on to the good stuff…

First, you need to download a Red Hat Enterprise Linux .ISO – you are required to create an account to request an evaluation, which allows you to download the .ISO.  The version I used for this post was RHEL 7.4.

Upload the .ISO to a vSphere datastore and mount it in the CD-ROM drive of your VCSA.  Power down the VCSA, take a snapshot of it, and then edit the “boot options” to “Force BIOS Setup” so that you can enter the VCSA’s BIOS and modify the boot order.


Once you’re in the BIOS, change to the “Boot” tab and use the “+” key to move “CD-ROM Drive” to the top of the boot order list.  Use the “right arrow” key to move to the “Exit” tab and choose “Exit Saving Changes”.  The VCSA should reboot and boot to the RHEL .ISO.


Use the “down arrow” to select “Troubleshooting”.


Use the “down arrow” key or “R” to select the “Rescue a Red Hat Enterprise Linux system” line, then press “Enter”.


The next screen will prompt you to mount the file system in “read-write mode” by selecting option 1.


When prompted, press “Enter” (or Return) to get a shell.


Once the shell is loaded, you should see this:


Change to the “/mnt/sysimage/boot” directory (cd /mnt/sysimage/boot), view the contents (ls -lrt) and you should see the “grub” folder.


Change to the “grub” folder (cd grub) and view the contents (ls -lrt) and you should see a “menu.lst” file.


This next step is optional, and if you’ve taken a snapshot of the VCSA before making any changes (which I hope you did) you could always just roll back, but I like to make a backup of the file I’m about to modify, which in this case is “menu.lst”.  Enter the command “cp menu.lst menu.lst.bak” and a copy of the “menu.lst” file will be made named “menu.lst.bak” which could be used to recover the file if you make a mistake in the next step.


Use the “vi” editor to modify the “menu.lst” file by entering the command “vi menu.lst”

The hashed grub password is highlighted below – use the “down arrow” key to move to the line beginning with “password” and type “dd” to remove the line.  Then, enter the command “:wq” to exit and save the file.


Note that the “password” line is removed.
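If you would rather not make the edit by hand in vi, the backup and the line removal can be scripted.  This is only a minimal sketch and not something taken from the VMware KB or the UnixArena post: the function name “remove_grub_password” is made up for illustration, and it assumes the password entry is a single line beginning with the word “password”, as described above.

```shell
#!/bin/sh
# Hypothetical helper: back up a GRUB legacy menu.lst, then drop its password line.
# Assumption: the entry is a line that starts with the word "password".
remove_grub_password() {
    menu="$1"
    # Keep a copy next to the original, same idea as "cp menu.lst menu.lst.bak".
    cp "$menu" "$menu.bak" || return 1
    # Rewrite the file from the backup, skipping any line that starts with "password".
    grep -v '^password[[:space:]]' "$menu.bak" > "$menu"
}
```

From the rescue shell it would be called as “remove_grub_password /mnt/sysimage/boot/grub/menu.lst”.  It is still worth opening the file afterwards to confirm that only the “password” line is gone.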


Exit the shell by entering the commands “cd” and then “exit”.  Be sure to unmount the RHEL .ISO or you will boot back into it.

When the grub boot menu appears, press “space”.  Now that the grub password has been removed, you should see that the instructions to enter “p” to “unlock additional options” are no longer present, and you can proceed to edit mode immediately.

Make sure that the “SUSE Linux Enterprise Server…” line is selected, and press “e”.


On the next screen, select the line beginning with “kernel” and press “e” again to edit the boot command.


Append “init=/bin/bash” to the line below, and then press “Enter”.


With the below line highlighted, press “b” to boot into the shell.


When you get a shell, type “passwd root” to change the root password.


Once you’ve entered a matching password twice, you should get a success message that the password has been changed.  Apparently it thinks the password I used is too simple, but whatever, lab.


Reboot the VCSA by issuing the “reboot” command or “Power > Reset” VM option.

When you see the boot menu again, you should notice that there is still no grub password set, meaning that anyone who gains access to the console of the VM and can reboot it can change the root password.  Obviously, if someone is crafty enough to mount your RHEL image and go through the process we just followed, they could still remove the grub password and then change root, so it’s important to have “least privileged” role based access, shield your management network from user facing subnets, and that sort of thing.

The next portion of this post will focus on putting a grub password back in place.

Once the VCSA has booted, press “Alt + F1” to gain console access, then enter the “root” username  and your recently set root password.


Once you’ve authenticated, enter the command “ssh.get” to check whether SSH is currently enabled.  If the status returned is “True”, skip to the next section.  If the status returned is “False”, enter the command “ssh.set --enabled true” to enable SSH, then verify that SSH is now enabled by entering the “ssh.get” command again.

Alternatively, you can use a web browser to open the VCSA’s “VAMI page” by going to https://vcenterhostnameorip:5480, logging in as root, selecting “Access” from the navigation menu, and enabling SSH and bash shell.  Since I already had the VCSA VM console open, I did it there.


For the next steps, we are going to use an SSH client like PuTTY instead of the VCSA VM console so that we can copy and paste more easily, which will help ensure the MD5 hash gets entered correctly.  Connect to your VCSA using SSH and log in using the root account and the newly set password.

Once logged in to the SSH session, enable shell by entering the commands “shell.set --enabled true” and then “shell”.


At the shell, enter the command “grub-md5-crypt”, and then correctly enter a matching grub password twice.  You will need to copy the md5 hash to clipboard, so highlight it and then paste into Notepad or another text editor for safe keeping.


Next, we will need to modify the “menu.lst” file that we removed the previous hashed password from earlier.  Edit the menu.lst file by entering the command “vi /boot/grub/menu.lst”.

Once in the text editor, press the “insert” key, “down arrow” to the line underneath “timeout” and press “enter”.  “Up arrow” once to the newly created blank line, type “password --md5 ” (with a space after --md5) and then paste in the copied md5 hash.


Once the hash has been pasted into the “menu.lst” file, press “Esc” to exit “edit mode”, then enter “:wq” to save and quit.  Reboot the system and verify that the grub boot loader is once again password protected.  You should see it prompt for “p” instead of “e” if the menu.lst file modifications were successful.  If “e” is still displayed, verify that the contents of the menu.lst file are correct and that there aren’t any missing characters.
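For anyone who prefers to avoid the vi keystrokes, the same edit can be sketched as a small shell function.  Treat this as an illustration under assumptions rather than the procedure VMware Support gave me: the name “add_grub_password” is hypothetical, and it assumes menu.lst contains exactly one line starting with “timeout” and that the hash came from “grub-md5-crypt”.

```shell
#!/bin/sh
# Hypothetical helper: add "password --md5 <hash>" right after the timeout line.
# Assumptions: GRUB legacy menu.lst with one line starting with "timeout",
# and a hash produced by grub-md5-crypt (no backslashes in it).
add_grub_password() {
    menu="$1"
    hash="$2"
    # Keep a backup so a bad edit can be rolled back.
    cp "$menu" "$menu.bak" || return 1
    awk -v h="$hash" '
        { print }                                  # copy every line through
        /^timeout/ { print "password --md5 " h }   # then add the password line
    ' "$menu.bak" > "$menu"
}
```

It would be called as “add_grub_password /boot/grub/menu.lst” followed by the hash in single quotes.  The single quotes matter because the hash contains “$” characters that the shell would otherwise try to expand.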

Press “p” to enter your new grub password to ensure everything is good to go – if it unlocks the option to edit boot commands, your job is done.  Don’t forget to remove the VM snapshot once you’ve determined your changes are successful.







Nutanix Announces Xtract for VM’s, Simplifying the Migration to AHV

Today Nutanix announced a new product called “Xtract for VM’s”, which is a tool to simplify the migration from other hypervisors (currently ESXi only) to Nutanix AHV.

While several options currently exist for migrating from ESXi to AHV, such as in place cluster conversions, Cross Hypervisor DR (if both source and destination are Nutanix clusters), or a more manual svMotion/import, Xtract for VM’s facilitates a prescriptive and controlled approach for moving workloads from ESXi to AHV.

Xtract for VM’s uses a simple “one click” wizard approach to target one or more VM’s for migration.  To migrate a VM, a “Migration Plan” is created using a simple wizard and the following criteria are configured:

  • Select one or more VM’s (batch processing for efficiency)
  • Specify guest operating system credentials (to install AHV device drivers)
  • Specify network mappings to retain network configuration (correlate the source network in vSphere to destination network in AHV)
  • Specify a migration schedule, if required, to seed data in advance


When a VM is configured for migration, a copy of the source ESXi VM is created in AHV, and then any changes to the source VM are synchronized to the destination VM up until the point of cutover.  Downtime for the cutover is minimized and only incurred when the source ESXi VM is powered down and the destination AHV VM is powered up.  The source VM is left on the ESXi host in an unmodified state so that it can be reverted to if an issue is encountered during testing.  Migration can be paused, resumed, or canceled at any time.


Xtract for VM’s is available at no additional charge to all Nutanix customers.  However, there are some caveats and requirements for use which can be found on Nutanix’s support site including, but not limited to:

  • Source node must be running ESXi 5.5 or higher
  • vCenter 5.5 or higher must be present and used for migration
    • migration direct from ESXi hosts is not possible
  • There are certain disk limitations that aren’t supported, such as:
    • Independent disks
    • Raw Device Mappings (RDM’s)
    • Multi writer (shared) disks
  • Guest OS must be supported by AHV

Nutanix has built an increasingly compelling argument for migrating to AHV from other hypervisors, as acquisitions (Calm, automation and orchestration) and product enhancements (such as network visualization, micro-segmentation, self service portal, AFS/ABS services, etc.) have made their solution more than “just another hypervisor”, and they now have an answer to most any use case or requirement.

Customers who were hesitant to adopt a “relatively new” hypervisor a couple years ago, or that had a particular use case (such as micro-segmentation via VMware NSX tying them to vSphere) may now have a viable alternative and I suspect that more customers will, at a minimum, be investigating the possibility of migrating away from their existing hypervisor.

If you’re like me, you like options and flexibility in a solution.  Competition is good, and if nothing else, maybe your next VMware renewal will be a little bit cheaper 😉  Easy to use in-box tools such as Xtract for VM’s that simplify the migration process and increase its probability for success make it an easier “sell” aside from the “dollars and cents” argument.

Read more about Xtract for VM’s on the Nutanix Blog, download it HERE, or read the user guide HERE.



Planning Firewall Rule Sets for Micro-segmentation

I recently gave a short presentation at the local VMUG Usercon on “my journey to micro-segmentation” and thought I’d adapt part of my slide deck to make a blog post on how I go about planning and implementing firewall rules for applications or tenants using the VMware NSX Distributed Firewall.

The more I work with the Distributed Firewall the more I realize there really isn’t one “right way” to secure your VM’s with it…it’s less about how the sausage is made and more about the end result.  However, there are a few steps I see as a requirement for laying down firewall rules over an existing production environment to ensure rules are scoped accurately and without blocking necessary traffic to avoid an issue with operations.

Step 1 – Gather Information

It is important to gather as much information about the traffic needs of the application or tenant upfront.  There are plenty of well known protocols/ports that we can rattle off from memory for various things…a web server will probably need HTTP/S over ports 80/443, a file server will probably be talking SMB over port 445, SQL Server on port 1433, LDAP over 389 or 636, etc.  These are the easy ones.

However, there are inevitably going to be in-house developed apps or industry specific apps that use custom ports or protocols that are in an application tier or within a tenant that may be harder to identify.  The vendor may have good documentation for all ports and directions of communication, but I think this is more the exception than the rule.

If you go to your application owner or support team and ask “what does each tier of your application talk to and over what protocols and ports” you will probably get a reaction pretty close to this:


If you do get something back from them, there’s a good chance it’s not right…or at least not the full picture.  This is not a knock on app teams, it’s just the reality of the situation that servers and applications are far chattier than many people realize or give thought to.  Being able to block traffic at the virtual NIC level is extremely powerful, and when you have to take into consideration east-west traffic flows that were typically never firewalled in a “physical world” there is a lot more work that needs to be done.

Step 2 – Trust…But Verify

To have any legitimate chance for success at laying down DFW rules on an existing application or tenant without breaking things you need a traffic monitoring tool.  That statement should probably be in CAPS, italicized, and the color red.  You will inevitably block traffic that you were unaware of if you do not monitor the actual traffic flows to and from the VM.  It will happen.  Even if you do somehow know all ports and protocols that are in use by a particular system, it’s not uncommon to find misconfigurations in the guest OS such as pointing to incorrect DNS servers that could be problematic when firewalling a VM.

There are lots of solutions out there that can get you this data.  You could pull it from syslog, maybe your network team already has some sort of NetFlow aggregator, or some in house developed solution.  Whatever you choose, the important part is that you are able to pull relevant, accurate, and easy to consume information from it that allows you to be actionable.  Manually parsing 20,000 lines in a .CSV is none of those things.

For this, we chose vRealize Network Insight.  Not only does it log all flows from the vSphere Distributed Switch, there are tons of “NSX’y” things it does above and beyond traffic flows such as environmental health checks, alerting, and visualization of the network both in the logical and physical realms.  I wish VMware would consider bundling in some of the vRNI features at certain tiers of NSX licensing – the easier you make it to consume the solution the more people will buy of it.  VMware did pay a pretty penny for the acquisition from Arkin so I also understand the need to monetize it.  Regardless, we found it fairly reasonably priced and when you take into consideration the potential financial loss for causing an application outage, it was a no brainer.

The following screenshot is an example of the information you can extract from vRNI.  In my opinion, the most powerful piece of vRNI is the “search” feature.  It’s extremely intuitive to query and that’s how I generate all of my flow data used in firewall rule development.


I will typically use a query like “show flows where src ip =” or “show flows where vm name like [vm name]”.  What I’m after is showing both sides of communications flow…I want to know all traffic inbound to the app / VM / tenant as well as all communication outbound from it and adjust my queries as such.

Once I’ve gotten the data I want from vRNI I will export the reports to a .CSV file and further massage the data…possibly removing traffic I know will not be required, removing rows based on the IP or subnet, etc.  This is one of those areas that has to be left open to interpretation, as there are a ton of different ways to scope and filter the data to be applicable to your specific environment.  If you’ve done your job well, at this point you should have identified probably 95% of the required traffic and can begin creating rules in the Distributed Firewall.

Step 3 – Proceed With Caution

With great power comes great responsibility – as mentioned previously, the ability to apply firewall rules at the vNIC level, before it’d ever hit physical media let alone route through a physical firewall, is extremely powerful.  It introduces all sorts of new ways you can really screw up your day if you don’t take a few precautionary steps.


I’m going to give a brief overview of my approach to micro-segmentation with the DFW so that the rest will make sense.  When I deploy NSX into an environment, I do so from a position of extreme caution.  I want to avoid an administrator making a mistake, a firewall rule being too broadly scoped, or a bug/issue with the solution itself from creating an outage.  Again, there are lots of ways to go about this – I’m not saying my way is THE way, but hear me out.

  1. The first thing I do is ensure that the “default rule” in the NSX DFW is set to allow.  In earlier versions of NSX, you were given the choice during deployment of NSX Manager whether or not you wanted your default rule to “allow” or “deny” traffic.

    If you chose “deny” you’d probably end up “islanding” your environment, as there are no DFW rules yet so of course traffic will be blocked.  However, the default rule is set to “allow” by default during installation in current versions of NSX, and I don’t think you are even given the option to choose otherwise.  Anyone (myself included here) who played around with NSX in their lab has at one point or another probably blocked all their traffic by mistake and had to issue a REST API call to remove it.

    Instead of having a “global” default rule set to deny, I’ve been doing it on a per-app or per-tenant basis – at least during the rollout phase where the majority of the systems in the environment are not being firewalled yet.  Having the default rule set to “allow” ended up being highly prescient, which I’ll touch on in the next section.

  2. The next thing I do is place any VM not actively being firewalled in the “exclusion list” on the NSX Manager.  This prevents any of that VM’s traffic from being processed for filtering by the DFW.

    By doing this I’m hoping to avoid an issue where I have VM’s that don’t have DFW rules created for them yet, and somehow the default rule gets flipped to “block” or a rule is too broadly scoped and ends up causing me problems.

    There is actually a bug in one of the recent versions of NSX where a condition occurs that causes all VM’s to be removed from the exclusion list by mistake, suddenly opening them up to DFW filtering.  If your default rule is set to “block” and you don’t have rules in place allowing the necessary traffic, you now have an outage on your hands.  Thanks VMware.  This did actually happen to me and I suddenly felt very glad I had left my “global” default rule as “allow”, therefore an outage was avoided.

  3. I use the “applied to” field in the Distributed Firewall to limit the scope of systems considered for processing of that particular firewall rule.  The default setting is to apply a newly created firewall rule to the “distributed firewall”, therefore any VM not on the exclusion list is checked against it for processing.  In a large environment, that’s going to be hundreds or even thousands of rules being checked that have nothing at all to do with that system.  There’s been times I was troubleshooting an issue by showing what rules were applied to the vNIC, and if literally every rule in the environment showed up in the list, it’d have greatly complicated troubleshooting.

    If the firewall rule set applies to an entire tenant, I’ll create a Security Group that contains that tenant’s VM’s and have all the rules in that section have their “applied to” field configured for that Security Group.  If a firewall rule set applies to a single VM or tier of VM’s, I may select the individual VM or possibly a Security Group in the “applied to” field.

    Having “applied to” configured limits the scope and failure domain that a misconfigured rule may impact…instead of it applying to the entire environment, maybe you just block traffic on a handful of VM’s and the damage is much smaller in scope.

Step 4 – You’ve Got the Data, Now Do Something With It

By now you’ve probably (or not) talked with your application owners about how their application communicates with the environment, you’ve generated flow data from vRealize Network Insight, and massaged the .CSV output to further refine the data.

It’s now time to take that output and create your initial firewall rule set in the DFW.  The below screenshot depicts a sample firewall rule set for a tenant.  There are multiple applications within this tenant, and its VM’s have been placed into Security Groups by app or function.  The flow data from vRNI was used to allow the appropriate traffic in or out bound.  A Security Group containing all the tenant’s VM’s was used as the scoping object in the “Applied To” field, for the reasons mentioned previously.

At the end of this rule set you will see the two “default” rules for this tenant.  The “outbound” default rule has a source of [tenant security group], a destination of “any”, and a service of “any”…with the source and destination being reversed for the “inbound” default rule.  The “Action” is currently set to “Allow” during the analysis phase, so that logging can be enabled on the “default rules” to see if any traffic that didn’t match one of the previous rules in the rule set registers as a “hit”.  These “hits” are obviously going to result in blocked traffic once the default rules get set to “block”.


With logging enabled, we can go to the host(s) that contain the app or tenant’s VM’s and parse the “dfwpktlogs.log” log file to see if any traffic that wasn’t accounted for is hitting the default rule.  This is kind of the “last chance” to rectify any missed traffic – there may be things legitimately blocked and logged here that you do not need to be concerned about…outbound web traffic to Microsoft on a Windows server for example.  It’s the “other” we are concerned about now.

To parse the “dfwpktlogs.log” file, open an SSH session to your host(s) and enter the following commands:

  • cd /var/log
  • cat dfwpktlogs.log | grep 1125 | grep 192.168.1 | grep 2017-05-16


The above command parses the dfwpktlogs.log file, filtering by rule ID 1125 (the outbound “default rule”), filtering by IP subnet, and filtering by date (to avoid returning flows from days where logging was enabled previously).
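One thing to watch with a plain “grep 1125” is that it matches any line containing those digits, including other rule IDs such as 11250 or a port number.  The function below is a minimal sketch of a slightly stricter filter.  The name “filter_dfw_log” and its argument order are my own invention for illustration, and it assumes the rule ID shows up in the log line as a standalone number, which you should confirm against your own log output.

```shell
#!/bin/sh
# Hypothetical helper: filter a DFW packet log by date, subnet prefix and rule ID.
# Assumption: the rule ID appears as a standalone number, so "grep -w" can be used
# to avoid matching longer numbers that merely contain it.
filter_dfw_log() {
    log="$1"      # path to the log file, e.g. /var/log/dfwpktlogs.log
    rule="$2"     # rule ID, e.g. 1125
    subnet="$3"   # literal IP prefix, e.g. "192.168.1." (note the trailing dot)
    day="$4"      # date string as it appears in the log, e.g. 2017-05-16
    # -F treats the date and subnet as fixed strings, not regular expressions.
    grep -F -- "$day" "$log" | grep -F -- "$subnet" | grep -w -- "$rule"
}
```

It would be called as “filter_dfw_log /var/log/dfwpktlogs.log 1125 192.168.1. 2017-05-16”.  The trailing dot on the subnet keeps addresses like 192.168.10.x out of the results.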

Enabling logging on a “default rule” can generate a large amount of data, so it’s recommended to only leave logging enabled temporarily – a time period measured in hours or maybe a day.  Enable it during times that would represent “normal” business function or during a time that some core process runs for that app/tenant to give yourself the most valid logging data.

If you see traffic that you believe should be allowed but is instead hitting the default rule, either modify an existing rule to include the traffic or create a new rule within the rule set to allow it.  Once the logs are clean, or you’re only seeing traffic you expect to be blocked (i.e. outbound internet traffic to Microsoft from a Windows server) then you’re ready to flip the “default rules” to “block”.

Step 5 – Ongoing Operations

So you’ve planned out all your micro-segmentation rules, you’ve created the initial rule set, you’ve monitored the dfwpktlogs.log files to make sure you didn’t miss anything and adjusted the DFW rules where necessary, and you’ve switched your “default rule” to block, and everything went well…….now what?

First thing – pat yourself on the back.  While not overly difficult, properly planning the micro-segmentation of an application or tenant can be quite time consuming to account for all necessary traffic to avoid issues.


OK, now that’s out of the way…you’ll inevitably get a panicked email from an application owner saying “we performed an upgrade and now the application won’t start”….the upgrade changed or added some ports used and they weren’t in the original rule set created for the app, now they’re blocked.

While you can certainly generate some new flow data from vRNI, I’ve found the quickest and easiest place to check for a blocked flow is the “Flow Monitoring” section in the NSX management GUI.  The time window to show flow data for is completely configurable – if you’re like me, you rarely find out about an issue shortly after it happens…most likely you’ll be going back several days to find that needle in the haystack.  By using an appropriate time window, selecting from the “Blocked Flows” tab, and using the “filter” mechanism, you should be able to find the issue with little effort.



Hopefully you found this post helpful.  As mentioned several times already, the NSX Distributed Firewall is extremely powerful and it gives you great flexibility on how to accomplish an increased security posture in your environment.  This methodology is not necessarily THE way, but it’s my way and has worked out pretty well for me so far.  As always, I’m open to hearing about new and better ways if you have a different way to do it.

Nutanix Announces Support for HPE Servers and a New Consumption Model

Today Nutanix announced support for HPE ProLiant server hardware and a new consumption model called “Nutanix Go”.  Both announcements support Nutanix’s position that the “enterprise cloud” should be flexible, easy to consume, and with the power of the public cloud….what I like to call the “have it your way” model.


HPE ProLiant Support

The announcement of support for HPE server hardware probably doesn’t come as a surprise to many because it’s very similar in nature to the announcement of support for Cisco UCS hardware just a few months ago.  While Nutanix had OEM agreements in place with both Dell and Lenovo hardware, customers wanted the flexibility to use Cisco UCS – their existing server hardware standard,  and after a validation process, Nutanix offered a “meet in the channel” procurement model where a customer buys the Nutanix software from an authorized Nutanix reseller and then buys the validated server hardware from an authorized Cisco reseller.  The announcement for HPE follows this same model using select HPE ProLiant server hardware (currently DL360-G9 and DL380-G9).

While it’s safe to say that there will probably be some gnashing of the teeth regarding this announcement just like there was from the Cisco UCS one (especially in light of HPE’s recent acquisition of SimpliVity), I see it as a win for everyone involved – the customer gets another choice for server hardware and the software that runs on it, channel partners have more “tools in their tool chests” to offer best in class solutions to their customers, and vendors get to move more boxes.

As mentioned earlier, Nutanix plans to support two HPE ProLiant server models initially – DL360-G9 and DL380-G9.  The DL360 is a 1U server with 8 small form factor drive slots and 24 DIMM slots.  The targeted workload for this server (VDI, middleware, web services) would be similar to the Nutanix branded NX3175…things that may be more CPU intensive than storage IO/capacity intensive.  The DL380 is a 2U server with 12 large form factor drive slots and 24 DIMM slots.  The targeted workload for this server would be similar to the Nutanix branded NX6155/8035…things that may generate larger amounts of IO or require more storage capacity.

Nutanix will offer both Acropolis Pro and Ultimate editions in conjunction with the HPE ProLiant server hardware.  Starter and Xpress editions will not be available at this time.  However, one interesting tidbit is the fact that software entitlements are transferable across platforms, meaning that a customer could leverage Nutanix software on an existing HPE server hardware investment (assuming it met the validated criteria) and at a later date “slide” that software on over to a different HPE server model or perhaps a Cisco UCS server at the time of a server hardware refresh, if they so chose.

Support is bundled with the software license as a subscription in 1, 3, or 5 year terms.  Just like the model with Nutanix running on Cisco UCS hardware, the server hardware vendor still fields hardware concerns, Nutanix will support the software side, and when in doubt, call Nutanix – if the issue is on the hardware side, concerns will be escalated through TSA Net for handoff to HPE support.

As far as availability timelines are concerned, it should be possible to get quotes for this solution at announcement (today – May 3 2017), with the ability to place orders expected for Q3 2017, and general availability targeted for Q4 2017.

Nutanix Go

Nutanix labels Nutanix Go as “On-premises Enterprise Cloud infrastructure with pay-as-you-Go billing”.  In a nutshell, a customer now has the ability to “rent” a certain number of servers for a defined term, ranging from 6 months to 5 years depending on configuration and model, with pricing incentives for longer term agreements, and billing / payment occurring monthly.

While an outright purchase is probably still the most advantageous in terms of price, there are plenty of scenarios beyond price where the flexibility of quickly scaling up or down in a short time period, without keeping hardware with a 3 or 5 year lifecycle on the books, is attractive…having costs fall under OPEX instead of CAPEX, “de-risking” projects with uncertain futures, augmenting existing owned Nutanix clusters, etc.  Customers will have the ability to mix “rented” nodes with “owned” nodes within the same cluster, enabling a sort of “on premises cloud bursting” capability.

The pricing for Nutanix Go is structured in such a way that the TCO is supposed to be significantly less than running a similar workload in AWS while mitigating some of the “use cases” that may traditionally necessitate consuming a public cloud.

Nutanix Go includes hardware, software, entitlements, and support under one SKU.  It’s priced per block, per term length, and as mentioned previously, billing and payment occur monthly.  Currently, there is a minimum of 12 nodes required for an agreement, which in my opinion is a bit high.  I’d like to see something more along the lines of the required minimum for a Nutanix cluster…something like 3 or 4 nodes that might be more attractive to small and medium sized businesses.  On the flip side, since it is Nutanix keeping the hardware on their books and allowing the customer to rent it, I can see why they’d want a certain minimum to make it worth their while.  Perhaps this will change in the future.

As far as availability is concerned, Nutanix Go is initially only available to US customers, with rollout country by country for the rest of the world in the second half of 2017.


In summary, “more choices” is always a good thing, and further proof that the “power” is in software.  I’m sure many customers, both potential and existing, will find these new consumption models to be a welcome addition.

Quest Kace 7.0 Upgrade = Domain Authentication Issues

Over the past couple weeks we had issues with a handful of legacy Windows Server 2003 boxes that would randomly “lose connection to the domain” – they were unable to access other resources on the domain and could not authenticate interactive logons using domain accounts.  Credit goes to my coworkers who were the ones to uncover the source of the issue, but I wanted to get this out there in case anyone else runs into it and bangs their head on the wall trying to figure out WTF happened.

A few days after upgrading the Dell/Quest/whoever KACE appliance to version 7.0 the first round of servers had domain authentication issues and exhibited the following errors in the Event Log:


It appears as though the “storage” referenced by these two errors is actually lack of memory, not lack of available space on the system drive.


When attempting to logon to the server with a domain account (local accounts still worked fine):


“konea.exe” (KACE agent) with ~12K open handles


Killing the konea.exe process restored domain functionality almost immediately (as would a reboot as a side effect of the service restarting) but without disabling the Windows service for it, it’d only be a matter of time before the issue returned.  We ended up disabling the service temporarily until the issue was escalated through KACE support.

Initially we thought the issue was isolated only to 2003 servers, but it occurred on a handful of 2012 servers about a week after the incident on the 2003 servers.  I’m assuming that is due to 2003 not being able to support as many open handles as more modern Windows operating systems, but as the uptime increased on the newer servers, they too would fall victim to it eventually.
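One way to catch this kind of leak before servers start dropping off the domain is to watch per-process handle counts over time.  The sketch below is only an illustration: the 10,000 threshold and the function names are my own assumptions (nothing here comes from KACE), and the optional collector relies on the third-party psutil package, whose num_handles() call only works on Windows.

```python
def flag_handle_hogs(samples, threshold=10_000):
    """Return (name, handles) pairs at or above threshold, worst first.

    samples is an iterable of (process_name, open_handle_count) pairs.
    The default threshold is an arbitrary assumption; tune it per server.
    """
    hogs = [(name, count) for name, count in samples if count >= threshold]
    return sorted(hogs, key=lambda item: item[1], reverse=True)


def collect_windows_handles():
    """Gather (name, handles) pairs with psutil (third party, Windows only)."""
    import psutil  # imported lazily so the rest of the sketch runs anywhere

    samples = []
    for proc in psutil.process_iter(["name"]):
        try:
            samples.append((proc.info["name"], proc.num_handles()))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # process exited or is protected; skip it
    return samples


if __name__ == "__main__":
    # Hypothetical numbers, loosely modeled on the ~12K handles seen above.
    demo = [("konea.exe", 12_000), ("lsass.exe", 1_500), ("svchost.exe", 900)]
    for name, count in flag_handle_hogs(demo):
        print(f"{name}: {count} open handles")
```

Run on a schedule, something like this would have flagged konea.exe well before the 2012 servers hit their limit.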

While the KACE server version was upgraded to 7.0, the agents deployed on most of the servers were still 6.4.  KACE support stated this should normally not be an issue, but recommended that the agent be upgraded to the same level as the server (7.0) in order to “resolve” the issue.  As of the writing of this post, we are still waiting to hear what actually “broke” with the 6.4 agent > 7.0 server interaction.

They confirmed that an increasing number of other customers were also reporting this issue.  7.0 is still fairly new and I imagine many customers have not moved to it yet, but if they do and the agents are not upgraded at the same time, people will likely run into this issue.  If you are running KACE 7.0 and still have older agents deployed, beware that you could start having servers fall off the domain.


VMware NSX Distributed Firewall Rules – Scoping and Direction Matter

I, like many of you I’m sure, was not traditionally a firewall or security admin prior to adding VMware NSX to my vSphere environment.  As such, there’s been a bit of a learning curve for me in terms of what I knew [or thought I knew] about physical firewalls and how that translates [or doesn’t] to the NSX Distributed Firewall (DFW).

As I’ve been rolling out NSX DFW rules to various types of systems with different accessibility requirements, I ran across some unexpected behavior when scoping the rules.

Let’s look at an example 2 tier application consisting of a “web server” and an “app server”.  If this were a traditional physical firewall setup, the web server would probably be in the DMZ, or at least in a different subnet from the app server.  The traffic would route through the firewall, and rules would be applied to allow or restrict it.


As a theoretical example, for our web tier, we’re allowing HTTP/HTTPS/FTP inbound to the web server from “any” source (presumably, any number of public networks), letting FTP back outbound to “any” destination, DNS outbound to our internal DNS servers, and SMB traffic to the app server where files are stored.  We make the assumption that while FTP traffic may be allowed outbound to any destination, it’s only going to reach that destination if it allows FTP inbound.  Everything else is denied by default.  Pretty straightforward.

For the app server, we’re allowing SMB inbound from “any” source (maybe there are several hundred internal VLANs that users could access the server from and it is not accessible externally), RDP is allowed inbound from “any” source, we have some various Active Directory / LDAP related ports open for domain membership, pings are allowed outbound to “any” due to a monitoring application hosted on the server, and DNS is allowed outbound to our DNS servers.  Everything else is denied by default.

Based on these firewall rules, when comparing what traffic is allowed in or out of each server, there is really only one traffic pattern which should match between the two, which is SMB from the web server to the app server (highlighted).

However – everything is not as it seems…

At this point, I have created DFW rules functionally identical to the first diagram in this post.  Let’s go through some various connectivity checks…


From the web server, we can access file shares on the app server, thanks to a combination of firewall rule 4 allowing SMB traffic outbound from the web server to the app server, and firewall rule 5, allowing SMB traffic inbound to the app server from “any” source.


From a user workstation, we can pull up the default website on the web server, thanks to firewall rule 1 allowing inbound HTTP traffic from “any” source.  So far so good.


Let’s try to ping it from the same workstation…no dice, and as expected, since ICMP is not allowed anywhere in the rule set “Web Tier” (rules 1 through 4).


Now let’s try the same tests from the app server itself…wait – that’s strange…both ICMP and SMB traffic are allowed from the app tier to the web tier, even though no rule applied to the security group containing the web server specifically allows that traffic in.  Is such a thing even possible?



The “problem”…

Let’s use the “Apply Filter” option in the Distributed Firewall to determine which rule(s) are to blame.  I specified the “Source” as the app VM, the “Destination” as the web VM, changed the action to “Allow” (this could also be handy to see what rule was blocking traffic you thought should be allowed by choosing the “Block” option), and then selected ICMP as the “Protocol”.


And now we can see that Rule 1038 that allows the Security Group containing the app VM to send ICMP traffic to “any” destination has matched the filter.


When I think of firewall rules in the “traditional” manner, I would expect allowing outbound ICMP from our application server to a destination of “any” wouldn’t also imply that ALL VM’s in my NSX environment should also allow that traffic inbound.  The whole point of “zero trust” and “default deny” is that unless traffic is explicitly allowed, it should be denied.  Perhaps to someone who comes from a network/security background and has used many different firewalls, this would be seen as expected behavior in certain scenarios – but that is not intuitive to this virtualization guy.

In a nutshell, there are a couple things in play here…

  1. Scoping matters.  By selecting a destination of “Any”, NSX truly means ANY.  Even though you may not have allowed a particular traffic type inbound on some unrelated system, because we have this “Any” rule, our application server can talk to it over that protocol.  I can see this being particularly problematic in a multi tenant environment, or maybe some kind of PCI environment where you have to prove a definitive dividing line between different systems.  One improperly scoped rule later and you have unintended consequences.
  2. Direction matters.  When creating a new firewall rule, the column titled “Direction” is hidden by default, and its default value is “In/Out”, which is the root of our problem here.  If we’d configured Rule 1038’s “Direction” value as “Out”, it wouldn’t have been implied that the traffic should also be allowed “In” on the web server.  In my opinion, VMware should not have this column hidden by default, and an administrator should have to choose a direction on the rule without a value being pre-populated.  In addition, I could find no way to manipulate the “Direction” value when using “Service Composer” – the default value is In/Out and there’s no way (at least in the GUI) to change it.
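To make the interaction between “Any” and “In/Out” concrete, here is a toy model of the behavior described above.  It is not NSX code and makes several simplifying assumptions of my own: every rule is applied to every vNIC (the “Applied To” field is ignored), a packet has to pass an “out” check at the sender and an “in” check at the receiver, rules are evaluated first match wins, and anything unmatched is denied.  The Rule class and function names are hypothetical.

```python
from dataclasses import dataclass

ANY = "any"


@dataclass(frozen=True)
class Rule:
    source: str
    destination: str
    service: str
    direction: str  # "in", "out", or "in/out"
    action: str = "allow"


def _matches(rule, src, dst, service, direction):
    """True if the rule covers this packet when seen in this direction."""
    return (
        rule.source in (ANY, src)
        and rule.destination in (ANY, dst)
        and rule.service in (ANY, service)
        and direction in rule.direction.split("/")
    )


def _vnic_check(rules, src, dst, service, direction):
    """First matching rule decides; no match falls through to default deny."""
    for rule in rules:
        if _matches(rule, src, dst, service, direction):
            return rule.action == "allow"
    return False


def packet_allowed(rules, src, dst, service):
    """The packet must be allowed out of the sender and in to the receiver."""
    leaves_sender = _vnic_check(rules, src, dst, service, "out")
    enters_receiver = _vnic_check(rules, src, dst, service, "in")
    return leaves_sender and enters_receiver


# A stand-in for Rule 1038: app VM may send ICMP to "any" destination.
in_out_rule = [Rule("app", ANY, "icmp", "in/out")]
out_only_rule = [Rule("app", ANY, "icmp", "out")]

print(packet_allowed(in_out_rule, "app", "web", "icmp"))    # True
print(packet_allowed(out_only_rule, "app", "web", "icmp"))  # False
```

With “In/Out”, the same rule satisfies both checks, so the web VM accepts the ping even though nothing in its own rule set allows ICMP.  With “Out” only, the inbound check at the web VM finds no match and the default deny applies.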

The “fix”…

The first way to “fix” this issue is to always assign the appropriate directional value to each firewall rule.  Through a combination of “In” and “Out” rules, your traffic should be allowed in the direction you expect without any “unintended consequences”.  The rules are still stateful, meaning that if we allow ICMP out to “Any” from the app VM (but only in the “Out” direction), that traffic is allowed to return back to the app VM without requiring a second rule stating so.

Add the “Direction” column to your view


Then, click the “Edit” icon next to the “Direction” value


Then select the appropriate value from the “Direction” drop down menu


Let’s go ahead and modify these DFW rules with the appropriate “Direction” and test again.


As you can see here, HTTP, SMB, and ICMP traffic from the app VM to the web VM, which previously worked, is now blocked.


Scoping matters…

The other important thing to consider is the rule scoping – in the example above, the web server allows HTTP/HTTPS traffic inbound from “Any” source.  Perhaps in this case the web server is publicly facing and there’s no real need for internal systems to access it directly.  In such a scenario, an IP Set allowing only public IP addresses to communicate with it could be used.

Here I’ve created two “IP Sets” on my NSX Manager.  One, which I’ve called “ipset_all-networks”, contains a range covering “all subnets”, and the other is called “ipset_all-private-networks” with the three private IP spaces specified (if you only use a small part of one private IP space, you could certainly get that granular, too).


Then, I created a Security Group called “sg_all-public-networks”, chose a static member of my IP Set called “ipset_all-networks”, then created an exclusion using my IP Set “ipset_all-private-networks” to block any internal IP address from matching the rule.  I could use this Security Group in place of the “Any” scoping object on my publicly facing web server, or even invert it so that no public IPs are allowed when scoping a rule.
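The include-then-exclude logic of that Security Group can be sketched with Python’s standard ipaddress module.  This is only a model of the membership test, not anything NSX runs: I am assuming “all networks” means 0.0.0.0/0 and that the three private spaces are the RFC 1918 ranges, and the function names are hypothetical.

```python
import ipaddress

# Assumption: "ipset_all-networks" covers every IPv4 address.
ALL_NETWORKS = [ipaddress.ip_network("0.0.0.0/0")]

# Assumption: "ipset_all-private-networks" holds the RFC 1918 ranges.
PRIVATE_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def in_any(address, networks):
    """True if the address falls inside at least one of the networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in networks)


def matches_public_group(address):
    """Member of the 'all' set, minus anything in the private exclusion."""
    return in_any(address, ALL_NETWORKS) and not in_any(address, PRIVATE_NETWORKS)


for candidate in ("8.8.8.8", "10.1.2.3", "172.32.0.1", "192.168.1.10"):
    print(candidate, matches_public_group(candidate))
```

Inverting the group, as mentioned above, amounts to dropping the “not” so that only private sources match.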


Obviously, there are many ways to, as they say, “skin a cat” with the Distributed Firewall, but as I found out…direction and scoping matter.

Got a better or more efficient way to manage the NSX Distributed Firewall rules?  I’m all ears!  😛

Password Protect the Pi-hole Admin Page

I recently got Pi-hole configured on my Raspberry Pi 3 to block ads on my home network.  So far the Pi-hole has worked great and the amount of ads it has blocked is impressive.  I have about 13 devices that all connect wirelessly to my home network including several TV’s, and it’s blocking several thousand ads per day, with a significant bandwidth savings to boot.

There is an admin page where you can view all these interesting stats (http://[pi-hole-IP-address]/admin).  There is one problem with this page though – it’s not password protected, so anyone that knows the IP address of your Pi-hole (AKA, anyone who can connect and view their client’s IP info) and also knows it’s a Pi-hole and the admin page is /admin can reach it.

In addition, in the “/admin” page is a section called “Query Log”, and as the name indicates, it’s a log of all the DNS lookups performed for devices on your network.  While I don’t particularly have anything to hide, it’s also not information I want freely available for anyone to review either.

This post will detail how to configure authentication on the Pi-hole admin page.  One of the Pi-hole developers (Jacob Salmela) has a pretty detailed set of instructions on how to enable this (kudos for the info), but I found that with my Linux/Pi-hole newbness, there were some gaps I had to fill in, and figured maybe someone else will find this useful as well.

  1. Open an SSH session to your Raspberry Pi.  The first step in this process is to create a hidden directory to hold the password file, which will contain a hashed password created in a later step.  Enter the command “sudo mkdir /etc/lighttpd/.htpasswd”.
  2. Change to the hidden directory by entering the command “cd /etc/lighttpd/.htpasswd”.
  3. This step creates a script that will hash a user’s password.  Enter the command “sudo touch [filename.sh]”.  Then, enter the command “ls” to verify the script exists in the directory.  I called my file “hashme.sh”.
  4. Now, we will need to add the following content into the script file by entering the command “sudo nano hashme.sh” to modify it in nano (text editor).
    #!/bin/bash
    user="$1"; realm="$2"; pass="$3"   # the three arguments supplied in step 6
    hash=`echo -n "$user:$realm:$pass" | md5sum | cut -b -32`
    echo "$user:$realm:$hash"
    After you’ve pasted in the script content, enter “Ctrl+X”, then “Y” to save the changes, then hit “Enter” to accept the “File Name to Write”.
  5. Now we need to make the file executable by entering the command “sudo chmod 755 [filename.sh]”.
  6. In this step we will run the script with three arguments (user, realm, password) which will then get hashed.  Enter the command “sudo ./[filename.sh] ‘[username]’ ‘[realm]’ ‘[password]’”.  The output will look something like “username:realm:[string of numbers and letters]”.  Copy the output to your text editor of choice (outside of the SSH session) as we will need it in the next step.
  7. Now we will create the password file.  You will paste the output of the previous command into nano after issuing this command “sudo nano /etc/lighttpd/.htpasswd/lighttpd-htdigest.user”.
    Once you’ve pasted in the output from the previous step, enter “Ctrl+X”, then enter “Y” to save the changes, and then “Enter” to accept the file name to write to.
  8. I found the next couple of steps to be a bit hard to understand in the developer’s blog post (mainly, where exactly the code had to be inserted).  It took a few tries to get it right, so I recommend backing up the lighttpd.conf file prior to making any changes – it makes recovering from a problem easy.  Because we are still in the “hidden” .htpasswd directory, enter the command “cd ..” to go up one directory.  First, we will back up the lighttpd.conf file by entering the command “sudo cp lighttpd.conf lighttpd.conf.bak”.
    Then enter the command “ls” to verify the backup file exists.
    If you need to rollback the changes made to the lighttpd.conf file, just enter the command “sudo cp lighttpd.conf.bak lighttpd.conf” and the unmodified file will be restored.
  9. Now that we’ve made our backup of the lighttpd.conf file, it’s time to modify the original.  Enter the command “sudo nano /etc/lighttpd/lighttpd.conf”.  The highlighted section below is where we will be pasting in the additional content.  Hit “Enter” at the arrow.
    Copy the following text and place it in the blank space created by your “Enter” key strikes:
    auth.backend = "htdigest"
    auth.backend.htdigest.userfile = "/etc/lighttpd/.htpasswd/lighttpd-htdigest.user"
    auth.require = ( "/path/to/protect/" =>
      (
        "method"  => "digest",
        "realm"   => "myrealm",
        "require" => "valid-user"
      )
    )
    Change the “auth.require = ( "/path/to/protect/" =>” field to “auth.require = ( "/admin/" =>”, and change “myrealm” to the realm you used in step 6 so that it matches the entry in the password file.
    Then hit “Ctrl+X”, then “Y”, then “Enter” to save the changes.
  10. Now we need to restart the lighttpd service by entering the command “sudo service lighttpd restart”.
  11. If your changes to lighttpd.conf were successful, you should receive no errors and go right back to the command prompt.
  12. Now, you need to go to your admin page and see if you are prompted for credentials.  If you’re currently logged into the admin page, hit “Ctrl + F5” or try opening the page in a private/incognito window.  Enter the username and password you hashed in step 6, and you should log right into the admin page.
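If you would rather not keep a shell script around, the line produced in step 6 can be generated with Python’s standard hashlib module.  This is just a sketch of the same “user:realm:password” MD5 hash that the script computes; the function name and the sample values are placeholders of mine.

```python
import hashlib


def htdigest_line(user, realm, password):
    """Build a 'user:realm:hash' entry, hashing 'user:realm:password' with MD5."""
    secret = f"{user}:{realm}:{password}".encode("utf-8")
    return f"{user}:{realm}:{hashlib.md5(secret).hexdigest()}"


if __name__ == "__main__":
    # Placeholder credentials; substitute your own before pasting the output.
    print(htdigest_line("username", "myrealm", "password"))
```

Because the realm is part of the hash, the “realm” value in lighttpd.conf has to be the same string you hash here, otherwise the login will be rejected even with the right password.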


Nuke Ads from Orbit with Raspberry Pi and Pi-hole [it’s the only way to be sure]

The Problem:

Earlier this week, I turned on my new Samsung “smart TV” and was greeted with a notification that I needed to accept “new terms and conditions”, and that I would “be sent targeted ads” [i.e. they’re collecting your data and selling it (you)]. Not only that, but if I declined, “certain smart features of the TV may no longer work”.  After “declining”, none of my apps worked.  Say what now?  You mean to tell me you’re going to neuter the TV I paid for if I don’t agree to being pimped out?  Bad move, Samsung.  This has obviously ruffled a lot of feathers, and rightly so.  There is a “mega thread” on Reddit that goes into great detail about it.

Imagine watching Netflix and having an ad or “commercial” pop up thanks to your smart TV?  Yeah no.  What’s likely occurring is that Samsung is subsidizing the cost of their TV’s with the revenue generated by their advertisements.  The fact I just bought a mid-upper range 40″ Samsung 4K smart TV for only $275, which just a few years ago would’ve cost over $1,000, is not lost on me.  A clever idea on paper, but horrible in practice, and whichever executive signed off on that idea should be reassigned to the toaster division.

While I don’t like the idea of what Samsung is doing at all, I can either disconnect my TV from my local network and break all the handy smart TV functionality like Netflix, Amazon Prime Video, Youtube, etc. or “deal with it”.  For now, I’ve chosen to “deal with it” by using something called “Pi-hole”, which essentially turns a Raspberry Pi into a DNS server for your local network, which intercepts advertisements being sent to your client devices and replaces them with white space.  While I suppose this may not stop Samsung from collecting the data, it prevents it from disrupting my use of the TV.

While there are browser based plugins like AdBlock that work quite well for computers and mobile devices, that doesn’t help me much with my TV.  There is only one option – nuke the ads from orbit…it’s the only way to be sure.
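The reason this works for a TV is that the blocking happens at the DNS lookup rather than in a browser.  Here is a rough sketch of the idea, not Pi-hole’s actual code: I am assuming a blocked name is answered with the Pi-hole’s own address and everything else is passed upstream, and the addresses and domain names are made up for illustration.

```python
def resolve(domain, blocklist, upstream, sinkhole_ip="192.168.1.2"):
    """Answer blocked names with the sinkhole address, forward the rest.

    upstream is any callable that maps a domain name to an IP address;
    sinkhole_ip is a hypothetical address for the Pi-hole itself.
    """
    name = domain.lower().rstrip(".")
    if name in blocklist:
        return sinkhole_ip
    return upstream(name)


if __name__ == "__main__":
    # Stand-in for a real upstream resolver, with made-up records.
    fake_upstream = {"example.com": "203.0.113.10"}.get
    blocked = {"ads.example.net"}
    print(resolve("ads.example.net", blocked, fake_upstream))
    print(resolve("example.com", blocked, fake_upstream))
```

Every device that uses the Pi-hole as its DNS server gets the same treatment, which is why no per-device plugin is needed.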


The Gear:

I ordered a Raspberry Pi 3 kit from Amazon for about $50, which came with a clear case, power supply, two heat sinks, and the Raspberry Pi board itself.  In addition, I purchased a 16 GB class 10 micro SD card for about $6 (side note, can’t believe how cheap high capacity removable flash media is now).

I won’t bother with the assembly instructions as it’s pretty straightforward, but figured I would create a post that detailed the steps required to install the Raspberry Pi operating system (Raspbian Jessie Lite which is the GUI / desktop-less version), configure basic settings, and install Pi-hole.

The Solution:

  1. Download the latest version of Raspbian Jessie Lite from https://www.raspberrypi.org/downloads/raspbian/ and extract the .IMG file from the .ZIP archive.  The Lite version contains no GUI, which is unnecessary anyway for the way this Raspberry Pi will be used.
  2. Download Win32 Disk Imager from https://sourceforge.net/projects/win32diskimager/ and install it.
  3. Download SD Card Formatter from https://www.sdcard.org/downloads/formatter_4/ and install it.
  4. Plug in the SD card to your computer and launch the SDFormatter App.  Click the “Format Option – Option” button and set “Format Size Adjustment” to “ON”, and then click “Format”.
  5. Next, run Win32 Disk Imager.  Click the “Browse” button and browse to the Raspbian Jessie .IMG file you extracted from the .ZIP archive.
    Ensure the correct drive letter for your SD card is selected under “Device” and then click “Write” to burn the .IMG file to it.  This may take several minutes to complete.
    Once it is done writing the image to your SD card, you should receive a success message.
  6. Eject your SD card, insert it in the Raspberry Pi, and power the unit up.  It will be easiest if you hard wire the Raspberry Pi to your router or switch using the ethernet port.  You will need to attach a monitor and keyboard as well so that you can enable SSH for remote administration.
  7. Login to the Raspberry Pi with the default credentials – username = pi and the password = raspberry
  8. Perform the initial configuration by entering the command “sudo raspi-config”.
  9. The “Raspberry Pi Software Configuration Tool” window will open, and a variety of options for configuration will be displayed.  The first step we will do is to enable SSH so that we can access the Raspberry Pi from a remote system for the remainder of the configuration.  Select “7 Advanced Options” and then “A4 SSH”.  Answer “Yes” when asked “Would you like the SSH server to be enabled?”.

    Select “Finish” from the raspi-config menu and answer “Yes” when asked to “Reboot now?”.  From this point on, you should be able to perform the rest of the configuration from a remote system at the comfort of your own desk.

  10. Assuming you have DHCP enabled on your router or switch, your Raspberry Pi should’ve received an IP address.  You will need to know this IP address to connect using SSH from another system.  I logged into my router’s admin portal and found a device named “raspberrypi”, which is the default name given, and then noted its IP address.
  11. Open an SSH session to the Raspberry Pi’s IP address and login using the default credentials.
  12. Once you’re logged in to the SSH session, you will see a warning that SSH is enabled but the default credentials have not been changed, which poses a security risk.  We will be changing this, among other settings, by running the raspi-config wizard again.  Enter the command “sudo raspi-config”.
  13. First, we will choose “1 Expand Filesystem” to utilize the remainder of the SD card.  Mine is 16 GB and it’d be nice to have it all available for use.  You will see a message that the root partition has been resized and it’ll require a reboot to complete.  No need to reboot yet though.
  14. Next, choose “2 Change User Password”.  Click “Ok” on the next window, and then you will be prompted to enter and re-enter a new password.  Assuming you entered the password correctly twice, you should see a success window.
  15. Next, choose “4 Internationalisation Options”.  We will be configuring our timezone and Wi-Fi Country here.  Select “I2 Change Timezone” and select your major region, then your applicable timezone.
  16. Next, select “7 Advanced Options” and then “A2 Hostname”.  This will allow us to change the default hostname from “raspberrypi” to something custom.  Perhaps you have a naming convention on your network that you need to adhere to, or you don’t want it to be blatantly obvious what the device is used for based on the name.  Enter your hostname then select “Ok”.
  17. Now that most of the basic configuration items have been addressed, return to the main raspi-config menu and choose “Finish”. A reboot will be required to apply the changes.
  18. Open a new SSH session to the Raspberry Pi and login with your new credentials.  You will notice that the terminal now shows the customized hostname you selected in the previous steps.
  19. Now that we have basic configuration and network connectivity, it is time to download and install any updates available for your Raspberry Pi.  The first command you need to enter is “sudo apt-get update”.  You should see the updates begin downloading.
  20. When you see the message “Reading package lists… Done”, the download is complete and it’s time to install the updates.  Enter the command “sudo apt-get dist-upgrade”.  You will be notified that some amount of additional disk space will be required, and you will need to answer “Y” to continue.  Depending on the amount of updates available, it could take a few minutes to complete.  Once it’s done installing updates, enter “sudo reboot” for good measure.
  21. Now we will install the Pi-hole software.  Log back into the Raspberry Pi with an SSH session and enter the command “curl -L https://install.pi-hole.net | bash”.  Alternatively, if you don’t want to pipe to bash, you can use the “alternative semi auto installation” instructions located here.
    The install script will launch, which performs various prerequisite checks and downloads the necessary files before launching the “Pi-hole automated installer” wizard.
  22. The first message you will see is a notification that “the installer will transform your device into a network-wide ad blocker”. Well, that is why we’re here after all.  The next window notifies you that the software is free and open sourced, but “powered by your donations”.  If you like the results, kick a little money their way.
  23. The next window states that you need to use a STATIC IP address, since it is after all a server and if its IP were to change, it’d break your Pi-hole DNS service.  Because we never set a static IP address in any of the previous steps, we will have the chance to do so now.
  24. Now you will be asked to select an interface.  “eth0” is the Ethernet port on your Raspberry Pi, and “wlan0” is the WiFi adapter.  I am going to hard wire my Pi-hole to my router for the simplest and most reliable service, so I have selected “eth0”.
  25. The next window asks you to select which protocol(s) to use – since I am not using IPv6 on my network, I’ve left just IPv4 selected.
  26. The next window displays the current IP information and asks if you’d like to use that as your static address.  Since I currently have a DHCP address leased, I do not want to configure a static IP with this address, so I’ve selected “No” and will enter the new information on the next window.  If you decide to reuse the IP address issued to you by DHCP instead of one outside the DHCP pool, it’s possible that a duplicate IP address could be issued (depending on how smart your router/switch is) and cause an issue.
  27. Enter your IP address in “CIDR notation” – meaning that instead of specifying an IP address and a separate subnet mask, you’d enter the IP address followed by a slash and the mask length, such as “/24” for a 24 bit mask.
  28. Most likely, the default gateway you received from DHCP will be the same one you want to use when configuring the static IP.
  29. Verify the information is correct and if so, answer “Yes”.
  30. The next screen asks you to select an “Upstream DNS Provider”.  I use OpenDNS currently and have selected that for use by the Pi-hole.  OpenDNS may give you some additional filtering flexibility as opposed to your local internet service provider’s DNS service.
  31. You will be shown a summary of your DNS configuration – if everything looks correct, select “Yes”.
  32. You will see some commands execute in the background, and then the “Installation Complete” window should appear.  It tells you that in order for the Pi-hole to do its job, the devices on your network need to use it as their DNS server.  Also, since we did assign a new IP address as part of the configuration, a reboot will be required.  Select “Ok”, and then enter “sudo reboot” once you’re back at the terminal prompt.
  33. At this point, the Pi-hole is ready for use by your clients.  If you’re using DHCP on your router or switch, the easiest way to accomplish this is probably to modify your DHCP options so that the Pi-hole’s IP address is handed out as the DNS server.  If you have static IP addresses set on any of your devices, you will have to modify their DNS server information manually.  The option in red below “inserts” my router’s IP address into the DHCP config as an available DNS server.  For purposes of verifying that the Pi-hole is doing its job, I have disabled this setting which forces the clients to only use the Pi-hole and nothing else.  For long term use, you could either configure a public DNS server like OpenDNS or Google in the “DNS Server 2” field, or set “Advertise router’s IP in addition….” to “Yes”.
  34. Now, we will test that the Pi-hole is doing its job and blocking ads.  Before doing this testing, be sure to disable any browser-based ad blockers like “AdBlocker” so that we don’t mask the results of the test.  Also, since you’ve already set your DHCP options to use the Pi-hole, you will need to manually override your device’s IP settings to use the router, or something other than the Pi-hole, as your DNS server.  Now, pick a site that is chock full of ads – while I’m not the Hollywood gossip type, I figure they spam you pretty well, so I went to www.thehollywoodgossip.com and sure enough, several Amazon ads were there (I guess someone has been shopping for pastel colored Yeti tumblers in this house).
  35. Now that we have our “control”, go ahead and revert to using the Pi-hole as the DNS server on your device.  Refresh the page (Ctrl + F5 if using Windows) and you shouldn’t see any ads this time.
  36. Also, the Pi-hole hosts a webpage that gives you ad blocking statistics, which I find really interesting.  In your browser, navigate to http://[your-pihole-ip]/admin.  In just a few minutes of time, I can see that 16.5% of my internet traffic has been for ads – not an insignificant amount.  It’s also worth noting that this page is open for access by default, so anyone that knew the IP of your Pi-hole and was aware of how to pull up the “admin” page could open your query log and see everything you’ve accessed on the internet.

    I found a good blog post that details how to add authentication to force password protection to a page hosted on your Raspberry Pi.  This is probably a good idea to do, for obvious reasons.  I haven’t had a chance to try yet, but will update this post when I get it setup.
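If CIDR notation in step 27 is new to you, Python’s standard ipaddress module can convert an address and subnet mask pair for you.  The addresses below are placeholders of mine rather than values from my network, and the helper name is hypothetical.

```python
import ipaddress


def to_cidr(address, netmask):
    """Convert an address plus dotted subnet mask into CIDR notation."""
    return ipaddress.ip_interface(f"{address}/{netmask}").with_prefixlen


if __name__ == "__main__":
    # Placeholder values; use the static IP and mask for your own network.
    print(to_cidr("192.168.1.5", "255.255.255.0"))  # 192.168.1.5/24
```

A mask of 255.255.255.0 has 24 bits set, which is where the “/24” comes from.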

Installing Nutanix NFS VAAI .vib on ESXi Lab Hosts

This post covers the installation of a Nutanix NFS VAAI .vib on some “non-Nutanix” lab hosts.

Why would one do this?  Several months ago I stood up a three node lab environment accessing “shared” storage using a Nutanix filesystem whitelist (allows defined external clients to access the Nutanix filesystem via NFS).  While the Nutanix VAAI plugin for NFS would normally be installed on the host as part of the Nutanix deployment, it obviously was not there on my vanilla ESXi 6.0 Dell R720 servers accessing the whitelist….which made things like deploying VM’s from template and other tasks normally offloaded to the storage unnecessarily slow.

Since Nutanix just released “Acropolis Block Services / ABS” GA in AOS 4.7 (read more about it at the Nutanix blog) there’s probably less of a reason to use filesystem whitelists for this purpose now, but alas, maybe someone will find it useful.  (*edit* – it’s worth noting that ABS doesn’t currently support ESXi.  I haven’t tried to see if it actually works yet, but needless to say, don’t do it from a production environment and expect Nutanix to help you.  *edit 1/27/17* – as of AOS 5.0, released earlier this month, ESXi is supported using ABS.)  At the time of this blog post, Windows 2008 R2/2012 R2, Microsoft SQL and Exchange, Red Hat Enterprise Linux 6+, and Oracle RAC are supported.  NFS whitelists aren’t supported by Nutanix for the purpose of running VM’s, either.

  1. The first step is to SCP the Nutanix NFS VAAI .vib from one of your existing CVM’s.  Point your favorite SCP client to the CVM’s IP, enter the appropriate credentials, and browse to the following directory: /home/nutanix/data/installer/%version_of_software%/pkg.  Copy the “nfs-vaai-plugin.vib” file to your workstation so that it can be uploaded to storage connected to your ESXi hosts using the vSphere Client.
  2. Once the .vib is uploaded to storage accessible by all ESXi hosts, SSH to the first host to begin installation.  You may need to enable SSH access on the host as it’s disabled by default.  This can be done by starting the SSH service in %host% > Configuration > Security Profile > Services “Properties” in the vSphere Client.
  3. Once logged in to your ESXi host, we can verify that the NFS VAAI .vib is missing by issuing the “esxcli software vib list” command.vib-listIf the .vib were present, we’d see it at the top of the list.
  4. Now we need to get the exact path to location you placed the .vib on your storage.  This can be done by issuing the “esxcli storage filesystem list” command.  You will be presented with a list of all storage accessible to the host, the mount point, the volume name, and UUID.storage-listHighlight the “mount point” of the appropriate storage volume so that we can paste it into the next command.  Alternatively, you could use the “volume name” in place of the UUID in the mount point path, but this was easier for me.
  5. Next, we will  install the .vib file using the “esxcli software vib install -v “/vmfs/volumes/%UUID_or_volume_name%/%subdir_name%/nfs-vaai-plugin.vib”” command.  I created a subdirectory called “VIBs” and placed the nfs-vaai-plugin.vib file in it.  Be careful as the path to the file is case sensitive.vib-installIf the install was successful, you should see a message indicating it completed successfully and a reboot is required for it to take effect.  Assuming your host is in maintenance mode and has no running VM’s on it, go ahead and reboot now.
  6. Once the host has rebooted and is back online, start a new SSH session and issue the “esxcli software vib list” command again and you should see the new .vib at the top of the list.  Voila!  You can now deploy VMs from template in seconds.
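
For reference, steps 3 through 5 can be strung together roughly like this from the SSH session on the host.  This is only a sketch: the VOLUME and SUBDIR defaults, and the vib_path and run helpers, are placeholders I made up for illustration, so substitute the UUID or volume name from your own “esxcli storage filesystem list” output.  Outside of an ESXi host (where esxcli doesn’t exist) it just prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch of steps 3-5. VOLUME and SUBDIR are assumed placeholder values:
# replace them with the UUID (or volume name) and the folder you actually used.
VOLUME="${VOLUME:-datastore1}"
SUBDIR="${SUBDIR:-VIBs}"

# Hypothetical helper: build the full, case-sensitive path to the uploaded .vib.
vib_path() {
    printf '/vmfs/volumes/%s/%s/nfs-vaai-plugin.vib\n' "$1" "$2"
}

# Hypothetical helper: run the command if esxcli is available,
# otherwise just print what would have been run.
run() {
    if command -v esxcli >/dev/null 2>&1; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Step 3: confirm the NFS VAAI .vib is not installed yet.
run esxcli software vib list

# Step 4: list the storage visible to the host to find the mount point.
run esxcli storage filesystem list

# Step 5: install the .vib from the path built above.
run esxcli software vib install -v "$(vib_path "$VOLUME" "$SUBDIR")"
```

After the reboot in step 6, run “esxcli software vib list” once more to confirm the plugin shows up.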

Nutanix – Taking It to the .NEXT Level

I was happy to participate in the opening “Nutanix Champion” event to “ring in” the day one keynote.  When I got backstage during the rehearsal, it was evident a lot of people had worked really hard to do something fun for the opening acts (Angelo Luciani, Julie O’Brien, and surely many more…so, kudos to you!).



Picture credit to https://twitter.com/@ClaireBelly

And now, my take on some of the announcements today….

Acropolis File Services:

This announcement fits into the recurring theme of “power through software” – leveraging commodity hardware to deliver additional value based on software upgrades and enhancements.  Acropolis File Services allows you to leverage your existing investment to expose the Nutanix file system for scale out file level storage.

Initially, SMB 2.1 will be supported, with other protocols (NFS, newer versions of SMB, etc.) on the roadmap.  AHV and ESXi hypervisors will be supported at GA.  Other features include user-driven restore leveraging file level and server level snapshots (think file level recovery and disaster recovery, respectively), with asynchronous replication on the roadmap for Q4.

There are some interesting use cases I can see for this, such as user profile storage and replication for desktop and application virtualization environments and low cost scale out file services using the included Acropolis Hypervisor + Nutanix storage nodes.

Acropolis Block Services:

Similar to Acropolis File Services, Acropolis Block Services exposes the Nutanix file system as an iSCSI target for bare metal servers and applications.  Though the file system is exposed to a bare metal workload, all the great Nutanix features are preserved for them (snapshot, clone, data efficiency services, etc.).  Again, this is a demonstration of “power through software” and the evolution of the platform – these features and support were first available for VMs, then files, and now bare metal.

The Nutanix file system is presented via the iSCSI protocol a little differently than iSCSI is normally implemented.  Instead of leveraging the resiliency built into the protocol (ALUA, multipathing, etc.), multipathing is handled by the back end: paths are managed dynamically and, in the event of a node failure, failover is handled on the back end as well.  While “best practices” for iSCSI are usually well documented based on the vendor or platform, not having to rely on as much client side configuration and optimization removes the human element and thus the risk of “PEBKAC” issues.  I’ve seen the “human element” manifest itself in more than one iSCSI implementation.

I see this feature being a big deal for shops that have both hyperconverged and traditional 3 tier deployed in the same datacenter.  Due to some extenuating circumstance (like a crappy software vendor that STILL doesn’t support virtualization in the year 2016) or an investment in physical servers the business wishes to extract value from, physical servers and/or traditional SAN storage must exist in parallel.  Being able to present traditional “block” storage to a bare metal server or app may remove that last roadblock on the journey.

“All Flash on All Platforms”

As the cost of flash continues to plummet, it continues to become more pervasive in the datacenter.  The capacity of SSDs has surpassed that of traditional spindles (though at a cost right now) and I foresee a day where “flash first” is the commonplace policy.  As such, starting with the Broadwell / Nutanix G5 platform, all flash configs will be available on all platforms.

Microsoft Cloud Platform System Loaded from Factory

This was announced last week, but in a nutshell, Nutanix and Microsoft collaborated to offer the Microsoft Cloud Platform System (CPS) Standard installed from the factory.  This offers a more turnkey private cloud that accelerates the time to value and allows for “day one” operation.  All “patching operations” are integrated into the Nutanix “One Click Upgrade” platform, further streamlining the day to day care and feeding that historically has burned up so much administrator time.

In addition, Nutanix will support the entire stack from the hardware up to the software, just like they have been doing for vSphere, Hyper-V, and AHV already.  There’s a lot to be said for “one throat to choke” when it comes to technical support.  I’m sure we’ve all been unwilling participants in the “vendor circular firing squad” at some point.  My experience with Nutanix support has always been excellent, both in the technical capability of the support engineers and in the customer service they deliver.

Prism Enhancements

Some big enhancements are coming to Prism.  Building upon “Capacity Planning” in Acropolis Base Software 4.6, “What if?” modeling will be added.   Instead of just projections based on existing workloads, you’ll be able to model scenarios such as onboarding of a new client or introduction of a new application or service at a granular level.

One of the benefits of the “building block / right sized” hyperconverged model is being able to accurately size for your existing workload, allowing for sufficient overhead, without overbuying based on best effort projections of where your environment might be 3 years out.  Calculation based on existing utilization and growth was the first step, and modeling “what if _____” is the next evolution of accurately projecting the next “building block” required to meet compute, IOPS, and capacity needs for “just in time forecasting”.  Maybe a “buy node now” button is in order through Prism 😛

Network Visualization

Enhancements to Prism will allow for quick configuration and visualization of the network config in AHV – both the config in the hypervisor and the underlying physical network infrastructure.  This makes finding the root cause of an issue much quicker and lowers the overall time to resolution by making you more aware of the underlying network infrastructure.  Josh Odgers has a great blog post covering this with some nice screenshots of the Prism UI, so I won’t bother reinventing the wheel: http://www.joshodgers.com/2016/06/15/whats-next-2016-prism-integrated-network-configuration-for-ahv/

Community Edition:

Another notable milestone in the Nutanix ecosystem is 12,000+ downloads of Community Edition to date (and nearly 200 activations a week).  Hosted Community Edition trials will be available free in two hour blocks through the Nutanix portal as a “test drive”.  Another option for getting your hands on Nutanix CE is to install it on your own “lab gear” – Angelo Luciani has a great blog post on using an Intel NUC (are multiple NUCs NUCii?):  https://next.nutanix.com/t5/Nutanix-Connect-Blog/The-Prestige-Continues-Community-Edition/ba-p/10399

Or….maybe on a drone?


Other Interesting Notes:

During the general session, some statistics were presented regarding the adoption of Acropolis Base Software 4.6.  500 clusters were updated to 4.6 within 7 days of release, and an overall 43% adoption within 100 days.  It was also noted that there was a significant performance increase available in 4.6 and as such 43% of customers received up to a 4x performance increase at no cost – I’ll say it again, power through software.


Another statistic I found interesting was that 15% of the customer base is now running AHV.  I suspect that percentage will increase significantly over the next 12 months with all the new features now native to AHV combined with the ease of “One Click” online hypervisor conversion.

A Picture to Sum It Up:

As someone who’s dealt with infrastructure that was everything but invisible, I think this says it all…


“Power Through Software”:


Other Blogs to Check Out:

I know I didn’t capture all the announcements here, but some other good blog posts I’ve seen today are worth a read…

Josh Odgers:

http://www.joshodgers.com/ (there’s a whole series here titled “What’s .NEXT 2016”)

Marius Sandbu:


Eduardo Molina: