January 2017

Quest KACE 7.0 Upgrade = Domain Authentication Issues

Over the past couple of weeks we had issues with a handful of legacy Windows Server 2003 boxes that would randomly “lose connection to the domain” – they were unable to access other resources on the domain and could not authenticate interactive logons with domain accounts.  Credit goes to my coworkers, who were the ones to uncover the source of the issue, but I wanted to get this out there in case anyone else runs into it and bangs their head against the wall trying to figure out WTF happened.

A few days after upgrading the Dell/Quest/whoever KACE appliance to version 7.0 the first round of servers had domain authentication issues and exhibited the following errors in the Event Log:

[Screenshot: Event Log errors (konea2)]

It appears as though the “storage” referenced by these two errors is actually lack of memory, not lack of available space on the system drive.

[Screenshot: Event Log error (konea1)]

When attempting to log on to the server with a domain account (local accounts still worked fine):

[Screenshot: failed domain logon (konea3)]

“konea.exe” (KACE agent) with ~12K open handles

[Screenshot: konea.exe open handles (konea-exe)]

Killing the konea.exe process restored domain functionality almost immediately (as would a reboot, as a side effect of the service restarting), but without disabling the Windows service for it, it would only be a matter of time before the issue returned.  We ended up disabling the service temporarily until the issue was escalated through KACE support.
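
Until the agents can be upgraded, one way to keep an eye on the problem is to watch konea.exe’s handle count and alert before it climbs into the tens of thousands.  Here is a minimal sketch, assuming Python and the psutil module are available on the affected Windows servers; the 10,000-handle threshold and the five-minute polling interval are purely illustrative values, not anything KACE documents.

    # Minimal sketch: warn when konea.exe's open handle count climbs past a
    # threshold.  Requires Python + psutil on the Windows server; the threshold
    # and polling interval are arbitrary illustrative values.
    import time
    import psutil

    THRESHOLD = 10000   # open handles

    while True:
        for proc in psutil.process_iter(["name"]):
            if (proc.info["name"] or "").lower() == "konea.exe":
                try:
                    handles = proc.num_handles()   # Windows-only counter
                except psutil.Error:
                    continue
                if handles > THRESHOLD:
                    print(f"konea.exe (pid {proc.pid}) has {handles} open handles")
        time.sleep(300)   # check every five minutes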

Initially we thought the issue was isolated to 2003 servers, but it occurred on a handful of 2012 servers about a week after the incident on the 2003 servers.  I’m assuming the delay is because 2003 cannot support as many open handles as more modern Windows operating systems, so it hit the limit first – but as the uptime increased on the newer servers, they too eventually fell victim.

While the KACE server was upgraded to 7.0, the agents deployed on most of the servers were still 6.4.  KACE support stated this should not normally be an issue, but recommended that the agent be upgraded to the same level as the server (7.0) in order to “resolve” the issue.  As of this writing, we are still waiting to hear what actually “broke” in the 6.4 agent > 7.0 server interaction.

They confirmed that an increasing number of other customers were also reporting this issue.  7.0 is still fairly new and I imagine many customers have not moved to it yet, but if they do and the agents are not upgraded at the same time, people will likely run into this issue.  If you are running KACE 7.0 and still have older agents deployed, beware that you could start having servers fall off the domain.

 


VMware NSX Distributed Firewall Rules – Scoping and Direction Matter

I, like I’m sure many of you, was not a firewall or security admin prior to adding VMware NSX to my vSphere environment.  As such, there’s been a bit of a learning curve for me regarding what I knew [or thought I knew] about physical firewalls and how that translates [or doesn’t] to the NSX Distributed Firewall (DFW).

As I’ve been rolling out NSX DFW rules to various types of systems with different accessibility requirements, I ran across some unexpected behavior when scoping the rules.

Let’s look at an example two-tier application consisting of a “web server” and an “app server”.  If this were a traditional physical firewall setup, the web server would probably be in the DMZ, or at least on a different subnet from the app server; the traffic would route through the firewall and rules would be applied to allow or restrict it.

[Diagram: example two-tier application and its firewall rules (nsx-firewall-blog-v2)]

As a theoretical example, for our web tier, we’re allowing HTTP/HTTPS/FTP inbound to the web server from “any” source (presumably, any number of public networks), letting FTP back outbound to “any” destination, DNS outbound to our internal DNS servers, and SMB traffic to the app server where files are stored.  We make the assumption that while FTP traffic may be allowed outbound to any destination, it’s only going to reach that destination if the destination allows FTP inbound.  Everything else is denied by default.  Pretty straightforward.

For the app server, we’re allowing SMB inbound from “any” source (maybe there are several hundred internal VLANs that users could access the server from, and it is not accessible externally), RDP is allowed inbound from “any” source, we have various Active Directory / LDAP related ports open for domain membership, pings are allowed outbound to “any” due to a monitoring application hosted on the server, and DNS is allowed outbound to our DNS servers.  Everything else is denied by default.

Based on these firewall rules, when comparing what traffic is allowed in or out of each server, there is really only one traffic pattern that should match between the two: SMB from the web server to the app server (highlighted).
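
To make that comparison concrete, here is a rough sketch – plain Python, with hosts and services as simplified stand-ins for the diagram above – that encodes the intent of the two rule sets and looks for flows both ends agree on.  It’s just a reasoning aid, not how the DFW actually evaluates rules.

    # Encode the intent of each server's rules and find web -> app flows that
    # both the web server's outbound rules and the app server's inbound rules
    # would permit.  Names are simplified stand-ins for the diagram above.
    web_rules = [
        {"dir": "in",  "src": "any",         "svc": "HTTP"},
        {"dir": "in",  "src": "any",         "svc": "HTTPS"},
        {"dir": "in",  "src": "any",         "svc": "FTP"},
        {"dir": "out", "dst": "any",         "svc": "FTP"},
        {"dir": "out", "dst": "dns-servers", "svc": "DNS"},
        {"dir": "out", "dst": "app-server",  "svc": "SMB"},
    ]

    app_rules = [
        {"dir": "in",  "src": "any",         "svc": "SMB"},
        {"dir": "in",  "src": "any",         "svc": "RDP"},
        {"dir": "in",  "src": "any",         "svc": "AD/LDAP"},
        {"dir": "out", "dst": "any",         "svc": "ICMP"},
        {"dir": "out", "dst": "dns-servers", "svc": "DNS"},
    ]

    def web_to_app_overlap():
        """Services the web server may send to the app server AND the app server may receive."""
        out_ok = {r["svc"] for r in web_rules
                  if r["dir"] == "out" and r["dst"] in ("any", "app-server")}
        in_ok = {r["svc"] for r in app_rules
                 if r["dir"] == "in" and r["src"] in ("any", "web-server")}
        return out_ok & in_ok

    print(web_to_app_overlap())   # {'SMB'} -- the only pattern both ends agree on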

However – everything is not as it seems…

At this point, I have created DFW rules functionally identical to the first diagram in this post.  Let’s go through some various connectivity checks…

[Screenshots: dfw-rules-3, dfw-rules-4]

From the web server, we can access file shares on the app server, thanks to a combination of firewall rule 4 allowing SMB traffic outbound from the web server to the app server, and firewall rule 5, allowing SMB traffic inbound to the app server from “any” source.

[Screenshot: dfw-rules-6]

From a user workstation, we can pull up the default website on the web server, thanks to firewall rule 1 allowing inbound HTTP traffic from “any” source.  So far so good.

[Screenshot: dfw-rules-5]

Let’s try to ping it from the same workstation…no dice – as expected, since ICMP is not allowed anywhere in the “Web Tier” rule set (rules 1 through 4).

[Screenshot: dfw-rules-7]

Now let’s try the same tests from the app server itself…wait – that’s strange…both ICMP and SMB traffic are allowed from the app tier to the web tier, even though there are no rules applied to the security group containing the web server that specifically allow that traffic in.  Is such a thing even possible?

[Image: dfw-rules-meme-2]

[Screenshot: dfw-rules-8v2]

The “problem”…

Let’s use the “Apply Filter” option in the Distributed Firewall to determine which rule(s) are to blame.  I specified the “Source” as the app VM, the “Destination” as the web VM, set the action to “Allow” (choosing “Block” instead is also handy for finding the rule that blocked traffic you thought should be allowed), and then selected ICMP as the “Protocol”.

[Screenshot: dfw-rules-9v2]

And now we can see that Rule 1038, which allows the Security Group containing the app VM to send ICMP traffic to “any” destination, has matched the filter.

[Screenshot: dfw-rules-10]

When I think of firewall rules in the “traditional” manner, I would not expect that allowing outbound ICMP from our application server to a destination of “any” also implies that ALL VMs in my NSX environment should allow that traffic inbound.  The whole point of “zero trust” and “default deny” is that unless traffic is explicitly allowed, it should be denied.  Perhaps to someone who comes from a network/security background and has used many different firewalls, this would be seen as expected behavior in certain scenarios – but it is not intuitive to this virtualization guy.

In a nutshell, there are a couple things in play here…

  1. Scoping matters.  By selecting a destination of “Any”, NSX truly means ANY.  Even though you may not have allowed a particular traffic type inbound on some unrelated system, because we have this “Any” rule, our application server can talk to it over that protocol.  I can see this being particularly problematic in a multi-tenant environment, or in some kind of PCI environment where you have to prove a definitive dividing line between different systems.  One improperly scoped rule later and you have unintended consequences.
  2. Direction matters.  Hidden by default is a column titled “Direction”.  When creating a new firewall rule this column is hidden, and the default value is “In/Out”, which is the root of our problem here.  If we’d configured Rule 1038’s “Direction” value as “Out”, it wouldn’t have been implied that the traffic should be allowed “In” on the web server.  In my opinion, VMware should not hide this column by default, and an administrator should have to choose a direction on each rule rather than having a value pre-populated.  In addition, I could find no way to manipulate the “Direction” value when using “Service Composer” – the default value is In/Out and there’s no way (at least in the GUI) to change it.  See the audit sketch after this list for one way to spot rules that are still In/Out with an “Any” destination.
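
If you have a large rule set, it may be worth auditing for this combination in bulk rather than eyeballing the GUI.  The sketch below assumes an NSX for vSphere style export – GET /api/4.0/firewall/globalroot-0/config returns the DFW rules as XML, where each rule element carries a direction of in/out/inout and (in my experience) an absent destinations element means “any”.  Verify the element names against your NSX version’s API guide before trusting the output.

    # Flag DFW rules that are still In/Out with an "any" destination, given an
    # exported rule set (e.g. saved from GET /api/4.0/firewall/globalroot-0/config).
    # Element names are based on NSX-v 6.x exports and may differ in your version.
    import xml.etree.ElementTree as ET

    def flag_inout_any_dest(xml_text):
        """Return (rule id, rule name) pairs that are In/Out with destination 'any'."""
        flagged = []
        for rule in ET.fromstring(xml_text).iter("rule"):
            direction = rule.findtext("direction", default="inout")
            has_destinations = rule.find("destinations") is not None
            if direction == "inout" and not has_destinations:
                flagged.append((rule.get("id"), rule.findtext("name", default="")))
        return flagged

    with open("dfw-config.xml") as f:   # previously exported with curl, Postman, etc.
        for rule_id, name in flag_inout_any_dest(f.read()):
            print(f"Rule {rule_id} ({name}) is In/Out with destination 'any'")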

The “fix”…

The first way to “fix” this issue is to always assign the appropriate directional value to each firewall rule.  Through a combination of “In” and “Out” rules, your traffic should be allowed in the direction you expect without any “unintended consequences”.  The rules are still stateful, meaning that if we allow ICMP out to “Any” from the app VM (but only in the “Out” direction), the return traffic is allowed back to the app VM without requiring a second rule saying so.

Add the “Direction” column to your view

[Screenshot: dfw-rules-11]

Then, click the “Edit” icon next to the “Direction” value

[Screenshot: dfw-rules-12]

Then select the appropriate value from the “Direction” drop down menu

[Screenshot: dfw-rules-13]

Let’s go ahead and modify these DFW rules with the appropriate “Direction” and test again.

[Screenshot: dfw-rules-16]

As you can see here, HTTP, SMB, and ICMP traffic from the app VM to the web VM, which previously worked, is now blocked.

[Screenshot: dfw-rules-17]

Scoping matters…

The other important thing to consider is the rule scoping – in the example above, the web server allows HTTP/HTTPS traffic inbound from “Any” source.  Perhaps in this case the web server is publicly facing and there’s no real need for internal systems to access it directly.  In such a scenario, an IP Set allowing only public IP addresses to communicate with it could be used.

Here I’ve created two “IP Sets” on my NSX Manager.  One, which I’ve called “ipset_all-networks”, contains “all subnets” with a range of 1.1.1.1-254.254.254.254, and the other, “ipset_all-private-networks”, contains the three private IP spaces (if you only use a small part of one private IP space, you could certainly get that granular, too).

[Screenshot: dfw-rules-14]
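
If you’d rather script this than click through the GUI, the same IP Sets can be created against the NSX Manager REST API.  The sketch below is based on the NSX-v API as I recall it (POST /api/2.0/services/ipset/globalroot-0 with a small XML body); the hostname and credentials are placeholders, and you should confirm the endpoint and body format against your version’s API guide before relying on it.

    # Rough sketch: create the two IP Sets via the NSX Manager REST API instead
    # of the GUI.  Endpoint/body are per the NSX-v API as I recall it -- verify
    # against your version before using.  Hostname and credentials are placeholders.
    import requests

    NSX_MGR = "https://nsxmgr.example.local"
    AUTH = ("admin", "changeme")

    def create_ipset(name, value):
        body = f"<ipset><name>{name}</name><value>{value}</value></ipset>"
        resp = requests.post(
            f"{NSX_MGR}/api/2.0/services/ipset/globalroot-0",
            data=body,
            headers={"Content-Type": "application/xml"},
            auth=AUTH,
            verify=False,   # lab shortcut -- use a trusted cert in production
        )
        resp.raise_for_status()
        return resp.text    # the new IP Set's object ID

    create_ipset("ipset_all-networks", "1.1.1.1-254.254.254.254")
    create_ipset("ipset_all-private-networks",
                 "10.0.0.0/8,172.16.0.0/12,192.168.0.0/16")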

Then, I created a Security Group called “sg_all-public-networks”, chose a static member of my IP Set “ipset_all-networks”, then created an exclusion using my IP Set “ipset_all-private-networks” so that no internal IP address will match the rule.  I could use this Security Group in place of the “Any” scoping object on my publicly facing web server, or even invert it so that no public IPs are allowed when scoping a rule.

[Screenshot: dfw-rules-15]

Obviously, there are, as they say, many ways to “skin a cat” with the Distributed Firewall, but as I found out, direction and scoping matter.

Got a better or more efficient way to manage the NSX Distributed Firewall rules?  I’m all ears!  😛