Like many of you, I’m sure, I was not traditionally a firewall or security admin before adding VMware NSX to my vSphere environment. As such, there’s been a bit of a learning curve for me regarding what I knew [or thought I knew] about physical firewalls and how that translates [or doesn’t] to the NSX Distributed Firewall (DFW).
As I’ve been rolling out NSX DFW rules to various types of systems with different accessibility requirements, I ran across some unexpected behavior when scoping the rules.
Let’s look at an example two-tier application consisting of a “web server” and an “app server”. If this were a traditional physical firewall setup, the web server would probably be in the DMZ, or at least on a different subnet from the app server. Traffic between the two would route through the firewall, where rules would be applied to allow or restrict it.
As a theoretical example, for our web tier, we’re allowing HTTP/HTTPS/FTP inbound to the web server from “any” source (presumably, any number of public networks), letting FTP back outbound to “any” destination, DNS outbound to our internal DNS servers, and SMB traffic to the app server where files are stored. We make the assumption that while FTP traffic may be allowed outbound to any destination, it’s only going to reach that destination if the destination itself allows FTP inbound. Everything else is denied by default. Pretty straightforward.
For the app server, we’re allowing SMB inbound from “any” source (maybe there are several hundred internal VLANs that users could access the server from, and it is not accessible externally) and RDP inbound from “any” source. We have various Active Directory / LDAP related ports open for domain membership, pings are allowed outbound to “any” due to a monitoring application hosted on the server, and DNS is allowed outbound to our DNS servers. Everything else is denied by default.
Based on these firewall rules, when comparing what traffic is allowed in or out of each server, there is really only one traffic pattern which should match between the two, which is SMB from the web server to the app server (highlighted).
However – everything is not as it seems…
At this point, I have created DFW rules functionally identical to the first diagram in this post. Let’s go through some various connectivity checks…
From the web server, we can access file shares on the app server, thanks to a combination of firewall rule 4 allowing SMB traffic outbound from the web server to the app server, and firewall rule 5, allowing SMB traffic inbound to the app server from “any” source.
From a user workstation, we can pull up the default website on the web server, thanks to firewall rule 1 allowing inbound HTTP traffic from “any” source. So far so good.
Let’s try to ping it from the same workstation…no dice, and as expected, since ICMP is not allowed anywhere in the rule set “Web Tier” (rules 1 through 4).
Now let’s try the same tests from the app server itself…wait, that’s strange…both ICMP and SMB traffic are allowed from the app tier to the web tier, even though no rule applied to the security group containing the web server specifically allows that traffic in. Is such a thing even possible?
Let’s use the “Apply Filter” option in the Distributed Firewall to determine which rule(s) are to blame. I specified the “Source” as the app VM, the “Destination” as the web VM, changed the action to “Allow” (this could also be handy to see what rule was blocking traffic you thought should be allowed by choosing the “Block” option), and then selected ICMP as the “Protocol”.
And now we can see that Rule 1038 that allows the Security Group containing the app VM to send ICMP traffic to “any” destination has matched the filter.
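To illustrate why a rule scoped to “any” shows up in that filter, here is a minimal Python sketch of the matching logic. It is only a mental model, not NSX code or its API: the `Rule` class, the `apply_filter` helper, the group names `sg_web` and `sg_app`, and every rule ID other than 1038 are hypothetical.

```python
from dataclasses import dataclass

ANY = "any"

@dataclass(frozen=True)
class Rule:
    rule_id: int
    source: str
    destination: str
    service: str
    action: str  # "allow" or "block"

def field_matches(rule_value, wanted):
    # A rule field scoped to "any" matches whatever the filter asks for.
    return rule_value == ANY or rule_value == wanted

def apply_filter(rules, source, destination, service, action):
    # Return every rule that would match the described traffic.
    return [
        r for r in rules
        if r.action == action
        and field_matches(r.source, source)
        and field_matches(r.destination, destination)
        and field_matches(r.service, service)
    ]

# Hypothetical rule table; only ID 1038 comes from the post.
rules = [
    Rule(1001, ANY, "sg_web", "HTTP", "allow"),
    Rule(1004, "sg_web", "sg_app", "SMB", "allow"),
    Rule(1038, "sg_app", ANY, "ICMP", "allow"),
]

matches = apply_filter(rules, "sg_app", "sg_web", "ICMP", "allow")
print([r.rule_id for r in matches])
```

The web tier never appears in rule 1038, yet the rule still matches app-to-web ICMP because its destination is “any”.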
When I think of firewall rules in the “traditional” manner, I would not expect that allowing outbound ICMP from our application server to a destination of “any” also implies that ALL VMs in my NSX environment should allow that traffic inbound. The whole point of “zero trust” and “default deny” is that unless traffic is explicitly allowed, it should be denied. Perhaps to someone who comes from a network/security background and has used many different firewalls, this would be seen as expected behavior in certain scenarios, but it is not intuitive to this virtualization guy.
In a nutshell, there are a couple things in play here…
- Scoping matters. By selecting a destination of “Any”, NSX truly means ANY. Even though you may not have allowed a particular traffic type inbound on some unrelated system, because we have this “Any” rule, our application server can talk to it over that protocol. I can see this being particularly problematic in a multi-tenant environment, or in something like a PCI environment where you have to prove a definitive dividing line between different systems. One improperly scoped rule later and you have unintended consequences.
- Direction matters. The column titled “Direction” is hidden by default, and a new firewall rule gets a default value of “In/Out”, which is the root of our problem here. If we’d configured Rule 1038’s “Direction” value as “Out”, it wouldn’t have been implied that the traffic should also be allowed “In” on the web server. In my opinion, VMware should not hide this column by default, and an administrator should have to choose a direction on the rule without a value being pre-populated. In addition, I could find no way to manipulate the “Direction” value when using “Service Composer”: the default value is In/Out and there’s no way (at least in the GUI) to change it.
The first way to “fix” this issue is to always assign the appropriate directional value to each firewall rule. Through a combination of “In” and “Out” rules, your traffic should be allowed in the direction you expect without any “unintended consequences”. The rules are still stateful, meaning that if we allow ICMP out to “Any” from the app VM (but only in the “Out” direction), the return traffic is allowed back to the app VM without requiring a second rule stating so.
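To make the difference between “In/Out” and “Out” concrete, here is a toy Python model of the behavior described above. It is a sketch under some loud assumptions, not a description of NSX internals: every rule is evaluated at both the sender’s and the receiver’s vNIC (as if each rule applied to the whole Distributed Firewall), each vNIC is default deny, and rule ordering and stateful return traffic are ignored. The `Rule` class and the tier names `web` and `app` are hypothetical.

```python
from dataclasses import dataclass

ANY = "any"

@dataclass(frozen=True)
class Rule:
    source: str
    destination: str
    service: str
    direction: str  # "in", "out", or "in/out"

def rule_allows(rule, src, dst, service, hop):
    # hop is "out" at the sender's vNIC and "in" at the receiver's vNIC.
    return (
        rule.source in (ANY, src)
        and rule.destination in (ANY, dst)
        and rule.service == service
        and hop in rule.direction.split("/")
    )

def delivered(rules, src, dst, service):
    # Default deny on both ends: traffic needs an allow when it leaves
    # the sender and another allow when it arrives at the receiver.
    egress = any(rule_allows(r, src, dst, service, "out") for r in rules)
    ingress = any(rule_allows(r, src, dst, service, "in") for r in rules)
    return egress and ingress

# The default "In/Out" direction: one rule satisfies both ends.
default_rules = [Rule("app", ANY, "ICMP", "in/out")]
print(delivered(default_rules, "app", "web", "ICMP"))  # True

# Direction set to "Out": nothing allows ICMP in at the web tier.
fixed_rules = [Rule("app", ANY, "ICMP", "out")]
print(delivered(fixed_rules, "app", "web", "ICMP"))    # False

# Intended traffic needs an explicit "out" and "in" pair.
smb_rules = [
    Rule("web", "app", "SMB", "out"),
    Rule(ANY, "app", "SMB", "in"),
]
print(delivered(smb_rules, "web", "app", "SMB"))       # True
print(delivered(smb_rules, "app", "web", "SMB"))       # False
```

In this model the “In/Out” rule is the only one that lets a single entry open both ends of a connection, which is the behavior that surprised me.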
Add the “Direction” column to your view
Then, click the “Edit” icon next to the “Direction” value
Then select the appropriate value from the “Direction” drop down menu
Let’s go ahead and modify these DFW rules with the appropriate “Direction” and test again.
As you can see here, from the app VM to the web VM, the HTTP, SMB, and ICMP traffic that previously worked is now blocked.
The other important thing to consider is the rule scoping – in the example above, the web server allows HTTP/HTTPS traffic inbound from “Any” source. Perhaps in this case the web server is publicly facing and there’s no real need for internal systems to access it directly. In such a scenario, an IP Set allowing only public IP addresses to communicate with it could be used.
Here I’ve created two “IP Sets” on my NSX Manager. One contains “all subnets”, which I’ve called “ipset_all-networks”, with a range of 126.96.36.199-254.254.254.254, and the other is called “ipset_all-private-networks”, with the three private IP spaces specified (if you only use a small part of one private IP space, you could certainly get that granular, too).
Then, I created a Security Group called “sg_all-public-networks”, chose a static member of my IP Set called “ipset_all-networks”, then created an exclusion using my IP Set “ipset_all-private-networks” to keep any internal IP address from matching the rule. I could use this Security Group in place of the “Any” scoping object on my publicly facing web server, or even invert it so that no public IPs are allowed when scoping a rule.
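The include-then-exclude logic of that Security Group can be sketched with Python’s standard `ipaddress` module. This is only an illustration of the membership test, not anything NSX runs. The three private ranges are the RFC 1918 blocks; `ALL_NETWORKS` uses `0.0.0.0/0` purely as a stand-in assumption for the “all networks” IP Set, and the function names are hypothetical.

```python
import ipaddress

# The three private IPv4 spaces (RFC 1918), standing in for
# "ipset_all-private-networks".
PRIVATE_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

# Assumption: 0.0.0.0/0 is a stand-in for "ipset_all-networks".
ALL_NETWORKS = [ipaddress.ip_network("0.0.0.0/0")]

def in_any(addr, networks):
    return any(addr in network for network in networks)

def is_public_member(ip):
    # Static include minus exclusion, like "sg_all-public-networks".
    addr = ipaddress.ip_address(ip)
    return in_any(addr, ALL_NETWORKS) and not in_any(addr, PRIVATE_NETWORKS)

for ip in ("8.8.8.8", "10.1.2.3", "172.16.5.4", "172.32.0.1", "192.168.1.10"):
    print(ip, is_public_member(ip))
```

Note that 172.32.0.1 counts as public here because the 172.16.0.0/12 block ends at 172.31.255.255.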
Obviously, there are many ways to, as they say, “skin a cat” with the Distributed Firewall, but as I found out…direction and scoping matter.
Got a better or more efficient way to manage the NSX Distributed Firewall rules? I’m all ears! 😛