Quest KACE 7.0 Upgrade = Domain Authentication Issues

Over the past couple of weeks we had issues with a handful of legacy Windows Server 2003 boxes that would randomly “lose connection to the domain”: they were unable to access other resources on the domain and could not authenticate interactive logons using domain accounts. Credit goes to my coworkers, who uncovered the source of the issue, but I wanted to get this out there in case anyone else runs into it and bangs their head on the wall trying to figure out WTF happened.

A few days after upgrading the Dell/Quest/whoever KACE appliance to version 7.0, the first round of servers had domain authentication issues and logged the following errors in the Event Log:

[Screenshot: konea2]

The “storage” referenced by these two errors appears to be memory, not free space on the system drive.

[Screenshot: konea1]

Attempting to log on to the server with a domain account produced the following (local accounts still worked fine):

[Screenshot: konea3]

The culprit turned out to be “konea.exe” (the KACE agent), which was sitting at ~12K open handles:

[Screenshot: konea-exe]
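If you want to check your own servers for a process hoarding handles like this, here is a rough Python sketch. It is not what we used; it assumes the third-party `psutil` package is installed (its `num_handles()` call is Windows-only), and the 10,000 threshold is an arbitrary number I picked because the agent above was at ~12K.

```python
import sys

# Assumed alert level, not a documented limit; the agent in this post sat at ~12K.
HANDLE_THRESHOLD = 10_000


def find_handle_hogs(samples, threshold=HANDLE_THRESHOLD):
    """Return (name, handles) pairs at or above threshold, highest first."""
    hogs = [(name, count) for name, count in samples if count >= threshold]
    return sorted(hogs, key=lambda item: item[1], reverse=True)


def collect_handle_counts():
    """Collect (process name, open handle count) pairs on Windows via psutil."""
    import psutil  # third-party: pip install psutil

    samples = []
    for proc in psutil.process_iter(["name"]):
        try:
            samples.append((proc.info["name"], proc.num_handles()))
        except (psutil.NoSuchProcess, psutil.AccessDenied):
            continue  # process exited or is protected; skip it
    return samples


if __name__ == "__main__" and sys.platform == "win32":
    for name, count in find_handle_hogs(collect_handle_counts()):
        print(f"{name}: {count} open handles")
```

Run it from an elevated prompt, otherwise protected processes are skipped and the list may be incomplete.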

Killing the konea.exe process restored domain functionality almost immediately (as would a reboot, as a side effect of the service restarting), but unless the Windows service was also disabled, it was only a matter of time before the issue returned. We ended up disabling the service temporarily while the issue was escalated through KACE support.
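For anyone who needs to do the same across more than a couple of servers, the stop-and-disable step can be scripted. This is only a sketch: the service name `konea` is my assumption (confirm it with `sc query` on one of your own boxes), and it defaults to a dry run that just prints the `sc.exe` commands.

```python
import subprocess

# ASSUMPTION: the agent's Windows service name; verify it with `sc query` first.
SERVICE_NAME = "konea"


def disable_commands(service=SERVICE_NAME):
    """Build the sc.exe commands that stop a service and keep it from restarting."""
    return [
        ["sc", "stop", service],
        # sc.exe expects "start=" and its value as separate tokens.
        ["sc", "config", service, "start=", "disabled"],
    ]


def disable_service(service=SERVICE_NAME, dry_run=True):
    """Print each command, and run it only when dry_run is False."""
    for cmd in disable_commands(service):
        print(" ".join(cmd))
        if not dry_run:
            # check=False: `sc stop` returns non-zero if the service is already stopped.
            subprocess.run(cmd, check=False)
```

Calling `disable_service()` prints the two commands; `disable_service(dry_run=False)` actually runs them and needs an elevated prompt. Remember to set the service back to automatic once the agent is fixed.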

Initially we thought the issue was isolated to 2003 servers, but it hit a handful of 2012 servers about a week after the incident on the 2003 servers. I’m assuming that is because 2003 cannot support as many open handles as more modern Windows operating systems; as uptime increased on the newer servers, they eventually fell victim to it too.

While the KACE server was upgraded to 7.0, the agents deployed on most of the servers were still 6.4. KACE support stated this should not normally be an issue, but recommended upgrading the agent to the same level as the server (7.0) in order to “resolve” the issue. As of the writing of this post, we are still waiting to hear what actually “broke” in the 6.4 agent to 7.0 server interaction.

They confirmed that an increasing number of other customers were also reporting this issue. 7.0 is still fairly new and I imagine many customers have not moved to it yet, but if they do and the agents are not upgraded at the same time, they will likely run into this issue. If you are running KACE 7.0 and still have older agents deployed, beware that you could start having servers fall off the domain.
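If you can export a list of hosts and agent versions from your inventory, spotting the stragglers is a quick comparison. A minimal sketch, assuming dotted version strings; the hostnames and the inventory below are hypothetical and only there for illustration.

```python
def major_minor(version):
    """Turn a dotted version string such as '6.4' into (6, 4) for comparison."""
    parts = (version.split(".") + ["0"])[:2]  # pad so a bare '7' becomes (7, 0)
    return int(parts[0]), int(parts[1])


def outdated_agents(server_version, agents):
    """Return hostnames whose agent major.minor is older than the server's."""
    target = major_minor(server_version)
    return sorted(host for host, ver in agents.items() if major_minor(ver) < target)


# Hypothetical inventory, for illustration only.
inventory = {"app01": "6.4", "db02": "7.0", "file03": "6.4"}
print(outdated_agents("7.0", inventory))  # prints ['app01', 'file03']
```

Anything this returns is a candidate for the agent upgrade before it starts leaking handles.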

 
