When is your Always On VPN not Always On?
An Always On VPN client uses a machine certificate to connect to the VPN gateway and connect to the network on startup. This feature is wonderful since it allows VPN clients to process machine group policies and even makes it easy for users with expired passwords to reset their passwords. But how can you verify that the machine channel is connected if you can’t see any indicators on the logon screen? As soon as I log in, the VPN client connects with my user credentials.
I have had a long-held suspicion that our Always On VPN client hasn’t been working as advertised. However, it generally happened on nights/weekends when I was working from home, so I rarely put any thought into verifying it. Plus, it was intermittent and I could never nail down what I felt was the issue. After all, surely the VPN admins would have noticed something by now, right?
In my last post, I talked about building a Hyper-V VM and using DJOIN to build VPN-enabled VMs from home. Consider this Part 2 of that process. In that post, I mentioned that I had to log in after each reboot of the Task Sequence to reconnect the VPN since the machine channel didn’t appear to be working. Today I finally spent some time testing it and SPOILER….I was right!!
Schrödinger’s VPN – If I can’t see that the VPN client is connected, is it connected? If it’s connected when I log in, was it connected all along, or did it just connect when I logged in? How can I verify connectivity without logging in?
Getting familiar with the client
There were probably several other ways to tackle this issue, but I ultimately wanted a way to witness the machine tunnel connection progress.
The first problem here is that I don’t manage our VPN client so I don’t REALLY know how it’s SUPPOSED to work or even have access to the configuration console. I’m just a consumer of it.
I know that we hide the client’s credential provider from the logon screen for…reasons. I re-enabled it for my testing and while it does give a visual indicator showing connectivity, it doesn’t tell you what’s broken. I also wasn’t sure if it was telling me that the machine channel was connected or the user channel.
I also found the client log files and discovered how to enable DUMP 💩 logging level, which includes all the crap in logs. (poop jokes – you’re welcome). This helped greatly since I was able to follow along with the background processes and determine the root cause.
1337 hax0rz ski11z
I just stole that heading from the internet, just like I stole my “solution” from Johan who apparently stole it from some other folks on the internet. This one’s been around for a while. I’ve used it several times and think it’s a very useful tool, as long as you revert your changes and you don’t bake this into every image you deploy!
DISCLAIMER – This is a BIG security risk. Do not leave this in production.
The hack is pretty simple. You replace either sethc.exe (hit left shift 5 times) or utilman.exe (the little icon next to the network icon on the Windows 10 logon screen) in C:\Windows\system32 with a renamed copy of cmd.exe. You can do this via WinPE/WinRE or within Windows with Admin rights. I mounted my Hyper-V hard disk and followed the steps below.
From Windows/Hyper-V Mounted Disk
- Navigate to c:\Windows\system32
- Find either sethc.exe or utilman.exe and take ownership
- Rename the file to sethc.exe.old or utilman.exe.old (This is important since you’ll want to reverse this when you’re done.)
Once you’ve taken ownership and renamed the original, copy cmd.exe from the same location (c:\Windows\System32) and rename the copy to the original file name (sethc.exe or utilman.exe). Double click the new file and a command window should open.
I’m sure someone has a script for this, I just haven’t looked. If you want to do it within WinPE, just follow Johan’s steps here.
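Since reverting is the part you really don't want to forget, here is a minimal Python sketch of just the undo step. It assumes the backup sits next to the swapped file with an .old suffix, as in the steps above. The function name restore_original and the example mount path are made up for illustration. It doesn't deal with file ownership or permissions at all, so treat it as a starting point rather than a finished tool.

```python
from pathlib import Path


def restore_original(system32: Path, name: str, backup_suffix: str = ".old") -> Path:
    """Put the backed-up original (e.g. sethc.exe.old) back over the cmd.exe copy.

    `system32` is the System32 folder of the mounted disk or the local machine.
    `backup_suffix` is an assumption based on the rename step described above.
    """
    target = system32 / name
    backup = system32 / (name + backup_suffix)
    if not backup.exists():
        raise FileNotFoundError(f"no backup found at {backup}")
    # Path.replace overwrites the swapped-in copy with the original file.
    backup.replace(target)
    return target
```

With the VHD mounted as, say, E: (a hypothetical drive letter), you would call restore_original(Path(r"E:\Windows\System32"), "sethc.exe") from an elevated prompt.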
It May Not Work
Apparently, this may not work if you have certain security tools – like Windows Defender – running on the machine, so go ahead and rip those things off while you’re at it. I mean, you’re already doing some bad stuff anyway, what’s a little bit more insecurity?
Checking for a Dead Cat
If you made it this far, you can start checking for a dead cat, er, connected VPN client. Simply boot up your machine and depending on which file you used, either hit SHIFT 5 times quickly or click on the Ease of Access icon in the lower right next to the network connectivity icon. Command Prompt will open and now you can run most Windows commands.
The first step was to check to see if the VPN connection was up. It wasn’t. I only had my internal network’s IP address on my wired/wireless adapter. I checked with our VPN admin and she confirmed that I should have an IP address at that stage (which I assumed, but wasn’t risking it).
Next, I opened services.msc, found the VPN service, and confirmed that it was started.
Then it was on to the VPN client logs! When I looked at the logs, I found the issue. The VPN gateway configuration sends the client the internal PKI Offline Root CA and Intermediate CA(s) certs. In our case, we have 2 issuing CAs and our clients can get their domain computer auth certs from either one, but never the same cert template from both servers. In our config, we had only uploaded one of the Intermediate CAs on the gateway. This means that clients with certs issued by the missing CA could never validate their cert against the cert chain.
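To make the failure mode concrete, here is a toy Python model of the chain lookup. It is only a sketch: it matches certs by subject and issuer name and does no signature, expiry, or revocation checking, so it is not how the VPN client or Windows actually validates anything. The CA and machine names are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Cert:
    subject: str
    issuer: str


def chain_builds(leaf: Cert, sent_by_gateway: Iterable[Cert]) -> bool:
    """Return True if the leaf's issuer links can be followed up to a
    self-signed root using only the CA certs the gateway handed out."""
    by_subject = {c.subject: c for c in sent_by_gateway}
    current = leaf
    seen = set()
    while current.subject != current.issuer:  # stop at a self-signed root
        if current.subject in seen:  # guard against issuer loops
            return False
        seen.add(current.subject)
        parent = by_subject.get(current.issuer)
        if parent is None:  # the issuing CA was never uploaded
            return False
        current = parent
    return True


# Hypothetical names: one root, two issuing CAs, only one uploaded to the gateway.
root = Cert("Offline Root CA", "Offline Root CA")
issuing_ca_1 = Cert("Issuing CA 1", "Offline Root CA")
gateway_bundle = [root, issuing_ca_1]

pc_a = Cert("PC-A", "Issuing CA 1")
pc_b = Cert("PC-B", "Issuing CA 2")

print(chain_builds(pc_a, gateway_bundle))  # True
print(chain_builds(pc_b, gateway_bundle))  # False
```

In this model, whether the machine tunnel comes up depends only on which issuing CA signed that machine's cert, which matches clients from the missing CA never validating.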
In the end, I can finally be at peace with the quantum state of our VPN Client and I can answer the question “When is your Always On VPN not Always On?” with “When the Intermediate issuing CA is missing from the certificate validation chain that’s issued by the VPN gateway server to the clients upon initializing their VPN connection at device boot time.”
I need to get out of my house! Hope you found this helpful or at least cathartic. I know we’ve all been there.