Over the last several installs of vRealize Automation (formerly vCloud Automation Center), I have run into a handful of issues that each required quite a bit of troubleshooting to resolve. Hopefully this information will save time for others working on a fresh configuration.
1. vCAC/vRA fails to register to SSO
In one particular case, I was using a vCSA that had been set up in a lab both as the SSO/Identity source and as the endpoint for deployments. When the vCSA was originally deployed, no naming scheme had been decided, so it was deployed with its IP address as the host name and renamed later once a scheme was chosen. The point is that the name of the Identity component changed. We noticed the issue while configuring the connection from a vCAC 6.1 appliance to the SSO source. The error was ‘Invalid “Host Settings” in the remote SSO server. Expected: <SSO Hostname>’. Interestingly, each time I reproduced the scenario, the SSO hostname shown in the error was identical to the one I had typed in to connect.
So, how did we fix this? Let’s start with the real issue. When SSO is set up on either the Identity Appliance or the vCSA, the hostname is stored in a text file at /etc/vmware-identity/hostname.txt. This happens only once, so renaming the appliance later does not update the file. When vCAC connects to either of these appliances, it validates against this file. The fix is as simple as editing the file so that its text exactly matches the appliance’s new name, then attempting the connection from vCAC again. Check out Grant Orchard’s explanation as well.
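As a quick sanity check before retrying the connection, something like the following could be run on the appliance to compare the stored name with the one you expect. This is only a sketch: the file path comes from the description above, while the function names and the assumption that the file holds nothing but the hostname (plus perhaps a trailing newline) are mine.

```python
from pathlib import Path

# Path as described in the post; everything else here is illustrative.
HOSTNAME_FILE = Path("/etc/vmware-identity/hostname.txt")


def hostname_matches(stored_text: str, expected: str) -> bool:
    """Compare the stored hostname to the expected one.

    Surrounding whitespace (such as a trailing newline) is ignored,
    but the comparison itself is exact and case-sensitive.
    """
    return stored_text.strip() == expected.strip()


def check_hostname_file(expected: str, hostname_file: Path = HOSTNAME_FILE) -> bool:
    """Read the hostname file and report whether it matches `expected`."""
    return hostname_matches(hostname_file.read_text(), expected)
```

For example, `check_hostname_file("vcsa01.lab.local")` (a made-up name) would return `False` on an appliance whose file still contains the original IP address.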
2. Issues with Manual IaaS DB Install
In some scenarios, I have had to install the IaaS database manually rather than using the installation wizard. At this point I am not sure whether the issue still exists in vRA 6.2, but it was definitely present in vCAC 6.1. After a manual install of the database, the VRMVersion and VRMDBVersion values are not set in the dbo.VRM_ID table, which caused me some grief when setting up a non-standard endpoint using the VMware Common Components Catalog. The error I was presented with reported an incorrect DB version. This could also cause problems with future upgrades of the environment. I dug through the scripts that build the DB in the manual process and noticed right away that this was the expected outcome of the script as written. I am confident that it was an oversight, and after discussing it with VMware R&D, we determined that simply setting those values in the DB fixes the issue.
3. Compute resources not showing up
This one may seem obvious in some cases, but in a complex scenario such as a deployment across 4 data centers with approximately 50 hosts in each, it was not. To keep things simple, the customer chose to name clusters by their purpose, like Prod, Dev, QA, etc. Some locations hosted the same environments, so the cluster names were the same. I didn’t think much of it at first, but when I created a new endpoint to connect to an additional vCenter server that had identically named clusters, the Compute Resources piece of vCAC got confused. Instead of showing multiple clusters, it showed a single cluster, and the endpoint associated with it flipped back and forth, I suspect depending on data collection timing. When viewing the hosts in the Compute Resource, I saw all of the hosts from both clusters. This obviously caused issues with deployments of Catalog Items and needed to be fixed immediately. To do this, we had to remove the endpoint and clean up the dbo.Host table in the vCAC database. Then we renamed the clusters and recreated the endpoint.
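Before adding another endpoint, it can help to check for clashing cluster names up front. Here is a minimal sketch, assuming you have already collected each vCenter’s cluster names into a dictionary by whatever means you prefer. The function name and sample data are hypothetical, and treating names as case-insensitive is my own cautious assumption rather than documented vCAC behavior.

```python
from collections import defaultdict


def find_duplicate_cluster_names(inventory):
    """Return cluster names that appear under more than one endpoint.

    `inventory` maps an endpoint (vCenter) name to an iterable of its
    cluster names. Names are compared case-insensitively, which is an
    assumption made here to err on the side of caution.
    """
    endpoints_by_cluster = defaultdict(set)
    for endpoint, clusters in inventory.items():
        for name in clusters:
            endpoints_by_cluster[name.strip().lower()].add(endpoint)
    return {
        name: sorted(endpoints)
        for name, endpoints in endpoints_by_cluster.items()
        if len(endpoints) > 1
    }


if __name__ == "__main__":
    # Hypothetical inventory for two data centers.
    sample = {
        "vcenter-dc1": ["Prod", "Dev"],
        "vcenter-dc2": ["Prod", "QA"],
    }
    print(find_duplicate_cluster_names(sample))
```

Any name that comes back from this check is a candidate for renaming before the second endpoint is created.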
4. Issues with vCAC certificates
My favorite issue to date was with certificates for the environment. I was working with a customer that was building a completely greenfield datacenter and deploying vCAC. The Windows team began building new AD servers, DNS, DHCP and PKI. Being greenfield, all components used the latest OS and software versions. Where this became a problem, one that took weeks to hash out completely, was with PKI. The PKI environment was built around Windows 2012 R2 CAs, with an offline root CA and an intermediate signing CA. All certificates were generated with SHA256 and 4096-bit keys. We experienced issues applying these certificates to the Linux appliances, but all Windows IaaS servers were fine with the certs. I have worked with certificates quite a bit in both vCAC and vCD and felt my overall process was sound. The error messages from the appliances when attempting to apply the certificates were, as usual, not helpful. Since VMware listed no certificate requirements for the environment, I had to spend a lot of time determining the issue on my own. Google was of no help (which is why I am writing this now) because not many people are working with a brand-new PKI environment.
So here’s what I found out. I completed an install of vCAC 6.1 in the customer’s lab and had no issues, so I began picking apart the differences. First, to make sure the appliances themselves were OK, I requested a certificate from the CA in the lab and applied it in the production install. This worked just fine, so what was different? In the lab there was no intermediary CA, only a root CA. The certs were SHA1 rather than SHA256. The certs were also 1024-bit rather than 4096-bit. So where was the actual problem?
We built a new 2012 R2 server as a root CA in prod and set it up with SHA1 and 2048-bit keys to test. (I knew 2048-bit worked from past installations.) Attempting to apply this certificate to the appliances still failed. What the heck? Further analysis of the certificates themselves led me to the signature algorithm used by the two PKIs. In the lab, we were using SHA1RSA and in prod it was RSASSA-PSS. AH HA! Could this be the issue? It sure was! Microsoft introduced RSASSA-PSS as an option back in 2008 R2 CAs, but in 2012 R2 it is now the default.
Now that we knew what the problem was, I got on the phone with GSS and discussed the matter together with my customer. We needed to find out what the Maximums document would look like for certificates in a vCAC/vRA environment. Weeks after handing all of this information over to them and assisting them along the way, I bring you KB2106583.
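If you want to check a certificate before trying to apply it, one option is to dump it as text with `openssl x509 -in cert.pem -noout -text` (where `cert.pem` is a placeholder file name) and look at the “Signature Algorithm” lines. The sketch below parses that text output. The function names are mine, and matching on the substring “pss” is a simple heuristic, not an official check from VMware or Microsoft.

```python
import re


def signature_algorithms(openssl_text):
    """Extract every 'Signature Algorithm' value from `openssl x509 -text` output."""
    return re.findall(r"Signature Algorithm:\s*(\S+)", openssl_text)


def uses_rsassa_pss(openssl_text):
    """Return True if any listed signature algorithm looks like RSASSA-PSS.

    Heuristic: a case-insensitive match on 'pss' in the algorithm name.
    """
    return any("pss" in alg.lower() for alg in signature_algorithms(openssl_text))


if __name__ == "__main__":
    # Trimmed, hand-written stand-ins for real openssl output.
    lab_cert = "    Signature Algorithm: sha1WithRSAEncryption\n"
    prod_cert = "    Signature Algorithm: rsassaPss\n"
    print(uses_rsassa_pss(lab_cert), uses_rsassa_pss(prod_cert))
```

A `True` result on a production certificate would point to the same signature algorithm mismatch described above.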
I hope this makes things a little easier, since altogether the above issues took me nearly 2 months to resolve. If you need additional help with your Cloud Management Platform strategy or with your installation of VMware vRealize Automation, contact us.