Azure has been updating their signing keys rather frequently (twice in the last month), and each time, our authentication has failed until we manually refreshed the Federation Metadata file from Azure.
Azure's documentation says that applications ought to be able to handle key rollover automatically (see https://docs.microsoft.com/nb-no/azure/active-directory/active-directory-signing-key-rollover), and Mellon does handle it gracefully, but only when the new key is already listed in the Federation Metadata file (the file configured with Mellon's MellonIdPMetadataFile directive). Our failures occur when Azure switches to a key that Mellon didn't know about in advance.
What's actually happening is that Azure has been publishing new keys and starting to sign with them 1-2 weeks later. However, since we don't refresh the metadata automatically, the first Mellon hears of a new key is when that key is used... at which point we fail to verify the signature, and authentication fails.
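Because a new key appears in the metadata well before Azure signs with it, the problem is visible in advance if you compare two copies of the metadata. The Python sketch below is not part of Mellon, and its function names are hypothetical. It assumes the standard SAML 2.0 metadata and XML-DSig namespaces, pulls out the signing certificates from each copy, and reports the ones that are new. A KeyDescriptor with no `use` attribute is treated as a signing key, since the SAML metadata specification allows such a key to be used for either purpose.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Standard namespaces used in SAML 2.0 metadata documents.
MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def signing_certs(metadata_xml: bytes) -> set[str]:
    """Return the base64 body of every signing certificate in the metadata."""
    root = ET.fromstring(metadata_xml)
    certs = set()
    for kd in root.iter(f"{{{MD_NS}}}KeyDescriptor"):
        # A KeyDescriptor without a "use" attribute may also be used for signing.
        if kd.get("use", "signing") != "signing":
            continue
        for cert in kd.iter(f"{{{DS_NS}}}X509Certificate"):
            # Strip whitespace: certificates are often wrapped across lines.
            certs.add("".join((cert.text or "").split()))
    return certs


def new_signing_certs(old_xml: bytes, new_xml: bytes) -> set[str]:
    """Signing certificates present in the new metadata but not in the old one."""
    return signing_certs(new_xml) - signing_certs(old_xml)
```

Run against the file Mellon is currently using and a freshly downloaded copy, a non-empty result means Azure has published a key that Mellon does not yet know about.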
The simple & naive solution would be to refresh the Federation Metadata automatically, but that left me with an uncomfortable feeling, which I shall explain below.
However, after a full analysis, it turns out that this naive solution is indeed correct (with a few caveats, which I shall also explain).
Why was I nervous? Consider the normal authentication flow:
- We send a request to Azure to authenticate a user
- We get a response back with a token, and verify that it really is from Azure by checking the signature on that token
- We then grant access to the user
Now suppose an attacker can intercept our traffic (a man-in-the-middle, or MitM). As long as we never refresh the metadata, the attack fails:
- We send a request to Azure to authenticate the user
- It gets redirected to the attacker's server instead
- We get a response back which actually comes from the attacker's server
- We reject that response, because it is not (cannot be!) signed with a valid key
But if we refresh the metadata automatically, the same attacker can succeed:
- We fetch the latest Federation Metadata file from the Azure server
- Again, we get redirected to the attacker's server instead, and receive a metadata document with the attacker's key added
- We send a request to Azure to authenticate the user
- It gets redirected to the attacker's server instead
- We get a response back which actually comes from the attacker's server
- We accept the response, because it's signed with a key that we've been told is valid
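To make the trust argument in these flows concrete, here is a deliberately simplified Python model. It is a toy, not real SAML: a "signature" is just the name of the key that signed a response, and the hypothetical `ServiceProvider` class (standing in for Mellon) accepts any response signed by a key listed in its metadata. It shows that the attack only succeeds at the step where the attacker gets to supply the metadata, while a refresh from genuine Azure metadata is exactly what makes key rollover work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """Toy stand-in for a SAML response: we only record which key signed it."""
    signed_with: str


class ServiceProvider:
    """Toy SP (think: Mellon) that trusts exactly the keys listed in its metadata."""

    def __init__(self, metadata_keys):
        self.trusted_keys = set(metadata_keys)

    def accepts(self, response):
        # Stand-in for signature verification: real verification is cryptographic,
        # but whether it succeeds still depends only on which keys we trust.
        return response.signed_with in self.trusted_keys

    def refresh_metadata(self, metadata_keys):
        # Whatever the metadata source says becomes our new set of trusted keys.
        self.trusted_keys = set(metadata_keys)


sp = ServiceProvider({"azure-key-1"})

# Normal flow: Azure signs with a key we already know.
print(sp.accepts(Response("azure-key-1")))   # True

# MitM without any refresh: the attacker's key is unknown, so we reject.
print(sp.accepts(Response("attacker-key")))  # False

# Legitimate rollover: refreshing from genuine Azure metadata lets us accept the new key.
sp.refresh_metadata({"azure-key-1", "azure-key-2"})
print(sp.accepts(Response("azure-key-2")))   # True

# MitM that also controls the metadata fetch: the attacker's key is now trusted.
sp.refresh_metadata({"azure-key-1", "azure-key-2", "attacker-key"})
print(sp.accepts(Response("attacker-key")))  # True
```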
Based on this attack scenario, automatic refreshes of the metadata file must either be authenticated or be signed in some way, so that we know the metadata really does come from a trusted source.
Of course, we already have a mechanism to do this. The Federation Metadata from Azure is served over HTTPS, which means that not only is the data encrypted in transit, but (more relevantly) the server can be authenticated as "yes, this really is the Microsoft Azure server".
The code sample in the Azure documentation has the slightly cryptic comment here:
```csharp
MetadataSerializer serializer = new MetadataSerializer()
{
    // Do not disable for production code
    CertificateValidationMode = X509CertificateValidationMode.None
};
```
Here, the example code disables the check that the metadata really comes from the expected server, which is a bad idea in any code, not just production.
Fetching the Federation Metadata automatically is safe, but only if it is done over HTTPS and the server's certificate is verified: it must chain up to a trusted certificate authority and be issued for the expected hostname.
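For comparison, in Python the standard library's `ssl.create_default_context()` already requires a valid certificate chain and a matching hostname; the counterpart of `CertificateValidationMode.None` would be setting `check_hostname = False` and `verify_mode = ssl.CERT_NONE`. The following is a minimal sketch of a fetch that keeps verification switched on. The function names are hypothetical, and the caller is assumed to pass in the metadata URL (for Azure AD this is typically of the form `https://login.microsoftonline.com/<tenant>/FederationMetadata/2007-06/FederationMetadata.xml`, where `<tenant>` is a placeholder).

```python
import ssl
import urllib.request


def verified_tls_context(cafile=None):
    """TLS context that checks both the certificate chain and the hostname.

    With cafile=None the system trust store is used; with a path, only the CAs
    in that bundle are trusted.
    """
    ctx = ssl.create_default_context(cafile=cafile)
    # create_default_context() already sets both of these; setting them explicitly
    # documents intent. Do NOT set check_hostname=False or verify_mode=ssl.CERT_NONE:
    # that is the equivalent of CertificateValidationMode.None.
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx


def fetch_metadata(url, cafile=None, timeout=30):
    """Download federation metadata, refusing anything but verified HTTPS."""
    if not url.startswith("https://"):
        raise ValueError("federation metadata must be fetched over HTTPS")
    ctx = verified_tls_context(cafile)
    # urlopen raises urllib.error.URLError if the server's certificate does not verify.
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return resp.read()
```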
Admittedly, any MitM can serve a doctored metadata document over HTTPS with a certificate that claims to be for login.microsoftonline.com, but only Azure can present a certificate that was genuinely issued to login.microsoftonline.com and chains up to a suitably trusted root CA.
To be fully secure, then, the refresh must keep certificate verification switched on (never use curl's "--insecure"/"-k" or wget's "--no-check-certificate"), and ideally trust only a specific CA bundle, using curl's "--cacert" option or wget's "--ca-certificate" option, so that we can be confident the metadata document has come from a trusted server.
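Putting the pieces together, a scheduled refresh job might look like the Python sketch below. It is an illustration under stated assumptions rather than anything Mellon provides: the function names, the `expected_entity_id` check (use the entityID from your current metadata file) and the destination path are all hypothetical. It downloads the metadata while trusting only the CAs in a given bundle (the counterpart of curl's "--cacert"), performs a few sanity checks, and then atomically replaces the file named by MellonIdPMetadataFile, so nothing ever reads a half-written file. Whether Apache needs a graceful reload to pick up the new file depends on your setup and is not shown.

```python
import os
import ssl
import tempfile
import urllib.request
import xml.etree.ElementTree as ET

MD_NS = "urn:oasis:names:tc:SAML:2.0:metadata"
DS_NS = "http://www.w3.org/2000/09/xmldsig#"


def fetch_with_ca_bundle(url, cafile, timeout=30):
    """Fetch over HTTPS, trusting only the CAs in cafile (like curl --cacert)."""
    if not url.startswith("https://"):
        raise ValueError("refusing to fetch metadata over plain HTTP")
    ctx = ssl.create_default_context(cafile=cafile)  # verifies chain and hostname
    with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
        return resp.read()


def validate_metadata(xml_bytes, expected_entity_id):
    """Cheap sanity checks before overwriting the file Mellon reads."""
    root = ET.fromstring(xml_bytes)
    if root.tag != f"{{{MD_NS}}}EntityDescriptor":
        raise ValueError(f"unexpected root element: {root.tag}")
    if root.get("entityID") != expected_entity_id:
        raise ValueError(f"unexpected entityID: {root.get('entityID')!r}")
    certs = [c for kd in root.iter(f"{{{MD_NS}}}KeyDescriptor")
             for c in kd.iter(f"{{{DS_NS}}}X509Certificate")]
    if not certs:
        raise ValueError("metadata lists no key certificates")


def replace_atomically(path, data):
    """Replace path so readers see either the old file or the new one, never a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".metadata-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        # mkstemp creates the file as 0600; metadata is public, so let the web server read it.
        os.chmod(tmp, 0o644)
        # Atomic on POSIX when source and target are on the same filesystem.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise


def refresh_metadata(url, cafile, dest, expected_entity_id):
    data = fetch_with_ca_bundle(url, cafile)
    validate_metadata(data, expected_entity_id)
    replace_atomically(dest, data)
```

Keeping the temporary file in the same directory as the destination is what makes `os.replace` a single atomic rename rather than a cross-filesystem copy.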