Tuesday, September 26, 2023

About Roaming solutions


Ok... so today was one of those days when someone reached out to me confused about a concept. There is so much information on the topic that the concepts got mixed up in his head, so I'm writing this down for future reference in case it happens again.

First, let's start by defining what roaming is in Wi-Fi terms. Basically, it is the movement of a wireless client from one AP to another, with everything that implies: association, authentication, and derivation of the encryption keys, if there are any. A long time ago that last part meant an interruption in service, since there were no methods to “fast roam”; thankfully, in today's world that has been fixed.

A cultural aside I learned while studying for my CWSP: the 802.11 standard's name for roaming is “transition” (more precisely, “BSS transition”). Nobody uses it, but it's good to know in case you suddenly find yourself reading the standard, as once happened to me.


Roaming on open SSIDs

Let's take a step back. When a client connects to an open SSID the process is straightforward, since there is no encryption: it just performs Open System authentication and then association. So, what happens when the connected client roams? It authenticates again to the new AP and then performs a reassociation instead of an association. Association and Reassociation frames are similar but not the same (the Reassociation Request, for example, carries the address of the client's current AP), and a wireless professional is expected to know the differences. On an open SSID, these frames are all you need to reconnect to the next AP without any hiccups.



Image from cisco.com



Roaming on PSK

Following the line of stepping back, let's explain a few concepts you need to be aware of to understand roaming. From the CWSP-206 book:


PMKSA – Pairwise Master Key Security Association. The context resulting from a successful 802.1X authentication exchange between the peer and Authentication Server (AS), or from a pre-shared key (PSK).

PMKID – Pairwise Master Key Identifier. The PMKID is an identifier of a security association.

PTKSA – Pairwise Transient Key Security Association. The context resulting from a successful 4-Way Handshake exchange between the peer and Authenticator.


Now that you know these concepts exist, let's start simple and work our way up. We already covered open SSIDs, so let's move on to PSK. With WPA/WPA2-Personal, the client performs a 4-way handshake after Open System authentication and association. When it roams, it needs something similar to the first scenario, an authentication and a reassociation exchange, followed by a new 4-way handshake. This type of roaming doesn't need any re-authentication enhancements: because the PMK comes from the pre-shared key, the authentication is local and should take about 50-100 ms, since the authenticator handles the whole process without an authentication server. This is what we call a slow roam, in the sense that no fast roaming mechanism is involved.
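Why is PSK authentication "local"? For WPA2-Personal, the PMK is computed from the passphrase and the SSID with PBKDF2-HMAC-SHA1 (4096 iterations, 256-bit output). Every AP on the SSID can therefore derive the same PMK without asking anyone. The sketch below shows that derivation. The helper name `wpa2_psk_to_pmk` and the example SSID/passphrase are hypothetical, and WPA3-Personal (SAE) works differently and is not covered here.

```python
import hashlib


def wpa2_psk_to_pmk(passphrase: str, ssid: str) -> bytes:
    """Derive the 256-bit WPA2-Personal PMK from an ASCII passphrase and the SSID."""
    # PMK = PBKDF2(HMAC-SHA1, passphrase, salt=SSID, 4096 iterations, 32 bytes)
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)


if __name__ == "__main__":
    # Hypothetical values: any AP configured with the same SSID/passphrase gets the same PMK,
    # so a roam only needs authentication, reassociation and a fresh 4-way handshake.
    print(wpa2_psk_to_pmk("correct horse battery", "CorpWiFi").hex())
```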

Slow roam steps (on a PSK network, step 3 is skipped)

1. Open System authentication

2. Association (a reassociation when roaming)

3. 802.1X/EAP authentication

4. 4-way handshake
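To make the differences between the three network types explicit, here is a minimal sketch that lists what a client performs on a full ("slow") roam, following the steps above. The `Security` enum and the `slow_roam_steps` function are hypothetical names used only for illustration.

```python
from enum import Enum


class Security(Enum):
    OPEN = "open"
    PSK = "psk"
    DOT1X = "802.1x"


def slow_roam_steps(security: Security) -> list[str]:
    """Exchanges a client performs on a full ("slow") roam, per the list above."""
    steps = ["Open System authentication", "Reassociation"]
    if security is Security.DOT1X:
        # Only 802.1X networks need a full EAP exchange with the authentication server.
        steps.append("802.1X/EAP authentication")
    if security in (Security.PSK, Security.DOT1X):
        # Any WPA/WPA2 network derives fresh keys with a 4-way handshake.
        steps.append("4-way handshake")
    return steps


for sec in Security:
    print(f"{sec.value:>7}: {slow_roam_steps(sec)}")
```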

 

Roaming with 802.1x 

Here is where the confusion starts. When you use 802.1X authentication, you start using fast roaming enhancements. Why? Because otherwise the normal, so-called slow roam would be used, and with 802.1X every slow roam includes a full EAP exchange with the authentication server, which is more complex. In an 802.1X roam there are three parties: the client device or supplicant, the AP/WLC or authenticator, and the authentication server. On a good day the interaction between the three takes about 200-300 ms (which should be fine), but it can take 500 ms or more if the authentication server is not local. On a live network with voice and video, that will cause problems.

If you have worked with VoIP or video networks, you probably already know this: real-time traffic runs over UDP, and lost packets are not retransmitted, because by the time a resend arrived it would be too late to play. So if your roam causes half a second of delay, you can figure out how many packets you are going to lose. That translates into choppy voice, audio/video loss, or even a dropped call.
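Here is a quick way to "figure how many packets" a roam costs. The sketch assumes a constant packet rate with no retransmission. It uses one packet every 20 ms, a common packetization default for voice codecs such as G.711. The function name is hypothetical, and the result is a per-direction worst case.

```python
import math


def packets_lost(gap_ms: float, packet_interval_ms: float = 20.0) -> int:
    """Worst-case number of real-time packets (one direction) that fall inside a roaming gap."""
    # Rounds up because a gap can straddle a partial packet interval.
    return math.ceil(gap_ms / packet_interval_ms)


# Roaming times taken from the ranges discussed above.
for gap in (50, 100, 300, 500):
    print(f"{gap:>3} ms roam -> up to {packets_lost(gap)} voice packets lost per direction")
```

At 20 ms per packet, a 500 ms roam means roughly 25 consecutive missing packets in each direction, which is easily audible.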

I could keep giving context about roaming for a while, so I'll be selective and add only one more topic before moving on to the solutions. In wireless we have intra-controller and inter-controller roaming. The first is when the client roams between APs on the same controller, and the second is when it roams between APs connected to different controllers. Why should you care? Because whenever you roam between controllers, the roam gets more complex and takes longer. The same goes for L2 versus L3 roaming. L2 roaming happens when the client stays on the same VLAN and addressing domain. L3 roaming happens when the AP you move to can't provide an IP address in the same address space you were using, which adds an extra factor to the whole process: the DHCP request. We are going to skip that last concept and focus on the security part.
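The L2 versus L3 distinction boils down to one question: does the client's current address belong to the subnet served where it lands? The sketch below is deliberately simplified, and `roam_type` plus the subnets are hypothetical. In practice, controllers can hide an L3 roam from the client by tunneling its traffic back to the original controller so it keeps its IP address, which is one reason inter-controller roams take longer.

```python
import ipaddress


def roam_type(client_ip: str, target_subnet: str) -> str:
    """Classify a roam as L2 or L3 by checking whether the client's current IP
    belongs to the subnet (hypothetically) served on the target AP's VLAN."""
    ip = ipaddress.ip_address(client_ip)
    net = ipaddress.ip_network(target_subnet)
    return "L2" if ip in net else "L3"


print(roam_type("10.1.20.15", "10.1.20.0/24"))  # same VLAN/subnet
print(roam_type("10.1.20.15", "10.1.30.0/24"))  # different subnet: new address needed
```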

Moving on: by now I think you know that what we want is a “good” roam, and for that it needs to work seamlessly. As a clarification, in this post I will assume that you have proper cell overlap and good RF health.

So, what are our options to manage roaming better?

Preauthentication

In the battle to remove the delay from the roaming process, this was the first method to come out. It is an IEEE standard method: while scanning for APs it might move to, a client station performs a full 802.1X authentication with them over the Ethernet infrastructure, through its current AP. This lets it stay connected on-channel with its current AP while preparing the connection to the possible next AP. It has a couple of drawbacks, though. First, it still needs a full authentication to each potential AP the client may roam to, so under typical roaming conditions, such as moving around a whole building, it might not work as well as we wish. Second, since the client can't predict where you are moving, it may authenticate to APs you will never roam to.

Some extra information: it uses EAPoL frames that are treated as data frames and forwarded to the distribution system, with a special EtherType value of 88-C7 (instead of the usual EAPoL EtherType 88-8E) to identify them as preauthentication frames.
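The EtherType is what lets the wired side tell preauthentication traffic apart from ordinary EAPoL. Here is a minimal sketch of that check on an untagged Ethernet II frame. The function name and the example addresses are hypothetical, and real captures may carry VLAN tags that shift the EtherType offset.

```python
EAPOL_ETHERTYPE = 0x888E        # regular EAPOL (802.1X) frames
RSN_PREAUTH_ETHERTYPE = 0x88C7  # RSN preauthentication frames


def classify_frame(frame: bytes) -> str:
    """Classify an untagged Ethernet II frame by its EtherType (bytes 12-13)."""
    if len(frame) < 14:
        raise ValueError("frame too short for an Ethernet header")
    ethertype = int.from_bytes(frame[12:14], "big")
    if ethertype == RSN_PREAUTH_ETHERTYPE:
        return "RSN preauthentication"
    if ethertype == EAPOL_ETHERTYPE:
        return "EAPOL"
    return f"other (0x{ethertype:04X})"


# Hypothetical frame: target AP BSSID, client MAC, EtherType 88-C7, start of an EAPOL payload.
dst = bytes.fromhex("0a0000000002")
src = bytes.fromhex("001122334455")
frame = dst + src + RSN_PREAUTH_ETHERTYPE.to_bytes(2, "big") + b"\x02\x01\x00\x00"
print(classify_frame(frame))
```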

The good part is that it is a standard way to roam, so it can be used on any WLAN architecture. In the end, though, it is not very efficient: it only cuts a few milliseconds of roaming time (1 to 3), and, as you might figure, it does not scale well.

PMK Caching

Pairwise master key (PMK) caching is another method to improve roaming, also known as “fast roam-back”. As you might guess from the alternative name, it works when you have already connected to an AP, roamed out of its service area, and then come back to it.

Basically, the AP caches the PMKSAs (the security associations) for a certain period of time. That way, whenever a client comes back, it does not need to complete a full re-authentication and can reuse the keys negotiated during the first authentication. To make this work, the client must keep its PMKID and send it to the AP in the reassociation request. Since the PMKID identifies the cached PMKSA, the 802.1X authentication is skipped and the process moves directly to the 4-way handshake.
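Here is a toy sketch of the AP-side decision described above: a cache keyed by PMKID, where a hit skips 802.1X and a miss or an expired entry falls back to a full authentication. `PmkCache` is a hypothetical class rather than any vendor's implementation. The 12-hour lifetime is an assumption, so check your platform's actual setting.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PmkEntry:
    pmk: bytes
    expires_at: float


class PmkCache:
    """Toy AP-side PMKSA cache keyed by PMKID (illustrative only)."""

    def __init__(self, lifetime_s: float = 12 * 3600,
                 clock: Callable[[], float] = time.monotonic):
        self.lifetime_s = lifetime_s  # assumed lifetime; vendors make this configurable
        self._clock = clock
        self._entries: dict[bytes, PmkEntry] = {}

    def store(self, pmkid: bytes, pmk: bytes) -> None:
        # Called after a successful full 802.1X authentication.
        self._entries[pmkid] = PmkEntry(pmk, self._clock() + self.lifetime_s)

    def on_reassociation(self, pmkid: Optional[bytes]) -> str:
        """Decide what happens when a client reassociates, optionally offering a PMKID."""
        entry = self._entries.get(pmkid) if pmkid else None
        if entry is not None and entry.expires_at > self._clock():
            return "cache hit: skip 802.1X, go straight to the 4-way handshake"
        if entry is not None:
            del self._entries[pmkid]  # drop the expired PMKSA
        return "cache miss: full 802.1X authentication"


cache = PmkCache()
cache.store(b"\x01" * 16, b"\x02" * 32)      # hypothetical PMKID/PMK from the first visit
print(cache.on_reassociation(b"\x01" * 16))  # the client roams back and offers its PMKID
print(cache.on_reassociation(b"\x03" * 16))  # an AP the client never authenticated to
```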

This is another method that is not considered very effective, since it only provides fast roam-back to a previous AP, and new APs still require a full authentication. On the bright side, it does a better job of decreasing roaming time, adds no overhead to the network, scales well, and is standardized by the IEEE, so it is supported across WLAN deployments.

Opportunistic Key Caching (OKC)

One of my favorites (I shouldn't say this… it's not standardized!), or at least it was for a while, is a solution that came out some time ago and needs cooperation from both the AP side and the client side. The PMK and PMKID from the client's initial authentication to the first AP are distributed to every AP that is a possible candidate for the roam. Remember that the PMKID is based on the BSSID the AP is using, so each AP ends up with its own PMKID for the same PMK.

Once the key and ID are distributed, the client can roam using just a reassociation. Either the client shows its PMKID in the reassociation frame or the AP provides it to the client. It doesn't make much difference from the administrator's side, since in the end the AP uses the client's MAC address to recognize it and match it to a PMKID. If the client is recognized, it goes straight to the 4-way handshake, where the AP indicates that it found a match. If it is not recognized, the client sends an EAPoL-Start frame and a full authentication starts.
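The reason each AP has a different PMKID for the same PMK is the PMKID formula, PMKID = HMAC-SHA1-128(PMK, "PMK Name" || AA || SPA). Here AA is the authenticator (BSSID) address and SPA is the client's address. The sketch below shows a client computing the PMKID for a new BSSID and the target AP checking it against a PMK it received through OKC distribution. The dictionary-based distribution and all helper names are hypothetical.

```python
import hashlib
import hmac


def mac_bytes(mac: str) -> bytes:
    return bytes.fromhex(mac.replace(":", "").replace("-", ""))


def pmkid(pmk: bytes, bssid: str, client_mac: str) -> bytes:
    # PMKID = HMAC-SHA1-128(PMK, "PMK Name" || AA || SPA)
    data = b"PMK Name" + mac_bytes(bssid) + mac_bytes(client_mac)
    return hmac.new(pmk, data, hashlib.sha1).digest()[:16]


def okc_lookup(shared_pmks: dict[str, bytes], bssid: str, client_mac: str, offered: bytes) -> bool:
    """Target AP side: does the offered PMKID match a PMK distributed for this client?"""
    pmk = shared_pmks.get(client_mac)
    return pmk is not None and hmac.compare_digest(pmkid(pmk, bssid, client_mac), offered)


pmk = bytes(32)                        # hypothetical PMK from the first 802.1X authentication
client = "00:11:22:33:44:55"
ap1, ap2 = "0a:00:00:00:00:01", "0a:00:00:00:00:02"
distributed = {client: pmk}            # hypothetical: PMK pushed to candidate APs
offered = pmkid(pmk, ap2, client)      # the client computes the PMKID for the new BSSID
print(okc_lookup(distributed, ap2, client, offered))  # True
```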

Unfortunately, this method is not supported by every deployment because it is not standard. It is widely used, but you might find clients or infrastructure that don't support it. On the bright side, it only uses the initial 802.1X authentication, so it scales well.

Fast Transition (FT) or 802.11r

This is one of the more recent amendments to 802.11. It is a bit complicated, so I will summarize it as best I can and assume you're familiar with the concepts. Here is a quick list of what you should be familiar with:

PMK, PTK, GMK, GTK, fast basic service set (BSS) transition, FT 4-way handshake, FT initial mobility domain association, mobility domain, over-the-air, and over-the-DS. All of these concepts can be found with one look into the 802.11 standard or a quick Google search. And the recommendation to be familiar with them doesn't come only from me, but from the CWNP gurus Lee Badman and Robert Bartz. Spoiler alert: they are right.

As you might know, robust security networks (RSN) and authentication and key management (AKM) follow a process to derive keys (I will write a post on this later for your reference). FT adds levels to that key hierarchy. The PMK-R0 is derived from the key material produced by 802.1X (or from the PSK when using FT-PSK), and the PMK-R1 is derived from the PMK-R0. So basically, the 802.11r standard allows non-AP stations to set up security with an AP they might roam to later. The difference from the “Preauthentication” method is that with FT the client doesn't need a full authentication to the next AP. Instead, a PTK is derived from previously negotiated key material (to be more specific, from the PMK-R1) to protect the communication between that AP and the client.
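To make the PMK-R0 → PMK-R1 → PTK chain concrete, here is a sketch of the FT key hierarchy as I read it in IEEE 802.11-2012 §11.6.1.7. It uses the standard's KDF (HMAC-SHA256 in counter mode, with a 16-bit little-endian counter and length). It also assumes the SHA-256-based AKMs, where XXKey is the second 256 bits of the MSK for FT-802.1X, or the PSK for FT-PSK. It has not been validated against official test vectors, so treat it as an illustration. All identifiers and example values are hypothetical.

```python
import hashlib
import hmac


def kdf(key: bytes, label: str, context: bytes, length_bits: int) -> bytes:
    """802.11 KDF: HMAC-SHA256 in counter mode, truncated to length_bits."""
    out = b""
    i = 1
    while len(out) * 8 < length_bits:
        # The counter i and Length are 16-bit little-endian; the label is ASCII, no terminator.
        msg = (i.to_bytes(2, "little") + label.encode("ascii") + context
               + length_bits.to_bytes(2, "little"))
        out += hmac.new(key, msg, hashlib.sha256).digest()
        i += 1
    return out[: length_bits // 8]


def derive_pmk_r0(xxkey: bytes, ssid: bytes, mdid: bytes, r0kh_id: bytes, s0kh_id: bytes) -> bytes:
    # Computed by both the client (S0KH) and the R0 key holder; it is never transmitted.
    ctx = bytes([len(ssid)]) + ssid + mdid + bytes([len(r0kh_id)]) + r0kh_id + s0kh_id
    r0_key_data = kdf(xxkey, "FT-R0", ctx, 384)
    return r0_key_data[:32]  # first 256 bits; the last 128 bits are PMK-R0Name-Salt


def derive_pmk_r1(pmk_r0: bytes, r1kh_id: bytes, s1kh_id: bytes) -> bytes:
    # One PMK-R1 per R1 key holder (typically each AP) in the mobility domain.
    return kdf(pmk_r0, "FT-R1", r1kh_id + s1kh_id, 256)


def derive_ft_ptk(pmk_r1: bytes, snonce: bytes, anonce: bytes, bssid: bytes, sta_addr: bytes) -> bytes:
    # 384-bit PTK (KCK + KEK + TK for CCMP) from the nonces exchanged during the FT roam.
    return kdf(pmk_r1, "FT-PTK", snonce + anonce + bssid + sta_addr, 384)


# Hypothetical values for illustration only.
msk = bytes(range(64))      # pretend MSK from the initial 802.1X authentication
xxkey = msk[32:64]          # FT-802.1X uses the second 256 bits of the MSK
client = bytes.fromhex("001122334455")
r0 = derive_pmk_r0(xxkey, b"CorpWiFi", bytes.fromhex("a1b2"), b"wlc1.example", client)
for ap in ("0a0000000001", "0a0000000002"):
    r1 = derive_pmk_r1(r0, bytes.fromhex(ap), client)
    ptk = derive_ft_ptk(r1, bytes(32), bytes([1]) * 32, bytes.fromhex(ap), client)
    print(ap, r1.hex()[:16], len(ptk))
```

The point to notice is that the expensive 802.1X step happens once, at the top. Each new AP only needs its own PMK-R1 plus a fresh nonce exchange to get a PTK.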


Image from cisco.com


I know… this is a bit confusing, but the takeaway is that FT takes advantage of what was already negotiated on the network and skips some steps to save time. You could say it skips ahead to the point where it generates the keys that encrypt the communication. One last thing on this idea: preauthentication is optional in the standard, but the method is normally adopted with it to save even more time while roaming.

Now, to close the idea: why did I ask you to read about over-the-air and over-the-DS? Quite simple: these are the two ways the key exchange with the next AP can happen. With over-the-air, as the name describes, the client exchanges the FT frames directly with the target AP. This simplifies the process by reducing the frames needed for the reassociation from 8 to 4, cutting roaming time roughly in half.
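The 8-to-4 arithmetic becomes clear when you list the frames. This sketch assumes the full roam needs no EAP exchange (PSK or cached keys) and ignores ACKs and other control frames. With FT over-the-air, the nonces that the 4-way handshake would carry ride inside the FT Authentication frames, and the MIC and GTK travel in the Reassociation frames. The list names are hypothetical.

```python
# Full (non-FT) roam without an EAP exchange: authentication + reassociation + 4-way handshake.
FULL_ROAM_FRAMES = [
    "Authentication request (open system)", "Authentication response",
    "Reassociation request", "Reassociation response",
    "4-way handshake message 1", "4-way handshake message 2",
    "4-way handshake message 3", "4-way handshake message 4",
]

# FT over-the-air: the key exchange is folded into authentication and reassociation.
FT_OTA_FRAMES = [
    "FT Authentication request (SNonce)", "FT Authentication response (ANonce)",
    "Reassociation request (MIC)", "Reassociation response (GTK)",
]

print(len(FULL_ROAM_FRAMES), "->", len(FT_OTA_FRAMES))
```

Halving the frame count doesn't translate exactly into halving the time, but it removes a full round of exchanges from the critical path of the roam.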

Image from cisco.com


With over-the-DS, as you might guess, the client sends the FT frames to the AP it is currently connected to. These are forwarded to the target AP over your wired infrastructure, and the new PTK is derived by the client and the new AP.
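Here is the over-the-DS sequence written out hop by hop, as a sketch. The FT Request and FT Response are Action frames exchanged with the current AP. How the APs or controller carry them over the wire is implementation specific, so the "wired DS" hops are only schematic. `Hop` and the list names are hypothetical.

```python
from typing import NamedTuple


class Hop(NamedTuple):
    sender: str
    receiver: str
    frame: str
    medium: str  # "air" or "wired DS"


# FT over-the-DS: the key exchange goes through the current AP before the client moves.
OVER_THE_DS = [
    Hop("client", "current AP", "FT Request action frame (SNonce)", "air"),
    Hop("current AP", "target AP", "FT Request, forwarded", "wired DS"),
    Hop("target AP", "current AP", "FT Response (ANonce)", "wired DS"),
    Hop("current AP", "client", "FT Response action frame, forwarded", "air"),
    Hop("client", "target AP", "Reassociation Request", "air"),
    Hop("target AP", "client", "Reassociation Response", "air"),
]

frames_with_target_over_air = [
    h for h in OVER_THE_DS if h.medium == "air" and "target AP" in (h.sender, h.receiver)
]

for hop in OVER_THE_DS:
    print(f"{hop.sender:>10} -> {hop.receiver:<10} [{hop.medium}] {hop.frame}")
print("frames exchanged over the air with the target AP:", len(frames_with_target_over_air))
```

The design choice is visible in the last line. Only the reassociation pair touches the new channel, which helps when the client's link to the target AP is still weak at the moment it decides to roam.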

Image from cisco.com

As complicated as it is, this is very advantageous and easy to implement. It is standard, it is even required by the Wi-Fi Alliance Voice-Enterprise certification, and it is considered the most effective roaming method available. To be honest, you only need a couple of commands to enable it. The only problem is that adoption has been slow.

Other roaming solutions

And there you have it: you can see where the confusion between the methods comes from, since all of them are called fast roaming, but some are better than others. There is one other solution, single channel architecture. This is a proprietary solution in which the controller manages the roaming between APs and the client thinks it is connected to the same AP the whole time. I won't dig into it since I'm not a big fan, due to the disadvantages this architecture can have on the RF; it must be used with discretion.


P.S. 

If you hung in there up to this point, you might have figured out that the confusion I mentioned at the start was between OKC and 802.11r.


Reference article: Cisco, "Chapter 12 - Configuring Mobility Groups", cisco.com, September 28, 2011.

Reference book: Tom Carpenter, "CWSP-206: Certified Wireless Security Professional: Study and Reference Guide", Certitrek.

Thanks for reading 

Dan Lopez

 

