Ok... so today was one of those days when
someone reached out to me because he got confused about a concept, and since
there is so much information about the topic, the concept got mixed up in his
head. So now I’m writing this for future reference in case it happens again.
First, let's start by defining what roaming is in Wi-Fi
terms. Basically, it is the movement of a wireless client from one AP to another
with everything that implies: the association, the authentication and the
derivation of the encryption keys, if there are any. A long time ago that last
part implied an interruption in the service since there were no methods to
“fast roam”; thankfully, in today’s world that has been fixed.
A cultural parenthesis that I learned on my CWSP: the
802.11 standard name for roaming is “transition”. No one uses it, but it is good
for you to know in case you suddenly start reading the standard, as once
happened to me.
Roaming on open SSIDs
Let’s take a step back. When a client connects to an
open SSID the process is straightforward since there is no encryption: it
will just perform an Open System authentication and then an association. So, what
happens when the connected client roams? It needs to authenticate again and go
through a reassociation with the new AP. The association and reassociation
frames are similar but they are not the same (for example, the reassociation
request carries the address of the AP the client is coming from), and a wireless
expert is expected to know the differences between them. If you are on an open
SSID, you will only need these frames to reconnect to the next AP without any
hiccups in your connection.
Roaming on PSK
Following the line of stepping back, let’s explain a few concepts you need to be aware of to understand roaming. From the CWSP-206 book:
PMKSA – Pairwise Master Key Security Association. The context resulting from a successful 802.1X authentication exchange between the peer and Authentication Server (AS), or from a pre-shared key (PSK).
PMKID – Pairwise Master Key Identifier. The PMKID is an identifier of a security association.
PTKSA – Pairwise Transient Key Security Association. The context resulting from a successful 4-Way Handshake exchange between the peer and Authenticator.
Now that you know these concepts exist, let’s start from the simple and work our way up. We already covered open SSIDs, so let’s go to PSK. With WPA/WPA2 the client will perform a 4-way handshake after the Open System authentication and association. When you roam you will need something similar to the first scenario described, an authentication and a reassociation, followed by a new 4-way handshake with the new AP. In this type of roaming you won’t need any fast roaming enhancements: since the authentication is local it should take about 50-100 ms, all thanks to the authenticator handling the whole authentication process by itself. This is what we call a slow roam.
Slow roam steps
1. Open System authentication
2. Association
3. 802.1X/EAP authentication (only on 802.1X SSIDs, skipped with PSK)
4. 4-way handshake
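As a rough sketch of the list above, and assuming (as described earlier) that an open SSID only needs the first two steps and that a PSK SSID has no 802.1X/EAP exchange, the steps a client repeats on the new AP could be written out like this. The function name and the security labels are made up for illustration.

```python
# Steps of a full ("slow") roam, in order, as listed above.
SLOW_ROAM_STEPS = [
    "Open System authentication",
    "Association",
    "802.1X/EAP authentication",
    "4-way handshake",
]


def slow_roam_steps(security: str) -> list:
    """Return the steps repeated on the new AP for a given SSID type.

    `security` is a hypothetical label: "open", "psk" or "802.1x".
    """
    if security == "open":
        # No encryption: only the authentication and (re)association frames.
        return SLOW_ROAM_STEPS[:2]
    if security == "psk":
        # Assumption: with a pre-shared key there is no EAP exchange.
        return [s for s in SLOW_ROAM_STEPS if s != "802.1X/EAP authentication"]
    if security == "802.1x":
        return list(SLOW_ROAM_STEPS)
    raise ValueError(f"unknown security type: {security}")


for kind in ("open", "psk", "802.1x"):
    print(kind, "->", slow_roam_steps(kind))
```

The more steps in the list, the longer the client stays without service while it moves to the next AP.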
Roaming with 802.1X
Here is where the confusion starts. When you use 802.1X authentication we start using fast roaming enhancements. Why?
Because otherwise the normal roaming, the so-called slow roaming, would be in use. With 802.1X we are using an RSN, or
robust security network, which is more complex. In an 802.1X roam you have three parties: the
client device or supplicant, the AP/WLC or authenticator, and the authentication
server. The interaction between the three will take about 200-300 ms on a good
day (which should be fine), but it can go up to 500 ms or more if the
authentication server is not local, which on a live network with voice and
video will cause some issues.
If you have worked with VoIP or video networks you
might already know this, but real-time UDP traffic gets no benefit from resending
the packets lost in a transmission. Now, if you have a delay of half a second in
your roaming you can figure how many packets we are going to lose, which will
translate into choppy voice, audio/video loss or even a full call drop.
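To put a number on it, here is a small back-of-the-envelope sketch. It assumes a voice stream that sends one packet every 20 ms, which is a common packetization interval but still an assumption on my side; the function name is made up for the example.

```python
def packets_lost(roam_ms: float, packet_interval_ms: float = 20.0) -> int:
    """Packets that would have been sent while the client was busy roaming.

    The 20 ms default is an assumed voice packetization interval.
    """
    if roam_ms < 0 or packet_interval_ms <= 0:
        raise ValueError("roam time must be >= 0 and interval must be > 0")
    return int(roam_ms // packet_interval_ms)


# Roam times taken from the ranges mentioned in this post.
for roam_ms in (50, 100, 300, 500):
    print(f"{roam_ms} ms roam -> about {packets_lost(roam_ms)} voice packets lost")
```

Under that assumption a 500 ms roam means about 25 packets in a row never arrive, which is half a second of missing audio.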
I could keep talking about topics to provide context about roaming for a while, so I’ll be selective and add only one more before moving into our solutions. In wireless we can have intracontroller roaming and intercontroller roaming: the first is when the client roams between APs on the same controller and the second is when it roams between APs connected to different controllers. Why should you care about this? Because whenever you roam between controllers the roaming gets more complex and takes longer. The same is the case with L2 roaming and L3 roaming: L2 happens when the client stays on the same VLAN and addressing domain when roaming, while L3 happens when the AP you move to can’t provide an IP address in the addressing space you were working in, which adds an extra factor to the whole process, the DHCP request. We are going to skip that last concept and focus on the security part.
Moving on, up to this point I think you know that what we want is to implement a “good” roaming, and for that we need it to work seamlessly. As a clarification, in this post I will assume that you have proper cell overlap and good RF health.
So what are our options to manage roaming better?
Preauthentication
In the battle to remove the delay from the roaming
process this is the first method that came out. This IEEE standard method is used by the
client station while it scans for APs it might move to: it basically performs
a full 802.1X authentication with the candidate AP over the Ethernet
infrastructure, so it remains connected on-channel with its current AP while
preparing the connection to the possible next AP. It does have a couple of
inconveniences. It needs to do a full authentication to each of the potential
APs the client can roam to, so under typical roaming conditions, if you move
around the whole building, it might not work as well as we wish. It also
might authenticate to APs you will never roam to, since the client can’t
predict where you are moving to.
Some extra information: it uses EAPoL frames that are treated as data frames
and forwarded to the distribution system, and it uses a special EtherType value of
88-C7 to mark these frames as preauthentication.
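If you ever need to spot these frames on the wired side, a minimal sketch could look like the following. It assumes a plain, untagged Ethernet II frame (no 802.1Q VLAN tag), and the MAC addresses are made up. Regular EAPoL uses EtherType 88-8E, so the two are easy to tell apart.

```python
import struct

ETHERTYPE_EAPOL = 0x888E    # regular 802.1X EAPoL
ETHERTYPE_PREAUTH = 0x88C7  # 802.11 preauthentication (the 88-C7 above)


def classify_frame(frame: bytes) -> str:
    """Classify an untagged Ethernet II frame by its EtherType field."""
    if len(frame) < 14:
        raise ValueError("frame is shorter than an Ethernet header")
    # Bytes 0-5: destination MAC, 6-11: source MAC, 12-13: EtherType.
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_PREAUTH:
        return "preauthentication"
    if ethertype == ETHERTYPE_EAPOL:
        return "eapol"
    return "other"


# Hypothetical frame: made-up MAC addresses and an empty payload.
dst = bytes.fromhex("020000000002")
src = bytes.fromhex("020000000001")
frame = dst + src + struct.pack("!H", ETHERTYPE_PREAUTH)
print(classify_frame(frame))
```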
The good part about this is that it is a standard way to roam, so it can be used on any
WLAN architecture, but in the end it is not very efficient since it will only cut
a few milliseconds of roaming time (1 to 3) and, as you might figure, it does not
scale well.
PMK Caching
Pairwise master key (PMK) caching is another method
to improve roaming, also known as “Fast Roam-back”. As you might get
from the alternative name, it works when you were already connected to an AP, roam out
of its service area and then come back to it.
Basically, the AP caches the PMKSAs, or security
associations, for a certain period of time so that
whenever a client comes back it does not need to complete a full
re-authentication and can instead use the keys that were negotiated on the
first authentication. To make this work the client must keep the PMKID and
transmit it to the AP in the reassociation request. Since the PMKID is
associated with the PMKSA, the 802.1X authentication is skipped and the roam moves
directly into the 4-way handshake.
This is another method that is not considered very
effective, since it only provides a fast roam back to a previous AP and new APs
require a full authentication. On the bright side, it does a better job
decreasing the roaming time, it does not cause overhead to the network, it
scales well and it is standardized by the IEEE, so it is supported on all
WLAN deployments.
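The AP side of this can be pictured with a small sketch. This is only a toy model under my own assumptions: the class and function names are invented, the one hour lifetime is an arbitrary value, and a real AP keeps much more state per PMKSA than a key and a timer.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PmksaCache:
    """Toy PMKSA cache: maps a PMKID to its PMK until the entry expires."""

    def __init__(self, lifetime_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lifetime_s = lifetime_s  # assumed lifetime, not a standard value
        self._clock = clock
        self._entries: Dict[bytes, Tuple[bytes, float]] = {}

    def store(self, pmkid: bytes, pmk: bytes) -> None:
        """Remember the PMK negotiated during a full authentication."""
        self._entries[pmkid] = (pmk, self._clock() + self._lifetime_s)

    def lookup(self, pmkid: bytes) -> Optional[bytes]:
        """Return the cached PMK, or None if unknown or expired."""
        entry = self._entries.get(pmkid)
        if entry is None:
            return None
        pmk, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[pmkid]
            return None
        return pmk


def next_step(cache: PmksaCache, pmkid_in_reassoc: Optional[bytes]) -> str:
    """Decide what follows a reassociation request on this AP."""
    if pmkid_in_reassoc is not None and cache.lookup(pmkid_in_reassoc) is not None:
        return "4-way handshake"  # fast roam-back, authentication skipped
    return "full 802.1X/EAP authentication"
```

A client that comes back with a PMKID the AP still remembers goes straight to the 4-way handshake; anything else falls back to the full authentication.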
Opportunistic Key Caching (OKC)
One of my favorites (I shouldn’t say this… it is not standardized!),
or at least it was for a while. It is a solution that came out a while ago and it
needs the cooperation of both the AP and the client. The PMK and PMKID are retrieved
from the initial authentication with the first AP the client connects to, and they are
distributed to each of the APs that are possible candidates for the roam; remember that
the PMKID is based on the BSSID the AP is using, so each AP ends up with a
different PMKID for the same PMK.
Once the key and ID are distributed the client can roam
using just a reassociation, and it can either present its PMKID in the frame or
the AP can provide it to the client. It makes little difference on the administrator
side since in the end the AP will use the MAC address to recognize the client and
match it to a PMKID. If the client is identified, it goes to the 4-way handshake,
where the AP indicates that it found a match; if the client is not recognized, an
EAPoL-Start frame is sent and a full authentication starts.
Unfortunately, this method is not supported by every
deployment since it is not standard; it is widespread, but you might find
clients or infrastructure that do not support it. On the bright side, it
only uses the initial 802.1X authentication, so it scales well.
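The point about the PMKID depending on the BSSID can be made concrete with a short sketch. It uses the PMKID formula for the SHA-1 based WPA2 key management suites, the first 128 bits of HMAC-SHA1(PMK, "PMK Name" || AP MAC || client MAC); suites based on SHA-256 use HMAC-SHA256 instead. The PMK and the MAC addresses below are made up.

```python
import hashlib
import hmac
import os


def pmkid(pmk: bytes, ap_bssid: bytes, client_mac: bytes) -> bytes:
    """PMKID = first 128 bits of HMAC-SHA1(PMK, "PMK Name" || AA || SPA)."""
    if len(ap_bssid) != 6 or len(client_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes long")
    data = b"PMK Name" + ap_bssid + client_mac
    return hmac.new(pmk, data, hashlib.sha1).digest()[:16]


# Made-up values for illustration only.
pmk = os.urandom(32)                      # stands in for the negotiated PMK
client = bytes.fromhex("020000000001")
ap_one = bytes.fromhex("0200000000a1")
ap_two = bytes.fromhex("0200000000a2")

# Same PMK, different BSSID: each candidate AP gets its own PMKID.
print("PMKID on AP one:", pmkid(pmk, ap_one, client).hex())
print("PMKID on AP two:", pmkid(pmk, ap_two, client).hex())
```

This is why OKC works without a new 802.1X exchange: both sides already hold the PMK, so the client and the next AP can each compute the PMKID for the new BSSID on their own.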
Fast Transition (FT) or 802.11r
This is one of the latest standards in the 802.11 family. It
is a bit complicated, so I will try to summarize it as best as I can and will assume
you’re familiar with the concepts. Here is a quick list of what you should be familiar
with:
PMK, PTK, GMK, GTK, fast basic service set (BSS) transition,
fast BSS transition 4-way handshake, fast BSS transition initial mobility domain
association, mobility domain, over-the-air, over-the-DS. All these concepts can
be found with one look into the 802.11 standard or easily via Google. Just to let you know,
the recommendation to be familiar with them does not only come from me but from the
CWNP gurus Lee Badman and Robert Bartz. Spoiler alert, they are right.
As you might know, robust security networks (RSN)
and authentication and key management (AKM) follow a process to derive the keys
(I will write a post on this later for your reference). From this process
you get the PMK-R0, which derives from the key material of the 802.1X
authentication (or from the PSK), and the PMK-R1, which derives from the PMK-R0.
So basically the 802.11r standard is about allowing non-AP
stations to prepare their keys with the AP they might roam to later. The difference
with the “Preauthentication” method is that in an FT BSS the client won’t need to do a
full authentication to the next AP. Instead, the next AP takes a previously
negotiated PMK (to be more specific, the PMK-R1) and derives from it the PTK
that protects the traffic between that AP and the client.
I know… this is a bit confusing, but the takeaway is
that it takes advantage of what was already negotiated on the network and
skips some steps to save time; let’s say it skips ahead to the point where it generates
the keys to encrypt the communication. One last thing on this idea: the preauth
is optional in the standard, but the method is normally adopted with it to save
even more time while roaming.
Now, to close the idea, why did I ask you to read about
over-the-air and over-the-DS? Quite simple, these are the two ways the client can
run the FT exchange with the next AP. With over-the-air the client talks directly
to the target AP, as the name describes; with over-the-DS it reaches the target
AP through its current AP and the distribution system. Fast transition simplifies
the process, reducing the frames for the reassociation from 8 to 4 and cutting
the time to roam roughly in half.
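The 8 versus 4 count can be laid out as two lists. This is my reading of where those numbers come from, assuming a roam where the PMK is already available and an over-the-air fast transition; the frame labels are informal, not the exact names from the standard.

```python
# Roam without FT (PMK already known): the key handshake is a separate step.
REGULAR_ROAM = [
    "Authentication request",
    "Authentication response",
    "Reassociation request",
    "Reassociation response",
    "4-way handshake message 1",
    "4-way handshake message 2",
    "4-way handshake message 3",
    "4-way handshake message 4",
]

# FT over-the-air: the key exchange rides inside the four frames below.
FT_ROAM = [
    "FT authentication request",
    "FT authentication response",
    "FT reassociation request",
    "FT reassociation response",
]

saved = len(REGULAR_ROAM) - len(FT_ROAM)
print(f"regular roam: {len(REGULAR_ROAM)} frames, FT roam: {len(FT_ROAM)} frames")
print(f"frames saved: {saved}")
```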
As complicated as it is, this is very advantageous and
easy to implement. It is standard and even required by today’s Voice-Enterprise
certification, it is considered the most effective roaming method available and,
to be honest, you only need a couple of commands to enable it. The only problem
is that adoption has been slow.
Other roaming solutions
And there you have it. You can see where the confusion between
the methods comes from, since all of them are called fast roaming, but some are better than
others. There is one other solution, the single channel architecture. This
is a proprietary solution where the controller manages the roaming between
APs and the client thinks it is connected to the same AP all the time. I
won’t dig deep into this since I’m not a big fan, due to the disadvantages that
this architecture might have on the RF side; it must be used with discretion.
P.S.
If you hung around up to this point, you might have figured out that the confusion I talked about at the start was between OKC and 802.11r.
Reference article: Cisco employee, "Chapter: Chapter 12 - Configuring Mobility Groups", from cisco.com, September 28, 2011
Thanks for reading
Dan Lopez