A client was having issues with real-time video or voice streams dropping out for a few seconds every 20 minutes - only via WiFi using a Ubiquiti AP-AC-LR access point.
Preface
The issue was happening on both Linux and Windows and on different computers, so that comfortably rules out client side applications or hardware. The key point to remember here is that it happens every 20 minutes, on the dot. It’s also only real-time streaming of data that is affected. Sites like YouTube won’t expose this since there’s a client side buffer.
The access point used was an up-to-date Ubiquiti AP-AC-LR - nothing too flash about the settings, all seemed pretty bog standard.
Equipment:
- UBNT AP AC LR
- pfsense box
- Controller v6.2.26
- AP AC LR Firmware v5.43.36
Troubleshooting
Next port of call, working backwards from the clients. This might be an issue with the AP itself, the switch it’s connected to, the power supply (the AP is POE powered), the router (pfSense), the NBN modem or further up the chain from there. Since nothing stands out to me at this point, I’ll have to troubleshoot each of these.
Low hanging fruit first in troubleshooting.
- Checking for packet drops or high latency over the internet showed nothing. The ISP status page doesn’t show any ongoing issues, and this problem has been happening for months at this point. It feels like an issue with equipment on the client’s side, given the 20 minute timing.
- No errors of note in the Ubiquiti controller page. All sane default options for the SSID.
- Since this AP is powered via POE from the switch, I checked if anything stood out there, making sure to also set the maximum power allowance. Same issue.
- Next was plugging in the AP to a different port. Same issue.
- I tried using the POE power splitter brick that comes with the AP in case it’s still switch related. Same issue.
- Known good tested ethernet cable from the switch to the AP. Same issue.
- Logging into the router (pfSense) to check the logs for any errors. Nothing immediately stood out.
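One way to confirm that the dropouts really are periodic rather than erratic is to log the arrival time of each packet in a steady stream and look at the spacing between the silences. The following is a minimal sketch, not something that was run during this troubleshooting: the function names, the 1 second gap threshold and the synthetic timestamps are all assumptions made for illustration.

```python
from typing import List, Tuple


def find_dropouts(arrivals: List[float], max_gap: float = 1.0) -> List[Tuple[float, float]]:
    """Return (last_arrival_before_gap, gap_length) for every silence longer than max_gap seconds."""
    return [
        (prev, cur - prev)
        for prev, cur in zip(arrivals, arrivals[1:])
        if cur - prev > max_gap
    ]


def spacing_between(dropouts: List[Tuple[float, float]]) -> List[float]:
    """Return the time between the start of each dropout and the start of the next."""
    starts = [start for start, _ in dropouts]
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Synthetic data (assumption): one packet every 0.5 s for an hour,
# with 3 s of silence starting at the 20 and 40 minute marks.
arrivals = [
    t * 0.5
    for t in range(7200)
    if not (t * 0.5 >= 1200 and (t * 0.5) % 1200 < 3)
]

dropouts = find_dropouts(arrivals)
print(dropouts)
print(spacing_between(dropouts))
```

If the spacing values all land on the same number, a timer of some kind is a far more likely culprit than flaky hardware.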
Hardware testing
At this point, I’m still not 100% confident that the Ubiquiti AP isn’t faulty, and factory resetting it would be next on the cards. Some light googling around suggests this might be a common problem, but nothing concrete.
Since I had a spare TP-Link EAP245 v3 (FW: 5.0.3) access point lying around, I could use it to try to rule out the equipment in between and be doubly sure the clients themselves are ok.
Success - replicating the setup on the TP-Link shows the issue has gone away. 20 minutes passed, all good. Video/Voice stream clear and uninterrupted. Switching back to the Ubiquiti setup has the same problem again.
So that now rules out the internet connection, the NBN modem, the router and the switch (POE). Seems like a faulty AP, right? At first glance, yes, but we can dig deeper!
Okay, let’s factory reset it and try again….nope, same issue.
Throwing it out or sending it back for warranty seems tempting at this point - except, why is it like clockwork at 20 minutes? If it was faulty, it stands to reason that the dropouts would be erratic.
What network mechanism starts/stops or resets at 20 minutes? I’m sure a network engineer might have an inkling of an idea - but for me, I have to work from the ground up on this.
Wireshark and tcpdump to the rescue
Since we’re now dealing with a possible network issue, it’s time to bust out the packet captures to watch on the wire.
For convenience, I’ll start a packet capture on the pfSense box, looking at the interface the AP is on. This is capture point Number 1.
Diagnostics > Packet capture > Enable promiscuous mode > LAN > count: 0 > Do reverse DNS lookup > Start
Source Destination Protocol Length Info
PFSense:FF:FF Broadcast ARP 42 Who has 192.168.100.100? Tell 192.168.100.1
PFSense:FF:FF Broadcast ARP 42 Who has 192.168.100.100? Tell 192.168.100.1
PFSense:FF:FF Broadcast ARP 42 Who has 192.168.100.100? Tell 192.168.100.1
PFSense:FF:FF Broadcast ARP 42 Who has 192.168.100.100? Tell 192.168.100.1
PFSense:FF:FF Broadcast ARP 42 Who has 192.168.100.100? Tell 192.168.100.1
PFSense:FF:FF Broadcast ARP 42 Who has 192.168.100.100? Tell 192.168.100.1
PFSense:FF:FF Broadcast ARP 42 Who has 192.168.100.100? Tell 192.168.100.1
Client:FF:FE PFSense:FF:FF ARP 56 Who has 192.168.100.1? Tell 192.168.100.100
PFSense:FF:FF Client:FF:FE ARP 42 192.168.100.1 is at 00:0d:b9:48:5f:fe
When the interruption occurs on the client machine, I notice that traffic to 192.168.100.100 stops, and that just beforehand the router sends out multiple ARP requests for the client that aren’t being answered. Then the interruption occurs, and finally traffic resumes after the device sends its own ARP request for the gateway. Interesting! So the ARP entry isn’t getting refreshed on the router.
Back on the pfSense router: Diagnostics > ARP Table. Filtering for the computer shows that when it first requests an IP and connects to the network, the default expiry time is 20 minutes... looks familiar! Turns out, 20 minutes is the default ARP timeout in pfSense/FreeBSD.
If the ARP entry is expiring from the router, that would explain the traffic flow interruption. You can also see that traffic is restored once the client sends an ARP itself, looking for the gateway. So why doesn’t the client respond to the router’s requests at the beginning?
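To make the timing concrete, here is a deliberately simplified model of what the capture suggests: the router’s ARP entry for the client lives for 20 minutes, the router’s own refresh attempts go unanswered, and the entry only comes back once the client sends an ARP itself. The function name, the 3 second recovery delay and the one hour horizon are assumptions for illustration; FreeBSD exposes the lifetime as the sysctl net.link.ether.inet.max_age, which defaults to 1200 seconds.

```python
from typing import List, Tuple

ARP_MAX_AGE = 1200.0  # seconds; the 20 minute default seen in the pfSense ARP table


def outage_windows(
    first_learned: float = 0.0,
    max_age: float = ARP_MAX_AGE,
    recovery_delay: float = 3.0,  # assumption: "a few seconds" until the client ARPs
    horizon: float = 3600.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) of each expected dropout within the horizon, in seconds."""
    windows = []
    learned = first_learned
    while learned + max_age < horizon:
        start = learned + max_age        # entry expires, router can no longer reach the client
        end = start + recovery_delay     # client ARPs for the gateway, router relearns its MAC
        windows.append((start, end))
        learned = end                    # the expiry timer starts again from here
    return windows


for start, end in outage_windows():
    print(f"dropout from {start / 60:.2f} min to {end / 60:.2f} min")
```

The model ignores anything else that might refresh the entry, so treat it as a way to reason about the 20 minute rhythm rather than as a description of how the FreeBSD ARP code behaves.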
Let’s check a capture from the client’s end, Number 4, to see if it even received the ARP request. It didn’t. So, the issue looks like the AP again.
Next, I’m going to packet capture using the tcpdump built into the Ubiquiti AP to see if the ARP actually gets to the physical interface side, Number 2. Enable SSH access via the Ubiquiti controller: System > Device SSH Authentication > Enable Device SSH Authentication, and note the username and password. Using Ubiquiti’s tcpdump debug page as a guide, we want to find the physical interface to listen on. List the interfaces with ip link and iwconfig to sort them; they are usually named ethX on the wired side and athX on the WLAN side. We want to check eth0 first.
After logging in, issue the following command: tcpdump -i eth0 arp (or follow Ubiquiti’s guide to save the capture or pipe it to Wireshark).
ARP, Request who-has 192.168.100.100 tell 192.168.100.1, length 42
ARP, Request who-has 192.168.100.100 tell 192.168.100.1, length 42
ARP, Request who-has 192.168.100.100 tell 192.168.100.1, length 42
ARP, Request who-has 192.168.100.100 tell 192.168.100.1, length 42
ARP, Request who-has 192.168.100.100 tell 192.168.100.1, length 42
ARP, Request who-has 192.168.100.100 tell 192.168.100.1, length 42
ARP, Request who-has 192.168.100.100 tell 192.168.100.1, length 42
This shows that the ARP request is actually getting to the interface. That rules out anything along the path to the AP. It also shows the number of requests the router tries before the entry expires. It isn’t just once.
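Counting unanswered requests by eye gets tedious on a longer capture. The sketch below tallies ARP requests per target IP that never received a reply, assuming tcpdump’s default one-line text output as shown in the capture above. The function name is hypothetical, and the last two sample lines are the client’s request and the router’s reply from the pfSense capture, rewritten into tcpdump’s format for illustration.

```python
import re
from collections import Counter
from typing import Dict, Iterable

REQUEST = re.compile(r"ARP, Request who-has (\S+) tell (\S+?),")
REPLY = re.compile(r"ARP, Reply (\S+) is-at (\S+?),")


def unanswered_arp(lines: Iterable[str]) -> Dict[str, int]:
    """Count ARP requests per target IP that were not followed by a reply for that IP."""
    pending = Counter()
    for line in lines:
        request = REQUEST.search(line)
        if request:
            pending[request.group(1)] += 1
            continue
        reply = REPLY.search(line)
        if reply:
            # A reply for this IP answers everything that was outstanding for it.
            pending.pop(reply.group(1), None)
    return dict(pending)


capture = (
    ["ARP, Request who-has 192.168.100.100 tell 192.168.100.1, length 42"] * 7
    + [
        "ARP, Request who-has 192.168.100.1 tell 192.168.100.100, length 42",
        "ARP, Reply 192.168.100.1 is-at 00:0d:b9:48:5f:fe, length 42",
    ]
)

print(unanswered_arp(capture))
```

Saving the capture to a text file on the AP and feeding its lines to the function would work the same way.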
Trying to now capture from Number 3, which is the interface on the WLAN side, tcpdump -i ath0 arp shows that the ARP doesn’t make it across. Here’s our ground zero: between the two interfaces.
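The approach used here generalises: capture at each hop along the path and find the first place the frame stops showing up. A minimal sketch of that bookkeeping follows, with a hypothetical function name and with the capture point labels chosen to match the four points used in this post.

```python
from typing import List, Optional, Tuple


def locate_drop(observations: List[Tuple[str, bool]]) -> Optional[Tuple[Optional[str], str]]:
    """Given capture points ordered from sender to receiver, return the pair
    (last point that saw the frame, first point that did not), or None if
    every point saw it."""
    last_seen = None
    for point, seen in observations:
        if not seen:
            return (last_seen, point)
        last_seen = point
    return None


# Results of the four captures, ordered from the router towards the client.
arp_request_seen = [
    ("1: pfSense LAN", True),
    ("2: AP eth0", True),
    ("3: AP ath0", False),
    ("4: client", False),
]

print(locate_drop(arp_request_seen))
```

The returned pair names the segment to investigate, which in this case is the hop between the AP’s wired and wireless interfaces.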
I recall seeing a Proxy ARP option in the wireless settings for the network in the Ubiquiti controller, but it is off by default. Holding broadcasts back makes sense, as you don’t really want them flooding over to the WLAN side for performance reasons.
Also interestingly, this is the new GUI that Ubiquiti released; the older one had an option called “Block LAN to WLAN multi/broadcast”. It looks like the Proxy ARP option replaces that older one.
Turning on Proxy ARP fixed the issue. It does what it says on the tin: if the AP knows of a device with that IP on the WLAN side, it will proxy an ARP reply to the device that requested it, which saves having to send the request out over the air.
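Conceptually, proxy ARP is just a lookup table on the AP. The sketch below illustrates the idea only; it is not Ubiquiti’s implementation, and the class name, method names and the client MAC address are made up for the example.

```python
from typing import Dict, Optional


class ProxyArpTable:
    """Toy model of an AP answering ARP requests on behalf of its wireless clients."""

    def __init__(self) -> None:
        self._stations: Dict[str, str] = {}  # IP address -> MAC address

    def learn(self, ip: str, mac: str) -> None:
        """Record a wireless client the AP has seen using this IP."""
        self._stations[ip] = mac

    def forget(self, ip: str) -> None:
        """Drop a client, for example when it disconnects."""
        self._stations.pop(ip, None)

    def answer_for(self, target_ip: str) -> Optional[str]:
        """Return the MAC to put in a proxied ARP reply, or None if the target is unknown.

        What the AP does with requests for unknown targets is left out of this sketch.
        """
        return self._stations.get(target_ip)


table = ProxyArpTable()
table.learn("192.168.100.100", "aa:bb:cc:dd:ee:01")  # hypothetical client MAC

# The router asks "who has 192.168.100.100?" on the wired side;
# the AP can reply itself without putting the broadcast on the air.
print(table.answer_for("192.168.100.100"))
print(table.answer_for("192.168.100.101"))
```

With the reply coming from the AP, the router’s ARP entry gets refreshed before it expires and the stream no longer stalls at the 20 minute mark.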
Mystery solved.
But why did the TP-Link AP work out of the box? Doing a packet capture on the client side while connected to it shows that, unlike the Ubiquiti, it passes on broadcasts by default. There isn’t an option to disable this as of writing.
There’s also another consideration with using devices like the TP-Link that flood the air with broadcasts from the LAN side (aside from performance). From a security posture, this also reveals MAC addresses, and from those you can glean what devices are on the LAN using a simple WiFi packet capture. Tools like kismet spring to mind. Want to see when someone is home? Fire up kismet, see if their phone or other devices are connected, done. This type of metadata is valuable. It all depends on how far down the security rabbit hole you’re willing to go.