Latency caused by UDP fragmentation

While testing a WebRTC application recently, I noticed some unexpected delays in the execution of a series of lines in my OpenSIPS control script. For OpenSIPS to behave properly it is essential that the child threads don’t get blocked and that the lines in the script are executed at speed. Seeing a delay of several seconds during handling of a BYE request (i.e. hangup at one end terminating the call) was therefore quite concerning.

Initially, I believed the delay must be linked to DNS queries in my NAT detection routines, but as I looked closer it became clear that it was in fact arising from very slow transmission of large messages over UDP. The UDP connection was from OpenSIPS on one server to rtpengine on another. It is used to send control commands – ‘offer’, ‘answer’ and ‘delete’ – one way and then receive responses back in the reverse direction. The commands and responses can vary in size quite a lot because they may include a copy of the SDP or a report on the quality of a call.

The two servers in question were Virtual Private Servers provided by two different Internet hosting companies. Using tcpdump to capture and wireshark to analyse the network traffic, it was possible to see that small command messages were transmitted very fast while large messages such as the response to the ‘delete’ command, were delayed by up to 5 seconds. Searching on the web, it looked like the problem was linked to fragmentation of the network packets. UDP datagrams whose size exceeded the Maximum Transmission Unit (MTU) of 1500 bytes were being broken into smaller fragments, transmitted over the Internet and reassembled at the far end. For some reason that I was not able to identify, the fragmented packets were being given very low priority as they traversed the networks within or between the two hosting providers.

It seemed unlikely that I would be able to get the hosting companies to investigate so instead I searched for a tunnelling solution. I thought that, if I could somehow transmit the UDP over a TCP connection, then it would no longer be delayed by some obstinate piece of intermediate network equipment. After a lot of Google searches and a few unsuccessful tests with free UDP tunnelling utilities, I discovered a tool called socat. This utility is installable as a yum package in CentOS, it is well documented and it was fairly easy to find some working examples not dissimilar to what I wanted to do.

The commands I used, which worked first time, were as follows. On the rtpengine host server:

socat TCP4-LISTEN:2223,fork UDP4:127.0.0.1:2223

On the OpenSIPS host server:

socat UDP4-LISTEN:2223,fork TCP4:<rtpengine-address>:2223

On the rtpengine host server, I had to make rtpengine listen for ng commands on the localhost address instead of the published Internet IP. I also had to add a rule to iptables to allow connections over TCP to port 2223 from the OpenSIPS server. On the OpenSIPS server, I just had to change the address specified in the rtpengine module parameters from the remote address of the rtpengine host server, to 127.0.0.1.

I’m happy to say that the results were brilliant. All the horrible delays vanished and, so far, I have seen no problems. All that is left now is to configure the servers such that socat runs as a daemon or is otherwise automatically started before OpenSIPS starts.

The same solution could be applied to other UDP connections, either as a way to overcome the problems of datagram fragmentation or possibly as a way to get through awkward NAT routers or firewalls that don’t handle UDP very well. I think I will be using socat again, possibly quite soon.