Although we entered the Ethernet era long ago, serial communication is still widely used across market segments, with one noticeable change: in most cases the physical COM port has been replaced by a USB-to-serial adaptor. I have heard many complaints that USB-adaptor-based serial communication on Windows is slow and unpredictable, but I have not yet found a source that offers good insight into its latency, which is why I am writing this post.
Basic Introduction of USB Serial Adaptor
A USB serial adaptor is a protocol converter that converts USB data signals to and from other serial communication standards, for example RS232, RS485 and RS422.
USB.org publishes the specification for communication devices here: https://www.usb.org/document-library/class-definitions-communication-devices-12. It is worth noting, though, that this is the general specification for CDC devices; for a USB serial adaptor, the relevant document in that package is “PSTN120.pdf”, which defines the Abstract Control Model (ACM), as illustrated below:
The benefit of following the USB ACM specification is that most mainstream operating systems have the class driver built in, so no custom kernel-mode driver is needed. From this perspective, the adaptor is a plug-and-play device.
However, the downsides of following USB ACM specification are:
- No synchronous status report for each data byte
- No support for a 9th data bit
- Data transfer is mandated to be Bulk, with two logical endpoints for each port
As a result, a USB serial adaptor that wants to avoid any of these downsides cannot follow the CDC ACM specification. This is why many USB serial adaptors are implemented with a vendor-specific interface instead of standard CDC, which usually requires a vendor-provided kernel-mode driver to go with it.
Architecture Overview
The diagram below illustrates how an RS232 message reaches the user application that receives it:
It covers 3 segments:
- Hardware
- Kernel mode driver stack
- User mode application
The next section explains the latency introduced at each step of this propagation path.
Latency Analysis
On Wire
Obviously, it takes time to deliver the RS232 signal from the sender to the receiver over the wire. This on-wire latency is proportional to the amount of data being transferred and inversely proportional to the configured baud rate.
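To put numbers on the on-wire part, here is a minimal sketch. It assumes a standard asynchronous frame (1 start bit, 8 data bits, no parity, 1 stop bit by default); the function name and its parameters are my own choices for illustration.

```python
def wire_time_us(num_bytes, baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Time in microseconds needed to shift num_bytes over an async serial line.

    Assumes back-to-back frames with no idle gap between characters.
    """
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits  # 1 start bit
    return num_bytes * bits_per_frame * 1_000_000 / baud


if __name__ == "__main__":
    for baud in (9600, 115200, 921600):
        print(f"{baud:>7} baud: 1 byte = {wire_time_us(1, baud):7.1f} us, "
              f"64 bytes = {wire_time_us(64, baud):8.1f} us")
```

With 8N1 framing every byte costs 10 bit times, so a single byte at 115200 baud already spends about 87us on the wire before the adaptor can do anything with it.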
Firmware Implementation
Once the receiver has received the RS232 message and put it into its receive buffer, an interrupt is usually generated to notify the MCU of the USB serial adaptor. The MCU then retrieves the data from the receive buffer and places it in the buffer of its data endpoint for the USB host controller to pick up. The latency of this step depends mainly on the speed of the MCU and the implementation of the firmware.
USB Hub
Most USB serial adaptors are USB 1.1 devices. At the same time, it is getting harder and harder to find an EHCI host controller on newer controller boards, so it is more than likely that the adaptor will be connected to the USB host controller (e.g. xHCI) via a hub. The hub serves as a relay between the host and the adaptor. To talk to the lower-speed device, the host controller talks to the hub using split transactions: the hub behaves as a host to the adaptor, translating each Start Split transaction it receives from the host controller into a transaction sent to the adaptor and, in the other direction, returning the adaptor's response to the host controller through a Complete Split transaction.
This translation adds latency. The graph below shows the result of a comparison test in which an extra USB 2.0 hub was placed between an ATEN USB serial adaptor and a USB 2.0 port on my mainboard; the extra hub added ~20us of latency.
Data Transfer between the adaptor and USB Host controller
Once the adaptor's firmware has placed the received data in its data endpoint, it is up to the host controller to pick it up (the hub effect is ignored here to simplify the discussion). The latency introduced at this stage is determined by:
- The USB speed of the adaptor (1.1 or 2.0 device)
- The USB transfer type the adaptor uses; usually it is Bulk, and some 2.0 devices can choose Interrupt
For Interrupt transfers, the latency is bounded by 1ms for a USB 1.1 device and 125us for a USB 2.0 device. Bulk transfers are trickier, because their latency varies with bus traffic: generally speaking, the latency on the adaptor is small when the USB bus is not busy, but it grows as the bus becomes more heavily loaded. The reason is that, in each frame/uFrame, the host controller schedules the synchronous (periodic) transfers, Interrupt and Isochronous, before the asynchronous ones, Bulk and Control. When the bus is not busy, bandwidth is left over after the host controller has traversed both the synchronous and the asynchronous schedules; it then waits on a sleep timer (usually 10us) and traverses the asynchronous schedule again when the timer expires, repeating this until the end of the frame/uFrame. When the bus is loaded, that spare bandwidth shrinks, so a pending Bulk transfer waits longer for its turn. My testing shows that the latency of a USB 1.1 serial adaptor is ~25us when the bus is not busy, but it can exceed 200us when the bus is loaded.
As an example, the graph below shows a USB 1.1 adaptor using Bulk transfer reaching a latency of 233us when tested while large files were being copied from 3 USB thumb drives at the same time, all of them connected to my EHCI controller.
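For the Interrupt case, the 1ms and 125us bounds quoted above are the best case, reached when the endpoint descriptor requests the shortest polling interval. The sketch below shows how the polling period follows from the bInterval field of the endpoint descriptor as defined in the USB 2.0 specification; the function name and the "full"/"high" speed labels are my own, for illustration only.

```python
def interrupt_poll_interval_us(speed, b_interval):
    """Polling period in microseconds of an Interrupt endpoint.

    speed: "full" (USB 1.1 full-speed) or "high" (USB 2.0 high-speed).
    b_interval: the bInterval value from the endpoint descriptor.
    """
    if speed == "full":
        # Full-speed: bInterval counts 1 ms frames, valid range 1..255.
        if not 1 <= b_interval <= 255:
            raise ValueError("full-speed bInterval must be 1..255")
        return b_interval * 1000
    if speed == "high":
        # High-speed: period is 2^(bInterval-1) microframes of 125 us, range 1..16.
        if not 1 <= b_interval <= 16:
            raise ValueError("high-speed bInterval must be 1..16")
        return 2 ** (b_interval - 1) * 125
    raise ValueError("speed must be 'full' or 'high'")


if __name__ == "__main__":
    print(interrupt_poll_interval_us("full", 1))   # shortest full-speed period
    print(interrupt_poll_interval_us("high", 1))   # shortest high-speed period
    print(interrupt_poll_interval_us("high", 4))   # 8 microframes
```

An adaptor that declares a larger bInterval will be polled less often, so its worst-case Interrupt latency grows accordingly.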
Driver Stack
The driver stack for a USB serial adaptor is the list of kernel-mode drivers that sit between the user application and the host controller. Together they pass the received data from the host controller up to the user application and, in the other direction, pass the data sent by the user application down to the host controller. The stack consists of:
- Filter driver (optional)
- Function driver for the adaptor (either vendor-provided or the CDC class driver from the OS), which is a client of the Host Controller driver
- Host Controller driver
The client driver and the Host Controller driver communicate via USB Request Blocks (URBs); the filter driver (if present) and the client driver communicate via IRPs. Each of these hand-offs adds more latency.
Windows
Data exchange between the kernel-mode driver stack and the user-mode application goes through Windows, which adds another layer of latency. Depending on how busy the system is, the latency introduced by Windows can be very significant.
I ran a test that measures the time from my PC sending a byte to my target (AMD Merlin Falcon) through the USB serial adaptor until it receives the response. The first graph below shows about 3ms in most cases when the target is close to idle. The second graph shows the result when all 4 cores of the target are saturated by 5 CPU-intensive applications: in many cases the latency exceeds 30ms. It is worth noting that this is a well-known characteristic of Windows, which is not a real-time operating system, and it is not specific to serial communication.
Application
Obviously, the application itself can also add latency. Depending on how the application is implemented, this latency can be small or very significant. What deserves attention is the extra latency caused by bad programming practice; see the next section for more information.
Good Code Practice
Good practice under Windows helps reduce the latency of all data communication, including serial; bad practice increases it. Here are a few areas to pay attention to.
Always use overlapped IO
When opening a serial port, always use overlapped IO so that the requesting thread can wait for the received data to arrive. The alternatives are either checking for data only occasionally, which adds significant latency, or checking constantly, which saturates one CPU core.
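To see why occasional checking hurts, here is a small model of the polling alternative. It is not overlapped IO itself, only a sketch of the cost being avoided; it assumes the application polls at fixed ticks 0, T, 2T, ... and that a byte is noticed at the first tick at or after its arrival. The function name is hypothetical.

```python
def polling_latency_us(arrival_us, poll_interval_us):
    """Delay between data arrival and the first poll tick at or after it.

    All times are integer microseconds; poll ticks occur at multiples of
    poll_interval_us.
    """
    # Integer ceiling division gives the index of the next poll tick.
    next_tick = -(-arrival_us // poll_interval_us) * poll_interval_us
    return next_tick - arrival_us


if __name__ == "__main__":
    interval = 10_000  # poll every 10 ms
    delays = [polling_latency_us(t, interval) for t in range(interval)]
    print("worst case:", max(delays), "us")
    print("average   :", sum(delays) / len(delays), "us")
```

In this model the added latency averages about half the polling interval and can approach the full interval, whereas a thread blocked on the completion of an overlapped read has no such term.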
Avoid “cout” or “printf”
“cout” and “printf” can be very helpful for debugging, but they are expensive operations that can easily add a 1~2ms delay, so do not use them on the communication path if you care about latency.
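One common way to keep diagnostics without paying for console output inside the timed path is to record into memory and print afterwards. The following is only a sketch of that idea; `do_round_trip` is a hypothetical placeholder for whatever send/receive call is being measured.

```python
import time


def measure_round_trips(do_round_trip, count):
    """Time `count` calls of do_round_trip without printing inside the loop."""
    samples_ns = [0] * count  # preallocated so the loop only stores integers
    for i in range(count):
        start = time.perf_counter_ns()
        do_round_trip()
        samples_ns[i] = time.perf_counter_ns() - start
    return samples_ns


def report(samples_ns):
    """Format the recorded samples once the measurement is finished."""
    return "\n".join(f"{i}: {ns / 1000:.1f} us" for i, ns in enumerate(samples_ns))


if __name__ == "__main__":
    # A no-op stands in for the real serial round trip in this sketch.
    print(report(measure_round_trips(lambda: None, 5)))
```

The same pattern applies in C or C++: store timestamps in a preallocated array during the exchange and format them only after the exchange has finished.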
Never call Sleep() or other timer-related APIs
The default timer resolution seen by Windows user-mode code is ~16ms (a finer resolution of about 1ms is possible, but it is not the default). As a result, a call to a timer-related Windows API such as Sleep(1), intended to wait 1ms, may return anywhere between 1 and 16ms later, potentially adding unexpected and significant latency to your communication.
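The 1~16ms spread can be reproduced with a simplified model. It assumes that a sleeping thread is only woken on a system timer tick, that ticks fall on multiples of the tick period, and that the default period is 15.625ms (64 ticks per second); the real scheduler is more complicated, and the function name is my own.

```python
def sleep_wake_delay_us(call_time_us, requested_us, tick_us=15_625):
    """Modelled time a Sleep-style call actually blocks, in microseconds.

    The thread wakes at the first timer tick at or after the requested
    deadline. All values are integer microseconds.
    """
    deadline = call_time_us + requested_us
    wake = -(-deadline // tick_us) * tick_us  # ceiling to the next tick
    return wake - call_time_us


if __name__ == "__main__":
    # Sleep(1) issued at different phases of the 15.625 ms tick.
    for call_time in (0, 7_000, 14_000, 14_625):
        print(call_time, "->", sleep_wake_delay_us(call_time, 1_000), "us")
```

Depending on where in the tick period the call lands, the modelled Sleep(1) blocks for anything from 1ms up to a full tick, which matches the range described above.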