As part of my project of building a compositor for Wayland, and also X11 through Xwayland, I have had to learn how pointing devices work in Linux. If you're unfamiliar with window systems like Wayland and X11, you can read my blog post on building a window manager. The pointer is one of the most common ways to interact with a computer, so knowing how it works can be useful. In this article I will explain how this input device is managed, how it interacts with the operating system and applications, and how it is shown on the display as an image.
Cursor vs Pointer
When reading documentation and code, you will frequently see both pointer and cursor used. Confusingly, some people use them to mean the same thing, while others use them to mean separate things. The Wayland docs refer to the pointer as a representation of the input device, and the pointer image as a cursor. Since my compositor project is for Wayland, I will use their terminology.
Hardware to the kernel
One of the jobs of a kernel is to manage I/O, in our case a pointing device such as a serial, PS/2, USB, or Bluetooth mouse. Each of these interfaces uses a different protocol to send mouse movements and button presses.
Serial
By serial mouse, people typically mean a mouse with a 9-pin D-sub connector, known as DE-9 (or DB-9). Other connectors can be used, like DB-25 or the one on the DEC VSXXX "hockey puck" mouse. What ties all of these mice together is the use of the RS-232 standard for communication. This standard specifies the voltage levels for transmitting and receiving signals, in our case between the mouse and the computer. Computer buses are typically parallel, so to communicate through the serial port, data is passed through a controller, typically a UART IC, that converts between parallel and serial data. The UART also handles the encoding and decoding of data to and from the signals passed through the RS-232 connector. The UART registers can be accessed through I/O port addresses, such as 0x3F8 for the first port (COM1).
There are two common protocols: the Microsoft protocol and the Mouse Systems Corporation protocol. From what I have read, the Microsoft one was more popular, so that is the one I will cover. Data for this protocol is sent in frames of 9 bits: 1 start bit, 7 data bits, and 1 stop bit, with no parity bit.
Diagram of the RS-232 frame
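On Linux, userspace does not normally program the UART registers at 0x3F8 directly; the kernel's serial driver exposes that first port as /dev/ttyS0, and the frame format is configured with termios. The sketch below fills in a struct termios for the line settings commonly documented for Microsoft-protocol mice: 1200 baud, 7 data bits, no parity, and 1 stop bit. The helper name is hypothetical, and treating those settings as correct for every Microsoft-compatible mouse is an assumption.

```c
#include <string.h>
#include <termios.h>

/* Hypothetical helper: fill `t` with raw 1200 baud, 7N1 settings for a
 * Microsoft-protocol serial mouse. Apply it with tcsetattr() on the port. */
int ms_mouse_termios(struct termios *t)
{
    memset(t, 0, sizeof *t);
    /* 7 data bits; PARENB clear = no parity; CSTOPB clear = 1 stop bit.
     * CREAD enables the receiver, CLOCAL ignores modem control lines. */
    t->c_cflag = CS7 | CREAD | CLOCAL;
    /* Raw mode: no line editing, echo, signals, or byte translation. */
    t->c_iflag = IGNBRK;
    t->c_oflag = 0;
    t->c_lflag = 0;
    /* Block in read() until at least one byte (one frame) has arrived. */
    t->c_cc[VMIN] = 1;
    t->c_cc[VTIME] = 0;
    if (cfsetispeed(t, B1200) != 0 || cfsetospeed(t, B1200) != 0)
        return -1;
    return 0;
}
```

To use it, you would open the port with open("/dev/ttyS0", O_RDWR | O_NOCTTY) and pass the filled struct to tcsetattr(fd, TCSANOW, &t).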
The bytes sent in these frames from the device to the computer contain mouse movements and button presses. The basic serial mouse has two buttons and can only be moved along the X and Y axes. The Microsoft mouse protocol uses 3 bytes to send this data. Although a frame has 7 data bits, only 6 of them carry mouse data; the remaining bit is set only in the first byte so the computer can find the start of each packet. This means that to send a full 8-bit movement value, the top 2 bits of both the X and Y movements are put in the first byte.
Diagram of the Microsoft protocol used for serial mice
- The first byte for button state and some bits of the X/Y movements.
- The second byte for X-axis movements.
- The third byte for Y-axis movements.
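To make the bit packing concrete, here is a small decoder for one 3-byte packet. It assumes the commonly documented layout of the first byte: bit 6 is the sync bit, bit 5 the left button, bit 4 the right button, bits 3-2 the top two Y bits, and bits 1-0 the top two X bits. The struct and function names are my own.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of one Microsoft serial mouse packet. */
struct ms_mouse_packet {
    bool left;
    bool right;
    int8_t dx; /* 8-bit two's complement movement */
    int8_t dy;
};

/* Bit 6 is set only in the first byte of a packet. */
bool ms_is_sync_byte(uint8_t b)
{
    return (b & 0x40) != 0;
}

/* Decode bytes[0..2]. Returns false if the bytes are not aligned on a
 * packet boundary (sync bit missing or set in the wrong byte). */
bool ms_decode(const uint8_t bytes[3], struct ms_mouse_packet *out)
{
    if (!ms_is_sync_byte(bytes[0]) || ms_is_sync_byte(bytes[1]) ||
        ms_is_sync_byte(bytes[2]))
        return false;

    out->left = (bytes[0] & 0x20) != 0;
    out->right = (bytes[0] & 0x10) != 0;
    /* Bits 1-0 of byte 0 are X7..X6 and bits 3-2 are Y7..Y6; the low six
     * bits come from bytes 1 and 2. */
    out->dx = (int8_t)(uint8_t)(((bytes[0] & 0x03) << 6) | (bytes[1] & 0x3F));
    out->dy = (int8_t)(uint8_t)(((bytes[0] & 0x0C) << 4) | (bytes[2] & 0x3F));
    return true;
}
```

A driver reading the port would discard bytes until it sees one with the sync bit set, then collect the next two.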
PS/2
The PS/2 mouse also uses a serial protocol, but with a different port. The name PS/2 comes from the IBM Personal System/2, which introduced a 6-pin mini-DIN connector to replace the older DE-9 connector. Other companies started using the mini-DIN connector for the keyboard and mouse ports, but referred to it as the PS/2 port. These computers used the 8042 controller to manage PS/2 devices. It has an I/O address of 0x64 for the status/command register and 0x60 for reading and writing data.
The data sent over this connector uses the PS/2 protocol. With this protocol a frame contains 11 bits (start bit, 8 data bits, parity bit, stop bit) from device to host, or 12 bits from host to device, which adds an acknowledge bit.
Diagram of the PS/2 frame
The data received in these frames forms a PS/2 packet of 3 or 4 bytes. The default format contains 3 bytes:
Diagram of the PS/2 protocol used for mice
- The first byte for button state, plus the X/Y overflow bits and sign bits.
- The second byte for X-axis movements.
- The third byte for Y-axis movements.
- The fourth byte is an optional extension byte for Z-axis movements (scroll).
It is interesting to see the differences between PS/2 and Microsoft serial mice. The serial mouse frame has a start bit, 7 data bits, 1 stop bit, and no parity bit. The PS/2 frame has a start bit, 8 data bits, a parity bit, and a stop bit, plus an acknowledge bit when the host sends data to the mouse. Since the serial mouse only has 6 usable data bits per byte, it has to pack an extra 4 bits of X/Y movement into the byte containing the button states. The PS/2 mouse uses the extra room in the first byte to add an overflow bit and a sign bit to each movement value, making the movements 9-bit signed numbers.
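Decoding a PS/2 packet shows how the sign bit extends each movement byte to 9 bits. The sketch below assumes the default 3-byte format as described on the OSDev wiki page listed in the references: bit 0 left, bit 1 right, bit 2 middle, bit 3 always 1, bits 4 and 5 the X and Y sign bits, and bits 6 and 7 the X and Y overflow bits. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded form of a default 3-byte PS/2 mouse packet. */
struct ps2_mouse_packet {
    bool left, right, middle;
    bool x_overflow, y_overflow;
    int dx, dy; /* 9-bit signed values: -256..255 */
};

bool ps2_decode(const uint8_t b[3], struct ps2_mouse_packet *out)
{
    /* Bit 3 of the first byte is always 1; use it as a sanity check. */
    if ((b[0] & 0x08) == 0)
        return false;

    out->left = (b[0] & 0x01) != 0;
    out->right = (b[0] & 0x02) != 0;
    out->middle = (b[0] & 0x04) != 0;
    out->x_overflow = (b[0] & 0x40) != 0;
    out->y_overflow = (b[0] & 0x80) != 0;
    /* The sign bits (bit 4 for X, bit 5 for Y) are the 9th bit of each
     * two's complement movement value. */
    out->dx = (b[0] & 0x10) ? (int)b[1] - 256 : (int)b[1];
    out->dy = (b[0] & 0x20) ? (int)b[2] - 256 : (int)b[2];
    return true;
}
```

Note that PS/2 reports positive Y as upward movement, so a driver typically flips the sign before reporting it, since screen coordinates grow downward.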
USB and Bluetooth
It depends on your processor and motherboard, but most modern machines have controllers that bridge USB and Bluetooth to PCIe. The USB and Bluetooth I/O subsystems (a subsystem is just a part of the Linux kernel that provides some functionality) handle devices that are connected while the system is running (hotplugged) as well as devices that are already present when the host boots.
While the I/O subsystem directly manages devices, it does not deal with the higher level world of handling pointer movements or other device actions. USB and Bluetooth use several layers of protocols to send data between the device and host. For pointing devices, the protocol used to send the actual pointer data is the Human Interface Device (HID) protocol. The HID protocol was created to be an abstract protocol for all kinds of external devices used by people. This includes webcams, fingerprint readers, mice, keyboards, and pretty much all other modern peripherals.
Since the I/O subsystem does not understand the HID protocol, another subsystem is required to use it. The job of the transport driver (such as USB-HID) is to link HID core with the I/O driver. HID core handles HID data (reports) between the device and host, the userspace API, and the management of HID devices. The transport driver provides a struct hid_ll_driver whose callbacks HID core uses for low-level operations on the device.
The HID specification defines two protocols that can be used to interact with HID devices: the boot protocol and the report protocol. The boot protocol is used by the BIOS, embedded systems, and in general systems that are not fully USB-aware. It is simpler and has fewer features than the report protocol. With the boot protocol, keyboards and mice work in a limited manner, and things like Z-axis movements (scroll) do not work. For this reason I will cover the report protocol.
There are many high level concepts in the HID specification, but here are some of the most important:
- request - Data sent between the host and device. There are standard requests defined by a specific format, such as Get_Descriptor.
- report - A data structure sent either from the device to the host or vice versa, typically carried by a USB interrupt transfer.
- descriptor - A hierarchy of data structures that describe the device, such as its configuration, its endpoints (basically addresses to send and receive data, each either IN or OUT), its report data structures, and localizations.
Diagram of how descriptors relate to each other
- class - A type of device such as audio, thermometers, keyboard, mouse, and many others.
When a HID device is connected, the host needs to know the vendor, product, configuration, interfaces, endpoints, report formats, etc. of the device. To get this information the host retrieves the device descriptor through a Get_Descriptor request. From the device descriptor it finds and requests the other descriptors for the device.
Part | Standard USB Descriptor | HID Class Descriptor | Notes |
---|---|---|---|
bmRequestType | 100xxxxx | 10000001 | Bits 0-4 indicate whether the recipient is the device (0), interface (1), endpoint (2), or other (3). Bits 5-6 are the type: standard (0), class (1), vendor (2), or reserved (3). Bit 7 is the direction, either host to device (0) or device to host (1). |
bRequest | GET_DESCRIPTOR (0x06) | GET_DESCRIPTOR (0x06) | The request type. |
wValue | Descriptor Type and Descriptor Index | Descriptor Type and Descriptor Index | The descriptor type in the high byte (like configuration or interface) and descriptor index in the low byte (if there are multiple descriptors of the type, which to retrieve). |
wIndex | 0 (zero) or Language ID | Interface Number | String descriptors require a Language ID to handle localization. To get a string descriptor in US English, the wIndex would be 0x0409. |
wLength | Descriptor Length | Descriptor Length | The number of bytes to transfer. |
Data | Descriptor | Descriptor | For IN it is the bytes to hold the incoming data. For OUT it is the bytes to send. |
Example Get_Descriptor request from the specification
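On the wire, the fields in this table form the 8-byte setup packet of a USB control transfer, with the 16-bit fields in little-endian order. The sketch below builds that packet; the function and constant names are hypothetical, while the values (0x06 for GET_DESCRIPTOR, 0x01 for the device descriptor type, and 0x22 for the HID report descriptor type) come from the USB and HID specifications.

```c
#include <stdint.h>

#define REQ_GET_DESCRIPTOR   0x06
#define DESC_TYPE_DEVICE     0x01
#define DESC_TYPE_HID_REPORT 0x22

/* bmRequestType values from the table: standard request, device to host,
 * addressed to the device or to an interface. */
#define REQTYPE_IN_STANDARD_DEVICE    0x80 /* 100 00000 */
#define REQTYPE_IN_STANDARD_INTERFACE 0x81 /* 100 00001 */

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

/* Fill `setup` with a Get_Descriptor request. */
void build_get_descriptor(uint8_t setup[8], uint8_t bm_request_type,
                          uint8_t desc_type, uint8_t desc_index,
                          uint16_t w_index, uint16_t w_length)
{
    setup[0] = bm_request_type;
    setup[1] = REQ_GET_DESCRIPTOR;
    /* wValue: descriptor type in the high byte, index in the low byte. */
    put_le16(&setup[2], (uint16_t)((desc_type << 8) | desc_index));
    put_le16(&setup[4], w_index);   /* language ID or interface number */
    put_le16(&setup[6], w_length);  /* bytes to transfer */
}
```

The device descriptor is 18 bytes long. For the report descriptor, the host learns the length from the HID class descriptor before issuing this request.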
The host will then get the report descriptor for the mouse, which defines the data structure of its reports. This allows the host to parse the data it receives in reports.
Usage Page (Generic Desktop),
Usage (Mouse),
Collection (Application),
Usage (Pointer),
Collection (Physical),
Report Count (3),
Report Size (1),
Usage Page (Buttons),
Usage Minimum (1),
Usage Maximum (3),
Logical Minimum (0),
Logical Maximum (1),
Input (Data, Variable, Absolute),
Report Count (1),
Report Size (5),
Input (Constant),
Report Size (8),
Report Count (2),
Usage Page (Generic Desktop),
Usage (X),
Usage (Y),
Logical Minimum (-127),
Logical Maximum (127),
Input (Data, Variable, Relative),
End Collection,
End Collection
Example report descriptor from the specification
Each of these lines is called an item, and each kind of item has its own numeric encoding. For example, Usage Page (Generic Desktop) is encoded as the bytes 0x05, 0x01: 0x05 is the prefix for a Usage Page item with one byte of data, and 0x01 is the Generic Desktop page. A device can have multiple reports, all of which are described in its report descriptor, and related items are grouped together with collection items. The data a report carries is described with usage items; in the example above, the usages are the mouse buttons and the X and Y axes. The logical minimum and logical maximum items set the minimum and maximum value each field can take. The input item defines how the data described above it should be interpreted, for example as relative or absolute, and marks the end of that section of data.
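Short items like these share a simple encoding: the prefix byte holds the data size in bits 0-1 (0, 1, 2, or 4 bytes), the item type in bits 2-3 (main, global, or local), and the tag in bits 4-7, followed by the data in little-endian order. The sketch below walks the example descriptor, which I encoded by hand from the items listed above, and decodes each item. It stops at long items (prefix 0xFE), which it does not handle, and the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Item types stored in bits 2-3 of the prefix byte. */
enum hid_item_type { HID_ITEM_MAIN = 0, HID_ITEM_GLOBAL = 1, HID_ITEM_LOCAL = 2 };

/* Hypothetical decoded form of one short item. */
struct hid_item {
    uint8_t type;  /* main, global, or local */
    uint8_t tag;   /* e.g. 0 = Usage Page for global items */
    uint8_t size;  /* data size in bytes: 0, 1, 2, or 4 */
    uint32_t data; /* raw little-endian data, zero-extended */
};

/* Decode the item at *pos and advance *pos past it. Returns false at the
 * end of the buffer, on truncated data, or on a long item (prefix 0xFE). */
bool hid_next_item(const uint8_t *buf, size_t len, size_t *pos,
                   struct hid_item *item)
{
    static const uint8_t data_sizes[4] = {0, 1, 2, 4};

    if (*pos >= len || buf[*pos] == 0xFE)
        return false;
    uint8_t prefix = buf[*pos];
    item->size = data_sizes[prefix & 0x03];
    item->type = (prefix >> 2) & 0x03;
    item->tag = prefix >> 4;
    if (len - *pos - 1 < item->size)
        return false;
    item->data = 0;
    for (uint8_t i = 0; i < item->size; i++)
        item->data |= (uint32_t)buf[*pos + 1 + i] << (8 * i);
    *pos += 1 + (size_t)item->size;
    return true;
}

/* Sign-extend the data, as needed for items like Logical Minimum. */
int32_t hid_item_signed(const struct hid_item *item)
{
    switch (item->size) {
    case 1: return (int8_t)item->data;
    case 2: return (int16_t)item->data;
    case 4: return (int32_t)item->data;
    default: return 0;
    }
}

/* The example mouse report descriptor, encoded by hand. */
const uint8_t mouse_report_desc[] = {
    0x05, 0x01, /* Usage Page (Generic Desktop) */
    0x09, 0x02, /* Usage (Mouse) */
    0xA1, 0x01, /* Collection (Application) */
    0x09, 0x01, /*   Usage (Pointer) */
    0xA1, 0x00, /*   Collection (Physical) */
    0x95, 0x03, /*     Report Count (3) */
    0x75, 0x01, /*     Report Size (1) */
    0x05, 0x09, /*     Usage Page (Buttons) */
    0x19, 0x01, /*     Usage Minimum (1) */
    0x29, 0x03, /*     Usage Maximum (3) */
    0x15, 0x00, /*     Logical Minimum (0) */
    0x25, 0x01, /*     Logical Maximum (1) */
    0x81, 0x02, /*     Input (Data, Variable, Absolute) */
    0x95, 0x01, /*     Report Count (1) */
    0x75, 0x05, /*     Report Size (5) */
    0x81, 0x01, /*     Input (Constant) */
    0x75, 0x08, /*     Report Size (8) */
    0x95, 0x02, /*     Report Count (2) */
    0x05, 0x01, /*     Usage Page (Generic Desktop) */
    0x09, 0x30, /*     Usage (X) */
    0x09, 0x31, /*     Usage (Y) */
    0x15, 0x81, /*     Logical Minimum (-127) */
    0x25, 0x7F, /*     Logical Maximum (127) */
    0x81, 0x06, /*     Input (Data, Variable, Relative) */
    0xC0,       /*   End Collection */
    0xC0        /* End Collection */
};
```

A real HID parser keeps a table of the current global and local item values and emits a field description each time it reaches a main item such as Input.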
When you move your pointing device or click a button then a report is sent to the host to process.
Kernel to libinput
kernel → libevdev → libinput → Compositor → Wayland client
The flow of events taken from libevdev documentation
After the pointer data is received by the kernel, it is exposed through evdev, read by libinput (through libevdev), and then picked up by the Wayland compositor. The event device interface, known as evdev, takes events from drivers like sermouse (for serial mice) and makes them available to userspace. It is generic across architectures and hardware.
input_report_rel(dev, REL_X, buf[3]);
Example from sermouse.c of creating events caused by movements on the X-axis
After the events are reported to evdev using the helpers from input.h, userspace can read them from a file like /dev/input/event0. There will be many of these event files, one for each of your input devices. To get events from one, you can simply open it and read from it. Writing all the event handling yourself can be a lot of work, so the libevdev library can do it for you, making it easy to open an event file and read events from it. The next piece is libinput, which uses libevdev to read events from the event files of all input devices and returns each of these events.
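Reading an event file directly is not much code: each read() returns struct input_event records from linux/input.h, and a batch of related events ends with an EV_SYN/SYN_REPORT event. The sketch below accumulates relative motion until it sees SYN_REPORT. The pointer_frame struct and read_pointer_frame function are hypothetical names, and unlike libevdev this sketch does not recover from SYN_DROPPED or track device state.

```c
#include <errno.h>
#include <linux/input.h>
#include <unistd.h>

/* Hypothetical accumulated motion between two SYN_REPORT events. */
struct pointer_frame {
    int dx, dy;
    int wheel;
};

/* Read events from an evdev file descriptor until SYN_REPORT.
 * Returns 1 for a complete frame, 0 on end of file, -1 on error. */
int read_pointer_frame(int fd, struct pointer_frame *frame)
{
    struct input_event ev;

    frame->dx = frame->dy = frame->wheel = 0;
    for (;;) {
        ssize_t n = read(fd, &ev, sizeof ev);
        if (n == 0)
            return 0;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if ((size_t)n != sizeof ev)
            return -1; /* evdev always delivers whole events */

        if (ev.type == EV_REL) {
            switch (ev.code) {
            case REL_X: frame->dx += ev.value; break;
            case REL_Y: frame->dy += ev.value; break;
            case REL_WHEEL: frame->wheel += ev.value; break;
            }
        } else if (ev.type == EV_SYN && ev.code == SYN_REPORT) {
            return 1;
        }
    }
}
```

You would pass it a descriptor from open("/dev/input/event0", O_RDONLY). Opening event files usually requires root or membership in a group such as input, depending on the distribution.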
Event | Description |
---|---|
LIBINPUT_EVENT_POINTER_BUTTON | When you click a button on your pointing device. The event data contains the state (pressed or released) and the button (left, middle, right, forward, back). |
LIBINPUT_EVENT_POINTER_MOTION | Pointer motion that is relative, for example move (-15, 20) pixels. It is associated with evdev event type EV_REL. |
LIBINPUT_EVENT_POINTER_MOTION_ABSOLUTE | Pointer motion that is absolute, for example a sudden jump to specific coordinates. It is associated with evdev event type EV_ABS. |
Pointer events from libinput
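#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <libinput.h>
#include <libudev.h>
static int open_restricted(const char *path, int flags, void *user_data)
{
    int fd = open(path, flags);
    return fd < 0 ? -errno : fd;
}
static void close_restricted(int fd, void *user_data)
{
    close(fd);
}
/**
 * libinput doesn't directly open and close files, instead we define two functions.
 */
static const struct libinput_interface interface = {
    // Opens a device from a path and returns a file descriptor (or a negative errno).
    .open_restricted = open_restricted,
    // Takes a file descriptor and closes it.
    .close_restricted = close_restricted,
};
int main(void) {
    struct libinput *li;
    struct libinput_event *event;
    struct udev *udev = udev_new();
    if (!udev)
        return 1;
    // Create libinput with udev backend. Using udev gives device hotplugging functionality
    // with events when devices are added and removed.
    li = libinput_udev_create_context(&interface, NULL, udev);
    // Assign the context to seat0; libinput then adds every device on that seat.
    if (!li || libinput_udev_assign_seat(li, "seat0") != 0) {
        if (li)
            libinput_unref(li);
        udev_unref(udev);
        return 1;
    }
    // Start reading events.
    libinput_dispatch(li);
    // This drains the events queued right now. A compositor would instead poll
    // libinput_get_fd(li) and call libinput_dispatch() whenever it is readable.
    while ((event = libinput_get_event(li)) != NULL) {
        // Handle the event here, for example relative pointer motion.
        if (libinput_event_get_type(event) == LIBINPUT_EVENT_POINTER_MOTION) {
            struct libinput_event_pointer *p = libinput_event_get_pointer_event(event);
            printf("motion %f %f\n", libinput_event_pointer_get_dx(p),
                   libinput_event_pointer_get_dy(p));
        }
        libinput_event_destroy(event);
        // Read more events.
        libinput_dispatch(li);
    }
    // Clear pending events, clean up memory.
    libinput_unref(li);
    udev_unref(udev);
    return 0;
}
Simple libinput program adapted from the documentation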
The libinput library can use udev to handle adding and removing the evdev devices it monitors internally. The alternative is to do this manually with the path backend and libinput_path_add_device.
Whether udev or path is used as the backend, each device is assigned to a seat. A seat is a collection of inputs like a keyboard and mouse. The vast majority of computers will just have one seat, seat0. You can view the devices on your computer and the seats they are assigned to with libinput list-devices.
Device: SINOWEALTH Game Mouse
Kernel: /dev/input/event9
Group: 8
Seat: seat0, default
Capabilities: pointer
Example pointer device from libinput list-devices
Compositor
A Wayland compositor is responsible for managing and placing clients (applications that want to show one or more windows) on your display. Wayland is the protocol clients and a compositor use to communicate; the clients and compositor can use a wide variety of libraries and software to accomplish this. The typical Wayland compositor uses libinput to get events for movements and button clicks. It keeps track of the movements to calculate which window should have pointer focus and where to render the cursor. After the compositor receives a button event from libinput, it passes the event to the currently focused window. To learn how the cursor is displayed on the screen, some understanding of how Wayland works is needed. I recommend looking at the Wayland documentation and reading the Wayland Book to get in-depth knowledge, since I will just cover the basics of implementing a compositor for Linux.
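The bookkeeping described above can be sketched in a few lines: add each relative motion to the cursor position, clamp it to the output, and search the windows from top to bottom for the first one containing the cursor. The rect struct, the single output, and storing windows topmost first are simplifying assumptions of mine, not how any particular compositor is structured. The surface-local position it returns is the kind of coordinate a compositor sends to the focused client in wl_pointer enter and motion events.

```c
#include <stddef.h>

/* Hypothetical window geometry in output (global) coordinates. */
struct rect {
    double x, y, width, height;
};

/* Cursor position in output coordinates. */
struct cursor_state {
    double x, y;
};

/* Apply one relative motion (for example from libinput's dx/dy) and keep
 * the cursor inside a single output of the given size. */
void cursor_move(struct cursor_state *c, double dx, double dy,
                 double output_width, double output_height)
{
    c->x += dx;
    c->y += dy;
    if (c->x < 0) c->x = 0;
    if (c->y < 0) c->y = 0;
    if (c->x > output_width - 1) c->x = output_width - 1;
    if (c->y > output_height - 1) c->y = output_height - 1;
}

/* Find the window under (x, y); windows[0] is the topmost window. Returns
 * its index and the surface-local position, or -1 for the background. */
int window_at(const struct rect *windows, size_t count, double x, double y,
              double *local_x, double *local_y)
{
    for (size_t i = 0; i < count; i++) {
        const struct rect *w = &windows[i];
        if (x >= w->x && x < w->x + w->width &&
            y >= w->y && y < w->y + w->height) {
            *local_x = x - w->x;
            *local_y = y - w->y;
            return (int)i;
        }
    }
    return -1;
}
```

When the index returned by window_at changes, the compositor would send a leave event to the old client and an enter event to the new one.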
The client draws itself and passes the pixel data to the compositor. Commonly, shared memory is used to send this data between the client and compositor. The data is associated with a wl_buffer, which is a buffer shared between the client and compositor. A wl_buffer is attached to a wl_surface, which is the object that displays the client on the screen. Clients generally use double buffering to reduce tearing. This can be implemented by having two wl_buffer objects and switching between them.
struct wl_callback *frame_callback;
// Create a callback for when the frame is done; this means the client should draw a new buffer.
frame_callback = wl_surface_frame(surface);
// The frame_callback_listener should have a done callback.
wl_callback_add_listener(frame_callback, &frame_callback_listener, state);
// Attach the first buffer to the surface at offset (0, 0), mark it damaged, and commit.
wl_surface_attach(surface, buffer_front, 0, 0);
wl_surface_damage_buffer(surface, 0, 0, INT32_MAX, INT32_MAX);
wl_surface_commit(surface);
wl_display_flush(display);
// In the done callback: destroy the frame callback with wl_callback_destroy() and request
// a new one, then attach the second buffer. The next callback should revert to
// using the first buffer and so on.
wl_surface_attach(surface, buffer_back, 0, 0);
wl_surface_damage_buffer(surface, 0, 0, INT32_MAX, INT32_MAX);
wl_surface_commit(surface);
wl_display_flush(display);
A Wayland surface can have roles other than displaying a client's application window. The client can attach a buffer containing a cursor image to a surface and pass that surface to the wl_pointer set_cursor request. This tells the compositor to show the custom cursor while the pointer is over one of the client's surfaces, such as the application window. The data in the surface with the cursor role can come from an Xcursor file. Xcursor is a file format for storing cursor images. The format supports animation, since a cursor can be animated, like a spinner. It implements this by having each image of a cursor set a delay to wait before showing the next image in the sequence.
The client can use the Xcursor library to load a cursor and put each image into the surface, waiting for the delay between each update.
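Picking which image of an animated cursor to show comes down to summing the delays. The sketch below assumes the delay of each image has already been loaded from the Xcursor file into an array and that the animation loops forever; the function name is hypothetical. The wayland-cursor library that ships with Wayland offers similar logic through wl_cursor_frame().

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the image to show `elapsed_ms` milliseconds after
 * the animation started. delays[i] is the delay field of image i. */
size_t cursor_frame_at(const uint32_t *delays, size_t count, uint64_t elapsed_ms)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += delays[i];
    /* A single image, or all-zero delays, means a static cursor. */
    if (count <= 1 || total == 0)
        return 0;

    /* Position within the current loop of the animation. */
    uint64_t t = elapsed_ms % total;
    for (size_t i = 0; i < count; i++) {
        if (t < delays[i])
            return i;
        t -= delays[i];
    }
    return 0; /* not reached, since t < total */
}
```

A client would attach the chosen image's buffer to the cursor surface and schedule the next update for when the remaining part of that image's delay runs out.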
Field | Value | Description |
---|---|---|
header | 36 | Image headers are 36 bytes |
type | 0xfffd0002 | Image type is 0xfffd0002 |
subtype | CARD32 | Image subtype is the nominal size |
version | 1 | |
width | CARD32 | Must be less than or equal to 0x7fff |
height | CARD32 | Must be less than or equal to 0x7fff |
xhot | CARD32 | Must be less than or equal to width |
yhot | CARD32 | Must be less than or equal to height |
delay | CARD32 | Delay between animation frames in milliseconds |
pixels | LISTofCARD32 | Packed ARGB format pixels |
Format of an image in an Xcursor file taken from the specification
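The image chunk in this table is nine little-endian CARD32 values followed by the pixels, so checking one is mostly a matter of enforcing the listed constraints. The sketch below parses and validates that header from a byte buffer; locating the chunk inside a full Xcursor file (through the file header's table of contents) is left out, and the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define XCURSOR_IMAGE_TYPE   0xfffd0002u
#define XCURSOR_IMAGE_HEADER 36u
#define XCURSOR_MAX_SIZE     0x7fffu

/* Hypothetical in-memory form of the image header fields in the table. */
struct xcursor_image_header {
    uint32_t header, type, subtype, version;
    uint32_t width, height, xhot, yhot, delay;
};

/* Xcursor stores CARD32 values in little-endian byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Parse and validate one image chunk. On success the width * height
 * ARGB pixels start at chunk + 36. */
bool parse_xcursor_image(const uint8_t *chunk, size_t len,
                         struct xcursor_image_header *h)
{
    if (len < XCURSOR_IMAGE_HEADER)
        return false;
    uint32_t *fields[9] = {&h->header, &h->type, &h->subtype, &h->version,
                           &h->width, &h->height, &h->xhot, &h->yhot,
                           &h->delay};
    for (size_t i = 0; i < 9; i++)
        *fields[i] = read_le32(chunk + 4 * i);

    if (h->header != XCURSOR_IMAGE_HEADER || h->type != XCURSOR_IMAGE_TYPE ||
        h->version != 1)
        return false;
    if (h->width > XCURSOR_MAX_SIZE || h->height > XCURSOR_MAX_SIZE)
        return false;
    if (h->xhot > h->width || h->yhot > h->height)
        return false;
    /* The pixel data must fit in the rest of the buffer. */
    size_t pixel_count = (size_t)h->width * h->height;
    return (len - XCURSOR_IMAGE_HEADER) / 4 >= pixel_count;
}
```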
Surfaces are created through the wl_compositor global, and the compositor keeps track of them. They are combined into a single displayable output by a renderer. The rendering can be done in various ways, but the most common method is with OpenGL ES (GLES), with EGL used for the context (basically the state for GLES) and other window system functionality. EGL is cross-platform and, unlike alternatives such as GLX, does not depend on a particular window system. To see an example of how a Wayland compositor can implement a GLES renderer, you can look at the wlroots GLES 2.0 implementation. The rendered content still needs a way to be displayed.
The Direct Rendering Manager (DRM) is the kernel subsystem for interacting with GPUs. Part of it, Kernel Mode-Setting (KMS), allows configuring the display pipeline, and compositors use it to show their rendered output on your monitor. The display pipeline is the sequence of steps from a framebuffer provided by userspace to the signal sent out of a connector:
- Provide a framebuffer (pixel data) to a plane. There are three types of planes the GPU provides:
  - Primary: Required plane used for primary content (like compositor output), page flipping (double buffering), and mode setting.
  - Cursor: Optional plane for a hardware cursor.
  - Overlay: Optional plane for all other content to be displayed. Unlike the other two types, the GPU can provide more than one overlay plane.
- The plane takes the framebuffer and applies cropping, scaling, rotation, and other properties to the source.
- The CRTC takes all the planes and blends them together.
- The encoder takes the CRTC data and converts it to a format the connector uses, like VGA or HDMI.
- The connector receives the encoder data and sends it to the display.
I recommend the talk about KMS that Simon Ser did for foss-north for more information.
Diagram of display pipeline for KMS
You probably noticed the mention of the cursor plane; this is where the distinction between hardware and software cursors comes from.
Hardware cursor
A hardware cursor means that the primary framebuffer is not modified when a person moves the pointing device. Moving the cursor only changes the position of the cursor plane, and changing the cursor image only replaces the cursor plane's framebuffer. This improves performance because the rest of the output does not have to be rendered again every time the pointer moves.
Software cursor
Some hardware cannot use a cursor plane, for example because it is old or does not support one. In that case a software cursor is used, which means the cursor is rendered into the same framebuffer as the primary content.
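Drawing a software cursor means compositing the cursor image over the already rendered output, offset by its hotspot so the hotspot lands on the pointer position, and clipped to the framebuffer. The sketch below does this on the CPU for a linear 32-bit framebuffer, assuming the cursor pixels are premultiplied-alpha ARGB; a real compositor would more likely draw the cursor as one more textured quad with its GLES renderer. The framebuffer struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical linear 32-bit framebuffer (0xAARRGGBB per pixel). */
struct framebuffer {
    uint32_t *pixels;
    int width, height;
    int stride; /* pixels per row */
};

/* Premultiplied "over" for one channel: dst' = src + dst * (1 - alpha). */
static uint8_t blend_channel(uint32_t src, uint32_t dst, uint32_t alpha)
{
    return (uint8_t)(src + (dst * (255 - alpha) + 127) / 255);
}

/* Draw a cursor image so that its hotspot lands on (pointer_x, pointer_y),
 * skipping pixels that fall outside the framebuffer. */
void draw_software_cursor(struct framebuffer *fb, const uint32_t *cursor,
                          int cursor_w, int cursor_h, int xhot, int yhot,
                          int pointer_x, int pointer_y)
{
    int origin_x = pointer_x - xhot;
    int origin_y = pointer_y - yhot;

    for (int y = 0; y < cursor_h; y++) {
        int fb_y = origin_y + y;
        if (fb_y < 0 || fb_y >= fb->height)
            continue;
        for (int x = 0; x < cursor_w; x++) {
            int fb_x = origin_x + x;
            if (fb_x < 0 || fb_x >= fb->width)
                continue;
            uint32_t src = cursor[y * cursor_w + x];
            uint32_t *dst = &fb->pixels[fb_y * fb->stride + fb_x];
            uint32_t alpha = src >> 24;
            uint32_t out = 0xFF000000u; /* the output itself stays opaque */
            for (int shift = 0; shift < 24; shift += 8)
                out |= (uint32_t)blend_channel((src >> shift) & 0xFF,
                                               (*dst >> shift) & 0xFF,
                                               alpha) << shift;
            *dst = out;
        }
    }
}
```

Because the cursor is baked into the primary framebuffer, every movement also requires redrawing the area under the cursor's old position, which is part of why hardware cursors are preferred when available.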
References
I browsed through many dozens of websites researching this article. I linked to some of them in the article, but here are a few more that I found useful.
Input Protocols
- https://www.eecg.utoronto.ca/~jayar/ece241_08F/AudioVideoCores/ps2/ps2.html
- https://learn.microsoft.com/en-us/windows-hardware/drivers/hid/
- https://www.cpcwiki.eu/index.php/Serial_RS232_Mouse
- https://wiki.osdev.org/PS/2_Mouse
- https://isdaman.com/alsos/hardware/mouse/ps2interface.htm
EGL
- https://jan.newmarch.name/Wayland/EGL/
- https://ppaalanen.blogspot.com/2012/03/what-does-egl-do-in-wayland-stack.html
DRM/KMS
- https://embear.ch/blog/drm-framebuffer
- https://gitlab.freedesktop.org/daniels/kms-quads
- https://events.static.linuxfound.org/sites/events/files/slides/brezillon-drm-kms.pdf
- https://landley.net/kdocs/htmldocs/drm.html