How pointing devices work in Linux

Apr 19, 2023


As part of my project of building a compositor for Wayland (and X11 through Xwayland), I have had to learn about how pointing devices work in Linux. If you’re unfamiliar with window systems like Wayland and X11, you can read my blog post on building a window manager. The pointer is one of the most common ways to interact with a computer, so knowledge about how it works can be useful. In this article I will explain how this input device is managed, how it interacts with the operating system and applications, and how it is shown on the display with an image.

Cursor vs Pointer

When reading documentation and code, you will frequently see both pointer and cursor used. Confusingly, some people use them to mean the same thing, while others use them to mean separate things. The Wayland docs refer to the pointer as a representation of the input device, and to the pointer image as a cursor. Since my compositor project is for Wayland, I will use their terminology.

Hardware to the kernel

One of the jobs of a kernel is to manage I/O, in our case a pointing device such as a serial, PS/2, USB, or Bluetooth mouse. Each of these interfaces uses a different protocol to send mouse movements and button presses.

Serial

By serial mouse, people typically mean a mouse with a 9-pin D-sub connector, known as DE-9 (or DB-9). Other connectors can be used, like DB-25, and there are other serial mice such as the DEC VSXXX hockey-puck mouse. What ties all of these mice together is the use of the RS-232 standard for communication. This standard specifies the voltage levels for transmitting and receiving signals, in our case between the mouse and the computer. Computer buses can be parallel, so to communicate through the serial port data is passed through a controller, typically a UART IC. This also handles the encoding and decoding of data to and from the signals passed through the RS-232 connectors. The UART registers can be accessed through I/O addresses, such as 0x3F8 for the first port. There are two protocols, the Microsoft and the Mouse Systems Corporation protocol. The Microsoft one was more popular from what I have read, so it is what I will cover. The data for this protocol is encapsulated in a frame that has 9 bits.

Bits 0-7:  Data (least significant bit first)
Bit 8:     Always 1

Diagram of the RS-232 frame

The bytes sent through these frames from the device to the computer contain mouse movements and button presses. The basic serial mouse has two buttons and can only be moved across the X and Y axes. The Microsoft mouse protocol uses 3 bytes to send this data. Although there are 8 usable data bits in a frame, only 6 of them carry mouse data. This means that to send a full 8-bit value, the top 2 bits of the X and Y movements are put in the first byte of the protocol, as the decoding sketch after the list below shows.

Byte 1:  Unused | Always 1 | Left Button (1 = pressed) | Right Button (1 = pressed) | Y7 | Y6 | X7 | X6
Byte 2:  Unused | Always 0 | X5 | X4 | X3 | X2 | X1 | X0
Byte 3:  Unused | Always 0 | Y5 | Y4 | Y3 | Y2 | Y1 | Y0
(Bit 7 on the left, Bit 0 on the right)

Diagram of the Microsoft protocol used for serial mice

  1. The first byte for button state and some bits of the X/Y movements.
  2. The second byte for X-axis movements.
  3. The third byte for Y-axis movements.
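
To make the packing concrete, here is a minimal sketch of decoding one 3-byte Microsoft-protocol packet into button states and signed movement deltas. The names are only for illustration, and the framing bits are assumed to already be stripped by the UART.

#include <stdint.h>

struct ms_mouse_packet {
    int left, right;   /* button states, 1 = pressed */
    int8_t dx, dy;     /* signed relative movement */
};

/* Decode one packet following the byte layout in the diagram above. */
static struct ms_mouse_packet ms_decode(const uint8_t b[3])
{
    struct ms_mouse_packet p;
    p.left  = (b[0] >> 5) & 1;
    p.right = (b[0] >> 4) & 1;
    /* X7/X6 come from bits 1-0 of byte 1, X5..X0 from byte 2. */
    p.dx = (int8_t)(((b[0] & 0x03) << 6) | (b[1] & 0x3f));
    /* Y7/Y6 come from bits 3-2 of byte 1, Y5..Y0 from byte 3. */
    p.dy = (int8_t)(((b[0] & 0x0c) << 4) | (b[2] & 0x3f));
    return p;
}

Sketch of decoding a Microsoft-protocol mouse packet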

PS/2

The PS/2 mouse also uses a serial protocol, but with a different port. The name PS/2 comes from the IBM Personal System/2 that introduced a 6-pin mini-DIN connector to replace the older DE-9 connector. Other companies started using the mini-DIN connector for the keyboard and mouse port, but referred to it as the PS/2 port. The 8042 controller was used by these computers to manage a PS/2 device. It has an I/O address of 0x64 for the status/command register and 0x60 for reading and writing data (a small sketch of reading from it follows the frame diagram below). The data sent over this connector uses the PS/2 protocol. With this protocol a frame contains 11 or 12 bits.

Bit 0:     Always 0
Bits 1-8:  Data (least significant bit first)
Bit 9:     Parity
Bit 10:    Always 1
Bit 11:    Ack (host only)

Diagram of the PS/2 frame
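
As a rough illustration of the controller side mentioned above, here is a freestanding sketch of polling the 8042 status register at 0x64 and reading the data register at 0x60. It assumes x86 port I/O is available (for example inside a kernel or hobby OS), so it is not something you would run as a normal userspace program.

#include <stdint.h>

/* Read one byte from an x86 I/O port using the usual inline-assembly idiom. */
static inline uint8_t inb(uint16_t port)
{
    uint8_t value;
    __asm__ volatile ("inb %1, %0" : "=a"(value) : "Nd"(port));
    return value;
}

/* Wait until the 8042 output buffer is full (bit 0 of the status register),
 * then read the byte the device sent. */
static uint8_t ps2_read_byte(void)
{
    while ((inb(0x64) & 0x01) == 0)
        ;
    return inb(0x60);
}

Sketch of reading a byte from the 8042 controller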

The data received in these frames uses the PS/2 mouse protocol, which is 3 or 4 bytes long. The default format contains 3 bytes:

Byte 1:  Y Overflow | X Overflow | Y Signed | X Signed | Always 1 | Middle Button | Right Button | Left Button
Byte 2:  X7 | X6 | X5 | X4 | X3 | X2 | X1 | X0
Byte 3:  Y7 | Y6 | Y5 | Y4 | Y3 | Y2 | Y1 | Y0
(Bit 7 on the left, Bit 0 on the right)

Diagram of the PS/2 protocol used for mice

  1. The first byte for button state and the X/Y overflow and sign bits.
  2. The second byte for X-axis movements.
  3. The third byte for Y-axis movements.
  4. The fourth byte is an optional extension byte for Z-axis movements (scroll).

It is interesting to see the differences between PS/2 and Microsoft serial mice. The serial mouse uses 8 data bits, 1 stop bit, and no parity bit. The PS/2 mouse uses 8 data bits, a parity bit, a start bit, a stop bit, and an acknowledge bit. Since the serial mouse only has 6 usable data bits per byte, it has to pack an extra 4 bits of X/Y movement into the byte containing the button states. The PS/2 mouse uses the extra room in the first byte to add overflow and sign bits to the movement data.
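
As a companion to the serial example above, here is a hedged sketch of turning one default 3-byte PS/2 packet into button states and signed deltas, following the byte layout shown earlier (again, the names are only illustrative):

#include <stdint.h>

struct ps2_mouse_packet {
    int left, right, middle;  /* button states, 1 = pressed */
    int dx, dy;               /* signed relative movement */
};

static struct ps2_mouse_packet ps2_decode(const uint8_t b[3])
{
    struct ps2_mouse_packet p;
    p.left   = b[0] & 0x01;
    p.right  = (b[0] >> 1) & 1;
    p.middle = (b[0] >> 2) & 1;
    /* Bytes 2 and 3 hold the low 8 bits of the movement; the sign bits in
     * byte 1 extend them to 9-bit two's-complement values. */
    p.dx = b[1] - ((b[0] & 0x10) ? 256 : 0);
    p.dy = b[2] - ((b[0] & 0x20) ? 256 : 0);
    return p;
}

Sketch of decoding a PS/2 mouse packet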

USB and Bluetooth

It depends on your processor and motherboard, but most modern machines have controllers that bridge USB/Bluetooth with PCIe. The USB and Bluetooth I/O subsystems (a subsystem is just a part of the Linux kernel that provides some functionality) handle devices that are newly connected (hotplugged) or that already exist when the host boots.

While the I/O subsystem directly manages devices, it does not interact with the higher level world of handling pointer movements or other device actions. USB and Bluetooth use several layers of protocols to send data between the device and host. For pointing devices, the protocol used to send the actual pointer data is the Human Interface Device (HID) protocol. The HID protocol was created as an abstract protocol for all kinds of external devices used by people. This includes webcams, fingerprint readers, mice, keyboards, and pretty much all other modern peripherals.

Since the I/O subsystem does not understand the HID protocol, another subsystem is required to use it. The job of the transport driver (such as USB-HID) is to link HID core with the I/O driver. HID core handles HID data (reports) between the device and host, the userspace API, and management of the HID devices. The transport driver fills in a hid_ll_driver structure to receive callbacks for HID actions.

The HID specification defines two protocols that can be used to interact with HID devices. One is the boot protocol, and the other the report protocol. The boot protocol is used by the BIOS, embedded systems, and in general non-USB-aware systems. It is simpler and has fewer features than the report protocol. With the boot protocol, keyboards and mice work in a limited manner, but things like Z-axis movements (scroll) do not work. For this reason I will cover the report protocol.

There are many high level concepts in the HID specification, but here are some of the most important:

  • request - Data sent between the host and device. There are standard requests defined by a specific format, such as Get_Descriptor.
  • report - A data structure sent from the device to the host or vice versa, typically delivered as a USB interrupt transfer.
  • descriptor - A hierarchy of data structures that describe the device: its configuration, its interfaces, its endpoints (basically addresses used to send and receive data; each one is either IN or OUT), its report formats, and localizations.
Device Descriptor
    Configuration Descriptor
        Interface Descriptor (one or more)
            HID Descriptor
            Endpoint Descriptor

Diagram of how descriptors relate to each other

  • class - A type of device such as audio, thermometers, keyboard, mouse, and many others.

When a HID device is connected, the host needs to know the vendor, product, configuration, interfaces, endpoints, format of reports, and so on for the device. To get this information the host retrieves the device descriptor through a Get_Descriptor request. From the device descriptor it finds and requests the other descriptors for the device. A userspace sketch of issuing such a request follows the table below.

Part Standard USB Descriptor HID Class Descriptor Notes
bmRequestType 10000000 10000001 Bits 0-4 indicate the recipient of the request: device (0), interface (1), endpoint (2), or other (3). Bits 5-6 are the type, such as standard (0), class (1), vendor (2), or reserved (3). Bit 7 is the direction, either host to device (0) or device to host (1).
bRequest GET_DESCRIPTOR (0x06) GET_DESCRIPTOR (0x06) The request type.
wValue Descriptor Type and Descriptor Index Descriptor Type and Descriptor Index The descriptor type in the high byte (like configuration or interface) and descriptor index in the low byte (if there are multiple descriptors of the type, which to retrieve).
wIndex 0 (zero) or Language ID Interface Number The string descriptor requires a Language ID to handle localization. If we were getting a string descriptor with English, the wIndex would be 0x0009.
wLength Descriptor Length Descriptor Length The number of bytes to transfer.
Data Descriptor Descriptor For IN it is the bytes to hold the incoming data. For OUT it is the bytes to send.

Example Get_Descriptor request from the specification
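
From userspace, a request shaped like the standard column above can be issued with libusb. This is a minimal sketch of fetching the 18-byte device descriptor; handle is assumed to be an already-opened libusb_device_handle and error handling is skipped.

#include <libusb-1.0/libusb.h>

unsigned char buf[18]; /* a standard device descriptor is 18 bytes long */

/* bmRequestType 10000000: device-to-host, standard request, device recipient. */
int n = libusb_control_transfer(handle,
        LIBUSB_ENDPOINT_IN | LIBUSB_REQUEST_TYPE_STANDARD | LIBUSB_RECIPIENT_DEVICE,
        LIBUSB_REQUEST_GET_DESCRIPTOR,  /* bRequest 0x06 */
        (LIBUSB_DT_DEVICE << 8) | 0,    /* wValue: descriptor type in the high byte, index 0 */
        0,                              /* wIndex */
        buf, sizeof(buf), 1000);        /* data buffer, wLength, timeout in milliseconds */

Sketch of a Get_Descriptor request with libusb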

The host will get the report descriptor for the mouse defining the data structure of reports. This allows the host to parse the items received in reports.

Usage Page (Generic Desktop),
Usage (Mouse),
Collection (Application),
    Usage (Pointer),
    Collection (Physical),
        Report Count (3),
        Report Size (1),
        Usage Page (Buttons),
        Usage Minimum (1),
        Usage Maximum (3),
        Logical Minimum (0),
        Logical Maximum (1),
        Input (Data, Variable, Absolute),
        Report Count (1),
        Report Size (5),
        Input (Constant),
        Report Size (8),
        Report Count (2),
        Usage Page (Generic Desktop),
        Usage (X),
        Usage (Y),
        Logical Minimum (-127),
        Logical Maximum (127),
        Input (Data, Variable, Relative),
    End Collection,
End Collection

Example report descriptor from the specification

Each of these lines is known as an item and has a unique number associated with it. For example Usage Page (Generic Desktop) is 0x05, 0x01. A device can have multiple reports; they are grouped together in a report descriptor with a collection item. Each report will have some data it is trying to send, which is defined with a usage item. In the case above the usage items describe the mouse state. A usage will have bounds set with a logical minimum and logical maximum, the minimum and maximum values of the usage respectively. The input item defines how the data above it should be interpreted, for example relative or absolute, and marks the end of a section of data.

When you move your pointing device or click a button, a report is sent to the host to process.
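
Given the report descriptor above, every input report the mouse sends is three bytes: three button bits plus five constant padding bits, then relative X and Y as signed bytes. Here is a hedged sketch of unpacking one such report (names are illustrative):

#include <stdint.h>

struct mouse_report {
    int button1, button2, button3;  /* 1 = pressed */
    int8_t dx, dy;                  /* relative movement, -127..127 */
};

static struct mouse_report unpack_report(const uint8_t r[3])
{
    struct mouse_report m;
    m.button1 = r[0] & 0x01;        /* bits 0-2 are the three buttons */
    m.button2 = (r[0] >> 1) & 1;
    m.button3 = (r[0] >> 2) & 1;    /* bits 3-7 are the constant padding */
    m.dx = (int8_t)r[1];            /* Usage (X), Logical Minimum/Maximum -127..127 */
    m.dy = (int8_t)r[2];            /* Usage (Y) */
    return m;
}

Sketch of unpacking a report matching the descriptor above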

Kernel to libinput

kernel → libevdev → libinput → Compositor → Wayland client

The flow of events taken from libevdev documentation

After the pointer data is received by the kernel, it goes to evdev, then to libinput, where it is picked up by the Wayland compositor. The event device subsystem, known as evdev, takes events from drivers like sermouse (for serial mice) and makes them available to userspace. It is generic across architectures and hardware.

input_report_rel(dev, REL_X, buf[3]);

Example from sermouse.c of creating an event caused by movement on the X-axis

After the events are reported to evdev through the helpers from input.h, they can be read by userspace from a file like /dev/input/event0. There will be a whole bunch of these event files created for your various devices. If you want to get events from a file, just open and read it. It can be a lot of work to handle all the event parsing yourself, so the library libevdev can do that for you. This makes opening an event file and reading events from it easy. The next piece is libinput, which uses libevdev to read events from all the event files for input devices and returns each of these events.
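
For example, a minimal sketch of opening one event file with libevdev and printing relative motion events could look like this. It assumes /dev/input/event0 is a pointing device you have permission to read.

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <libevdev/libevdev.h>

int main(void)
{
    int fd = open("/dev/input/event0", O_RDONLY | O_NONBLOCK);
    struct libevdev *dev = NULL;
    libevdev_new_from_fd(fd, &dev);

    printf("Reading from: %s\n", libevdev_get_name(dev));

    struct input_event ev;
    int rc;
    do {
        // SUCCESS means an event was read; -EAGAIN means there is nothing to read yet.
        rc = libevdev_next_event(dev, LIBEVDEV_READ_FLAG_NORMAL, &ev);
        if (rc == LIBEVDEV_READ_STATUS_SUCCESS && ev.type == EV_REL &&
            (ev.code == REL_X || ev.code == REL_Y))
            printf("%s moved by %d\n", ev.code == REL_X ? "X" : "Y", ev.value);
    } while (rc == LIBEVDEV_READ_STATUS_SUCCESS || rc == -EAGAIN);

    libevdev_free(dev);
    return 0;
}

Sketch of reading pointer events with libevdev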

Event Description
LIBINPUT_EVENT_POINTER_BUTTON When you click a button on your pointing device. The event data contains the state (pressed or released) and the button (left, middle, right, forward, back).
LIBINPUT_EVENT_POINTER_MOTION Pointer motion that is relative, for example move (-15, 20) pixels. It is associated with evdev event code EV_REL.
LIBINPUT_EVENT_POINTER_MOTION_ABSOLUTE Pointer motion that is absolute, for example a sudden jump to specific coordinates. It is associated with evdev event code EV_ABS.

Pointer events from libinput

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <libudev.h>
#include <libinput.h>

static int open_restricted(const char *path, int flags, void *user_data)
{
    int fd = open(path, flags);
    return fd < 0 ? -errno : fd;
}

static void close_restricted(int fd, void *user_data)
{
    close(fd);
}

/**
 * libinput doesn't directly open and close files; instead we define two functions.
 */
static const struct libinput_interface interface = {
    // Opens a device from a path and returns a file descriptor.
    .open_restricted = open_restricted,
    // Takes a file descriptor and closes it.
    .close_restricted = close_restricted,
};

int main(void) {
    struct libinput *li;
    struct libinput_event *event;
    struct udev *udev = udev_new();

    // Create libinput with the udev backend. Using udev gives device hotplugging functionality
    // with events when devices are added and removed.
    li = libinput_udev_create_context(&interface, NULL, udev);
    // Tell libinput to use the devices assigned to seat0.
    libinput_udev_assign_seat(li, "seat0");
    // Start reading events.
    libinput_dispatch(li);

    while ((event = libinput_get_event(li)) != NULL) {

        // handle the event here

        libinput_event_destroy(event);

        // Read more events.
        libinput_dispatch(li);
    }

    // Clear pending events, clean up memory.
    libinput_unref(li);
    udev_unref(udev);

    return 0;
}

Simple libinput program from the documentation
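
The example leaves a placeholder where events are handled. A hedged sketch of what that part could look like for pointer events, using the libinput accessors for motion and buttons:

switch (libinput_event_get_type(event)) {
case LIBINPUT_EVENT_POINTER_MOTION: {
    struct libinput_event_pointer *p = libinput_event_get_pointer_event(event);
    double dx = libinput_event_pointer_get_dx(p);
    double dy = libinput_event_pointer_get_dy(p);
    // A compositor would move the cursor by (dx, dy) and recalculate pointer focus here.
    break;
}
case LIBINPUT_EVENT_POINTER_BUTTON: {
    struct libinput_event_pointer *p = libinput_event_get_pointer_event(event);
    uint32_t button = libinput_event_pointer_get_button(p);
    enum libinput_button_state state = libinput_event_pointer_get_button_state(p);
    // A compositor would forward the button and its state to the focused client here.
    break;
}
default:
    break;
}

Sketch of handling pointer events from libinput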

The libinput library can use the udev subsystem to handle adding and removing the evdev devices it monitors internally. The alternative is to do this manually with libinput_path_add_device. Whether the udev or path backend is used, each device is assigned to a seat. A seat is a collection of inputs like a keyboard and mouse. The vast majority of computers will just have one seat, seat0. You can view the devices on your computer and which seats they are assigned to with libinput list-devices.

Device:           SINOWEALTH Game Mouse
Kernel:           /dev/input/event9
Group:            8
Seat:             seat0, default
Capabilities:     pointer

Example pointer device from libinput list-devices

Compositor

A Wayland compositor is responsible for managing and placing clients (an application that wants to show one or more windows) on your display. Wayland is the protocol used by clients and a compositor to communicate; the clients and compositor can use a wide variety of libraries/software to accomplish this. The typical Wayland compositor uses libinput to get events for movements and button clicks. It keeps track of the movements to calculate which window should have focus of the pointer and where to render the cursor. After the compositor receives a button event from libinput, it passes the event to the currently focused window. To learn how the cursor is displayed on the screen, some understanding of how Wayland works is needed. I recommend looking at the Wayland documentation and reading the Wayland Book to get in-depth knowledge, since I will just cover the basics of implementing a compositor for Linux.

The client draws itself and passes the pixel data to the compositor. Shared memory is used to send the data between the client and compositor. This data is associated with a wl_buffer, which is a buffer shared between the client and compositor. A wl_buffer is attached to a wl_surface, which is the object that displays the client on the screen. Double buffering is generally used by the client to reduce tearing. This can be implemented by having two wl_buffer objects and switching between them.

struct wl_callback *frame_callback;

// Create a callback for when the frame is done; this means the client should draw a new buffer.
frame_callback = wl_surface_frame(surface);
// The frame_callback_listener should have a done callback.
wl_callback_add_listener(frame_callback, &frame_callback_listener, state);

// Attach the first buffer to the surface.
wl_surface_attach(surface, buffer_front, 0, 0);
wl_surface_commit(surface);
wl_display_flush(display);

// Here we should destroy the frame callback and request a new one.

// Attach the second buffer to the surface in the done callback; the next callback should revert to
// using the first buffer and so on.
wl_surface_attach(surface, buffer_back, 0, 0);
wl_surface_commit(surface);
wl_display_flush(display);

A Wayland surface can be used in roles other than displaying a client’s application window. The client can set a surface’s buffer to a cursor image and call the wl_pointer set_cursor request. This tells the compositor to show the custom cursor when the pointer focus is on one of the client’s surfaces, such as the application window. The data in the surface with the cursor role can come from an Xcursor file. Xcursor is a file format for storing the images of a cursor. The format supports animations, since a cursor can be animated, like a spinner. It implements this by specifying that each image of a cursor sets a delay to wait before showing the next image in the sequence. The client can use the Xcursor library to load a cursor and put each image into the surface while waiting for the delay between each update. A sketch of setting a cursor this way follows the table below.

Parameter Type or Value Description
header 36 Image headers are 36 bytes
type 0xfffd0002 Image type is 0xfffd0002
subtype CARD32 Image subtype is the nominal size
version 1
width CARD32 Must be less than or equal to 0x7fff
height CARD32 Must be less than or equal to 0x7fff
xhot CARD32 Must be less than or equal to width
yhot CARD32 Must be less than or equal to height
delay CARD32 Delay between animation frames in milliseconds
pixels LISTofCARD32 Packed ARGB format pixels

Format of an image in an Xcursor file taken from the specification
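
Here is a small client-side sketch of the set_cursor flow described above, using the wayland-cursor helper library, which loads cursor themes stored in the Xcursor format. It assumes shm, compositor, pointer, and the serial from the latest wl_pointer.enter event already exist, and it only shows the first image of the cursor.

#include <wayland-client.h>
#include <wayland-cursor.h>

// Load the default cursor theme at size 24 and pick the standard arrow cursor.
struct wl_cursor_theme *theme = wl_cursor_theme_load(NULL, 24, shm);
struct wl_cursor *cursor = wl_cursor_theme_get_cursor(theme, "left_ptr");
struct wl_cursor_image *image = cursor->images[0];
struct wl_buffer *buffer = wl_cursor_image_get_buffer(image);

// Give the cursor image its own surface.
struct wl_surface *cursor_surface = wl_compositor_create_surface(compositor);
wl_surface_attach(cursor_surface, buffer, 0, 0);
wl_surface_commit(cursor_surface);

// Ask the compositor to show this surface as the cursor while our surfaces have pointer focus.
wl_pointer_set_cursor(pointer, serial, cursor_surface,
                      image->hotspot_x, image->hotspot_y);

Sketch of setting a cursor with the wayland-cursor helper library

For an animated cursor, the client would cycle through cursor->images and wait image->delay milliseconds between frames.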

The surfaces are stored by the wl_compositor object. They are combined into a single displayable output by a renderer. The rendering can be done in various ways, but the most common method is OpenGL ES (GLES), with EGL used for the context (basically state for GLES) and other window system functionality. EGL is cross-platform and has no dependencies on other window systems, unlike alternatives such as GLX. To see an example of how a Wayland compositor can implement a GLES renderer, you can look at the wlroots GLES 2.0 implementation. The rendered content still needs a way to be displayed.
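
As a rough idea of what the EGL side involves, here is a minimal sketch of creating a GLES 2.0 context. A real compositor would use a platform display (for example GBM on top of DRM) instead of the default display and check every return value.

#include <EGL/egl.h>

static EGLContext create_gles2_context(EGLDisplay *out_display)
{
    static const EGLint config_attribs[] = {
        EGL_SURFACE_TYPE, EGL_WINDOW_BIT,
        EGL_RENDERABLE_TYPE, EGL_OPENGL_ES2_BIT,
        EGL_NONE
    };
    static const EGLint context_attribs[] = {
        EGL_CONTEXT_CLIENT_VERSION, 2,
        EGL_NONE
    };

    // Connect to a display, initialize EGL, and select the GLES API.
    EGLDisplay display = eglGetDisplay(EGL_DEFAULT_DISPLAY);
    eglInitialize(display, NULL, NULL);
    eglBindAPI(EGL_OPENGL_ES_API);

    // Pick a config capable of GLES 2.0 rendering and create the context.
    EGLConfig config;
    EGLint num_configs;
    eglChooseConfig(display, config_attribs, &config, 1, &num_configs);

    *out_display = display;
    return eglCreateContext(display, config, EGL_NO_CONTEXT, context_attribs);
}

Sketch of creating a GLES 2.0 context with EGL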

The Direct Rendering Manager (DRM) is a subsystem for interacting with GPUs. It has an extension called Kernel Mode-Setting (KMS) that allows configuring the display pipeline; compositors use it to display the rendered output on your monitor. The display pipeline is the sequence of steps from userspace to what is output on a connector.

  1. Provide a framebuffer (pixel data) to a plane. There are three types of planes the GPU provides:
    • Primary: Required plane used for primary content (like compositor output), page flipping (double buffer), and mode setting.
    • Cursor: Optional plane for hardware cursor.
    • Overlay: Optional plane for all other content to be displayed. Unlike the other two types, the GPU can provide more than one overlay plane.
  2. The plane takes the framebuffer and applies cropping, scaling, rotation, and other properties to the source.
  3. The CRTC takes all the planes and blends them together.
  4. The encoder takes the CRTC data and converts it to a format the connector uses like VGA or HDMI.
  5. The connector receives the encoder data and sends it to the display.

I recommend the talk about KMS that Simon Ser did for foss-north for more information. A small libdrm sketch that enumerates these pipeline objects follows the diagram below.

Primary framebuffer → Primary plane → CRTC
Cursor framebuffer → Cursor plane → CRTC
CRTC → Encoder → Connector (examples: HDMI/VGA/DVI)

Diagram of display pipeline for KMS
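
To poke at these pipeline objects yourself, here is a small sketch that enumerates them with libdrm. It assumes /dev/dri/card0 is your GPU and skips error handling.

#include <fcntl.h>
#include <stdio.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

int main(void)
{
    int fd = open("/dev/dri/card0", O_RDWR);
    drmModeRes *res = drmModeGetResources(fd);

    printf("CRTCs: %d, encoders: %d, connectors: %d\n",
           res->count_crtcs, res->count_encoders, res->count_connectors);

    for (int i = 0; i < res->count_connectors; i++) {
        drmModeConnector *conn = drmModeGetConnector(fd, res->connectors[i]);
        printf("connector %u: %s\n", conn->connector_id,
               conn->connection == DRM_MODE_CONNECTED ? "connected" : "disconnected");
        drmModeFreeConnector(conn);
    }

    drmModeFreeResources(res);
    return 0;
}

Sketch of listing KMS resources with libdrm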

You probably noticed the mention of a cursor plane; this is where the distinction between the hardware and software cursor lies.

Hardware cursor

A hardware cursor means that the primary framebuffer is not modified when a person moves the pointing device. Only the framebuffer for the cursor plane is modified to show changes like movement or icon updates. This gives a performance boost by decreasing the amount of data that has to be rendered.
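
The legacy KMS cursor calls illustrate the idea: the cursor image is handed to the cursor plane once, and pointer movement afterwards only needs a cheap position update. A hedged sketch, assuming fd, crtc_id, and a 64x64 ARGB buffer object handle cursor_bo already exist:

#include <xf86drmMode.h>

// Point the CRTC's cursor plane at the cursor image once.
drmModeSetCursor(fd, crtc_id, cursor_bo, 64, 64);

// On every pointer movement only the cursor position changes;
// the primary framebuffer is left untouched.
drmModeMoveCursor(fd, crtc_id, x, y);

Sketch of the legacy cursor plane API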

Software cursor

Some hardware is too old or simply does not support a cursor plane. In that case a software cursor is used, meaning the cursor is rendered into the same framebuffer as the primary plane.

References

I browsed through many dozens of websites researching this article. I linked to some of them in the article, but here are a few more that I found useful.

Input Protocols

EGL

DRM/KMS

Wayland