How to build a window manager in Python

Aug 17, 2022

Building your own window manager might sound difficult, but it actually isn’t too bad. Most of the work is already handled by libraries and programs; all that is needed is to glue them together. You can write your own and gain a solid understanding of it in a weekend. If you follow this guide, the end result will be similar to an example Python window manager I created and published on GitHub. You can also look at mwm, a window manager I wrote in Rust.

Some definitions that might help:

  • windowing system: A system generally made up of a display server, window manager, and clients. It defines how each part of the system communicates with each other.
  • client: Application that communicates with the display server to listen for events (like keyboard/mouse events) and to request that windows be created. An example of a client is GIMP.
  • display server: Program that takes requests from clients and works with the window manager and kernel to draw on screens. It sends events from hardware such as the mouse and keyboard to clients and the window manager.
  • window manager: Communicates with the display server to decide the position of clients, which clients are visible, and adds decorations to clients such as title bars. A window manager is considered a client.
  • window: A portion of the screen where a client’s interface is shown.

There are several windowing systems, but this tutorial is for the X Window System. It’s made up of several parts that include:

  • X.Org Foundation: The organization that manages the X system standards and an implementation of it.
  • X Protocol: Defines how the X server and X clients should communicate.
  • X11: The 11th version of the X protocol.
  • EWMH: Specifies additional functionality that the window manager and X clients can use. This includes dialog boxes, taskbars, pagers, fullscreen, virtual desktops, and more. The EWMH spec adds to the ICCCM spec. You only need to implement the parts of the spec you want to use, such as dialog boxes or a taskbar.
  • X Server: The display server for the X Window System. It keeps track of clients and windows.
  • Display: Think of it as an address for an X server. So localhost:0 is an X server with display 0. You can have another X server running on localhost:1 with the display being 1.
  • Xlib: Library that is used to communicate (over the X11 protocol) with the X Server from clients including the window manager.
  • xcb: Alternative to Xlib. I prefer the syntax of xcb so that is what this guide uses.
[Diagram of the X Window System: a client (example: GIMP) and the window manager each talk to the X server over the X protocol, and the X server handles I/O (examples: screen, mouse, keyboard).]
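
To make the display notation concrete, here is a minimal sketch that splits a display string such as localhost:1 into its parts. The function name parse_display is my own for illustration and is not part of Xlib or xcb. It assumes the common host:display.screen form and ignores other transports.

```python
def parse_display(name):
    """Split an X display string like 'localhost:1.0' into (host, display, screen)."""
    # Everything before the last colon is the host; it is empty for local displays.
    host, sep, rest = name.rpartition(':')
    if not sep:
        raise ValueError(f'missing ":" in display string: {name!r}')
    # The optional ".screen" suffix defaults to screen 0.
    display, _, screen = rest.partition('.')
    return host, int(display), int(screen or 0)


print(parse_display(':1'))              # ('', 1, 0)
print(parse_display('localhost:10.0'))  # ('localhost', 10, 0)
```

The :1 passed to Xephyr later in this guide is the short form of such a string: no host, display 1, default screen.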

To continue you will need to install all the required software on your machine. To get started quickly I recommend installing the xorg group from your package manager.

# Arch
sudo pacman -S xorg
# Debian
sudo apt-get install xorg
# Void
sudo xbps-install -S xorg

The xorg group should include tools like startx, xinit, and Xephyr along with the X server. When your computer boots, your init system (or you, from a console) typically runs startx, which runs xinit with some default params. This starts an X server and reads the .xinitrc in your home directory. The .xinitrc script should launch your window manager. To reduce the hassle of development, use Xephyr to create a nested X server.

In preview.sh:

#!/bin/sh

# Stop running if we hit an error
set -e

# The xinit command takes a first option of a xinitrc script to run
# and a second option of the command to start an X server.

# Run xinit then specify Xephyr command to start a nested X server
# with a display of 1 to use. In X11 this basically means that
# an X server runs on localhost:1. If display 1 is in use, change it
# to another number like 10.
xinit ./xinitrc -- $(command -v Xephyr) :1 -screen 1024x768

In xinitrc:

#!/bin/sh

# Python script to run
exec ./run_pwm

In run_pwm:

#!/usr/bin/env python3

from pwm.window_manager import WindowManager

if __name__ == '__main__':
    WindowManager().run()

Next, create a Python package in the folder pwm.

user@machine pwm/ > ls
__init__.py  utils.py  window_manager.py

Writing the bulk of the window manager is next. The main job of a window manager is to take events sent from the X server and respond to them. Events can include:

  • CreateNotify: When a new window is created. Generated from CreateWindow requests.
  • DestroyNotify: When a window is destroyed. Generated from DestroyWindow requests.
  • MapRequest: When a window wishes to be made visible on the screen. Generated from MapWindow requests.
  • UnmapNotify: When a window is hidden from the screen (unmapped). Generated from UnmapWindow requests. X11 has no UnmapRequest event; unmapping is reported only after it has happened.
  • ConfigureRequest: When a window wishes to change its configuration. This can include x/y position, width/height, and border width. Generated from ConfigureWindow requests.
  • KeyPress: When a key combination that the window manager is listening for occurs.

A simplified version of the event flow might look something like:

  1. User presses keys to launch a program.
  2. A window for the program is created.
  3. Window manager gets CreateNotify event.
  4. Window requests to be mapped.
  5. Window manager gets MapRequest event. The window manager configures window size/border, maps window, and adds the window to a list.
  6. A ConfigureRequest is generated from above.
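
The flow above boils down to "receive an event, look up a handler, run it". The sketch below imitates that loop without an X server. The event classes and FakeWindowManager are stand-ins I made up for illustration; in the real window manager the events come from xcffib.

```python
from dataclasses import dataclass


# Stand-in event types (hypothetical, for illustration only).
@dataclass
class CreateNotify:
    window: int


@dataclass
class MapRequest:
    window: int


@dataclass
class ConfigureRequest:
    window: int


class FakeWindowManager:
    def __init__(self):
        self.windows = []  # managed windows, newest first
        self.log = []      # record of what was handled, in order
        # Map each event type to the method that handles it.
        self.handlers = {
            MapRequest: self.on_map_request,
            ConfigureRequest: self.on_configure_request,
        }

    def on_map_request(self, event):
        if event.window not in self.windows:
            self.windows.insert(0, event.window)
        self.log.append(('map', event.window))

    def on_configure_request(self, event):
        self.log.append(('configure', event.window))

    def dispatch(self, event):
        handler = self.handlers.get(type(event))
        if handler is not None:  # events without a handler are ignored
            handler(event)


wm = FakeWindowManager()
for event in [CreateNotify(7), ConfigureRequest(7), MapRequest(7)]:
    wm.dispatch(event)
print(wm.windows)  # [7]
print(wm.log)      # [('configure', 7), ('map', 7)]
```

A dictionary lookup is only one way to do this; the run method later in this guide uses a chain of isinstance checks, which behaves the same for a handful of event types.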

Let’s now create a WindowManager class in window_manager.py. This class should connect to the X server, handle events, allow the user to launch programs, and allow the user to switch between windows using the left and right arrow keys.

from pwm.utils import KeyUtil
import logging
import subprocess
import xcffib
import xcffib.xproto
import yaml


# Constants to use in window manager
NEXT_WINDOW = 'NEXT_WINDOW'
PREVIOUS_WINDOW = 'PREVIOUS_WINDOW'


class WindowManager:
    def __init__(self):
        # Load config file. This should be moved into ~/.config/ and be read from there.
        with open('config.yaml') as f:
            self.config = yaml.safe_load(f)

        # Connect to X server. Since we don't specify params it will use the DISPLAY
        # environment variable, which xinit sets to the display chosen in preview.sh (:1).
        self.conn = xcffib.connect()

        # Class that contains utils for switching between keycodes/keysyms
        self.key_util = KeyUtil(self.conn)

        # Get first available screen
        self.screen = self.conn.get_setup().roots[0]

        # All windows have a parent, the root window is the final parent of the tree of windows.
        # It is created when a screen is added to the X server. It is used for background images
        # and colors. It also can receive events that happen to its children such as mouse in/out,
        # window creation/deletion, etc.
        self.root_window = self.screen.root

        # An array of windows
        self.windows = []
        self.current_window = 0

Create a basic config in config.yaml.

# Modifier is added to actions, so press Mod1 + c to open xclock
# Run xmodmap to see mappings
modifier: "Mod1"

actions:
 # Programs to launch
 - command: "xclock"
   key: "c"
 - command: "st"
   key: "t"
 # How to cycle through windows
 - action: "NEXT_WINDOW"
   key: "Right"
 - action: "PREVIOUS_WINDOW"
   key: "Left"
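
Because a mistake in the config only shows up when a key is pressed, it can help to check the structure once at startup. The sketch below validates an already-parsed config dictionary (what yaml.safe_load would return for the file above). validate_config and KNOWN_ACTIONS are hypothetical helpers, not part of the example window manager.

```python
KNOWN_ACTIONS = {'NEXT_WINDOW', 'PREVIOUS_WINDOW'}


def validate_config(config):
    """Return a list of problems found in a parsed config dictionary."""
    problems = []
    if not isinstance(config.get('modifier'), str):
        problems.append('modifier must be a string such as "Mod1"')
    for index, entry in enumerate(config.get('actions', [])):
        if 'key' not in entry:
            problems.append(f'actions[{index}] has no key')
        # Each entry should either launch a command or trigger an action, not both.
        if ('command' in entry) == ('action' in entry):
            problems.append(f'actions[{index}] needs exactly one of command/action')
        elif 'action' in entry and entry['action'] not in KNOWN_ACTIONS:
            problems.append(f'actions[{index}] has unknown action {entry["action"]!r}')
    return problems


config = {
    'modifier': 'Mod1',
    'actions': [
        {'command': 'xclock', 'key': 'c'},
        {'action': 'NEXT_WINDOW', 'key': 'Right'},
        {'action': 'CLOSE_WINDOW'},  # deliberately broken entry
    ],
}
for problem in validate_config(config):
    print(problem)
```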

In the run method of WindowManager, set up and start the event loop of the window manager.

def run(self):
    """
    Set up and run window manager. This includes checking if another window manager is
    running, listening for certain key presses, and handling events.
    """

    # Tell X server which events we wish to receive for the root window.
    cookie = self.conn.core.ChangeWindowAttributesChecked(
        self.root_window,
        xcffib.xproto.CW.EventMask,  # Window attribute to set which events we want
        [
            # We want to receive any substructure changes. This includes window
            # creation/deletion, resizes, etc.
            xcffib.xproto.EventMask.SubstructureNotify |
            # We want X server to redirect children substructure notifications to the
            # root window. Our window manager then processes these notifications.
            # Only a single X client can use SubstructureRedirect at a time.
            # This means if the request to change attributes fails, another window manager
            # is probably running.
            xcffib.xproto.EventMask.SubstructureRedirect,
        ]
    )

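    # Check if request was valid
    try:
        cookie.check()
    except Exception:
        # Log the full traceback, then exit with a non-zero status
        logging.exception('Could not select events on the root window')
        print('Is another window manager running?')
        raise SystemExit(1)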

    # Loop through actions listed in config and grab keys. This means we
    # will get a KeyPressEvent if the key combination is pressed.
    for action in self.config['actions']:
        # Get keycode from string
        keycode = self.key_util.get_keycode(
            KeyUtil.string_to_keysym(action['key'])
        )

        # Get modifier from string
        modifier = getattr(xcffib.xproto.KeyButMask, self.config['modifier'], 0)

        self.conn.core.GrabKeyChecked(
            # We want owner_events to be false so all key events are sent to root window
            False,
            self.root_window,
            modifier,
            keycode,
            xcffib.xproto.GrabMode.Async,
            xcffib.xproto.GrabMode.Async
        ).check()

    while True:
        event = self.conn.wait_for_event()

        if isinstance(event, xcffib.xproto.KeyPressEvent):
            self._handle_key_press_event(event)
        if isinstance(event, xcffib.xproto.MapRequestEvent):
            self._handle_map_request_event(event)
        if isinstance(event, xcffib.xproto.ConfigureRequestEvent):
            self._handle_configure_request_event(event)

        # Flush requests to send to X server
        self.conn.flush()

When a client tries to map a window (make it viewable), the X server sends a MapRequest event to the window manager instead of mapping the window itself. The client can tell the window manager not to manage a window by setting the override_redirect attribute. If this attribute is set then the window manager should ignore the window.

def _handle_map_request_event(self, event):
    """
    When a window wants to map, meaning make itself visible, it sends a MapRequestEvent that
    gets sent to the window manager. Here we add it to our client list and finish by sending
    a MapWindow request to the server. This request tells the X server to make the window
    visible.
    :param event: MapRequestEvent to handle
    """

    # Get attributes associated with the window
    attributes = self.conn.core.GetWindowAttributes(
        event.window
    ).reply()

    # If the window has the override_redirect attribute set as true then the window manager
    # should not manage the window.
    if attributes.override_redirect:
        return

    # Send map window request to server, telling the server to make this window visible
    self.conn.core.MapWindow(event.window)

    # Resize the window to take up whole screen
    self.conn.core.ConfigureWindow(
        event.window,
        xcffib.xproto.ConfigWindow.X |
        xcffib.xproto.ConfigWindow.Y |
        xcffib.xproto.ConfigWindow.Width |
        xcffib.xproto.ConfigWindow.Height,
        [
            0,
            0,
            self.screen.width_in_pixels,
            self.screen.height_in_pixels,
        ]
    )

    # Add event window to window list
    if event.window not in self.windows:
        self.windows.insert(0, event.window)
        self.current_window = 0

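The window manager should also handle ConfigureRequest events. Because the window manager selected SubstructureRedirect on the root window, a client’s ConfigureWindow request is redirected to the window manager as a ConfigureRequest and has no effect unless the window manager passes it on.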

def _handle_configure_request_event(self, event):
    """
    A configure request is a request that is asking to change a certain thing about a window.
    This can include width/height, x/y, border width, border color, etc.
    :param event: ConfigureRequestEvent to handle
    """

    # Pass on configure request to X server
    self.conn.core.ConfigureWindow(
        event.window,
        xcffib.xproto.ConfigWindow.X |
        xcffib.xproto.ConfigWindow.Y |
        xcffib.xproto.ConfigWindow.Width |
        xcffib.xproto.ConfigWindow.Height |
        xcffib.xproto.ConfigWindow.BorderWidth |
        xcffib.xproto.ConfigWindow.Sibling |
        xcffib.xproto.ConfigWindow.StackMode,
        [
            event.x,
            event.y,
            event.width,
            event.height,
            event.border_width,
            # Siblings are windows that share the same parent. When configuring a window
            # you can specify a sibling window and a stack mode. For example if you
            # specify a sibling window and Above as the stack mode, the window
            # will appear above the sibling window specified.
            event.sibling,
            # Stacking order is where the window should appear.
            # For example above/below the sibling window above.
            event.stack_mode
        ]
    )
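
One thing to be aware of: a ConfigureRequest carries a value_mask that says which fields the client actually set, and the X protocol expects the value list of ConfigureWindow to contain only the fields named in the mask, in ascending bit order. Forwarding a sibling the client never set can make the request fail. The following is a sketch of how the value list could be built from the event's mask. The numeric constants are the protocol's ConfigWindow bits, while configure_values and the SimpleNamespace stand-in event are my own illustrative names.

```python
from types import SimpleNamespace

# ConfigWindow bits from the X protocol, paired with the event field they refer to.
CONFIG_FIELDS = [
    (1, 'x'),
    (2, 'y'),
    (4, 'width'),
    (8, 'height'),
    (16, 'border_width'),
    (32, 'sibling'),
    (64, 'stack_mode'),
]


def configure_values(event):
    """Return (mask, values) containing only the fields the client asked to change."""
    mask = 0
    values = []
    for bit, field in CONFIG_FIELDS:  # values must follow ascending bit order
        if event.value_mask & bit:
            mask |= bit
            values.append(getattr(event, field))
    return mask, values


# Stand-in for a ConfigureRequestEvent that only sets width and height.
event = SimpleNamespace(value_mask=4 | 8, x=0, y=0, width=640, height=480,
                        border_width=0, sibling=0, stack_mode=0)
print(configure_values(event))  # (12, [640, 480])
```

In the handler, the result could then be passed on with self.conn.core.ConfigureWindow(event.window, mask, values).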

The mapping and configuring of windows is finished, but the window manager still needs a way to handle KeyPress events. The first thing to look at is how X11 handles keyboards. Keyboards have physical keys that the X server must turn into something it can understand. Each physical key is mapped to a keycode with a value in the range 8-255. There has been discussion around the choice to only allow keycodes in this range.

Keycodes can be turned into keysyms, which are symbols such as characters. Your keyboard probably doesn’t have separate physical keys for upper and lowercase letters. When converting from a keycode to a keysym, the resulting keysym can differ depending on whether a modifier key is pressed. Caps Lock and Shift are considered modifiers; you can use xmodmap to see a list of them. Look at what keysyms result from pressing the following physical keys:

  • Shift key + T key: Produces the T keysym
  • T key: Produces the t keysym

I recommend reading the Keyboards section of the X protocol spec for more information.

How does this relate to the window manager? In the config file there are keysyms that must be converted to keycodes when grabbing a key to listen for presses or when handling KeyPress events. This is seen in the run method. There needs to be a way to convert between keycodes and keysyms in the window manager. To do this create a KeyUtil class in utils.py.

# keysymdef holds the table that maps keysym names (like 't' or 'Right') to keysym values
import xpybutil.keysymdef


class KeyUtil:
    def __init__(self, conn):
        self.conn = conn

        # The min and max of keycodes associated with your keyboard. A keycode will never
        # be less than eight because I believe the 0-7 keycodes are reserved. The keycode zero
        # symbolizes AnyKey and I can't find references to the other seven. The max keycode is 255.
        self.min_keycode = self.conn.get_setup().min_keycode
        self.max_keycode = self.conn.get_setup().max_keycode

        self.keyboard_mapping = self.conn.core.GetKeyboardMapping(
            # The array of keysyms returned by this function starts at the first keycode we
            # ask for, so index 0 belongs to min_keycode.
            self.min_keycode,
            # Total number of keycodes
            self.max_keycode - self.min_keycode + 1
        ).reply()

    @staticmethod
    def string_to_keysym(string):
        # Called as KeyUtil.string_to_keysym(...), so it takes no self argument
        return xpybutil.keysymdef.keysyms[string]

    def get_keysym(self, keycode, keysym_offset):
        """
        Get a keysym from a keycode and state/modifier.

        Only a partial implementation. For more details look at Keyboards section in X Protocol:
        https://www.x.org/docs/XProtocol/proto.pdf

        :param keycode: Keycode of keysym
        :param keysym_offset: The modifier/state/offset we are accessing
        :returns: Keysym
        """

        keysyms_per_keycode = self.keyboard_mapping.keysyms_per_keycode

        # The keyboard_mapping keysyms. This is a 2d array of keycodes x keysyms mapped to a 1d
        # array. Each keycode row has a certain number of keysym columns. Imagine we had the
        # keycode for 't'. In the 1d array we first jump to the 't' row with
        # (keycode - min_keycode) * keysyms_per_keycode. The next keysyms_per_keycode
        # items in the array are the columns for the keycode row of 't'. To access a specific
        # column we add the keysym offset to the start of the row.
        return self.keyboard_mapping.keysyms[
            # The keysyms array starts at min_keycode, so subtract min_keycode from keycode.
            (keycode - self.min_keycode) * keysyms_per_keycode + keysym_offset
        ]

    def get_keycode(self, keysym):
        """
        Get a keycode from a keysym

        :param keysym: keysym you wish to convert to keycode
        :returns: Keycode if found, else None
        """

        # X must map the keys on your keyboard to something it can understand. To do this it has
        # the concept of keysyms and keycodes. A keycode is a number 8-255 that maps to a physical
        # key on your keyboard. X then generates an array that maps keycodes to keysyms.
        # Keysyms differ from keycodes in that they take into account modifiers. With keycodes
        # 't' and 'T' are the same, but they have different keysyms. You can think of 'T'
        # as 't + CapsLock' or 't + Shift'.

        keysyms_per_keycode = self.keyboard_mapping.keysyms_per_keycode

        # Loop through each keycode. Think of this as a row in a 2d array.
        # Row: loop from the min_keycode through the max_keycode
        for keycode in range(self.min_keycode, self.max_keycode + 1):
            # Col: loop from 0 to keysyms_per_keycode. Think of this as a column in a 2d array.
            for keysym_offset in range(0, keysyms_per_keycode):
                if self.get_keysym(keycode, keysym_offset) == keysym:
                    return keycode

        return None
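
To see the flat-array indexing in isolation, here is a sketch of the same two lookups against a toy keyboard mapping, so it runs without an X server. The keysym values are the real ones for these characters (for Latin-1 characters they match ASCII), but the keycode assignments and the two-column layout are made up for the example.

```python
# Toy mapping: 2 keysyms per keycode, starting at keycode 8 (assumed layout).
MIN_KEYCODE = 8
KEYSYMS_PER_KEYCODE = 2
KEYSYMS = [
    0x61, 0x41,  # keycode 8:  a, A
    0x74, 0x54,  # keycode 9:  t, T
    0x31, 0x21,  # keycode 10: 1, !
]


def get_keysym(keycode, column):
    """Look up one column of a keycode's row in the flat keysym array."""
    row = keycode - MIN_KEYCODE
    return KEYSYMS[row * KEYSYMS_PER_KEYCODE + column]


def get_keycode(keysym):
    """Return the first keycode whose row contains keysym, else None."""
    for index, candidate in enumerate(KEYSYMS):
        if candidate == keysym:
            # Integer division recovers the row from the flat index.
            return MIN_KEYCODE + index // KEYSYMS_PER_KEYCODE
    return None


print(hex(get_keysym(9, 0)))  # 0x74, the keysym for 't'
print(hex(get_keysym(9, 1)))  # 0x54, the keysym for 'T'
print(get_keycode(0x54))      # 9
print(get_keycode(0x7a))      # None, 'z' is not in the toy mapping
```

Both 't' and 'T' resolve to keycode 9 here, which is why the window manager also has to look at the modifier state to tell them apart.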

The key press event gives us a detail and a state value. The detail value is the keycode and the state value holds the modifiers. Compare these two values with the keys in the config to decide which action to handle.

def _handle_action(self, action):
    """
    Handle actions defined in config.yaml
    :param action: Window manager action to handle
    """

    # Cycle to the next window
    if action == NEXT_WINDOW:
        if len(self.windows) == 0: return

        self.conn.core.UnmapWindow(self.windows[self.current_window])

        # Get the next window
        self.current_window += 1
        # If overflowed go to the first window
        if self.current_window >= len(self.windows):
            self.current_window = 0

        self.conn.core.MapWindow(self.windows[self.current_window])

    # Cycle to the previous window
    if action == PREVIOUS_WINDOW:
        if len(self.windows) == 0: return

        self.conn.core.UnmapWindow(self.windows[self.current_window])

        # Get the previous window
        self.current_window -= 1
        # If underflowed go to last window
        if self.current_window < 0:
            self.current_window = len(self.windows) - 1

        self.conn.core.MapWindow(self.windows[self.current_window])

def _handle_key_press_event(self, event):
    """
    We receive key press events on windows below the root_window that match the key
    combinations we grabbed with GrabKey in the run method.
    :param event: KeyPressEvent to handle
    """

    for action in self.config['actions']:
        keycode = self.key_util.get_keycode(
            KeyUtil.string_to_keysym(action['key'])
        )

        modifier = getattr(xcffib.xproto.KeyButMask, self.config['modifier'], 0)

        # If the keycode and modifier of the action match the event's keycode/modifier then
        # run the command.
        if keycode == event.detail and modifier == event.state:
            if 'command' in action:
                subprocess.Popen(action['command'])
            if 'action' in action:
                self._handle_action(action['action'])
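
Two details of the handlers above can be tried without an X server. First, event.state also contains lock modifiers, so an exact comparison like modifier == event.state stops matching when Caps Lock or Num Lock is on. A common workaround is to mask those bits out before comparing, and, because GrabKey matches an exact modifier combination, to grab each key once per lock combination. Second, the wrap-around logic for cycling windows can be written with the modulo operator. This sketch assumes Num Lock is bound to Mod2, which is typical but configurable (xmodmap shows the actual binding). The function names are my own.

```python
# Modifier bits as defined by the X protocol.
SHIFT, LOCK, CONTROL, MOD1, MOD2 = 1, 2, 4, 8, 16
IGNORED = LOCK | MOD2  # Caps Lock, and Num Lock (assumed to be Mod2)


def clean_state(state):
    """Drop the lock bits so Mod1+c matches whether or not a lock is active."""
    return state & ~IGNORED


def lock_variants(modifier):
    """Modifier combinations to grab so the key still fires with locks on."""
    return [modifier, modifier | LOCK, modifier | MOD2, modifier | LOCK | MOD2]


def cycle(index, count, step):
    """Move step positions through count windows, wrapping at both ends."""
    return (index + step) % count


print(clean_state(MOD1 | MOD2) == MOD1)  # True
print(lock_variants(MOD1))               # [8, 10, 24, 26]
print(cycle(2, 3, 1))                    # 0, wrapped to the first window
print(cycle(0, 3, -1))                   # 2, wrapped to the last window
```

In the handler this would mean comparing modifier with clean_state(event.state), and in run it would mean calling GrabKeyChecked once for each value returned by lock_variants. As in the original code, cycle should only be called when the window list is not empty.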

That’s it, you can now launch applications and switch between them based on what is in the config file. Remember to look at the GitHub repo for the complete code. Give it a star if you’ve found it useful :). This is a basic window manager, but you can take it in different directions. Try creating a tiling or stacking window manager from it.

I want to thank all the resources I found on the internet. I’ve browsed many websites for info about the X Window System. Here are a few that I can remember and that inspired me:

Finally, if you’re interested in other windowing systems besides X11 be sure to read Drew DeVault’s introduction to Wayland and his articles on writing a Wayland compositor.