Container Load Balancing with Linux bridges, veths, and IPVS

Dec 31, 2024

I’ve been working on a project that takes requests and load balances them between containers. Using Docker or Kubernetes makes this easy, but I wondered how these systems work, so I decided to try to implement it myself. I’ve been using containerd to programmatically create and manage containers. It works quite well for this task; however, it does not provide networking to or between containers. This is a problem for the load balancing part of my project, so I searched for a solution. The solution I landed on uses a Linux bridge, virtual Ethernet devices, and IPVS to provide load balancing and networking to and between containers.

Definitions

  • namespaces: A namespace in Linux isolates a group of resources from the rest of the system. For example, a network namespace is an isolated group that has its own interfaces, routing table, and IP addresses. Processes can run inside a namespace and use its resources, but cannot access resources in namespaces they do not belong to (see the short example after this list).
  • bridge: A bridge in Linux is a virtual switch-like device that forwards traffic between the interfaces attached to it. Unlike a physical two-port bridge, it is not limited to two ports.
  • veth: A virtual Ethernet device gives you two connected ends called “peers”: traffic sent into one peer exits out of the other. The peers can live in the same namespace or in different namespaces.
  • vip: A virtual IP address is an address that is not tied to a specific interface. It acts as a stable front-end address whose traffic is forwarded on to one or more real IP addresses.
  • IPVS: IP Virtual Server is a module in the Linux kernel (ip_vs) that can be used to load balance TCP/UDP traffic between a group of IP addresses. The group of real IP addresses is reached through a single vip.
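
As a quick illustration of network namespaces (separate from the container setup later in this post), a namespace can be created and inspected by hand with ip netns:

# Create a throwaway network namespace
ip netns add demo
# Only an isolated loopback interface exists inside it,
# no matter what interfaces the host has
ip netns exec demo ip addr
# Remove it again
ip netns del demo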

Problem

Each container has its own namespaces, including a network namespace, which keeps containers separated from each other. However, sometimes we want two containers to communicate, such as an application talking to a database. We may also want traffic to be load balanced between a group of containers. For example, let’s say we have two containers, c0 and c1, each running a webserver on port 8080 that prints a list of interfaces and their addresses (essentially ip addr served on 8080). We want to load balance between these two containers.
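
The webserver in each container is just a small shell script (it shows up as ./server.sh in the output further down). The script itself is not part of this post, but a minimal sketch could look something like this; it assumes a netcat (nc) inside the container image that supports the -l, -p, and -q flags:

# server.sh -- answer every connection on port 8080 with the output of `ip addr`
while true; do
  {
    printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\nConnection: close\r\n\r\n'
    ip addr
  } | nc -l -p 8080 -q 1   # -q 1: close one second after the response is sent
done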

Solution

When using containerd, the ctr CLI is helpful to inspect what is happening. For example, if we want to see the running tasks of our containers we can run ctr task list and get:

TASK    PID      STATUS
c0      65077    RUNNING
c1      65128    RUNNING

We can see more information about these containers by running lsns, which lists the existing namespaces.

        NS TYPE   NPROCS   PID USER            COMMAND
4026533267 mnt         2 65077 root            /bin/sh ./server.sh
4026533268 uts         2 65077 root            /bin/sh ./server.sh
4026533269 ipc         2 65077 root            /bin/sh ./server.sh
4026533270 pid         2 65077 root            /bin/sh ./server.sh
4026533271 net         2 65077 root            /bin/sh ./server.sh
4026533381 mnt         2 65128 root            /bin/sh ./server.sh
4026533382 uts         2 65128 root            /bin/sh ./server.sh
4026533383 ipc         2 65128 root            /bin/sh ./server.sh
4026533384 pid         2 65128 root            /bin/sh ./server.sh
4026533385 net         2 65128 root            /bin/sh ./server.sh

Looking at this printout we can see that a network namespace was created for each container (the rows where TYPE is net).

4026533271 net         2 65077 root            /bin/sh ./server.sh
4026533385 net         2 65128 root            /bin/sh ./server.sh
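
The numbers in the NS column are inode numbers, and the same namespaces can be reached through /proc/<pid>/ns. Using the PIDs from the listing above:

# Each link target contains the inode number that lsns printed
readlink /proc/65077/ns/net
readlink /proc/65128/ns/net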

A simple way to provide network connectivity to a container is to create a veth and then put one peer inside the network namespace belonging to the container.

# Create a veth with two peers, one named c0veth0 and the other c0veth1
ip link add c0veth0 type veth peer name c0veth1
# Move the c0veth0 to the c0 namespace, which is at PID 65077
ip link set c0veth0 netns 65077

We can check that this veth peer has been moved using the nsenter command, which allows us to run commands inside a namespace.

nsenter -t 65077 -n ip addr
2: c0veth0@if2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ba:79:db:15:10:b1 brd ff:ff:ff:ff:ff:ff link-netnsid 0

Now let’s give the c0veth0 interface an IP address.

nsenter -t 65077 -n ip link set c0veth0 up
# Set the address of c0veth0 to 10.0.0.2
nsenter -t 65077 -n ip addr add 10.0.0.2/32 broadcast 10.0.0.2 dev c0veth0
# Route traffic going to 10.0.0.1 through c0veth0 with the source address 10.0.0.2
nsenter -t 65077 -n ip route add 10.0.0.1 dev c0veth0 src 10.0.0.2
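
With this, the container can talk to 10.0.0.1 and nothing else. Optionally (it is not needed for the rest of this post), the host end could also be made the container’s default gateway:

# Optional: send all other traffic via 10.0.0.1, which is reachable
# through the route added above
nsenter -t 65077 -n ip route add default via 10.0.0.1 dev c0veth0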

And let’s bring our c0veth1 interface up and give it the 10.0.0.1 address on a /24.

ip link set c0veth1 up
ip addr add 10.0.0.1/24 dev c0veth1

If we now do a curl 10.0.0.2:8080, we see the webserver returning its list of interfaces.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: c0veth0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ba:79:db:15:10:b1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.2/32 brd 10.0.0.2 scope global c0veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::b879:dbff:fe15:10b1/64 scope link
       valid_lft forever preferred_lft forever

Creating a bridge

We have now seen that we can create a veth and route traffic to a specific container. The next step is to build a load balancing setup that uses IPVS to distribute traffic between addresses. In this example c0 will have the IP address 10.0.0.2 and c1 will have 10.0.0.3. Assume that I have set up c1veth0 and c1veth1 using the same method as above (the equivalent commands are sketched below).
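
For completeness, with c1’s task running at PID 65128 those commands look like this:

# Create a veth pair for c1 and move one peer into its network namespace
ip link add c1veth0 type veth peer name c1veth1
ip link set c1veth0 netns 65128
ip link set c1veth1 up
# Configure the peer inside the container with 10.0.0.3 and a route back to the host
nsenter -t 65128 -n ip link set c1veth0 up
nsenter -t 65128 -n ip addr add 10.0.0.3/32 broadcast 10.0.0.3 dev c1veth0
nsenter -t 65128 -n ip route add 10.0.0.1 dev c1veth0 src 10.0.0.3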

Let’s delete the 10.0.0.1 address from c0veth1, because we will assign that address to the bridge instead.

ip addr del 10.0.0.1/24 dev c0veth1

Since we want routing to and between containers, let’s create a bridge and attach both c0veth1 and c1veth1 to it. This will allow the two veths to communicate on a shared network.

ip link add name br0 type bridge
ip link set dev br0 up
# Assign 10.0.0.1 to the bridge
ip addr add 10.0.0.1/24 dev br0

Now connect both c0veth1 and c1veth1 to br0.

ip link set c0veth1 master br0
ip link set c1veth1 master br0
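
We can check that both peers are now ports of the bridge:

# Lists the interfaces attached to br0
ip link show master br0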

Afterwards we can do curl 10.0.0.2:8080 and curl 10.0.0.3:8080 to reach each container.
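
That covers traffic between the host and each container. If we also want the containers to reach each other across the bridge, each one needs a route covering the other’s address as well, for example:

# Let c0 and c1 reach the whole 10.0.0.0/24 network through their veth peers
nsenter -t 65077 -n ip route add 10.0.0.0/24 dev c0veth0 src 10.0.0.2
nsenter -t 65128 -n ip route add 10.0.0.0/24 dev c1veth0 src 10.0.0.3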

Load Balancing

With these two addresses we can use IPVS to load balance between them. First enable the IPVS modules.

# Enable IPVS
modprobe ip_vs
# Enable round robin algorithm.
# IPVS supports many scheduling algorithms; see the ipvsadm man page for the full list.
modprobe ip_vs_rr
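
The ipvsadm tool used below usually ships as its own package (typically named ipvsadm). To confirm the modules actually loaded:

# Both ip_vs and ip_vs_rr should appear
lsmod | grep ip_vs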

Now create a virtual service with a vip of 10.0.0.4 that uses round robin on port 8080.

ipvsadm -A -t 10.0.0.4:8080 -s rr

Add both of the containers as targets.

ipvsadm -a -t 10.0.0.4:8080 -r 10.0.0.2:8080 -m
ipvsadm -a -t 10.0.0.4:8080 -r 10.0.0.3:8080 -m
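
Before testing, the configuration can be inspected; -L lists the virtual services and their real servers, and -n keeps the addresses numeric:

ipvsadm -L -n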

If we curl the vip repeatedly, we can see the requests rotate between the two containers.

user@host# curl 10.0.0.4:8080
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: c1veth0@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 96:33:b4:9d:3a:b3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.3/32 brd 10.0.0.3 scope global c1veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::9433:b4ff:fe9d:3ab3/64 scope link
       valid_lft forever preferred_lft forever
user@host# curl 10.0.0.4:8080
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: c0veth0@if2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether ba:79:db:15:10:b1 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.2/32 brd 10.0.0.2 scope global c0veth0
       valid_lft forever preferred_lft forever
    inet6 fe80::b879:dbff:fe15:10b1/64 scope link
       valid_lft forever preferred_lft forever

End

I hope to continue experimenting and learning about containers and networking; in the future I will write more blog posts about it. Feel free to email me if you have questions or if I got something wrong.