Because containers share the host’s kernel, isolating and controlling them from the rest of the system is very important. The Linux kernel has no container-specific code. Instead, containers are built from several independent kernel features, such as Namespaces and Cgroups, combined with filesystem tricks and hardened with security features. Together, these features control and limit what a container can do. Here we will focus on Namespaces and Cgroups.
Namespaces
Namespaces in Linux provide a layer of isolation for different aspects of a system, such as process IDs, network interfaces, mount points, and more. Each namespace provides an isolated environment for processes, ensuring they don’t interfere with resources outside their namespace. Docker uses namespaces to create an isolated environment for each container, making it seem like each container is running on a separate system.
One point to remember: namespaces exist even when you don’t use containers. There is always one namespace of each type, and by default it contains every process on the system. This is similar to the UID field of UNIX processes: every process has a UID, even if no user is logged in to the system.
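To see this on a machine with no containers running, you can look at the namespace links the kernel exposes under /proc. The snippet below is a minimal sketch, assuming a Linux system with /proc mounted. The function name list_namespaces is a made-up helper for illustration, not a standard tool.

```shell
#!/bin/sh
# list_namespaces is a hypothetical helper (not a standard command).
# It prints one "type -> identifier" line for each namespace a process belongs to.
list_namespaces() {
    pid="${1:-self}"
    for link in /proc/"$pid"/ns/*; do
        # Each entry is a symlink whose target looks like "pid:[<inode number>]"
        printf '%s -> %s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}

# Even an ordinary shell has one entry per namespace type
list_namespaces self
```

Two processes are in the same namespace exactly when the identifiers printed for that type are equal.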
Types of Namespaces:
- PID Namespace (Process ID): When a new PID namespace is created, the first process started in it gets PID 1, and the process hierarchy is isolated from the host. A process inside a PID namespace cannot see or signal processes outside of it, although the host can still see the namespace’s processes under their host PIDs.
- Net Namespace (Network): Each net namespace has its own network stack, including IP addresses, routing tables, and firewall rules. This isolation allows each container to have its own network interfaces and routes independently of others.
- Mount Namespace: The mount namespace provides each container with its own set of mount points. When a mount namespace is created, it can be populated with a unique set of mounts, making the filesystem appear differently to each container.
- UTS Namespace: The UTS namespace allows each container to have a different hostname and domain name, providing a separate identity for networked applications.
- IPC Namespace: Each IPC namespace has its own set of IPC objects (System V IPC objects and POSIX message queues), so processes in different namespaces cannot communicate through them.
- User Namespace: User namespaces allow a process to have a different user and group ID inside the namespace than outside. This provides a way to map users and groups inside a container to different users and groups on the host. It enhances security because a process can run as root inside the namespace while being mapped to an unprivileged user on the host.
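The ID mapping described for user namespaces can be read from /proc/<pid>/uid_map. Each line of that file has three fields: the first ID inside the namespace, the first ID outside it, and the number of IDs mapped. Here is a small sketch, assuming a Linux system with /proc mounted; show_uid_map is a hypothetical helper name.

```shell
#!/bin/sh
# show_uid_map is a hypothetical helper for illustration only.
# It prints the user ID mapping of a process in a labelled form.
show_uid_map() {
    pid="${1:-self}"
    # Fields: first ID inside the namespace, first ID outside, number of IDs mapped
    while read -r inside outside count; do
        printf 'inside=%s outside=%s count=%s\n' "$inside" "$outside" "$count"
    done < /proc/"$pid"/uid_map
}

show_uid_map self
```

In the initial user namespace, the mapping is the identity (IDs inside and outside are the same). Inside a container that uses user namespace remapping, the outside value would typically be an unprivileged host ID.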
Example 1: PID Namespace Isolation
# Run a process on the host
sleep 1000 &
# List processes
ps aux | grep sleep
# Output:
# root 12345 0.0 0.0 4288 544 pts/0 S 12:00 0:00 sleep 1000
In this case, the process lives in the host’s PID namespace: every other process on the host can see its PID and, with sufficient privileges, send it signals.
With PID Namespace (using Docker):
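# Run an Nginx container (the alpine variant ships BusyBox ps; the default Debian-based image does not include ps)
docker run -d --name nginx-container nginx:alpine
# Enter the container's shell (alpine images provide /bin/sh rather than bash)
docker exec -it nginx-container /bin/sh
# List processes inside the container
ps aux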
# Output:
# PID USER TIME COMMAND
# 1 root 0:00 nginx: master process nginx -g daemon off;
# 6 nginx 0:00 nginx: worker process
Here, the Nginx container has its own PID namespace, starting PIDs from 1, isolated from the host system’s PIDs.
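One way to check whether two processes really sit in different PID namespaces is to compare their namespace links under /proc. The following is a rough sketch, assuming a Linux system with /proc mounted; same_namespace is a hypothetical helper, and the demo uses a plain background process rather than a container so that it runs without Docker.

```shell
#!/bin/sh
# same_namespace is a hypothetical helper.
# It succeeds (exit 0) when two PIDs share the namespace of the given type,
# fails (exit 1) when they differ, and returns 2 if a link cannot be read.
same_namespace() {
    ns_type="$1"; pid_a="$2"; pid_b="$3"
    ns_a="$(readlink /proc/"$pid_a"/ns/"$ns_type")" || return 2
    ns_b="$(readlink /proc/"$pid_b"/ns/"$ns_type")" || return 2
    [ "$ns_a" = "$ns_b" ]
}

# Two ordinary processes started from the same shell share a PID namespace
sleep 30 &
child=$!
if same_namespace pid $$ "$child"; then
    echo "shell and child share a PID namespace"
fi
kill "$child"
```

For a container, you could pass the host PID reported by `docker inspect --format '{{.State.Pid}}' nginx-container` as one of the arguments. The comparison should then fail, since the container's processes have their own PID namespace. Reading another user's /proc entries usually requires root.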
Example 2: Network Namespace Isolation
# Run a web server on the host
python3 -m http.server 8080 &
# List network interfaces
ip addr show
The web server binds to the host’s network interfaces, potentially causing port conflicts with other services.
With Network Namespace (using Docker):
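# Run an Nginx container (remove the one from Example 1 first, since container names must be unique)
docker rm -f nginx-container
docker run -d --name nginx-container nginx:alpine
# Enter the container's shell (alpine images provide /bin/sh rather than bash)
docker exec -it nginx-container /bin/sh
# List network interfaces inside the container (BusyBox provides ip in the alpine image)
ip addr show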
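The Nginx container has its own network interfaces, isolated from the host. With Docker’s default bridge network, two containers can both listen on port 80 without conflicting, and a container’s ports are not exposed on the host’s interfaces unless they are explicitly published.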
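If a container image does not ship the ip tool, the interfaces of the current network namespace can still be read from /proc/net/dev. Below is a minimal sketch, assuming a Linux system with /proc mounted; list_interfaces is a hypothetical helper name.

```shell
#!/bin/sh
# list_interfaces is a hypothetical helper.
# It prints the names of the interfaces visible in the caller's network namespace.
list_interfaces() {
    # /proc/net/dev has two header lines, then one "name: counters..." line per interface
    tail -n +3 /proc/net/dev | cut -d: -f1 | tr -d ' '
}

list_interfaces
```

Running the same function on the host and inside a container should give different lists, because each side only sees the interfaces of its own network namespace.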
Control Groups (Cgroups)
Cgroups are a Linux kernel feature used to limit, prioritize, and account for resources (CPU, memory, disk I/O, etc.) used by processes. Docker uses cgroups to enforce resource limits on containers, ensuring they get their fair share of system resources without affecting other containers or the host system.
Key Features of Cgroups:
- Resource Limiting: Set limits on CPU, memory, disk I/O, and other process resources.
- Resource Prioritization: Assign different priorities to processes to control their access to resources.
- Accounting: Track resource usage by processes for monitoring and reporting.
- Control: Freeze, resume, and terminate groups of processes.
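Which cgroup a process belongs to is recorded in /proc/<pid>/cgroup. The sketch below prints that membership in a labelled form. It assumes a Linux system with /proc mounted; show_cgroups is a hypothetical helper, and the word "unified" is just a label chosen here for the cgroup v2 line, whose controller field is empty.

```shell
#!/bin/sh
# show_cgroups is a hypothetical helper.
# It prints the cgroup path(s) of a process, one line per hierarchy.
show_cgroups() {
    pid="${1:-self}"
    # Each line is "hierarchy-ID:controller-list:cgroup-path".
    # On cgroup v2 there is a single line of the form "0::/path".
    while IFS=: read -r id controllers path; do
        printf 'hierarchy=%s controllers=%s path=%s\n' "$id" "${controllers:-unified}" "$path"
    done < /proc/"$pid"/cgroup
}

show_cgroups self
```

With Docker, limits are usually set through flags on docker run, for example `docker run -d --memory 256m --cpus 0.5 nginx`. Docker then places the container's processes in a cgroup with those limits applied.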