Traditionally, the telecom industry has used clusters to meet its carrier-grade requirements of high availability, reliability and scalability, while relying on cost-effective hardware and software. Efficient cluster security is now an essential requirement that has not yet been addressed in a coherent fashion.
To answer the need for advanced security features on Linux clusters in the telecom world, the Open Systems Lab at Ericsson Research (Montréal, Canada) started a project called Distributed Security Infrastructure (DSI). The main goal of DSI is to design and develop a secure infrastructure that provides advanced security mechanisms for telecom applications running on carrier-grade Linux clusters. One important component of DSI is the distributed security module (DSM), which provides an implementation of mandatory access control within a Linux cluster.
In this article, we discuss the goals of having a distributed security module, architecture, features, performance and implementation status. We also offer a tutorial that explains how to install DSM and experiment with it.
Currently implemented security mechanisms rely on discretionary access-control mechanisms. These mechanisms, however, are inadequate to protect against the various kinds of attacks possible in today's complex environments. Access decisions are based on user identity and ownership. As a consequence, these mechanisms are easy to bypass, and malicious applications easily can cause failures and breaches in system security.
Various research results have shown that mandatory security provided by the operating system is essential for the security of the whole system. Furthermore, they proved that mandatory access-control mechanisms are efficient for supporting complex relationships between different entities in the computing environment.
As part of the DSI Project, we address the design and implementation of a framework for mandatory access control. We are implementing cluster-aware access-control mechanisms as a Linux loadable module. Our work in this area will help position Linux as a secure operating system for clustered servers.
Our work is based mainly on the Flask architecture and the Linux security module (LSM) framework; however, our focus is on Linux clustered servers, not single Linux servers. We address the performance challenges of the cluster security because enforcing security may introduce degradation in system performance, an increase in administration and some annoyance for the user.
One important aspect of our DSM implementation is its distributed nature. This aspect provides location transparency of the security resources in the cluster from the security point of view.
The LSM framework does not provide any additional security in the Linux kernel. Rather, it provides the infrastructure to support the development of security modules. The LSM kernel patch adds security fields to kernel data structures and inserts calls (called hooks) at special points in the kernel code to perform a module-specific access-control check.
LSM adds methods for registering and unregistering security modules, in addition to a general security system call that allows communication between user programs and the LSM for security-aware applications. Each LSM hook is a function pointer in a global structure called security_ops. Because the hooks are embedded in the kernel and are called even before a security module is installed, this structure is initialized to a set of functions provided by a dummy security module. These functions are simply placeholders for more useful security mechanisms that can be loaded as a Linux module. A register_security method is introduced to allow a security module to set its own security functions to overlay the dummy functions. An unregister_security method is used to return to the dummy functions.
The LSM methods are organized into two categories: 1) hooks to handle the security fields and 2) hooks to perform access control. When a Linux resource is created, the security label is attached to it. These labels are used to enforce mandatory access control with the security hooks. When the object is destroyed, the label is removed. Hooks to handle the security fields are used for label creation and removal. An example of those hooks are alloc_security and free_security in the task_security_ops structure. The process of mandatory access control using LSM is presented in Figure 1.
Let's assume that the subject (a process in this case) has a security ID of SSec and is trying to access (1) the resource (a file in this case) having the security ID TSec. To perform the access, the subject issues the system call (2). The system call is handled by the Linux kernel code (the system call interface in Figure 1). Before the access decision is taken, the kernel consults (using security hooks) the LSM module (3), where the user-specific security is implemented as a function “f”. LSM will compute the function “f” and return the results to the kernel. The kernel will then either grant or deny access to the target resource (4).
The distributed security module is part of DSI. The purpose of implementing DSM is to enforce access control and to provide labeling for the IP messages across the nodes of the cluster with the security attributes of the sending process and node.
We started the development of DSM using Linux kernel 2.4.17 and the appropriate security patch for that kernel version (lsm-full-2002_01_15-2.4.17.patch). The implementation of DSM was based on CIPSO and FIPS-188 standards, which specify the IP header modification (adding options to IP header), and on hooks added to the IP stack.
Because DSM is a component of DSI, let's briefly look at it. As part of a carrier-grade Linux cluster, DSI must comply with carrier-grade requirements such as reliability, scalability and high availability. Furthermore, DSI supports the following requirements: coherent framework, process-level approach, pre-preemptive security, dynamic security policy, transparent key management and minimal impact on performance.
The security server is the central point of management. It is the central security authority for all other security components in the system, and it is responsible for the distributed security policy. It also defines the dynamic security environment of the whole cluster by broadcasting changes in the distributed policy to all security managers.
Security managers enforce security locally at each node of the cluster. They are responsible for enforcing changes in the security environment. Security managers only exchange security information with the security server. For a detailed description of DSI, see Resources.
Designing an efficient solution to the cluster mandatory access control is a complex task. Many factors are involved in defining the access rights, because the subjects and resources can be located on different nodes in the cluster. To simplify the relationships, we can handle the access control at two levels. At the local level, the subject and resource are located on the same node while on the remote level, the subject and resource are located on different nodes.
For local access control, the access rights are the functions of the security IDs of the subject (SSID) and the resource (TSID), see Figure 1. This equation is based on the FLASK architecture: Access = Function (SSID, TSID).
The FLASK architecture can serve as a solution for single-node processing. When the nodes are presented as a cluster, security solutions become more complicated. In this case, we extend the FLASK architecture to the cluster remote access model (Figure 3).
One of the new parameters is the security node ID (SnID), an ID that, as the name implies, defines the node in terms of security. Access rights are not only the function of the subject and target security IDs, but the function of the security node ID as well.
The architecture of the distributed access control is illustrated in Figure 3. The equation of the architecture can be described as Access = Function (SnID2, SID2, SnID1, SID1).
An important part of the distributed system is the network, which spans the nodes of the cluster. To apply the access control functions in the cluster, there must be a way to pass the security parameters between the nodes in a transparent fashion. Our prototype implementation of distributed mandatory access control will be exercised in the Linux kernel, which provides us security transparency and better performance.
In Figure 3, we illustrate an example of how the distributed access control works. The security server is responsible for passing the security policy to the security module. It is also responsible for providing the security node ID to each node of the cluster (1). Let's assume the subject2 in node SnID2 tries to access a resource (file) on the node SnID1 (2). In this case subject2 first must get access to the local communication resources (3) and then get a pair of identifiers (SnID2, SID2) that then must be passed to the remote node (4). Those identifiers will be validated on the remote node SnID1. When the access is granted, the message can be passed to process 1 (5). Now process 1 will perform an access on behalf of process 2 (6).
In greater detail, this section explains what happens on a single node of a cluster. Access control on any node of the cluster consists of two parts (Figure 4):
Kernel space: responsible for implementing both the enforcement and the decision-making tasks of access control as separate responsibilities. The kernel space maintains the security policy upon which it bases its decisions. The security policy is supplied by the security server and stored in the local memory for fast access (hash table).
User space: its many responsibilities (Figure 4) include taking the information from the distributed security policy (1) and from the security context repository, combining them together and feeding them to the kernel space part in an easily usable form (2, 3 and 4). It propagates alarms from the kernel space back to the security manager, which will feed them to the auditing and logging services and, if necessary, propagate them to the security server through the security communication channel (see Figure 2).
Both kernel space and user space are started and monitored by the local security manager (SM) on each node. The SM also introduces them to other services and subsystems of DSI with which they need to interact. When a user process tries to access the system resource (5), the system call is forwarded to DSM (6), where the decision is made based on the DSP internal representation (7).
All the subjects and resources must be labeled. Because the security module can be loaded at runtime, we distinguish two modes of subject labeling:
Before the module is loaded, no labels are attached to any subject or resource in the system. At module initialization time, all the running tasks are scanned, and labels are attached to them.
When a new process is created after the security module is loaded, the security hooks do the labeling.
Since Linux stores the process descriptor and the kernel mode process stack in a single 8KB memory area, we can use this fact and avoid allocating memory for labeling the subjects (Figure 5).
The other labels are attached to the resources at runtime, which implies that the module checks if the label is there. If the label is not attached, a new label will be created.
Because access in the cluster can be achieved from a subject located on one node to a resource located on another node (Figure 3), additionally there is a need to control such accesses.
When a process on one node accesses a resource on another node, local access to the communications resources (socket, network interface) is checked first. When local access is granted, the message can be sent to the remote location.
In order to identify the sending subject, the Security Node ID (security node identifier) and the Security ID of the subject (security subject identifier) are added to the IP packet. For this implementation, we use the IP protocol for the security information transfer. A new option based on the hooks in the IP protocol stack is added after the IP header. On the receiving side, this information (Security Node ID and Security SID) is extracted (based on the hooks in the IP stack) and used to build the network security ID (NSID). The build equation of this is NSID = Function (SnID, SID).
This function can be specified by the security server in the form of a conversion table (for the current implementation a simple mathematical function is used). The receiving side extracts the Security Network ID by looking in the table and specifying SnID and SID. Now the security network ID can be used as a local label to all of the access controls.
You need to follow several steps to compile, load and experiment with DSM. For illustration purposes, we assume that your machine runs Red Hat 7.2 with Linux kernel 2.4.17 (from kernel.org).
Here are the main steps involved (they are explained in detail in the following sections):
Apply the LSM patch for kernel 2.4.17.
Modify the kernel options and rebuild the kernel with the new options.
Update the boot options in /etc/lilo.conf.
Reboot the machine with the new kernel.
Compile and load the security module.
Perform some testing to validate that the module works correctly.
The distributed security module is based on the LSM infrastructure, which is a set of hooks added to the kernel that installs the security patch from the LSM web site. The site contains many different patches; you need to match the patch with the appropriate kernel version, in our case kernel 2.4.17.
Follow these steps to patch the kernel with LSM. First, download lsm-full-2002_01_15-2.4.17.patch.gz into /usr/src. Then, unzip the patch:
% gunzip lsm-full-2002_01_15-2.4.17.patch.gz
Next go to the Linux source tree and apply the patch:
% cd /usr/src/linux % patch -p1 < /usr/src/lsm-full-2002_01_15-2.4.17.patchAfter applying the appropriate patch, reconfigure the kernel options to support the distributed security module. Modify the following options:
CODE MATURITY LEVEL: the prompt for development and/or incomplete code/drivers must be set to “y”.
LOADABLE MODULE SUPPORT: version information on all module symbols must be set to “n” to avoid problems with versioning.
NETWORKING OPTIONS: network packet filtering must be set to “y” to enable the net-filtering hooks for IP packet modification. Kernel httpd acceleration (EXPERIMENTAL) must be set to “m”. This option will include tcp_sync_mss in the kernel.
SECURITY OPTIONS: capabilities support should be set to “m”, IP networking support set to “n”, NSA SELinux Support set to “m”, NSA SELinux Development Module set to “y”, NSA SELinux MLS policy (EXPERIMENTAL) set to “n”, LSM port of Openwall (EXPERIMENTAL) set to “n” and Domain and Type Enforcement (EXPERIMENTAL) set to “n”.
Once you enable support for these options, simply build the kernel and modules, install them and run LILO normally.
Because all of the components of DSI are not currently implemented, we have created some test programs that emulate these parts, such as loading the security policy and alarm receivers. Before the module can be used it must be loaded into the kernel by root:
% /sbin/insmod lsm.o
Next, the policy must be supplied to the security module. For this exercise, the security file is a normal ASCII file with four fields: source security ID; target security ID; class (for now, only three classes are implemented: fork, socket and network); and permission.
An extract of the policy file goes like this:
1 1 1 0x01 1 1 2 0x07 1 1 3 0x01
The policy can be loaded with our test program, called UpdatePolicy:
% UpdatePolicy policy_fileThe alarms can be received by our test program, CheckAlarm. The program is started using % CheckAlarm. The default label for a process can be overwritten by changing the first byte of the padding field in the ELF format of the process image (which is the eighth byte in the file).
We performed three types of testing to develop a preliminary performance evaluation of the security module. The tests included process creation with fork, UDP local access and UDP remote access. The UDP tests were performed with and without IP packet modification, so as to see how much performance was lost during IP packet modification.
These tests were executed on a Pentium III 650MHz machine with 256MB of RAM. Here are our testing methods:
Process Creation: measures the time required for a process to fork a child that immediately exits. The parent process loops 100,000 times, performing fork and wait calls.
UDP Local Access: measures the time needed by a process to send a UDP message. It sends 500,000 UDP messages in a loop. The sending process does not check whether the message was sent outside the node, and it does not wait for the confirmation. In this case, it is not important whether the server has DSM installed or not.
UDP Remote Access Testing: measures the time needed by a process to send a UDP message and receive a UDP response from a server. This test sends and receives 100,000 UDP messages in a loop. The client process will send a new message after receiving the confirmation from the server. In this case, it is important that the server runs the DSM software so that permissions are checked on the receiving side. For our test, the second server is a Pentium II 300MHz desktop with 128MB RAM.
Based on the testing performed, we present the results in Table 1 and Table 2. All numbers are in microseconds.
Process Creation Results: we have a 1.2% increase in overhead, because the system has to perform a permission check on the fork operation and spend extra time labeling the child process.
UDP Local Access Results: the average overhead for the setting with the DSM module against the setting without the DSM module is 20%. This overhead consists of performing a permission check on the socket, sending a message and sk_buff label attachment for each message, plus labeling the IP messages. When the IP packet modification is disabled (Table 2), the overhead drops to 4.2%.
UDP Remote Access Results: the average overhead for the setting with the DSM module against the setting without the DSM module is 30%. The overhead consists of the following: performing a permission check on the send socket side, attaching a label to sk_buff, attaching the security information to the IP message, retrieving the security information on the receive side, attaching the network security ID to sk_buff, performing the permission check on sk_buff, performing the security checking on the socket and repeating all the above operations on the return message. When the IP packet modification is disabled (Table 2), the overhead drops to 5.4%.
In both UDP testing cases, most of the overhead occurred is related to IP packet modification. Only a small fraction of the overhead is caused by the security module.
Our ongoing work and future plans for DSM include:
Fully implementing the framework of the distributed access control.
Optimizing the code for better performance.
Providing additional functionality for the server resource to access on behalf of a client.
Providing security information protection.
Providing security information transfers at lower levels of the protocol stack.
Testing cluster security against different types of attacks.
As previously mentioned, DSM is one component of the DSI architecture. So far, we have a basic implementation of DSM. The performance results we presented must be regarded as upper-bound, because no single security operation has been optimized.
We have done some work optimizing the IP packet modification. The primary results have shown significant improvements. The UDP local access with IP packet modification (Table 1) has dropped from 20% overhead to 8%. Similarly, the UDP remote access (Table 1) has dropped from 30% overhead to 14%. These results are promising, and we see many more opportunities to optimize and reach even lower overhead. Nevertheless, the results demonstrate the challenges facing the development of efficient distributed security.
As far as testing in real environments, we tested the framework with buffer-overflow attacks. It proved that our current solution could guard against these types of attacks.
As a final note, we did our best to describe DSM within the limited space we have. However, if you would like more details, please feel free to contact us. We hope you try out the source code for DSM and the supporting test programs and send us your feedback. All source code is available for download at ftp.ssc.com/pub/lj/listings/issue102/6215.tgz.
David Gordon and Philippe Veillette, Computer Science interns from Sherbrooke University, for contributing to the DSI Project in the DSM area.