Read Source Code the HTML Way

Kamran Soomro

Issue #158, June 2007

Cross-reference and convert source code to HTML for easy viewing.

Every decent programmer has to study source code at some time or other. Sometimes it's to learn new coding styles. Sometimes it's to get an idea of how something works. Regardless of the reason, no programmer can do without it. Studying the source code of small projects is not difficult. You easily can do without formal methods. However, when you want to study the source code of large projects, keeping track of the various functions, variables and their definitions becomes a huge problem. People are lucky if they can find the source code on-line—doing so means they can use tools to process the source code and help them study it. Such tools give people studying source code advantages and flexibility never before dreamed of. One such tool is the LXR (Linux Cross-Reference) tool.

Developed originally as a testbed application for a general hypertext cross-referencing tool in Norway, its flagship achievement is cross-referencing the Linux kernel source code. The code is available at lxr.linux.no/source for browsing. Other projects are at lxr.mozilla.org/seamonkey, where the Mozilla source code is available for browsing, and the FreeBSD source code is available at fxr.watson.org. LXR gives users the capability to jump to function definitions, search for usages and so forth with only a single click. It also supports indexing of e-mail and hypertext links.

The project is based on stock Web technology, so it can be accessed via any Web browser. On the server side, it was developed using Apache but should work with any Web browser supporting CGI-scripting capabilities. The scripts that actually do all the work were developed using Perl and rely heavily on Perl's powerful regular expression libraries.

Probably the best feature of this software is that it is presented to users in HTML format. Because of the HTML format, it is easy to link various portions of the code to others. It is written in Perl, so theoretically, it can run on any operating system that has a Perl interpreter. What is really great about this tool is that it supports multiple languages. This means it doesn't matter which language your program is written in; you still can use this tool to cross-reference and browse your code.

LXR is actually quite simple and clean. The users use a utility called genxref to generate an index of the complete source code. Once this is done, users access a Perl script called source through a Web browser that reads the index files and generates the HTML for the cross-referenced source code dynamically. Users then can browse the source code as they want.

Installation

Installing and configuring LXR is pretty simple once you know a bit about how it works and what the various configuration options are. First, download the source tarball from sourceforge.net/projects/lxr. At the time of this writing, lxr-0.3 is the stable release. Once you have downloaded the tarball, extract it using:

bash# tar -xvf lxr-0.3.tar.gz

After extracting the source, cd to the newly created directory, and open the Makefile for editing with the text editor of your choice. You need to set two variables here: INSTALLPREFIX and PERLBIN. PERLBIN refers to the executable binary of the perl5 interpreter. In my case, it was in /usr/bin/perl. INSTALLPREFIX is the directory where LXR will be installed. It should be in a location that is accessible via a Web browser. On my system, that's Apache 1.3.33, and I chose to install it under /var/www/htdcos/. Thus, my Makefile looked something like this:

# Makefile for installation and configuration of LXR

# The location of your perl5 binary

PERLBIN=/usr/bin/perl

# LXR will be installed here

INSTALLPREFIX=/var/www/htdocs/lxr

# End of configuration parameters

CGISCRIPTS=find ident search diff source

PERLMODULES=SimpleParse.pm Common.pm Config.pm

....

....

....

Leave the rest of the Makefile unchanged. At the console, type:

bash# make install

and LXR is installed in the specified directory.

Configuration

Now, cd to the directory where LXR was installed (in my case that was /var/www/htdocs/lxr). Three subdirectories should be there: bin, http and source. Although the source code you want to cross-reference can be placed anywhere, I prefer to put it under the $INSTALLPREFIX/source subdirectory. I put the glibc-2.3.5 and OpenMOSIX-2.4.26 source code here. Now, we have to generate the index files that LXR will use to generate the cross-referenced source code. So, cd to the directory with the source code, and execute the genxref script in $INSTALLPREFIX/bin:

bash# /var/www/htdocs/lxr/bin/genxref .

The . at the end tells the script that the source code is contained within the current directory. Next, sit back and enjoy the ride until the parsing is complete. Once it's done, you should have two new files in the current directory—the directory containing your source code—fileidx and xref. These two files are the ones lxr needs to generate the cross-referenced source code when you browse it. Make sure that others have read permission for these files. To do so, type the following while still in the source directory:

bash# ls -l fileidx xref

The output should be something like:

-r--r--r-- 1 nobody root 671744 2006-08-24 05:06 fileidx*

-r--r--r-- 1 nobody root 8425472 2006-08-24 05:06 xref*

The third r should be set. If it isn't, you can set it by doing the following:

bash# chmod o+x fileidx xref

Now, it's time to configure LXR for use. Change directory to $INSTALLPREFIX/http/ (in my case, that is /var/www/htdocs/lxr/http/), and open the lxr.conf file for editing. The lxr.conf file is the most important file you need. It has several different configuration options.

Variables

The first thing you'll see when you open the file is a definition for a variable called v. As with programming languages, you can define your own variables and use them later in the configuration file. Wherever they occur, they will be replaced by whatever value they have. Variable values are referenced by the configuration file by $/variable-name. Variable definitions follow one of two possible formats:

variable: /variable-identifier, variable-name, 
/(/list-of-values/), /default-value/

or:

variable: /variable-identifier, variable-name,
/[/file-containing-list-of-values/], /default-value/

Here's what the terms stand for:

variable-identifier: the name the variable will be known as throughout the configuration file.
variable-name: the actual name of the variable that will be displayed to the user.
list-of-values: comma-separated list of values to be displayed.
file-containing: a file that contains a list of possible values.
list-of-values: the list has each entry on a separate line. The user can select any one of them. The absolute path of the file should be provided.
default-value: the value that the variable will take on by default. The first value is automatically set if this is not specified.

baseurl

The baseurl is the URL relative to which all of the scripts required by LXR are placed. It should be accessible via a browser. In my configuration, it's http://my-ip/lxr/http/ <http://localhost/lxr/http/>. Make sure to place the / at the end, or the last directory will be ignored.

HTML Headers and Footers

When the HTML for the source is generated, LXR can add headers and footers to the pages. Sample headers and footers are provided in the $INSTALLPREFIX/http/ directory. They're called template-head and template-tail. In addition, you also can change the way files and directories are displayed by LXR by modifying the template-dir file. The locations of these files can be specified by the htmlhead, htmltail and htmldir options in the lxr.conf file.

sourceroot

This option tells LXR where to look for the actual source code. In my case, it's /var/www/htdocs/lxr/source/glibc-2.3.5. If you want to cross-reference multiple projects, all you have to do is create a variable specifying the location of each of the directories that contain the source code. Then, you can specify the value of the variable as the sourceroot. For example, I set up the sources for glibc-2.3.5 and OpenMOSIX-2.4.26, placing the sources for both of them in /var/www/htdocs/lxr/source in their individual directories with the same names as above. In lxr.conf, I had a line like:

variable: s, Source, (glibc-2.3.5, OpenMOSIX-2.4.26)

then:

sourceroot: /var/www/htdocs/lxr/source/$s

Thus, the appropriate source code is automatically selected based on the value of the source variable.

srcrootname

srcrootname specifies the name of the project whose source code is displayed—for example:

srcrootname: $s

incprefix

This specifies the locations of the header files that are to be included in the project.

dbdir

This is the location of the fileidx and xref files generated by genxref. If you have multiple projects, specify a separate location for each, as follows:

dbdir: /var/www/htdocs/lxr/source/$s/

These are the only options you need to set when configuring LXR. Additionally, you can specify the location of the glimpse binary using glimpsebin.

glimpsebin

glimpse allows users to search for specific files within the source code and to search for any text within source files. You can obtain the latest version of glimpse from webglimpse.net/trial/glimpse-latest.tar.gz. Extract and install it. Once you are done installing glimpse, go to the directory where the source code is installed, such as /var/www/htdocs/lxr/source/glibc-2.3.5, and do the following:

bash# glimpseindex -H . .

The output should look something like this:

This is glimpseindex version 4.18.2, 2006.

Indexing "/var/www/htdocs/lxr/source/glibc-2.3.5" ...

Size of files being indexed = 81711416 B, Total #of files = 10075

Index-directory: "/var/www/htdocs/lxr/source/glibc-2.3.5"

Glimpse-files created here:

-rw-r--r-- 1 root root 676398 2006-09-08 05:51 .glimpse_filenames

-rw-r--r-- 1 root root 40300 2006-09-08 05:51 .glimpse_filenames_index

-rw-r--r-- 1 root root 0 2006-09-08 05:50 .glimpse_filetimes

-rw------- 1 root root 1783314 2006-09-08 05:51 .glimpse_index

-rw-r--r-- 1 root root 686 2006-09-08 05:51 .glimpse_messages

-rw------- 1 root root 836 2006-09-08 05:51 .glimpse_partitions

-rw-r--r-- 1 root root 23888 2006-09-08 05:51 .glimpse_statistics

This creates the required glimpse index files in the current directory. Once they're created, make sure read permission is set for others:

bash# chmod o+r .glimpse-*

Now, set the glimpsebin option in lxr.conf to wherever you installed glimpse. I installed it in /usr/local/bin/glimpse.

That's it; save and close the lxr.conf file. The only thing remaining to do now is configure the Web server to work with LXR.

Configuring Apache

The Web server I used was Apache 1.3.33. The first order of business is to set the permissions for the LXR directory. You can do that by editing the httpd.conf file, normally found under /etc/apache/httpd.conf. Add the following lines:


<Directory /var/www/htdocs/lxr>

AllowOverride All

Options All

</Directory>

Simply replace the location with wherever you've installed LXR. Then, go to $INSTALLPREFIX/http/ and create and open a file named .htaccess for editing. Type the following lines:


<File ~ (search|find|source|diff|ident)?>

SetHandler cgi-script

</Files>

This tells the Web server to treat the above-mentioned files as CGI scripts. If you don't do this, the server will display only the contents of these files. Close and save .htaccess. We are now ready to browse the cross-referenced source code.

How to Use LXR

After all of the above steps are done, all you have to do is open a Web browser and go to the URL, http:///my-ip//lxr/http/source, where /my-ip/ is the IP of your Web server. When you open the Web page, you will get something like what is shown in Figure 1.

Figure 1. Source Code Navigation via the Browser

As you can see, users can select any of the source code files for browsing. At the top, I'm using the template provided by LXR. It includes links to navigate the source code, search for a particular identifier, search for any text within the source code and search for any file.

A few utilities of note:

Source navigation: select this option to browse the source code of your choice.
Identifier search: search for the definition and uses of a particular identifier. Requires the ident script.
Freetext search: search for any text within the source files. Requires the search script.
File search: search for files matching the passed string. Requires the find script.

All of these utilities also can search using regular expressions to match the strings.

Conclusion

LXR is an excellent tool. It makes life a lot easier for people who want to study source code, and it's powerful and easy to use. The fact that it uses dynamically generated Web pages to browse the source code gives users a lot of flexibility in configuring it. Also, because it can be accessed via any Web browser, it imposes no limitations on the platform, client or location of users. This interoperability is one of the reasons LXR is so powerful.