LJ Archive

USENIX.org: a Case Study of a Migration to Drupal

Jody Hamilton

Issue #888, April 2068

The new USENIX.org is built on Drupal with an extensive Salesforce.com integration, Apache Solr-based faceted searching, e-commerce from the Drupal Commerce suite and microsites built with Organic Groups. Dozens of custom Drupal modules make this powerful system simple for staff and visitors.

USENIX (usenix.org), the Advanced Computing Systems Association, was founded in 1975 to foster and promote technical excellence and innovation among system administrators and other *nix professionals. This member-focused organization runs dozens of conferences each year, publishing content on-line for each event. Prior to the migration to Drupal, most conferences and Web pages were created using either straight HTML or Perl. After 17 years of working with its Web properties in this way, USENIX had tens of thousands of custom HTML pages, a slew of Perl scripts and no database. Without a database-driven system, the USENIX site could never have much structural organization or coherence (it had simply grown organically for decades), and maintaining this growing library of content and functionality had become a management nightmare.

When we began the project, USENIX knew it needed a robust Content Management System (CMS) and had chosen Drupal. The new USENIX Web site requirements included:

  • Replacement of the old HTML and Perl script functionality.

  • Strong security.

  • Robust user and role management.

  • Seamless integration with its newly adopted Customer Relationship Management system and its Event Management system (Salesforce.com and CVent, respectively).

  • E-commerce functionality.

  • Video integration.

  • Private and public files.

  • Faceted search listings.

  • Migration of the data stored in the HTML pages to the new system.

Security, e-commerce, user/role management and faceted search were among the features best matching Drupal's strengths. USENIX also is interested in supporting open-source software communities, and wanted to develop its site in a way that gives back as much as possible. My team at Zivtech was chosen for the project due to our experience with Salesforce.com Drupal integrations, Ubercart customizations and community contributions.

Figure 1. The New USENIX.org

Building in Drupal 6, Migrating to Drupal 7

When we started working with USENIX, Drupal 7 did not have a full release, so the site was built in Drupal 6. We used the Ubercart module suite for e-commerce features and Houston (houstoncommand.com) to handle the integration between Drupal and the Salesforce.com CRM, which also was under active construction.

Most of the USENIX Web site already was complete in Drupal 6 when a decision was made to upgrade to Drupal 7. This was very early in the Drupal 7 release cycle, so our team had to port 32 custom modules, 12 custom features, numerous contributed modules and our custom themes. Beyond these upgrades, being an early adopter meant that our developers had to commit dozens of additional patches to fix bugs and issues with upgrade paths from Drupal 6 to 7. During this process, we repeatedly fine-tuned our four-hour upgrade Bash script, which consisted mainly of drush (the excellent Drupal shell interface) commands. In addition, we rebuilt the Web site's e-commerce functionality using Drupal Commerce, ported Houston to Drupal 7, and re-created numerous views, Organic Groups configurations, profiles and more.
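The upgrade script itself is not reproduced in this article. As a rough illustration of the idea only, the following Python sketch runs a fixed sequence of drush commands and stops at the first failure. The step list, the dump filename and the module name are assumptions made for the example, and the real script was written in Bash rather than Python.

```python
import subprocess
from typing import Callable, List, Sequence

# Illustrative steps only; the contents of the real script are not given here.
UPGRADE_STEPS: List[List[str]] = [
    ["drush", "sql-dump", "--result-file=pre-upgrade.sql"],  # back up the database first
    ["drush", "updatedb", "-y"],                             # run pending database updates
    ["drush", "pm-enable", "example_custom_module", "-y"],   # hypothetical module name
    ["drush", "cache-clear", "all"],                         # rebuild caches
]


def run_step(cmd: Sequence[str]) -> int:
    """Run one command and return its exit status."""
    return subprocess.run(list(cmd)).returncode


def run_upgrade(steps: List[List[str]],
                runner: Callable[[Sequence[str]], int] = run_step) -> int:
    """Run steps in order; return how many completed before a failure."""
    completed = 0
    for cmd in steps:
        if runner(cmd) != 0:
            print("step failed:", " ".join(cmd))
            break
        completed += 1
    return completed
```

Passing a different `runner` makes it possible to rehearse the sequence without touching a real site, which matters when a full run takes hours.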

Upgrading a complex site to a new major version of Drupal is difficult because Drupal core (and, to varying extents, contributed modules) provides a migration path for data but no support for legacy code. Major contributed building blocks of Drupal often use a new core version as an opportunity for a rewrite, frequently without an upgrade path. Upgrades are more difficult if they occur soon after the core release; if the site is live (as it typically is); and if the site database is updated frequently by visitors (purchases, account creation, comments and so on) and the site cannot be taken off-line for the upgrade. With these trade-offs in mind, upgrading before site launch was preferable to postponing.

E-commerce

USENIX sells memberships, file access, physical magazines, books and offprints. To cater to these diverse e-commerce needs, our team used the Drupal Commerce framework along with Commerce Coupon, Commerce Custom Line Items, Commerce Devel, Commerce Features, Commerce File, Commerce Flat Rate, Commerce Payflow Pro and Commerce Shipping to handle the sale and pricing of products and access to them.

Certain files on the Web site are free to select users (depending on membership status or conference registration) but unavailable or available for purchase by others. Files also can be set to become public at a specified date, so that they are no longer conceptually a product for sale on the site. Multiple product formats for one piece of content are routinely sold (such as print and electronic book formats). These needs did not map well administratively to the Drupal Commerce model of separating the product and its display (the content, aka Drupal node) and would have made for a complex and tedious workflow for content creation. Fortunately, the Commerce API facilitated a solution. Our team was able to hide the creation of products from Web site administrators by auto-creating and updating associated product files, SKUs, attributes and prices when content is created and updated. We also used the Commerce API to implement sophisticated membership levels and benefits, including free products per year of membership and discount scenarios. The Commerce API was a pleasure to use because it uses the same entity-based architecture as core Drupal 7.
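To make the auto-creation idea concrete, here is a minimal Python sketch (not Drupal Commerce code) that derives one product record per format from a single piece of content. The SKU scheme and field names are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def products_for_content(node_id: int, formats: Dict[str, float]) -> List[Dict]:
    """Derive one product record per format from a single piece of content."""
    return [
        # Hypothetical SKU scheme: content ID plus format name.
        {"sku": f"node-{node_id}-{fmt}", "format": fmt, "price": price}
        for fmt, price in sorted(formats.items())
    ]
```

Because the SKUs are derived deterministically from the content, running the same function again when the content is updated yields the same SKUs, so existing products can be updated rather than duplicated.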

Houston

USENIX required a complex bi-directional integration between Salesforce.com and the new Web site. We chose our own Houston Command Center (houstoncommand.com) rather than the more widely used Salesforce Drupal module to allow flexibility with the data model as well as to provide queue management and failover support. Houston is open-source software that we have evolved over several large integration projects. It addresses challenges in integrating with SaaS products, including handling API limits and downtime. It provides a framework for mapping the data model between systems and handles queuing and backups.
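Houston's internals are not shown in this article. The following Python sketch only illustrates the general pattern of queuing outbound records so that API limits and downtime do not lose data; the class name, the per-run limit and the `send` callback are assumptions made for the example.

```python
from collections import deque
from typing import Callable, Deque, Dict


class SyncQueue:
    """Toy outbound queue: sends at most `limit` items per run, keeps failures queued."""

    def __init__(self, send: Callable[[Dict], bool], limit: int) -> None:
        self.send = send      # returns True on success, False if the remote API is unavailable
        self.limit = limit    # assumed cap on API calls per run
        self.pending: Deque[Dict] = deque()

    def enqueue(self, record: Dict) -> None:
        self.pending.append(record)

    def process(self) -> int:
        """Attempt up to `limit` sends; return how many succeeded."""
        sent = 0
        for _ in range(min(self.limit, len(self.pending))):
            record = self.pending.popleft()
            if self.send(record):
                sent += 1
            else:
                # Remote side unavailable: put the record back and stop for this run.
                self.pending.appendleft(record)
                break
        return sent
```

Records that cannot be delivered stay at the front of the queue in their original order, so a later run can pick up where this one stopped.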

Using Houston, we linked authentication, user profiles, membership status, conference registration, grant applications, pricebook entries and e-commerce purchases between the two platforms in a seamless manner. Existing USENIX.org members are able to log in to the new site with the same credentials they used pre-Drupal (which were mainly used in .htaccess-style authentications): at login, the site hashes the submitted password and compares it to the hashed passwords migrated into the CRM. Once authenticated, users can manage their membership and profile information on the new Drupal site, and USENIX staff can view and edit that data on Salesforce.com or on USENIX.org. When users register for a conference on the site, their profile information is used to pre-populate a registration form on CVent (a SaaS event management system). After payment, the CVent registration data is sent to the Salesforce.com CRM, where it is then available to Houston and the Drupal site to grant access to attendee-restricted pages and files. Staff even can manage the prices of different categories of products sold on the Drupal site (for example, the current price of an audiobook for a USENIX member) from the Salesforce.com CRM. By centralizing user history, such as conference attendance, membership status, product purchases and profile information in the CRM, the organization can target its outreach and marketing efforts effectively.
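The article does not say which hash scheme the legacy credentials used. As a sketch of the comparison step only, the Python below assumes Apache's "{SHA}" htpasswd format (unsalted SHA-1, Base64-encoded); the function names are hypothetical.

```python
import base64
import hashlib
import hmac


def htpasswd_sha1(password: str) -> str:
    """Hash a password in Apache's "{SHA}" htpasswd format (assumed scheme)."""
    digest = hashlib.sha1(password.encode("utf-8")).digest()
    return "{SHA}" + base64.b64encode(digest).decode("ascii")


def check_legacy_password(candidate: str, stored_hash: str) -> bool:
    """Hash the submitted password the legacy way and compare in constant time."""
    computed = htpasswd_sha1(candidate).encode("utf-8")
    return hmac.compare_digest(computed, stored_hash.encode("utf-8"))
```

Unsalted SHA-1 is weak by modern standards, so this is meant only to show how a login can be checked against a migrated hash without ever storing the plain-text password.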

Search

We love using Apache Solr integrated with Drupal to build not only site-wide searches but also searchable, filterable listing pages. We build listings with Solr rather than traditional (usually Views module-based) database queries to get fast results, full content search and facets. We created three search pages, one each for proceedings, multimedia and site-wide search. Each uses custom Solr index fields, custom Facet API module sorts and custom search-result displays. We also created (and contributed) Facet API drop-down menus for the proceedings and multimedia search pages. Because the search work was built first for Drupal 6, we used the Apache Solr Search Integration module rather than the next-generation Search API Drupal suite. Search API allows site builders to use the Views module to build custom search listings quickly through a UI.
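For readers unfamiliar with how a faceted listing maps onto Solr, the Python sketch below builds a request URL using Solr's standard `q`, `fq`, `facet` and `facet.field` parameters. The base URL and the field names are assumptions for the example; the Drupal modules construct such requests internally.

```python
from typing import Dict, List, Optional
from urllib.parse import urlencode


def solr_select_url(base_url: str, keywords: str, facet_fields: List[str],
                    filters: Optional[Dict[str, str]] = None, rows: int = 10) -> str:
    """Build a Solr /select URL that requests facet counts alongside results."""
    params = [
        ("q", keywords or "*:*"),   # match everything when no keywords are given
        ("rows", str(rows)),
        ("wt", "json"),
        ("facet", "true"),
        ("facet.mincount", "1"),    # hide facet values with no matching documents
    ]
    params += [("facet.field", field) for field in facet_fields]
    # Each selected facet value narrows the results through a filter query (fq).
    for field, value in (filters or {}).items():
        params.append(("fq", f'{field}:"{value}"'))
    return f"{base_url}/select?{urlencode(params)}"
```

A listing page for one conference could then call `solr_select_url("http://localhost:8983/solr", "", ["conference"], {"conference": "LISA"})`, where both the host and the `conference` field are placeholders.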

Figure 2. The Multimedia page is a searchable Solr-based listing of all content containing video or audio recordings, allowing faceting into specific conferences.

Microsites

Organic Groups, the Drupal module that powers groups.drupal.org to allow groups of people to organize themselves “organically”, can be used effectively in many use cases that involve “groups” of site users and content. For USENIX.org, the “groups” are the conferences, and both users (attendees) and content (sessions, trainings, information pages) can be a part of a conference. To create a microsite for a conference in the past, USENIX staff copied a directory of HTML, CSS and scripts and modified its content and visual style for each conference. To allow a similar feel of distinct microsites, we used the Organic Groups Theme module to allow a unique Drupal theme to be assigned to any conference. The look of conference microsites is customized through the UI with banner logos, and the visual themes are created with CSS. These custom themes are subthemes of a conference base theme. We included a conference subtheme starter kit with basic CSS for common elements altered between conferences. The Organic Groups Menu module provides each conference with its own navigation. Each microsite consists of page, session, paper, training, sponsor, speaker and organizer content types.

Figure 3. A conference microsite has a custom theme, banner, navigation and content, such as sessions and sponsors.

The conference microsites display conference-specific Views module listings (such as lists of organizers, speakers and attendees). We used the Viewfield module so that conference administrators simply could add a page to a conference and include one of these predefined Views listings. Because the page itself is a part of the conference, the Views listing filters its results for the context of the conference. This approach also allows the listing pages to include custom text before and after the listings, to create an appropriate URL and to pick up the conference theme automatically. It also keeps content creators in the familiar content creation interface rather than requiring them to learn another Drupal system.

Some of the conference pages required even more customization, pushing past the limits of what can be automated and programmatically displayed. The conference schedule pages, for example, vary in format between conferences. The staff wanted full control over the markup of these individual listings while still benefiting from dynamic content management of each session displayed on the schedule. For these cases, we implemented a token-based HTML system. With this system, an editor creates a content node of type schedule and is presented with a content creation form including a textarea containing a sample HTML template for the schedule. A list of all sessions within the conference and their corresponding token codes also is displayed. Editors can adjust the HTML as they please and paste in these token codes for each session on the schedule. When displayed, the token codes are replaced dynamically with the appropriate session displays.
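The token substitution step can be sketched in a few lines of Python. The `[session:ID]` token syntax here is hypothetical (the actual token format is not given in this article), and the rendered session markup is passed in as a plain dictionary.

```python
import re
from typing import Dict

TOKEN_PATTERN = re.compile(r"\[session:(\d+)\]")  # hypothetical token syntax


def render_schedule(template: str, sessions: Dict[int, str]) -> str:
    """Replace each session token with that session's rendered HTML."""
    def substitute(match: "re.Match") -> str:
        session_id = int(match.group(1))
        # Leave unknown tokens in place so editors can spot them.
        return sessions.get(session_id, match.group(0))

    return TOKEN_PATTERN.sub(substitute, template)
```

The editor's hand-written markup passes through untouched; only the tokens change, so the layout stays under staff control while session details remain dynamic.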

Multimedia and CDN

For years, USENIX created videos of conference presentations in a variety of formats. In the process of moving these videos to the new site, we scripted their conversion to Ogg and H.264 encodings. Similar work was performed with existing audio files. All media files now are stored on Rackspace Cloud Files CDN and referenced by Drupal via link fields. When links are pasted into these link fields, they are displayed automatically as HTML5 video available in WebM, H.264 or Ogg, with Flash as a fallback option.
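As an illustration of the multi-format HTML5 markup described above, this Python sketch generates a `<video>` element with one `<source>` per format. It assumes (the article does not say) that the encodings of one recording share a base URL and differ only by file extension; the fallback markup is passed in by the caller.

```python
from html import escape

# Extension-to-MIME-type map for the three formats; the extensions are assumed.
SOURCE_TYPES = {"webm": "video/webm", "mp4": "video/mp4", "ogv": "video/ogg"}


def video_markup(base_url: str, fallback_html: str = "") -> str:
    """Build an HTML5 <video> element with one <source> per format."""
    sources = "".join(
        f'<source src="{escape(base_url)}.{ext}" type="{mime}">'
        for ext, mime in SOURCE_TYPES.items()
    )
    # Browsers that do not support <video> render the inner fallback content.
    return f"<video controls>{sources}{fallback_html}</video>"
```

The browser picks the first source it can play, and anything placed after the sources (such as an embedded Flash player) is shown only when `<video>` is unsupported.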

Figure 4. A treasure-trove of conference recordings was made available for all browsers and devices.

The Rackspace Cloud Files CDN is also used to host public PDF files on the USENIX Web site. We built a custom integration using the PHP Cloud Files API to queue the bidirectional transfer of these files between the Web site and the CDN as dictated by access permissions. As a result, content editors can manage files on the site without concern for the CDN and the appropriate links are shown to site visitors.
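The decision behind each queued transfer can be sketched as follows. This Python example is not the Cloud Files integration itself; the record fields and action names are hypothetical and only show how access permissions could dictate the direction of a transfer.

```python
from typing import NamedTuple, Optional


class FileRecord(NamedTuple):
    path: str
    is_public: bool   # whether current access permissions make the file public
    on_cdn: bool      # whether a copy currently lives on the CDN


def plan_transfer(record: FileRecord) -> Optional[str]:
    """Return the queued action a file needs, or None if it is already in place."""
    if record.is_public and not record.on_cdn:
        return "upload_to_cdn"
    if not record.is_public and record.on_cdn:
        return "pull_back_to_site"
    return None
```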

Data Migration

The legacy USENIX content was a migration challenge because it was all hand-written HTML. Typically, content migration can make use of an existing database or at least the consistency created by a templating system. This content was not only hand-written, but it also had been added on-line continuously since 1993, so the markup had changed substantially over time. We began the migration process by identifying content that was critical to the new USENIX Web site as opposed to that which could be maintained in its legacy form. Of the critical content, we distinguished between content that had enough structure to be scraped programmatically and that which required manual migration. There was also a WordPress blog that was migrated into Drupal, but WordPress-to-Drupal migration is a trivial process, thanks to the WordPress Import and WordPress Migrate Drupal modules.

Legacy Content

Some content was not critical for full integration into the new site. For example, driving directions to a conference in 1995, while fine to archive, were not worth migrating. We used the old site's content directories to create a list of all the URLs on the old site and scraped each page's full HTML into a Drupal node. This legacy content, which included Listserv discussions, was indexed by the Apache Solr search engine so that it can be found in site-wide searches. Viewing these Drupal nodes of legacy content results in a redirect to the page on the real legacy site, now hosted at static.usenix.org.
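A redirect of this kind amounts to swapping the host on the original URL. The Python sketch below assumes that paths were preserved unchanged on the legacy host, which the article does not state; the function name is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_HOST = "static.usenix.org"  # legacy host named in the article


def legacy_redirect_target(original_url: str) -> str:
    """Map an old-site URL to the same path on the legacy host (assumed layout)."""
    parts = urlsplit(original_url)
    return urlunsplit((parts.scheme or "http", LEGACY_HOST, parts.path, parts.query, ""))
```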

Scraped Content

We used the Simple HTML DOM parser to traverse sections of the old USENIX Web site and programmatically create new Drupal nodes with dates, files, images, text and node reference fields. These imports had to be repeated as new content was added to the legacy site, so they were written as Drupal modules containing custom Drush commands. These commands took a URL and scraped its HTML, parsed it into arrays of data, built Drupal objects representing content, files, products or categories and saved these programmatically. This approach was used to import some conference content, magazine content and Short Topics in System Administration books content.
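The scrape-parse-build flow can be illustrated with Python's standard `html.parser` module standing in for the Simple HTML DOM parser (a PHP library). This is a sketch of the general technique rather than the import code used on the project: it pulls out only a page title and linked PDF files, and the shape of the resulting record is an assumption.

```python
from html.parser import HTMLParser
from typing import Dict, List


class PageScraper(HTMLParser):
    """Collect the page title and any linked PDF files from raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.pdf_links: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href") or ""
            if href.lower().endswith(".pdf"):
                self.pdf_links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def scrape(html_text: str) -> Dict[str, object]:
    """Parse one page into a dict shaped like a content record to be saved."""
    scraper = PageScraper()
    scraper.feed(html_text)
    scraper.close()
    return {"title": scraper.title.strip(), "files": scraper.pdf_links}
```

Because hand-written pages vary so much, a real import needs per-section rules on top of a generic parser like this, which is why only the more regular sections were scraped.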

Manually Migrated Content

Conference proceedings are arguably the most important content on the site. The proceedings needed to be highly structured to allow connecting content from the same presenters, connecting content from the same conferences, organizing content by awards won and listing content by date. Media attached to these proceedings, including PDFs, slides and videos, also needed to be structured. The original content was extensive and lacked the consistent structure needed for scraping.

To facilitate the necessary manual migration of the proceedings of hundreds of conferences, we built a custom migration interface into the development site using the Views Bulk Operations and Workflow modules. We hired contractors to use the interface, which consisted of a Views listing of each legacy conference and its status (incomplete, complete, duplicate or not applicable). Each legacy conference was a link to a data entry form that consisted of an iframe showing the legacy proceedings of the conference and a form for entering in fields, such as title, date, authors, slides and video. We designed the structure of the imported content to make import as quick as possible for the manual process. After entry of thousands of conference proceedings, we converted the resulting nodes into their final data structure: nodes of type session, paper, discussion and speaker.

Design and Theme

The USENIX design was started by Nica Lorber of Chapter Three and completed by our Creative Director, Mason Wendell, in collaboration with USENIX staff. Themes were built with Sass/Compass using the Coding Designer Survival Kit (thecodingdesigner.com/survivalkit).

Credits

  • Drupal development and design by Zivtech: Jody Hamilton (project lead), Howard Tyson, Matt Klein, Steve Heise, Mason Wendell, Sean Wolfe, Benji Davis, Stephen Haslett, Laurence Liss, Justin Randell, Andrew Morton and Meghan Palagyi.

  • Design by Chapter Three: Nica Lorber.

  • Hosting by Pantheon: Josh Koenig and David Strauss.

  • USENIX project leads: Casey Henderson, Jane-Ellen Long and Anne Dickison.

  • Video encoding: Joseph Schwartz via USENIX.

  • Salesforce development by Heller Consulting: Bran Scott and Kim Kupferman.

Jody Hamilton is CTO and partner at Zivtech (zivtech.com), a Philadelphia-based open-source consultancy. She has been a Drupal developer since 2006 and contributes to core and contributed modules on drupal.org.
