Drupal 6

Working with Drupal's Migrate module

UPDATE: if you're interested in this article you should check out the slides from my 4th migrate presentation. The registration system in migrate has changed ("automatic" registration is gone) and I've explained it all in better detail in the slides. I even cover the pathway of the variables if you're new to OOP. ...and yes, I'm available for support and/or workshops.

I have had the opportunity to work on two Drupal sites that involved the Migrate module. I have in the past worked fairly extensively with the Feeds module to do this type of thing. Feeds is a great and powerful module, but at a certain point you will probably find that you want to use Migrate.

Why use Migrate rather than Feeds?

  • Rollbacks. The migrate module makes it damn easy to roll back and re-import.
  • Rules in code. Any transformations you apply to your input data are stored in a coded template.

The downside, as you have probably already guessed, is that you're going to spend a lot of time writing custom code here. On the bright side, there isn't that much complicated stuff once you know what everything is trying to do.

Example Migrate Templates

Starting from an existing template is the way to avoid writing much code, though the reality is that you *will* be writing code with almost any Migrate project. Before you get started the best things you can do are (1) read the manual on Drupal.org, and (2) find a reference template that is close to what you want to do.

There is one additional thing... the beer and the wine templates. These are part of the migrate_example package that ships with the main migrate module. Review these; the code is commented specifically for this purpose.

Also check out the following as possible starting points:

Additionally, there is a Wordpress conversion template, and ones for Typo3 and PhpBB. The Wordpress one has a UI and works quite flawlessly (though you will need to remap your image locations somehow).

Migrate and the Future of Drupal

Some people have suggested using Migrate as the main way to upgrade between versions of Drupal. My guess is that my team is going to use it a lot for D7->D8 migrations when the time comes, and I suspect that it will be the default way for D9.

Setting Up Your Dev Environment and Migrate Basics

One thing you should know about Migrate is that some of the queries it runs will be enormously expensive processor-wise, especially when things aren't working correctly. I find it is best to work from a local environment rather than a dev server until you have a working template. Additionally, it helps to have a process monitor showing the activity on the machine so you can see when things go wrong.

Another tip: calling drush_print_r($variable); in your PHP code helps you analyze the objects your code references and see what structure Drupal expects when you put values into $row->something (in prepareRow) or $entity->field_name[$language][0]['value'] (in prepare).

Familiarize yourself with the following commands, as it is recommended to use drush to do your migration:

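drush ms              # migrate-status: list migrations and their progress
drush mi $migration   # migrate-import: run the named migration
drush mr $migration   # migrate-rollback: undo the named migration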

You will probably need some debugging tools as well:

drush_print_r($var); - this command does what you would expect - dumps a variable to the command line. You put this in your PHP code.

drush_print_r($query->__toString()); - this will dump the MySQL query that is generated by Drupal's dynamic query engine. If you have one template that won't run, this can come in handy. You can adapt the output of this to run in MySQL directly - useful if you need to confirm that the query is causing memory problems. You can prepend the keyword EXPLAIN to the query generated for additional info from MySQL.

Anatomy of a Migrate template

The migrate module needs some basic definitions so that it will show up on the Migrate page, which is a tab on the Content list page. Most of the time "registration" of the migration classes is "automatic" but I have found that it is useful to use the tab on that page to force it to look for migrations.

In the case of Commerce Ubercart Migration module, there is also a settings tab which will appear for that. You will want to configure it to connect to the existing data set and it will create a bunch of "migrations" to show on the main Migrate page.

When your migrations are all listed, you can click one and see what fields are mapped and which are not. This is where you're going to spend a lot of your migrate time.

Typically you will have a custom module with the following:

  • custom.info file
  • custom.module file, blank... unless you want to use it for something*
    • *an optional API definition if not using automatic-registration
  • something.inc, your first migration, listed in your files in the info file
  • another.inc, your second migration, also listed in files of info file
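As a rough sketch of the optional API definition mentioned above, here is what a hook_migrate_api() implementation in custom.module might look like. This assumes a recent Migrate 2.x release that registers migrations through this hook; the module name "custom" and the machine and class names are hypothetical placeholders, so check the Migrate documentation for the version you are running.

```php
<?php
/**
 * Implements hook_migrate_api().
 *
 * Sketch for custom.module. The machine names and class names below are
 * hypothetical placeholders for the classes defined in your .inc files.
 */
function custom_migrate_api() {
  return array(
    // Version of the Migrate API this module is written against.
    'api' => 2,
    // One entry per migration: machine name => class that implements it.
    'migrations' => array(
      'CustomSomething' => array('class_name' => 'CustomSomethingMigration'),
      'CustomAnother' => array('class_name' => 'CustomAnotherMigration'),
    ),
  );
}
```

The .inc files holding those classes still need to be listed in the files[] entries of custom.info so Drupal can autoload them.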

Within your .inc files you will need to know about the following components which are almost always going to be in your migration:

  • class CommerceMigrateUbercartNodeMigration extends DynamicMigration: the class definition, which sets the base class your migration builds on. It contains the following methods:
    • public function __construct(array $arguments); // initialize the migration, connect to source and grab the settings! Contains a reference to what will be used to map the migration (keeps track of changes to IDs)
    • public function prepareRow($row); // preprocess function, DOES NOT RUN FOR XML IMPORTS
    • function prepare($entity, stdClass $row); // last chance to do stuff. Can run SQL queries here, first param is always the target node/order/entity fully expanded, so put in your languages setting and delta values!
    • function complete($entity, stdClass $row); // rarely needed if you did things right. Can be used for doing post-migration tasks but could complicate your rollbacks, so be sure to test!

Initializing the Migration

With Migration templates you're dealing with Object Oriented Programming, not a staple in our Drupal PHP diet yet. If you're familiar with OOP you'll be right at home. Let's start out with a definition:

class CommerceMigrateUbercartNodeMigration extends DynamicMigration {
  // An array mapping D6 format names to this D7 database's formats.
  public $filter_format_mapping = array();

  public function __construct(array $arguments) {
    $this->arguments = $arguments;
    parent::__construct();
    $this->description = t('Import product nodes from Ubercart.');
    $dependency_name = 'CommerceMigrateUbercartProduct' . ucfirst($this->arguments['type']);
    $this->dependencies = array($dependency_name);

    // Create a map object for tracking the relationships between source rows
    $this->map = new MigrateSQLMap($this->machineName,
      array(
        'nid' => array(
          'type' => 'int',
          'unsigned' => TRUE,
          'not null' => TRUE,
          'description' => 'Ubercart node ID',
        ),
      ),
      MigrateDestinationNode::getKeySchema()
    );

There is a lot going on here. First, we're extending a base class, so our class inherits everything DynamicMigration already knows how to do and we only write the parts specific to this migration. What follows is the constructor, which is what runs when Migrate creates an instance of the class.

The last part of this code block is the most interesting: $this->map. Here is how Drupal is going to track the objects in your migration. It is your key for each item in your import - and because the key above is declared as an int, it is going to BREAK if you are trying to use XML as your import source. Then you're going to have to dig around in your DB and change that type from int to something else.
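To illustrate the int-versus-something-else choice, here is a small sketch of a helper that builds the source key schema for either case. The helper name and the 'sourceid' key are hypothetical; the array layout follows Drupal's Schema API, which is what MigrateSQLMap expects as its second argument.

```php
<?php
/**
 * Hypothetical helper: build the source key schema for MigrateSQLMap.
 *
 * @param string $key_name Name of the unique key in the source data.
 * @param bool $numeric TRUE for integer IDs (SQL), FALSE for string IDs (XML).
 */
function example_source_key_schema($key_name, $numeric = TRUE) {
  if ($numeric) {
    return array(
      $key_name => array(
        'type' => 'int',
        'unsigned' => TRUE,
        'not null' => TRUE,
        'description' => 'Numeric source ID',
      ),
    );
  }
  // String IDs need a varchar column in the map table.
  return array(
    $key_name => array(
      'type' => 'varchar',
      'length' => 255,
      'not null' => TRUE,
      'description' => 'String source ID',
    ),
  );
}

// Inside __construct() this would be used roughly as:
// $this->map = new MigrateSQLMap($this->machineName,
//   example_source_key_schema('sourceid', FALSE),
//   MigrateDestinationNode::getKeySchema());
```

Declaring the key as a varchar before the migration is first registered may spare you from altering the map table by hand afterwards.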

Connecting to Source(s)

The best thing about this part of the migration template is that you have many options. In this case we're looking at a D6->D7 migration, so D6 is our source! Migrate takes care of the actual database connection, so that part is easy; you can point it right at production to grab the data. As noted above, XML can also be your source(s). When I say source(s) I mean possibly plural... the example code provided will let you cycle through many XML files during the import process. Potentially really handy if your exports were run in batches.

Note, that the switch statement in the start of this module is related to the UI of Migrate, where the author of this module has made a switch available. Most migrations don't provide a settings form for Migrate but this one does.

    // Create a MigrateSource object, which manages retrieving the input data.
    $connection = commerce_migrate_ubercart_get_source_connection();

    $query = $connection->select('node', 'n');

    switch (SOURCE_DATABASE_DRUPAL_VERSION) {
      case 'd7':
        $query->leftJoin('field_data_body', 'fdb', 'n.nid = fdb.entity_id AND n.vid = fdb.revision_id');
        $query->leftJoin('field_data_field_campaign_id', 'cid', 'n.nid = cid.entity_id');
        $query->fields('n', array('nid', 'type', 'title', 'created', 'changed'))
              ->fields('fdb', array('body_value', 'body_summary', 'body_format', 'language'))
              ->fields('cid', array('field_campaign_id_value'))
              ->condition('n.type', $arguments['type'])
              ->distinct();
        break;
      case 'd6':
        $query->leftJoin('node_revisions', 'nr', 'n.nid = nr.nid AND n.vid = nr.vid');
        $query->leftJoin('filter_formats', 'ff', 'nr.format = ff.format');
        // ... and so on ...
    }
    $this->source = new MigrateSourceSQL($query, array(), NULL, array('map_joinable' => FALSE, 'skip_count' => TRUE)); // connect!

You probably noted the D7 and D6 options have different mappings. Take note. If you're going to extend one, do the one you intend to use.

When your main Migrate page stops working...

Eventually, as you keep adding fields, your query will silently break and force Migrate's listing page to hang. Usually you can still access the page for the specific migration, but since the purpose of these pages is to help guide your clients through the process this is a severe deal-breaker. Fortunately, it is easily solved in one of two ways: either (1) don't bother counting how many items are to be processed, or (2) specify a basic count query that can be tabulated quickly.

Here is how you do that:

$this->source = new MigrateSourceSQL($query, array(), NULL, array('map_joinable' => FALSE, 'skip_count' => TRUE));

In your source you specify skip_count as a parameter and you set it to TRUE. Now you're going to have an X where the number of items to be migrated was... but you can at least get to that page. Whew...
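For the second option, MigrateSourceSQL accepts a separate count query as its third constructor argument. The following is only a sketch under assumptions: the function name is hypothetical, $connection is the source database connection from the earlier code, and the count simply tallies nodes of one type without any of the joins. It only does something useful inside a Drupal site with Migrate enabled.

```php
<?php
/**
 * Hypothetical helper: build a source with a cheap, join-free count query.
 *
 * @param object $connection Source database connection.
 * @param object $query The full (expensive) select query with all its joins.
 * @param string $type Node type being migrated.
 */
function example_build_source($connection, $query, $type) {
  // Count only the base table so the listing page stays fast.
  $count_query = $connection->select('node', 'n');
  $count_query->addExpression('COUNT(n.nid)', 'cnt');
  $count_query->condition('n.type', $type);

  // Third argument is the count query; skip_count is no longer needed.
  return new MigrateSourceSQL($query, array(), $count_query, array('map_joinable' => FALSE));
}

// Inside __construct():
// $this->source = example_build_source($connection, $query, $arguments['type']);
```

With this in place the listing page can show a real item total instead of the X you get with skip_count.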

Adding each of your fields

This is where things start getting fun... or really boring. Your pick. We now have to go through each field that will be brought into the database and create a relationship with the place where we want it to go.

Each field in your migration is going to have the following components:

  • A field in your target content type to hold the incoming data
  • An extension to the $query object to perform a LEFT JOIN if your source is in a table that is not in the base query
  • An extension to the $query object's fields to SELECT by column name
  • An extension to $this to connect the old field to the new field

  $query->leftJoin('content_field_location', 'loc', 'n.nid = loc.nid');

  $query->fields('n', array('nid', 'type', 'title', 'created', 'changed'))
             ->fields('loc', array('field_location_value'));

  $this->addFieldMapping('body', 'body_value') // add semicolon if no options
         ->arguments($arguments)  // optional
         ->defaultValue(''); // optional

This wouldn't be Drupal without some "gotchas". As I added fields in each of the migrations I have done, the main Migrate page eventually stopped listing my migrations. This was REALLY aggravating and I spent at least a full week debugging this issue across the two migrations I have done. The solution? Stop counting how many things you are going to process (see the section above about using skip_count when connecting to the source).

It probably goes without saying... but test EVERY field after you have defined it. This is where you'll start finding weird stuff in your data. Unless, like nobody before you, your previous database is perfection in every way. Highly doubtful. As you discover problems you can deal with them in the prepareRow or prepare functions.

Processing the Data

Now that we have all of our fields coming into Drupal and our Migrate page has no errors on the main list and additionally no errors on our specific migration it is time to process the data! Most of the time this will not be necessary, but sometimes you must change things as they import so it will fit the new data structure.

  public function prepareRow($row) {
    // Transform body format to something we can use if it's not already.
    if (!filter_format_exists($row->body_format)) {
      $row->body_format = $this->transformFormatToMachineName($row->body_format);
    }
    $row->temp_data = "somejunk"; // will be passed to prepare function then dropped
    $row->field_text = "XXX"; // sets the value of a text field to the string "XXX" (works for many simple fields)
  }

Within your migration class you will probably always have a prepareRow function. This is where you can make some changes or, alternatively, reject a certain row before it gets imported by returning FALSE. This step DOES NOT RUN for any XML import you do, not even to carry over temp data. Gotcha!
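Here is a minimal sketch of rejecting a row. The stub base class is only a stand-in so the example runs outside Drupal; in a real module you would drop it and extend Migration or DynamicMigration. The class names and the "skip rows with an empty title" rule are hypothetical.

```php
<?php
// Stand-in for Migrate's base class so this sketch runs outside Drupal.
// In a real module, remove this stub and extend Migration instead.
abstract class ExampleMigrationStub {
  public function prepareRow($row) {
    return TRUE;
  }
}

class ExampleNodeMigration extends ExampleMigrationStub {
  public function prepareRow($row) {
    // Respect a rejection made further up the class hierarchy.
    if (parent::prepareRow($row) === FALSE) {
      return FALSE;
    }
    // Hypothetical rule: skip rows that have no usable title.
    if (trim((string) $row->title) === '') {
      return FALSE;
    }
    // Otherwise clean the value up and let the row through.
    $row->title = trim($row->title);
    return TRUE;
  }
}
```

Returning FALSE keeps the row out of the import entirely, which is usually cleaner than importing bad data and deleting it later.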

  function prepare($entity, stdClass $row) {
    $entity->field_text['fr'][0]['value'] = 'YYY'; // manually set field_text French version (if using entity_translation), for delta 0 (the first instance if allowing multiple values) to YYY
  }

The prepare function is your second chance to change things, with the entity fully expanded into Drupal's typical array syntax. If you're doing XML processing, this is where you need to make any transformations. You can also grab any data you set up in prepareRow... or just run SQL queries here. Be forewarned... if you do SQL queries here it is generally better to load all of your data earlier on (in your source $query) so that you're not hammering your database with requests or blowing out the memory on your server. Those things are hard to detect.
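To make the "typical array syntax" concrete, here is a small sketch of a helper that writes several values into a simple text field, keyed by language and then by delta. The helper name is hypothetical, and it assumes a field whose only column is 'value'; other field types use different column names.

```php
<?php
/**
 * Hypothetical helper: set every delta of a simple field for one language.
 *
 * Resulting structure: $entity->{$field_name}[$langcode][$delta]['value'].
 */
function example_set_field_values($entity, $field_name, $langcode, array $values) {
  // Start clean so stale deltas from an earlier step do not linger.
  $entity->{$field_name}[$langcode] = array();
  foreach (array_values($values) as $delta => $value) {
    $entity->{$field_name}[$langcode][$delta] = array('value' => $value);
  }
}

// Inside prepare($entity, stdClass $row) this could be called as:
// example_set_field_values($entity, 'field_text', 'fr', array('YYY', 'ZZZ'));
```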

  function complete($entity, stdClass $row) {
  }

The complete function runs at the end. In some cases it is necessary to make an update after the data has been imported. For example: if you are importing line items that are part of an order in Commerce, you will have a task here to update the order total after each line item has been imported. Migrate uses $this->map to track the progress and ID changes between all of your migrations. This function takes that into consideration.

So... sweet, we're already done and we don't even need to use the complete function! That was easy right? Sort-of. It is not always easy. It is a long time-consuming process to verify all the data is correct and to get this far. If you're considering taking on a Migrate project be sure to consider all the possibilities of problems that will come up with the old data. Quantify all of your content types and entities that will be migrated and every individual field contained therein. Expect that each field will need a certain level of review and base your estimates/calculations on that.

Using simplehtmldom API with Drupal to radically change node editing UI

In mid 2011 I took on an interesting code challenge and never got around to posting about it. The technique I describe here is available as part of my Drupal 6 module translation wysiwyg if you would like to see a demonstration of the result. This blog post talks about the way we use simplehtmldom API module to traverse the node body content produced by a wysiwyg editor - and pick out all of the translatable elements which we then render as individual fields in the node editor UI.

Still following? Awesome.

What simplehtmldom API does

This module brings the PHP Simple HTML DOM Parser into Drupal for use with your custom modules. It renders all of the HTML that you feed it as a tree of objects that you can perform operations on. If you have used JavaScript and/or jQuery you will probably feel somewhat comfortable working with it. It provides simple DOM traversal, and then re-assembly of the HTML, all in your PHP code.

How and what we want to parse

In the translation wysiwyg module we want to take the code from the default language version of a node and break it into strings.

  • The goal here is that editors of the default language will get their usual WYSIWYG editor.
  • Editors of translations of the node will get individual fields for each string in the body text.

So our module will have to look at the node before you begin editing to recognize if it is going to be a translation of the original node. If it is a translation we modify the node edit form.

To get the node editor to do what we want we need to do all of the following:

  • Find the default language version of the node and grab its body text
  • Use the simplehtmldom API to find all of the h1, h2, h3, h4, h5, p and a tags that contain text
  • Check the values contained in each of those tags to see if they exist in the locales database tables
  • Render a tree of Drupal Forms API textareas for each of the text-containing tags listed above
  • Load the translated versions of the items found in the locales table as default values in Forms API
  • Unset the body field so that it does not appear

How we want to put it back together

The obvious problem that we're going to run into with all of these new form fields on our edit form is that we now must re-capture all of the items in the fields and put them in the appropriate places.

Re-enter simplehtmldom API!

Here are all of our steps to re-create the structure of our HTML body content while preserving all of the images, hr tags, object tags... all other tags!

  • Grab all of the submitted fields during the validation of the form
  • Re-load the body text of the default translation
  • Crawl through the tree of the original text, replacing the contents of each h1, h2, h3, h4, h5, p and a tag that received a translation
  • Each translated string is stored in the locales table for future editing
  • The new body text is taken from simplehtmldom and converted back to HTML
  • We put this new HTML back into Drupal's node body field and pass the results to the submit function
  • Drupal saves the "translated" version of the node
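The crawl-and-replace steps above can be sketched like this. Again, this is not the module's code: it uses PHP's built-in DOMDocument in place of simplehtmldom so it runs on its own, the function name is hypothetical, and for simplicity it only replaces tags that contain plain text with no nested elements.

```php
<?php
/**
 * Hypothetical helper: rebuild body HTML with translated strings swapped in.
 *
 * @param string $html Body text of the default-language node.
 * @param array $translations Map of original string => translated string.
 */
function example_apply_translations($html, array $translations) {
  $doc = new DOMDocument();
  libxml_use_internal_errors(TRUE);
  $doc->loadHTML('<?xml encoding="UTF-8"><div id="example-root">' . $html . '</div>');
  libxml_clear_errors();

  $xpath = new DOMXPath($doc);
  foreach ($xpath->query('//h1|//h2|//h3|//h4|//h5|//p|//a') as $node) {
    $key = trim($node->textContent);
    // Only touch leaf tags so nested images, links and objects survive.
    $is_leaf = $node->getElementsByTagName('*')->length === 0;
    if ($is_leaf && isset($translations[$key])) {
      $node->nodeValue = '';
      $node->appendChild($doc->createTextNode($translations[$key]));
    }
  }

  // Serialize only the children of our wrapper div, not the whole document.
  $root = $xpath->query('//div[@id="example-root"]')->item(0);
  $out = '';
  foreach ($root->childNodes as $child) {
    $out .= $doc->saveHTML($child);
  }
  return $out;
}
```

The returned HTML is what would go back into the node body field before the submit handler runs. Untranslated tags and everything that is not text come through unchanged.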

Note that for any images or custom HTML you put into your original nodes - translators did not have access to change any of that stuff. Only the text.

If you read this carefully you noted that we are now putting a huge sub-set of node body text into Drupal's locales table. This means that your translators could find these strings while searching within the translation interface - however they would not update the node content until the next time someone edits that node and thus loads the new default value for that header, paragraph or anchor tag they modified.

Where this method is really handy is when you have a translator return to a node after the original has been updated. If a new paragraph was added to the node, the only thing to translate is a blank field where the untranslated content occurs. This can be extremely handy.

First Drupal code release - Node Tasklist

Yesterday marks my first public foray into developing contrib modules for the Drupal project. I applied for project maintainer status for my new module, Node Tasklist.

From the description of the project:

Each time you visit the page or block created by this module you are presented with the edit form of the most recent node of that content type. When you save your changes, you are returned to the same page so you can perform another update to the node.

Read the description of Node Tasklist in my Drupal.org sandbox.

I made a demo of the module at the end of Montréal's Blitzweekend, where I put the final touches on the code.  You can see the module in action here: http://weal.ca/tl/blitz

I really love doing work that has a time component.  I have in the past written shell scripts and scheduling systems.  I also have a time tracker I developed in-house with Drupal and much more.  I hope to release all three of those as community projects once the Node Tasklist module is fully polished.

Customizing FCKeditor on Drupal

Editing modern CMS-based sites usually means a bit of HTML editing.  More often than not there is some kind of editor built-in with the application framework you are using to facilitate this.  On Drupal you choose whichever editor you think works best.  For me the choice was simple - FCKeditor.  Why FCK? It supports a lot of browsers and is built in a modular fashion.  So there are many different ways you can use it.

Before we get started there are a few decisions to make about how you want to integrate with Drupal.

Choosing Your Module

You can use the "Wysiwyg API" module if you are interested in using the editor for everyone who uses a specific input format (think "Filtered HTML", etc).  This new module allows you to download the current FCK codebase and set permissions against that and only that.  It is limited but growing in popularity due to the number of other editors that are supported by this module.  If you need more than one editor this is your best bet.

What I have found in practice though is that you will probably want the FCKeditor module rather than "Wysiwyg API" if you want to customize with a GUI and/or set permissions based on URL and/or another method.  For most new installs I use "Wysiwyg API" for its simplicity.

If your intent is to use custom buttons for page break and to manually set teaser length I would recommend the FCKeditor module - it has the Drupal custom plugins included.

Getting the Code

Once your module is installed you will want to download the source code for the editor.  This will effectively activate your module and allow you to set permissions so you can use it.

Make Editing Pleasurable

By default there are a few settings that are targeted to yesterday's editing preferences.  I like to change the following to better utilize my browser's built-in functions.

  1. Simplify the toolbar - by default FCK shows absolutely everything under the sun.  That means 4 or more bars worth of buttons.  Simplify that!  This can be done in the fckconfig.js file by modifying the tokens in the toolbar.  You can also add new toolbars.  When using Wysiwyg API you will need to activate new toolbars by editing fckeditor.js to specify the default.  With the FCKeditor module you can use the Drupal admin page to specify the default.
  2. Activate the Drupal Plugins - the FCKeditor module comes with the Drupal Plugins.  These two buttons will set your teaser length and/or page break points.  If working on a site which requires these features the buttons are worth it. (Alternatively, you could copy the plugin from the FCKeditor module's folder to your Wysiwyg API folder and then add the name in your fckconfig.js file if you aren't using the FCKeditor module).
  3. Switch to the Silver skin - the default of all FCK instances is to display a yucky brownish color to match an old Microsoft Office theme.  Not all of my users are on Windows so I switch to "Silver" when I can.
  4. Disable the right-click context menu - when you're editing text in a web browser you often need the real right-click menu rather than one supplied by JavaScript.  Why? Some browsers block the JavaScript code.  Also, Firefox lets you change languages with this menu - quite important for spell checking.  You can change this in fckconfig.js as well.
  5. Disable built-in spell checking - Most editors like this do a terrible job of spell checking anyway.  Modern browsers do a better job on their own.  Let the browser manage spelling like you know it should.  Again, fckconfig.js will get you there.
  6. Whitelist any odd HTML notation you use - one of my projects involved adding <span> tags into the editor space.  Not ideal code, I know, but it is worth noting that you can override the editor to allow certain tags through when it cleans up the code.  Yes, FCK cleans up broken HTML code.  It even has options in fckconfig.js for removing Word-style formatting.  What bliss.

Those changes will probably make you a hero with most offices.  FCK is widely supported so if you get it working it just works.  It has been around for a long time and seems to have pretty good momentum.

When you want to move on to bigger and better things I suggest creating your own plugins for FCK.  They can be complex, but they are well worth the struggle when you get them working.  I use a custom plugin to produce buttons which insert pre-defined text into the HTML.  More on that in a future posting.