Working with Drupal's Migrate module

UPDATE: if you're interested in this article you should check out the slides from my 4th migrate presentation. The registration system in migrate has changed ("automatic" registration is gone) and I've explained it all in better detail in the slides. I even cover the pathway of the variables if you're new to OOP. ...and yes, I'm available for support and/or workshops.

I have had the opportunity to work on two Drupal sites that involved the Migrate module. I have in the past worked fairly extensively with the Feeds module to do this type of thing. Feeds is a great and powerful module, but at a certain point you will probably find that you want to use Migrate.

Why use Migrate rather than Feeds?

  • Rollbacks. The migrate module makes it damn easy to rollback and re-import.
  • Rules in code. Any changes you do to your input file are stored in a coded template.

The downside as you have probably already guessed is that you're going to spend a lot of time writing custom code here. On the bright side, there isn't that much complicated stuff once you know what everything is trying to do.

Example Migrate Templates

These are the ways that do not involve using much code. The reality is that you *will* be doing code with almost any Migrate project. Before you get started the best things you can do are (1) read the manual on Drupal.org, and (2) find a reference template that is close to what you want to do.

There is one additional thing... the beer and the wine templates. These are part of the migrate_example package that is part of the main migrate module. Review these, the code is commented specifically for this purpose.

Also check out the following as possible starting points:

Additionally, there is a Wordpress conversion template, and ones for Typo3 and PhpBB. The Wordpress one has a UI and works quite flawlessly (though you will need to remap your image locations somehow).

Migrate and the Future of Drupal

Some people have suggested using Migrate as the main way to upgrade between versions of Drupal. My guess is that my team is going to use it a lot for D7->D8 migrations when the time comes, and I suspect that it will be the default way for D9.

Setting Up Your Dev Environment and Migrate Basics

One thing about Migrate that you should know is that some of the queries that will run will be enormously expensive processor-wise. Especially when things aren't working correctly. I find it is best to work from a local environment rather than a dev server until you have a working template. Additionally, it helps to have a process monitor showing the activity on the machine so you can see when things go wrong.

Another tip that you will need to know is the command drush_print_r($variable); in your PHP code that will help you to analyze the objects referenced in your code and to see what structure Drupal is expecting when you put stuff into $row->something; (prepareRow) or $entity->field_name[][0]['value']; (prepare).

Familiarize yourself with the following commands, as it is recommend to use drush to do your migration:

drush ms
drush mi $migration
drush mr $migration

You will probably need some debugging tools as well:

drush_print_r($var); - this command does what you would expect - dumps a variable to the command line. You put this in your PHP code.

drush_print_r($query->__toString()); - this will dump the MySQL query that is generated by Drupal's dynamic query engine. If you have one template that won't run, this can come in handy. You can adapt the output of this to run in MySQL directly - useful if you need to confirm that the query is causing memory problems. You can prepend the keyword EXPLAIN to the query generated for additional info from MySQL.

Anatomy of a Migrate template

The migrate module needs some basic definitions so that it will show up on the Migrate page, which is a tab on the Content list page. Most of the time "registration" of the migration classes is "automatic" but I have found that it is useful to use the tab on that page to force it to look for migrations.

In the case of Commerce Ubercart Migration module, there is also a settings tab which will appear for that. You will want to configure it to connect to the existing data set and it will create a bunch of "migrations" to show on the main Migrate page.

When your migrations are all listed, you can click one and see what fields are mapped and which are not. This is where you're going to spend a lot of your migrate time.

Typically you will have a custom module with the following:

  • custom.info file
  • custom.module file, blank... unless you want to use it for something*
    • *an optional API definition if not using automatic-registration
  • something.inc, your first migration, listed in your files in the info file
  • another.inc, your second migration, also listed in files of info file

Within your .inc files you will need to know about the following components which are almost always going to be in your migration:

  • class CommerceMigrateUbercartNodeMigration extends DynamicMigration class definition setting what migration will be used as a base. Then the following sub-components:
    • public function __construct(array $arguments); // initialize the migration, connect to source and grab the settings! Contains a reference to what will be used to map the migration (keeps track of changes to IDs)
    • public function prepareRow($row); // preprocess function, DOES NOT RUN FOR XML IMPORTS
    • function prepare($entity, stdClass $row); // last chance to do stuff. Can run SQL queries here, first param is always the target node/order/entity fully expanded, so put in your languages setting and delta values!
    • function complete($entity, stdClass $row); // rarely needed if you did things right. Can be used for doing post-migration tasks but could complicate your rollbacks, so be sure to test!

Initializing the Migration

With Migration templates you're dealing with Object Oriented Programming, not a staple in our Drupal PHP diet yet. If you're familiar with OOP you'll be right at home. Let's start out with a definition:

class CommerceMigrateUbercartNodeMigration extends DynamicMigration {
  // An array mapping D6 format names to this D7 databases formats.
  public $filter_format_mapping = array();

  public function __construct(array $arguments) {
    $this->arguments = $arguments;
    parent::__construct();
    $this->description = t('Import product nodes from Ubercart.');
    $dependency_name = 'CommerceMigrateUbercartProduct' . ucfirst($this->arguments['type']);
    $this->dependencies = array($dependency_name);

    // Create a map object for tracking the relationships between source rows
    $this->map = new MigrateSQLMap($this->machineName,
      array(
        'nid' => array(
          'type' => 'int',
          'unsigned' => TRUE,
          'not null' => TRUE,
          'description' => 'Ubercart node ID',
        ),
      ),
      MigrateDestinationNode::getKeySchema()
    );

There is a lot of stuff going on here. Firstly, we're extending a base class. That means this will apply to any objects of that class. So we can just assume that if the base class Migrate is running, that this class will run as well. What follows is a constructor, which what is going to run at startup.

The last part of this code block is the most interesting: $this->map. Here is how Drupal is going to track the objects in your migration. It is your key for each item in your import - and it is going to BREAK if you are trying to use XML as your import source. Then you're going to have to dig around in your DB and change that type from int to something else.

Connecting to Source(s)

The most exciting part about this part of the migration template is that you have many options. In this case we're looking at a D6->D7 migration, so in our case D6 is our source! Migrate takes care of the actual connection to the database so that part is easy, you can point it right at production to grab the data. As noted above, XML can also be your source(s). When I say source(s) I mean possibly plural... the example code provided will let you cycle through many XML files during the import process. Potentially really handy if your exports were run in batches.

Note, that the switch statement in the start of this module is related to the UI of Migrate, where the author of this module has made a switch available. Most migrations don't provide a settings form for Migrate but this one does.

    // Create a MigrateSource object, which manages retrieving the input data.
    $connection = commerce_migrate_ubercart_get_source_connection();

    $query = $connection->select('node', 'n');

    switch (SOURCE_DATABASE_DRUPAL_VERSION) {
      case 'd7':
        $query->leftJoin('field_data_body', 'fdb', 'n.nid = fdb.entity_id AND n.vid = revision_id');
        $query->leftJoin('field_data_field_campaign_id', 'cid', 'n.nid = cid.entity_id');
        $query->fields('n', array('nid', 'type', 'title', 'created', 'changed'))
              ->fields('fdb', array('body_value', 'body_summary', 'body_format', 'language'))
              ->fields('cid', array('field_campaign_id_value'))
              ->condition('n.type', $arguments['type'])
              ->distinct();
        break;
      case 'd6':
        $query->leftJoin('node_revisions', 'nr', 'n.nid = nr.nid AND n.vid = nr.vid');
        $query->leftJoin('filter_formats', 'ff', 'nr.format = ff.format');
        // ... and so on ...
    }
    $this->source = new MigrateSourceSQL($query, array(), NULL, array('map_joinable' => FALSE, 'skip_count' => TRUE)); // connect!

You probably noted the D7 and D6 options have different mappings. Take note. If you're going to extend one, do the one you intend to use.

When your main Migrate page stops working...

Eventually if you start adding fields your query will silently break, and will force Migrate's listing page to hang. Usually you can still access the page for the specific migration but since the purpose of these pages is to help guide your clients through the process this is a severe deal-breaker. Fortunately, it is easily solved one of two different ways. Either (1) don't bother counting how many tasks are to be done or (2) specify a basic query that can be tabulated quickly.

Here is how you do that:

$this->source = new MigrateSourceSQL($query, array(), NULL, array('map_joinable' => FALSE, 'skip_count' => TRUE));

In your source you specify skip_count as a parameter and you set it to TRUE. Now you're going to have an X where the number of items to be migrated was... but you can at least get to that page. Whew...

Adding each of your fields

This is where start getting fun... or really boring. Your pick. We now have to go through each field that will be brought into the database and create a relationship with the place where we want it to go.

Each field in your migration is going to have the following components:

  • A field in your target content type to hold the incoming data
  • An extension to the $query object to perform a LEFT JOIN if your source is in a table that is not in the base query
  • An extension to the $query object's fields to SELECT by column name
  • An extension to $this to connect the old field to the new field
  $query->leftJoin('content_field_location', 'loc', 'n.nid = loc.nid');

  $query->fields('n', array('nid', 'type', 'title', 'created', 'changed'))
             ->fields('loc', array('field_location_value'));

  $this->addFieldMapping('body', 'body_value') // add semicolon if no options
         ->arguments($arguments)  // optional
         ->defaultValue(''); // optional

This wouldn't be Drupal without some "gotchas". As I was adding fields in each of the migrations I have done eventually the main Migrate page stops listing out my migrations. This was REALLY aggravating and I spent at least a full week debugging this issue across the two migrations I have done. The solution? Stop counting how many things you are going to process (see the section above about using skip_count when connecting to the source.

It probably goes without saying... but test EVERY field after you have defined it. This is where you'll start finding weird stuff in your data. Unless, like nobody before you, your previous database is perfection in every way. Highly doubtful. As you discover problems in your code you can deal with them in processRow or process functions.

Processing the Data

Now that we have all of our fields coming into Drupal and our Migrate page has no errors on the main list and additionally no errors on our specific migration it is time to process the data! Most of the time this will not be necessary, but sometimes you must change things as they import so it will fit the new data structure.

  public function prepareRow($row) {
    // Transform body format to something we can use if it's not already.
    if (!filter_format_exists($row->body_format)) {
      $row->body_format = $this->transformFormatToMachineName($row->body_format);
    }
    $row->temp_data = "somejunk"; // will be passed to prepare function then dropped
    $row->field_text = "XXX"; // sets the value of a text field to the string "XXX" (works for many simple fields)
  }

Within your migration class you will probably always have a prepareRow function. This is where you can make some changes, or alternatively, reject a certain row before it gets imported by return FALSE;. This step DOES NOT RUN for any XML import you do, not even to carry over temp data. Gotcha!

  function prepare($entity, stdClass $row) {
    $entity->field_text['fr'][0] = 'YYY'; // manually set field_text French version (if using entity_translation), for delta 0 (the first instance if allowing multiple values) to YYY
  }

The prepare function is your second chance to change things using the entity pre-rendered, fully expanded into Drupal's typical array syntax. If you're doing XML processing, this is where you need to make any transformations. You can also grab any data you setup in prepareRow... or just run SQL queries here. Be forewarned... if you do SQL queries here it is generally better to load all of your data earlier on (in $this->query) so that you're not hammering your database with requests or blowing out the memory on your server. Those things are hard to detect.

  function complete($entity, stdClass $row) {
  }

The complete function runs at the end. In some cases it is necessary to make an update after the data has been imported. For example: if you are importing line items that are part of an order in Commerce, you will have a task here to update the order total after each line item has been imported. Migrate uses $this->map to track the progress and ID changes between all of your migrations. This function takes that into consideration.

So... sweet, we're already done and we don't even need to use the complete function! That was easy right? Sort-of. It is not always easy. It is a long time-consuming process to verify all the data is correct and to get this far. If you're considering taking on a Migrate project be sure to consider all the possibilities of problems that will come up with the old data. Quantify all of your content types and entities that will be migrated and every individual field contained therein. Expect that each field will need a certain level of review and base your estimates/calculations on that.