Existing import / export modules in Drupal contrib
I have gone through all of the reasonably established and well-known import / export modules in Drupal contrib, and compiled some notes on them. The notes mainly cover each module's user interface, file handling, parsing system, ID handling, format support, and level of abstraction.
Navigation: accessed through 'export' and 'import' local menu tabs on 'admin -> categories' page.
Export UI: lists available vocabularies. The user clicks a vocabulary and is taken directly to the exported XML, for which a 'download' prompt appears (due to the HTTP content-disposition header, discussed below).
Import UI: the user can browse for a file to upload, and selects whether to add terms to an existing vocabulary or to let the XML file determine the vocabulary. The UI warns that existing terms will not be overwritten (the user assumes the same applies to existing vocabs).
ID / key handling: relationships are properly maintained for keys, but new IDs are generated for all vocabs / terms. User is not given the option to preserve (or try to preserve) existing IDs.
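The ID handling described above boils down to keeping an old-ID to new-ID map while saving, and rewriting every parent reference through that map. The Python sketch below is a minimal illustration, not the module's actual code; `Term`, `import_terms` and the `preserve_ids` option are hypothetical, with `preserve_ids` showing what a "try to preserve existing IDs" choice might look like. It assumes parents appear in the file before their children.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Term:
    tid: int               # ID of the term
    name: str
    parent: Optional[int]  # parent's ID, or None for a top-level term


def import_terms(terms, used_ids, preserve_ids=False):
    """Assign database IDs to imported terms, keeping parent links intact.

    terms:        terms from the import file, parents listed before children
    used_ids:     IDs already taken in the target database
    preserve_ids: hypothetical option - keep a file ID whenever it is free
    """
    used = set(used_ids)
    id_map = {}  # file ID -> database ID
    saved = []
    next_id = max(used, default=0) + 1
    for term in terms:
        if preserve_ids and term.tid not in used:
            new_id = term.tid
        else:
            while next_id in used:  # skip IDs claimed by preserved terms
                next_id += 1
            new_id = next_id
        used.add(new_id)
        id_map[term.tid] = new_id
        # Rewrite the parent reference through the map built so far.
        new_parent = id_map[term.parent] if term.parent is not None else None
        saved.append(Term(new_id, term.name, new_parent))
    return saved, id_map
```

With `preserve_ids=False` this matches the behaviour described (fresh IDs for everything); with `preserve_ids=True` a file ID is kept when it is still free, and only clashing IDs are reassigned.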
HTTP header handling:
- for exporting files, the HTTP 'content-disposition' header is set to 'attachment'. This has problems in IE - perhaps there's a better solution?
- the 'content-type' header is set to 'application/octet-stream', identifying it as a binary file. This is wrong, as an XML file is text, not binary. There should be a better way to do this as well.
- an ugly hack for handling IE 5.5 and Opera: 'content-type' is set to 'application/dummy' instead.
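For comparison, a header pair that labels the export as text while still prompting a download might look like the Python sketch below. `export_headers` is a hypothetical helper; the MIME types are the registered ones for XML (`application/xml`) and CSV (`text/csv`), and the `filename` parameter of `Content-Disposition` is standard. It makes no attempt to work around the IE and Opera quirks noted above.

```python
def export_headers(filename: str, fmt: str) -> dict:
    """Build HTTP headers for a downloadable text export (illustrative only)."""
    content_types = {
        "xml": "application/xml; charset=utf-8",  # XML is text, not octet-stream
        "csv": "text/csv; charset=utf-8",
    }
    # Drop characters that would break the quoted filename parameter.
    safe_name = filename.replace('"', "").replace("\\", "")
    return {
        "Content-Type": content_types[fmt],
        # 'attachment' asks the browser to save the file rather than display it.
        "Content-Disposition": f'attachment; filename="{safe_name}"',
    }
```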
Formats available: XML only. This is a given, since there's no separation between the formatting engine and the code to export from / import into the DB. The name of the module kind of gives it away too ;-).
XML export: done directly, by printing the output within nested tags. No abstraction whatsoever. This has the distinct advantage of being simple and easy to understand, but it also misses out on really simple things like line breaks and indenting (maybe this is just a bug and should be patched?), which an XML generation library would handle. Then again, XML generation libraries are cr@p in my experience. All data is exported as XML tags; no XML attributes are used.
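As a point of comparison, a generation library takes care of escaping and indentation. A minimal Python sketch using the standard library's `xml.etree.ElementTree` (`ET.indent` needs Python 3.9+); `export_vocabulary` and the element names are illustrative and are not the module's actual export format:

```python
import xml.etree.ElementTree as ET


def export_vocabulary(vocab: dict, terms: list) -> str:
    """Serialize a vocabulary and its terms as indented XML, tags only."""
    root = ET.Element("vocabulary")
    ET.SubElement(root, "vid").text = str(vocab["vid"])
    ET.SubElement(root, "name").text = vocab["name"]
    for term in terms:
        t = ET.SubElement(root, "term")
        ET.SubElement(t, "tid").text = str(term["tid"])
        ET.SubElement(t, "name").text = term["name"]  # '&', '<' escaped automatically
    ET.indent(root, space="  ")  # adds the line breaks and indenting
    return ET.tostring(root, encoding="unicode")


print(export_vocabulary({"vid": 1, "name": "Tags"},
                        [{"tid": 5, "name": "Drupal & PHP"}]))
```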
XML import / parsing: done with PHP's basic XML parsing functions (and Drupal's PHP4/5-safe wrapper). Once again, little to no abstraction. Data is parsed, and then straightaway saved using taxonomy_save_vocabulary() and taxonomy_save_term(). Not a bad approach, but not a particularly flexible one either.
Support for multiple import files: somewhat present, since you can choose the vocabulary to assign new imported terms to (you can also choose the parent term, but this feature seems to be commented out in the code - buggy / won't fix perhaps?). Apart from that, each file is essentially a separate import.
Support for attached file import (e.g. images, audio, documents): not present - the taxonomy system doesn't support any such files anyway.
Important note: this module is maintained by one of my mentors, Sami Khan!
Navigation: accessed through 'import/export' top-level menu page.
UI: confusing in many places. Import and export UI is all muddled together on one page. CSV and XML import/exports are inconsistent, in terms of UI look-and-feel, and in terms of available options.
Export UI: for CSV export, you can choose which node type to perform an export on, and you can choose the exact fields to export. For XML export, there are several 'groups' of fields that you can choose to export. For XML, you cannot choose what node types to export, but if you're exporting books, you can choose to export a particular page and all its children. XML export doesn't seem to work for non-book types. For both XML and CSV, you can choose either 'export' (prints export data in a themed Drupal page), or 'friendly export' (asks you to save exported data as a file).
Import UI: Once again, inconsistent. For CSV, you can only paste your import data into a textfield, and for XML, you can only upload a file. For both formats, you can choose to either 'update existing records', 'insert non existing records', or 'delete records'. However, the 2nd option doesn't seem to work, so effectively you can't import data that doesn't already exist (handy, eh?).
ID / key handling: all IDs are preserved, since the module cannot handle importing new nodes, only existing ones for which the IDs are already established.
HTTP header handling:
- as with taxonomy XML, the HTTP 'content-disposition' header is set to 'attachment'
- the 'content-type' header is set to either 'text-plain' or 'text/xml'.
Formats available: XML and CSV. However, the CSV format supported by this module does not seem to conform to the CSV standard, as it uses pipes instead of commas as delimiters, and does some other strange things.
XML export and import / parsing: done with a 3rd-party library called phpxml, which is included with the module. Not a very attractive library, but it seems to do the job. Because CSV is also supported, there is some abstraction between the XML parsing and the import / export logic, but not much - a lot of code duplication in places. Much of the data is exported as XML attributes, rather than as tags.
CSV export and import / parsing: done directly, with file output and regular expression filters, and PHP's implode and explode functions. Once again, some (but not enough) abstraction.
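The risk with hand-rolled implode / explode parsing is that a delimiter or line break inside a field silently corrupts the row. The Python sketch below contrasts a naive pipe-delimited round trip (an approximation of the approach described, not the module's code) with the standard `csv` module, which quotes such fields as the CSV standard (RFC 4180) describes:

```python
import csv
import io


def naive_export(rows, delim="|"):
    # implode-style: join fields, join rows; no quoting at all
    return "\n".join(delim.join(row) for row in rows)


def naive_import(text, delim="|"):
    # explode-style: split rows, split fields
    return [line.split(delim) for line in text.split("\n")]


def csv_export(rows):
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)  # quotes fields containing , " or newlines
    return buf.getvalue()


def csv_import(text):
    return list(csv.reader(io.StringIO(text, newline="")))


rows = [["1", "Title with | pipe", "body"],
        ["2", "Smith, John", 'He said "hi"\nthen left']]
```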
Support for multiple import files: not present. Even support for a single import file is inconsistent between formats, since CSV import only accepts pasted text.
Support for attached file import (e.g. images, audio, documents): doesn't seem to be present, but hard to tell.
Important note: only works with Drupal 4.6. Also, code is hard to read, and does not adhere to Drupal coding conventions.
Navigation: accessed by clicking the 'export Drupal XML' link on a book page. This is done automatically by the book module's hook_export() system.
UI: consists of the link described above, as well as a simple 'settings' page (also generated by book's hook_export()).
ID / key handling: all IDs are output as-is.
HTTP header handling: 'content-type' is set to 'text/xml'.
XML export handling: output directly, by printing all fields within XML tags. This is abstracted through the book module's hook_export() system. As with import-export, most of the metadata is stored as XML attributes rather than as XML tags.
Navigation: accessed through 'import -> book' page.
UI: consists of a simple 'drupal XML file upload' form on this page. Once the XML is uploaded, you can choose whether to 'update existing book pages' or to 'create new book'. There is also an option for allowing import of PHP in nodes.
ID / key handling: parent-child relationships are maintained between book nodes. If the user updates an existing book, IDs are kept; if the user creates a new book, new IDs are generated. There is no user control over this behaviour.
XML import parsing: is done through a combination of PHP's built-in XML parsing system, a custom XML parsing helper class, and a custom node saving class (as well as a node definition class). This setup is very clean and well-architected: the XML parsing helper class is passed as the parser object to the built-in PHP parser. Additionally, the code for actually saving the imported data into a node is separated, and is simply defined on the XML parser as a callback function. So although there is no abstraction between the import process and the XML engine, there is a separation between the XML parsing and the node saving.
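The pattern described here, a parser helper that owns the low-level event handlers and hands each completed record to a separately supplied save callback, can be sketched in Python with the standard `xml.parsers.expat` bindings, which are event-based like PHP's `xml_parser_*` functions. `NodeParser`, the `<node>` element and its flat child fields are assumptions for illustration, not the DXML definition:

```python
import xml.parsers.expat


class NodeParser:
    """Collects <node> elements and passes each finished one to a callback."""

    def __init__(self, on_node):
        self.on_node = on_node  # saving logic lives outside the parser
        self.current = None     # dict for the node being built
        self.field = None       # name of the child element being read
        self.parser = xml.parsers.expat.ParserCreate()
        self.parser.StartElementHandler = self.start
        self.parser.EndElementHandler = self.end
        self.parser.CharacterDataHandler = self.data

    def start(self, name, attrs):
        if name == "node":
            self.current = dict(attrs)  # attributes such as nid
        elif self.current is not None:
            self.field = name
            self.current[name] = ""

    def data(self, text):
        # expat may deliver one text node in several chunks, so accumulate.
        if self.current is not None and self.field is not None:
            self.current[self.field] += text

    def end(self, name):
        if name == "node":
            self.on_node(self.current)  # hand off; the parser never saves
            self.current = None
        elif name == self.field:
            self.field = None

    def parse(self, data):
        self.parser.Parse(data, True)
```

Swapping the callback (say, for one that only validates) changes what happens to each node without touching the parsing code.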
Support for multiple import files: not present. This is a shame, as a feature to 'import into existing book x' would be useful here.
Support for attached file import: not present. So if your book nodes have attachments associated with them, too bad - not part of the DXML definition anyway.
Navigation: accessed through 'administer -> content -> import' page.
UI: consists of a wizard-like form, where the user:
- uploads a CSV file and selects a node type;
- maps each field in the CSV file to a field of the selected node type (these mappings are stored in the DB for each node type, so that they can be re-populated for future imports);
- sets additional values for the nodes (e.g. workflow options, authoring options);
- previews the import;
- confirms, and the actual import is performed.
ID / key handling: relationships between nodes are not recognised or catered for - nodes are considered flat 1-1 entities. Additionally, all imported nodes are assumed to have no existing ID, so an ID is created for every imported node (there is no option to supply IDs in the CSV file). Book nodes are not available as an import node type, so there is no need for parent-child handling. CCK types are nominally supported, but don't currently work, so I couldn't test them. However, I don't think that nodereference fields would be handled properly, as there's no code that I can see to cater for them.
CSV import / parsing: parsing is done using PHP's fgetcsv() function, which reads each line of the CSV file into an array. The keys for all the fields are then transformed using the mappings that the user specified. Additional static fields relevant to the node type are added on using the node import _static() hook. The fields can be transformed or processed in some way by using the _prepare() hook. Finally, the nodes are saved, and further processing can be done after this.
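The pipeline above (read a row, remap column keys, add static fields, run a prepare step, save) can be sketched in Python as follows. The steps mirror the description, but `import_nodes` and its callable parameters are hypothetical stand-ins for node_import's hooks, and letting mapped CSV values win over static ones is an assumption:

```python
import csv
import io


def import_nodes(csv_text, mapping, static_fields, prepare, save, skip_header=False):
    """Turn CSV rows into node dicts and save them (hypothetical pipeline).

    mapping:       column index -> node field name, as chosen by the user
    static_fields: per-node-type values added to every node (cf. _static())
    prepare:       callable that transforms a node before saving (cf. _prepare())
    save:          callable that stores a node and returns the saved result
    """
    reader = csv.reader(io.StringIO(csv_text, newline=""))  # like fgetcsv()
    if skip_header:
        next(reader, None)
    results = []
    for row in reader:
        if not row:  # ignore blank lines
            continue
        # Rows arrive as numerically indexed lists; remap keys to field names.
        mapped = {mapping[i]: value for i, value in enumerate(row) if i in mapping}
        # Assumption: mapped CSV values override static values on conflict.
        node = {**static_fields, **mapped}
        node = prepare(node)
        results.append(save(node))
    return results
```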
Support for multiple import files: not present.
Support for attached file import: not present. No support for node types that are based around files either, e.g. image, audio.
Data definition system: this is the real strength of the node import module. Each supported node type defines its fields through hook_node_import_fields(), as a simple array. Fields can be handled and specially processed using the module's other available hooks. Fields are added to the node object, and the actual importing of the fields is dependent on some magic taking place within node_save(). Each field can be mapped to any of the data fields that are getting imported from the CSV file.
Important note: this is a very old module (first aired Dec 2003) that has been contributed to by a variety of experienced Drupal developers over the years, and that has managed to keep up with the heady pace of Drupal core development. This 'heritage' means that a great many valuable ideas have been implemented in it that can be re-used elsewhere; but it also means that in some ways the module is tied down by its past, and still carries some baggage from its days as part of older Drupal versions.
Navigation: accessed through 'administer -> settings -> user_import' page.
UI: consists of:
- A 'list imports' page, where all previous imports are listed, where all imports that have been saved but not yet run are listed, etc.
- A 2-page wizard form, where users can first either upload a CSV file or select a CSV file on the web server, and can then set various options, as well as map CSV fields to Drupal fields. The settings that the user configures here can either be saved (to be used later), tested, or executed.
- Users can run, test, configure, or delete saved/previous imports from the overview page.
- Also a 'configure' page for various general settings.
ID / key handling: all imported users are given a new UID - no option to import existing UIDs from the CSV data. Also no option to link new users to other entities, e.g. nodes, comments (as authors).
CSV import / parsing: parsing is done using PHP's fgetcsv() function, which reads each line of the CSV file into an array. The CSV file can be uploaded through the form, or it can be already on the filesystem (files are stored in the module's directory - they ideally should be in the 'files' directory, and should be better managed, e.g. stored in per-user folders). The keys for all the profile fields are then transformed using the mappings that the user specified. Core user data is immediately saved using user_save(), and additional profile fields are saved by directly inserting profile data into the database. No abstraction between CSV parsing and data saving - the call to user_save() is in the same function that loops through and executes fgetcsv().
Support for multiple import files: not present. However, there is a very clever system for large imports, where only a subset of the data is imported during the request, and additional data is imported later as a cron job. The max number of records to import at once is a user-configurable setting (defaults to 250).
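The batching idea is simple to sketch: store an offset with the import, process at most N records per run, and let each cron run pick up where the previous one stopped. In this Python sketch `ImportJob` and `run_batch` are hypothetical, the offset lives in memory rather than in the database, and 250 mirrors the module's default:

```python
from dataclasses import dataclass, field


@dataclass
class ImportJob:
    rows: list
    offset: int = 0  # would be persisted between requests in a real module
    imported: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.offset >= len(self.rows)


def run_batch(job: ImportJob, save, max_records: int = 250) -> int:
    """Import the next slice of rows; return how many were processed."""
    batch = job.rows[job.offset:job.offset + max_records]
    for row in batch:
        job.imported.append(save(row))
    job.offset += len(batch)
    return len(batch)
```

The first call would happen during the submitting request, and later calls from cron until `done` is true.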
Support for attached file import: not present. Pity, since the 'picture' file is part of the core user entity.
Import management: this is the real stand-out feature of this module. All imports (whether submitted, run, or tested) are saved to the database, and can be re-configured, re-run, or re-tested again any number of times (and can be deleted). Errors are logged and can be viewed for each import or test import that has run.
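A rough Python sketch of the import-management idea: an import is a saved record of its settings plus a status and an error log, and the same record can be executed in test mode (validate only) or for real, any number of times. `ImportRecord`, `Status`, `execute` and the validation callback are hypothetical names; this sketch keeps only the errors from the latest run, whereas a fuller version could store them per run in the database:

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SAVED = "saved"
    TESTED = "tested"
    RUN = "run"


@dataclass
class ImportRecord:
    settings: dict  # e.g. field mappings, kept so the import can be re-run
    status: Status = Status.SAVED
    errors: list = field(default_factory=list)  # (line number, message)


def execute(record, rows, validate, save, test=False):
    """Run or test an import, logging per-row errors on the record."""
    record.errors.clear()  # keep only the latest run's errors
    for line_no, row in enumerate(rows, start=1):
        problem = validate(row)
        if problem:
            record.errors.append((line_no, problem))
        elif not test:  # a test run validates but never saves
            save(row)
    record.status = Status.TESTED if test else Status.RUN
    return record
```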