Uploading WordPress posts using Perl and XMLRPC

There are at least three ways in which pages can be remotely uploaded to a WordPress site.

  • Using the modern HTTP-based (REST) API v2. I almost, but not completely, got this to work. Unfortunately, the documentation is mostly missing and partly incorrect, so I will skip this one, at least for now.
  • Using the older XMLRPC API. There are two Perl modules, WordPress::API and WordPress::API::Post, that make using this API a breeze. The downside of XMLRPC (and of the REST API above) is speed: each post takes more than a second to upload. Using XMLRPC is the focus of this article.
  • Writing directly to the database. This is a lot faster, perhaps by a factor of 1000 on a local database. See this article.

Uploading using XMLRPC

Uploading is done by the script wordpress-upload-posts.pl which is part of the archive.

This script should be run in the directory containing all the post directories.

Uploading, as done by the script, involves several steps:

  • Download all the posts from the server (WordPress::API->new->getRecentPosts()).
  • Create a hash that maps “slugs” to IDs.
  • For each downloaded post:
    • Check that there is exactly one .post file in the directory named after the “slug”. The .post file can have any name.
    • Check that there is a corresponding .html file. The file must have the same name as the .post file, but with an .html extension.
    • Check if the .html file is newer than the timestamp (.ts) file.
    • Download the post using its ID (WordPress::API::Post->new->load()).
    • Update the content field ($the_post->description($new_text)).
    • Upload the updated post ($the_post->save()).
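The two file checks in the list above can be sketched with core Perl only. This is a minimal illustration, not the code from wordpress-upload-posts.pl; the helper name find_post_files is hypothetical, and the sketch assumes that directory names (slugs) contain no whitespace or glob metacharacters.

```perl
use strict;
use warnings;

# Hypothetical helper: return the .post and .html paths for one post
# directory, or die if the directory does not follow the expected layout.
sub find_post_files {
    my ($dir) = @_;

    # There must be exactly one .post file; its base name is free.
    my @post_files = glob("$dir/*.post");
    die "$dir: expected exactly one .post file, found "
        . scalar(@post_files) . "\n"
        unless @post_files == 1;

    # The .html file has the same name, but with an .html extension.
    (my $html_file = $post_files[0]) =~ s/\.post\z/.html/;
    die "$dir: missing $html_file\n" unless -f $html_file;

    return ($post_files[0], $html_file);
}
```

Called as find_post_files('creating-web-pages-using-perl'), it would return the paths of the first two files in the filename triple.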

A typical filename triple:

creating-web-pages-using-perl/page.post
creating-web-pages-using-perl/page.html
creating-web-pages-using-perl/page.ts

The timestamp file helps keep the total upload time down. To unconditionally upload all posts, delete all the .ts files.
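The timestamp logic could look roughly like the sketch below. The helper names needs_upload and mark_uploaded are hypothetical, and the strict "newer than" comparison is an assumption; the only behavior taken from the description is that a missing .ts file forces an upload.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the post must be uploaded, that is, if the
# .ts file is missing or older than the .html file.
sub needs_upload {
    my ($html_file, $ts_file) = @_;
    return 1 unless -e $ts_file;            # no timestamp: always upload
    my $html_mtime = (stat $html_file)[9];  # element 9 is the mtime
    my $ts_mtime   = (stat $ts_file)[9];
    return $html_mtime > $ts_mtime ? 1 : 0;
}

# Hypothetical helper: create the .ts file if necessary and set its
# modification time to now, marking the post as uploaded.
sub mark_uploaded {
    my ($ts_file) = @_;
    open(my $fh, '>>', $ts_file) or die "Cannot open $ts_file: $!\n";
    close($fh);
    utime(undef, undef, $ts_file) or die "Cannot touch $ts_file: $!\n";
}
```

A caller would test needs_upload() before contacting the server and call mark_uploaded() only after a successful upload, so that a failed upload is retried on the next run.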

Key points in the source code

First, a connection for reading all the posts must be set up:

my $w = WordPress::API->new({
                             proxy    => $proxy,
                             username => $username,
                             password => $password
                            });

proxy is the URL of the file xmlrpc.php in the root of the WordPress installation directory, something like https://www.sdu.se/blog/xmlrpc.php. username and password are of course the username and password of a user that has read and write permissions on the site. These values are passed to the script as three environment variables: PROXY, WUSERNAME and WPASSWORD. The username and password variables get a prefix, W for web, to avoid conflicts with other usernames and passwords.
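Reading the three environment variables might be done as in the following sketch. The helper name connection_settings is hypothetical, and failing early on a missing variable is a design choice of this example rather than something the script is known to do.

```perl
use strict;
use warnings;

# Hypothetical helper: read the connection settings from the environment
# and die if any of the three variables is missing or empty.
sub connection_settings {
    my %env_name = (
        proxy    => 'PROXY',
        username => 'WUSERNAME',
        password => 'WPASSWORD',
    );
    my %settings;
    for my $key (sort keys %env_name) {
        my $value = $ENV{ $env_name{$key} };
        die "Environment variable $env_name{$key} is not set\n"
            unless defined $value && length $value;
        $settings{$key} = $value;
    }
    return \%settings;
}
```

The returned hash reference has the same three keys as the constructor argument shown above, so it could be handed to WordPress::API->new() directly.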

Then all active posts are downloaded, and a map from “slugs” to IDs can be created:

my $posts = $w->getRecentPosts();

$slug_to_id{ $_->{'wp_slug'} } = $_->{'postid'} for @$posts;

We need the post’s ID when we send an update request to the server, and we need the “slug” to identify the directory containing the post. Recall that there is one directory for each post, and that the directory containing a post must have exactly the same name as the post’s “slug”. Why use the slug? Well, I’d rather have a directory called creating-web-pages-using-perl than one called 79.
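To make the lookup concrete, here is a small self-contained sketch that runs without a server. The sample records stand in for the result of getRecentPosts(): only the two keys used above are included, and the second record and the directory name not-on-server are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample data standing in for the result of getRecentPosts().
my $posts = [
    { wp_slug => 'creating-web-pages-using-perl', postid => 79 },
    { wp_slug => 'another-post',                  postid => 80 },
];

# Build the slug-to-ID map, one entry per post.
my %slug_to_id = map { ($_->{wp_slug} => $_->{postid}) } @$posts;

# A local directory can only be uploaded if its name is a known slug.
for my $dir (qw(creating-web-pages-using-perl not-on-server)) {
    if (exists $slug_to_id{$dir}) {
        print "$dir -> post $slug_to_id{$dir}\n";
    }
    else {
        print "$dir -> no such post on the server\n";
    }
}
```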

If there are several revisions of a post, then there will be several records for that post in the database, all with the same “slug”. Only the active (usually: latest) revision is returned by the getRecentPosts() call, so there is no risk of updating the wrong instance.

Then we loop over all the downloaded posts. For each post, if the .html file is newer than the .ts file, the post is downloaded, its contents replaced, and then uploaded again. Downloading the post twice does seem wasteful, but posts returned by getRecentPosts() are in a different format from posts downloaded individually. In technical terms, they are represented by Perl hashes that are completely different.

Downloading is easy:

my $p = WordPress::API::Post->new({
                                   proxy    => $proxy,
                                   username => $username,
                                   password => $password
                                  });
$p->id($id);            #--- Set the id
my $tmp = $p->load();   #--- Download post

Updating the body and uploading the updated post is equally easy:

$p->description($body); #--- Update body
$tmp = $p->save();      #--- Upload modified post
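Where $body comes from is not shown above. Assuming it is simply the full content of the post’s .html file, it could be read as in this sketch. The helper name read_body is hypothetical, and decoding the file as UTF-8 is an assumption; whether decoded text or raw bytes is appropriate depends on how the XMLRPC layer handles encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole content of a file as one string,
# decoded as UTF-8 (the encoding is an assumption).
sub read_body {
    my ($html_file) = @_;
    open(my $fh, '<:encoding(UTF-8)', $html_file)
        or die "Cannot read $html_file: $!\n";
    local $/;               # slurp mode: read to end of file in one go
    my $body = <$fh>;
    close($fh);
    return $body;
}
```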

Download the archive or view wordpress-upload-posts.pl.

You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090

View source for the content of this page.