Creating web pages using Perl – Lars Nordenström's blog

When I started this blog, I realised that it would not be long before I would get tired of the temperamental behaviour and rather lacklustre capabilities of the on-line WYSIWYG editor. And sure enough, after less than ten posts I decided that something had to be done.

I started looking for tools that could convert an input text file into a nicely formatted HTML page, something like LaTeX, but for web pages.

There is of course latex2html, which is quite capable. But there have only been three updates since 2002, and I am wary of depending on a project that seems almost dead.

Finally I decided to write my own program for generating HTML pages, in the form of a simple Perl package, post.pm.

I currently use WordPress, running on my own machine, for this blog. The underlying database is MySQL.

The HTML-generating part is largely platform-independent, although I currently use one WordPress-specific feature ([CAPTION], for floating images). This can be changed if I decide to move to some other blogging platform.

I talk about generating content for my blog, but post.pm can be used to generate HTML for any type of web page.

To use this package, you need to be familiar with the basics of Perl.

Posts in this blog contain a link to the source code used to generate them, providing working examples of how to use the package.

The program

Every input file is a Perl program that writes HTML to stdout. The package provides functions that convert their arguments to tags. The function b("text") generates <b>text</b>, and other functions, like strong(...) and h2(...), do the obvious thing.

Perhaps it is best to show an example:

generate_html(
     p("When I started this blog, I realised that it would not be long before ",
       "I would get tired of the temperamental behaviour and ",
       "rather lacklustre capabilities of the on-line WYSIWYG editor. ",
       "And sure enough, after less than ten posts I decided that something had to be done. "),

     ... more paragraphs ...

     h2("The program"),

     p("Every input file is a Perl program that writes HTML to stdout. ",
       "The package provides functions that convert their arguments to tags. ",
       "The function ",               ttq($tmp = 'b("text")'),
       " generates ",                 ttq(eval $tmp),
       " and other functions, like ", tt ('strong(...)'),
       " and ",                       tt ('h2(...)'    ),
       " do the obvious thing. "),

     p("Perhaps it is best to show an example:"),

     ... rest of the page ...
);

The post function takes a list of strings, removes all newlines preceded by Ctrl-B within the strings, and prints them. The _ function HTML-quotes all special characters, converting e.g. '&' to '&amp;'. The tt function wraps its arguments inside <b><tt>...</tt></b> tags. Well, you get the picture.
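To make the idea concrete, here is a minimal sketch of how tag functions of this kind could be written. It is not the code from post.pm: html_quote (standing in for _), make_tag and the variables $bold and $tt are hypothetical names chosen for this illustration.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "_": HTML-quote the characters that are special in HTML.
sub html_quote {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Hypothetical factory: returns a function that joins its arguments
# and wraps them in <name>...</name>.
sub make_tag {
    my ($name) = @_;
    return sub { return "<$name>" . join('', @_) . "</$name>" };
}

my $bold = make_tag('b');
my $tt   = make_tag('tt');

print $bold->('text'), "\n";                           # <b>text</b>
print $bold->($tt->(html_quote('a < b & c'))), "\n";   # <b><tt>a &lt; b &amp; c</tt></b>
```

Because every function simply returns a string, calls nest naturally, which is what makes the page source read like a tree of tags.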

Most functions take a parameter hash as an optional first argument. If a function sees a key that it knows about, it acts on it and removes it. Any remaining elements in the parameter hash are converted to HTML tag attributes and put inside the tag.

As an example, p("command") will generate

<p>command</p>

while p({class=>"computer"}, "command") will generate:

<p class="computer">command</p>
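The attribute handling can be sketched roughly as follows. This is an illustration under assumptions, not the package's implementation: the helper tag is a hypothetical name, the p defined here is a simplified stand-in, and sorting the attribute names is a choice made only to get stable output.

```perl
use strict;
use warnings;

# Hypothetical helper: render a tag, turning an optional leading
# hash reference into HTML attributes.
sub tag {
    my ($name, @args) = @_;
    my %attr;
    if (@args && ref $args[0] eq 'HASH') {
        my $hash = shift @args;        # remove the parameter hash from the content
        %attr = %$hash;
    }
    # Sorted keys give a predictable attribute order (an assumption of this sketch).
    my $attrs = join '', map { qq( $_="$attr{$_}") } sort keys %attr;
    return "<$name$attrs>" . join('', @args) . "</$name>";
}

# Simplified stand-in for the package's p function.
sub p { return tag('p', @_) }

print p("command"), "\n";                         # <p>command</p>
print p({ class => "computer" }, "command"), "\n"; # <p class="computer">command</p>
```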

I can be sure that the example is correct, because I first used a variable to store the Perl code p({class=>"computer"}, "command"), and then evaluated it to show the expansion:

($tmp = q[p({class=>"computer"}, "command")]),
p("while «$tmp» will generate:"),
source_codeq(eval $tmp),

The first line returns an empty string, and has the side effect of putting a string into $tmp. The second line shows the use of guillemets as a shorthand for tt(_(text)). This feature works with Unicode, and may work with other encodings. Note that the post.pm file is Unicode encoded. On the third line, the source_codeq() function puts content inside a <pre>...</pre> tag pair. It also removes all leading tabs (Perl RE /^\t+/) from every line, as well as any leading and trailing empty lines.
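The two text transformations described above could look something like the sketch below. The function names html_quote, expand_guillemets and strip_listing are hypothetical, and the sketch only approximates the described behaviour; how post.pm actually does it may differ. The guillemets are written as \x{AB} and \x{BB} so that the example does not depend on the encoding of the file it is saved in.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "_": HTML-quote the characters that are special in HTML.
sub html_quote {
    my ($text) = @_;
    my %entity = ('&' => '&amp;', '<' => '&lt;', '>' => '&gt;', '"' => '&quot;');
    $text =~ s/([&<>"])/$entity{$1}/g;
    return $text;
}

# Replace each guillemet-quoted span with quoted text inside <b><tt>...</tt></b>.
# \x{AB} and \x{BB} are the opening and closing guillemet characters.
sub expand_guillemets {
    my ($text) = @_;
    $text =~ s!\x{AB}(.*?)\x{BB}!'<b><tt>' . html_quote($1) . '</tt></b>'!ge;
    return $text;
}

# Approximate the whitespace clean-up described for source_codeq.
sub strip_listing {
    my ($text) = @_;
    $text =~ s/^\t+//mg;               # remove leading tabs from every line
    $text =~ s/\A(?:[ \t]*\n)+//;      # remove leading empty lines
    $text =~ s/(?:\n[ \t]*)+\z/\n/;    # remove trailing empty lines, keep one newline
    return "<pre>$text</pre>";
}

print expand_guillemets("run \x{AB}a<b\x{BB} now"), "\n";   # run <b><tt>a&lt;b</tt></b> now
print strip_listing("\n\tline one\n\t\tline two\n\n");
```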

This example shows why it is a good idea to write pages in Perl. Of course, other scripting languages are possible, but Perl is particularly strong on text processing.

Returning to the subject of parameter hashes, there may be more than one parameter hash at the beginning of the argument list. This makes it easy to add an additional parameter hash, and forward the now extended parameter list to another function. One example is the function img_float_right:

sub img_float_right {
    return _img_float({ align => "alignright" }, @_);
}

This will work even if @_ starts with a parameter hash, since _img_float will shift all leading hashes off its argument list.
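A minimal sketch of how a function might collect several leading parameter hashes is shown here. The name split_params is hypothetical, and letting later hashes override earlier ones is an assumption of this sketch; the source does not say how post.pm resolves duplicate keys.

```perl
use strict;
use warnings;

# Hypothetical helper: shift every leading hash reference off the
# argument list and merge them into a single parameter hash.
sub split_params {
    my @args = @_;
    my %param;
    while (@args && ref $args[0] eq 'HASH') {
        my $hash = shift @args;
        %param = (%param, %$hash);   # assumption: later hashes override earlier ones
    }
    return (\%param, @args);         # merged parameters, then the remaining content
}

my ($param, @rest) = split_params({ align => 'alignright' }, { width => 200 }, 'photo.jpg');
print join(', ', map { "$_=$param->{$_}" } sort keys %$param), "\n";   # align=alignright, width=200
print "@rest\n";                                                        # photo.jpg
```

With a helper like this, a wrapper only needs to put its own hash in front of @_ and forward the call.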

Directory structure

post.pm does not enforce a directory structure, but there are a few conventions.

The configuration variable blog_root ($BLOG_ROOT) points to the root directory of the source files (.post and .page, as well as other files, like images). This variable must be set on the command line (-r /path/to/blog_root) or in the environment (export BLOG_ROOT=/path/to/blog_root).

When you run a .post or .page file (remember that they are complete Perl programs), the output file will be written to the same directory, with the same filename, except that the extension will be .html.
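The naming rule amounts to swapping the extension. As a small sketch (output_name is a hypothetical helper, not a function from post.pm):

```perl
use strict;
use warnings;

# Hypothetical helper: derive the output name for a .post or .page input file.
sub output_name {
    my ($input) = @_;
    my $output = $input;
    # Swap the extension; return undef for files that are not page sources.
    $output =~ s/\.(?:post|page)\z/.html/ or return undef;
    return $output;
}

print output_name('posts/slug-2/any-name.post'), "\n";   # posts/slug-2/any-name.html
print output_name('about.page'), "\n";                   # about.html
```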

The support script wordpress-generate-posts.pl

Searching for dependencies

Since relative links don’t work well (usually: not at all) with dynamically generated content like a blog or a CMS-driven website, .post routines like img and href generate absolute paths when provided with a filename not containing a forward slash. As an example, if $BLOG_ROOT/posts/slug-2/any-name.post contains a reference, like:

href("attachment.pdf", "Get PDF version")

the link will be:

href="/posts/slug-2/attachment.pdf"

assuming that post.pm finds attachment.pdf in /<parent>/files/slug-2/.
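The path calculation can be sketched like this. The function absolute_link and the directory /home/me/blog are made up for the illustration, the sketch assumes Unix-style paths, and it skips the step where post.pm looks for the file on disk.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: turn a bare filename into an absolute link based on
# where the referring .post file lives below the blog root.
sub absolute_link {
    my ($blog_root, $post_file, $target) = @_;
    return $target if $target =~ m{/};    # already contains a path: leave untouched
    my $dir = File::Spec->abs2rel(dirname($post_file), $blog_root);
    return "/$dir/$target";
}

# /home/me/blog is an example value for $BLOG_ROOT.
print absolute_link('/home/me/blog',
                    '/home/me/blog/posts/slug-2/any-name.post',
                    'attachment.pdf'), "\n";   # /posts/slug-2/attachment.pdf
```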

There is no formal documentation for this module. Download the archive for more information.

To actually publish the pages, a quick way is to simply paste the generated HTML into the on-line editor in text mode, but that is hardly convenient in the long run.

In “Uploading WordPress posts using Perl and XMLRPC” I show how to upload content to WordPress from the command line, using Perl and XMLRPC.

In “Uploading WordPress posts using Perl and MySQL” I show the same thing, but much faster, using Perl and MySQL.

Generating the pages

Generation creates an HTML file from an input file. The generated file can then be viewed locally, or uploaded.

Input files have a “.post” extension (even though they are really Perl programs) to set them apart from other files. Output files get a .html extension. Here is a script that regenerates the output files under the blog root, or under the directories specified on the command line:

#!/usr/bin/perl -I /h/hamren/src/post/lib/

use strict;

use File::Find;
use List::Util;

use Post::Core;
use Post::Html;           #--- Add to %INC, for checking modification time
use Post::Processors;     #--- Add to %INC, for checking modification time

our @child_options;       #--- From Post::Core
our $opt_f;               #--- From Post::Core
our $opt_n;               #--- From Post::Core

my  $lib_dir    = ($INC{'Post/Core.pm'} =~ m,(.*)/Post/Core\.pm$,)[0];
my  $total      = 0;
my  $processed  = 0;
my  $core_mtime = List::Util::max(map { (lstat($INC{$_}))[9] } grep(m:^Post/:, keys %INC));

sub usage() {
    print(<<END);
    Usage:
        $0 <options>
            -f: Force (assume that generated files are out of date)
            -q: Be completely quiet
            -n: No execution, i.e. a dry run
            -v: Show files not processed (default is to show only files processed)
END
}

#----------------------------------------------------------------------------------------------------------------------------------------------------------- find_callback

sub find_callback {
    if (m/^.*\.post$/s) {
        my  $post       = $_;
        my  $html       = $_;  $html =~ s/\.post$/.html/;
        my  $post_mtime = (lstat($post))[9];
        my  $html_mtime = (lstat($html))[9] // 0;   #--- 0 if the .html file does not exist yet
        if ($opt_f || $post_mtime > $html_mtime || $core_mtime > $html_mtime) {
            my @cmd = ("perl",  "-I$lib_dir", $post, @child_options);
            trace(1, "    Processing $post");
            trace(2, "        [%s]", join(' ', @cmd));
            $opt_n || system(@cmd);
            $processed++;
        } else {
            trace(2, "        Skip $post");
        }
        $total++;
    }
}

#----------------------------------------------------------------------------------------------------------------------------------------------------------- main

sub main {
    File::Find::find({wanted => \&find_callback, no_chdir => 1}, @_);  #--- Find and process *.post files
    trace(1, "Processed $processed of $total files.");
}

main(@ARGV ? @ARGV : config('blog_root'));

Since all arguments are passed to File::Find::find, they can be any directories (or individual .post files) that you want searched.

To unconditionally update a file, delete its .html file or use the -f option. This is useful when some file other than the .post file has changed, e.g. post.pm.

You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090

View source for the content of this page.
