Anagram finder in Perl, faster – Lars Nordenström's blog

This is the Perl implementation of the anagram program. The problem and the solution is described in the main anagram article. Do read that article first.

This Perl implementation uses the faster model solution, and is definitely more Perlish than the first solution, which lokks like what it is, the C++ solution implemented in Perl. The substrings live in @values, and the corresponding unified versions live in @unified. Sorting a done using an index vector @words.

This Perl implementation is roughly twice as fast as the other.

The source code

#!/usr/bin/perl

use strict;
use warnings;
use integer;

use constant MINIMUM => 3;

my $comparisons = 0;
my @accepted    = ();
my @values      = ();
my $count1      = 0;
my $count2      = 0;

#----------------------------------------------------------------- add_candidate
sub add_candidate($$) {
    my ($cand, $len) = @_;
    printf("cand = [%s], scalar @accepted = %d\n", $cand, scalar @accepted);
    $count1++;
    if (@accepted) {
        $count2++;
        return if ($accepted[@accepted-1]+$len >= $cand);
        for (@accepted) {
            return if ($values[$_] eq $values[$cand]);
        }
    }
    push @accepted, $cand;
}

#------------------------------------------------------------------ print_result
sub print_result() {
    my @s = map { sprintf("%4d:%s", $_, $values[$_]) } @accepted;
    printf("%3d: %s\n", scalar @accepted, join(' ', @s))
}

#--------------------------------------------------------------------- find_all
sub find_all($$) {
    my ($text, $len) = @_;
    printf("len = $len\n");
    eval { @values = ($$text =~ /(?=(.{$len}))/g) };
    push @values, 'ZZZZZZZZZZ';

    my @unified = map { $comparisons++; join('', sort(split(//, $_))) } @values;

    my @words = sort {
        ($unified[$a] cmp $unified[$b]) || ($a <=> $b);
    } (0..(scalar @unified - 1));

    printf("scalar \@unified = %d\n", scalar @unified);
    for my $i (0..(scalar @words-2)) {
        add_candidate($words[$i], $len);
        printf("    \$unified[%d] = %s, \$unified[%d] = %s\n", $words[$i], $unified[$words[$i]], $words[$i+1], $unified[$words[$i+1]]);
        if ($unified[$words[$i]] ne $unified[$words[$i+1]]) {
            print_result() if (@accepted >= MINIMUM);
            @accepted = ();
            printf("   Differs\n");
        }
    }
}
#-------------------------------------------------------------------------- main
my $from =  @ARGV ? (shift @ARGV) : 6;
my $to   =  @ARGV ? (shift @ARGV) : $from;

my $text = uc(join("", <ARGV>)) =~ tr/[A-Z]//cdr;

for my $len ($from..$to) {
    #   printf("main: len = $len\n");
    print "\n" if ($len > $from);
    find_all(\$text, $len);
}

printf("string length = %5d, comparisons = %d\n", length($text), $comparisons);
print ("count1 = $count1, count2 = $count2\n");

You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090

View source for the content of this page.

The source code

Cancel reply