Anagram finder in Python

This is the Python implementation of the anagram program. The problem and the solution is described in the main anagram article. Do read that article first.

Also, you may find the C++ article helpful, since it describes member data and functions in some detail, not repeated here. This article describes only the differences between the C++ implementation and this implementation.

The Python and C++ implementations are almost identical; the differences are mostly syntactical.

The AnagramFinder constructor creates the list of words in two passes, to avoid extracting the same substring twice. This could have saved some execution time, but actually it does not.

Sorting in Python works by comparing a reference string for each object. The caller provides a function that returns the reference value for an element. That function gets called once, and only once, for each element to be sorted. In the program below that function is

    @staticmethod
    def compare_key(word):
        return "{:s}{:6d}".format(word.unified, word.offset)

Note that a numeric value has to be right-justified.

The source code

#!/usr/bin/python

import sys

MINIMUM = 3

#---------------------------------------------------------------------------------------- Word

class Word(object):
    def __init__(self, offset, value, unified):
       self.offset  = offset
       self.value   = value
       self.unified = unified

#---------------------------------------------------------------------------------------- AnagramFinder

class AnagramFinder(object):
    def __init__(self, text, l):
        self.len      = l
        self.accepted = []
        self.low_offset = 0;
        self.words    = [ Word(n, text[n:n+l], None) for n in range(len(text)-l+1) ]
        for it in self.words: it.unified = AnagramFinder.get_unified(it.value)

    @staticmethod
    def get_unified(value):
        tmp = list(value)
        tmp.sort()
        unified = ''.join(tmp)
        return unified

    @staticmethod
    def compare_key(word):
        return "{:s}{:6d}".format(word.unified, word.offset)

    def print_accepted(self):
        print("{:3d}:".format(len(self.accepted)), end = '')
        for w in self.accepted:
            print(" {:4d}:{:s}".format(w.offset, w.value), end = '')
        print()

    def add_candidate(self, cand):
        if (cand.offset < self.low_offset): return
        for it in self.accepted:
            if (it.value == cand.value):
                return
        self.accepted.append(cand)
        self.low_offset = cand.offset + self.len;

    def add_range(self, anagrams):
        if (len(anagrams) >= MINIMUM):
            self.accepted   = []
            self.low_offset = 0
            for it in anagrams:
                self.add_candidate(it)
            if (len(self.accepted) >= MINIMUM):
                self.print_accepted()

    def find_all(self):
        self.words.sort(key = AnagramFinder.compare_key)
        anagrams = []
        prev = 0
        for it in self.words:
            if (prev and it.unified != prev.unified):
                self.add_range(anagrams)
                anagrams = []
            prev = it
            anagrams.append(it)
        self.add_range(anagrams)

#---------------------------------------------------------------------------------------- Main program

def read_text():
    b = bytearray()
    for line in sys.stdin:
        b.extend(line.upper().encode('ascii'))

    len = 0
    for byte in b:
        if (byte >= ord('A') and byte <= ord('z')):
            b[len] = byte
            len += 1

    return b[0:len].decode('ascii')

def main():
    text = read_text()
    l = len(text)
    start = 6
    start = int(sys.argv[1] if (len(sys.argv) > 1) else 6    )
    to    = int(sys.argv[2] if (len(sys.argv) > 2) else start)
    if (0): print("len = {:d}\ns = [{:s}]\n".format(l, text), end='')

    for l in range(start,to+1):
        if (l > start):
            print("")
        print("len = {:d}".format(l))
        AnagramFinder(text, l).find_all()
    print("string length = {:5d}".format(len(text)))

main()

#--- Local Variables:      -*-
#--- coding: us-ascii-unix -*-
#--- mode: python          -*-
#--- End:                  -*-

You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090

View source for the content of this page.