This is the Python implementation of the anagram program. The problem and the solution is described in the main anagram article. Do read that article first.
Also, you may find the C++ article helpful, since it describes member data and functions in some detail, not repeated here. This article describes only the differences between the C++ implementation and this implementation.
The Python and C++ implementations are almost identical; the differences are mostly syntactical.
The AnagramFinder constructor creates the list of words in two passes, to avoid extracting the same substring twice. This could have saved some execution time, but actually it does not.
Sorting in Python works by comparing a reference string for each object. The caller provides a function that returns the reference value for an element. That function gets called once, and only once, for each element to be sorted. In the program below that function is
@staticmethod def compare_key(word): return "{:s}{:6d}".format(word.unified, word.offset)
Note that a numeric value has to be right-justified.
The source code
#!/usr/bin/python import sys MINIMUM = 3 #---------------------------------------------------------------------------------------- Word class Word(object): def __init__(self, offset, value, unified): self.offset = offset self.value = value self.unified = unified #---------------------------------------------------------------------------------------- AnagramFinder class AnagramFinder(object): def __init__(self, text, l): self.len = l self.accepted = [] self.low_offset = 0; self.words = [ Word(n, text[n:n+l], None) for n in range(len(text)-l+1) ] for it in self.words: it.unified = AnagramFinder.get_unified(it.value) @staticmethod def get_unified(value): tmp = list(value) tmp.sort() unified = ''.join(tmp) return unified @staticmethod def compare_key(word): return "{:s}{:6d}".format(word.unified, word.offset) def print_accepted(self): print("{:3d}:".format(len(self.accepted)), end = '') for w in self.accepted: print(" {:4d}:{:s}".format(w.offset, w.value), end = '') print() def add_candidate(self, cand): if (cand.offset < self.low_offset): return for it in self.accepted: if (it.value == cand.value): return self.accepted.append(cand) self.low_offset = cand.offset + self.len; def add_range(self, anagrams): if (len(anagrams) >= MINIMUM): self.accepted = [] self.low_offset = 0 for it in anagrams: self.add_candidate(it) if (len(self.accepted) >= MINIMUM): self.print_accepted() def find_all(self): self.words.sort(key = AnagramFinder.compare_key) anagrams = [] prev = 0 for it in self.words: if (prev and it.unified != prev.unified): self.add_range(anagrams) anagrams = [] prev = it anagrams.append(it) self.add_range(anagrams) #---------------------------------------------------------------------------------------- Main program def read_text(): b = bytearray() for line in sys.stdin: b.extend(line.upper().encode('ascii')) len = 0 for byte in b: if (byte >= ord('A') and byte <= ord('z')): b[len] = byte len += 1 return b[0:len].decode('ascii') def main(): text = read_text() l = len(text) start = 6 start = int(sys.argv[1] if (len(sys.argv) > 1) else 6 ) to = int(sys.argv[2] if (len(sys.argv) > 2) else start) if (0): print("len = {:d}\ns = [{:s}]\n".format(l, text), end='') for l in range(start,to+1): if (l > start): print("") print("len = {:d}".format(l)) AnagramFinder(text, l).find_all() print("string length = {:5d}".format(len(text))) main() #--- Local Variables: -*- #--- coding: us-ascii-unix -*- #--- mode: python -*- #--- End: -*-
You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090