Anagram finder in Cobol

This is the Cobol implementation of the anagram program. The problem and the solution is described in the main anagram article. Do read that article first.

Also, you may find the C++ article helpful, since it describes member data and functions in some detail, not repeated here. This article describes only the differences between the C++ implementation and this implementation.

The rearranged source code

To make the code easier to follow, I have rearranged it so that declarations from working-storage section are show together with the code that uses them. The full unmodified source is show at the end of the article.

Start of the program

Declares the single input file Âanagram-inÂ.

identification division.
program-id.   anagram.

environment division.
input-output section.
file-control.
    select anagram-in assign to 'datain' line sequential.

data division.
file section.

fd anagram-in.
01 row     pic x(80).

working-storage section.

Core program data

Delares the letter-only character buf Âtxt and its actual length Âtxt-lenÂ. Declares a table Âword of up 800000 words, and ÂnwordsÂ, the actual number of words. 800000 is large enough to contain almost 1600 copies of the original input text. The `extra’ two entries are used as temporaries by quicksort

Âaccepted is the table of the words that make up the words of the current anagram, and Ânaccepted is the number of entries in use.

Âdata-in is the input buffer for reading the input file. Âlen is the length in characters of the anagrams.

01 af-data.
    05 txt                pic x(256000).
    05 txt-len            pic     9(06) comp value 0.
    05 word                                  occurs 256002.
         08 sort-key.
             11 unified   pic     x(12)      value spaces.
             11 offset    pic     9(06)      value 0.
         08 valu          pic     x(12)      value spaces.
    05 nwords              pic    9(06) comp.

    05 accepted           pic   9(06) comp occurs 30.
    05 naccepted          pic  99     comp.
    05 data-in            pic   x(80).
    05 MINIMUM            pic   9     comp value 3.
    05 len                pic  99     comp.

Miscellaneous global data

Some fields for formatting numeric data, and o counter for calls to the word comparison routines used by quicksort.

01 globals.
    05 numfmt3            pic   z(2)9.
    05 numfmt6            pic   z(5)9.
    05 numfmt9            pic   z(8)9.

01 debug.
    05 comparisons        pic   9(09) comp.

Temporaries

Do not trust these to survive a call.

01 temporaries.
    05 general-use.
        08 i              pic   9(06) comp.
        08 j              pic   9(06) comp.
        08 k              pic   9(06) comp.
        08 m              pic   9(06) comp.

Start of ÂPROCEDURE DIVISIONÂ

Calls paragraph ÂmainÂ, located further down.

procedure division.
    perform main
    stop run
    .

The main program

main.
    open input  anagram-in
    perform  read-infile
    close    anagram-in
    perform varying len from 4 by 1 until len > 10
        if len > 4 display ' ' end-if
        move len to numfmt3
        display 'len = ' numfmt3
        perform af-new-len
        perform af-find-all
    end-perform
    move txt-len to numfmt6
    display 'string length = ' numfmt6 no advancing
    move comparisons to numfmt9
    display ', comparisons = ' numfmt9
    .

Convert a character to an index

Does a binary search for Âchr in ÂlettersÂ. Returns the result in ÂixÂ. The input character Âchr must be an uppercase letter.

    05 chr-to-ix-local.
        08 letters pic x(26) value 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
        08 ix             pic      99 comp.
        08 ix1            pic      99 comp.
        08 ix2            pic      99 comp.
        08 chr            pic       X.
chr-to-ix.
    move 1 to  ix1
    move 26 to ix2
    perform until 1 < 0
        compute ix = ( ix1 + ix2 ) / 2
        evaluate true
            when chr > letters(ix:1) compute ix1 = ix + 1
            when chr < letters(ix:1) compute ix2 = ix - 1
            when true                next sentence
        end-evaluate
    end-perform
    .

Word-printing routines

The routine Âprint-word prints the word at index Âw-ptrÂ. The routine Âprint-words iterates over all words in use.

    05 print-word-locals.
        08 w-ptr          pic   9(06) comp.
print-word.
    display w-ptr ": "
            offset (w-ptr) " "
            valu   (w-ptr) " "
            unified(w-ptr)
    .

print-words.
     perform
         print-word
     varying w-ptr from 1 by 1 until w-ptr >= nwords
     .

Initialize a new anagram finder

    05 af-new-len-locals.
        08 l-count        pic      99 comp occurs 26.
        08 u-put          pic      99 comp.
        08 i-last         pic   9(06) comp.
        08 u-tmp          pic   x(12).
af-new-len.
    move 1 to nwords
    perform varying i from 1 by 1 until i > 26
        move 0 to l-count(i)
    end-perform

    perform varying i from 1 by 1 until i >= len
        move txt(i:1) to chr
        perform chr-to-ix
        add 1 to l-count(ix)
    end-perform
    compute nwords = 1
    compute i-last = txt-len - len + 1
    perform varying i from 1 by 1 until i > i-last

        move txt(i+len - 1:1) to chr
        perform chr-to-ix
        add +1 to l-count(ix)

        move 1 to u-put
        move spaces to u-tmp
        perform varying j from 1 by 1 until j > 26
            perform varying k from 1 by 1 until k > l-count(j)
                move letters(j:1) to u-tmp(u-put:1)
                add 1 to u-put
            end-perform
        end-perform

        move txt(i:1) to chr
        perform chr-to-ix
        add -1 to l-count(ix)

        move i          to offset (nwords)
        move txt(i:len) to valu   (nwords)
        move u-tmp      to unified(nwords)
        add 1 to nwords
    end-perform
    .

Print a single result

af-print-one-result.
    compute numfmt3 = naccepted
    display numfmt3 ':' with no advancing
    perform varying k from 1 by 1 until k > naccepted
        compute numfmt6 = offset(accepted(k)) - 1
        display ' ' numfmt6 ':' valu(accepted(k)) no advancing
    end-perform
    display ' '
    .

Conditionally add a candidate to accepted

Âcand holds the index of the candidate. The candidate must not overlap, and it must not alrady exist in ÂacceptedÂ.

    05 af-add-candidate-locals.
        08 cand           pic   9(06) comp.
        08 o1             pic   9(06) comp.
        08 o2             pic   9(06) comp.
        08 low_offset     pic   9(06) comp.
af-add-candidate.
    if offset(cand) < low_offset next sentence end-if
    move offset(cand) to o1
    perform varying i from 1 by 1 until i > naccepted
        move offset(accepted(i)) to o2
        evaluate true
            when valu(accepted(i)) = valu(cand) move 999 to i
            when o1 > o2 and o1 < o2 + len      move 999 to i
        end-evaluate
    end-perform
    if i < 999
        add 1 to naccepted
        move cand to accepted(naccepted)
        compute low_offset = offset(cand) + len
    end-if
    .

Add a range of candidates

    05 af-add-range-locals.
       08 r-begin         pic   9(06) comp.
       08 r-end           pic   9(06) comp.
af-add-range.
    compute m = r-end - r-begin
    if (r-end - r-begin) >= MINIMUM
        move 0 to naccepted
        move 0 to low_offset
        perform varying cand from r-begin by 1 until cand>=r-end
            perform af-add-candidate
        end-perform
        if (naccepted >= MINIMUM)
            perform af-print-one-result
    end-if
    .

Find all anagrams sets for given word length

af-find-all.
    move    1 to q-sp
    compute q-first(q-sp) = 1
    compute q-last(q-sp)  = nwords - 1
    perform q-sort-main

    move 1 to r-begin
    move 2 to i
    perform varying r-end from i by 1 until r-end >= nwords
        if unified(r-end) <> unified(r-begin)
            perform af-add-range
            move r-end to r-begin
        end-if
    end-perform
    perform af-add-range
    .

Add letters from a single line from the input file

add-letters-to-txt.
    move function upper-case(data-in) to data-in.
    perform varying ix from 1 by 1 until ix > 80
        if data-in(ix:1) is alphabetic
            if data-in(ix:1) is not = ' '
                add 1 to txt-len
                move data-in(ix:1) to txt(txt-len:1)
            end-if
        end-if
    end-perform
    .

Read the input file

    05 read-infile-locals.
         08 done-in       pic       9 comp value 0.
read-infile.
    move 0 to txt-len
    move 0 to done-in
    perform until 1 = done-in
        move spaces to data-in
        read anagram-in into data-in
             at end
                 move 1 to done-in
             not at end
                 perform add-letters-to-txt
        end-read
    end-perform
    .

Variables related to quicksort

01 qsort-related.
    05 q-sp               pic   9(04) value 1.
    05 q-stack                        occurs 100.
        08 q-first        pic   9(06).
        08 q-last         pic   9(06).
    05 q-sp-max           pic   9(06) comp value 0.
    05 q-p0               pic   9(06) comp.
    05 q-p1               pic   9(06) comp.
    05 q-p2               pic   9(06) comp.
    05 q-p3               pic   9(06) comp.

quicksort: swap to words

q-swap.
    move word (q-p1)   to word (256001)
    move word (q-p2)   to word (q-p1)
    move word (256001) to word (q-p2)
    .

quicksort: bubble sort, for small ranges

q-bubble.
    move q-last(q-sp) to m
    perform varying q-p1 from q-first(q-sp) by 1 until q-p1 >= m
      compute q-p2 = q-p1
      compute i    = q-p1 + 1
      perform varying i from i by 1 until i > m
        add 1 to comparisons
        if sort-key(i) < sort-key(q-p2)
          move i to q-p2
        end-if
      end-perform
      if q-p2 > q-p1 perform q-swap end-if
    end-perform
    .

quicksort: pivoting

q-pivot.
    add 3 to comparisons
    compute q-p1 = q-first(q-sp)
    compute q-p2 = (q-last(q-sp) + q-first(q-sp)) / 2
    if (sort-key(q-p1) > sort-key(q-p2)) perform q-swap end-if

    compute q-p2 = q-last(q-sp)
    if (sort-key(q-p1) > sort-key(q-p2)) perform q-swap end-if

    compute q-p1 = (q-last(q-sp) + q-first(q-sp)) / 2
    if (sort-key(q-p1) > sort-key(q-p2)) perform q-swap end-if

    move sort-key(q-p1) to sort-key(256002)
    compute q-p1 = q-first(q-sp) + 1
    add -1 to q-p2
    perform until q-p1 > q-p2
        perform until q-p1>q-p2
          or sort-key(q-p1)>sort-key(256002)
            add 1 to comparisons
            add +1 to q-p1
        end-perform
        if q-p1 > q-p2 add -1 to comparisons end-if
        perform until q-p1>q-p2
          or sort-key(q-p2)<sort-key(256002)
            add 1 to comparisons
            add -1 to q-p2
        end-perform
        if q-p1 > q-p2 add -1 to comparisons end-if
        if q-p1 < q-p2
            perform q-swap
        end-if
    end-perform

    add -1 to q-p1
    add +1 to q-p2
    .

quicksort: quicksort a range

q-qsort.
    perform q-pivot

    move  q-first(q-sp) to q-p0
    move  q-last (q-sp) to q-p3
    add -1 to q-sp
    if q-p1 > q-p0
        add 1 to q-sp
        move q-p0 to q-first(q-sp)
        move q-p1 to q-last (q-sp)
    end-if
    if q-p3 > q-p2
        add 1 to q-sp
        move q-p2 to q-first(q-sp)
        move q-p3 to q-last (q-sp)
    end-if

    if (q-sp > 1)
        move q-first(q-sp - 1) to q-p0
        move q-last (q-sp - 1) to q-p1
        move q-first(q-sp    ) to q-p2
        move q-last (q-sp    ) to q-p3
        if (q-p1 < q-p2 or q-p0 > q-p3)
            if (q-p1 - q-p0 < q-p3 - q-p2)
                move q-stack(q-sp) to q-stack(q-sp - 1)
                move q-p0          to q-first(q-sp    )
                move q-p1          to q-last (q-sp    )
            end-if
        end-if
    end-if
    compute q-sp-max = function max(q-sp, q-sp-max)
    .

quicksort: print stack (for debugging)

q-print-stack.
    display ' '
    perform varying i from 1 by 1 until i > q-sp
        compute m = q-last(i)  -  q-first(i)
        display i " " q-first(i) "-" q-last(i) ":" m no advancing
        perform varying j from q-first(i) by 1 until j>q-last(i)

            display " [" sort-key(j)(1:4) "]" no advancing
        end-perform
        display ' '
    end-perform
    .

quicksort: main routine (entry point)

q-sort-main.
    perform until q-sp = 0

        if q-last(q-sp) - q-first(q-sp) < 5
            perform q-bubble
            add -1 to q-sp
        else
            perform q-qsort
        end-if
    end-perform
    .


The unmodified source code

 identification division.
 program-id.   anagram.
*>author.       Lars Nordenstrom.

 environment division.
 input-output section.
 file-control.
     select anagram-in assign to 'datain' line sequential.

 data division.
 file section.

 fd anagram-in.
 01 row     pic x(80).

 working-storage section.

 01 qsort-related.
     05 q-sp               pic   9(04) value 1.
     05 q-stack                        occurs 100.
         08 q-first        pic   9(06).
         08 q-last         pic   9(06).
     05 q-sp-max           pic   9(06) comp value 0.
     05 q-p0               pic   9(06) comp.
     05 q-p1               pic   9(06) comp.
     05 q-p2               pic   9(06) comp.
     05 q-p3               pic   9(06) comp.

 01 af-data.
     05 txt                pic x(256000).
     05 txt-len            pic     9(06) comp value 0.
     05 word                                  occurs 256002.
          08 sort-key.
              11 unified   pic     x(12)      value spaces.
              11 offset    pic     9(06)      value 0.
          08 valu          pic     x(12)      value spaces.
     05 nwords              pic    9(06) comp.
*> The last two entries are used as temporaries by qsort.
*> Do not put data in them

     05 accepted           pic   9(06) comp occurs 30.
     05 naccepted          pic  99     comp.
     05 data-in            pic   x(80).
     05 MINIMUM            pic   9     comp value 3.
     05 len                pic  99     comp.

 01 globals.
     05 numfmt3            pic   z(2)9.
     05 numfmt6            pic   z(5)9.
     05 numfmt9            pic   z(8)9.

 01 debug.
     05 comparisons        pic   9(09) comp.

 01 temporaries.
     05 general-use.
         08 i              pic   9(06) comp.
         08 j              pic   9(06) comp.
         08 k              pic   9(06) comp.
         08 m              pic   9(06) comp.

     05 chr-to-ix-local.
         08 letters pic x(26) value 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'.
         08 ix             pic      99 comp.
         08 ix1            pic      99 comp.
         08 ix2            pic      99 comp.
         08 chr            pic       X.

     05 print-word-locals.
         08 w-ptr          pic   9(06) comp.

     05 af-new-len-locals.
         08 l-count        pic      99 comp occurs 26.
         08 u-put          pic      99 comp.
         08 i-last         pic   9(06) comp.
         08 u-tmp          pic   x(12).

     05 af-add-candidate-locals.
         08 cand           pic   9(06) comp.
         08 o1             pic   9(06) comp.
         08 o2             pic   9(06) comp.
         08 low_offset     pic   9(06) comp.

     05 af-add-range-locals.
        08 r-begin         pic   9(06) comp.
        08 r-end           pic   9(06) comp.

     05 read-infile-locals.
          08 done-in       pic       9 comp value 0.

 procedure division.
     perform main
     stop run
     .

*>-------------------------------------------------------- Start of qsort
 q-swap.
     move word (q-p1)   to word (256001)
     move word (q-p2)   to word (q-p1)
     move word (256001) to word (q-p2)
     .

 q-bubble.
     move q-last(q-sp) to m
     perform varying q-p1 from q-first(q-sp) by 1 until q-p1 >= m
       compute q-p2 = q-p1
       compute i    = q-p1 + 1
       perform varying i from i by 1 until i > m
         add 1 to comparisons
         if sort-key(i) < sort-key(q-p2)
           move i to q-p2
         end-if
       end-perform
       if q-p2 > q-p1 perform q-swap end-if
     end-perform
     .

 q-pivot.
     add 3 to comparisons
     compute q-p1 = q-first(q-sp)
     compute q-p2 = (q-last(q-sp) + q-first(q-sp)) / 2
     if (sort-key(q-p1) > sort-key(q-p2)) perform q-swap end-if

     compute q-p2 = q-last(q-sp)
     if (sort-key(q-p1) > sort-key(q-p2)) perform q-swap end-if

     compute q-p1 = (q-last(q-sp) + q-first(q-sp)) / 2
     if (sort-key(q-p1) > sort-key(q-p2)) perform q-swap end-if
*> At this point first element < pivot < last element, so each new slice
*> will have at least one member. Guarantees that sorting will terminates

     move sort-key(q-p1) to sort-key(256002)
     compute q-p1 = q-first(q-sp) + 1
     add -1 to q-p2
     perform until q-p1 > q-p2
         perform until q-p1>q-p2
           or sort-key(q-p1)>sort-key(256002)
             add 1 to comparisons
             add +1 to q-p1
         end-perform
         if q-p1 > q-p2 add -1 to comparisons end-if
         perform until q-p1>q-p2
           or sort-key(q-p2)<sort-key(256002)
             add 1 to comparisons
             add -1 to q-p2
         end-perform
         if q-p1 > q-p2 add -1 to comparisons end-if
         if q-p1 < q-p2
             perform q-swap
         end-if
     end-perform

     add -1 to q-p1
     add +1 to q-p2
     .

 q-qsort.
     perform q-pivot
*> Must save range now, since q-sp changes unpredictably
     move  q-first(q-sp) to q-p0
     move  q-last (q-sp) to q-p3
     add -1 to q-sp
     if q-p1 > q-p0
         add 1 to q-sp
         move q-p0 to q-first(q-sp)
         move q-p1 to q-last (q-sp)
     end-if
     if q-p3 > q-p2
         add 1 to q-sp
         move q-p2 to q-first(q-sp)
         move q-p3 to q-last (q-sp)
     end-if
*> We want short elements at the top of the stack
*> Swap top elements if a) the do not overlap, and b) the top
*> element is larger than the one below
     if (q-sp > 1)
         move q-first(q-sp - 1) to q-p0
         move q-last (q-sp - 1) to q-p1
         move q-first(q-sp    ) to q-p2
         move q-last (q-sp    ) to q-p3
         if (q-p1 < q-p2 or q-p0 > q-p3)
             if (q-p1 - q-p0 < q-p3 - q-p2)
                 move q-stack(q-sp) to q-stack(q-sp - 1)
                 move q-p0          to q-first(q-sp    )
                 move q-p1          to q-last (q-sp    )
             end-if
         end-if
     end-if
     compute q-sp-max = function max(q-sp, q-sp-max)
     .

 q-print-stack.
     display ' '
     perform varying i from 1 by 1 until i > q-sp
         compute m = q-last(i)  -  q-first(i)
         display i " " q-first(i) "-" q-last(i) ":" m no advancing
         perform varying j from q-first(i) by 1 until j>q-last(i)
*>           display " [" trim(sort-key(j)(1:4)) "]" no advancing
             display " [" sort-key(j)(1:4) "]" no advancing
         end-perform
         display ' '
     end-perform
     .

 q-sort-main.
     perform until q-sp = 0
*>       perform q-print-stack
         if q-last(q-sp) - q-first(q-sp) < 5
             perform q-bubble
             add -1 to q-sp
         else
             perform q-qsort
         end-if
     end-perform
     .
*>--------------------------------------------------------- Miscellaneous
*>
*> Expect a letter ('A'..'Z') in chr
*> Return an index (1..26) in ix
*>
*> (Using ORD() is not portable)
*>
*> chr-to-ix(chr) => ix

 chr-to-ix.
     move 1 to  ix1
     move 26 to ix2
     perform until 1 < 0
         compute ix = ( ix1 + ix2 ) / 2
         evaluate true
             when chr > letters(ix:1) compute ix1 = ix + 1
             when chr < letters(ix:1) compute ix2 = ix - 1
             when true                next sentence
         end-evaluate
     end-perform
     .

*> print-word(w-ptr)
 print-word.
     display w-ptr ": "
             offset (w-ptr) " "
             valu   (w-ptr) " "
             unified(w-ptr)
     .

*> print-words(1, nwords) => (w-ptr)
 print-words.
      perform
          print-word
      varying w-ptr from 1 by 1 until w-ptr >= nwords
      .

*>----------------------------------------------------------- Anagram Finder
 af-new-len.
     move 1 to nwords
     perform varying i from 1 by 1 until i > 26
         move 0 to l-count(i)
     end-perform
*> Prime the letter counters with the first len-1 characters
     perform varying i from 1 by 1 until i >= len
         move txt(i:1) to chr
         perform chr-to-ix
         add 1 to l-count(ix)
     end-perform
     compute nwords = 1
     compute i-last = txt-len - len + 1
     perform varying i from 1 by 1 until i > i-last
*> Add incoming letter to the letter counters
         move txt(i+len - 1:1) to chr
         perform chr-to-ix
         add +1 to l-count(ix)
*> Build unified from letter counters
         move 1 to u-put
         move spaces to u-tmp
         perform varying j from 1 by 1 until j > 26
             perform varying k from 1 by 1 until k > l-count(j)
                 move letters(j:1) to u-tmp(u-put:1)
                 add 1 to u-put
             end-perform
         end-perform
*> Remove outgoing letter from the letter counters
         move txt(i:1) to chr
         perform chr-to-ix
         add -1 to l-count(ix)
*> Initialize a new object
         move i          to offset (nwords)
         move txt(i:len) to valu   (nwords)
         move u-tmp      to unified(nwords)
         add 1 to nwords
     end-perform
     .

 af-print-one-result.
     compute numfmt3 = naccepted
     display numfmt3 ':' with no advancing
     perform varying k from 1 by 1 until k > naccepted
         compute numfmt6 = offset(accepted(k)) - 1
         display ' ' numfmt6 ':' valu(accepted(k)) no advancing
     end-perform
     display ' '
     .

 af-add-candidate.
     if offset(cand) < low_offset next sentence end-if
     move offset(cand) to o1
     perform varying i from 1 by 1 until i > naccepted
         move offset(accepted(i)) to o2
         evaluate true
             when valu(accepted(i)) = valu(cand) move 999 to i
             when o1 > o2 and o1 < o2 + len      move 999 to i
         end-evaluate
     end-perform
     if i < 999
         add 1 to naccepted
         move cand to accepted(naccepted)
         compute low_offset = offset(cand) + len
     end-if
     .

 af-add-range.
     compute m = r-end - r-begin
     if (r-end - r-begin) >= MINIMUM
         move 0 to naccepted
         move 0 to low_offset
         perform varying cand from r-begin by 1 until cand>=r-end
             perform af-add-candidate
         end-perform
         if (naccepted >= MINIMUM)
             perform af-print-one-result
     end-if
     .

 af-find-all.
     move    1 to q-sp
     compute q-first(q-sp) = 1
     compute q-last(q-sp)  = nwords - 1
     perform q-sort-main

     move 1 to r-begin
     move 2 to i
     perform varying r-end from i by 1 until r-end >= nwords
         if unified(r-end) <> unified(r-begin)
             perform af-add-range
             move r-end to r-begin
         end-if
     end-perform
     perform af-add-range
     .
*>----------------------------------------------------------- main program
*> add-letters-to-txt(*data-in, *txt, *txt-len)
 add-letters-to-txt.
     move function upper-case(data-in) to data-in.
     perform varying ix from 1 by 1 until ix > 80
         if data-in(ix:1) is alphabetic
             if data-in(ix:1) is not = ' '
                 add 1 to txt-len
                 move data-in(ix:1) to txt(txt-len:1)
             end-if
         end-if
     end-perform
     .

*> read-infile(anagram-in) => (data-in, txt, txt-len)
 read-infile.
     move 0 to txt-len
     move 0 to done-in
     perform until 1 = done-in
         move spaces to data-in
         read anagram-in into data-in
              at end
                  move 1 to done-in
              not at end
                  perform add-letters-to-txt
         end-read
     end-perform
     .

 main.
     open input  anagram-in
     perform  read-infile
     close    anagram-in
     perform varying len from 4 by 1 until len > 10
         if len > 4 display ' ' end-if
         move len to numfmt3
         display 'len = ' numfmt3
         perform af-new-len
         perform af-find-all
     end-perform
     move txt-len to numfmt6
     display 'string length = ' numfmt6 no advancing
     move comparisons to numfmt9
     display ', comparisons = ' numfmt9
     .

You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090

View source for the content of this page.