Anagram finder in Fortran 77 – Lars Nordenström's blog

This is the Fortran 77 implementation of the anagram program. The problem and the solution is described in the main anagram article. Do read that article first.

Don’t write new programs in Fortran 77. No-one else does. But by all means, do write new programs in later versions of Fortran, especially if you are into high-performance computing, and area where Fortran shines.

This implementation is meant to follow the Fortran 77 standard. Since I do not have a true Fortran 77 compiler, I have used GNU gfortran with -std=legacy. But legacy is not Fortran 77. It does not accept ELSE IF, which is in the Fortran 77 standard, but it does accept other things that are not in the standard. To help me keep to the straight and narrow, I have used my old F77 textbook and some Fortran 77 manuals.

DEC: PDP-11 FORTRAN-77 Language Reference Manual (July 1983)
SGI: FORTRAN 77 Language Reference Manual (1990)
IBM: VS FORTRAN Version 2, Language and Library Reference, Release 6 (Nov 1993)

The code should be compatible with all versions of Fortran up to but not including Fortran 2018. Fortran 2018 deletes (yes, that is the word they use) COMMON blocks from the language, making Fortran 2018 incompatible with Fortran 77.

I have taken some liberty with the standard. Neither of the following is in the standard, but implemented by many compilers:

The exclamation mark (!) as a comment character.
The INCLUDE statement.

There is no built-in sorting in Fortran. Sorting in this program is implemented using heapsort. In a language like Fortran 77, that does not provide recursion, heapsort is much easier to implement tham recursive algorithms like quicksort. Another advantage of heapsort is that the execution time is always n log n, while quicksort can be n squared for some input.

Commented source code

Shared constants

      integer SHORTEST, LONGEST, TEXT_SIZE, ACC_SIZE, ACC_MINIMUM, A, Z

      parameter (SHORTEST    =        3  ) ! At least this many characters in an anagram
      parameter (LONGEST     =       10  ) ! At most     - " -
      parameter (TEXT_SIZE   =  500*512  ) ! Maximum size of the letter string
      parameter (ACC_SIZE    =       50  ) ! Max # of accepted entries.
      parameter (ACC_MINIMUM =        3  ) ! Fewer than this does not count
      parameter (A           = ichar('A')) ! Character code for 'A'
      parameter (Z           = ichar('Z')) ! Character code for 'Z'

Shared variables

text holds the uppercased text containing only letters. unified holds the array of unified strings. unified(n) is the unified version of the string of length len starting at text(n) words holds indices into unified. These indices are also indices into text. unified holds the indices for accepted candidates with the same unified value, and naccepted counts them. naccepted is reset to zero every time a new unified value is found. comparisons counts the number of calls to compare_words.

All shared variables are put in a single named COMMON block. Fortran 77 has no other way of sharing variables, except by passing them as parameters, which quickly becomes tedious.

      character*(TEXT_SIZE) text
      character*(LONGEST)   unified(1:TEXT_SIZE-SHORTEST+1  )
      integer               words  (1:TEXT_SIZE-SHORTEST+1+1)
      integer               len
      integer               nwords
      integer               accepted(1:ACC_SIZE)
      integer               naccepted
      integer               comparisons
      common /globals/words,unified,text,len,nwords,naccepted,accepted,
     *        comparisons

read_text(): Read the input file

Reads at most 1 million (effectively an infinite number of) lines from the standard input file. Discard all characters that are not letters. Convert lowercase letters to uppercase. Return the number of characters read.

      integer function read_text()
      include "anagram-f77-common.i"
      character*(80) line
      integer        put, i, n, ch
      character*(256) map
C-- Assumes fixed format, 72 chars per line
      data map/'--------------------------------------------------------
     *--------ABCDEFGHIJKLMNOPQRSTUVWXYZ------ABCDEFGHIJKLMNOPQRSTUVWXYZ
     *------------------------------------------------------------------
     *------------------------------------------------------------------
     *--'/

      put = 1
      do i = 1, 1000000, 1
         read (*,100,end=999) line
 100     format(A80)
         do n = 1, 80, 1
            ch = ichar(line(n:n))
            if (map(ch:ch) .ne. '-') then
               if (put > TEXT_SIZE) stop 'Input text too large'
               text(put:put) = map(ch:ch)
               put = put+1
            endif
         enddo
      enddo
 999  read_text = put-1
      end

get_unified()

Compute the unified version of the string of len characters starting at text(pos) To avoid sorting, count the number of times each letter occurs. The statement lcount = 0 sets all elements to zero.

      function get_unified(pos)
      include "anagram-f77-common.i"
      character*(LONGEST) get_unified
      integer             pos
      integer             lcount(A:Z), n, put, tmp

!!! Count each letter
      lcount  = 0
      do n = pos, pos+len-1
         tmp = ichar(text(n:n))
         lcount(tmp) = lcount(tmp) + 1
      enddo
!!! Build key from letter counts
      put = 1
      get_unified = ''
      do n = A, Z
         do tmp = 1, lcount(n)
            get_unified(put:put) = char(n)
            put = put + 1
         enddo
      enddo
      end

compare_words()

First compare the unified versions of the strings. If they are equal, then compare their positions in the text.

      integer function compare_words(i1, i2)
      include "anagram-f77-common.i"
      integer        i1, i2

      integer pos1, pos2
      pos1 = words(i1)
      pos2 = words(i2)

      comparisons = comparisons + 1
      compare_words = pos1 - pos2
      if (unified(pos1) .gt. unified(pos2)) then
         compare_words = +1
      else
         if (unified(pos1) .lt. unified(pos2)) then
            compare_words = -1
         endif
      endif
      end

function add_to_heap()

The subroutine is called with the size of the heap before insertion as its first parameter, and a “pointer” (in the form of an index into the array unified as it second argument. The new value is placed at the new bootom of the heap and then bubbled up. Unnecessary word swap are avoided by not actually moving the new word until its final position is known.

      subroutine add_to_heap(size, value)
      include "anagram-f77-common.i"
      integer size, value
      integer here, compare_words, cmp

      words(size+1) = value
      here = size
 100  if (here .gt. 1) then    ! while (here > 1)
         cmp = compare_words(size+1, here/2)
         if (cmp .gt. 0) then
            words(here) = words(here/2)
            here        = here / 2
            goto 100
         endif
      endif
      words(here) = words(size+1)
      end

subroutine heap_sort()

The heap is already a heap, so this is the second phase of a heap sort. Save the value at the top of the heap in a safe place, w In thought, but not in deed, Move the last element to the top of the heap. Then, still in thought, let it trickle down until its rightful place has been found. Values bubble up to leave space for the element. Then, in statement 199, move, in deed, the last element to its rightful place. Finally, in the next statement, move the element that was at the top to the head of the sorted list.

Repeat until the heap has size 1.

      subroutine heap_sort()
      include "anagram-f77-common.i"
      integer c12, empty, child, i, j, w, compare_words
      integer hsize
      do i = nwords, 2, -1
         w     = words(1)
         hsize = i - 1
         empty = 1
         do j = 1, 10000000
            if (2*empty .gt. hsize) goto 199
            c12 = compare_words(i, 2*empty  )
            if (2*empty .eq. hsize) then ! A single (left) child
               if (c12 .lt. 0) words(empty) = words(2*empty)
               goto 199         ! F77 does not have the EXIT statement
            else
               if (c12 .gt. 0) then
                  if (compare_words(i, 2*empty+1) .gt. 0) goto 199 ! Two children, both lesser
               endif
               child = 2*empty
               if (compare_words(child, child+1) .lt. 0) child = child+1 ! Right child is greater
               words(empty) = words(child)
               empty        = child
            endif
         enddo
 199     words(empty) = words(i)
         words(i) = w
      enddo
      end

subroutine add_candidate()

Add the candidate pointed to by the integer ptr (an index), but only if it does not overlap any existing candidates, and only if it is not a duplicate.

The candidate is an index into text and at the same time an index into unified. Candidates with the same unified value are added in increasing index order, so the overlap test need only look at the most recently accepted candidate.

      subroutine add_candidate(ptr)
      include "anagram-f77-common.i"
      integer ptr,i, p2
      if (naccepted .gt. 0) then
         if (ptr .le. accepted(naccepted)+(len-1)) then
            return
         endif
      endif
      do i = 1, naccepted
         p2 = accepted(i)
         if (text(ptr:ptr+len-1) .eq. text(p2:p2+len-1)) then
            return
         endif
      enddo
      naccepted = naccepted + 1
      accepted(naccepted) = ptr
      end

subroutine print_result()

      subroutine print_result()
      include "anagram-f77-common.i"
      integer i, ptr
      write(*,100, advance='no') naccepted
 100  format(I3, ': ')
      do i = 1, naccepted
         ptr = accepted(i)
         write(*,110, advance='no') ptr-1, text(ptr:ptr+len-1)
 110     format(I5, ': ', A)
      enddo
      write(*,*)
      end

subroutine find_all(text_len)

Find all anagrams of length len, already set by the main program.text_len is the number of characters in text.For each possible word, compute the unified value and add it to the heap words, then sort the heap. The words in words are now sorted by, and therefore grouped by, their unified value, and words with equal unified value are sorted by their index into text. For each word, call add_candidate. If the current word is the last in a suite of words with the same unifoed value 1) print the result if the number af accepted words reaches the threshold MINIMUM, and 2) clear the accepted list by setting naccepted to zero.

Note the use of a sentinel to simplify the testing for “last in a suite”.

      subroutine find_all(text_len)
      include "anagram-f77-common.i"

      integer             i
      integer             text_len
      character*(LONGEST) get_unified

      nwords = text_len-len+1
      do i=1, nwords
!        call get_unified(i, unified (i))
         unified(i) = get_unified(i)
         call add_to_heap(i, i)
      enddo

      call heap_sort()
      unified(nwords+1) = 'ZZZZZZZZZZ'
      words(nwords+1) = nwords + 1

      do i=1, nwords
         call add_candidate(words(i))
         if (unified(words(i)) .ne. unified(words(i+1))) then
            if (naccepted .ge. ACC_MINIMUM) then
               call print_result()
            endif
            naccepted = 0
         endif
      enddo
      end

program main

Read the input text into text, then call find_all for all lengths len in the range from to to inclusive.

F77 does not have routines for accessing the command line, so the range is set by a DATA statement.

      program main
      include "anagram-f77-common.i"

      integer  text_len
      integer  read_text
      integer  from, to

      data from,to/14,7/

      comparisons = 0
      if (to   .lt. from    ) stop '*** "from" > "to"'
      if (to   .gt. LONGEST ) stop '*** "to" is too large'
      if (from .lt. SHORTEST) stop '*** "from" is too small'

      text_len = read_text()
      do len = from, to
         if (len .gt. from) write(*,*)
         write(*,100) 'len = ', len
 100     format(A, I0)
         call find_all(text_len)
      enddo
      write(*, 110) text_len, comparisons
 110  format("string length = ", I5, ", comparisons = ", I5)
      end

Full source code

Parameters and `COMMON` block (`anagram-f77-common.i`)

!!!---* -*- mode: fortran -*-
!------------------------------------------------------------------
!!!--- Compile-time constants
      integer SHORTEST, LONGEST, TEXT_SIZE, ACC_SIZE, ACC_MINIMUM, A, Z

      parameter (SHORTEST    =        3  ) ! At least this many characters in an anagram
      parameter (LONGEST     =       10  ) ! At most     - " -
      parameter (TEXT_SIZE   =  500*512  ) ! Maximum size of the letter string
      parameter (ACC_SIZE    =       50  ) ! Max # of accepted entries.
      parameter (ACC_MINIMUM =        3  ) ! Fewer than this does not count
      parameter (A           = ichar('A')) ! Character code for 'A'
      parameter (Z           = ichar('Z')) ! Character code for 'Z'

!!!----------------------------------------------------------------
!!!--- Global variables
      character*(TEXT_SIZE) text
      character*(LONGEST)   unified(1:TEXT_SIZE-SHORTEST+1  )
      integer               words  (1:TEXT_SIZE-SHORTEST+1+1)
      integer               len
      integer               nwords
      integer               accepted(1:ACC_SIZE)
      integer               naccepted
      integer               comparisons
      common /globals/words,unified,text,len,nwords,naccepted,accepted,
     *        comparisons

The program (`anagram-f77.f`)

!!!----------------------------------------------------------- read_text
      integer function read_text()
      include "anagram-f77-common.i"
      character*(80) line
      integer        put, i, n, ch
      character*(256) map
C-- Assumes fixed format, 72 chars per line
      data map/'--------------------------------------------------------
     *--------ABCDEFGHIJKLMNOPQRSTUVWXYZ------ABCDEFGHIJKLMNOPQRSTUVWXYZ
     *------------------------------------------------------------------
     *------------------------------------------------------------------
     *--'/

      put = 1
      do i = 1, 1000000, 1
         read (*,100,end=999) line
 100     format(A80)
         do n = 1, 80, 1
            ch = ichar(line(n:n))
            if (map(ch:ch) .ne. '-') then
               if (put > TEXT_SIZE) stop 'Input text too large'
               text(put:put) = map(ch:ch)
               put = put+1
            endif
         enddo
      enddo
 999  read_text = put-1
      end

!!!--------------------------------------------------------- get_unified
      function get_unified(pos)
      include "anagram-f77-common.i"
      character*(LONGEST) get_unified
      integer             pos
      integer             lcount(A:Z), n, put, tmp

!!! Count each letter
      lcount  = 0
      do n = pos, pos+len-1
         tmp = ichar(text(n:n))
         lcount(tmp) = lcount(tmp) + 1
      enddo
!!! Build key from letter counts
      put = 1
      get_unified = ''
      do n = A, Z
         do tmp = 1, lcount(n)
            get_unified(put:put) = char(n)
            put = put + 1
         enddo
      enddo
      end

!!!------------------------------------------------------- compare_words
      integer function compare_words(i1, i2)
      include "anagram-f77-common.i"
      integer        i1, i2

      integer pos1, pos2
      pos1 = words(i1)
      pos2 = words(i2)

      comparisons = comparisons + 1
      compare_words = pos1 - pos2
      if (unified(pos1) .gt. unified(pos2)) then
         compare_words = +1
      else
         if (unified(pos1) .lt. unified(pos2)) then
            compare_words = -1
         endif
      endif
      end

!!!--------------------------------------------------------- add_to_heap
      subroutine add_to_heap(size, value)
      include "anagram-f77-common.i"
      integer size, value
      integer here, compare_words, cmp

      words(size+1) = value
      here = size
 100  if (here .gt. 1) then    ! while (here > 1)
         cmp = compare_words(size+1, here/2)
         if (cmp .gt. 0) then
            words(here) = words(here/2)
            here        = here / 2
            goto 100
         endif
      endif
      words(here) = words(size+1)
      end

!!!----------------------------------------------------------- heap_sort
      subroutine heap_sort()
      include "anagram-f77-common.i"
      integer c12, empty, child, i, j, w, compare_words
      integer hsize
      do i = nwords, 2, -1
         w     = words(1)
         hsize = i - 1
         empty = 1
         do j = 1, 10000000
            if (2*empty .gt. hsize) goto 199
            c12 = compare_words(i, 2*empty  )
            if (2*empty .eq. hsize) then ! A single (left) child
               if (c12 .lt. 0) words(empty) = words(2*empty)
               goto 199         ! F77 does not have the EXIT statement
            else
               if (c12 .gt. 0) then
                  if (compare_words(i, 2*empty+1) .gt. 0) goto 199 ! Two children, both lesser
               endif
               child = 2*empty
               if (compare_words(child, child+1) .lt. 0) child = child+1 ! Right child is greater
               words(empty) = words(child)
               empty        = child
            endif
         enddo
 199     words(empty) = words(i)
         words(i) = w
      enddo
      end
!!!------------------------------------------------------- add_candidate
      subroutine add_candidate(ptr)
      include "anagram-f77-common.i"
      integer ptr,i, p2
      if (naccepted .gt. 0) then
         if (ptr .le. accepted(naccepted)+(len-1)) then
            return
         endif
      endif
      do i = 1, naccepted
         p2 = accepted(i)
         if (text(ptr:ptr+len-1) .eq. text(p2:p2+len-1)) then
            return
         endif
      enddo
      naccepted = naccepted + 1
      accepted(naccepted) = ptr
      end

!!!-------------------------------------------------------- print_result
      subroutine print_result()
      include "anagram-f77-common.i"
      integer i, ptr
      write(*,100, advance='no') naccepted
 100  format(I3, ': ')
      do i = 1, naccepted
         ptr = accepted(i)
         write(*,110, advance='no') ptr-1, text(ptr:ptr+len-1)
 110     format(I5, ': ', A)
      enddo
      write(*,*)
      end

!!!------------------------------------------------------------ find_all
      subroutine find_all(text_len)
      include "anagram-f77-common.i"

      integer             i
      integer             text_len
      character*(LONGEST) get_unified

      nwords = text_len-len+1
      do i=1, nwords
!        call get_unified(i, unified (i))
         unified(i) = get_unified(i)
         call add_to_heap(i, i)
      enddo

      call heap_sort()
      unified(nwords+1) = 'ZZZZZZZZZZ'
      words(nwords+1) = nwords + 1

      do i=1, nwords
         call add_candidate(words(i))
         if (unified(words(i)) .ne. unified(words(i+1))) then
            if (naccepted .ge. ACC_MINIMUM) then
               call print_result()
            endif
            naccepted = 0
         endif
      enddo
      end

!!!----------------------------------------------------------------main
      program main
      include "anagram-f77-common.i"

      integer  text_len
      integer  read_text
      integer  from, to

      data from,to/14,7/

      comparisons = 0
      if (to   .lt. from    ) stop '*** "from" > "to"'
      if (to   .gt. LONGEST ) stop '*** "to" is too large'
      if (from .lt. SHORTEST) stop '*** "from" is too small'

      text_len = read_text()
      do len = from, to
         if (len .gt. from) write(*,*)
         write(*,100) 'len = ', len
 100     format(A, I0)
         call find_all(text_len)
      enddo
      write(*, 110) text_len, comparisons
 110  format("string length = ", I5, ", comparisons = ", I5)
      end

You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090

View source for the content of this page.