Quicksort in COBOL on z/OS

It is tricky to write a small sample application that is short, non-trivial, portable and easy to run.

The program, qsort01, in this article implements the quicksort algorithm. It requires just a single input file, and writes a single output file. It uses no other system resources. No need for DB2, CICS or anything else.

This article assumes that the reader is familiar with the quicksort algorithm.

The input data looks like this:

           15297 /usr/include/EGL/egl.h
           47792 /usr/include/EGL/eglext.h
            2473 /usr/include/EGL/eglextchromium.h
            3820 /usr/include/EGL/eglmesaext.h

There are eight right-justified digits, a space, and some other data, which is just passed through.

qsort01 sorts the input file according to the numeric field, and writes the result to an output file

The input, at most 9999 lines, is read into an array, and sorted in memory.

This is just a simple example. A industrial-strength sorting program would be a lot larger. For brevity, there is almost no error checking.Also, swapping whole records is inefficient; A more efficient implementation would sort indices, not whole records.

The steps that qsort01 performs are:

  • Read the input file, at most 9999 lines, into a table.
  • Sort the table using a mix of Quicksort and Bubblesort.
  • Check that the table is indeed sorted.
  • Write the sorted data to an output file
  • Display some statistics
  • Terminate

The main difficulty in implementing Quicksort in Cobol is that Quicksort is recursive, but Cobol does not support recursion.This is solved by programmmatically implementing a stack. Each element of the stack contains the first and last index of a slice of the input data.

To illustrate, I will sort 12 two-digit numbers. Remember that quicksort divides the input data into smaller and smaller slices. When the slices are less than five elements, the slice is sorted using bubble sort.

The stack is initialized to a single element (two numbers) that specifies the whole slice that the qsort should work on. Below, the stack is on the left, with the data to the right

  1:  1 12              66  68  51  69  49  55  31  91  74  47  49  16
  2:                    |                                            |
  3:                    +----------------- 1..12 --------------------+

Since there are more than four elements in the slice, qsort pivots around an element, 49, picked at random It then replaces the topmost slice with the two new slices.

  1:  1  4              31  47  49  16  68  66  53  69  51  55  91  74
  2:  5 12              |            |  |                            |
  3:                    +--- 1..4 ---+  +--------- 5..12 ------------+

The topmost slice again contains more than four elements. After pivoting around 49 we have:

  1:  1  4              31  47  49  16  68  66  53  51  55  69  91  74
  2:  4  9              |            |  |                |  |        |
  3: 10 12              +--- 1..4 ---+  +----- 5..9 -----+  +-10..12-+

The topmost slice is only three elements, so it is bubblesorted and then popped from the stack. This slice is now sorted and in the right place, so Quicksort qill not return to it

  1:  1  4              31  47  49  16  68  66  53  51  55  69  74  91
  2:  5  9              |            |  |                |
  3:                    +--- 1..4 ---+  +----- 5..9 -----+

Once again the topmost slice contains more than four elements. After pivoting around 51 we have:

  1:  1  4              31  47  49  16  53  51  55  68  66  69  74  91
  2:  5  6              |            |  |    |  |        |
  3:  7  9              +--- 1..4 ---+  +5..6+  +--7-9---+

Topmost slice is `small’, so bubblesort and pop.

  1:  1  4              31  47  49  16  51  53  55  66  68  69  74  91
  2:  5  6              |            |  |    |
  3:                    +--- 1..4 ---+  +5..6+

And again.

  1:  1  4              31  47  49  16  51  53  55  66  68  69  74  91
  2:                    |            |
  3:                    +--- 1..4 ---+

And after a final bubblesort the whole table is sorted

  1:                    16  31  47  49  51  53  55  66  68  69  74  91

Pivoting can result in one of the slices having fewer than two elements.If so, that slice is not pushed to the stack since it is already sorted

I will walk through the code, but some boiler-plate code at the start of the program has been left out.

The partitioned data set (PDS) ????.SRC.COB.DATA.FB80 is fixed block, 80 characters.

The code below might be slightly different from the downloadable code. Use the latter when trying out the program.

Start of the program.

       identification division.
       program-id.   qsort01.
       author.       Lars Nordenstrom.

       environment division.
       configuration section.
       input-output section.
       file-control.
           select unsorted-in assign to qsortin.
           select sorted-out  assign to qsortout.

       data division.
       file section.

       fd unsorted-in
            record contains 80 characters
            recording mode f.
       01 row                    pic x(80).

       fd sorted-out
              record contains 80 characters
              recording mode f.
       01 lp                     pic x(80).

First the table that holds the data. The last element, at index 10000, is used as a temporary when swapping, so only indices 1 through 9999 are available for data.

       01 data-to-be-sorted.
           05 qrecords occurs 10000.
               10 qkey           pic 9(8).
               10 filler         pic x.
               10 qvalue         pic x(71).

The explicit stack, used instead of recursion. Each pair from q-first and q-last holds the first and last index of a slice. q-sp-max keeps track of the maximum stack depth, and is printed immediately before terminaation.

       01 the-stack.
           05 q-sp               pic  9(4) value 1.
           05 q-first occurs 100 pic  9(4).
           05 q-last  occurs 100 pic  9(4).
           05 q-sp-max           pic  9(4) value 0.

The format for the output file.

       01 formatted.
           05 num                pic z(2)9.
           05 report-line.
               10 report-index   pic bz(4)9.
               10 report-key     pic bz(8)9.
               10 filler         pic x value ' '.
               10 report-value   pic x(71).

idx is the index of the last element of the qrecords table. p1 and p2 are used independently by bubble and qsort; importantly, they are also the implicit arguments to swap.

       01 globals.
           05 idx                pic  9(9) comp value 0.
           05 p1                 pic  9(9) comp.
           05 p2                 pic  9(9) comp.

These are mostly local to a single paragraph.

       01 locals.
           05 i                  pic  9(9) comp.
           05 ffirst             pic  9(9) comp.
           05 llast              pic  9(9) comp.
           05 pivot              pic  9(9) comp.
           05 max1               pic  9(9) comp.
           05 min2               pic  9(9) comp.
           05 done-in            pic    9  comp value 0.

The main section is at the end of the source file.

       procedure division.
           perform main
           stop run
           .

Swap the elements at indices p1 and p2. Note that element 10000 in qrecords is used as a temporary. This is the reason why the program can only handle 9999 elements.

       swap.
           move qrecords (p1)    to qrecords (10000)
           move qrecords (p2)    to qrecords (p1)
           move qrecords (10000) to qrecords (p2)
           .

Bubble sort is used on slices of two to four elements. It operates on the topmost slice on the stack. When it is done, it pops the slice from the stack.

       bubble.
           move q-last (q-sp) to llast
           perform varying p1 from q-first (q-sp) by 1 until p1 >= llast
             compute p2 = p1
             compute i    = p1 + 1
             perform varying i from i by 1 until i > llast
               if qkey (i) < qkey (p2)
                 move i to p2
               end-if
             end-perform
             if p2 > p1 perform swap end-if
           end-perform
           add -1 to q-sp
           .

After pivoting the initial range should have been divided into two ranges, the lower range and the upper range, where all the records in the lower range are less than the records in the upper range.

The pivoting part of my implementation does not use an explicit pivot element. Instead it uses a middle range, initially containing all the elements.

The variables max1 holds the largest value yet included in the lower range, initially 0 when there are no elements in the range. Similarly, min2 holds the smallest value in the upper range.

In each iteration, if (p1) > (p2), then (p1) and (p2) are swapped. Next, one of several cases apply:

  • If (p1) <= max1 then (p1) is added to the lower range by increasing p1 by one.
  • Otherwise, if (p2) >= min2 then (p2) is added to the upper range by decreasing p2 by one.
  • Otherwise, if the middle range contains only a single value, then max1 < (p1) < min2, and (p1) can be added to either range.
  • Otherwise, max1 < (p1) < (p2) < min2, and (p1) and (p2) can be added to the lower and upper ranges respectively, and max1 and min2 updated.

It is not possible to add (p1) to the lower range and (p2) to the upper range immediately after the swap. If e.g. (p2) < max1 then adding (p2) to the upper range break the invariant, since there would then be a record in the upper range that is less than a record in the lower range.

       pivot.
           move q-first (q-sp) to p1
           move q-last  (q-sp) to p2
           move         0      to max1
           move 999999999      to min2

           perform until p1 > p2
               if p1 < p2 and qkey (p1) > qkey (p2)
                   perform swap
               end-if
               evaluate true
                   when qkey (p1) < max1
                       add +1 to p1
                   when qkey (p2) > min2
                       add -1 to p2
                   when p1 = p2
                       add +1 to p1
                   when other
                       if qkey (p1) > max1 move qkey (p1) to max1 end-if
                       if qkey (p2) < min2 move qkey (p2) to min2 end-if
                       add +1 to p1
                       add -1 to p2
               end-evaluate
           end-perform

           add -1 to p1
           add +1 to p2
           .

qsort itself is simple. It first does the pivoting. Then it pops its own range from the stack. Finally it pushes each of the two ranges, assuming they are each contain at least two elements.

       qsort.
           perform pivot.
           move  q-first (q-sp) to ffirst
           move  q-last  (q-sp) to llast
           add -1 to q-sp
           if p1 > ffirst
               add 1 to q-sp
               move ffirst to q-first (q-sp)
               move p1     to q-last  (q-sp)
           end-if
           if llast > p2
               add 1 to q-sp
               move p2    to q-first (q-sp)
               move llast to q-last  (q-sp)
           end-if
           compute q-sp-max = function max(q-sp, q-sp-max)
           .

If the current slice is four items or fewer, then bubble sort, otherwise qsort

       sort-main.
           if q-last (q-sp) - q-first (q-sp) < 5
               perform bubble
           else
               perform qsort
           end-if
           .

Verify that the table is actually sorted.

       verify-result.
           perform varying i from 2 by 1 until i > idx
             if qkey (i) < qkey (i - 1)
               display 'unsorted: ' i
             end-if
           end-perform
           .

Read the input file. Make sure not to read more than 9999 lines. For debugging, optionally terminate reading after fewer records.

       read-input.
           perform until done-in > 0
             read unsorted-in
                 at end
                     move 1 to done-in
                 not at end
                     add 1 to idx
                     move row to qrecords (idx)
      *              if idx     >=   100 move 1 to done-in end-if
                     if idx + 1 >= 10000 move 1 to done-in end-if
                 end-read
           end-perform
           .

Write the sorted table. Note that the output includes the table index, so the output format is different form the input format

       write-table.
           perform varying i from 1 by 1  until i > idx
             move             i  to report-index
             move qkey (i) to report-key
             move qvalue (i) to report-value
             write lp from report-line
           end-perform
           .

Paragraph main is called from the beginning of the procedure division.

       main.
           open input  unsorted-in
           open output sorted-out

           perform  read-input
           move     1   to q-first (q-sp)
           move     idx to q-last  (q-sp)
           perform  sort-main  until q-sp = 0
           perform  write-table
           perform  verify-result

           move     q-sp-max to num
           display 'maximum stack depth = ' num
           display 'idx = ' idx

           close    sorted-out
           close    unsorted-in
           .

Downloading the files

The individual files are avaiable for download. There are also archives, one .zip and one .tar.gz.

Compiling and running on z/OS with Enterprise Cobol

Sample JCL for compile, load and go.

        //ZEKE        JOB (),MSGCLASS=A,MSGLEVEL=(1,1),TIME=(0,10)
        //TEST        EXEC IGYWCLG,LNGPRFX=IGY410
        //COBOL.SYSIN DD DSN=&SYSUID..SRC.COB(QSORT01),DISP=SHR
        //*----------------------------------------------------------------
        //GO.QSORTIN  DD DSN=&SYSUID..SRC.COB.DATA.FB80(DAT9900),DISP=SHR
        //*----------------------------------------------------------------
        //GO.QSORTOUT DD DSN=&SYSUID..SRC.COB.DATA.FB80(QSORTOUT),DISP=SHR

Compiling and running on Linux with GNU Cobol

The source is almost identical for z/OS and GNU.The only difference is the we must code LINE SEQUENTIAL on the SELECT statement.

The below script compiles and runs the program. It also generates the fixed-block input file if it does not already exist.

        #!/bin/bash -e

        PROG=qsort01-gnu

        #
        # In GNU Cobol, The SELECT statement looks in the environment for
        # the "assinged-to" identifiers prefix by "DD_".
        # Similar to JCL DD statement.
        #
        export DD_qsortin=dat9900.txt
        export DD_qsortout=sorted.txt

        #
        # Build.  Tested with GNU Cobol 2.2.
        # (Using ".elf" for executables is a local convention.)
        #
        set -x
        rm -f *.elf
        cobc -std=cobol85 -x -o $PROG.elf $PROG.cob

        #
        # Run the program
        #
        ./$PROG.elf

Compiling and running on Windows with GNU Cobol

Untested, but should work. GNU Cobol is available on Windows.

Portability

The only difference between the z/OS and the GNU version is the use of LINE SEQUENTIAL.

  • Use the version without LINE SEQUENTIAL on z/OS when the files are located on the native file system, not the Herarchical File System (HFS).
  • Use the version with LINE SEQUENTIAL in all other cases, including on z/OS for files on HFS.

Bugs

Probably. There is no error checking, and no test for stack overflow. But this program is meant as a demonstration, and a lot of testing would clutter up the code.

One way to handle stack overflow is to randomize the data and start again. This could easily be done at the beginning of qsort. Since the most likely cause of the stack overflow is input data that is already (almost) sorted, randomizing input should produce data which quicksort can handle well.

One strange behaviour is that the z/OS output is different from the GNU output. Records with the same key, but different values, sometimes come out in different order. This is not strictly a bug, since the output is sorted. But why?

You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090

View source for the content of this page.