It is tricky to write a small sample application that is short, non-trivial, portable and easy to run.
The program, qsort01, in this article implements the quicksort algorithm. It requires just a single input file, and writes a single output file. It uses no other system resources. No need for DB2, CICS or anything else.
This article assumes that the reader is familiar with the quicksort algorithm.
The input data looks like this:
15297 /usr/include/EGL/egl.h 47792 /usr/include/EGL/eglext.h 2473 /usr/include/EGL/eglextchromium.h 3820 /usr/include/EGL/eglmesaext.h
There are eight right-justified digits, a space, and some other data, which is just passed through.
qsort01 sorts the input file according to the numeric field, and writes the result to an output file
The input, at most 9999 lines, is read into an array, and sorted in memory.
This is just a simple example. A industrial-strength sorting program would be a lot larger. For brevity, there is almost no error checking.Also, swapping whole records is inefficient; A more efficient implementation would sort indices, not whole records.
The steps that qsort01 performs are:
- Read the input file, at most 9999 lines, into a table.
- Sort the table using a mix of Quicksort and Bubblesort.
- Check that the table is indeed sorted.
- Write the sorted data to an output file
- Display some statistics
The main difficulty in implementing Quicksort in Cobol is that Quicksort is recursive, but Cobol does not support recursion.This is solved by programmmatically implementing a stack. Each element of the stack contains the first and last index of a slice of the input data.
To illustrate, I will sort 12 two-digit numbers. Remember that quicksort divides the input data into smaller and smaller slices. When the slices are less than five elements, the slice is sorted using bubble sort.
The stack is initialized to a single element (two numbers) that specifies the whole slice that the qsort should work on. Below, the stack is on the left, with the data to the right
1: 1 12 66 68 51 69 49 55 31 91 74 47 49 16 2: | | 3: +----------------- 1..12 --------------------+
Since there are more than four elements in the slice, qsort pivots around an element, 49, picked at random It then replaces the topmost slice with the two new slices.
1: 1 4 31 47 49 16 68 66 53 69 51 55 91 74 2: 5 12 | | | | 3: +--- 1..4 ---+ +--------- 5..12 ------------+
The topmost slice again contains more than four elements. After pivoting around 49 we have:
1: 1 4 31 47 49 16 68 66 53 51 55 69 91 74 2: 4 9 | | | | | | 3: 10 12 +--- 1..4 ---+ +----- 5..9 -----+ +-10..12-+
The topmost slice is only three elements, so it is bubblesorted and then popped from the stack. This slice is now sorted and in the right place, so Quicksort qill not return to it
1: 1 4 31 47 49 16 68 66 53 51 55 69 74 91 2: 5 9 | | | | 3: +--- 1..4 ---+ +----- 5..9 -----+
Once again the topmost slice contains more than four elements. After pivoting around 51 we have:
1: 1 4 31 47 49 16 53 51 55 68 66 69 74 91 2: 5 6 | | | | | | 3: 7 9 +--- 1..4 ---+ +5..6+ +--7-9---+
Topmost slice is `small’, so bubblesort and pop.
1: 1 4 31 47 49 16 51 53 55 66 68 69 74 91 2: 5 6 | | | | 3: +--- 1..4 ---+ +5..6+
1: 1 4 31 47 49 16 51 53 55 66 68 69 74 91 2: | | 3: +--- 1..4 ---+
And after a final bubblesort the whole table is sorted
1: 16 31 47 49 51 53 55 66 68 69 74 91
Pivoting can result in one of the slices having fewer than two elements.If so, that slice is not pushed to the stack since it is already sorted
I will walk through the code, but some boiler-plate code at the start of the program has been left out.
The partitioned data set (PDS) ????.SRC.COB.DATA.FB80 is fixed block, 80 characters.
The code below might be slightly different from the downloadable code. Use the latter when trying out the program.
Start of the program.
identification division. program-id. qsort01. author. Lars Nordenstrom. environment division. configuration section. input-output section. file-control. select unsorted-in assign to qsortin. select sorted-out assign to qsortout. data division. file section. fd unsorted-in record contains 80 characters recording mode f. 01 row pic x(80). fd sorted-out record contains 80 characters recording mode f. 01 lp pic x(80).
First the table that holds the data. The last element, at index 10000, is used as a temporary when swapping, so only indices 1 through 9999 are available for data.
01 data-to-be-sorted. 05 qrecords occurs 10000. 10 qkey pic 9(8). 10 filler pic x. 10 qvalue pic x(71).
The explicit stack, used instead of recursion. Each pair from q-first and q-last holds the first and last index of a slice. q-sp-max keeps track of the maximum stack depth, and is printed immediately before terminaation.
01 the-stack. 05 q-sp pic 9(4) value 1. 05 q-first occurs 100 pic 9(4). 05 q-last occurs 100 pic 9(4). 05 q-sp-max pic 9(4) value 0.
The format for the output file.
01 formatted. 05 num pic z(2)9. 05 report-line. 10 report-index pic bz(4)9. 10 report-key pic bz(8)9. 10 filler pic x value ' '. 10 report-value pic x(71).
idx is the index of the last element of the qrecords table. p1 and p2 are used independently by bubble and qsort; importantly, they are also the implicit arguments to swap.
01 globals. 05 idx pic 9(9) comp value 0. 05 p1 pic 9(9) comp. 05 p2 pic 9(9) comp.
These are mostly local to a single paragraph.
01 locals. 05 i pic 9(9) comp. 05 ffirst pic 9(9) comp. 05 llast pic 9(9) comp. 05 pivot pic 9(9) comp. 05 max1 pic 9(9) comp. 05 min2 pic 9(9) comp. 05 done-in pic 9 comp value 0.
The main section is at the end of the source file.
procedure division. perform main stop run .
Swap the elements at indices p1 and p2. Note that element 10000 in qrecords is used as a temporary. This is the reason why the program can only handle 9999 elements.
swap. move qrecords (p1) to qrecords (10000) move qrecords (p2) to qrecords (p1) move qrecords (10000) to qrecords (p2) .
Bubble sort is used on slices of two to four elements. It operates on the topmost slice on the stack. When it is done, it pops the slice from the stack.
bubble. move q-last (q-sp) to llast perform varying p1 from q-first (q-sp) by 1 until p1 >= llast compute p2 = p1 compute i = p1 + 1 perform varying i from i by 1 until i > llast if qkey (i) < qkey (p2) move i to p2 end-if end-perform if p2 > p1 perform swap end-if end-perform add -1 to q-sp .
After pivoting the initial range should have been divided into two ranges, the lower range and the upper range, where all the records in the lower range are less than the records in the upper range.
The pivoting part of my implementation does not use an explicit pivot element. Instead it uses a middle range, initially containing all the elements.
The variables max1 holds the largest value yet included in the lower range, initially 0 when there are no elements in the range. Similarly, min2 holds the smallest value in the upper range.
In each iteration, if (p1) > (p2), then (p1) and (p2) are swapped. Next, one of several cases apply:
- If (p1) <= max1 then (p1) is added to the lower range by increasing p1 by one.
- Otherwise, if (p2) >= min2 then (p2) is added to the upper range by decreasing p2 by one.
- Otherwise, if the middle range contains only a single value, then max1 < (p1) < min2, and (p1) can be added to either range.
- Otherwise, max1 < (p1) < (p2) < min2, and (p1) and (p2) can be added to the lower and upper ranges respectively, and max1 and min2 updated.
It is not possible to add (p1) to the lower range and (p2) to the upper range immediately after the swap. If e.g. (p2) < max1 then adding (p2) to the upper range break the invariant, since there would then be a record in the upper range that is less than a record in the lower range.
pivot. move q-first (q-sp) to p1 move q-last (q-sp) to p2 move 0 to max1 move 999999999 to min2 perform until p1 > p2 if p1 < p2 and qkey (p1) > qkey (p2) perform swap end-if evaluate true when qkey (p1) < max1 add +1 to p1 when qkey (p2) > min2 add -1 to p2 when p1 = p2 add +1 to p1 when other if qkey (p1) > max1 move qkey (p1) to max1 end-if if qkey (p2) < min2 move qkey (p2) to min2 end-if add +1 to p1 add -1 to p2 end-evaluate end-perform add -1 to p1 add +1 to p2 .
qsort itself is simple. It first does the pivoting. Then it pops its own range from the stack. Finally it pushes each of the two ranges, assuming they are each contain at least two elements.
qsort. perform pivot. move q-first (q-sp) to ffirst move q-last (q-sp) to llast add -1 to q-sp if p1 > ffirst add 1 to q-sp move ffirst to q-first (q-sp) move p1 to q-last (q-sp) end-if if llast > p2 add 1 to q-sp move p2 to q-first (q-sp) move llast to q-last (q-sp) end-if compute q-sp-max = function max(q-sp, q-sp-max) .
If the current slice is four items or fewer, then bubble sort, otherwise qsort
sort-main. if q-last (q-sp) - q-first (q-sp) < 5 perform bubble else perform qsort end-if .
Verify that the table is actually sorted.
verify-result. perform varying i from 2 by 1 until i > idx if qkey (i) < qkey (i - 1) display 'unsorted: ' i end-if end-perform .
Read the input file. Make sure not to read more than 9999 lines. For debugging, optionally terminate reading after fewer records.
read-input. perform until done-in > 0 read unsorted-in at end move 1 to done-in not at end add 1 to idx move row to qrecords (idx) * if idx >= 100 move 1 to done-in end-if if idx + 1 >= 10000 move 1 to done-in end-if end-read end-perform .
Write the sorted table. Note that the output includes the table index, so the output format is different form the input format
write-table. perform varying i from 1 by 1 until i > idx move i to report-index move qkey (i) to report-key move qvalue (i) to report-value write lp from report-line end-perform .
Paragraph main is called from the beginning of the procedure division.
main. open input unsorted-in open output sorted-out perform read-input move 1 to q-first (q-sp) move idx to q-last (q-sp) perform sort-main until q-sp = 0 perform write-table perform verify-result move q-sp-max to num display 'maximum stack depth = ' num display 'idx = ' idx close sorted-out close unsorted-in .
Downloading the files
The individual files are avaiable for download. There are also archives, one .zip and one .tar.gz.
Compiling and running on z/OS with Enterprise Cobol
Sample JCL for compile, load and go.
//ZEKE JOB (),MSGCLASS=A,MSGLEVEL=(1,1),TIME=(0,10) //TEST EXEC IGYWCLG,LNGPRFX=IGY410 //COBOL.SYSIN DD DSN=&SYSUID..SRC.COB(QSORT01),DISP=SHR //*---------------------------------------------------------------- //GO.QSORTIN DD DSN=&SYSUID..SRC.COB.DATA.FB80(DAT9900),DISP=SHR //*---------------------------------------------------------------- //GO.QSORTOUT DD DSN=&SYSUID..SRC.COB.DATA.FB80(QSORTOUT),DISP=SHR
Compiling and running on Linux with GNU Cobol
The source is almost identical for z/OS and GNU.The only difference is the we must code LINE SEQUENTIAL on the SELECT statement.
The below script compiles and runs the program. It also generates the fixed-block input file if it does not already exist.
#!/bin/bash -e PROG=qsort01-gnu # # In GNU Cobol, The SELECT statement looks in the environment for # the "assinged-to" identifiers prefix by "DD_". # Similar to JCL DD statement. # export DD_qsortin=dat9900.txt export DD_qsortout=sorted.txt # # Build. Tested with GNU Cobol 2.2. # (Using ".elf" for executables is a local convention.) # set -x rm -f *.elf cobc -std=cobol85 -x -o $PROG.elf $PROG.cob # # Run the program # ./$PROG.elf
Compiling and running on Windows with GNU Cobol
Untested, but should work. GNU Cobol is available on Windows.
The only difference between the z/OS and the GNU version is the use of LINE SEQUENTIAL.
- Use the version without LINE SEQUENTIAL on z/OS when the files are located on the native file system, not the Herarchical File System (HFS).
- Use the version with LINE SEQUENTIAL in all other cases, including on z/OS for files on HFS.
Probably. There is no error checking, and no test for stack overflow. But this program is meant as a demonstration, and a lot of testing would clutter up the code.
One way to handle stack overflow is to randomize the data and start again. This could easily be done at the beginning of qsort. Since the most likely cause of the stack overflow is input data that is already (almost) sorted, randomizing input should produce data which quicksort can handle well.
One strange behaviour is that the z/OS output is different from the GNU output. Records with the same key, but different values, sometimes come out in different order. This is not strictly a bug, since the output is sorted. But why?
You can reach me by email at “lars dash 7 dot sdu dot se” or by telephone +46 705 189090