Stream Input/Output in Fortran

Background - Record-based versus Stream I/O

Traditionally, and indeed until the advent of Fortran 2003, Standard Fortran I/O has been entirely record-based.  This is fine if you are reading or writing a file of records such as a text file, where each line is a record, since it means that you don't need to be concerned about the record terminators (which, depending on the platform, may be line-feeds, or carriage-returns, or both).  It can, however, be a serious handicap when you want to use Fortran to read a file generated by some instrument, or produced by a package such as a spreadsheet or database system.  Often these files do not have records that Fortran can recognise, or they have a more complex structure than a simple linear sequence of records, but Standard Fortran has not previously provided any means of accessing them simply as a stream of bytes or characters.  Another problem is that sometimes one wants to access the records in some random order, rather than in a strict sequence.  One can do this in Fortran using direct-access files, but these are restricted to files where all records have the same length.  Stream I/O provides solutions to all these problems.
 
Many compiler vendors have in fact recognised the need for more general I/O and have provided extensions to support stream I/O (sometimes called binary I/O), but the syntax varies, which reduces portability.  Fortran 2003 standardises stream I/O facilities, and these can be used now, since the g95 compiler has already implemented them in full conformance with the Standard.

Stream I/O, as with other forms of I/O, comes in two flavours: formatted and unformatted.  It is the unformatted form which provides the more powerful facilities, so I will deal with this first.

It is interesting to note that Fortran, invented by a team working for IBM where the main data medium was the 80-column punched card, had nothing but record-oriented I/O until recently, whereas C, invented by people using computers made by Digital Equipment where paper tape was much more common, had stream I/O from the outset.  I think the characteristics of these devices influenced the design of the programming languages, but I have to say that not everyone agrees with me.

Unformatted Stream I/O

In unformatted stream I/O the file is treated as a sequence of file storage units.  These units are in principle system-dependent, but the Standard recommends the use of bytes, and I doubt if any other unit will be used in practice.  The facilities are closely modelled on those of the binary stream file in C.

The file is opened using an OPEN statement containing ACCESS="STREAM" (FORM="UNFORMATTED" is the default).  A new file can be written using simple WRITE statements, just like those required to populate an unformatted sequential file, using any mixture of data types you choose.  The effect of each WRITE is simply to append the appropriate sequence of bytes to the file, uninterrupted by record markers.  Similarly when reading an existing binary file opened for stream access, the READ statement will move the current position marker through the file by the number of bytes needed to satisfy its I/O list.  For numerical data types the order of bytes within each item will depend upon whether the processor uses big-endian or little-endian number formats.  Generally you will need to be concerned about this only if you transfer a file from a platform using one endian convention to one using the opposite convention (and the same concerns apply to record-based unformatted reads and writes).  

Here is a trivial example of writing a file using unformatted stream access (note that Fortran keywords are shown in upper-case only to distinguish them from user-chosen names):
  PROGRAM writeUstream
    IMPLICIT NONE
    INTEGER :: myvalue = 12345, mypos
    OPEN(UNIT=11, FILE="ustream.demo", STATUS="NEW", ACCESS="STREAM")
    WRITE(11) "first"
    WRITE(11) "second"
    INQUIRE(UNIT=11, POS=mypos)
    PRINT *, "Myvalue will be written at position ", mypos
    WRITE(11) myvalue
    CLOSE(UNIT=11)
  END PROGRAM writeUstream
The first two WRITE statements will put a total of 11 bytes on the file; assuming the integer value occupies a 32-bit (4-byte) word, the third WRITE will extend the file to 15 bytes.  An INQUIRE statement can be used at any point to find out the next character position in the file; in this case it should return 12.  When, as here, the preceding WRITE was appending to the file, the value returned will be the current length of the file in bytes plus one.

The power of stream I/O derives from the fact that a READ or WRITE statement can specify the position at which the operation is to start using a POS= specifier, remembering that POS=1 means the start of the file.  For example, if we were to open the file just produced we could access parts of it like this:
  PROGRAM readUstream
    IMPLICIT NONE
    CHARACTER :: string*3
    INTEGER :: n
    OPEN(UNIT=42, FILE="ustream.demo", STATUS="OLD", ACCESS="STREAM")
    READ(42, POS=4) string
    READ(42, POS=12) n
  END PROGRAM readUstream
Then the character variable would be set to "sts", being the contents from the end of the word "first" and the start of "second", and the integer n would read in the number derived originally from myvalue.  This provides a form of random access to a file, similar to that provided by direct-access files, but with addresses specified to the byte rather than to the record.  It is permitted to write an unformatted stream file at any position: if this is beyond the previous end of the file, then the contents of the gap are left undefined.
In fact, if one uses a POS= specifier in a WRITE statement with an empty I/O list, then it resets the position in the file without actually writing anything to the file.  If the position precedes the previous end of the file and the list is not empty, then the byte positions specified are re-written, but the length of the file is unchanged.  This is quite different from what happens if you try to write data to some intermediate point in a (non-stream) sequential file: the file length is reset and the contents beyond the point at which the WRITE was executed are all lost.  Of course, if a READ statement attempts to read at a position in a file which has never been written then the results are undefined, as one would expect.
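
The overwriting and repositioning behaviour just described can be illustrated with a minimal sketch.  The program name, unit number and file name ("overwrite.demo") are invented for this illustration, and the scratch file is deleted when the unit is closed:

```fortran
PROGRAM overwriteUstream
  ! Sketch only: re-write bytes in the middle of an unformatted stream
  ! file, then use a WRITE with an empty I/O list to reposition.
  ! The file name "overwrite.demo" is hypothetical.
  IMPLICIT NONE
  CHARACTER(LEN=11) :: contents
  INTEGER :: posbefore, posafter
  OPEN(UNIT=12, FILE="overwrite.demo", STATUS="REPLACE", ACCESS="STREAM")
  WRITE(12) "first", "second"        ! 11 bytes, occupying positions 1 to 11
  INQUIRE(UNIT=12, POS=posbefore)    ! one past the end of the file
  WRITE(12, POS=3) "XYZ"             ! re-writes bytes 3 to 5 only
  WRITE(12, POS=posbefore)           ! empty list: repositions, writes nothing
  INQUIRE(UNIT=12, POS=posafter)
  READ(12, POS=1) contents           ! read all 11 bytes from the start
  PRINT *, "Contents:  ", contents
  PRINT *, "Positions: ", posbefore, posafter
  CLOSE(UNIT=12, STATUS="DELETE")
END PROGRAM overwriteUstream
```

If the rules are as described, the contents should come back as "fiXYZsecond", still 11 bytes long, and both positions should be 12.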

A more practical example is given here, which opens a file of type .DBF and lists some of its contents.  The .DBF format is used by many PC-based database management systems including dBASE, Alpha-5, and Paradox.  Actually there are several variants of the format; this is merely one of the most common.  The file consists of a 32-byte header, followed by the column details (32 bytes per column), and then the data records themselves.
  PROGRAM readbf
    ! Reads .DBF files, lists header and first few records.
    ! Clive Page, 2005 July 9
    IMPLICIT NONE
    INTEGER, PARAMETER :: maxcol = 128
    CHARACTER :: colname(maxcol)*11, coltype(maxcol)*1
    INTEGER :: colwidth(maxcol), coldec(maxcol), coloff(maxcol)
    CHARACTER :: version*1, year*1, month*1, day*1, &
                 ca*4, cwidth, cdec, string*100, flag*1
    INTEGER :: nrecs, ncols, icol, irec, ioffset, dataoff, cw, k
    INTEGER(kind=selected_int_kind(3)) :: lhead, lenrec ! =integer*2
    !
    OPEN(unit=1, file="recs.dbf", status='old', ACCESS='stream')
    READ(1) version, year, month, day, nrecs, lhead, lenrec
    ncols = (lhead - 32)/32
    WRITE(*, '(a,i4, a,i4, 2("-",i2.2), 3(i6,a))') &
         'Version ', ichar(version), &
         ' Date ', ichar(year)+1900, ichar(month), ichar(day), &
         nrecs, ' rows', ncols, ' columns', lenrec, ' bytes/row'
    !
    WRITE(*,*)'Col ---Name--- T Width Decimals Offset'
    ioffset = 1
    DO icol = 1,ncols
       READ(1, POS=32*icol+1) colname(icol), coltype(icol), ca, cwidth, cdec
       ! blank out the trailing NUL padding of the name, if there is any
       k = INDEX(colname(icol), char(0))
       IF (k > 0) colname(icol)(k:) = " "
       colwidth(icol) = ichar(cwidth)
       coldec(icol) = ichar(cdec)
       coloff(icol) = ioffset
       ioffset = ioffset + colwidth(icol)
       WRITE(*, '(i3,1x,a,1x,a,2i6,i8)') icol, colname(icol), &
            coltype(icol), colwidth(icol), coldec(icol), coloff(icol)
    END DO
    ! print contents of first three records
    dataoff = 32 * ncols + 35
    DO irec = 1,3
       WRITE(*,'(a,i0)') 'Record ', irec
       READ(1, pos=dataoff + (irec-1)*lenrec) flag
       WRITE(*, '(2A)') 'Deleted flag = ', flag
       DO icol = 1,ncols
          ioffset = dataoff + (irec-1) * lenrec + coloff(icol)
          cw = colwidth(icol)
          READ(1, pos=ioffset) string(1:cw)
          WRITE(*, '(i3,1x,3a)') icol, colname(icol), '=', string(1:cw)
       END DO
    END DO
    CLOSE(unit=1)
  END PROGRAM readbf

Formatted Stream Files

Formatted stream I/O essentially provides an alternative way of reading or writing a formatted sequential file, i.e. a text file, but with a little extra flexibility.  Such files are opened with ACCESS="STREAM" and FORM="FORMATTED", and every READ and WRITE statement must specify a formatted transfer.  It appears to be legal to use list-directed formatting or even NAMELIST input/output on such a file, but it is hard to see a good reason for wanting to do this. 

The additional power of stream I/O arises from the fact that a READ or WRITE statement can specify a position for the transfer, using a POS= specifier.  Unlike the unformatted case, however, the position cannot be chosen freely (one might say randomly): it must be either 1 (meaning the start of the file), or a position previously obtained using an INQUIRE statement with the file positioned after some earlier operations.  The reason for this restriction, one can guess, is that the actual number of characters in a text file is system dependent (the record separator may be one or two characters, or in principle even more).

When reading from a formatted stream file the usual rules concerning reading beyond the end of a record apply: the default setting is PAD="YES", which means that a record will appear to be extended with an indefinite number of spaces.  When writing, the records have no defined record terminators, but the intrinsic function NEW_LINE is provided to allow a record terminator to be produced (it takes a single character argument, whose value is not used but which is required to specify the kind of character value in use).

Another difference from unformatted stream output is that (according to my reading of the Standard) whenever a WRITE statement writes to a position preceding the end of the file, it has the effect of truncating the file at that position, i.e. all subsequent data in the file are lost. 

Here is a program fragment showing formatted stream output:
  OPEN(UNIT=11, FILE="mystream", STATUS="REPLACE", ACCESS="STREAM", FORM="FORMATTED")
  WRITE(11, "(4A)") "first line", NEW_LINE("x"), "second line", NEW_LINE("x")
  INQUIRE(UNIT=11, POS=mypos)
Now the integer variable mypos contains a value which can be used in a subsequent READ or WRITE statement using POS=mypos to return to the same point in the file.
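
As a minimal sketch of returning to a saved position, the following program records the position after the first line and later reads from there.  The program name, unit number and file name ("fstream.demo") are invented for the illustration.  It also assumes that each advancing WRITE ends its record, as it would for a formatted sequential file, so NEW_LINE is not needed here:

```fortran
PROGRAM readFstream
  ! Sketch only: remember a position in a formatted stream file with
  ! INQUIRE and return to it later using POS=.
  ! The file name "fstream.demo" is hypothetical.
  ! Assumption: each advancing WRITE ends the current record.
  IMPLICIT NONE
  CHARACTER(LEN=20) :: line
  INTEGER :: mypos
  OPEN(UNIT=13, FILE="fstream.demo", STATUS="REPLACE", ACCESS="STREAM", &
       FORM="FORMATTED")
  WRITE(13, "(A)") "first line"
  INQUIRE(UNIT=13, POS=mypos)        ! position where the second line starts
  WRITE(13, "(A)") "second line"
  WRITE(13, "(A)") "third line"
  READ(13, "(A)", POS=mypos) line    ! a position obtained from INQUIRE
  PRINT *, "From mypos: ", TRIM(line)
  READ(13, "(A)", POS=1) line        ! POS=1 is always permitted
  PRINT *, "From start: ", TRIM(line)
  CLOSE(UNIT=13, STATUS="DELETE")
END PROGRAM readFstream
```

Under that assumption the two lines read back should be "second line" and "first line".  Both READ statements use only positions of the kind permitted for formatted stream files: the value 1, or a value returned by INQUIRE.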

Note that it is advisable to insert a final newline sequence in the file, or the last line of the file will be incomplete and may be hard to read as a piece of text.  There are other minor restrictions: the BACKSPACE statement cannot be used on a stream file (as there are no records to move back over), nor may the ENDFILE statement, but this seems to me to be a totally redundant statement anyway.

Portability

I am grateful to James van Buskirk for pointing out that one can use stream I/O with other compilers if there is a small change to the options of the OPEN statement.  His code contains this:
   if(is_g95()) then
      access = 'stream'
      form = 'unformatted'
   else ! Use values appropriate for lf95 express 7.10.02 or
        ! ifort Package ID: W_FC_C_9.0.029
      access = 'sequential'
      form = 'binary'
   end if
   open(10, file='fire.bmp', access=access, form=form, status='replace')
For details of how he manages to distinguish between g95 and other compilers using the function is_g95 refer to his example at: http://home.comcast.net/~kmbtib/Fortran_stuff/fire.f90 . He says that this works not only with g95 but also cvf, ifort, and lf95.  I would guess that another way of doing this would be to attempt an OPEN with access='stream' and form='unformatted' and then if this gives an error, re-try with the options accepted by the other compilers, but I have not had the opportunity to try this yet.
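
The retry idea might be sketched as follows.  This is only a guess at how it could look, not a tested recipe: the program name and file name ("fallback.demo") are invented, and it assumes that a processor which rejects access='stream' reports the failure through IOSTAT rather than stopping.  The option values are held in character variables, as in the code above, because a compiler may reject an unrecognised literal value at compile time:

```fortran
PROGRAM openfallback
  ! Sketch only: try the standard stream OPEN first, and fall back to
  ! the vendor-specific options if it is rejected at run time.
  ! The file name "fallback.demo" is hypothetical.
  IMPLICIT NONE
  CHARACTER(LEN=11) :: access, form
  INTEGER :: ios
  access = "stream"
  form = "unformatted"
  OPEN(UNIT=10, FILE="fallback.demo", ACCESS=access, FORM=form, &
       STATUS="replace", IOSTAT=ios)
  IF (ios /= 0) THEN
     ! Assumed fallback: the values quoted above for lf95 and ifort
     access = "sequential"
     form = "binary"
     OPEN(UNIT=10, FILE="fallback.demo", ACCESS=access, FORM=form, &
          STATUS="replace", IOSTAT=ios)
  END IF
  IF (ios /= 0) THEN
     PRINT *, "Unable to open the file without record markers, IOSTAT = ", ios
  ELSE
     PRINT *, "Opened with ACCESS=", TRIM(access), " FORM=", TRIM(form)
     WRITE(10) "BM"                  ! unformatted output, no record markers
     CLOSE(UNIT=10, STATUS="DELETE")
  END IF
END PROGRAM openfallback
```

With a compiler that implements Fortran 2003 stream I/O the first OPEN should succeed and the fallback branch is never executed.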
Clive Page
First draft: 2005 July 9.
Revised: 2005 Oct 31, following feedback from Richard Maine and others on the comp.lang.fortran newsgroup.
Revised 2006 April 14 following feedback from James van Buskirk on portability.