Stream Input/Output in Fortran
Background - Record-based versus Stream I/O
Traditionally, and indeed until the advent of Fortran 2003, Standard
Fortran I/O has been entirely record-based. This is fine if you
are reading or writing a file of records such as a text file, where
each line is a record, since it means that you don't need to be
concerned about the record terminators (which, depending on the
platform, may be line-feeds, or carriage-returns, or both). It
can, however, be a serious handicap when you want to use Fortran to
read a file generated by some instrument, or produced by a package such
as a spreadsheet or database system. Often these files do not
have records that Fortran can recognise, or they have a more complex
structure than a simple linear sequence of records, but Standard
Fortran has not previously provided any means of accessing them simply
as a stream of bytes or characters. Another problem is that
sometimes one wants to access the records in some random order, rather
than in a strict sequence. One can do this in Fortran using
direct-access files, but these are restricted to files where all
records have the same length. Stream I/O provides solutions to
all these problems.
Many compiler vendors have in fact recognised the need for more general
I/O and have provided extensions to support stream I/O (sometimes
called binary I/O), but the syntax varies, which reduces
portability. Fortran 2003 standardises stream I/O
facilities, and these can already be used, since the g95 compiler has
implemented them in full conformance with the Standard.
Stream I/O, as with other forms of I/O, comes in two flavours: formatted
and unformatted. It is the unformatted form which provides the
more powerful facilities, so I will deal with this first.
It is interesting to note that Fortran, invented by a team working for
IBM where the main data medium was the 80-column punched card, had
nothing but record-oriented I/O until recently, whereas C, invented by
people using computers made by Digital Equipment where paper tape was
much more common, had stream I/O from the outset. I think the
characteristics of these devices influenced the design of the
programming languages, but I have to say that not everyone agrees with
me.
Unformatted Stream I/O
In unformatted stream I/O the file is treated as a sequence of file storage units. These
units are in principle system-dependent, but the Standard recommends
the use of bytes, and I doubt if any other unit will be used in
practice. The facilities are closely modelled on those of the
binary stream file in C.
The file is opened using an OPEN statement containing
ACCESS="STREAM" (FORM="UNFORMATTED" is the default). A new file
can be written using simple WRITE
statements, just like those required to populate an unformatted
sequential file, using any mixture of data types you choose. The
effect of each WRITE is simply to append
the appropriate sequence of bytes to the file, uninterrupted by record
markers. Similarly, when reading an existing binary file opened
for stream access, the READ statement will move the current position
marker through the file by the number of bytes needed to satisfy its
I/O list. For numerical data types the order of bytes within each
item will depend upon whether the processor uses big-endian or
little-endian number formats. Generally you will need to be
concerned about this only if you transfer a file from a platform using
one endian convention to one using the opposite convention (and the
same concerns apply to record-based unformatted reads and
writes).
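The byte order mentioned above can be inspected with stream I/O itself. The following is a minimal sketch rather than part of the original examples: it assumes that the file storage unit is a byte and that SELECTED_INT_KIND(9) yields a 4-byte integer, and the program name, unit number and test value are arbitrary choices.

```fortran
PROGRAM show_byte_order
  ! Sketch: write one integer to a scratch stream file, then read the
  ! same bytes back as single characters to see their order in the file.
  IMPLICIT NONE
  INTEGER, PARAMETER :: i4 = SELECTED_INT_KIND(9)  ! assumed to be 4 bytes
  INTEGER, PARAMETER :: u = 21                     ! arbitrary unit number
  INTEGER(i4) :: n
  CHARACTER(LEN=1) :: b(4)

  n = 1 + 256*2 + 65536*3 + 16777216*4             ! hex 04030201
  OPEN(UNIT=u, STATUS="SCRATCH", ACCESS="STREAM")  ! unformatted by default
  WRITE(u) n
  READ(u, POS=1) b                                 ! re-read the same 4 bytes
  CLOSE(u)

  PRINT "(A,4I3)", "Bytes in file order:", ICHAR(b)
  IF (ICHAR(b(1)) == 1) THEN
     PRINT *, "least significant byte first (little-endian)"
  ELSE IF (ICHAR(b(1)) == 4) THEN
     PRINT *, "most significant byte first (big-endian)"
  END IF
END PROGRAM show_byte_order
```

A file written on one kind of machine and read on the other would show the four values in the reverse order, which is the situation in which the bytes of each item need to be swapped.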
Here is a trivial example of writing a file using unformatted stream
access (note that Fortran keywords are shown in upper-case only to
distinguish them from user-chosen names):
PROGRAM writeUstream
  IMPLICIT NONE
  INTEGER :: myvalue = 12345, mypos
  OPEN(UNIT=11, FILE="ustream.demo", STATUS="NEW", ACCESS="STREAM")
  WRITE(11) "first"
  WRITE(11) "second"
  INQUIRE(UNIT=11, POS=mypos)
  PRINT *, "Myvalue will be written at position ", mypos
  WRITE(11) myvalue
  CLOSE(UNIT=11)
END PROGRAM writeUstream
The first two WRITE statements will put a total of 11 bytes on the
file; assuming the integer value occupies a 32-bit (4-byte) word then
the third WRITE will extend the file to 15 bytes. An INQUIRE
statement can be used at any point to find out the next character position in the
file; in this case it should return 12. When, as here, the
preceding WRITE was appending to the file, the value returned will be
the current length of the file in bytes plus one.
The power of stream I/O derives from the fact that a READ or WRITE
statement can specify the position at which the operation is to start
using a POS= specifier, remembering that POS=1 means the start of the
file. For example if we were to open the file
just produced we could access parts of it like this:
PROGRAM readUstream
  IMPLICIT NONE
  CHARACTER :: string*3
  INTEGER :: n
  OPEN(UNIT=42, FILE="ustream.demo", STATUS="OLD", ACCESS="STREAM")
  READ(42, POS=4) string      ! bytes 4 to 6
  READ(42, POS=12) n          ! the integer written after the two strings
  CLOSE(UNIT=42)
END PROGRAM readUstream
Then the character variable would be set to "sts", being the contents
from the end of the word "first" and the start of "second", and the
integer n would be set to the value originally written from myvalue.
This provides a form of random access to a file, similar to that
provided by direct-access files, but with addresses specified to the
byte rather than to the record. It is permitted to write an
unformatted stream file at any position: if this is beyond the
previous end of the file, then the contents of the gap are left
undefined.
In fact if one uses a POS= specifier in a WRITE
statement with an empty I/O list, then it resets the position in the
file without actually writing anything to the file. If the
position precedes the previous end of the file and the list is not
empty, then the byte positions specified are re-written, but the length
of the file is unchanged. This is quite different from what happens if
you try to write data to some intermediate point in a (non-stream)
sequential file: the file length is reset and the contents beyond the
point at which the WRITE was executed are all lost. Of course, if
a READ statement attempts to read at
a position in a file which has never been written then the results are
undefined, as one would expect.
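The overwriting and repositioning rules described above can be tried out with a short sketch such as the following. It is only an illustration: it uses a scratch file, the program and variable names are arbitrary, and it assumes byte-sized file storage units.

```fortran
PROGRAM overwrite_in_place
  ! Sketch: overwrite bytes in the middle of an unformatted stream file,
  ! then reposition with a WRITE that has an empty I/O list.
  IMPLICIT NONE
  INTEGER, PARAMETER :: u = 23           ! arbitrary unit number
  CHARACTER(LEN=11) :: whole
  CHARACTER(LEN=6)  :: tail
  INTEGER :: endpos

  OPEN(UNIT=u, STATUS="SCRATCH", ACCESS="STREAM")
  WRITE(u) "first", "second"             ! 11 bytes, positions 1 to 11
  INQUIRE(UNIT=u, POS=endpos)            ! length plus one

  WRITE(u, POS=1) "FIRST"                ! rewrites bytes 1 to 5 only
  READ(u, POS=1) whole                   ! the rest of the file is still there
  PRINT *, "File now holds: ", whole     ! should be FIRSTsecond

  WRITE(u, POS=6)                        ! empty list: only moves the position
  READ(u) tail                           ! so this READ starts at byte 6
  PRINT *, "From byte 6:    ", tail      ! should be second
  PRINT *, "Length in bytes:", endpos - 1
  CLOSE(u)
END PROGRAM overwrite_in_place
```

The same WRITE(u, POS=1) on a non-stream sequential file would have discarded everything after the rewritten data, which is the difference the paragraph above describes.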
A more practical example is given here, which opens a file of type .DBF
and lists some of its contents. The .DBF format is used by many
PC-based database management systems including dBASE, Alpha-5, and
Paradox. Actually there are several variants of the format; this
is merely one of the most common. The file consists of a header
of 32 bytes, followed by the column details (32 bytes per column), and
then the data records themselves.
PROGRAM readbf
  ! Reads .DBF files, lists header and first few records.
  ! Clive Page, 2005 July 9
  IMPLICIT NONE
  INTEGER, PARAMETER :: maxcol = 128
  CHARACTER :: colname(maxcol)*11, coltype(maxcol)*1
  INTEGER :: colwidth(maxcol), coldec(maxcol), coloff(maxcol)
  CHARACTER :: version*1, year*1, month*1, day*1, &
               ca*4, cwidth, cdec, string*100, flag*1
  INTEGER :: nrecs, ncols, icol, irec, ioffset, dataoff, cw, k
  INTEGER(kind=selected_int_kind(3)) :: lhead, lenrec ! =integer*2
  !
  OPEN(unit=1, file="recs.dbf", status='old', ACCESS='stream')
  ! file header: version, date, row count, header length, row length
  READ(1) version, year, month, day, nrecs, lhead, lenrec
  ncols = (lhead - 32)/32
  WRITE(*, '(a,i4, a,i4, 2("-",i2.2), 3(i6,a))') &
       'Version ', ichar(version), &
       ' Date ', ichar(year)+1900, ichar(month), ichar(day), &
       nrecs, ' rows', ncols, ' columns', lenrec, ' bytes/row'
  !
  WRITE(*,*)'Col ---Name--- T Width Decimals Offset'
  ioffset = 1
  DO icol = 1,ncols
     ! each column descriptor occupies 32 bytes after the 32-byte header
     READ(1, POS=32*icol+1) colname(icol), coltype(icol), ca, cwidth, cdec
     ! blank out the null padding, if any, that follows the name
     k = INDEX(colname(icol), char(0))
     IF (k > 0) colname(icol)(k:) = " "
     colwidth(icol) = ichar(cwidth)
     coldec(icol) = ichar(cdec)
     coloff(icol) = ioffset
     ioffset = ioffset + colwidth(icol)
     WRITE(*, '(i3,1x,a,1x,a,2i6,i8)') icol, colname(icol), &
          coltype(icol), colwidth(icol), coldec(icol), coloff(icol)
  END DO
  ! print contents of first three records (fewer if the file is shorter)
  ! the data start immediately after the header, whose length is lhead
  dataoff = lhead + 1
  DO irec = 1, MIN(3, nrecs)
     WRITE(*,'(a,i0)') 'Record ', irec
     READ(1, pos=dataoff + (irec-1)*lenrec) flag
     WRITE(*, '(2A)') 'Deleted flag = ', flag
     DO icol = 1,ncols
        ioffset = dataoff + (irec-1) * lenrec + coloff(icol)
        cw = MIN(colwidth(icol), LEN(string))   ! do not overrun the buffer
        READ(1, pos=ioffset) string(1:cw)
        WRITE(*, '(i3,1x,3a)') icol, colname(icol), '=', string(1:cw)
     END DO
  END DO
  CLOSE(1)
END PROGRAM readbf
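To try the program above without a database package to hand, a small test file can itself be produced with unformatted stream output. The following is a sketch under stated assumptions, not a description of the full .DBF format: the column names and data are invented, the reserved bytes are simply filled with nulls, and the header is ended by two bytes (a carriage return followed by a null) so that the data begin at byte 32*ncols+35; some variants of the format use a single terminator byte instead.

```fortran
PROGRAM make_test_dbf
  ! Sketch: write a tiny two-column, three-row file called recs.dbf,
  ! laid out in the way the reading program expects.
  IMPLICIT NONE
  INTEGER, PARAMETER :: i2 = SELECTED_INT_KIND(3), i4 = SELECTED_INT_KIND(9)
  INTEGER, PARAMETER :: ncols = 2, nrecs = 3
  CHARACTER(LEN=10), PARAMETER :: names(ncols)  = (/ "NAME      ", "AGE       " /)
  CHARACTER(LEN=1),  PARAMETER :: types(ncols)  = (/ "C", "N" /)
  INTEGER,           PARAMETER :: widths(ncols) = (/ 8, 3 /)
  ! each row is 8 characters of name followed by 3 characters of age
  CHARACTER(LEN=11), PARAMETER :: rows(nrecs) = &
       (/ "Ada      36", "Grace    45", "Alan     41" /)
  CHARACTER(LEN=1),  PARAMETER :: nul = ACHAR(0)
  INTEGER(i2) :: lhead, lenrec
  INTEGER :: i

  lhead  = 32 + 32*ncols + 2     ! header, descriptors, 2-byte terminator
  lenrec = 1 + SUM(widths)       ! deleted flag plus the column widths
  OPEN(UNIT=1, FILE="recs.dbf", STATUS="REPLACE", ACCESS="STREAM")

  ! 32-byte header: version, date as (year-1900, month, day), row count,
  ! header length, row length, then 20 bytes of padding
  WRITE(1) ACHAR(3), ACHAR(105), ACHAR(7), ACHAR(9), &
           INT(nrecs, i4), lhead, lenrec, REPEAT(nul, 20)

  ! one 32-byte descriptor per column: name (11), type (1), address (4),
  ! width (1), decimals (1), padding (14)
  DO i = 1, ncols
     WRITE(1) names(i)//nul, types(i), REPEAT(nul, 4), &
              ACHAR(widths(i)), nul, REPEAT(nul, 14)
  END DO
  WRITE(1) ACHAR(13), nul        ! header terminator (assumed 2 bytes)

  DO i = 1, nrecs
     WRITE(1) " ", rows(i)       ! blank flag means "not deleted"
  END DO
  WRITE(1) ACHAR(26)             ! conventional end-of-file marker
  CLOSE(1)
END PROGRAM make_test_dbf
```

Because no record markers are inserted between the WRITE statements, the bytes land in the file exactly in the order listed, which is what makes it practical to build a binary layout like this piece by piece.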
Formatted Stream Files
Formatted stream I/O essentially provides an alternative way of reading
or writing a formatted sequential file, i.e. a text file, but with a
little extra flexibility. Such files are opened with
ACCESS="STREAM" and FORM="FORMATTED", and every READ
and WRITE statement must specify a formatted transfer. It appears
to be legal to use list-directed
formatting or even NAMELIST input/output on such a file, but it is hard
to see a good reason for wanting to do this.
The additional power of stream I/O arises from the fact that the READ or
WRITE statement can specify a position for the transfer, using a POS=
specifier. Unlike the unformatted case, however, this position cannot
be chosen freely (one might say randomly): it must be either 1
(meaning the start of the file), or a position previously obtained
using an INQUIRE statement with the file positioned after some earlier
operations. The reason for this restriction, one can guess,
is that the actual number of characters in a text file is system
dependent (the record separator may be one or two characters, or in
principle even more).
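One use of this is to build an index of where each line starts while a file is being written, and then to jump straight to a chosen line later. The following is a minimal sketch, assuming a scratch file and made-up line contents; every position passed to POS= was obtained from INQUIRE, as the rule above requires, and each advancing WRITE here ends its own record.

```fortran
PROGRAM line_index
  ! Sketch: remember the starting position of each line of a formatted
  ! stream file, then re-read one line directly using POS=.
  IMPLICIT NONE
  INTEGER, PARAMETER :: u = 24, nlines = 3   ! arbitrary unit and line count
  INTEGER :: start(nlines), i
  CHARACTER(LEN=20) :: text

  OPEN(UNIT=u, STATUS="SCRATCH", ACCESS="STREAM", FORM="FORMATTED")
  DO i = 1, nlines
     INQUIRE(UNIT=u, POS=start(i))           ! where line i will begin
     WRITE(u, "(A,I0)") "this is line ", i
  END DO

  READ(u, "(A)", POS=start(2)) text          ! go straight to the second line
  PRINT *, TRIM(text)
  CLOSE(u)
END PROGRAM line_index
```

The values stored in start are not assumed to be any particular numbers, since they depend on how long the record separator is on the system in use.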
When reading from a formatted stream file the usual rules concerning
reading beyond the end of a record apply: the default setting is
PAD="YES" which means that a record will appear to be extended with an
indefinite number of spaces. When writing, an advancing WRITE
statement ends the current record in the usual way, and the intrinsic
function NEW_LINE is provided so that a record terminator can also be
produced within the output of a single WRITE (it takes a single
character argument, the value of which is not used, but which is
required to specify the kind of character value in use).
Another difference from unformatted stream output is that (according
to my reading of the Standard) whenever a WRITE statement writes to a
position preceding the end of the file, it has the effect of truncating
the file at that position, i.e. all subsequent data in the file are
lost.
Here is a program fragment showing formatted stream output:
OPEN(UNIT=11, FILE="mystream", STATUS="REPLACE", ACCESS="STREAM", FORM="FORMATTED")
WRITE(11, "(4A)") "first line", NEW_LINE("x"), "second line", NEW_LINE("x")
INQUIRE(UNIT=11, POS=mypos)
Now the integer variable mypos
contains a value which can be used in a subsequent READ or WRITE
statement using POS=mypos
to return to the same point in the file.
Note that it is advisable to insert a final newline sequence in
the file or the last line of the file will be incomplete and may be
hard to read as a piece of text. There are other minor
restrictions: the BACKSPACE statement cannot be used on a stream file
(as there are no records to move back over), nor may the ENDFILE
statement, but this seems to me to be a totally redundant statement
anyway.
Portability
I am grateful to James van Buskirk for pointing out that one can use
stream I/O with other compilers if there is a small change to the
options of the OPEN statement. His code contains this:
if(is_g95()) then
   access = 'stream'
   form = 'unformatted'
else ! Use values appropriate for lf95 express 7.10.02 or
     ! ifort Package ID: W_FC_C_9.0.029
   access = 'sequential'
   form = 'binary'
end if
open(10, file='fire.bmp', access=access, form=form, status='replace')
For details of how he manages to distinguish between g95 and other compilers using the function is_g95, refer to his example at http://home.comcast.net/~kmbtib/Fortran_stuff/fire.f90.
He says that this works not only with g95 but also cvf, ifort, and
lf95. I would guess that another way of doing this would be to
attempt an OPEN with access='stream' and form='unformatted' and then if
this gives an error, re-try with the options accepted by the other
compilers, but I have not had the opportunity to try this yet.
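The retry idea just described might be sketched as follows. It has not been tested against the older compilers mentioned, and it assumes that an unrecognised ACCESS= or FORM= value is reported at run time through IOSTAT= rather than rejected at compile time, which is why the values are held in character variables as in the code above. The file name fallback.demo and the program name are made up for illustration.

```fortran
PROGRAM open_with_fallback
  ! Sketch: try the standard stream options first, and fall back to the
  ! vendor-specific ones only if the first OPEN reports an error.
  IMPLICIT NONE
  INTEGER, PARAMETER :: u = 10
  CHARACTER(LEN=11) :: access, form
  INTEGER :: ios

  access = "stream"
  form   = "unformatted"
  OPEN(UNIT=u, FILE="fallback.demo", ACCESS=access, FORM=form, &
       STATUS="REPLACE", IOSTAT=ios)

  IF (ios /= 0) THEN
     ! values accepted as an extension by some other compilers
     access = "sequential"
     form   = "binary"
     OPEN(UNIT=u, FILE="fallback.demo", ACCESS=access, FORM=form, &
          STATUS="REPLACE", IOSTAT=ios)
  END IF

  IF (ios /= 0) THEN
     PRINT *, "Could not open the file in either mode, IOSTAT =", ios
     STOP
  END IF

  WRITE(u) "BM"                  ! two bytes, with no record markers
  CLOSE(u)
  PRINT *, "Opened with ACCESS=", TRIM(access), ", FORM=", TRIM(form)
END PROGRAM open_with_fallback
```

Compared with testing for a particular compiler, this approach needs no knowledge of which compiler is in use, at the cost of one failed OPEN on systems that lack the standard options.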
Clive Page
First draft: 2005 July 9.
Revised: 2005 Oct 31, following feedback from Richard Maine and others
on the comp.lang.fortran newsgroup.
Revised 2006 April 14 following feedback from James van Buskirk on portability.