Lots of file access and manipulation notes

RISC OS provides a powerful filing system which insulates programs from the actual hardware to a much greater degree than most other machines. Your program doesn't have to know anything about differences between floppy disks and the latest magneto-optical drives, for instance.

Basic provides a relatively small set of commands to let you read and write files, but these are not particularly efficient. If you talk to the operating system directly, you open up more sophisticated facilities which allow you to process files with much greater efficiency than Basic commands - and, in some ways, they're easier to use.

Accessing whole files

If you use the Basic commands to access your files, you have to open the file first and then read bits of data in a value at a time. This is not very efficient and, if you've written programs which load large files in this way, you'll probably have noticed that they're a bit slow.

A much faster way of loading a file is to load it directly into memory, but first you need to know how big it is. You might also like to find out its file type just to check it's one your program understands.

To find out about a file, you call OS_File, a SWI which performs operations on whole files. The first parameter is always a reason code specifying exactly what you want it to do.

The call to read information about a file is:

SYS "OS_File", 17, name$ TO type%, ,load%, exec%, length%, attributes%

name$ is the full pathname of the file you're interested in. The call returns various bits of information. attributes% is described in more detail below. type% is the type of the object (not its filetype), and can take four values:

0 - Object not found - it doesn't exist
1 - Object is a file
2 - Object is a directory
3 - Image file - both a file and a directory

An image file is something like an archive or a PC partition. It's a file which can be loaded in memory but also behaves like a directory which can be opened and files accessed from within it.

length% is the length of the file in bytes while load% and exec% are the addresses at which the file will be loaded and run from, respectively.

However, under RISC OS these two parameters are more often used for the file type and time stamp of the file, and their names really come from the days of the BBC Micro.

The values are split up as follows:

load% = &FFFtttdd
exec% = &dddddddd

The top 12 bits of the load address are set to &FFF to tell the operating system that this file has a file type and a date stamp, and the next 12 bits, ttt, are the file type.

The date stamp is in the standard 5 byte format, and is split across the two words with the most significant byte in the load address.

To find the file type of the file, you use:

filetype% = (load% AND &FFF00) >> 8
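
Not every file carries a file type, so it's worth checking the top 12 bits before trusting the result. The lines below are a rough sketch rather than a definitive routine: they assume name$ already holds a pathname, and the names time%, text% and term% are made up for the illustration. OS_ConvertStandardDateAndTime is used to turn the 5 byte date stamp into readable text.

REM name$ is assumed to hold the pathname of an existing object
DIM time% 4, text% 63
SYS "OS_File", 17, name$ TO type%, , load%, exec%
IF (load% >>> 20) = &FFF THEN
  filetype% = (load% >>> 8) AND &FFF
  PRINT "File type: &"; STR$~filetype%
  REM Rebuild the 5 byte date stamp, least significant byte first
  !time% = exec%
  time%?4 = load% AND &FF
  SYS "OS_ConvertStandardDateAndTime", time%, text%, 64 TO , term%
  ?term% = 13 : REM swap the zero terminator for Basic's code 13
  PRINT "Stamped:   "; $text%
ELSE
  PRINT "No file type or date stamp: these are real addresses"
ENDIF

The logical shift >>> is used so that the sign bit of load% doesn't get in the way of the comparison.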

Knowing all this information, you can then allocate a block of memory for the file and load it in using another OS_File call:

DIM file% length%
SYS "OS_File", 16, name$, file%, 0

The zero tells the operating system to load the file at the address given in file% rather than the load address of the file.
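
Putting the two calls together, a cautious loader might look something like the sketch below. It assumes name$ already holds a pathname, the error number 1 is arbitrary, and it refuses anything that isn't a plain file. Whether you also want to accept image files (type 3) is a decision for your own program.

REM name$ is assumed to hold the pathname of the file to load
SYS "OS_File", 17, name$ TO type%, , load%, exec%, length%, attributes%
IF type% <> 1 THEN ERROR 1, "'" + name$ + "' is not a file"
DIM file% length%
SYS "OS_File", 16, name$, file%, 0
PRINT "Loaded "; length%; " bytes at &"; STR$~file%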

You might like to use a heap to store your data as described in the December 1993 issue, rather than using DIM which you can't give back.

Saving a file is very easy. You use a call like:

SYS "OS_File", 10, name$, filetype%, , file%, file% + length%

filetype% is the filetype you want the saved file to have. The last parameter is file% + length% because you give the call the start and end addresses of data rather than the start and length.
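
As an illustration of the start and end addresses, the sketch below saves a short piece of text as a Text file (type &FFF). The filename Hello and the variable names are invented for the example, and the file is written to the current directory. Note the empty parameter after the file type: the start and end addresses are passed in R4 and R5.

DIM block% 255
text$ = "Hello from OS_File"
$block% = text$ : REM Basic also stores a code 13 after the text
REM The end address is start + length, so the code 13 is not saved
SYS "OS_File", 10, "Hello", &FFF, , block%, block% + LEN(text$)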

If you've saved a file using Basic's filing commands, your files will have the Data filetype. You can change the filetype of a file with another OS_File call:

SYS "OS_File", 18, name$, filetype%

Deleting a file is, again, a simple OS_File call:

SYS "OS_File", 6, name$

Use it with care.

Example 1 on the MegaDisk demonstrates all these calls. Double-click on SetDir before running it.
Copy the files from the cover disk to another disk with some free space before you run them, as you can't write to the MegaDisk.

Directories

It's sometimes useful to be able to create directories and to be able to read the names of files stored within a directory. Basic doesn't provide any statements to perform these tasks.

To create a directory, you use another OS_File call:

SYS "OS_File", 8, name$, , , entries%

entries% is the number of entries the directory can have before it needs extending.

However, directories on most filing systems have their size fixed by the disk format - although Acorn are rumoured to be working on removing this restriction with FileCore-type filing systems, along with allowing long filenames.

Set it to zero to set the default number of entries. name$ is the full pathname of the directory you want to create.

To read the contents of a directory, you need to use OS_GBPB. This stands for Get Byte, Put Byte and so, logically, is used to read directory contents - this too has its basis in the BBC Micro.

This call will read lots of entries into a large buffer but it's a lot easier to read one entry at a time, so you don't have to work out the start of the next entry in the buffer from the length of the filename - each entry is a variable length to allow for long filenames in the future.

You call this SWI many times, and each time you pass it the number of the item you want it to read. This may not necessarily be the number of the object within the directory.

The call tells you the number of the next object, and you keep on calling the SWI until it returns -1. This number is kept in item% in the example below.

To print the names of all the objects (files and directories) in a directory, you would use:

DIM result% 128
item% = 0
WHILE item% <> -1
  SYS "OS_GBPB", 10, directory$, result%, 1, item%, 128, 0 TO , , , read%, item%
  IF read% > 0 THEN
    n% = 20
    name$ = ""
    WHILE result%?n% <> 0:name$ = name$ + CHR$(result%?n%):n%+=1:ENDWHILE
    PRINT name$
  ENDIF
ENDWHILE

The dimensioned block result% in the first line is best claimed at initialisation of your program so that you don't repeatedly claim it and waste memory. read% is the number of objects read.

Because Basic insists on terminating strings with ASCII code 13 rather than the zero used by the operating system and just about every other language, it's necessary to read the name returned in a rather roundabout way.

The call also returns the same information as the OS_File call to read information about a file in the result block. The locations of this information in the result block are, in Basic notation:

result%!0        Load address
result%!4        Execution address
result%!8        Length
result%!12       Attributes
result%!16       Object type
result% + 20     Object name

All the values are the same as the OS_File SWI.
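
To show how the block can be picked apart, here is a sketch of a listing which prints each object's name, type and length. It assumes directory$ holds the pathname of an existing directory. FNstring0 is a helper invented for this example, which reads a zero-terminated string from memory.

REM directory$ is assumed to hold the pathname of an existing directory
DIM result% 255
item% = 0
WHILE item% <> -1
  SYS "OS_GBPB", 10, directory$, result%, 1, item%, 256, 0 TO , , , read%, item%
  IF read% > 0 THEN
    CASE result%!16 OF
      WHEN 1: kind$ = "file"
      WHEN 2: kind$ = "directory"
      WHEN 3: kind$ = "image file"
      OTHERWISE kind$ = "unknown"
    ENDCASE
    PRINT FNstring0(result% + 20); " ("; kind$; ", "; result%!8; " bytes)"
  ENDIF
ENDWHILE
END

REM Read a zero-terminated string starting at addr%
DEF FNstring0(addr%)
LOCAL s$
WHILE ?addr% <> 0
  s$ += CHR$(?addr%)
  addr% += 1
ENDWHILE
= s$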

The program Example2 shows how to recursively read directory entries and print a directory tree of everything below a directory. Run it, and type in the pathname of the directory.

For example, to display the tree of the disk IDEdisc4 you would enter ADFS::IDEdisc4.$.

File attributes

You will have noticed earlier that the OS_File and OS_GBPB SWIs return the attributes of the file. This specifies how the file can be accessed - whether it has read and write access and whether it can be deleted.

You can see all these attributes if you select a file in a directory viewer, click Menu and open the dialogue box File 'x' => Access => Access details.

There are two types of access set by the attributes, owner and public. The distinction is mainly for networks, where you would obviously want to be able to stop other people from reading, altering and even deleting your files. The attributes of any files you create are, by default, set to unlocked and owner read and write. The contents of the attributes word are as below. Only the bottom 8 bits have a universal meaning. Some filing systems return extra information in the rest of the word and you should ignore it.

Bit     Meaning when set
0       Owner read
1       Owner write
3       Locked
4       Public read
5       Public write
7       Locked against public deletion

All other bits are undefined and should be set to zero. To test to see if bit n is set, use:

IF (attributes% AND (1 << n)) = (1 << n) THEN REM bit is set
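
If you want to show the attributes to the user, the same test can be used to build an access string in the style used by *Access. This is only a sketch: FNaccess is a made-up helper, and name$ is assumed to hold the pathname of an existing object.

REM name$ is assumed to hold the pathname of an existing object
SYS "OS_File", 17, name$ TO type%, , , , , attributes%
IF type% <> 0 THEN PRINT name$; "  "; FNaccess(attributes%)
END

REM Build a string such as WR/R from the attributes word
DEF FNaccess(attributes%)
LOCAL a$
IF attributes% AND 8 THEN a$ += "L"
IF attributes% AND 2 THEN a$ += "W"
IF attributes% AND 1 THEN a$ += "R"
a$ += "/"
IF attributes% AND 32 THEN a$ += "W"
IF attributes% AND 16 THEN a$ += "R"
= a$

A file with owner read, owner write and public read set (attributes% = &13) gives WR/R.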

You can change the attributes using OS_File, as below:

SYS "OS_File", 4, name$, , , , new_attributes%
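
Because this call writes the whole attributes word, a safer pattern is to read the current value first and change only the bit you care about. A sketch, assuming name$ holds the pathname of an existing file:

REM name$ is assumed to hold the pathname of an existing file
SYS "OS_File", 17, name$ TO type%, , , , , attributes%
IF type% = 1 THEN
  REM Keep the bottom 8 bits, then set bit 3 (locked)
  SYS "OS_File", 4, name$, , , , (attributes% AND &FF) OR 8
ENDIF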

However, it's probably best to let the user set them using the Filer, but you may want to test for certain bits set. RISC OS will return an error if you try, for example, to read a file which hasn't got read access set.

Canonical names

The canonical name of a file is its full pathname. For example, if the current directory is ADFS::IDEdisc4.$.Dog and you ask for the canonicalised name of the file Fish, the pathname ADFS::IDEdisc4.$.Dog.Fish will be returned. This can be useful for non-multitasking programs where the user is required to enter the name of a file. This facility is only available under RISC OS 3.

The call is:

DIM buffer% 256
SYS "OS_FSControl", 37, name$, buffer%, 0, 0, 256
n% = 0
canonical_name$ = ""
WHILE buffer%?n% <> 0
  canonical_name$ = canonical_name$ + CHR$(buffer%?n%)
  n%+=1
ENDWHILE

The block buffer% should be claimed at initialisation of your program. The name is read in the same way as for the OS_GBPB read directory call for the same reason.

This call can also be used to find out the name of a disk in a particular drive. For example, to find the name of the disk in drive 0, you would canonicalise ADFS::0.$. It's necessary to include the .$ as you need to have a valid object name to canonicalise.

As specifying the disk by its drive number is more vague than by its disk name, it's canonicalised to include the actual name of the disk. Something like ADFS::Floppy.$ is returned, and it's trivial to separate the name of the disk from this pathname.
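
One possible way of doing the separation is sketched below. FNdiscname is a hypothetical helper, not part of any library, and it assumes the pathname has the form filingsystem::discname.$ as in the example above.

REM Return the disc name from a canonical pathname, or "" if there isn't one
DEF FNdiscname(path$)
LOCAL start%, dot%
start% = INSTR(path$, "::")
IF start% = 0 THEN = ""
start% += 2
dot% = INSTR(path$, ".", start%)
IF dot% = 0 THEN dot% = LEN(path$) + 1
= MID$(path$, start%, dot% - start%)

Given ADFS::Floppy.$, the function returns Floppy.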

Example3 illustrates this technique. You shouldn't need to use it in most programs, but occasionally it's useful. For example, if your program is designed for unattended use, you might like to use this to check that a disk is present before trying to save to it, and possibly locking up the machine while the operating system asks for the disk to be inserted.

Open files

As well as accessing whole files with OS_File, you can also use a mechanism which is similar to, but much more efficient than, Basic's own file-handling commands.

The equivalent of the Basic OPENOUT and OPENIN commands is OS_Find. The call looks like this:

SYS "OS_Find", reason%, name$ TO handle%

This returns a file handle, similar to Basic, in handle%. How the file is opened depends on the reason code in reason%. This can be:

&4F   open a file with read access
&8F   create a new file and open it with read/write access
&CF   open a file with read/write access

You can specify more options with the reason code, but these mainly control how errors are returned - the full details are on page 2-76 of the PRM.

Once the file is open, you use OS_GBPB for actually getting and putting bytes from the file. It can either read or write data at the current file pointer - automatically set to the byte after the last one you accessed - or you can tell it explicitly where you want to get the data from. Like OS_File, the first parameter to OS_GBPB is a reason code.

To read bytes from an open file, you would use:

DIM buffer% bytes%
SYS "OS_GBPB", 3, handle%, buffer%, bytes%, pointer% TO , , , bytes_not%, pointer%

handle% is the file handle returned by OS_Find, buffer% is the address of where you want the data to go and bytes% is the number of bytes you want to read from the file.

You probably won't want to claim a block of memory from Basic before reading the file, but load it into a previously-claimed block of memory.

pointer% is the position of the first byte you want to access from the file, starting from zero, and it is updated when the call returns. bytes_not% is the number of bytes which weren't read, for example, if you tried to read bytes past the end of the file.

To read bytes from the current file pointer, you use a reason code of 4 rather than 3, and omit pointer%.

To write bytes to an open file, the file must be opened with read/write access. You use a call like:

SYS "OS_GBPB", 1, handle%, buffer%, bytes%, pointer%

The data you want to write is at buffer% and bytes% bytes will be written, overwriting any data which is already there rather than inserting it.

pointer% has the same function as in the previous call. If the specified pointer is past the end of the file, it is extended by writing zeros to get to the given pointer.

To write bytes to the current file pointer, use a reason code of 2 rather than 1, and omit pointer%.
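
As a small illustration, the sketch below creates a new file and writes one line of text to it at the current file pointer. It assumes name$ holds the pathname of the file to create, and the variable names are only for the example.

REM name$ is assumed to hold the pathname of the file to create
DIM buffer% 255
text$ = "First line"
$buffer% = text$
buffer%?LEN(text$) = 10 : REM replace Basic's code 13 with a linefeed
SYS "OS_Find", &8F, name$ TO handle%
SYS "OS_GBPB", 2, handle%, buffer%, LEN(text$) + 1
SYS "OS_Find", 0, handle% : REM close the file again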

When you have finished with the file, close it with:

SYS "OS_Find", 0, handle%
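
The whole open, read and close sequence might be sketched like this. The example reads a file in 1024 byte chunks from the current file pointer and counts the bytes. It assumes name$ holds the pathname of an existing file, and the chunk size is an arbitrary choice.

REM name$ is assumed to hold the pathname of an existing file
DIM buffer% 1023
SYS "OS_Find", &4F, name$ TO handle%
total% = 0
REPEAT
  SYS "OS_GBPB", 4, handle%, buffer%, 1024 TO , , , bytes_not%
  REM bytes_not% is non-zero once the end of the file has been reached
  total% += 1024 - bytes_not%
UNTIL bytes_not% > 0
SYS "OS_Find", 0, handle%
PRINT "Read "; total%; " bytes"

A real program would process each chunk inside the loop, and would also close the file from its error handler so that the handle isn't left open if a read fails.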

As with the Basic commands, you should always close files when you finish with them. If you're writing a desktop program, it's not considered good practice to leave a file open over calls to Wimp_Poll unless absolutely necessary - some naughty program (or user) might issue a global file close command and then your program might be in big trouble.

Sorry, I can't find the example files for this article! If you have them, please get in touch.


Source: Acorn Computing October 1994
Publication: Acorn Computing
Contributor: Ben Summers