8.301 The UNIX File System and its Manipulation in C

Objectives

Upon completion of this lesson, you will be able to:

explain the hierarchical structure of the Unix file system
describe the role of inodes in the Unix file system
show how to use C to work with the file system
demonstrate how to change the permissions and ownership of files

Introduction

The Unix file system is a hierarchical file system used in Unix-like operating systems. In this tutorial, we will discuss the implementation of the Unix file system, the role of inodes, and how to manipulate and access files, directories, and file information on Unix using the C programming language.

Unix File System

The Unix file system is a hierarchical file system that organizes files and directories in a tree-like structure. At the root of the tree is the root directory (“/”), which contains all other directories and files. Directories can contain other directories and files, and files can be stored in any directory. Each file and directory is identified by a path, which is the sequence of directories from the root directory to the file or directory.

Inodes

The Unix file system uses inodes to store information about files and directories. The inode (short for index node) is a data structure that holds the metadata of a file or directory, such as its owner, permissions, timestamps, size, and the location of its data on the disk. The file name itself is not stored in the inode; directory entries map names to inode numbers. Each file or directory has an inode associated with it, and the inode is used to locate and access the file or directory.

Below is a detailed explanation of the key components in a typical UNIX inode structure:

Inode number (i-number): A unique identifier for the inode within the file system. This number is used by the system to reference the inode and associated file or directory.

File type: Specifies the type of file associated with the inode, such as a regular file, directory, symbolic link, character device, or block device.

File mode: Represents the file permissions, including read, write, and execute permissions for the file’s owner, group, and others. It also contains additional flags, such as setuid, setgid, and sticky bit.

Link count: Indicates the number of hard links (directory entries) that refer to the inode. When the link count drops to zero and no process still has the file open, the inode and its data blocks can be released.

Owner (UID) and Group (GID): Specifies the user identifier (UID) and group identifier (GID) of the file’s owner and group, respectively.

File size: Represents the size of the file in bytes.

Timestamps: There are usually three timestamps in the inode structure:

atime (access time): Records the last time the file was accessed or read.
mtime (modification time): Records the last time the file was modified or written to.
ctime (change time): Records the last time the file’s inode was changed, such as permission updates or ownership changes.

Block pointers: These are pointers to the blocks of data that make up the file. Typically, an inode contains a set of direct pointers, indirect pointers, double indirect pointers, and possibly triple indirect pointers.

Direct pointers: Point directly to the data blocks containing the file’s content.
Indirect pointers: Point to a block that contains additional pointers to data blocks.
Double indirect pointers: Point to a block containing pointers to other blocks, which in turn contain pointers to data blocks.
Triple indirect pointers: Similar to double indirect pointers, but with an additional level of indirection.

Extended attributes: Some UNIX file systems support extended attributes, which are additional metadata attached to the inode. These attributes can store information like access control lists (ACLs) and other file properties.

The exact structure of the inode and the information it contains can vary depending on the specific UNIX file system implementation, such as ext4, XFS, or HFS+. However, the core concepts remain largely the same across different file systems.

Manipulating and Accessing Files and Directories

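In C, the standard library and the POSIX interfaces provide functions for manipulating and accessing files and directories. fopen, fclose, fread, and fwrite come from the C standard library, while the directory and metadata functions (opendir, readdir, closedir, stat, chmod, chown) are defined by POSIX. Some of the most commonly used functions are: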

fopen: Opens a file and returns a pointer to a FILE structure that can be used to read from or write to the file.

fclose: Closes a file that was opened with fopen.

fread and fwrite: Read data from and write data to a file, respectively.

opendir: Opens a directory and returns a pointer to a DIR structure that can be used to read the contents of the directory.

readdir: Reads the next entry in a directory opened with opendir.

closedir: Closes a directory that was opened with opendir.

stat: Returns information about a file or directory, including its size, permissions, and owner.

chmod: Changes the permissions of a file or directory.

chown: Changes the owner of a file or directory.

Here is an example program that demonstrates how to use some of these functions:

#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <sys/stat.h>

int main(int argc, char *argv[]) {
    DIR *dir;
    struct dirent *ent;
    struct stat statbuf;
    char path[4096];

    if (argc < 2) {
        fprintf(stderr, "Usage: %s <directory>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    if ((dir = opendir(argv[1])) == NULL) {
        perror("opendir");
        exit(EXIT_FAILURE);
    }

    while ((ent = readdir(dir)) != NULL) {
        printf("%s\n", ent->d_name);

        // d_name is relative to the directory being read, not to the
        // current working directory, so build the full path before calling stat
        snprintf(path, sizeof(path), "%s/%s", argv[1], ent->d_name);

        if (stat(path, &statbuf) < 0) {
            perror("stat");
            continue;
        }

        printf("\t Size: %lld bytes\n", (long long) statbuf.st_size);
        printf("\t Permissions: %o\n", (unsigned int) (statbuf.st_mode & 0777));
        printf("\t Owner: %ld\n", (long) statbuf.st_uid);
    }

    closedir(dir);

    return 0;
}

This program takes a directory name as a command line argument, opens the directory with opendir, reads the contents of the directory with readdir, and prints information about each file in the directory using stat.

File Descriptors vs File Structures

open and fopen are both C functions that are used to open files, but they have some differences.

open is a low-level function defined by POSIX (on most systems it is a thin wrapper around a system call), while fopen is a higher-level function that is part of the C standard library.

open returns a file descriptor, which is an integer that represents the file being opened. This file descriptor can be used with other low-level I/O functions like read and write to manipulate the file. open allows for greater control over file access modes and file permissions, as it takes a bitmask of flags that specify these parameters.

On the other hand, fopen returns a pointer to a FILE structure, which is a higher-level abstraction of the file. FILE provides a buffered interface for reading and writing data, and it is designed to be used with other higher-level I/O functions like fread, fwrite, fprintf, and fscanf. fopen provides less control over file access modes and permissions than open, but it is simpler to use.

Here’s an example of using open to open a file:

#include <fcntl.h>
#include <stdio.h>   // perror
#include <stdlib.h>  // exit, EXIT_FAILURE
#include <unistd.h>

int main(void) {
    int fd = open("file.txt", O_RDWR);
    if (fd == -1) {
        perror("open");
        exit(EXIT_FAILURE);
    }

    // Use the file descriptor with read and write operations
    // ...

    close(fd);
    return 0;
}

And here’s an example of using fopen to open a file:

#include <stdio.h>
#include <stdlib.h>  // exit, EXIT_FAILURE

int main(void) {
    FILE *fp = fopen("file.txt", "r+");
    if (fp == NULL) {
        perror("fopen");
        exit(EXIT_FAILURE);
    }

    // Use the FILE pointer with fread, fwrite, fprintf, fscanf, etc.
    // ...

    fclose(fp);
    return 0;
}

In summary, open is a lower-level function that provides greater control over file access modes and permissions, while fopen is a higher-level function that provides a buffered interface for reading and writing data.

I/O Control with `ioctl`

The ioctl function is a system call for performing device-specific input/output control operations on a file descriptor. It allows you to manipulate the underlying device parameters or properties of the file associated with the given file descriptor. The name ioctl stands for “input/output control.”

The ioctl function is particularly useful for managing device drivers and handling specialized or non-standard operations that don’t have dedicated system calls. Common use cases for ioctl include configuring serial ports, managing terminal settings, or interacting with various hardware devices.

The ioctl function has the following signature:

int ioctl(int fd, unsigned long request, ...);

fd: The file descriptor for the device or file you want to perform the control operation on.

request: A constant that specifies the control operation to be performed.

...: Zero or more additional arguments, depending on the specific request used. These arguments are typically used to pass data to or receive data from the ioctl call.

The ioctl function usually returns 0 on success, although a few requests return a different non-negative value as their result. It returns -1 on failure, in which case the errno variable is set to indicate the error.

Example: Traverse Directories

Here’s an example C program that traverses a directory recursively and puts all files found in the directories into an array:

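#include <stdio.h>
#include <stdlib.h>
#include <dirent.h>
#include <string.h>
#include <sys/stat.h>

#define MAX_FILES 1024

void traverse_directory(const char *dir_path, char **files, int *file_count);

int main(int argc, char *argv[]) {
    // A directory argument is required; argv[1] is not valid without it
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <directory>\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    char **files = malloc(MAX_FILES * sizeof(char *));
    int file_count = 0;

    if (files == NULL) {
        perror("malloc");
        exit(EXIT_FAILURE);
    }

    traverse_directory(argv[1], files, &file_count);

    for (int i = 0; i < file_count; i++) {
        printf("%s\n", files[i]);
        free(files[i]);
    }
    free(files);

    return 0;
}

void traverse_directory(const char *dir_path, char **files, int *file_count) {
    DIR *dir;
    struct dirent *entry;
    struct stat statbuf;

    if ((dir = opendir(dir_path)) == NULL) {
        fprintf(stderr, "Error opening directory %s\n", dir_path);
        exit(EXIT_FAILURE);
    }

    while ((entry = readdir(dir)) != NULL) {
        char path[1024];

        // Skip the entries for the current and parent directories
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }

        // snprintf never writes past the buffer; skip paths that would be truncated
        int len = snprintf(path, sizeof(path), "%s/%s", dir_path, entry->d_name);
        if (len < 0 || (size_t) len >= sizeof(path)) {
            fprintf(stderr, "Path too long: %s/%s\n", dir_path, entry->d_name);
            continue;
        }

        if (stat(path, &statbuf) == -1) {
            fprintf(stderr, "Error getting file stats for %s\n", path);
            continue;
        }

        if (S_ISDIR(statbuf.st_mode)) {
            traverse_directory(path, files, file_count);
        } else if (S_ISREG(statbuf.st_mode)) {
            // Do not write past the end of the files array
            if (*file_count >= MAX_FILES) {
                fprintf(stderr, "File limit of %d reached; skipping %s\n", MAX_FILES, path);
                continue;
            }

            char *filename = malloc(strlen(path) + 1);
            if (filename == NULL) {
                perror("malloc");
                exit(EXIT_FAILURE);
            }
            strcpy(filename, path);
            files[(*file_count)++] = filename;
        }
    }

    closedir(dir);
}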

This program takes a directory path as a command-line argument and recursively traverses the directory, adding any files found to a dynamically allocated array of strings. The traverse_directory function takes a directory path, the array of files, and a pointer to an integer that keeps track of the number of files in the array. It opens the directory, reads each entry, and determines whether it is a file or directory. If it is a directory, the function recursively calls itself on the directory. If it is a file, the function adds the file path to the array of files.

Finally, the program prints out each file path in the array and frees the memory allocated for each string in the array.

Note that this program allows for a maximum of 1024 files in the files array, as set by MAX_FILES. Adjust this value to suit the size of the directory trees you expect to traverse.

Conclusion

The Unix file system is a hierarchical file system used in Unix-like operating systems. Inodes are used to store information about files and directories, and C, through its standard library and the POSIX interfaces, provides functions for manipulating and accessing files, directories, and file information.


Acknowledgements

Elements of the lesson were generated by OpenAI’s GPT-3.5 and GPT-4.

Errata

Let us know.

8.301
The UNIX File System and its Manipulation in C

Martin Schedlbauer, PhD

2024-02-14
