Recursive Directory Listing - dirls


As a final project, we will use the file system interface (API) to develop dirls, a limited version of ls that behaves like ls -R by default. The program takes command-line arguments that include a set of flags: -l, -a, -f, and -d, which prints only the directory. Use getopt to parse the arguments rather than parsing them manually. getopt is a useful function for writing small, professional-looking programs.

Your program should have at least one usage function that prints a line showing the syntax of the program, followed by a list of the arguments and how each one modifies the behavior of the program.


[harazduk@storm cs3595]$ dirls -h
Usage: dirls [(-[adflh]+) (dir)]*
        -a: include dot files
        -f: follow symbolic links
        -d: only this directory
        -l: long form
        -h: prints this message
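
A usage function along these lines would reproduce the output above. This is only a sketch: the name usage_text and the choice to return a string (rather than print directly) are assumptions made so the text can be sent to either stdout or stderr.

```cpp
#include <string>

// Hypothetical helper (name and signature are assumptions, not part of the
// assignment): builds the usage text shown in the sample output.
std::string usage_text(const std::string& progname) {
    std::string text = "Usage: " + progname + " [(-[adflh]+) (dir)]*\n";
    text += "\t-a: include dot files\n";
    text += "\t-f: follow symbolic links\n";
    text += "\t-d: only this directory\n";
    text += "\t-l: long form\n";
    text += "\t-h: prints this message\n";
    return text;
}
```

A caller could print usage_text(argv[0]) when -h is given or when getopt reports an unknown flag.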

Make sure to create a struct/class that holds the various arguments in combination, and pass that struct to the function that navigates the directory recursion. This exercise is designed to teach you something about the Linux file system interface. Manual pages for the POSIX API are given below.

Samples

The program should be able to accept multiple argument lists, each made up of a -flagstring plus a dirname. If the last dirname is left out, it should assume the current working directory (pwd).

dirls -la
dirls -l testdir -al ../anotherdir ../andanother
dirls -lf testdir -d

If no directory is given, use the current working directory. You can restrict this to the last option if that is easier.


Program Flags

There are many projects online that do something like this assignment. It is fine to include code that you find online for pieces of the assignment. As professional programmers, we get ideas from code that we find online all the time. Please include a link to the original code in the comments giving credit to the original author.


Parsing Command Line Arguments - getopt or getopt_long

One of the most useful tools in a programmer's bag of tricks is the getopt function in unistd.h, or getopt_long in getopt.h. These functions provide generic parsing of argument lists. Each call returns the next option flag and, if that option requires an argument, sets the global variable optarg to point to it. To specify the set of options, give getopt a string of single-character flags. If an option requires an argument, put a ':' after its character.
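
As a minimal sketch of the optstring and optarg mechanics, the function below parses two made-up flags: -v, which takes no argument, and -n, which requires one. The flags, the DemoSettings struct, and the function name are hypothetical and exist only for illustration; dirls itself has no option that takes a value.

```cpp
#include <unistd.h>  // getopt, optarg, optind
#include <string>

// Hypothetical settings record, used only to illustrate optarg.
struct DemoSettings {
    bool verbose = false;  // set by -v
    std::string name;      // set by -n <name>
    bool ok = true;        // false if getopt reported a problem
};

DemoSettings parse_demo(int argc, char* argv[]) {
    DemoSettings settings;
    optind = 1;  // start scanning at argv[1]
    int opt;
    // "vn:" means: -v takes no argument, -n requires one (note the ':').
    while ((opt = getopt(argc, argv, "vn:")) != -1) {
        switch (opt) {
        case 'v': settings.verbose = true; break;
        case 'n': settings.name = optarg; break;  // optarg points at the value
        default:  settings.ok = false; break;     // '?': unknown flag or missing value
        }
    }
    return settings;
}
```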

NOTE: Using getopt for this turned out to be harder than I thought. Interleaved options and input arguments require some thought to implement. You need to be able to detect when you've come to the end of an option arg to know that the next arg is an input. First, see the getopt example in the GNU manual, then read the description below.

More Getopt Details

The getopt function returns the next option as a character to be used in a switch. I had an example that I worked with:
[harazduk@storm cs3595]$ dirls -laf ../dirls -d .
For our application, the call would be getopt(argc, argv, "adfl"). getopt returns -1 when there are no more option args. There can still be other input args, but no more option args. getopt also stores the index of the next arg to be processed in the global variable optind. This was the key to figuring out how to handle interleaved option args and input args.

When the argv string is "-laf", the loop while ((opt = getopt(argc, argv, "adfl")) != -1) returns 'l', 'a', and 'f' on successive iterations.

At the bottom of the loop, test argv[optind][0] (after checking that optind < argc). If it is not '-', the next arg is an input argument, not an option argument. You can use this to grab all of the directories and files that are interleaved with the option arguments.
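
The bottom-of-loop test can be sketched as follows. The function name scan_args and the idea of returning everything as a flat list of strings are assumptions for illustration. The sketch assumes a getopt that tolerates the caller advancing optind between calls (glibc on Linux does), and it does not treat "--" or a bare "-" specially.

```cpp
#include <unistd.h>  // getopt, optind
#include <string>
#include <vector>

// Hypothetical helper: returns flags (as "-x") and input arguments in the
// order they appear on the command line.
std::vector<std::string> scan_args(int argc, char* argv[]) {
    std::vector<std::string> seen;

    // Consume every input argument that sits at argv[optind].
    auto take_inputs = [&]() {
        while (optind < argc && argv[optind][0] != '-') {
            seen.push_back(argv[optind]);
            ++optind;  // step past the input so getopt resumes after it
        }
    };

    optind = 1;     // start scanning at argv[1]
    take_inputs();  // inputs that come before any flag
    int opt;
    while ((opt = getopt(argc, argv, "adfl")) != -1) {
        seen.push_back(std::string("-") + static_cast<char>(opt));
        take_inputs();  // the bottom-of-loop test described above
    }
    return seen;
}
```

For the example command line dirls -laf ../dirls -d . this yields -l, -a, -f, ../dirls, -d, and . in that order.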

For this project, none of the options require additional arguments. The getopt function is part of the POSIX interface; getopt_long is a GNU extension that is available on Linux. Both are plain C functions, so they can be used from either C or C++. A more object-oriented approach can be found in the Boost library. As of C++17, several Boost libraries have been incorporated into the C++ Standard Library, but they may not be available on storm. Double check before making any assumptions.

Create a class that allows you to reference the flags during the execution of the program. Since multiple argument lists can be supplied to the program, a vector of your class objects will allow you to apply a different set of flags to each directory argument. It is also possible to supply more than one directory for a flag string. When that happens, make a copy of the previous arg object and change only the directory name.
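
One possible shape for such a class and its vector is sketched below. The names DirArgs and parse_args, the field names, and the rule that a new flag string starts from cleared flags are all assumptions, not requirements of the assignment. The sketch also assumes a getopt that tolerates the caller advancing optind between calls (glibc on Linux does).

```cpp
#include <unistd.h>  // getopt, optind
#include <string>
#include <vector>

// Hypothetical record: one set of flags paired with one directory.
struct DirArgs {
    bool all = false;        // -a
    bool dir_only = false;   // -d
    bool follow = false;     // -f
    bool long_form = false;  // -l
    std::string dir = ".";   // defaults to the current working directory
};

std::vector<DirArgs> parse_args(int argc, char* argv[]) {
    std::vector<DirArgs> lists;
    DirArgs current;
    bool pending = true;  // true while `current` has not been stored yet

    // Pair each directory at argv[optind] with the current flags. A second
    // directory in a row is stored as a copy with only the name changed.
    auto take_dirs = [&]() {
        while (optind < argc && argv[optind][0] != '-') {
            current.dir = argv[optind++];
            lists.push_back(current);
            pending = false;
        }
    };

    optind = 1;
    take_dirs();
    int opt;
    while ((opt = getopt(argc, argv, "adfl")) != -1) {
        if (!pending) {  // first flag after a directory: start a new list
            current = DirArgs();
            pending = true;
        }
        switch (opt) {
        case 'a': current.all = true; break;
        case 'd': current.dir_only = true; break;
        case 'f': current.follow = true; break;
        case 'l': current.long_form = true; break;
        default:  break;  // '?': getopt has already printed a diagnostic
        }
        take_dirs();
    }
    if (pending) lists.push_back(current);  // trailing flags, or no arguments
    return lists;
}
```

With this sketch, dirls -l testdir -al ../anotherdir ../andanother produces three entries, and dirls -lf testdir -d produces two, the second using the current working directory.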

NOTE: If you do use something other than C++11, please include a makefile or a readme in your submission stating how to build the program.

GETOPT(3)

NAME
       getopt, getopt_long, getopt_long_only, optarg, optind, opterr, optopt - Parse command-line options

SYNOPSIS
       #include <unistd.h>

       int getopt(int argc, char * const argv[], const char *optstring);

       extern char *optarg;
       extern int optind, opterr, optopt;

       OR

       #include <getopt.h>

       int getopt_long(int argc, char * const argv[], const char *optstring,
                       const struct option *longopts, int *longindex);

       int getopt_long_only(int argc, char * const argv[], const char *optstring,
                            const struct option *longopts, int *longindex);

       extern char *optarg;
       extern int optind, opterr, optopt;

See the following reference page for documentation on getopt_long which includes documentation for getopt as well.


Directory entries - dirent.h

Unless we find a way to use the C++17 filesystem interface, we will use the POSIX interface described in the dirent.h manpage below. To use these functions, include dirent.h in your program. You will have to call opendir and readdir to open and read any directory. Use the macros associated with the d_type field of struct dirent to determine whether an entry is a regular file (DT_REG), a directory (DT_DIR), or a symbolic link (DT_LNK).

In addition, DO NOT TRAVERSE '.' or '..' when they come up in the list; skipping them avoids infinite recursion.

The function that traverses the entries can be a recursive function. It might make sense to keep a list (vector) of the entries as the directories are traversed.
DIRENT.H(7)

NAME
       dirent.h — format of directory entries

SYNOPSIS
       #include <dirent.h>

DESCRIPTION
       The internal format of directories is unspecified.

       The <dirent.h> header shall define the following type:

       DIR     A type representing a directory stream. The DIR type may be an incomplete type.

       It shall also define the structure dirent which shall include the following members:


           ino_t  d_ino       File serial number.
           char   d_name[]    Filename string of entry.


       The <dirent.h> header shall define the ino_t type as described in <sys/types.h>.

       The array d_name is of unspecified size, but shall contain a filename of at most {NAME_MAX} bytes followed by a  terminating
       null byte.

       The following shall be declared as functions and may also be defined as macros. Function prototypes shall be provided.


           int            alphasort(const struct dirent **, const struct dirent **);
           int            closedir(DIR *);
           int            dirfd(DIR *);
           DIR           *fdopendir(int);
           DIR           *opendir(const char *);
           struct dirent *readdir(DIR *);
           int            readdir_r(DIR *restrict, struct dirent *restrict,
                              struct dirent **restrict);

The program will need to use opendir and readdir, and it should call closedir when it is done traversing a directory. See the following reference for more information about dirent.h.
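
A recursive traversal with opendir, readdir, and closedir might look like the sketch below. The function name walk and the choice to collect full paths in a vector are assumptions. The sketch relies on the d_type field, which Linux provides but which is not in the POSIX excerpt above, and falls back to lstat when the file system reports DT_UNKNOWN. It never follows symbolic links.

```cpp
#include <dirent.h>    // opendir, readdir, closedir, DT_DIR, DT_UNKNOWN
#include <sys/stat.h>  // lstat, S_ISDIR
#include <cstring>     // std::strcmp
#include <string>
#include <vector>

// Hypothetical helper: appends the path of every entry under `dir` to `out`,
// descending into subdirectories depth-first.
void walk(const std::string& dir, std::vector<std::string>& out) {
    DIR* dp = opendir(dir.c_str());
    if (dp == nullptr) return;  // unreadable, missing, or not a directory

    while (struct dirent* entry = readdir(dp)) {
        const char* name = entry->d_name;
        // Skip "." and ".." so the recursion terminates.
        if (std::strcmp(name, ".") == 0 || std::strcmp(name, "..") == 0) continue;

        std::string path = dir + "/" + name;
        out.push_back(path);

        bool is_dir = (entry->d_type == DT_DIR);
        if (entry->d_type == DT_UNKNOWN) {  // file system did not report a type
            struct stat sb;
            is_dir = (lstat(path.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode));
        }
        if (is_dir) walk(path, out);
    }
    closedir(dp);  // release the directory stream
}
```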


Traversing Symbolic Links

To traverse a symbolic link, the program will have to resolve the link to the file being referenced. The realpath function will return the absolute path for a symbolic link. It would be wonderful if the program printed symbolic links as the link -> realpath. EXTRA CREDIT: Only print the relative path of the file from the link.

REALPATH(3)

NAME
       realpath - return the canonicalized absolute pathname

SYNOPSIS

       #include <limits.h>
       #include <stdlib.h>

       char *realpath(const char *path, char *resolved_path);


DESCRIPTION
       realpath()  expands  all symbolic links and resolves references to /./, /../ and extra '/' characters in the null-terminated
       string named by path to produce a canonicalized absolute pathname.  The resulting pathname is stored  as  a  null-terminated
       string,  up to a maximum of PATH_MAX bytes, in the buffer pointed to by resolved_path.  The resulting path will have no sym‐
       bolic link, /./ or /../ components.

       If resolved_path is specified as NULL, then realpath() uses malloc(3) to allocate a buffer of up to PATH_MAX bytes  to  hold
       the resolved pathname, and returns a pointer to this buffer.  The caller should deallocate this buffer using free(3).

RETURN VALUE
       If there is no error, realpath() returns a pointer to the resolved_path.

       Otherwise, it returns NULL, the contents of the array resolved_path are undefined, and errno is set to indicate the error.
See the following link for more information about realpath.
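
The link -> realpath output could be produced along these lines. The helper names resolve and format_link are assumptions, as is printing "?" when the link cannot be resolved. Passing NULL as the second argument lets realpath allocate the buffer, which the caller must free.

```cpp
#include <climits>
#include <cstdlib>  // realpath, free
#include <string>

// Hypothetical helper: returns the canonical absolute path, or an empty
// string if the path cannot be resolved (for example, a dangling link).
std::string resolve(const std::string& path) {
    char* resolved = realpath(path.c_str(), nullptr);  // buffer is malloc'ed
    if (resolved == nullptr) return "";
    std::string result(resolved);
    std::free(resolved);
    return result;
}

// Hypothetical helper: formats an entry as "link -> target".
std::string format_link(const std::string& link) {
    std::string target = resolve(link);
    return link + " -> " + (target.empty() ? "?" : target);
}
```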

Long Format - stat, lstat and fstat

To implement the long directory listing (e.g. -l option), use lstat. This function fills a struct with data about the file that can be used to produce the long listing format. Use the st_mode field to generate the mode string (e.g. drwxr-xr-x). Use the st_uid and st_gid to resolve the username and groupname with getpwuid() and getgrgid() functions referenced below.

STAT(2)                                              Linux Programmer's Manual                                              STAT(2)

NAME
       stat, fstat, lstat, fstatat - get file status

SYNOPSIS

       #include <sys/types.h>
       #include <sys/stat.h>
       #include <unistd.h>

       int stat(const char *pathname, struct stat *statbuf);
       int fstat(int fd, struct stat *statbuf);
       int lstat(const char *pathname, struct stat *statbuf);

See the following reference page for more information about stat and lstat.

NOTE: It is probably best to use lstat since part of the assignment is to identify symbolic links.
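
The sketch below shows one way to call lstat and pull out a few of the fields that the long format needs. The EntryInfo struct, its fields, and the function name inspect are assumptions for illustration. Because lstat does not follow links, a symbolic link is reported as a link rather than as its target.

```cpp
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <string>

// Hypothetical summary record; the choice of fields is an assumption.
struct EntryInfo {
    bool ok = false;          // false when lstat failed
    bool is_link = false;     // entry is a symbolic link
    bool is_dir = false;      // entry is a directory
    long long size = 0;       // st_size in bytes
    unsigned long links = 0;  // st_nlink (hard link count)
};

EntryInfo inspect(const std::string& path) {
    EntryInfo info;
    struct stat sb;
    if (lstat(path.c_str(), &sb) != 0) return info;  // errno says why
    info.ok = true;
    info.is_link = S_ISLNK(sb.st_mode);
    info.is_dir = S_ISDIR(sb.st_mode);
    info.size = static_cast<long long>(sb.st_size);
    info.links = static_cast<unsigned long>(sb.st_nlink);
    return info;
}
```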


Mode bits

In stat.h, several macros are defined. That code may be hard to read, but the macros are very useful for building the mode string in the long form of the directory listing: drwxr-xr-x. They operate on the mode_t st_mode field of the file metadata returned from stat (lstat or fstat). The useful file-type macros include S_ISREG, S_ISDIR, and S_ISLNK.

Masks that can be used to identify the read-write-execute bits are S_IRUSR, S_IWUSR, and S_IXUSR for the owner, S_IRGRP, S_IWGRP, and S_IXGRP for the group, and S_IROTH, S_IWOTH, and S_IXOTH for others.
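
As a sketch of how the mode string can be built from st_mode, the function below uses the standard POSIX file-type macros (S_ISDIR, S_ISLNK, and so on) and the permission masks (S_IRUSR through S_IXOTH). The function name mode_string is an assumption, and the sketch ignores the setuid, setgid, and sticky bits.

```cpp
#include <sys/stat.h>
#include <string>

// Hypothetical helper: turns st_mode into a string such as "drwxr-xr-x".
std::string mode_string(mode_t mode) {
    std::string s = "----------";

    // First character: file type.
    if (S_ISDIR(mode))       s[0] = 'd';
    else if (S_ISLNK(mode))  s[0] = 'l';
    else if (S_ISCHR(mode))  s[0] = 'c';
    else if (S_ISBLK(mode))  s[0] = 'b';
    else if (S_ISFIFO(mode)) s[0] = 'p';
    else if (S_ISSOCK(mode)) s[0] = 's';

    // Remaining nine characters: owner, group, and other permissions.
    if (mode & S_IRUSR) s[1] = 'r';
    if (mode & S_IWUSR) s[2] = 'w';
    if (mode & S_IXUSR) s[3] = 'x';
    if (mode & S_IRGRP) s[4] = 'r';
    if (mode & S_IWGRP) s[5] = 'w';
    if (mode & S_IXGRP) s[6] = 'x';
    if (mode & S_IROTH) s[7] = 'r';
    if (mode & S_IWOTH) s[8] = 'w';
    if (mode & S_IXOTH) s[9] = 'x';
    return s;
}
```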

Username and Groupname

In the long form, ownership of the entry should be displayed as user name and group name. The stat functions return the user id and group id and they have to be converted to user name and group name respectively. The functions getpwuid and getgrgid do this conversion.

Getting Username

Getting the username requires the user ID, which is available in the struct stat filled in by stat, lstat and fstat. Pass its st_uid field as the argument to getpwuid, then use the returned struct passwd* to access the pw_name field for the output string.
GETPWNAM(3)

NAME
       getpwnam, getpwnam_r, getpwuid, getpwuid_r - get password file entry

SYNOPSIS

       #include <sys/types.h>
       #include <pwd.h>

       struct passwd *getpwnam(const char *name);

       struct passwd *getpwuid(uid_t uid);

       int getpwnam_r(const char *name, struct passwd *pwd,
                      char *buf, size_t buflen, struct passwd **result);

       int getpwuid_r(uid_t uid, struct passwd *pwd,
                      char *buf, size_t buflen, struct passwd **result);


DESCRIPTION
       The  getpwnam()  function  returns  a  pointer to a structure containing the broken-out fields of the record in the password
       database (e.g., the local password file /etc/passwd, NIS, and LDAP) that matches the username name.

       The getpwuid() function returns a pointer to a structure containing the broken-out fields of  the  record  in  the  password
       database that matches the user ID uid.

       The passwd structure is defined in <pwd.h> as follows:


           struct passwd {
               char   *pw_name;       /* username */
               char   *pw_passwd;     /* user password */
               uid_t   pw_uid;        /* user ID */
               gid_t   pw_gid;        /* group ID */
               char   *pw_gecos;      /* user information */
               char   *pw_dir;        /* home directory */
               char   *pw_shell;      /* shell program */
           };


       See passwd(5) for more information about these fields.

       The  getpwnam_r()  and  getpwuid_r()  functions  obtain  the  same  information  as getpwnam() and getpwuid(), but store the
       retrieved passwd structure in the space pointed to by pwd.  The string fields pointed to by the members of the passwd struc‐
       ture  are  stored  in the buffer buf of size buflen.  A pointer to the result (in case of success) or NULL (in case no entry
       was found or an error occurred) is stored in *result.
See the following reference for more information about getpwuid.
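
A minimal sketch of the lookup follows. The function name user_name is an assumption, as is falling back to the numeric ID when no password entry exists. Note that getpwuid returns a pointer to static storage, so the name is copied into a std::string before the next call can overwrite it.

```cpp
#include <sys/types.h>
#include <pwd.h>
#include <string>

// Hypothetical helper: maps a user ID to a user name for the long listing.
std::string user_name(uid_t uid) {
    struct passwd* pw = getpwuid(uid);
    if (pw != nullptr && pw->pw_name != nullptr) {
        return pw->pw_name;  // copied out of getpwuid's static buffer
    }
    return std::to_string(uid);  // no entry found: show the numeric ID
}
```

A caller would pass the st_uid field from the struct stat filled in by lstat.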

Getting Groupname

Getting the group name requires the group ID, which is available in the struct stat filled in by stat, lstat and fstat. Pass its st_gid field as the argument to getgrgid, then use the returned struct group* to access the gr_name field for the output string.

GETGRNAM(3)

NAME
       getgrnam, getgrnam_r, getgrgid, getgrgid_r - get group file entry

SYNOPSIS

       #include <sys/types.h>
       #include <grp.h>

       struct group *getgrnam(const char *name);

       struct group *getgrgid(gid_t gid);

       int getgrnam_r(const char *name, struct group *grp,
                 char *buf, size_t buflen, struct group **result);

       int getgrgid_r(gid_t gid, struct group *grp,
                 char *buf, size_t buflen, struct group **result);


DESCRIPTION
       The  getgrnam()  function returns a pointer to a structure containing the broken-out fields of the record in the group data‐
       base (e.g., the local group file /etc/group, NIS, and LDAP) that matches the group name name.

       The getgrgid() function returns a pointer to a structure containing the broken-out fields of the record in the  group  data‐
       base that matches the group ID gid.

       The group structure is defined in <grp.h> as follows:

           struct group {
               char   *gr_name;        /* group name */
               char   *gr_passwd;      /* group password */
               gid_t   gr_gid;         /* group ID */
               char  **gr_mem;         /* NULL-terminated array of pointers
                                          to names of group members */
           };

       For more information about the fields of this structure, see group(5).

       The  getgrnam_r()  and  getgrgid_r()  functions  obtain  the  same  information  as getgrnam() and getgrgid(), but store the
       retrieved group structure in the space pointed to by grp.  The string fields pointed to by the members of the  group  struc‐
       ture  are  stored  in the buffer buf of size buflen.  A pointer to the result (in case of success) or NULL (in case no entry
       was found or an error occurred) is stored in *result.
See the following reference for more information on getgrgid.
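
The group lookup mirrors the user lookup. In this sketch the function name group_name and the numeric fallback are assumptions. getgrgid also returns a pointer to static storage, so the name is copied immediately.

```cpp
#include <sys/types.h>
#include <grp.h>
#include <string>

// Hypothetical helper: maps a group ID to a group name for the long listing.
std::string group_name(gid_t gid) {
    struct group* gr = getgrgid(gid);
    if (gr != nullptr && gr->gr_name != nullptr) {
        return gr->gr_name;  // copied out of getgrgid's static buffer
    }
    return std::to_string(gid);  // no entry found: show the numeric ID
}
```

A caller would pass the st_gid field from the struct stat filled in by lstat.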

Grading Rubric

Feel free to work with other classmates or to find code on the internet to help implement this project. Always credit the source of imported code. It's a good practice to get into so that those who come after you have all of the necessary information to maintain the code. Obviously, answers to quick questions don't have to be included. Major inclusions of code should be properly documented.

Please have everyone on the team submit their own copy of the program using the submitOS script. List all of the team members in a comment at the top of the program. Everyone will get the same grade on the project.

You may also work independently on this project. It will be graded to the same standard regardless.

If special compilation instructions are required, submit a readme.txt or add a comment at the top of the program explaining how to build. You may also submit a makefile if you'd like.