
Awk as a Major Systems Programming Language – Revisited (2018)


Preface

I started this paper in , and in 2100009 sent it out for review to the people listed later on. After incorporating comments, I sent it to Rik Farrow, the editor of the USENIX magazine ;login:, to see if he would publish it. He declined to do so, for reasonably good reasons.

The paper languished, forgotten, until early 2018 when I came across it and decided to polish it off, put it up on GitHub, and make it available from my home page in HTML.

If you are interested in language design and evolution in general, and in Awk in particular, I hope you will enjoy reading this paper. If not, then why are you bothering looking at it now?

Arnold Robbins
Nof Ayalon, ISRAEL
March, 2018

1 Introduction

At the March 1991 USENIX conference, Henry Spencer presented a paper entitled AWK As A Major Systems Programming Language. In it, he described his experiences using the original version of awk to write two significant “systems” programs: a clone for a reasonable subset of the nroff formatter, and a simple parser generator.

He described what awk did well, as well as what it didn’t, and presented a list of things that awk would need to acquire in order to take the position of a reasonable alternative to C for systems programming tasks on Unix systems.

In particular, awk lies about in the middle of the spectrum between C, which is “close to the metal,” and the shell, which is quite high-level. A language at this level that is useful for doing systems programming is very desirable.

This paper reviews Henry’s wish list, and describes some of the events that have occurred in the Unix/Linux world since 1991. It presents a case that gawk (GNU Awk) fills most of the major needs Henry listed way back in 1991, and then describes the author’s opinion as to why other languages have successfully filled the systems programming role which awk did not. It discusses how the current version of gawk may finally be able to join the ranks of other popular, powerful, scripting languages in common use today, and ends off with some counter-arguments and the author’s responses to them.

Acknowledgment

Thanks to Andrew Schorr, Henry Spencer, Nelson H.F. Beebe, and Brian Kernighan for reviewing an earlier draft of this paper.

2 That Was Then…

In this section we review the state of the Unix world in 1991, as well as the state of awk, and then list what Henry Spencer saw as missing for awk.

2.1 The Unix World in 1991

Undoubtedly, many readers of this paper were not using computers in 1991, so this section provides the context in which Henry’s paper was written. In March of 1991:

  • Commercial Unix systems were the norm, with offerings from AT&T, Digital Equipment Corporation, Hewlett Packard, IBM, Sun Microsystems, and many others, all vying for market share.
  • Microsoft Windows existed, but was primarily a layer on top of MS-DOS and was not taken seriously.
  • Very few sites still ran the original Bell Labs or direct-from-UCB variants of Unix; those did not keep up with the available hardware and AT&T was itself trying to succeed in the Unix hardware market.
  • GNU/Linux did not exist! Some unencumbered BSD variants were available, but they were still under the cloud of the AT&T/UCB lawsuit.
  • (So-called “new”) awk was about 2.5 years old. The book by Aho, Weinberger and Kernighan was published in October of , so most people knew about new awk, but they just couldn’t get it.

    Who could? New awk was available to educational institutions from the Bell Labs research group, and to those who had Unix source licenses for System V Releases 3.1, 3.2, and 4. By this time, source licensees were an extremely rare breed, since the cost for commercial licenses had skyrocketed, and even for educational licensees it had increased greatly. (If I recall correctly, an educational license cost around US $ 1, 13, considerably more than the earlier Unix licenses.)
  • PERL existed and was starting to gain in popularity. In 1991, “PERL” most likely meant PERL 3 or a very early version of PERL 4. The World Wide Web, which was one of the major reasons for PERL’s growth in popularity, had not yet really taken off.
  • Other implementations of new awk were available:

      • MKS Awk for PC systems (MS-DOS).
      • GNU Awk was available and relatively stable, but could not be called “solid.”

    The problem with the first of these is that source code was not available. And the latter came with (to quote Henry) “troublesome licenses.” (Actually, Henry no longer remembers whether his statement about “troublesome licenses” referred to the GPL, or to the Bell Labs source licenses.)
  • Michael Brennan’s mawk (also GPL’ed) was not yet available. Version 1.0 was accepted for posting in comp.sources.reviewed in September 1991, half a year after Henry’s paper was published.

    2.2 What Awk Lacked In 1991

    Here is a summary of what was wrong with the awk picture in 1991. These are in the same order as presented in Henry’s paper. We qualify each issue in order to later discuss how it has been addressed over time.

      • New awk was not widely available. Most Unix vendors still shipped only old awk. (Here is where he mentions that “The independently-available implementations either cost substantial amounts of money or come with troublesome [sic] licenses.”) His point then was that for portability, awk programs had to be restricted to old awk.

        This could be considered a quality of implementation issue, although it’s really a “lack of available implementation” issue.
      • There is no way to tell awk to start matching all its patterns over again against the existing $0. This is a language design issue.
      • There is no array assignment. (Language design issue.)
      • Getting an error message out to standard error is difficult. (Implementation issue.)
      • There is no precise language specification for awk. This leads to gratuitous portability problems. This too is thus a quality of implementation issue, in that without a specification, it’s difficult to produce uniform, high quality implementations.
      • The existing widely available implementation is slow; a much faster implementation is needed and the best thing of all would be an optimizing compiler. (Implementation issue.)
      • There is no awk-level debugger. (Support tool or quality of implementation issue.)
      • There is no awk-level profiler. (Support tool or quality of implementation issue.)

    In private email, Henry added the following items, saying “There are a couple more things I’d add now, in hindsight.” These are direct quotes:

      • [I can’t believe I didn’t discuss this in the paper, because I was certainly aware of it then!] Lack of any convenient mechanism for adding libraries. When awk is being invoked from a shell file, the shell file can do substitutions or use multiple -f options, but those are mechanisms outside the language, and not very convenient ones. What’s really wanted is something like you get in Python etc., where one little statement up near the top says “arrange for this program to have the xyz library available when it runs.”
      • I think it was Rob Pike who later said (roughly): “It says something bad about Awk that in a language with integrated regular expressions, you end up using substr() so often.” My paper did allude to the difficulty of finding out where something matched in old-awk programs, but even in new awk, what you get is a number that you then have to feed to substr(). The language could really use some more convenient way of dissecting a string using regexp matching. [Caveat: I have not looked lately at Gawk to see if it has one.]

    The first of these is somewhere between a language design and a language implementation issue. The latter is a language design issue.
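To make the substr() complaint concrete, here is a minimal sketch of dissecting a string using only what new awk offers: match() sets RSTART and RLENGTH, and everything after that is index arithmetic. The sample string, the shell function name extract_kv, and the awk variable names are made up for illustration.

```sh
# Sketch: pull "key=value" apart using only POSIX awk facilities.
extract_kv() {
    awk 'BEGIN {
        s = "entry: size=4096"
        # match() only reports WHERE the regexp matched (RSTART, RLENGTH)
        if (match(s, /[a-z]+=[0-9]+/)) {
            hit = substr(s, RSTART, RLENGTH)   # the matched text itself
            eq  = index(hit, "=")              # position of "=" inside the hit
            key = substr(hit, 1, eq - 1)
            val = substr(hit, eq + 1)
            print key, val
        }
    }'
}
extract_kv    # prints: size 4096
```

Three substr() calls and one index() call for a two-field pattern is the kind of bookkeeping the quote is objecting to.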

    3… And This Is Now

    Fast forward to 2018. Where do things stand?

    3.1 What Awk Has Today

    The state of awk is much better now. In the same order:

      • New awk is the standard version of awk today on GNU/Linux, BSD, and commercial Unix systems. The one notable exception is Solaris, where /usr/bin/awk is still the old one; on all other systems, plain awk is some version of new awk.
      • There remains no way to tell awk to start matching all its patterns over again against the existing $0. Furthermore, this is a feature that has not been called for by the awk community, except in Henry’s paper. (We do acknowledge that this might be a useful feature.)
      • There continues to be no array assignment. However, this function in gawk, which has arrays of arrays, can do the trick nicely. It is also efficient, since gawk uses reference counted strings internally:

            function copy_array(dest, source,    i, count)
            {
                delete dest
                for (i in source) {
                    if (typeof(source[i]) == "array")
                        count += copy_array(dest[i], source[i])
                    else {
                        dest[i] = source[i]
                        count++
                    }
                }

                return count
            }

      • Getting error messages out is easier. All modern systems have a /dev/stderr special file to which error messages may be sent directly.
      • Perhaps most important of all, with the POSIX standard there is a formal standard specification for awk. As with all formal standards, it isn’t perfect. But it provides an excellent starting point, as well as chapter and verse to cite when explaining the behavior of a standards-compliant version of awk.
      • There are a number of freely available implementations, with different licenses, such that everyone ought to be able to find a suitable one:

          • Brian Kernighan’s awk is the direct lineal descendant of Unix awk. He calls it the “One True Awk” (sic). It is available from his home page, in several archive formats:

                Shell archive:       http://www.cs.princeton.edu/~bwk/btl.mirror/awk.shar
                Compressed tar file: http://www.cs.princeton.edu/~bwk/btl.mirror/awk.tar.gz
                Zip file:            http://www.cs.princeton.edu/~bwk/btl.mirror/awk.zip
                GitHub:              git clone git://github.com/onetrueawk/awk bwkawk

          • GNU Awk, gawk, is available from the Free Software Foundation. You may use either ftp or an HTTP downloader: http://ftp.gnu.org/gnu/gawk/gawk-4.2.1.tar.gz is the current version. There may be a newer one.
          • Michael Brennan’s awk, known as mawk. In , Thomas Dickey took on mawk maintenance. Basic information is available on the project’s web page. The download URL is http://invisible-island.net/datafiles/release/mawk.tar.gz.

            In , Michael published a beta of mawk 2.0. It’s available from the project’s GitHub page.
          • MKS Awk was used for Solaris’s /usr/xpg4/bin/awk, which is their standards-compliant version of new awk. For a while it was available as part of Open Solaris, but is no longer so. Some years ago, we were able to make this version compile and run on GNU/Linux after just a few hours work.

            Although Open Solaris is now history, the Illumos project does make the MKS Awk available. You can view the files one at a time from https://github.com/joyent/illumos-joyent/blob/master/usr/src/cmd/awk_xpg4.
          • Other, more esoteric versions exist as well. See the Wikipedia article, and also the gawk documentation.
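As a small illustration of the /dev/stderr point in the list above, the following sketch sends a diagnostic to standard error while normal records continue to standard output. The shell function name report, the sample input, and the message text are assumptions chosen for the example.

```sh
# Sketch: complain about bad records on stderr, pass good ones to stdout.
report() {
    printf 'ok\nbad\n' | awk '
        $1 == "bad" {
            print "demo: bad record at line " NR > "/dev/stderr"
            next
        }
        { print }'
}
report    # "ok" goes to stdout; the complaint about line 2 goes to stderr
```

Because the two streams are separate, `report 2>/dev/null` shows only the good records, which is what makes awk programs usable in pipelines.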

    3.2 And What GNU Awk Has Today

    The more difficult of the quality of implementation issues are addressed by gawk. In particular:

      • Beginning with version 4.0 in 2011, gawk provides an awk-level debugger, dgawk, which is modeled after GDB. This is a full debugger, with breakpoints, watchpoints, single statement stepping and expression evaluation capabilities.
      • gawk has provided an awk-level statement profiler for many years (pgawk). Although there is no direct correlation with CPU time used, the statement level profiler remains a powerful tool for understanding program behavior.
      • Since version 4.0, gawk has had an ‘@include’ facility whereby gawk goes and finds the named awk source program. For much longer it has searched for files specified with -f along the path named by the AWKPATH environment variable. The ‘@include’ mechanism also uses AWKPATH.
      • In terms of getting at the pieces of text matched by a regular expression, gawk provides an optional third argument to the match() function. This argument is an array which gawk fills in with both the matched text for the full regexp and subexpressions, and index and length information for use with substr(). gawk also provides the gensub() general substitution function, an enhanced version of the split() function, and the patsplit() function for specifying contents instead of separators using a regexp.
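The three-argument match() described in the list above can be sketched as follows. This assumes gawk is installed under the name gawk; the shell function name dissect, the sample string, and the array name m are made up for illustration.

```sh
# Sketch: gawk fills the array m with the matched text and its position.
# Requires gawk; the third argument to match() is a gawk extension.
dissect() {
    gawk 'BEGIN {
        s = "entry: size=4096"
        if (match(s, /([a-z]+)=([0-9]+)/, m)) {
            # m[1], m[2]: text of the parenthesized subexpressions
            # m[0, "start"], m[0, "length"]: where the whole match sits in s
            print m[1], m[2], m[0, "start"], m[0, "length"]
        }
    }'
}
# Calling "dissect" prints: size 4096 8 9
```

The subexpression text arrives directly in m[1] and m[2], so no substr() arithmetic is needed for the common case.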

    With the 4.1 release, all three versions (gawk, pgawk, and dgawk) are merged into a single executable, considerably reducing the required installation “footprint.”

    While gawk has almost always been faster than Brian Kernighan’s awk, recent performance improvements bring it closer to mawk’s performance level (a byte-code based execution engine and internal improvements in array indexing).

    And gawk clearly has the most features of any version, many of which considerably increase the power of the language.

    3.3 So Where Does Awk Stand?

    Despite all of the above, gawk is not as popular as other scripting languages. Since 1991, we can point to four major scripting languages which have enjoyed, or currently enjoy, differing levels of popularity: PERL, tcl/tk, Python, and Ruby. We think it is fair to say that Python and Ruby are the most popular scripting languages in the second decade of the 21st century.

    Is awk, as we’ve described it up to this point, now ready to compete with the other languages? Not quite yet.

    4 Key Reasons Why Other Languages Have Gained Popularity

    In retrospect, it seems clear (at least to us!) that there are two major reasons that all of the previously mentioned languages have enjoyed significant popularity. The first is their extensibility. The second is namespace management.

    One certainly cannot attribute their popularity to improved syntax. In the opinion of many, PERL and Ruby both suffer from terrible syntax. Tcl’s syntax is readable but nothing special. Python’s syntax is elegant, although slightly unusual. The point here is that they all differ greatly in syntax, and none really offers the clean pattern–action paradigm that is awk’s trademark, yet they are all popular languages.

    If not syntax, then what? We believe that their popularity stems from the fact that all of these languages are easily extensible. This is true with both “modules” in the scripting language, and more importantly, with access to C level facilities via dynamic library loading.

    Furthermore, these languages allow you to group related functions and variables into packages or modules: they let you manage the namespace.

    awk, on the other hand, has always been closed. An awk program cannot even change its working directory, much less open a connection to an SQL database or a socket to a server on the Internet somewhere (although gawk can do the latter).

    If one examines the number of extensions available for PERL on CPAN, or for Python such as PyQt or the Python tk bindings, it becomes clear that extensibility is the real key to power (and from there to popularity).

    Further, in awk, all global variables and functions share a single namespace. This prevents many good software development practices based on the principle of information hiding.
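A minimal sketch of why the single namespace defeats information hiding: the “library” below is hypothetical (the stack_ prefix and all names are invented for this example), and nothing stops unrelated code from reaching into its state.

```sh
# Sketch: a naming prefix is only a convention, not a protection mechanism.
shared_namespace_demo() {
    awk '
    # A tiny "stack library" whose state lives in ordinary globals.
    function stack_push(v) { stack_data[++stack_top] = v }
    function stack_pop()   { return stack_data[stack_top--] }

    BEGIN {
        stack_push("a")
        stack_push("b")
        stack_top = 1         # unrelated code clobbers the "private" counter
        print stack_pop()     # prints "a", although "b" was pushed last
    }'
}
shared_namespace_demo
```

With real namespaces, the counter could be made unreachable (or at least clearly foreign) to code outside the library.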

    To summarize: A reasonable language definition, efficient implementations, debuggers and profilers are necessary but not sufficient for true power. The final ingredients are extensibility and namespaces.

    5 Filling The Extensibility Gap

    With version 4.1, gawk provides a defined C API for extending the core language.

    5.1 API Overview

    The API makes it possible to write functions in C or C++ that are callable from an awk program as if the function were written in awk. The most straightforward way to think of these functions is as user-defined functions that happen to be implemented in a different language.

    The API provides the following facilities:

      • Structures that map awk string, numeric, and undefined values into C types that can be worked with.
      • Management of function parameters, including the ability to convert a parameter whose original type is undefined, into an array. That is, there is full call-by-reference for arrays. Scalars are passed by value, of course.
      • Access to the symbol table. Extension functions can read all awk variables, and create and update new variables. As an initial, relatively arbitrary design decision, extensions cannot update special variables such as NR or NF, with the single exception of PROCINFO.
      • Full array management, including the ability to create arrays, and arrays of arrays, and the ability to add and delete elements from an array. It is also possible to “flatten” an array into a data structure that makes it simple for C code to loop over all the elements of an array.
      • The ability to run a procedure when gawk exits. This is conceptually the same as the C atexit() function.
      • Hooks into the built-in I/O redirection mechanisms in gawk. In particular, there are separate facilities for input redirections with getline and ‘<’, output redirections with print or printf and ‘>’ or ‘>>’, and two-way pipelines with gawk’s ‘|&’ operator.

    5.2 Discussion

    Considerable thought went into the design of the API. The gawk documentation provides a full description of the API itself, with examples (over 250 pages worth!), as well as some discussion of the goals and design decisions behind the API (in an appendix). The development was done over the course of about a year and a half, together with the developers of xgawk, a fork of gawk that added features that made using extensions easier, and included an extension for processing XML files in a way that fit naturally with the pattern–action paradigm. While it may not be perfect, the gawk developers feel that it is a good start.

    FIXME: Henry Spencer suggests adding more info on the API and on the design decisions. I think this paper is long enough, and the full doc is quite big. It’d be hard to pull API doc into this paper in a reasonable fashion, although it would be possible to review some of the design decisions. Comments?

    The major xgawk additions to the C code base have been merged into gawk, and the extensions from that project have been rewritten to use the new API. As a result, the xgawk project developers renamed their project gawkextlib, and the project now provides only extensions.

    It is notable that functions written in awk can do a number of things that extension functions cannot, such as modify any variables, do I/O, call awk built-in functions, and call other user-defined functions.

    While it would certainly be possible to provide APIs for all of these features for extension functions, this seemed to be overkill. Instead, the gawk developers took the view that extension functions should provide access to external facilities, and provide communication to the awk level via function parameters and/or global variables, including associative arrays, which are the only real data structure.

    Consider a simple example. The standard du program can recursively walk one or more arbitrary file hierarchies, call stat() to retrieve file information, and then sum up the blocks used. In the process, du must track hard links, so that no file is accounted for or reported more than once.

    The ‘filefuncs’ extension shipped with gawk provides a stat() function that takes a pathname and fills in an associative array with the information retrieved from stat(). The array elements have names like "size", "mtime" and so on, with corresponding appropriate values. (Compare this to PERL’s stat() function that returns a linearly-indexed array!)
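The stat() interface described above can be tried with a sketch like the following. It assumes gawk 4.1 or later with the bundled filefuncs extension installed where gawk can find it; the shell function name stat_demo and the awk variable names are made up for illustration.

```sh
# Sketch: look at the current directory through the filefuncs stat().
# Requires gawk built with dynamic extension support.
stat_demo() {
    gawk -l filefuncs 'BEGIN {
        # stat() returns a negative value on failure and sets ERRNO
        if (stat(".", info) < 0) {
            print "stat failed: " ERRNO > "/dev/stderr"
            exit 1
        }
        has_size = ("size" in info)   # elements are named, not numbered
        print info["type"], has_size
    }'
}
# Calling "stat_demo" prints: directory 1
```

Named elements such as info["type"] or info["size"] are what keep programs built on this interface readable.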

    The fts() function in the ‘filefuncs’ extension builds on stat() to create a multidimensional array of arrays that describes the requested file hierarchies, with each element being an array filled in by stat(). Directories are arrays containing elements for each directory entry, with an element named "." for the array itself.

    Given that fts() does the heavy lifting, du can be written quite nicely, and quite portably, in awk. See Appendix B, Awk Code For du, for the code, which weighs in at under lines. Much of this is comments and argument parsing.

    5.3 Future Work

    The extension facility is relatively new, and undoubtedly has introduced new “dark corners” into gawk. These remain to be uncovered and any new bugs need to be shaken out and removed.

    Some issues are known and may not be resolvable. For example, 64-bit integer values such as the timestamps in stat() data on modern systems don’t fit into awk’s 64-bit double-precision numbers which only have 53 bits of significand. This is also a problem for the bit-manipulation functions.
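The precision limit is easy to observe. The following sketch assumes an awk whose numbers are IEEE 754 doubles (the usual case when gawk is run without arbitrary-precision arithmetic); the shell function name precision_demo is made up for illustration.

```sh
# Sketch: integers stop being exactly representable above 2^53.
precision_demo() {
    awk 'BEGIN {
        big = 2 ^ 53                # 9007199254740992
        print (big + 1 == big)      # 1: adding one changes nothing
        print (big - 1 == big)      # 0: below 2^53 integers are still exact
    }'
}
precision_demo
```

Any 64-bit value above that threshold, such as a large inode number or a nanosecond timestamp, may silently lose its low-order bits on the way into an awk variable.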

    With respect to namespaces, in , I (finally) figured out how namespaces in awk ought to work to provide the needed functionality while retaining backwards compatibility. The code is currently in the feature/namespaces branch of gawk’s Git repository. It will eventually be merged into the master branch for release as part of gawk 5.0.

    FIXME: More info needed here.

    6 Counterpoints

    Brian Kernighan raised several counterpoints in response to an earlier draft of the paper. They are worth addressing (or at least trying to):

    I’m not 100% convinced by your basic premise, that the lack of an extension mechanism is the main / a big reason why Awk isn’t used for the kinds of system programming tasks that Perl, Python, etc., are. It’s absolutely a factor: without such a mechanism, there’s just no way to do a lot of important computations. But how does that trade off against just having built-in mechanisms for the core system programming facilities (as Perl does) or a handful of core libraries like sys, os, regex, etc., for Python?

    I think that Perl’s original inclusion of most of the Unix system calls was, from a language design standpoint, ultimately a mistake. At the time it was first done, there was no other choice: dynamic loading of libraries did not exist on Unix systems in the early and mid-1980s (nor did shared libraries, for that matter). But having all those built-in functions bloats the language, making it harder to learn, document, and maintain, and I definitely did not wish to go down that path for gawk.

    With respect to Python, the question is: how are those libraries implemented? Are they built-in to the interpreter and separated from the “core” language simply by the language design? Or are they dynamically loaded modules?

    If the latter, that sounds like an argument for the case of having extensions, not against it. And indeed, this merely emphasizes the point made at the end of the previous section, which is that to make an extension facility really scalable, you also need some sort of namespace / module capability.

    Thus, Brian is correct: an extension facility is needed, but the last part of the puzzle would be a module facility in the language. I think that I have solved this, and invite the curious reader to check out the branch named earlier and provide feedback.

    I’m also not convinced that Awk is the right language for writing things that need extensions. It was originally designed for 1-liners, and a lot of its constructs don’t scale up to bigger programs. The notation for function locals is appalling (all my fault too, which makes it worse). There’s little chance to recover from random spelling mistakes and typos; the use of mere adjacency for concatenation looks ever more like a bad idea.

    This is hard to argue with. Nonetheless, gawk’s --lint option may be of help here, as well as the --dump-variables option which produces a list of all variables used in the program.
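For readers who have not met the function-locals notation being criticized, here is a minimal sketch. Both awk functions and the shell function name locals_demo are invented for this example; the only difference between the two awk functions is whether i appears in the parameter list.

```sh
# Sketch: awk locals are just extra parameters that callers never pass.
locals_demo() {
    awk '
    # "i" and "total" are local: they are declared as extra parameters
    # (the run of spaces before them is only a visual convention).
    function sum_to(n,    i, total) {
        for (i = 1; i <= n; i++)
            total += i
        return total
    }
    # Same body, but "i" was left out of the list, so it is a global.
    function leaky_sum_to(n,    total) {
        for (i = 1; i <= n; i++)
            total += i
        return total
    }
    BEGIN {
        i = 42
        x = sum_to(3);       print x, i    # 6 42: caller "i" untouched
        y = leaky_sum_to(3); print y, i    # 6 4:  caller "i" clobbered
    }'
}
locals_demo
```

Nothing in the language flags the second version as a mistake, which is where a listing of global variables such as the one produced by --dump-variables can help: an unexpected i in that list points at the leak.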

    Awk is fine for its original purpose, but I find myself writing Python for anything that’s going to be bigger than say - lines except the lines are basically just longer pattern-action sequences. (That notation is a win, of course, which you point out.)

    Since my Python experience is minimal, I have little to say here; it might be that if I were more familiar with Python, I would start using it for small scripts instead of awk.

    On the other hand, with discipline, it’s possible to write fairly good-sized, understandable and maintainable awk programs; in my experience awk does scale up well beyond the one-liner range.

    Not to mention that Brian published a whole book of awk programs larger than one line. (See the Resources section.)

    Some of my own, good-sized awk programs are available from GitHub:

      • The TexiWeb Jr. literate programming system. See https://github.com/arnoldrobbins/texiwebjr. The suite has two programs that total over 1, 1024 lines of awk. (They share some code.)
      • Prepinfo. See https://github.com/arnoldrobbins/prepinfo. This script processes Texinfo files, updating menus as needed. This version is rewritten in TexiWeb Jr.; it’s about lines of awk.
      • Sortmail. See https://github.com/arnoldrobbins/sortmail. This script sorts a Unix mbox format mailbox by thread. I use it daily. It’s also written in TexiWeb Jr. and is about 978 lines of awk.

    Brian continues:

    The du example is awfully big, though it does show off some of the language features. Could you get the same mileage with something quite a bit shorter?

    My definition of “small” and “big” has changed over time. 728 lines may be big for a script, but the du.awk program is much smaller than a full implementation in C: GNU du is over 1, lines of C, plus all the libraries it relies upon in the GNU Coreutils.

    With respect to shorter examples, nothing springs to mind immediately. However, gawk comes with several useful extensions that are worth exploring, much more than we’ve covered here.

    For example, the readdir extension in the gawk distribution causes gawk to read directories and return one record per directory entry in an easy-to-parse format:

        $ gawk -lreaddir '{ print }' .
        -| /mail.mbx/f
        -| /awk-sys-prog.texi/f
        -| 2109282/./d
        -| 020107981/texinfo.tex/f
        -| /cleanit/f
        -| /awk-sys-prog.pdf/f
        -| /du.awk/f
        -| 2109294/.git/d
        -| /../d
        -| /ChangeLog/f

    How cool is that?!?

    Also, the gawkextlib project provides some very interesting extensions. Of particular interest are the XML and JSON extensions, but there are a number of others, and it’s worth checking out.

    In short, it’s too early to really tell. This is the beginning of an experiment. I hope it will be a fun journey for me, the other gawk maintainers, and the larger community of awk users.

    7 Conclusion

    It has taken much longer than any awk fan would like, but finally, GNU Awk fills in almost all the gaps listed by Henry Spencer for awk to be really useful as a systems programming language.

    In addition, experience from other popular languages has shown that extensibility and namespaces are the keys to true power, usability, and popularity.

    With the release of gawk 4.1, we feel that gawk (and thus the Awk language) is now almost on par with the basic capabilities of other popular languages. With gawk 5.0, we hope to truly reach par.

    Is it too late in the game? If enough people start to write extensions for gawk , then perhaps awk will return to the scripting language limelight. If not, then the gawk developers will have wasted a lot of time and effort. (We hope not!) Time will tell.

    For now though, we hope that this paper will have piqued your curiosity, and that you will take the time to give gawk a fresh look.

    Appendix A Resources

      • The AWK Programming Language, Alfred V. Aho, Brian W. Kernighan, and Peter J. Weinberger. Addison-Wesley, 1988. ISBN-13: 978-0201079814, ISBN-10: 020107981X.
      • Effective awk Programming, fourth edition. Arnold Robbins. O’Reilly Media, 2015. ISBN-13: 978-1491904619, ISBN-10: 1491904615.
      • Online version of the gawk documentation: http://www.gnu.org/software/gawk/manual/.
      • The gawkextlib project: http://sourceforge.net/projects/gawkextlib/.

    Appendix B Awk Code For du

    Here is the du program, written in Awk. Besides demonstrating the power of the stat() and fts() extensions and gawk’s multidimensional arrays, it also shows the switch statement and the built-in bit manipulation functions and(), or(), and compl().

    The output is not identical to GNU du’s, since filenames are not sorted. However, gawk’s built-in sorting facilities should make sorting the output straightforward; we leave that as the traditional “exercise for the reader.”

        #! /usr/local/bin/gawk -f

        # du.awk --- write POSIX du utility in awk.
        # See http://pubs.opengroup.org/onlinepubs/ /utilities/du.html
        #
        # Most of the heavy lifting is done by the fts() function in the "filefuncs"
        # extension.
        #
        # We think this conforms to POSIX, except for the default block size, which
        # is set to 1024. Following GNU standards, set POSIXLY_CORRECT in the
        # environment to force 512-byte blocks.
        #
        # Arnold Robbins
        # [email protected]

        @include "getopt"
        @load "filefuncs"

        BEGIN {
            FALSE = 0
            TRUE = 1

            BLOCK_SIZE = 1024    # Sane default
            if ("POSIXLY_CORRECT" in ENVIRON)
                BLOCK_SIZE = 512    # POSIX default

            compute_scale()

            fts_flags = FTS_PHYSICAL
            sum_only = FALSE
            all_files = FALSE

            while ((c = getopt(ARGC, ARGV, "aHkLsx")) != -1) {
                switch (c) {
                case "a":
                    # report size of all files
                    all_files = TRUE;
                    break
                case "H":
                    # follow symbolic links named on the command line
                    fts_flags = or(fts_flags, FTS_COMFOLLOW)
                    break
                case "k":
                    BLOCK_SIZE = 1024    # 1K block size
                    break
                case "L":
                    # follow all symbolic links

                    # fts_flags &= ~FTS_PHYSICAL
                    fts_flags = and(fts_flags, compl(FTS_PHYSICAL))

                    # fts_flags |= FTS_LOGICAL
                    fts_flags = or(fts_flags, FTS_LOGICAL)
                    break
                case "s":
                    # do sums only
                    sum_only = TRUE
                    break
                case "x":
                    # don't cross filesystems
                    fts_flags = or(fts_flags, FTS_XDEV)
                    break
                case "?":
                default:
                    usage()
                    break
                }
            }

            # if both -a and -s
            if (all_files && sum_only)
                usage()

            # remove the program name and the options; fts() treats
            # every remaining element of ARGV as a path
            for (i = 0; i < Optind; i++)
                delete ARGV[i]

            if (Optind >= ARGC) {
                delete ARGV      # clear all, just to be safe
                ARGV[1] = "."    # default to current directory
            }

            fts(ARGV, fts_flags, filedata)    # all the magic happens here

            # now walk the trees
            if (sum_only)
                sum_walk(filedata)
            else if (all_files)
                all_walk(filedata)
            else
                top_walk(filedata)
        }

        # usage --- print a message and die

        function usage()
        {
            print "usage: du [-a|-s] [-kx] [-H|-L] ..." > "/dev/stderr"
            exit 1
        }

        # compute_scale --- compute the scale factor for block size calculations

        function compute_scale(    stat_info, blocksize)
        {
            stat(".", stat_info)

            if (! ("devbsize" in stat_info)) {
                printf("du.awk: you must be using filefuncs extension from gawk 4.1.1 or later\n") > "/dev/stderr"
                exit 1
            }

            # Use "devbsize", which is the units for the count of blocks
            # in "blocks".
            blocksize = stat_info["devbsize"]
            if (blocksize > BLOCK_SIZE)
                SCALE = blocksize / BLOCK_SIZE
            else    # I can't really imagine this would be true
                SCALE = BLOCK_SIZE / blocksize
        }

        # islinked --- return true if a file has been seen already

        function islinked(stat_info,    device, inode, ret)
        {
            device = stat_info["dev"]
            inode = stat_info["ino"]

            ret = ((device, inode) in Files_seen)

            return ret
        }

        # file_blocks --- return number of blocks if a file has not been seen yet

        function file_blocks(stat_info,    device, inode)
        {
            if (islinked(stat_info))
                return 0

            device = stat_info["dev"]
            inode = stat_info["ino"]

            Files_seen[device, inode]++    # remember this file for islinked()

            return block_count(stat_info)    # delegate actual counting
        }

        # block_count --- return number of blocks from a stat() result array

        function block_count(stat_info,    result)
        {
            if ("blocks" in stat_info)
                result = int(stat_info["blocks"] / SCALE)
            else
                # otherwise round up from size
                result = int((stat_info["size"] + (BLOCK_SIZE - 1)) / BLOCK_SIZE)

            return result
        }

        # sum_dir --- data on a single directory

        function sum_dir(directory, do_print,    i, sum, count)
        {
            for (i in directory) {
                if ("." in directory[i]) {    # directory
                    count = sum_dir(directory[i], do_print)
                    count += file_blocks(directory[i]["."]["stat"])
                    if (do_print)
                        printf("%d\t%s\n", count, directory[i]["."]["path"])
                } else {    # regular file
                    count = file_blocks(directory[i]["stat"])
                }
                sum += count
            }

            return sum
        }

        # simple_walk --- summarize directories --- print info per parameter

        function simple_walk(filedata, do_print,    i, sum, path)
        {
            for (i in filedata) {
                if ("." in filedata[i]) {    # directory
                    sum = sum_dir(filedata[i], do_print)
                    path = filedata[i]["."]["path"]
                } else {    # regular file
                    sum = file_blocks(filedata[i]["stat"])
                    path = filedata[i]["path"]
                }
                printf("%d\t%s\n", sum, path)
            }
        }

        # sum_walk --- summarize directories --- print info only for the top set of directories

        function sum_walk(filedata)
        {
            simple_walk(filedata, FALSE)
        }

        # top_walk --- data on the main arguments only

        function top_walk(filedata)
        {
            simple_walk(filedata, TRUE)
        }

        # all_walk --- data on every file

        function all_walk(filedata,    i, sum, count)
        {
            for (i in filedata) {
                if ("." in filedata[i]) {    # directory
                    count = all_walk(filedata[i])
                    sum += count
                    printf("%s\t%s\n", count, filedata[i]["."]["path"])
                } else {    # regular file
                    if (! islinked(filedata[i]["stat"])) {
                        count = file_blocks(filedata[i]["stat"])
                        sum += count
                        if (i != ".")
                            printf("%d\t%s\n", count, filedata[i]["path"])
                    }
                }
            }

            return sum
        }
