If you use GNU Grep on text files, use the -a (--text) option, Hacker News

April 16,

Today, I happened to notice that one of my email log scanning scripts was not reporting on a log entry that I knew was there (because another, related script was reporting it). My log scanning script starts out with a grep to filter out some things I don’t want to include:

 grep -hv 'a specific pattern' "$@" | exigrep '...' | [...]

I had all sorts of paranoid thoughts about whether I had misunderstood exactly what the -v option did, or if exigrep was doing something peculiar, and so on. But eventually I ran the grep by itself on the file, piped to less, and jumped to the end in less because I happened to know that the missing entry was relatively late in the file. What I expected was that the grep output would just stop at some point. What I actually found was simple:

  -  - 111 165:  (H=(241 ) [] [...]
  -  - 111 165:  13 unexpected disconnection [...]
  Binary file /var/log/exim4/mainlog matches

Ah. Yes. How helpful. While reading along in what it had up until then thought was a text file, GNU Grep encountered some funny characters (in a DKIM signature information line, as it happened) and decided that the file was actually binary and so it wouldn't report anything more for the rest of the file than that final line.

(This is a different and much more straightforward cause than the time GNU Grep thought some text files were binary because of a filesystem bug combined with its clever tricks.)
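The behavior described above can be reproduced on a small file. This is only a sketch: the sample file and its contents are made up for illustration, and what the default run prints depends on your GNU Grep version and locale. In a UTF-8 locale, a byte that is not valid UTF-8 is enough to trigger the binary-file handling. In the C locale, the default run may well print all three lines.

```shell
# Sample "log" with three lines; the second holds a 0xFF byte, which is
# not valid UTF-8. The file and its contents are hypothetical.
sample=$(mktemp)
printf 'first entry\nsecond \377 entry\nthird entry\n' > "$sample"

# Default behavior: in a UTF-8 locale, GNU Grep may print "first entry",
# then a binary-file notice, and nothing more. The 2>&1 is there because
# newer GNU Grep versions send that notice to standard error.
default_out=$(grep 'entry' "$sample" 2>&1)

# With -a (--text), every matching line is treated as text and printed.
text_count=$(grep -a 'entry' "$sample" | wc -l | tr -d ' ')

printf 'default output:\n%s\n' "$default_out"
printf 'lines with -a: %s\n' "$text_count"
rm -f "$sample"
```

Comparing the two outputs on your own system shows whether your grep and locale combination is affected.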

I generally like the GNU versions of standard Unix utilities and the things that they've added, but this is not one of them, especially when GNU Grep's output is not going to a terminal. If it starts out printing text lines, it should continue to do so rather than surprise people this way.

The valuable learning experience here is that any time I'm processing a text file with GNU Grep (which is pretty much all of the time in my scripts), I should explicitly force it to always treat things as text. This is unfortunately going to make some scripts more awkward, because sometimes I have pipelines with several greps involved as text is filtered and manipulated. Either I spray '-a' over all of the greps, or I try to figure out what minimal LC_* environment variable will turn this off, or I reach for the gigantic hammer of 'LC_ALL=C' (as suggested by the GNU Grep manpage).
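As a rough sketch of two of these options in a multi-grep pipeline (the sample data and the 'keep'/'drop' patterns are invented for illustration):

```shell
# Hypothetical sample data; the third line contains a 0xFF byte that is
# not valid UTF-8.
sample=$(mktemp)
printf 'keep alpha\ndrop beta\nkeep \377 gamma\nkeep delta\n' > "$sample"

# Option 1: pass -a to every grep in the pipeline.
with_a=$(grep -av 'drop' "$sample" | grep -a 'keep' | wc -l | tr -d ' ')

# Option 2: set LC_ALL=C once for the whole pipeline. The command
# substitution runs in a subshell, so the export does not leak out.
# Assumption: a reasonably recent GNU Grep, which treats every byte as a
# valid character in the C locale.
with_c=$(export LC_ALL=C; grep -v 'drop' "$sample" | grep 'keep' | wc -l | tr -d ' ')

printf 'with -a: %s lines, with LC_ALL=C: %s lines\n' "$with_a" "$with_c"
rm -f "$sample"
```

One difference to keep in mind: as far as I know, LC_ALL=C only avoids the encoding-error case. A file containing NUL bytes would still be treated as binary without -a, so -a remains the more thorough choice.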

PS: This is not just a Linux issue because GNU Grep appears on more than just Linux machines, depending on what you install and what you add to your path. A FreeBSD machine I have access to uses GNU Grep as /usr/bin/grep, for example.
