Just how does the diamond operator in Perl work?
Many Perl books don’t go into deep detail about the diamond operator. Still others don’t put all the facts about the diamond operator together for easy digestion. So here is an attempt to demystify the diamond operator: what it does and how it works.
The diamond operator can be one of the most powerful tools in your Perl arsenal. It allows you to pull in data from either files or STDIN without really having to decide up front where the data is coming from. You use the diamond operator the same way you read from any filehandle opened for input; you just don’t put anything between the angle brackets. It should then become obvious why it’s called the diamond operator.
my $line=<>;
There’s no real mystery to where the diamond operator gets its data. If there are filenames in @ARGV, it opens each file in turn and hands back its lines one at a time, one line per read. If @ARGV is empty, the diamond operator reads from STDIN. Easy-peasy.
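To make that concrete, here is a minimal sketch of reading through <>. The read_all helper is a made-up name for illustration; it localizes @ARGV so the caller’s copy is left alone.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original program): gather every
# line that <> delivers for the given filenames.
sub read_all {
    local @ARGV = @_;          # <> takes its file list from @ARGV
    my @lines;
    while (my $line = <>) {    # one line per read, file after file
        push @lines, $line;
    }
    return @lines;             # called with no names, <> would read STDIN
}
```

In a script you would normally skip the sub and loop over <> directly; with nothing left in @ARGV the very same loop reads STDIN.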
The great thing is that @ARGV can be manipulated in all manner of ways before the diamond operator is used, so the sky’s the limit. For instance, let’s say you’re writing a grep replacement. The general format for calling grep is:
grep [options] [files]
Well, in your Perl program, you can’t just start reading from the diamond operator… how will you parse your options? Since the options are still sitting in @ARGV and aren’t valid files, Perl will try to open them and complain.
Can’t open -h: No such file or directory at Diamond.pl line 12.
Somewhat fortunately (this could really bite your butt, however), that message is only a warning, and Perl chugs merrily along, skipping the bad filenames. But as I said, you can manipulate @ARGV before using the diamond operator, which means you can pull out the options first. I like to do something somewhat generic:
my @files;
my %parms;
# Separate out the parms… anything that is -xyz=abc will cause
#   $parms{XYZ}="abc"
# Whatever doesn’t match that format is assumed to be a file or glob
foreach (@ARGV) {
    if (m/^[\/-](.*?)=(.*)/) {
        push @{$parms{uc $1}},$2;
    }
    else {
        push @files,$_;
    }
}
@ARGV=map {glob($_)} @files; #get file list from glob
As you can see, I set up a hash called %parms and an array called @files. Anything in the format -<name>=<value> (a slash instead of the dash also works) gets plunked into the %parms hash for later use. Anything that isn’t in this format is assumed to be a file or a file spec (a collection of files denoted with a wildcard, such as “*.txt” for all text files).
Notice I translate the parm’s key to upper case. This is my own preference, but it means that -m=whatever and -M=whatever are indistinguishable. Grep is a program where upper-case parms and lower-case parms have different meanings, so you may not wish to do this; just don’t use the “uc” on the key when storing it in the %parms hash. As well, notice that I also collect up the values instead of just having 1 value for each parm. This is so that the options
-to=address@domain.com -to=address2@domain2.com
in a proposed email program will create a list of addresses in the hash entry $parms{TO}. If I didn’t do this, then the second address would overwrite the first and the email wouldn’t get to all the intended recipients. It all depends on how you want your program to operate.
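Here is a small sketch of that option handling, wrapped in a sub so it can be tried without touching the real @ARGV. The split_args name and the notes.txt argument are made up for illustration, and the glob step is left out.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the loop above: returns (\%parms, \@files).
sub split_args {
    my (%parms, @files);
    for my $arg (@_) {
        if ($arg =~ m{^[/-](.*?)=(.*)}) {
            push @{ $parms{uc $1} }, $2;   # repeated options accumulate
        }
        else {
            push @files, $arg;             # assumed to be a file or file spec
        }
    }
    return (\%parms, \@files);
}

my ($parms, $files) = split_args(
    '-to=address@domain.com', '-to=address2@domain2.com', 'notes.txt',
);
print join(', ', @{ $parms->{TO} }), "\n";   # both addresses, in order
print "@$files\n";                           # notes.txt
```

Because each hash entry is an array reference, an option given once still comes back as a one-element list, so the calling code can treat every option the same way.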
There is a bit of niceness in the last line. Notice that I rebuild @ARGV from a map of the glob of each entry in @files. This is because I’m a Windows programmer and the Windows command line doesn’t expand file specs with wildcards automatically, so I have to do it myself. Essentially, if you are programming in Linux and pass “*.c” as a parameter, the shell expands that to all the C source files in the current directory before your Perl program ever sees it. Windows doesn’t do that, so I actually get “*.c” as an input parm, which isn’t a valid file itself, and I have to find all the C files myself. The real cool thing here is that, since I’ve assumed that anything that wasn’t sucked into the %parms hash is a file or a file spec with wildcards, a wildcard spec that matches nothing expands to an empty list and simply drops out. One caveat: glob hands back a plain name with no wildcard characters unchanged, whether or not that file exists, so add a -e test if you need to be sure every name left in the list is a real file.
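One thing worth checking for yourself: Perl’s glob returns a name that contains no wildcard characters as-is, even when no such file exists, so the glob step alone does not guarantee that every entry is a real file. A minimal sketch of a stricter version follows; existing_files is a made-up name, not part of the original program.

```perl
use strict;
use warnings;

# Hypothetical helper: expand each spec with glob, then keep only the
# names that exist as plain files.
sub existing_files {
    return grep { -f $_ } map { glob($_) } @_;
}
```

Assigning the result to @ARGV (for example, @ARGV = existing_files(@files);) leaves the diamond operator with only files it can actually open.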
Since I’ve pumped that list of files back into the @ARGV array, the diamond operator will read each of those files in turn. And if there were no valid files or file specs on the command line, then the diamond operator will be looking for its input to come from STDIN.
Finally, there are a few special variables that you may wish to use. First of all, $ARGV has the filename that is currently being read. So if you need to know which file the current line from the diamond operator is from, use this variable.
$. contains the current line number. Well, actually, it is the line number of everything being read, not of the current file. For instance, say I have 2 files in my @ARGV; the first has 10 lines and the second has 20 lines. $. will be 1 for the first read from <> and 30 for the last read from the second file. That happens because <> never explicitly closes the ARGV filehandle between files. If you want the count to restart for each file, close ARGV yourself when you hit the end of a file (close ARGV if eof;); an explicit close resets $. to 0.
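The two variables work nicely together. The sketch below labels each line with its file and a per-file line number, using the close ARGV if eof idiom from the perlfunc entry for eof to restart $. at every file boundary. The numbered_lines helper is a made-up name for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return "file:line: text" for every line that <>
# reads from the given files, numbering each file from 1.
sub numbered_lines {
    local @ARGV = @_;
    my @out;
    while (my $line = <>) {
        chomp $line;
        push @out, sprintf('%s:%d: %s', $ARGV, $., $line);
        close ARGV if eof;   # explicit close resets $. for the next file
    }
    return @out;
}
```

Leave out the close line and the numbers simply keep climbing across files, which is the default behavior.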
Well, there it is. Not so scary, is it? I use the diamond operator as much as I can because it’s extremely powerful and builds automatic flexibility into my programs. My program can still use options, and the input can come from STDIN or any list of files without my having to write special code to detect and handle each of those possibilities.
Do you program in Perl? Do you use the diamond operator a lot? Let me know in the comments below.
