Perl File Operations


Variables which represent files are called “file handles”, and they are handled differently from other variables. They do not begin with any special character — they are just plain words. By convention, file handle variables are written in all upper case, like FILE_OUT or SOCK. The file handles are all in a global namespace, so you cannot allocate them locally like other variables. File handles can be passed from one routine to another like strings (detailed below).

The standard file handles STDIN, STDOUT, and STDERR are automatically opened before the program runs. Surrounding a file handle with <> is an expression that returns one line from the file including the “\n” character, so <STDIN> returns one line from standard input. The <> operator returns undef when there is no more input. The “chop” operator removes the last character from a string, so it can be used just after an input operation to remove the trailing “\n”. The “chomp” operator is similar, but only removes the character if it is the end-of-line character.

  $line = <STDIN>;  ## read one line from the STDIN file handle
  chomp($line);     ## remove the trailing "\n" if present
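The difference between chop and chomp is easiest to see side by side. The following is a small sketch; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my $with_newline    = "hello\n";
my $without_newline = "hello";

# chomp removes a trailing newline only if one is present
my $chomped1 = $with_newline;    chomp($chomped1);
my $chomped2 = $without_newline; chomp($chomped2);

# chop always removes the last character, whatever it is
my $chopped1 = $with_newline;    chop($chopped1);
my $chopped2 = $without_newline; chop($chopped2);

print "chomp: '$chomped1' '$chomped2'\n";
print "chop:  '$chopped1' '$chopped2'\n";
```

When the input has no trailing newline (for example, the last line of a file that does not end in one), chop silently eats a real character, which is why chomp is usually the safer choice after reading a line.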

File Open and Close

The “open” and “close” operators operate as in C to connect a file handle to a filename in the file system.

  open(F1, "filename");       ## open "filename" for reading as file handle F1
  open(F2, ">filename");      ## open "filename" for writing as file handle F2
  open(F3, ">>appendtome");   ## open "appendtome" for appending
  close(F1);                  ## close a file handle
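To see the three modes working together, here is a minimal round trip that writes a file, appends to it, and reads it back. It is only a sketch: the scratch directory comes from the core File::Temp module, and the file name example.txt and the use of $! (the system error message) are assumptions added for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch location, used only so the example is self-contained
my $dir   = tempdir(CLEANUP => 1);
my $fname = "$dir/example.txt";

# ">" creates the file, or truncates it if it already exists
open(OUT, ">$fname") || die "Could not open $fname for writing: $!\n";
print OUT "first line\n";
close(OUT);

# ">>" keeps the existing contents and adds to the end
open(OUT, ">>$fname") || die "Could not open $fname for appending: $!\n";
print OUT "second line\n";
close(OUT);

# No prefix means open for reading
open(IN, $fname) || die "Could not open $fname for reading: $!\n";
my @lines = <IN>;
close(IN);

print @lines;
```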

Open can also be used to establish a reading or writing connection to a separate process launched by the OS. This works best on Unix.

  open(F4, "ls -l |");        ## open a pipe to read from an ls process
  open(F5, "| mail $addr");   ## open a pipe to write to a mail process

Passing commands to the shell to launch an OS process in this way can be very convenient, but it’s also a famous source of security problems in CGI programs. When writing a CGI, do not pass a string from the client side as a filename in a call to open().
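One way to reduce this risk, not covered above, is the three-argument form of open (available since Perl 5.6), where the mode is a separate argument and the name is never scanned for ">" or "|" characters. The sketch below uses a made-up hostile string and a lexical handle $fh; both are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical untrusted input, e.g. a value taken from a CGI parameter
my $untrusted = "echo pwned |";

# With a separate mode argument, the name is not parsed for mode
# characters or pipes, so this only looks for a file with that literal name.
my $opened = open(my $fh, '<', $untrusted);
if ($opened) {
    close($fh);
    print "Opened a file literally named '$untrusted'\n";
} else {
    print "No such file; no command was run\n";
}
```

This only stops the string from being treated as a command or a mode. It does not make an arbitrary client-supplied path safe to open, so the advice above still applies.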

Open returns undef on failure, so the following phrase is often used to exit if a file can’t be opened. The die operator prints an error message and terminates the program.

  open(FILE, $fname) || die "Could not open $fname\n";

In this example, the logical-or operator || essentially builds an if statement, since it only evaluates the second expression if the first is false. This construct is a little strange, but it is a common code pattern for Perl error handling.
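The pattern can be watched in action by forcing a failure. In this sketch the path is a made-up name chosen so that the open fails, and eval is used only to trap the die so the message can be printed instead of ending the program. Adding $! to the message is an assumption beyond the original phrase; it appends the system's reason for the failure.

```perl
use strict;
use warnings;

# Hypothetical name chosen so that the open fails
my $fname = "/no/such/dir/missing.txt";

# eval traps the die so the example can show the message instead of exiting
my $ok = eval {
    open(FILE, $fname) || die "Could not open $fname: $!\n";
    close(FILE);
    1;
};
my $error = $@;
print "open failed: $error" unless $ok;
```

Note that the parentheses around the arguments to open matter here. Written as open FILE, $fname || die ..., the || binds to $fname rather than to the result of open, so the die would never fire for a failed open.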

Input Variants

In a scalar context the input operator reads one line at a time. In an array context, the input operator reads the entire file into memory as an array of its lines…

  @a = <FILE>;   ## read the whole file in as an array of lines

This syntax can be dangerous. The following statement looks like it reads just a single line, but actually the left hand side is an array context, so it reads the whole file and then discards all but the first line….

  my($line) = <FILE>;
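The effect is easy to demonstrate. The sketch below reads from an in-memory string instead of a real file (this needs Perl 5.8 or later, and the lexical handle $fh is an assumption for illustration) so that it can be run anywhere.

```perl
use strict;
use warnings;

my $text = "one\ntwo\nthree\n";

# Open a read handle on an in-memory string so no real file is needed
open(my $fh, '<', \$text) || die "Could not open in-memory file: $!\n";

my $first    = <$fh>;   # scalar context: reads just the first line
my ($second) = <$fh>;   # array context: reads ALL remaining lines, keeps the first of them
my $third    = <$fh>;   # nothing left to read, so this is undef

close($fh);

print "first=$first", "second=$second";
print "third is undef\n" unless defined $third;
```

The line "three" was read and thrown away by the parenthesized assignment, which is exactly the trap described above.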

The behavior of <FILE> also depends on the special global variable $/ which is the current end-of-line marker (usually “\n”). Setting $/ to undef causes <FILE> to read the whole file into a single string.

  $/ = undef;
  $all = <FILE>;   ## read the whole file into one string
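Because $/ is global, changing it affects every later read in the program. A common way to limit the change is to use local inside a block, so the old value comes back automatically. This is a sketch using an in-memory string as the "file"; the names $text, $fh, and $all are only for illustration.

```perl
use strict;
use warnings;

my $text = "line one\nline two\n";
open(my $fh, '<', \$text) || die "Could not open in-memory file: $!\n";

my $all;
{
    local $/ = undef;   # $/ is undef only inside this block
    $all = <$fh>;       # read everything that remains as one string
}
close($fh);

# Outside the block $/ is back to its previous value (normally "\n")
print length($all), " characters read\n";
```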

You can remember that $/ is the end-of-line marker because “/” is used to designate separate lines of poetry. I thought this mnemonic was silly when I first saw it, but sure enough, I now remember that $/ is the end-of-line marker.

Print Output

Print takes a series of things to print separated by commas. By default, print writes to the STDOUT file handle.

  print "Woo Hoo\n";   ## print a string to STDOUT

  $num = 42;
  $str = " Hoo";
  print "Woo", $str, " bbb $num", "\n";   ## print several things

An optional first argument to print can specify the destination file handle. There is no comma after the file handle, but I always forget to omit it.

  print FILE "Here", " there", " everywhere!", "\n";
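Here is a runnable sketch of the same idea. To avoid touching the file system it prints into an in-memory string through a lexical handle $out (both are assumptions for illustration, and need Perl 5.8 or later), and it also shows STDERR as a destination.

```perl
use strict;
use warnings;

my $buffer = '';
open(my $out, '>', \$buffer) || die "Could not open in-memory file: $!\n";

# The handle comes first, with no comma after it
print $out "Here", " there", " everywhere!", "\n";
close($out);

print STDERR "This goes to standard error\n";
print $buffer;   # no handle given, so this goes to STDOUT
```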

File Processing Example

As an example, here’s some code that opens each of the files listed in the @ARGV array, and reads in and prints out their contents to standard output…

  #!/usr/bin/perl -w
  require 5.004;
  ## Open each command line file and print its contents to standard out

  foreach $fname (@ARGV) {
    open(FILE, $fname) || die("Could not open $fname\n");
    while($line = <FILE>) {
      print $line;
    }
    close(FILE);
  }

The above uses “die” to abort the program if one of the files cannot be opened. We could use a more flexible strategy where we print an error message for that file but continue to try to process the other files. Alternatively, we could use the function call exit(-1) to exit the program with an error code. Also, the following shift pattern is a common alternative way to iterate through an array…

 

while($fname = shift(@ARGV)) {...
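The more flexible strategy mentioned above, combined with the shift loop, might look like the following sketch. The sub name print_files, the failure counter, and the exit code of 1 are all assumptions for illustration, not part of the original program.

```perl
use strict;
use warnings;

# Print each named file to standard output; return how many could not be opened
sub print_files {
    my @names  = @_;
    my $failed = 0;
    # defined() keeps the loop going even for a file literally named "0"
    while (defined(my $fname = shift(@names))) {
        if (!open(FILE, $fname)) {
            warn "Could not open $fname: $!\n";   # report, then keep going
            $failed++;
            next;
        }
        while (my $line = <FILE>) {
            print $line;
        }
        close(FILE);
    }
    return $failed;
}

my $failed = print_files(@ARGV);
exit(1) if $failed;   # signal an error to the caller if any file was skipped
```

Using warn instead of die sends the message to STDERR but lets the loop move on to the next file, and the exit code still tells the calling shell that something went wrong.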
