Perl Basics¶

Table of Contents¶

Getting Help
Running Perl
Variables
Scalar Context vs List Context with Arrays
- Scalar Context
  - List Context
Perl File Structure
Pragmas
Subroutines
Using Arrays in Perl
Using Data::Dumper to Print Data
Accessing Command Line Arguments
Command Line Options
BEGIN and END Blocks
File Operations
Resources

Getting Help¶

man perl is available, but it is not as robust as some man pages. The man page recomands perldoc. This is a separate package that will need to be installed.

sudo apt-get install -y perl-doc
man perldoc

For runtime options/flags:

perldoc perlrun

To get help with a perl function, use perldoc -f func:

perldoc -f chomp

For perl variables, use perldoc -v var

perldoc -v ARGV

For perl modules, use perldoc -m module

perldoc -m data

Running Perl¶

perldoc perlrun

Running Perl Scripts¶

From the command line, you can run a perl script like any other language. Type perl then the name of the perl script.

perl myscript.pl

Alternatively, if you have the correct shebang line (e.g., #!/usr/bin/local/perl or #!/usr/bin/env perl) then you can just execute the script directly.

./myscript.pl

Running Perl One-Liners¶

To run perl commands from the command line, use the -e flag.

perl -e 'print "Hello, world\n"'

If you use double quotes in your -e expression, the shell expands variables.
Single quotes are preferred for that reason.

Use the -E flag to run the commands to enable some of the pro core features (i.e., the use strict pragma.)

perl -E 'print "Hello, world\n"'
# Hello, world
perl -E 'say "Hello, world"'
# Hello, world

The -E option behaves just like -e but enables all optional features.

say automatically adds a newline at the end of a string. Does not work with -e, because features are not enabled with -e.

perl -E 'while(<>) { say uc $_ }'

- while(<>) reads from STDIN - This could also be done with the -n option.
- say uc $_ prints (say) the uppercase (uc) version of the current line ($_) This will wait for user input and print it back in uppercase.

Piping to Perl¶

If you're piping input to Perl, use either -p or -n to loop over the input.

printf "Hello, world.\n" | perl -pe 's/Hello/Hi/'

The -p options is usually what you want for basic substutitions.
It wraps the input in a printing loop.
This means that it will print each line as it is being processed.

Basically, it is equivalent to this perl program:

while (<>) {
    print $_;
}

Whatever code you have in -e will also be inside this while loop.

while (<>) {
    $_ =~ s/Hello/Hi/;
    print $_;
}

Note: The =~ operator serves as both a comparison operator and an assignment operator.

If you use -n (e.g., perl -ne), the input will be wrapped in a non-printing loop.

So doing this:

perl -ne 'print "The current line is: $_\n"'

Will not print the lines by default, unless explicitly printing the $_ variable.
This is what that command is doing:

while (<>) {
    print "The current line is: $_\n";
}

Setting the IRS from the CLI¶

The IRS (Input Record Separator) is a special variable ($/) which determines how Perl reads in lines. By default, $/ is set to newline (\n), which means it reads input line-by-line.

Use the -0 flag to set the input record separator when running Perl.

perl -pi -0<OCTAL_VALUE>

The -0 option accepts any octal or hexadecimal value to use as the IRS.

If you're specifying a hexadecimal value, add an x:

perl -pi -0x<HEX_VALUE>

When using hexadecimal, you can actually specify a unicode character... if you have weirdly-delimited input.

Using -0 without any arguments will set the IRS ($/) to NUL.

perl -pi -0 ...

This will set the separator to NUL (\0), which is good for working with tools that output NUL-delimited text. For example, using find -print0 or xargs -0.

Paragraph Mode¶

perl -pi -00

The -00 option is special, it causes Perl to "slurp" files in paragraph mode, which sets the IRS to an empty string, and forces Perl to read in paragraphs separated by one or more blank lines, e.g., two consecutive newline characters (\n\n).

One newline to end the paragraph, and another to represent a blank line.

Variables¶

See variables for more indepth explanations.

Types of variables in perl are:

Scalar: A single values.
Array: A single-vector list of values.
Hash: An array of key/value pairs.

Every variable type has its own namespace, along with some non-variable identifiers.

Basically meaning, you can use the same name for a scalar variable and an array variable and they won't conflict.

my @var = (1, 2, 3);
my $var = "Hello, world.";
print "@var\n";
# 1 2 3
print "$var\n"
# Hello, world.

Not "technically" variables, but the same rule applies to these:

Handles
- File handles
- Directory handles
Subroutine names
Format names
Labels

Just because you can doesn't mean you should.
As a best practice, you should always give your variables and functions unique names for clarity.

Scalar Variables¶

In perl, anything that is a single unit of data is a scalar value.
That unit of data could be a string, number, or reference.
Scalars are represented with the dollar $ sign prefix.

A scalar always holds one value at a time.

There's really no need for different types since perl is dynamically typed.
So, scalar is kind of its own data type in perl. There are still numbers and strings, but they're dynamically cast into the correct type based on the context they're used in.

Examples of Scalars in Perl¶

Dynamically typed scalar:

my $var = 42;  # $var scalar holds a number
$var = "Hi";   # Now it holds a string

Strings:

my $greeting = "Hello, perl.\n";
print $greeting;

References:

my $array_ref = [1, 2, 3]; # This is a *reference* to an array.
print $array_ref->[0];     # outputs: 1

Scalar Operations¶

Scalars can holds numbers and perform mathmatical operations.

my $a = 10;
my $b = 21;
my $sum = $a + $b;
print $sum;

Scalars can also holds strings, and you can perform concatenation.

my $first = "Hello";
my $second = "World";
my $combined = $first . ", " . $second;
print $combined;

Use . to concatenate strings in perl.

Scalars can be evaluated to true or false for boolean operations.
- Non-zero numbers and non-empty strings are true.
- Zero 0 and empty strings "" are false.
```
my $x = 44;
if ($x) {
    print "True. Variable exists and is not 0.\n"
}
```

Scalar context is the way to get the length of an array.

my @colors = ('red', 'green', 'blue');
my $count = @colors;  # get the number of elements
print "Number of colors: $count\n";

Accessing Variables¶

From perldoc -m data:

$days               # the simple scalar value "days"
$days[28]           # the 29th element of array @days
$days{'Feb'}        # the 'Feb' value from hash %days
$#days              # the last index of array @days
@days[$#days]       # the last element of array @days
@days               # ($days[0], $days[1],... $days[n])
@days[3,4,5]        # same as ($days[3],$days[4],$days[5])
@days{'a','c'}      # same as ($days{'a'},$days{'c'})
%days               # (key1, val1, key2, val2 ...)

Scalar Context vs List Context with Arrays¶

In perl, context determines how expressions are evaluated.
Scalars are always evaluated in scalar context, but arrays/hashes are a little different.
You set the context by using the prefix $ or @ for scalar and list respectively.

If we wanted to force array context for a scalar variable:

my ($string) = @list;

Scalar Context¶

In scalar context, if an operation or function is expected to return a singel value, it operates in a scalar context.
An example of this:

my @arr = (1, 2, 3);
my $count = @arr;  # In scalar context, @arr returns its size
print $count; # 3

Using $count as the variable, with $, sets the context as scalar.

List Context¶

If an operation or function is expected to return a list of values, it operates in list context.

Example:

my @arr = (1, 2, 3);
my @copy = @arr;  # in list context, @arr returns all its elements
print @copy; # outputs: 123 (flattened)

Using @copy as the variable, with @ to specify an array, it sets the context as list.

Lowercase Input¶

ls -alh | perl -pe '$_ = lc $_'

This turns all input to lowercase.
This doesn't actually use any regular expressions, it utilizes the "default" variable (holds the current line) and the lc (lowercase) perl function.

Perl File Structure¶

#!/bin/perl

use strict;
use warnings;

print "Hello, world!\n";

1;

Each line in a Perl script should be ended with a semicolon ;.
A perl file will start with a shebang line (#!/bin/perl or /usr/bin/env perl).

The lines starting with use are called pragmas. There are a lot of pragmas that can be used to enable or disable certain features

The return code can be stated at the bottom.

Pragmas¶

Pragmas in perl change the way the code behaves.
They're compiler directives. Instructions that modify the behavior of Perl during compilation.
They're not functions or modules, but rather flags that control the compilation and execution of the script.
They're included using the use keyword.

Some common pragmas:

strict: Enforces stricter programming rules, like declaring variables before using them.
- Helps catch typos and errors early.
```
use strict;
my $var = 42;  # Without `my`, perl would throw an error
```
warnings: Outputs warnings for potentially problematic code
- For example, if you're using an uninitialized variable.
```
use warnings; 
my $x;
print $x; # warns: use of uninitialized value $x
```

utf8: Enables utf-8 encoding for the script's source code.

use utf8;
my $str = "こんにちは"; # Japanese greeting

autodie: Makes file operations (e.g., open) throw exceptions on failure.

use autodie; 
open my $gh, '<', 'nonexistent.txt'; # dies if the file doesn't exist

You can specify what to load from a pragma by giving it an argument.

use charnames ":full";

":full": This is a pragma argument, or "tag."
- It's not normal Perl syntax, it's now certain pragmas (open, charnames, strict, warnings) allow you to configure what they load or activate.

View the perldoc page for the pragma to see what tags you can specify and what they do.

Subroutines¶

Functions in perl are called subroutines.
Subroutines are reusable blocks of code that perform a specific task.

Use the sub keyword to define a subroutine.

use warnings;
use strict;

sub say_hello {
    print "Hello, world.\n"
}

say_hello(); # Call the subroutine.

Passing arguments to subroutines¶

You can access any arguments passed to a subroutine using the @_ array.

use warnings;
use strict;

sub say_hello {
    my ($name) = @_;
    print "Hello, $name.\n"
}

say_hello("Kolkhis"); # Call the subroutine. 
# Outputs: Hello, Kolkhis.

The parentheses in ($name) says you want to assign the first value from the list @_ to the variable.

Without the parentheses, perl would not treat the right-hand side as a list. It would assign $name to the number of elements in the @_ list, since the right hand side is being evaluated in scalar context due to the left hand side being a scalar assignment.

So, using parentheses around the scalar variable assignment allows the RHS to be evaluated in array context (sort of).
It's like a tuple assignment in Python. It will take the first argument from @_, and assign that to the scalar variable $name.

Returning values from subroutines¶

Subroutines return values using the return keyword.
Or, they implicitly return the last evaluated expression.

sub add {
    my ($a, $b) = @_;
    return $a + $b;
}

my $sum = add(2, 3)
print $sum; # outputs: 5

Example subroutine: Check if a file exists¶

sub file_exists {
    my ($file) = @_;
    return -e $file; # Returns 1 if file exists, 0 otherwise. 
}

Using Arrays in Perl¶

Also see arrays.md.

Arrays are generally accessed by using @ (the whole array, or "array context") or with $ ("scalar context") to get a single value.

Perl can have arrays that are simply references to an array, and not actually arrays themselves, using the syntax \@array_name. This creates a reference to the array @array_name.

Using `Data::Dumper` to Print Data¶

Normal print statements will flatten any sort of data structures. Arrays, dictionaries/hashes, and nested combinations won't be seen correctly with print.

The Data::Dumper sub from the Data module is used to format these data structures into human-readable strings.

use Data::Dumper;

my $input = $ARGV[0];
print "First argument: $input\n";
print "Remaining arguments: ", Dumper(\@ARGV); 

my $input = shift;
print "First argument: $input\n";
print "Remaining arguments: ", Dumper(\@ARGV);

Data::Dumper: A perl module that converts complex data structures (arrays, dictionaries/hashes, etc) into a human-readable string.
- Regular print statements won't give you the output, it will flatten this data.
\@ARGV: The \ is used to pass a reference to the array @ARGV.
- This creates a reference so that Dumper knows you're passing a whole array, not the contents of the array.
The difference between $ARGV[n] and @ARGV comes from how variables are accessed in Perl:
- @ARGV: Refers to the entire array. I.e., all the command-line arguments.
- $ARGV[0]: Accesses a single element (scalar) from the array @ARGV.
- $ARGV (without [], scalar context): Holds file name passed in via command line arguments or stdin when used in scalar context.
  - This will hold the filename that is currently being processed if there are multiple files.

Accessing elements in arrays:

$ = Single value (scalar).
@ = Full array.

To output an array: If we pass an array to Data::Dumper without a reference (\@), then the output will look different:

$VAR1 = 1;
$VAR2 = 2;
$VAR3 = 3;

If we pass the array with a reference, the output will look like how the array is actually structured:

If we use print to output a list, it will flatten all the elements into one string.

print "All elements of the array: @ARGV\n"

Then at the command line:

./argtest.pl this is a test
# output:
# All elements of the array: this is a test

Accessing Command Line Arguments¶

You can access CLI arguments from a script in a couple different ways.

@ARGV: An array that holds all the CLI arguments.
- Stands for "Argument Vector."
- Using $ARGV[0] will not modify the @ARGV array.
shift: Command that removes and returns the first element from @ARGV.
- If called inside a subroutine (function), it pulls from the default array @_.
- Just like shift in bash.

Command Line Options¶

Some CLI arguments for perl:

-p: Places a printing loop around your command so that it acts on each line of standard input.
- Use to loop over the contents of a file line by line and output every line after being processed.
- This is similar to what sed and awk do by default.
-n: Places a non-printing loop around your command.
- Use to loop over the contents of a file line by line and NOT output anything other than what you specify.
-e: Allows you to provide the perl script as an argument rather than in a file.
- Identical to -c in Python or Bash.
-E: Same as -E but also enables all optional features.
- Identical to -c in Python or Bash.
-i: Edit the file in place, making a backup of the original.
- Allows you to modify files without {copy, delete-original, rename}.
-w: Activates some warnings.
- Someone said "Any good Perl coder will use this."
-d: Run the command under the Perl debugger.
-t: Taint mode. Treats certain operations as "tainted" code.
- It treats any external input (i.e., CLI args) as tainted until it's sanitized.
- Use to beef up Perl security, e.g., when running as setuid scripts.

-T: Taint mode, for a whole script.

Doesn't just use taint mode for certain operations, it treats all external data as taineted until sanitized.
This is used to prevent bad actors for performing destructive operations.
```
./script.pl '; rm -rf /'
```

Use in scripts to check input.

#!/usr/bin/env perl -T
use strict;
use warnings;

my $input = $ARGV[0];
my $input = shift;
if ($input =~ /^(\w+)$/) {
    print "Safe input: $1\n"
} else {
    die "Tainted input detected.\n"
}

BEGIN and END Blocks¶

Like awk, perl has a BEGIN and END block.
Anything inside the BEGIN block will run once, before the main code block starts execution.
Likewise, anything in the END block runs once, after the main code block finishes execution.
This is really only useful when doing one-liners from the command line.

Example: print the total word count of a file in the END block

perl -ne 'END { print $t } @w = /(\w+)/g; $t += @w' file.txt

-n: Loop over the file, line by line.
- Same as while(<>)
-e: Allows execution of the code provided directly as a string.
- Similar to -c in other tools.
END { print $t }
- The END block is executed once after all lines of the file have been processed.
- It prints the value of $t, which is the total word count.
@w = /(\w+)/g:
- @w is an array.
- /(\w+)/g: Regex that matches every word in the current line.
  - \w+: Matches one or more word characters (letters, digits, or underscores).
  - g: Global modifier. Ensures all matches in the line are captured.
- For each line, @w contains all words found in that line.
$t += @w:
- $t: Scalar variable, initialized to 0 by default.
- @w: In scalar context, gives the number of elements in the array (words).
- $t holds the total number of words across all lines.
file.txt: The input file.

Using Subshells in Perl¶

Subshells are a thing in perl. You can capture the output of shell commands.

In order to achieve the same result as $(...) (bash) in perl, you can do one of two things:

Wrap a shell command in backticks(`cmd`)
Use the qx operator (qx/cmd/ or qx(cmd))

Then save the output to a variable.

This is idiotmatic, core Perl and doesn't rely on external packages.