Substitution & Translation in Regular Expressions with Perl

Substitution and translation are two of Perl’s most useful string operators: they find text in a string and replace it. In the previous article, Matching Regular Expressions with Perl, I explained Perl’s match operator and how you can use regular expressions to find patterns in strings. Substitution and translation build on that. They are used to change strings, which gives us the tools to manipulate our information in any way we wish. We could scan an entire text file and change every occurrence of the word “Sky” to “earth” if we wanted.

Let’s take a quick example of substitution with regular expressions. In its simplest form, substitution works as follows:

$string =~ s/a/b/;

This will replace the first “a” in $string with a “b”. If we want to replace every “a” with a “b”, all we need to do is put a “g” (for global) after the final slash, like so:

$string =~ s/a/b/g;

We can use all of the special operators with substitution that we did with match, for example, if we were working on the phone number example from the previous article and we wanted to smooth user input issues by removing everything that was not a digit then we could use the following:

$string =~ s/[^0-9]//g;

This replaces anything matched by the first expression, i.e. anything except a digit, with what’s in the second expression, which is empty. However, we can’t do something like the following if we want to make all vowels uppercase:

$string =~ s/[aeiou]/[AEIOU]/g;

Square bracket notation does not work in the replacement side of the substitution, since in general there would be no way of knowing which character should be inserted. Instead this will replace every vowel with the string “[AEIOU]”. To properly replace all lowercase vowels with their uppercase equivalent, we can use another method: the translation tool:

$string =~ tr/aeiou/AEIOU/;

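Translation works on a per character basis, replacing each character in the first list with the character at the same position in the second list. Handily, if the second list is shorter than the first, its last character is repeated to fill the gap, allowing us to write an expression like:

$string =~ tr/0-9/ /;

which replaces every digit with a space. Note that tr takes plain character lists and ranges rather than regular expressions, so the range is written without square brackets; inside tr, “[” and “]” would be treated as ordinary characters to translate. Translation is a simple operation: there’s no way to handle repetition or grouping, so it’s suitable only for basic replacements. For anything more substantial you’re better off with a series of substitutions.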
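As a quick check of the difference between the two operators, the sketch below applies a substitution and two translations to made-up sample strings. The variable names and sample text are assumptions chosen for illustration only.

```perl
use strict;
use warnings;

my $text = "regular expressions";   # hypothetical sample text

# Substitution inserts the replacement text literally, brackets and all,
# so each vowel becomes the string "[AEIOU]".
(my $substituted = $text) =~ s/[aeiou]/[AEIOU]/g;

# Translation maps each character to the one at the same position
# in the second list.
(my $translated = $text) =~ tr/aeiou/AEIOU/;

# The second list is shorter than the first, so its last character
# (a space) is reused for every digit.
(my $blanked = "a1b22c") =~ tr/0-9/ /;

print "$substituted\n";
print "$translated\n";    # rEgUlAr ExprEssIOns
print "$blanked\n";       # a b  c
```

Copying into a new variable with (my $copy = $original) =~ ... keeps the original string untouched, which makes it easier to compare the results side by side.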

Now let’s look at how you can use these regular expression tools in a real program: a simple command line utility to help you cheat at crossword puzzles. We want a program which takes in incomplete information about a word and then searches a word list for possible solutions. Virtually all UNIX based systems (e.g. Linux and macOS) come with a reasonable word list, usually found at /usr/share/dict/words, but Windows users can pick one up here.

A perl program to solve this task could be written like this:
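A minimal sketch of such a program follows. It assumes that each space in the clue stands for exactly one unknown letter and that the whole word must match; the helper name clue_to_regex is hypothetical and only there to keep the steps readable.

```perl
use strict;
use warnings;

# Turn a clue such as "h  lo" into an anchored pattern in which
# every space matches any single character.
sub clue_to_regex {
    my ($clue) = @_;
    $clue =~ tr/ /./;          # translation: each gap becomes "."
    return qr/^$clue$/;        # anchors are an assumption: match whole words only
}

# Only filter standard input when a clue was given on the command line.
if (@ARGV) {
    my $regex = clue_to_regex($ARGV[0]);
    while (my $word = <STDIN>) {
        print $word if $word =~ $regex;
    }
}
```

Because the clue is interpolated into the pattern as-is, any regular expression characters typed in the clue keep their special meaning, which may or may not be what you want.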

Running quickly through this example: first we take the first command line argument, then replace all gaps with periods, then use the result as the pattern in a regular expression match, filtering standard input for lines that match the pattern. When I run it like so:

cat /usr/share/dict/words | perl "h l"

the following output is printed:


Or, more usefully:

cat /usr/share/dict/words | perl "ab lu y"

prints “absolutely”.

Command line aficionados may notice that we’ve just implemented a very stripped down version of the common utility “grep”. In fact, the previous command could easily be replaced by:

grep "" /usr/share/dict/words

grep is an extremely handy utility for searching in text files using regular expressions, but be careful, the syntax for grep is not 100 percent identical to what perl uses. For more info take a look at the grep manual page by typing “man grep” in your shell.

A lot of the time you’ll want to change a line subtly, rather than replace static text with completely different text. One of the most common ways of doing this is by using groups in the replacement expression. In a previous article I showed how you can combine parts of an expression by surrounding it with parentheses, for example the following expression will replace a hyphen at the start of a line, or any amount of white space with a tab character:

$string =~ s/(^- )|([ \t]+)/\t/g;

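The other advantage of groups is that you can insert the characters matched by a group in the match expression into the replacement. In Perl the text matched by each group is automatically put into the numbered variables $1, $2, $3 and so on, counting opening parentheses from the left. For example, the following wraps a line that ends in a <BR> tag in paragraph tags:

$string =~ s/^(.+)<BR\/?>/<p>$1<\/p>/g;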

Similarly, we can convert Comma Separated Values (.csv) files into HTML tables quite easily, by applying a few regular expressions to each line:

$string =~ s/([^,]+)[,\n]/<td>$1<\/td>/g;
$string =~ s/^(.+)$/<tr>$1<\/tr>/g;
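To see the two substitutions working together, here is a small sketch that applies them line by line to made-up sample rows. The data, the variable names and the surrounding table tags are assumptions, and empty fields or quoted commas are not handled.

```perl
use strict;
use warnings;

my @rows = ("name,age\n", "Ada,36\n");   # hypothetical sample data
my $html = "";

for my $line (@rows) {
    # Wrap every field, ended by a comma or the newline, in a table cell.
    (my $string = $line) =~ s/([^,]+)[,\n]/<td>$1<\/td>/g;

    # Wrap the whole line of cells in a table row.
    $string =~ s/^(.+)$/<tr>$1<\/tr>/g;

    $html .= "$string\n";
}

print "<table>\n$html</table>\n";
```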

Now in these expressions, particularly the paragraphing one, there is a consistent flaw: regular expressions are case sensitive by default, whilst the HTML they run over may not be. We can tell Perl to treat our regular expressions as case insensitive by using a pattern modifier. We’ve already been using the modifier “g” to tell Perl to match globally, and we can tell it to be case insensitive in the same way, with “i”:

$string =~ s/^(.+)<BR\/?>/<p>$1<\/p>/gi;

works the same as before, but will now pick up <BR> and <Br>. There are four more pattern modifiers that may be of use to you:

  • m: Treat the string as multiple lines, rather than as a single string with embedded new lines.
  • o: Only compile the expression once, regardless of the status of included variables
  • s: Treat the string as a single line.
  • x: Use extended syntax for regular expressions. This means that any white space that is not escaped is ignored, and regular expressions can be broken up over multiple lines. This allows you to write your more complicated expressions in an easier to read format, and lets you insert comments.

Let’s run through a quick usage of the extended syntax on the paragraphing expression:


It’s the same expression, but the match pattern is broken up into three lines with comments at the end of each line explaining the three parts of the match. Comments inside extended regular expressions are contained within (?# and ); with the x modifier you can also start a comment with # and let it run to the end of the line. Now for this example, the comments might seem a little trivial, but for longer and more complicated expressions they can greatly increase the readability of your regular expressions.
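A minimal sketch of what such an extended expression can look like is shown here. The sample string is made up, and the use of s{...}{...} delimiters (so that the slashes need no escaping) is my own choice rather than something the text prescribes.

```perl
use strict;
use warnings;

my $string = "First line of text<br>";   # hypothetical sample input

$string =~ s{
    ^(.+)       (?# capture everything from the start of the line)
    <BR         (?# then the opening of a BR tag)
    /?>         (?# with an optional slash before the closing bracket)
}{<p>$1</p>}gix;

print "$string\n";   # <p>First line of text</p>
```

The white space and comments inside the pattern are ignored because of the x modifier, so the expression matches exactly what the single-line version matches.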

That should be enough to get you started with substitution and translation, and to give you the confidence to explore further on your own.

Leave me a comment and let me hear your opinion. If you’ve got any thoughts, comments or suggestions for things we could add, leave a comment! Also please Subscribe to our RSS for latest tips, tricks and examples on cutting edge stuff.

Matching Regular expressions with Perl

Matching regular expressions with Perl is quite easy, and Perl has long been popular for text processing because of its native regular expression support. Many developers find it a bit difficult to match strings in other languages, but Perl works like a charm once you are familiar enough with the language. Let’s walk through some quick examples of how you can use regular expressions in your applications to search and substitute strings.

The simplest regular expression operation is the match which returns true if the pattern is found in the string. So the following expression:

$string =~ m/text/

will only be true if the string in the variable “$string” contains the substring “text”. This is the most basic kind of regular expression, where each character is matched literally. This is, of course, just a taste of what regular expressions can do. Suppose we want to find four letter words that end in “ext”. For this we use the special character “.”: a period in a regular expression tells Perl to match any single character in its place. So the expression:

$string =~ m/.ext/

would match the word “text” or “next”.

This expression is not perfect, however, since it will also match parts of longer words which contain “ext”, such as “dextrous” and “flextime”. We can restrict the position in which the match can occur by using anchors. The “^” character matches the start of the string, so:

$string =~ m/^.ext/

matches “dextrous” but not “context”.

Similarly, the “$” character matches the end of the string:

$string =~ m/.ext$/

matches “context” and not “dextrous”.

If you wanted to match only four letter strings ending in “ext” then you could combine these two like so:

$string =~ m/^.ext$/

Now what if you need to match a given set of characters, rather than any character in place of the period? Regular expressions provide a means of doing this using square brackets. Take the following expression:

$string =~ m/^[tT]ext$/

This will match only the words “text” and “Text”, but not for example “next”. A pair of square brackets will translate to any single character contained within. This is quite powerful, for example:

$string =~ m/[aeiouAEIOU]/

The above example is true if $string contains any vowels.

If the first character inside the brackets is a “^”, rather than acting as an anchor, it negates the list, making it match anything that is not contained within the brackets, so adjusting the previous example to be true only if $string contains consonants or punctuation:

$string =~ m/[^aeiouAEIOU]/

Square bracket notation also lets you specify ranges of characters, to save you having to list a whole bunch of consecutive numbers or letters, for example. The following example matches any lower case character:

$string =~ m/[a-z]/

Up until now we’ve been dealing with our strings one character at a time, but most of the time we need to be able to have more complicated options. One way of doing this is by using the “|” or branch operation. Say we wanted to check if $string contained the substring “next” or “previous” then we could use the following

$string =~ m/next|previous/

If we wanted to use anchors together with this expression then we need to group the options together, to do this we use parentheses just like in arithmetic. So if we wanted to adjust this to only match “next” or “previous” at the start of the string we would write:

$string =~ m/^(next|previous)/

All of these operators are what we call atomic operators, that is, they correspond to a single character. The real strength of regular expressions however, lies in the handling of repetition. To illustrate this let’s take the example of needing to determine if a string contains a valid phone number. We’ll use the simplest definition of a number to start off with; we’ll just look for any series of numbers. We could start by using the “glob” operator, which is written “*”. Most who have been in contact with the command line in some form should be familiar with “*” being used as a wildcard, and it has a similar use in Perl, matching any amount of the previous character. Thus:

$string =~ m/a*/

matches any amount of a’s and now we will match any amount of digits:

$string =~ m/[0-9]*/

This is not quite what we want, as it will match any amount at all, even zero. We could have used “+” instead, which will match one or more of the previous character, but this won’t fix the problem of finding numbers that are too long or too short. What we really want is to specify exactly how many repetitions we are looking for, in this case seven. This can be done using braces:

$string =~ m/^[0-9]{7}$/

This is closer to what we’re looking for, it will match only a string containing a seven digit number. Braces have a few more options that make them a powerful way to specify repetitions, for example you can match a range of repetitions:

$string =~ m/[0-9]{6,8}/

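This will match between six and eight digits. If we replaced it with “{6,}” we could match six or more digits, whereas “{0,8}” matches eight or fewer. (The shorter form “{,8}” is only treated as a quantifier from Perl 5.34 onwards; older versions match it as literal text.)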

Let’s take another look at those phone numbers, at the moment it’s working all right, but it’s still a bit too restrictive. Whenever you’re dealing with user input you need to anticipate that people are going to do simple things in a number of different ways.

It’s a good idea to try and anticipate some of the more common formats for entering a phone number. As a simple example let’s take the number “2391720”, which could also be entered as “239-1720” or “239 1720”. Now we can use brackets to match either a “-” or a “ ”, but we need something new to handle the case of not having a separator at all: the “?” operator, which means the previous character may or may not be found. We can match all three of these formats with the following expression:

$string =~ m/[0-9]{3}[- ]?[0-9]{4}/

Similarly, let’s take a look at supporting area codes. Australian phone numbers have a two digit area code, let’s add them with the following:

$string =~ m/([0-9]{2}[- ]?)?[0-9]{3}[- ]?[0-9]{4}/

This expression will match numbers like “02 114 7682”, and, since we wrapped the area code part of the expression in parentheses and made it optional, it will also match everything matched by the previous expression. There are more improvements that could be made, such as allowing the area code to be enclosed in “(” and “)”, but as you can see the more options you add to the expression the longer and more complicated it becomes, so I’ll leave that one up to you.
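To try the final expression against a few inputs, a short sketch like the following can help. The list of sample inputs is made up; note that the pattern is unanchored, so it also matches when a valid number appears inside a longer string.

```perl
use strict;
use warnings;

# Optional two digit area code, then three digits and four digits,
# each part optionally separated by a hyphen or a space.
my $phone = qr/([0-9]{2}[- ]?)?[0-9]{3}[- ]?[0-9]{4}/;

for my $input ("2391720", "239-1720", "239 1720", "02 114 7682", "239") {
    my $verdict = $input =~ $phone ? "matches" : "does not match";
    printf "%-12s %s\n", $input, $verdict;
}
```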

Why Perl Complex Code requires less lines

Perl developers often notice how easily complex programs can be written in very few lines of code and in a very small amount of space. The best way to appreciate this is to explore how and why such code can be written in so few lines, and what goes on in the background. Let’s go through a few code examples to test it out.

We’ll start with the simplest of programs, which simply reads in characters from the keyboard and repeats them back to the console. In Perl you might write this like so:
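A minimal sketch of such a program, using only core Perl and the variable name $line that the explanation refers to:

```perl
# Read standard input one line at a time and echo each line back.
while (my $line = <STDIN>) {
    print $line;
}
```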

Even to start with, this program is quite compact, but what does it do? Simple: <STDIN> is a special file handle, in this case belonging to the standard input buffer (called STDIN), which is usually connected to the keyboard. Each time we assign it to the variable $line we take the top line off the STDIN buffer and put it in $line. When the buffer runs out of lines, the read returns undef, which the while statement treats as false. The rest of the program is fairly self explanatory: now that we have the input line in the variable $line, we use the print function to print it to the screen, or more accurately, to the standard output buffer (STDOUT), which is usually connected to the screen. Both the standard input and standard output buffers can be redirected, for example to files for storing the output of programs, but if you’re dealing with text it’s usually safe to assume they’re equivalent to the keyboard and screen.

You may think that this program is already as short as it can be, but through using Perl’s special variables, we can make it shorter:

The default scalar variable: $_

Perl has a number of special variables that are automatically assigned in the general course of a program, they can be used to access information about the program itself, such as the name or process id, the command line arguments, or the results of the last regular expression. The most general, and maybe the most useful of these special variables is $_, the default variable. The default variable is where the results of some Perl constructs and functions are put if you don’t specify an assignment, and is used as the argument to certain functions if none is given. This sounds vague and can be confusing until you’re familiar with it, but it can also be powerful. We can use $_ to eliminate the need for the variable $line in our program:
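A sketch of the same loop with $line removed and $_ used in its place:

```perl
# A bare <STDIN> in the while test stores each line in $_.
while (<STDIN>) {
    print $_;
}
```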

This program is equivalent since when a file handle is used by itself in the test of a while statement, it puts its input into the default variable. Then when we print we can just reference $_ to access that input. But we can make this program shorter. Remember when I said that $_ is used as a default argument for some functions if none is given, well print is one of those functions. So we can now write this program as follows:
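With the argument to print left out, the sketch becomes:

```perl
# print with no arguments prints the default variable $_.
while (<STDIN>) {
    print;
}
```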

Now we’ve got a program that does the same thing, but eliminates explicit variables all together. Since we’re really just connecting STDIN to STDOUT it would be nice if we could get rid of that while loop, it’s not doing anything interesting except iterating over the buffer. Well, this too is possible:
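A sketch of the loop-free version, which hands the file handle straight to print:

```perl
# In list context <STDIN> returns every remaining line, and print outputs them in order.
print <STDIN>;
```

One design consequence worth knowing: this form reads all of the input into memory before printing anything, whereas the while loop handles one line at a time.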

How this works is a little more complicated. When we use $_ with print, we are dealing with a single scalar value, meaning simply that it is treated as an individual object, such as a number or a string, and not a collection. print actually takes its arguments in list context, meaning that the argument is treated like a list of objects, and it prints each one in turn. When we use the file handle with print in this way, Perl treats standard input as a list of strings and prints them in order, which has the same result as the while loop. It might be an extreme example, but by using a few Perl shortcuts we’ve cut the length of our program in half.

This is fine if we just want to mirror STDIN to STDOUT, but what if we’d like our program to act more like the Unix filter cat, which can open and print files as well. Now we could check the command line arguments and test to see if they’re valid files, open and print them in order, but since this is such a common thing to do Perl has an easier (and shorter!) way.

The special file handle: <>

Like the default variable, the special file handle — written as <> is a short cut in the language added to make programs easier to write. The special file handle treats all command line arguments as file names, and opens each in turn. If there are no command line arguments then it opens and reads from STDIN. As per the UNIX convention, if “-” is given as a command line argument it opens STDIN in place of a file. So if we wanted to write a version of the above program that could support files given on the command line it would be as simple as:
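A sketch of that version, which is just the previous one-liner with STDIN taken out of the angle brackets:

```perl
# <> reads from each file named on the command line, or from STDIN if there are none.
print <>;
```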

When you consider that you can write a working implementation of cat in only eight characters, you can see why Perl is considered so powerful. But what if we want to do something more significant with the input rather than just echo it back to the screen?

Counting Line Numbers

If we want to process the lines of the input individually then it’s not enough to just link the file handle to print, let’s take a look at a simple program to add line numbers to the lines of input:
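A minimal sketch of such a program follows. The variable name $num comes from the explanation; the exact output format, a number followed by a colon and a space, is an assumption.

```perl
my $num = 0;

# Number every line read from the named files (or from STDIN).
while (<>) {
    $num++;
    print "$num: $_";
}
```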

In this example we use the variable $num to keep track of the line number. For each line of input we increment this number, then print out the number and the line of input together. When we refer to variables inside strings with double quote characters (“) the variable name is replaced with the contents of that variable, this makes formatted output in Perl a breeze.

Even with these simple programs, it’s easy to see how using special variables can make your programs smaller and faster to write. If you’re interested, information about all of Perl’s special variables can be found in the perlvar section of the Perl manual.

How to Create Random Strong passwords with Perl

Hackers have made strong passwords a necessity. Weak passwords are too easy to guess and pose a threat to network security; they are a big security hole which you don’t want to leave open. People use weak passwords like a name or date of birth, but these are quickly cracked by sophisticated password sniffing tools, making it easy for unauthorized users to gain back-door entry into servers. System administrators keep checking and changing their passwords to ensure they’re secure enough to stand up to an attack. Depending on the level of security needed, some administrators go one step further: they generate and assign user passwords themselves.

However, generating passwords for users automatically is tricky, as each password needs to be easy enough to remember, yet not so simple that it can be easily broken. There are many algorithms available on the Internet to help you generate a secure, pronounceable password, but if you’re on a deadline or don’t have any development experience, it may not always be possible for you to implement these solutions.

Fortunately, there is a solution at hand. CPAN contains many modules for automatic password generation, allowing you to easily add this capability to your application. These modules are sophisticated tools: you can customize the length of the password, the characters permitted in it, the “pronounceability” of the end result, and other attributes. Two of the more interesting modules in this category are discussed below.

The String::MkPasswd module

The String::MkPasswd module provides a simple API to generate random, unpronounceable passwords. To use it in Perl, follow the steps below:

1. Install the module

The simplest way to install String::MkPasswd is with the CPAN shell, as follows:

shell> perl -MCPAN -e shell
cpan> install String::MkPasswd

If you use the CPAN shell, dependencies will be automatically downloaded for you.

Alternatively, download the module. Once you’ve extracted the files in the download archive to a temporary directory, run this sequence of commands:

shell> perl Makefile.PL
shell> make
shell> make install

If Perl finds missing dependencies, it will abort the process with an error; you should then install the missing files and try again. If all required files are in place, the commands above will compile and install the module to your Perl modules directory.

2. Decide password attributes

Before writing any code, you must decide some important password attributes. String::MkPasswd lets you control:

  • the length of the password;
  • the number of numerals;
  • the number of upper- and lower-case characters;
  • the number of special punctuation characters;

In most cases, you will want a password that is between 8 and 15 characters long, with a judicious number of digits and special characters mixed in with traditional alphabetic characters. Remember, the more random the password is, the harder it is to break!

3. Generate the password

Once you’ve decided what your password should look like, generate it by calling the String::MkPasswd module’s mkpasswd() function, as shown below:
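A minimal sketch of such a call, assuming String::MkPasswd has been installed from CPAN as described in step 1. The option names are the ones I understand the module to document; check its manual page if your version behaves differently.

```perl
use strict;
use warnings;
use String::MkPasswd qw(mkpasswd);   # CPAN module, not part of core Perl

# 13 characters in total: 4 digits, 4 lower-case letters,
# 2 upper-case letters and 3 punctuation characters.
my $password = mkpasswd(
    -length     => 13,
    -minnum     => 4,
    -minlower   => 4,
    -minupper   => 2,
    -minspecial => 3,
);

print "$password\n";
```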

In this case, the password generated will be 13 characters long, containing 4 numbers, 4 lower-case characters, 2 upper-case characters and 3 punctuation characters. Here’s an example of the output:


Each time you call mkpasswd, you will get a different result. So, if you have a large number of passwords to generate, you can simply place the call to mkpasswd in a loop and process the result of each run.

You can also call mkpasswd() without any parameters for a default 9-character password.

The Crypt::RandPasswd module

The String::MkPasswd module creates secure, but not very easy-to-remember, passwords. If you’d prefer to create passwords that are pronounceable and hence easy to remember, consider using the Crypt::RandPasswd module instead. This module is an implementation of the Automated Password Generator, and you can use it as described below:

1. Install the module

You can install Crypt::RandPasswd with the CPAN shell, as follows:

shell> perl -MCPAN -e shell
cpan> install Crypt::RandPasswd

Alternatively, you can download the module, and install it using the following commands:

shell> perl Makefile.PL
shell> make
shell> make install

2. Generate the password

The Crypt::RandPasswd module comes with a word() function, which generates a pronounceable random password. The following is an example of how to use it:
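A minimal sketch, assuming Crypt::RandPasswd has been installed from CPAN as described in step 1. The length limits of 8 and 12 are arbitrary values picked for illustration.

```perl
use strict;
use warnings;
use Crypt::RandPasswd;   # CPAN module, not part of core Perl

# Ask for a pronounceable password between 8 and 12 characters long.
my $password = Crypt::RandPasswd->word(8, 12);

print "$password\n";
```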

The word() function accepts two arguments: lower and upper limits for the password length. Here’s an example of its output:


You can create a password consisting of a series of random alphabetic characters by calling the letters() method instead of the word() method.

Using either of these two modules for your user passwords will improve the security of your networked system.

Why Perl Scripts are Super Fast? Benchmark Perl Scripts

It’s often said that Perl scripts run very fast when it comes to handling regular expressions and text processing. Programmers usually argue over which programming language is fastest, or better, or supports more features, but we need proof and evidence to support any such claim. So how do we determine how fast a Perl script really runs? We can use Perl’s Benchmark module, which lets us test the speed of a Perl script.

Calculating differences in script execution time
The simplest way to measure speed is to record the start time (when the script starts) and the end time (when the script finishes) and take the difference between the two values. This difference is the script’s execution time. In Perl these time values are obtained with the built-in time() function:
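A minimal sketch of this approach is shown here. The workload being timed is a stand-in chosen for illustration, and because time() counts whole seconds, short runs will usually report 0.

```perl
use strict;
use warnings;

my $start = time();   # seconds since the epoch, before the work begins

# Stand-in workload (an assumption): raise a million numbers to the tenth power.
my $value;
$value = $_ ** 10 for 1 .. 1_000_000;

my $end     = time();
my $elapsed = $end - $start;

print "Time taken was $elapsed seconds\n";
```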

While this is fine for basic use, it becomes complicated if what you really want is to compare the times of different scripts, or run arbitrary pieces of code for fixed time intervals. For these uses, the Benchmark module is more appropriate. This module comes bundled with Perl, and can be imported into your Perl script through the “use” command. Take a look at the next example, which rewrites the previous one to use Benchmark instead of time().
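A sketch of the Benchmark version, using the same stand-in workload (an assumption) and the 'all' output style so that every timing field is shown:

```perl
use strict;
use warnings;
use Benchmark;   # core module; exports timediff() and timestr() by default

my $start = Benchmark->new;   # records the current times

# Stand-in workload (an assumption): raise a million numbers to the tenth power.
my $value;
$value = $_ ** 10 for 1 .. 1_000_000;

my $end  = Benchmark->new;
my $diff = timediff($end, $start);

print "Time taken was ", timestr($diff, 'all'), " seconds\n";
```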

Every time you create a new Benchmark object with new(), the current time is returned. The difference between the start and end times is calculated with the Benchmark module’s timediff() function, and the result is formatted for display with the timestr() function. Here’s the sample output of the script above:

Time taken was 2 wallclock secs ( 2.14 usr 0.00 sys + 0.00 cusr
0.00 csys = 2.14 CPU) seconds

As you can see, Benchmark returns a little more detail than the time() function.

Timing multiple runs of a script

Of course, a sample size of one is not necessarily representative of how fast your script is, especially on Web servers that are subject to varying loads. Therefore, what you really need is a way to run this script many times, and calculate the average time taken after compiling the data from each run. Luckily, Benchmark comes with a function to do this too. It’s called timethis(), and it’s demonstrated in the following example:

The timethis() function accepts two arguments: the number of times to run the code block, and the code block itself. The code must be provided to timethis() either as a code reference or as a string that can be passed to the eval() function.
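A minimal sketch of a timethis() call follows. The workload is a stand-in, and the count of 1000 is deliberately far smaller than the 100000 runs reported in this article so that the sketch finishes quickly.

```perl
use strict;
use warnings;
use Benchmark;   # core module; exports timethis() by default

# Run the code reference 1000 times and print a timing report.
my $result = timethis(1000, sub {
    my $value;
    $value = $_ ** 10 for 1 .. 1000;   # stand-in workload (an assumption)
});

print "Ran ", $result->iters, " iterations\n";
```

With very quick code Benchmark may add a warning that there were too few iterations for a reliable count; raising the first argument makes the figures more trustworthy.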

Once the benchmark is complete, timethis() displays a report like this:

timethis 100000: 210 wallclock secs (209.37 usr + 0.00 sys = 209.37
CPU) @ 477.62/s (n=100000)

There are two pieces of useful data here: the number of CPU seconds, which tells you how long Perl takes to run the code N times, and the per-second data, which tells you how many runs take place per second. Obviously, the higher the second value, the faster your code is. Instead of a fixed number of iterations, now let’s see how to have timethis() run the code for a fixed period of time.

Counting how often a script runs in a predefined time window

Instead of timing how long a piece of code takes to execute a fixed number of iterations, you can flip things around and have timethis() run the code for a fixed period of time to see how many iterations it completes in that time. You do this by using a negative value as the first argument. Consider the following example, which makes timethis() run the code for a minimum of 10 seconds:
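A sketch of the negative-count form is shown here. To keep the run short it asks for a minimum of 1 CPU second rather than 10; replace -1 with -10 to match the run described in the text. The workload is again a stand-in.

```perl
use strict;
use warnings;
use Benchmark;

# A negative first argument means "run for at least this many CPU seconds".
my $result = timethis(-1, sub {
    my $value;
    $value = $_ ** 10 for 1 .. 1000;   # stand-in workload (an assumption)
});

print "Completed ", $result->iters, " iterations\n";
```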

The output will look something like this:

timethis for 10: 11 wallclock secs (10.93 usr + 0.00 sys = 10.93 CPU) @ 700.82/s (n=7660)

So in 11 seconds (well, 10.93 if you want to be difficult), Perl was able to execute the code 7660 times, or approximately 700 times per second. You can even create an interactive benchmarking tool with timethis(), by having the user enter the code and the number of iterations at the prompt:
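A minimal sketch of such a tool follows. Wrapping the logic in a function that takes a file handle is my own choice, made so the code can be fed from something other than the keyboard; the function name interactive_benchmark is hypothetical.

```perl
use strict;
use warnings;
use Benchmark;

# Read an iteration count and an END-terminated block of code
# from the given handle, then time that code with timethis().
sub interactive_benchmark {
    my ($in) = @_;

    print "Enter number of iterations:\n";
    my $count = <$in>;
    chomp $count;

    print "Enter your Perl code (end with END):\n";
    local $/ = "END";    # read up to the END marker instead of a newline
    my $code = <$in>;
    chomp $code;         # chomp removes the current separator, i.e. "END"

    print "Processing...\n";
    return timethis($count, $code);
}
```

To use it from the keyboard, add interactive_benchmark(\*STDIN); at the end of the script. Bear in mind that the entered text is executed as Perl code, so this is only suitable for input you trust.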

Most of this is pretty simple, and should be clear to you if you understood the previous examples. The only item of note here is the change of Perl’s input record separator to the string END, so that the user can enter multi-line code blocks and terminate them with the word END (the default separator is a newline, which would make Perl jump to the next statement as soon as the user pressed [Enter]).

Here’s an example of this script in action (lines beginning with a ‘>’ indicate output from the program, the rest are lines input by the user):

> Enter number of iterations:
500
> Enter your Perl code (end with END):
for ($a=1; $a<1001; $a++) { $value = $a ** 10; } END
> Processing…
> timethis 500: 6 wallclock secs ( 5.72 usr + 0.00 sys = 5.72 CPU) @ 87.41/s (n=500)

Timing and comparing different techniques

If you’re the kind of Perl programmer who likes experimenting with different ways of accomplishing the same thing, you’re going to just love the next tool in Benchmark’s arsenal. The timethese() function allows you to time more than one code fragment at a time:
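A minimal sketch of such a comparison follows. The three names and loop styles come from the description of the example; the loop bodies themselves are assumptions about how the sine calculation might be written.

```perl
use strict;
use warnings;
use Benchmark;   # core module; exports timethese() by default

# Time 1000 runs of each approach to computing 5,000 sines.
my $results = timethese(1000, {
    huey  => sub {
        my $i = 0;
        while ($i < 5000) { my $s = sin($i); $i++; }
    },
    dewey => sub {
        for (my $i = 0; $i < 5000; $i++) { my $s = sin($i); }
    },
    louie => sub {
        foreach my $i (0 .. 4999) { my $s = sin($i); }
    },
});
```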

This example calculates the sine of 5,000 numbers, using three different approaches. The first, named “huey”, uses a while() loop; “dewey” uses a for() loop; and “louie” uses a foreach() loop. Each of these code snippets is placed inside a single call to the timethese() function, which accepts two arguments: the number of iterations and a reference to a hash whose values are the code snippets to be tested (the keys of the hash contain the unique names for the code fragments). The timethese() function then internally calls timethis() for each hash element and returns the time taken for each option. Here’s a sample of the output:

Benchmark: timing 1000 iterations of dewey, huey, louie…
dewey: 92 wallclock secs (91.72 usr +  0.00 sys = 91.72 CPU) @ 10.90/s (n=1000)
huey: 160 wallclock secs (159.56 usr +  0.00 sys = 159.56 CPU) @ 6.27/s (n=1000)
louie: 45 wallclock secs (44.98 usr +  0.00 sys = 44.98 CPU) @ 22.23/s (n=1000)

It is clear from the output that the foreach() loop is the most efficient of the three alternatives, at least for this particular scenario. Another way to run this test is with the cmpthese() function, which internally calls timethese(), and accepts the same arguments as timethese(). The main advantage is that it formats the result better for comparison purposes:
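A sketch of the same comparison with cmpthese() is shown here. As before, the loop bodies are assumptions about how the three approaches might be written.

```perl
use strict;
use warnings;
use Benchmark qw(:all);   # cmpthese() is not exported by default

# Compare 1000 runs of each approach and print a comparison table.
my $rows = cmpthese(1000, {
    huey  => sub {
        my $i = 0;
        while ($i < 5000) { my $s = sin($i); $i++; }
    },
    dewey => sub {
        for (my $i = 0; $i < 5000; $i++) { my $s = sin($i); }
    },
    louie => sub {
        foreach my $i (0 .. 4999) { my $s = sin($i); }
    },
});
```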

Note the use of “use Benchmark qw(:all)” instead of just “use Benchmark”. This ensures that all the functions in the Benchmark module get exported, including cmpthese(), which is not exported by default. The output of cmpthese() is a table which compares the speed of each option against the speed of its competition. Since this table contains summary percentage values, it is somewhat easier to understand than the output of timethese():

        Rate louie  huey dewey
louie 14.1/s    --  -50%  -54%
huey  28.5/s  102%    --   -8%
