In this part of the Perl tutorial we are going to see how to make sure we only have distinct values in an array.
Perl 5 does not have a built-in function to filter out duplicate values
from an array, but there are several solutions to the problem.
List::MoreUtils
Depending on your situation, probably the simplest way is to use the uniq function of the List::MoreUtils module from CPAN.
- use List::MoreUtils qw(uniq);
-
- my @words = qw(foo bar baz foo zorg baz);
- my @unique_words = uniq @words;
A full example is this:
- use strict;
- use warnings;
- use 5.010;
-
- use List::MoreUtils qw(uniq);
- use Data::Dumper qw(Dumper);
-
- my @words = qw(foo bar baz foo zorg baz);
-
- my @unique_words = uniq @words;
-
- say Dumper \@unique_words;
The result is:
$VAR1 = [
'foo',
'bar',
'baz',
'zorg'
];
For added fun the same module also provides a function called distinct, which is just an alias of the uniq function.
In order to use this module you'll have to install it from CPAN.
Home made uniq
If you cannot install the above module for whatever reason, or if you think
the overhead of loading it is too big, there is a very short expression
that will do the same:
- my @unique = do { my %seen; grep { !$seen{$_}++ } @data };
This, of course, can look cryptic to someone who does not know it already, so it is recommended to define your own uniq subroutine and use that in the rest of the code:
- use strict;
- use warnings;
- use 5.010;
-
- use Data::Dumper qw(Dumper);
-
- my @words = qw(foo bar baz foo zorg baz);
-
- my @unique = uniq( @words );
-
- say Dumper \@unique;
-
- sub uniq {
- my %seen;
- return grep { !$seen{$_}++ } @_;
- }
Home made uniq explained
I can't just throw this example here and leave it like that. I'd better explain it.
Let's start with an easier version:
- my @unique;
- my %seen;
-
- foreach my $value (@words) {
- if (! $seen{$value}) {
- push @unique, $value;
- $seen{$value} = 1;
- }
- }
Here we are using a regular foreach loop to go over the values in the original array, one by one. We use a helper hash called %seen. The nice thing about hashes is that their keys are unique.
We start with an empty hash, so when we encounter the first "foo", $seen{"foo"} does not exist and thus its value is undef, which is considered false in Perl. This means we have not seen this value yet. We push the value to the end of the new @unique array where we are going to collect the distinct values. We also set the value of $seen{"foo"} to 1. Actually any value would do as long as it is considered "true" by Perl.
The next time we encounter the same string we already have that key in the %seen hash and its value is true, so the if condition will fail, and we won't push the duplicate value onto the resulting array.
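To make the walk-through above concrete, here is a small sketch that runs the same loop and records what happens to each value. The @trace array is a helper added only for this illustration; it is not part of the original solution.

```perl
use strict;
use warnings;

my @words = qw(foo bar baz foo zorg baz);

my @unique;
my %seen;
my @trace;    # illustration only: one line per value describing the decision

foreach my $value (@words) {
    if (! $seen{$value}) {
        # First time we meet this value: keep it and mark it as seen.
        push @unique, $value;
        $seen{$value} = 1;
        push @trace, "$value: new, kept";
    }
    else {
        # The key already exists with a true value: skip the duplicate.
        push @trace, "$value: already seen, skipped";
    }
}

print "$_\n" for @trace;
print "unique: @unique\n";
```

The second "foo" and the second "baz" are reported as skipped, and the final line lists foo, bar, baz and zorg in the order they first appeared.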
Shortening the home made unique function
First of all we replace the assignment $seen{$value} = 1; with the post-increment operator $seen{$value}++. This does not change the behavior of the previous solution, because any positive number evaluates to TRUE, but it allows us to set the "seen flag" inside the if condition. It is important that this is a postfix increment (and not a prefix increment): the postfix form returns the old value and only then increments it, so the condition is tested against the value from before the increment. The first time we encounter a value the expression !$seen{$value}++ will be TRUE, and every later time it will be FALSE.
- my @unique;
- my %seen;
-
- foreach my $value (@words) {
- if (! $seen{$value}++ ) {
- push @unique, $value;
- }
- }
This is shorter, but we can do even better.
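As a quick check of why the postfix form matters, the following sketch runs the loop twice, once with the postfix and once with the prefix increment. The variable names (%post, %pre and so on) are made up for this comparison.

```perl
use strict;
use warnings;

my @words = qw(foo bar baz foo zorg baz);

# Postfix: the condition sees the old value (undef the first time),
# and the counter is incremented afterwards.
my (%post, @post_unique);
foreach my $value (@words) {
    push @post_unique, $value if !$post{$value}++;
}

# Prefix: the counter is incremented first, so the condition always
# sees a value of at least 1 and nothing is ever kept.
my (%pre, @pre_unique);
foreach my $value (@words) {
    push @pre_unique, $value if !++$pre{$value};
}

print "postfix kept: @post_unique\n";
print "prefix kept:  ", scalar(@pre_unique), " values\n";
print "foo was seen $post{foo} times\n";
```

With the postfix increment we get the four distinct words; with the prefix increment the result is empty. As a side effect, the hash ends up holding the number of times each value occurred.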
Filtering duplicate values using grep
The grep function in Perl is a generalized form of the well known grep command of Unix. It is basically a filter. You provide an array on the right hand side and an expression in the block. The grep function will take each value of the array one-by-one, put it in $_, the default scalar variable of Perl, and then execute the block. If the block evaluates to TRUE, the value can pass. If the block evaluates to FALSE, the current value is filtered out.
That's how we got to this expression:
- my %seen;
- my @unique = grep { !$seen{$_}++ } @words;
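If grep itself is new to you, a simpler filter may help before reading the expression above again. This is only a sketch with made-up data; the numbers and the threshold are arbitrary.

```perl
use strict;
use warnings;

my @numbers = (3, 8, 1, 12, 7, 10);

# Each element is placed in $_ and the block is evaluated.
# Only the elements for which the block is true pass the filter.
my @big = grep { $_ > 7 } @numbers;

# In scalar context grep returns the number of matching elements.
my $count = grep { $_ > 7 } @numbers;

print "@big\n";
print "$count\n";
```

This keeps 8, 12 and 10, in their original order, and reports a count of 3. The uniq expression works the same way, except that the condition depends on what has been seen so far.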
Wrapping it in 'do' or in 'sub'
The last little thing we have to do is wrap the above two statements in either a do block:
- my @unique = do { my %seen; grep { !$seen{$_}++ } @words };
or, better yet, in a function with an expressive name:
- sub uniq {
- my %seen;
- return grep { !$seen{$_}++ } @_;
- }
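One detail worth keeping in mind: because the values are used as hash keys, they are compared as strings. The sketch below, using made-up input, shows what that means in practice.

```perl
use strict;
use warnings;

sub uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# The number 1 and the string '1' become the same hash key "1",
# but '1.0' is a different string, and 'a' differs from 'A'.
my @mixed = uniq(1, '1', '1.0', 'a', 'A', 'a');

print "@mixed\n";
print scalar(@mixed), "\n";
```

Here 1 and '1' count as duplicates, while '1.0' and 'A' are kept as separate values, so four values remain. Each call to uniq starts with a fresh %seen hash, so separate calls do not influence each other.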
Home made uniq - round 2
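Prakash Kailasa suggested an even shorter way of implementing uniq, if there is no requirement to preserve the order of elements. It relies on calling keys directly on a hash reference, an experimental feature that was introduced in perl version 5.14 and removed again in 5.24, so it only works on perl 5.14 through 5.22.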
Inline:
- my @unique = keys { map { $_ => 1 } @data };
or within a subroutine:
- my @unique = uniq(@data);
- sub uniq { keys { map { $_ => 1 } @_ } };
Let's take this expression apart:
map has a similar syntax to grep: a block and an array (or a list of values). It goes over all the elements of the array, executes the block for each one, and returns the combined results as a list.
In our case, for every value in the array it will return the value itself followed by the number 1. Remember that =>, a.k.a. the fat comma, is just a comma. Assuming @data has ('a', 'b', 'a') in it, the following expression will return ('a', 1, 'b', 1, 'a', 1):
- map { $_ => 1 } @data
If we assigned that expression to a hash, we would get the original data as keys, each with the number 1 as its value. Duplicate keys simply overwrite each other. Try this:
- use strict;
- use warnings;
-
- use Data::Dumper;
-
- my @data = qw(a b a);
- my %h = map { $_ => 1 } @data;
- print Dumper \%h;
and you will get:
$VAR1 = {
'a' => 1,
'b' => 1
};
If, instead of assigning it to a hash, we wrap the above expression in curly braces, we will get a reference to an anonymous hash.
- { map { $_ => 1 } @data }
Let's see it in action:
- use strict;
- use warnings;
-
- use Data::Dumper;
- my @data = qw(a b a);
- my $hr = { map { $_ => 1 } @data };
- print Dumper $hr;
This will print the same output as the previous one, barring any change in order in the dumping of the hash.
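Finally, in perl versions 5.14 through 5.22 we can call the keys function on hash references as well. This was an experimental feature that was removed in perl 5.24; on newer versions the reference has to be dereferenced explicitly with %{ ... }.
Thus, on those versions, we can write: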
- my @unique = keys { map { $_ => 1 } @data };
and we'll get back the unique values from @data, in no particular order.
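On perl 5.24 and later, where keys no longer accepts a hash reference, the same idea can still be written with an explicit dereference. The following is a sketch of one way to do that, not the form originally suggested; the unary plus is there to tell perl that the inner braces are an anonymous hash and not a block.

```perl
use strict;
use warnings;

my @data = qw(a b a);

# Build an anonymous hash, then dereference it explicitly with %{ ... }.
my @unique = keys %{ +{ map { $_ => 1 } @data } };

# The more verbose alternative: go through a named hash.
my %h = map { $_ => 1 } @data;
my @unique_too = keys %h;

# Hash keys come back in no particular order, so sort before printing.
print join(' ', sort @unique), "\n";
print join(' ', sort @unique_too), "\n";
```

Both forms print "a b" after sorting. If the original order of the elements matters, the grep-based solution is the one to use.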
Exercise
Given the following file print out the unique values:
input.txt:
foo Bar bar first second
Foo foo another foo
expected output:
foo Bar bar first second Foo another
Exercise 2
This time filter out duplicates regardless of case.
expected output:
foo Bar first second another
COMMENT: by Ken, January 15th, 2008
similarly, i bet it would be also helpful for some people to see how to install perl packages with FC8 using yum…say you want to install the Perl Frontier::Client package (and its dependencies)…
$ su root
# yum -y install perl-Frontier-RPC
…all you do is append a ‘perl-’ to the package name and substitute the ‘::’ for a ‘-’ and you should be all set…
COMMENT: by Alex, March 14th, 2008
COMMENT: by jpd, March 14th, 2008
Hope that helps
COMMENT: by Noemi Millman, March 16th, 2008
COMMENT: by Albert, March 18th, 2008
COMMENT: by Michele, July 7th, 2008
COMMENT: by Phoenix2Life, September 26th, 2008
COMMENT: by nobighair, January 14th, 2009
I had a list of modules to install. Plus there were a couple I needed to force install. So I found it easier to split it up:
> sudo perl -MCPAN -eshell
To get the CPAN shell. Then in the shell:
> install XML::Writer
or
> force install XML::Writer
Cheers
COMMENT: by Guizard Sébastien, May 5th, 2009
I had stop the process at the step 2 when you have to enter the adress of the miror cpan(for searching this adress).
Now, when I restart the command, it don’t ask me for the Cpan miror and the command make was not created. What can I do ? It’s my first macbook, I’ve bought it 3 day ago. I don’t know what can I do ! i’m thinking to re instal Mac OS X. I don’t it’s good idea ! if you can help me I will be very glad ! ! !
PS : I’m sory if my english is not very good, I’m learning right now in USA ^^
COMMENT: by Noemi Millman, May 6th, 2009
COMMENT: by Mac, June 20th, 2009
COMMENT: by Noemi Millman, June 23rd, 2009
COMMENT: by Nick, July 30th, 2009
COMMENT: by Simon, November 17th, 2009
COMMENT: by nod, January 4th, 2010
COMMENT: by Christian, June 8th, 2010
I followed these instructions and everything went well until I tried to install a module. Maybe I just misread the post, but instead of ‘install Module::Name’ I had to use ‘install Name’
COMMENT: by Noemi Millman, June 8th, 2010
COMMENT: by Christian, June 8th, 2010
COMMENT: by JM, August 31st, 2010
Like Christian though, I only had to use “sudo cpan ModuleName” at the Terminal prompt to install most of them. I think there was only one where I had to prefix the command with “Bundle::”.
This is on a 27″ iMac running Snow Leopard 10.6.4.
I also installed them on a ten-year-old G4 mini-tower running Tiger.
YMMV
Thanks again!
COMMENT: by Pierre, January 10th, 2011
COMMENT: by Dan, May 20th, 2011
COMMENT: by Richard Uschold, July 17th, 2011
I still get the error: “Can’t locate SOAP/Lite.pm in @INC (…)”
COMMENT: by Richard Uschold, July 17th, 2011
All is good, now!
COMMENT: by Noemi Millman, July 18th, 2011
COMMENT: by bert, November 4th, 2011
GAAS/libwww-perl-6.03.tar.gz
/usr/bin/make install — NOT OK
—-
You may have to su to root to install the package
(Or you may want to run something like
o conf make_install_make_command ‘sudo make’
to raise your permissions.
Warning (usually harmless): ‘YAML’ not installed, will not store persistent state
COMMENT: by Noemi Millman, November 4th, 2011
COMMENT: by bert, November 6th, 2011
COMMENT: by Mandy, July 5th, 2012
COMMENT: by Cliff, July 9th, 2012
COMMENT: by John Wooten, Ph.D., July 10th, 2012
Has anyone installed PDL on OS X Lion 10.7.4? If so, how?
COMMENT: by Tom Marchioro, July 23rd, 2012
Really clear and useful instructions. You should be proud (and I’m an Eli of an age who doesn’t easily praise a Tiger
BUT, as John Wooten notes, the instructions need a slight updating for the current state of Appledom. I doubt this is Lion specififc, but the new XCode has turned into a standalone app that does NOT come with the standard command line tools by default, so your instructions should now be:
1. Make sure you have the Apple Developer Tools Installed.
2. Launch XCode and bring up the Preferences panel.
3. Click on the Downloads tab and then click to install the Command Line Tools (otherwise CPAN cannot access a working version of make).
After that I think Noemi’s instructions work perfectly (at least for DBI and LWP). thanks!
Hope this helps — tom
COMMENT: by Noemi Millman, July 23rd, 2012
COMMENT: by laura, July 30th, 2012
sudo perl -MCPAN -e ‘install Bundle::Name’
becomes this:
sudo env FTP_PASSIVE=1 perl -MCPAN -e ‘install Bundle::Name’
(tip found at http://hints.macworld.com/article.php?story=20090716132354455)
laura
COMMENT: by Noemi Millman, July 31st, 2012
COMMENT: by John, August 10th, 2012
COMMENT: by Anarcissiea, August 24th, 2012
COMMENT: by vogen, September 1st, 2012
Just installed Module Prima
Thanks heaps
COMMENT: by Avita, November 20th, 2012
COMMENT: by Bretfort, January 1st, 2013
COMMENT: by Ezmyrelda, March 5th, 2013
COMMENT: by Collin Dyer, May 13th, 2013
COMMENT: by Perl on Mac | BnafetS, May 27th, 2013
COMMENT: by Gaelle, June 26th, 2013
COMMENT: by Susanne, July 8th, 2013
COMMENT: by Ryan, July 21st, 2013
COMMENT: by Jack, November 8th, 2013
COMMENT: by Anonymous, December 17th, 2013
“t/lock.t …………… 1/4″
It’s been stuck on that for about 20 minutes now. Is this normal?
COMMENT: by Anonymous, December 17th, 2013
COMMENT: by Flo, June 18th, 2014