How to Parse Text Files With Perl

Instructions For Parsing Text Files Using Perl


Parsing text files is one of the reasons Perl makes a great data mining and scripting tool.

As the example below shows, Perl can reformat a block of text. Compare the sample input near the top of the page with the output at the bottom: the script in the middle is what transforms the first into the second.

How to Parse Text Files With Perl

As an example, let's build a small program that opens a tab-separated data file and parses the columns into something we can use.

Say your boss hands you a file with a list of names, email addresses, and phone numbers, and wants you to read the file and do something with the information, such as load it into a database or print it as a nicely formatted report.

The file's columns are separated with the TAB character and would look something like this:

 Larry	larry@example.com	111-1111
 Curly	curly@example.com	222-2222
 Moe	moe@example.com	333-3333

Here's the full listing we'll be working with:

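 #!/usr/bin/perl
 use strict;
 use warnings;
 
 # Open data.txt for reading; stop with the system error if that fails.
 open(my $fh, '<', 'data.txt') or die "Cannot open data.txt: $!";
 while (<$fh>) {
     chomp;    # remove the trailing newline from $_
     # Split $_ on tabs into one variable per column.
     my ($name, $email, $phone) = split /\t/;
     print "Name: $name\n";
     print "Email: $email\n";
     print "Phone: $phone\n";
     print "---------\n";
 }
 close($fh);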

Note: This borrows some code from the tutorial on how to read and write files in Perl. Take a look at that if you need a refresher.

First, the script opens a file called data.txt. The relative path is resolved against the current working directory, so run the script from the directory that contains the file.
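If the file is missing, open fails, so it is worth checking the result before reading. A minimal sketch of one way to do that, assuming a hypothetical helper named open_data_file (it is not part of the original script):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: open a file for reading and return the handle.
# On failure, print a warning with the system error ($!) and return undef.
sub open_data_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or do {
        warn "Could not open '$path': $!\n";
        return;
    };
    return $fh;
}

my $fh = open_data_file('data.txt');
if ($fh) {
    print "Opened data.txt\n";
    close($fh);
}
```

Returning undef rather than calling die lets the caller decide whether a missing file is fatal. For a short one-off script, a plain "or die" after open is usually enough.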

Then it reads the file line by line into Perl's default variable, $_. Here $_ is implicit: it never appears in the code, but chomp and split both operate on it when given no other argument.
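To make the role of $_ visible, the same loop can be written with a named variable. This is only a sketch: the variable names $data, $in, $line and @names are illustrative, and the sample data is held in a string so the code runs without data.txt.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data held in a string so the sketch runs without data.txt.
my $data = "Larry\tlarry\@example.com\t111-1111\n"
         . "Curly\tcurly\@example.com\t222-2222\n";

# Opening a reference to a scalar gives an in-memory filehandle.
open(my $in, '<', \$data) or die "Could not open in-memory data: $!";

my @names;
while (my $line = <$in>) {    # explicit variable instead of $_
    chomp($line);             # same as a bare chomp, but on $line
    my ($name, $email, $phone) = split(/\t/, $line);
    push @names, $name;
    print "Name: $name\n";
}
close($in);
```

The implicit form is shorter, while the explicit form tends to be easier to follow once the loop body grows or calls other code that also touches $_.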

After a line is read, chomp removes the trailing newline from it (only the newline, not other whitespace). Then the split function breaks the line on the tab character, which is written as \t.
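Two details of chomp and split are easy to miss, and the short sketch below illustrates them with made-up values: chomp leaves trailing spaces alone, and split drops trailing empty fields unless it is given a negative limit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# chomp removes only the trailing newline, so the spaces survive.
my $padded = "Moe  \n";
chomp($padded);

# Stripping all trailing whitespace takes a substitution instead.
my $trimmed = $padded;
$trimmed =~ s/\s+$//;

# A line whose last two columns are empty.
my $line    = "Moe\t\t";
my @default = split(/\t/, $line);        # trailing empty fields dropped
my @all     = split(/\t/, $line, -1);    # negative limit keeps them

print "default: ", scalar(@default), " field(s)\n";
print "all:     ", scalar(@all), " field(s)\n";
```

With the default behavior, a record that is missing its phone number yields fewer values than variables, so $phone ends up undefined. Passing -1 as the limit keeps the column count stable.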

To the left of the equals sign, the result of split is assigned to a list of three variables, one for each column of the line.

Finally, each variable that has been split from the file's line is printed separately so that you can see how to access each column's data individually.

The output of the script should look something like this:

 Name: Larry
 Email: larry@example.com
 Phone: 111-1111
 ---------
 Name: Curly
 Email: curly@example.com
 Phone: 222-2222
 ---------
 Name: Moe
 Email: moe@example.com
 Phone: 333-3333
 --------- 

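Although this example only prints the data, it would be straightforward to store the same fields, parsed from a TSV or CSV file, in a full-fledged database. Keep in mind that real CSV files can contain quoted fields with embedded commas, so a plain split is only reliable for simple input.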
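As a step in that direction, the parsed columns can be collected into an array of hashes before anything is printed or stored. The sketch below uses in-memory sample data and illustrative names (@records and the keys name, email, phone), then prints a fixed-width report. The database step itself is not shown; in Perl it would typically go through the DBI module.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample data held in a string so the sketch runs without data.txt.
my $data = "Larry\tlarry\@example.com\t111-1111\n"
         . "Curly\tcurly\@example.com\t222-2222\n"
         . "Moe\tmoe\@example.com\t333-3333\n";

open(my $in, '<', \$data) or die "Could not open in-memory data: $!";

my @records;
while (my $line = <$in>) {
    chomp($line);
    my %row;
    # Hash slice: assign the three columns to three keys in one step.
    @row{qw(name email phone)} = split(/\t/, $line);
    push @records, \%row;
}
close($in);

# Fixed-width report: left-justified name and email columns.
printf "%-8s %-20s %s\n", 'Name', 'Email', 'Phone';
for my $rec (@records) {
    printf "%-8s %-20s %s\n", $rec->{name}, $rec->{email}, $rec->{phone};
}
```

Keeping each row as a hash means later code can refer to fields by name rather than by position, which is convenient whether the next step is a report or a database insert.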

Brown, Kirk. "How to Parse Text Files With Perl." ThoughtCo, Jan. 10, 2017, thoughtco.com/parsing-text-files-2641088.