Working with CGI.pm In-Depth

Retrieving Environment Variables

CGI environment variables are pieces of information about the server and client that you can access from any web programming language. They contain useful details like the browser name and version, the operating system your client uses, or the URL that referred your client in the first place. Keep in mind that these values are set on every request and many of them come straight from the client, so they can easily be altered. As with any form of input, you should always sanitize them before using them in your CGI applications.

CGI.pm has some functions that can give you quick and simple access to your environment variables so that you can use them in your scripts.

For our examples, let's create a simple CGI script that we can run on our website to see the output it gives us:

 #!/usr/bin/perl -w
 use strict;
 use CGI qw/:standard/;
 # Create the CGI object and send the default text/html header
 my $cgi = CGI->new;
 print $cgi->header();
 print 'user_agent(): ' . $cgi->user_agent() . '<br>';
 print 'remote_host(): ' . $cgi->remote_host() . '<br>';
 print 'script_name(): ' . $cgi->script_name() . '<br>';
 # referer() is undef on a direct hit, so fall back to an empty string
 print 'referer(): ' . ($cgi->referer() || '') . '<br>';
 print 'request_method(): ' . $cgi->request_method() . '<br>';

Save that as env_var.cgi (or any name you like) and put it in the cgi-bin directory on your web server. When you visit it in your browser, you should see something like this, although the details will differ depending on the browser and client you're using:
 user_agent(): Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.9) Gecko/20071025 Firefox/2.0.0.9
 remote_host(): 192.168.1.101
 script_name(): /cgi-bin/env_var.cgi
 referer():
 request_method(): GET

The user_agent() function returns the name, version, and OS information of the client that just hit the CGI script, so it will read differently depending on the browser and computer you use to access it. You can parse this information to adjust your CGI's output based on the browser or OS of the client, which is handy given all the annoying browser differences.
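As a rough illustration of parsing that string, here is a minimal sketch. The browser_family() helper and its patterns are hypothetical and far from exhaustive; real User-Agent strings vary widely, so treat this as a starting point only.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Hypothetical helper: guess a browser family from a User-Agent string.
# The patterns are illustrative only and will miss many real clients.
sub browser_family {
    my ($ua) = @_;
    return 'unknown' unless defined $ua && length $ua;
    return 'Firefox' if $ua =~ m{Firefox/};
    return 'MSIE'    if $ua =~ /MSIE /;
    return 'Opera'   if $ua =~ /Opera/;
    return 'other';
}

my $cgi = CGI->new;
my $ua  = $cgi->user_agent();    # undef when no User-Agent header was sent

print $cgi->header();
print 'Detected browser family: ', browser_family($ua), '<br>';
```

Keeping the matching logic in its own subroutine makes it easy to exercise with sample strings without going through a web server.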

remote_host() returns the client's host name if the server has looked it up, and otherwise the client's IP address. You could use this for tracking or banning, but given how easy it is to fake, hide, or proxy away an IP address, it's best not to rely on this kind of data for mission-critical systems.
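A simple deny list might look something like the sketch below. The %blocked hash, the is_blocked() helper, and the addresses in it are all made up for illustration (the addresses come from a range reserved for documentation), and the caveat above still applies: this is a convenience, not a security control.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Hypothetical deny list; these addresses are placeholders from a
# documentation-only range, not real offenders.
my %blocked = map { $_ => 1 } qw(203.0.113.7 203.0.113.42);

# Returns 1 if the given host or IP is on the deny list, 0 otherwise.
sub is_blocked {
    my ($host) = @_;
    return exists $blocked{ $host // '' } ? 1 : 0;
}

my $cgi = CGI->new;

if ( is_blocked( $cgi->remote_host() ) ) {
    print $cgi->header( -status => '403 Forbidden' );
    print 'Access denied.';
}
else {
    print $cgi->header();
    print 'Welcome.';
}
```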

If you need the path to your script, script_name() returns it relative to the root of the site (for example, /cgi-bin/env_var.cgi). You can use it to work out the location of other directories, like a library or include directory, or to set the path to your script in a form or a link. The advantage of using script_name() and similar environment variables shows up when you move the script to a new location or server: if all the links and includes are built automatically, you won't have to update them.
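For instance, a form that posts back to its own script could be built roughly like this. The self_form() helper and the "message" field name are assumptions made for the example, not part of CGI.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Hypothetical helper: build a form that posts back to the given path.
# The caller is expected to pass an already HTML-escaped path.
sub self_form {
    my ($action) = @_;
    return qq{<form method="post" action="$action">}
         . qq{<input type="text" name="message">}
         . qq{<input type="submit" value="Send">}
         . qq{</form>};
}

my $cgi = CGI->new;

print $cgi->header();
# Escape the path before placing it inside an HTML attribute.
print self_form( $cgi->escapeHTML( $cgi->script_name() ) );
```

Because the action is derived at run time, the form keeps working if the script is renamed or moved to another directory.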

referer() will be empty if you hit the script directly. To see it in action, create another page elsewhere on your server with a link to the script and follow that link. referer() will now contain the URL of the page you just came from. Collecting referer data can be invaluable for spotting where your traffic is coming from, though like the other values it is supplied by the client and can be faked or omitted.
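One way to summarize where visitors come from is to reduce each referer to its host name. The sketch below uses a hypothetical referer_host() helper with a deliberately simple regular expression; a dedicated URL-parsing module would be more robust.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Hypothetical helper: extract the host from a referer URL.
# Returns 'direct' when there is no referer at all, and 'unparsed'
# when the value does not look like an http(s) URL.
sub referer_host {
    my ($referer) = @_;
    return 'direct' unless defined $referer && length $referer;
    return lc $1 if $referer =~ m{^https?://([^/:?#]+)}i;
    return 'unparsed';
}

my $cgi     = CGI->new;
my $referer = $cgi->referer();    # undef on a direct hit

print $cgi->header();
print 'Traffic source: ', $cgi->escapeHTML( referer_host($referer) ), '<br>';
```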

request_method() tells you what type of request is being sent to your script. Let's say that you have a script called contact_me.cgi that displays a contact form when you visit it in a browser. When you submit the form to the same script, the request method changes from GET to POST, but the URL stays the same.

Your script can check the request_method() for POST and deal with the data accordingly.
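A contact_me.cgi along those lines might be sketched as follows. The page_for() helper and the "message" field are assumptions made for the example; a real script would also validate the input and do something with it, such as sending an email.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Hypothetical helper: pick which page to show from the request method.
sub page_for {
    my ($method) = @_;
    return uc( $method // '' ) eq 'POST' ? 'thanks' : 'form';
}

my $cgi = CGI->new;
print $cgi->header();

if ( page_for( $cgi->request_method() ) eq 'thanks' ) {
    # The form was submitted: read the (assumed) 'message' field
    # and escape it before echoing it back.
    my $message = $cgi->param('message') // '';
    print '<p>Thanks, we received: ', $cgi->escapeHTML($message), '</p>';
}
else {
    # First visit (GET): show the form, posting back to this same script.
    print '<form method="post" action="',
        $cgi->escapeHTML( $cgi->script_name() ), '">',
        '<textarea name="message"></textarea>',
        '<input type="submit" value="Send">',
        '</form>';
}
```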