Building a Web Client in Python

Network Programming is not Voodoo

For many people, network programming is a black art, with network programmers being the witch-doctors of the Web. As with most things that relate to computers, it really is easy once you know a little bit about it. This tutorial will help you grasp the basics of client operations by building a simple web client.

Programming Networks in Python: the Basics

All network transactions happen between clients and servers. In most protocols, the client sends a request to a particular address and receives data in return, while the server listens on a port and supplies information.

To effect a network connection, you need to know the host, the port, and the actions allowed on that port. Each port is conventionally associated with a service, and each server program listens on its own port.

Most web servers run on port 80, though sometimes on 8080. FTP lives on port 21, and secure shell (SSH) on 22. For email, SMTP, POP3, and IMAP each live on their own port (25, 110, and 143, respectively).
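To check these conventional numbers on your own machine, you can ask the operating system's services database. The sketch below is a minimal example, assuming a standard services file (such as /etc/services) is present; the helper name `default_port` and its fallback table are hypothetical, added only for illustration.

```python
import socket

# Hypothetical fallback table of well-known TCP ports, used only if the
# system's services database has no entry for the requested name.
WELL_KNOWN_PORTS = {
    "ftp": 21,
    "ssh": 22,
    "smtp": 25,
    "http": 80,
    "pop3": 110,
    "imap": 143,
}


def default_port(service):
    """Return the conventional TCP port for a service name."""
    try:
        # Ask the operating system's services database first.
        return socket.getservbyname(service, "tcp")
    except OSError:
        # Not listed on this machine: fall back to the table above.
        return WELL_KNOWN_PORTS[service]


for name in ("http", "ftp", "ssh", "smtp", "pop3", "imap"):
    print(name, default_port(name))
```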

You should note that these are only the conventional port numbers for each service. A network administrator can change them for his or her network. As long as the client asks for the correct service on the right port at the right address, communication will still happen. Google's mail service, for example, does not use the plain POP, SMTP, and IMAP ports listed above, yet users still get their mail because their mail clients are configured with the ports Google specifies.
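A quick way to see that a port number is a convention rather than a rule is to run a throwaway server on a port chosen by the operating system and connect to it. This sketch assumes you can bind to 127.0.0.1; it uses `socket.create_connection`, a standard-library convenience wrapper, rather than the step-by-step approach this tutorial builds later.

```python
import socket
import threading

# Throwaway server on localhost. Port 0 asks the OS for any free port,
# so the service deliberately does NOT sit on a well-known number.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
host, port = server.getsockname()


def serve_once():
    # Accept a single connection, send a greeting, then close it.
    conn, _ = server.accept()
    with conn:
        conn.sendall(b"hello from port %d\n" % port)


t = threading.Thread(target=serve_once)
t.start()

# The client succeeds because it asks for the right host AND the right port.
chunks = []
with socket.create_connection((host, port)) as client:
    while True:
        data = client.recv(1024)
        if not data:  # empty bytes: the server closed the connection
            break
        chunks.append(data)

t.join()
server.close()
greeting = b"".join(chunks)
print(greeting.decode().strip())
```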

Importing Modules for Network Programming in Python

As usual, our program will be more flexible if we assign values dynamically instead of hard-coding values. Therefore, let's import the sys module so we can grab input from the command line.

Next, import the socket module. This is the bedrock of most network programming in Python. While dedicated modules exist for the various protocols, the socket module lets you open a connection to any port on any machine and read from or write to it. Those other modules (e.g., http.client, called httplib in Python 2, ftplib, poplib, smtplib) are certainly more appropriate for their given tasks, but socket is the foundation beneath them: http.client, for example, imports socket.
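In code, the two imports fit on one line. The extra `http.client` import below is not needed by the client; it is only there to illustrate that higher-level modules are built on socket.

```python
import sys, socket  # sys for command-line arguments, socket for networking

# http.client (httplib in Python 2) imports socket itself, so both
# names refer to the very same module object.
import http.client

print(http.client.socket is socket)  # True
```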

Next, we need to declare a few variables.

Giving Python the Internet Protocol Information

As I mentioned earlier, every network client needs to know the address of the machine, the port of the service, and the name of the file on which it is to operate. Theoretically, we could take the port number from the command line. However, because the port usually determines the service, it is safer to hardwire the port (and therefore the service) into the program. For a web service, we will look on port 80.

 port = 80

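Next, let's take the server's address and the name of the file from the command line. Because we are working on the Web, we can make use of DNS and accept host names such as www.example.com instead of raw IP addresses. Pass only the host name, not a full URL beginning with http://.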

 host = sys.argv[1] 
 filename = sys.argv[2] 
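Because `sys.argv[1]` and `sys.argv[2]` raise IndexError when an argument is missing, you may want to check the arguments first. The sketch below also shows the DNS lookup that happens behind the scenes when you connect by name. `parse_args` and `resolve` are hypothetical helper names, and `resolve` asks only for IPv4 addresses.

```python
import socket
import sys


def parse_args(argv):
    """Return (host, filename) from a sys.argv-style list, or exit with usage."""
    if len(argv) != 3:
        sys.exit("usage: %s HOST FILENAME" % argv[0])
    return argv[1], argv[2]


def resolve(host, port=80):
    """Return the sorted IPv4 addresses that host resolves to."""
    # Each getaddrinfo() entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is an (ip_address, port) tuple.
    infos = socket.getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


# In the real program you would call parse_args(sys.argv).
host, filename = parse_args(["simple_web_client.py", "localhost", "/"])
print(host, filename)
```

With a typical hosts file, `resolve("localhost")` returns `['127.0.0.1']`.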

Creating a Socket With Python

In order to access the Internet, we need to create a socket. The syntax for this call is as follows:

 <variable> = socket.socket(<family>, <type>)

The recognised socket families are:

  • AF_INET: IPv4 protocols (both TCP and UDP)
  • AF_INET6: IPv6 protocols (both TCP and UDP)
  • AF_UNIX: UNIX domain protocols
The first two are the Internet protocol families: anything that travels over the Internet uses one of them. AF_UNIX, by contrast, connects processes on the same machine. Many networks still do not run IPv6, so unless you know otherwise, it is safest to default to IPv4 and use AF_INET.
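The following minimal sketch defaults to IPv4 as recommended and only tries IPv6 when the interpreter reports support for it. Even then the host may have IPv6 disabled, so that attempt is wrapped in try/except.

```python
import socket

# Default to IPv4, as recommended above.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s4:
    print("IPv4 socket:", s4.family.name)

# socket.has_ipv6 only says Python was built with IPv6 support;
# the host may still have IPv6 disabled, hence the try/except.
if socket.has_ipv6:
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as s6:
            print("IPv6 socket:", s6.family.name)
    except OSError as exc:
        print("IPv6 sockets unavailable here:", exc)
```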

The socket type refers to the type of communication used through the socket. The five socket types are as follows:

  • SOCK_STREAM: a connection-oriented, TCP byte stream
  • SOCK_DGRAM: UDP transfer of datagrams (self-contained IP packets that do not rely on client-server confirmation)
  • SOCK_RAW: a raw socket
  • SOCK_RDM: for reliable datagrams
  • SOCK_SEQPACKET: sequential transfer of records over a connection
By far, the most common types are SOCK_STREAM and SOCK_DGRAM, because they correspond to TCP and UDP, the two main transport protocols of the IP suite. The other three are much rarer and may not be supported on every platform.
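To see how SOCK_DGRAM differs from the connection-oriented SOCK_STREAM used in the rest of this tutorial, here is a minimal UDP round trip on the local machine, assuming loopback networking is available. Note that no connect() or accept() is involved.

```python
import socket

receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
receiver.settimeout(5)            # don't hang forever if the datagram is lost
addr = receiver.getsockname()

sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sender.sendto(b"ping", addr)      # no connection setup: just send

# One recvfrom() call returns one whole datagram plus the sender's address.
data, source = receiver.recvfrom(1024)
print(data, "from", source)

sender.close()
receiver.close()
```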

Let's therefore create a socket and assign it to a variable; here I use c (for connection).

 c = socket.socket(socket.AF_INET, socket.SOCK_STREAM) 
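If you would rather not remember to close the socket yourself, Python 3 sockets also work as context managers. This is an optional variation, not part of the original program:

```python
import socket

# The with statement closes the socket automatically,
# even if an exception is raised while it is in use.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
    print(c.family.name, c.type.name)   # AF_INET SOCK_STREAM
    # ... connect, send, and receive here ...

print(c.fileno())   # -1: the socket has been closed
```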

Connecting Sockets With Python

After creating the socket, we need to connect it to the server using the socket object's connect method. The socket is essentially an opening in the computer's networking capacity; in connecting, we give it the host and port to communicate with.


c.connect((host, port))
It is worth noting that the address for the connection is given as a tuple (hence the double parentheses).
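Forgetting the inner parentheses is a common slip. This sketch shows that passing the host and port as two separate arguments raises TypeError before any network traffic is attempted:

```python
import socket

c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
raised = False
try:
    # Wrong: host and port passed as two separate arguments.
    c.connect("localhost", 80)
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
finally:
    c.close()

# Right: a single (host, port) tuple, e.g. c.connect(("localhost", 80))
```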

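Before we can read from the socket, however, it is convenient to make a file-like object from it. A socket on its own only sends and receives raw bytes; a file-like object lets us use familiar calls such as write() and readlines(). Every socket has a makefile method for this, which takes a mode and, optionally, a buffering size and text settings such as the encoding. Because our client both writes its request and reads the reply, we open it in read-write mode ('rw'). Python 3 does not allow unbuffered text files (makefile('r', 0) raises ValueError), so we keep the default buffering and flush after writing. Setting newline='' leaves the \r\n line endings that HTTP uses untouched, and errors='replace' keeps the program going if a page is not valid UTF-8.


fileobj = c.makefile('rw', encoding='utf-8', errors='replace', newline='')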
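To experiment with makefile() without a web server, you can use `socket.socketpair()`, which returns two sockets already connected to each other. This sketch writes CRLF-terminated text on one end and reads it back on the other; the encoding and newline choices mirror the client's.

```python
import socket

# Two sockets that are already connected to each other.
a, b = socket.socketpair()

writer = a.makefile("w", encoding="utf-8", newline="")
reader = b.makefile("r", encoding="utf-8", newline="")

writer.write("first line\r\nsecond line\r\n")
writer.flush()   # push the buffered text onto the socket
writer.close()
a.close()        # once file and socket are both closed, the reader sees end-of-file

# newline="" splits on \r\n but leaves the line endings in place.
lines = reader.readlines()
print(lines)

reader.close()
b.close()
```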

Python Asks the Web Server for the File

Now we get to communicate with the server. When retrieving information from a web server, browsers begin each request with a request line in the following format:

<type of request> <file name> <protocol to use>
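As a sketch, the request our client sends could be built by a small function. The name `build_request` is hypothetical; the CRLF line endings and the blank line that ends the headers come from the HTTP specification, and the Host header is included because most servers today expect it.

```python
def build_request(host, filename, method="GET"):
    """Build an HTTP/1.0 request: request line, Host header, blank line.

    HTTP ends every line with CRLF; the empty line after the headers
    tells the server that the request is complete.
    """
    return (
        "%s %s HTTP/1.0\r\n" % (method, filename)
        + "Host: %s\r\n" % host
        + "\r\n"
    )


print(repr(build_request("www.example.com", "/")))
```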

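Our web client will ask the server to "GET" a given file using the HTTP/1.0 protocol. HTTP ends each line with a carriage return and line feed (\r\n), and a blank line marks the end of the request. We also send a Host header naming the site, because many websites can share one server address and most servers today expect it. To communicate this, we write to the file-like object of the socket and then flush it so that the buffered request is actually sent:


fileobj.write("GET " + filename + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n")
fileobj.flush()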

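The server will then respond, and the response arrives through the same file-like object. But our program will not get it unless we read it, so read the file object into another variable, here called buff. Because we asked for HTTP/1.0, the server closes the connection after sending its reply, which is how readlines() knows it has reached the end.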


buff = fileobj.readlines()
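What readlines() returns is the whole HTTP response, not just the page: a status line, headers, a blank line, then the body. Here is a minimal sketch of separating those parts, assuming the lines still carry their line endings; the name `split_response` and the sample response are made up for illustration.

```python
def split_response(lines):
    """Split raw HTTP response lines into (status_line, headers, body_lines).

    Assumes each line keeps its line ending, as readlines() returns them.
    Repeated header names keep only the last value.
    """
    if not lines:
        raise ValueError("empty response")
    status_line = lines[0].rstrip("\r\n")
    headers = {}
    i = 1
    # Header lines run until the first blank line.
    while i < len(lines) and lines[i].strip():
        name, _, value = lines[i].partition(":")
        headers[name.strip().lower()] = value.strip()
        i += 1
    # Everything after the blank line is the body (the HTML).
    return status_line, headers, lines[i + 1:]


sample = [
    "HTTP/1.0 200 OK\r\n",
    "Content-Type: text/html\r\n",
    "\r\n",
    "<html>\r\n",
    "</html>\r\n",
]
status, headers, body = split_response(sample)
print(status)
print(headers)
print(body)
```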

Printing the Web Page With Python

Now, all of the data is contained within buff. All we need to do is step through it, printing as we go. Each line already ends with its own line break, so we tell print() not to add another.

 for line in buff: print(line, end='')

Now you can save the program as simple_web_client.py and call it with the name of the server and the name of the file you want to see. For example, try:

python3 simple_web_client.py python.about.com /

If you run your own server, you can access your web directory as follows:

python3 simple_web_client.py localhost /

What you will receive is the server's raw response: a status line (such as HTTP/1.0 200 OK) and the response headers, a blank line, and then the raw HTML of the index file. You can process the HTML as plain text or parse it with an HTML parser such as Python's html.parser module.
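If you do not have a server handy, you can test the client against a tiny one built on Python's own http.server module. This sketch wraps the tutorial's client steps in a hypothetical `fetch()` function and serves a fixed page from a hypothetical `HelloHandler` class on a port chosen by the operating system.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that returns one fixed HTML page for any path."""

    def do_GET(self):
        body = b"<html><body>Hello</body></html>\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence the per-request log lines


def fetch(host, port, filename):
    """The tutorial's client steps, wrapped in a function."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as c:
        c.connect((host, port))
        fileobj = c.makefile("rw", encoding="utf-8", errors="replace", newline="")
        fileobj.write("GET " + filename + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n")
        fileobj.flush()
        buff = fileobj.readlines()  # reads until the server closes the connection
        fileobj.close()
    return buff


# Port 0 lets the operating system pick a free port.
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    lines = fetch("127.0.0.1", server.server_address[1], "/")
finally:
    server.shutdown()
    server.server_close()

for line in lines:
    print(line, end="")
```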

The Python Code for a Simple Web Client Program

To ensure that you have all the lines needed for this program, here is the code for this web client.

#!/usr/bin/env python3

# import sys for handling command line arguments
# import socket for network communications
import sys, socket

# hard-wire the port number for safety's sake
# then take the names of the host and file from the command line
port = 80
host = sys.argv[1]
filename = sys.argv[2]

# create a socket object called 'c'
c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# connect to the server
c.connect((host, port))

# create a file-like object to write the request and read the reply;
# newline='' keeps the \r\n line endings that HTTP uses
fileobj = c.makefile('rw', encoding='utf-8', errors='replace', newline='')

# ask the server for the file, then flush so the request is actually sent
fileobj.write("GET " + filename + " HTTP/1.0\r\nHost: " + host + "\r\n\r\n")
fileobj.flush()

# read the lines of the reply into a buffer, buff
# (an HTTP/1.0 server closes the connection when it is done)
buff = fileobj.readlines()

# step through the buffer, printing each line
for line in buff:
    print(line, end='')

# release the file object and the socket
fileobj.close()
c.close()