Chapter 9.
CGI Programming
Including a chapter on CGI in a database book may seem as odd as including a chapter on car repair in a cookbook. Granted, you need a working car to get to the grocery store, but does that make it an appropriate topic? A full introduction to CGI and web programming in general is beyond the scope of this book, but a brief introduction is enough to extend the presentation capabilities of MySQL and mSQL to the Web.
This chapter is primarily intended for readers who are learning about databases but would also like to pick up some Web programming. If your last name is Berners-Lee or Andreessen, you are unlikely to find anything here that you do not already know. But even if you are not new to CGI, a quick reference can be handy while you explore the secrets of MySQL and mSQL.
What is CGI?
Like most acronyms, Common Gateway Interface (CGI) tells you little by itself. An interface to what? Where is the gateway? Common to what? To answer these questions, let's step back and look at the World Wide Web in general.
Tim Berners-Lee, a physicist at CERN, invented the Web in 1990, although the plan dates back to 1988. The idea was to let particle physics researchers share multimedia data (text, images, and sound) easily and quickly over the Internet. The Web consisted of three main parts: HTML, URLs, and HTTP. HTML is the formatting language used to present content on the Web. A URL is the address used to retrieve HTML (or other) content from a web server. HTTP is the language that web servers understand, which lets clients request documents from a server.
The ability to send all types of information over the Internet was a revolution, but another possibility was soon discovered. If you can send any text over the Web, why not send text generated by a program rather than read from a ready-made file? This opens up a sea of possibilities. A simple example: a program that prints the current time, so that the reader sees the correct time whenever the page is viewed. A few clever people at the National Center for Supercomputing Applications (NCSA), who were building a web server, saw the opportunity, and CGI soon followed.
CGI is a set of rules by which programs on the server can send data to clients through the web server. The CGI specification was accompanied by changes to HTML and HTTP that introduced a new feature known as forms.
While CGI allows programs to send data to a client, forms extend this capability by allowing the client to send data to that CGI program. Now the user can not only see the current time, but also set the clock! CGI forms have opened the door to true interactivity in the world of the Web. Common CGI applications include:
- Dynamic HTML. Entire sites can be generated by a single CGI program.
- Search engines that find documents with user-specified words.
- Guestbooks and message boards where users can add their posts.
- Order forms.
- Questionnaires.
- Retrieving information from a database hosted on a server.
In subsequent chapters, we will discuss all of these CGI applications, as well as a few others. All of them offer excellent opportunities for connecting CGI to a database, which is what interests us here.
HTML forms
Before exploring the specifics of CGI, it is helpful to look at the most common way end users interact with CGI programs: HTML forms. Forms are the part of the HTML language that presents various types of fields to the end user. The data entered in the fields can be sent to the web server. Fields can accept text or take the form of buttons and checkboxes that the user clicks. Here is an example HTML page containing a form:
<HTML><HEAD><TITLE>My Forms Page</title></head><BODY>
<P>This is a page with a form.</p>
<!-- The ACTION and NAME values below are placeholders -->
<FORM ACTION="mycgi.cgi" METHOD=POST>Enter your name: <INPUT NAME="firstname" SIZE=40><BR><INPUT TYPE=SUBMIT VALUE="Submit Form"></form>
</body></html>
This form creates a 40-character text field where users can enter their name. Below the input field is a button that, when pressed, sends the form data to the server. The following are form-related tags supported by HTML 3.2, the most widely used standard today. Tag and attribute names can be entered in any case, but we follow the optional convention that opening tags are uppercase and closing tags lowercase.
The only input type we haven't used here is the IMAGE type for the <INPUT> tag. It can serve as an alternative way to submit the form. However, the IMAGE type rarely works well with text-only and less capable browsers, so it is prudent to avoid it unless your site has a rich graphical style.
Now that you are familiar with the basics of HTML forms, you can start learning about CGI itself.
CGI specification
So what exactly is the “rule set” that allows a CGI program in, say, Batavia, Illinois, to communicate with a web browser in Outer Mongolia? The official CGI specification, along with a wealth of other CGI information, can be found on the NCSA server at http://hoohoo.ncsa.uiuc.edu/cgi/. However, this chapter exists so that you do not have to go searching for it yourself.
There are four ways in which CGI transfers data between the CGI program and the web server, and therefore the Web client:
- Environment variables.
- Command line.
- Standard input.
- Standard output.
With these four methods, the server forwards all the data sent by the client to the CGI program. The CGI program then does its magic and sends the output back to the server, which forwards it to the client.
This information is presented with the Apache HTTP server in mind. Apache is the most widely used web server and runs on almost any platform, including Windows 9x and Windows NT. However, the information applies to all HTTP servers that support CGI. Some proprietary servers, such as those from Microsoft and Netscape, may have additional functionality or work slightly differently. As the face of the Web continues to change at an incredible rate, standards are still evolving, and there will no doubt be changes in the future. CGI itself, however, appears to be a well-established technology; the price it pays is that other technologies, such as applets, have pushed it out of the spotlight. Any CGI programs you write using this information will almost certainly run for years to come on most web servers.
When a CGI program is invoked through a form, the most common interface, the browser sends the server a long string that begins with the path to the CGI program and its name. This is followed by various other data, called path information, which is passed to the CGI program through the PATH_INFO environment variable (Figure 9-1). The path information is followed by a "?" and then the form data that is sent to the server using the HTTP GET method. This data is made available to the CGI program through the QUERY_STRING environment variable. Any data that the page sends using the HTTP POST method, which is the most commonly used method, is passed to the CGI program through standard input. A typical string that a server might receive from a browser is shown in Figure 9-1. A program named formread in the cgi-bin directory is called by the server with the additional path information extra/information and the query data choice=help, presumably as part of the original URL. Finally, the form data itself (the text "CGI programming" in the "keywords" field) is sent via the HTTP POST method.
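The split between GET and POST data described above can be sketched in C. This is only an illustration under stated assumptions: the function names select_form_data and read_form_data and the MAX_FORM_BYTES limit are our own inventions, not part of any CGI library. Only the environment variable names come from the CGI specification.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FORM_BYTES 65536L  /* arbitrary safety limit chosen for this sketch */

/* Return a newly allocated copy of the raw form data, or NULL on error.
   method, query, and content_length are the values of REQUEST_METHOD,
   QUERY_STRING, and CONTENT_LENGTH; in is the stream carrying the body.
   The caller frees the result. */
char *select_form_data(const char *method, const char *query,
                       const char *content_length, FILE *in)
{
    char *data;

    if (method != NULL && strcmp(method, "POST") == 0) {
        long length = (content_length != NULL)
                          ? strtol(content_length, NULL, 10) : 0;
        size_t got;

        if (length < 0 || length > MAX_FORM_BYTES)
            return NULL;               /* refuse negative or oversized bodies */
        data = malloc((size_t)length + 1);
        if (data == NULL)
            return NULL;
        got = fread(data, 1, (size_t)length, in);  /* POST: read the body */
        data[got] = '\0';
        return data;
    }

    if (query == NULL)
        query = "";                    /* GET request with nothing after "?" */
    data = malloc(strlen(query) + 1);
    if (data != NULL)
        strcpy(data, query);
    return data;
}

/* Convenience wrapper (hypothetical) that takes everything from the
   environment and from standard input, as a real CGI program would. */
char *read_form_data(void)
{
    return select_form_data(getenv("REQUEST_METHOD"), getenv("QUERY_STRING"),
                            getenv("CONTENT_LENGTH"), stdin);
}
```

For the request in Figure 9-1, a program using this sketch would get the POST body from read_form_data(), while choice=help would still be available separately through getenv("QUERY_STRING").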
Environment variables
When the server executes a CGI program, it first passes the program some data to work with in the form of environment variables. Seventeen variables are officially defined in the specification, but many more are used unofficially through a mechanism, described below, called the HTTP_ mechanism. A CGI program has access to these variables in the same way as to any shell environment variables when run from the command line. In a shell script, for example, the environment variable FOO can be accessed as $FOO; in Perl, as $ENV{'FOO'}; in C, as getenv("FOO"); and so on. Table 9-1 lists the variables that are always set by the server, even if only to a null value. In addition to these variables, the data sent by the client in the request headers is assigned to variables of the form HTTP_FOO, where FOO is the header name. For example, most web browsers include version information in a header named User-Agent. Your CGI program can get this data from the HTTP_USER_AGENT variable.
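The naming rule behind the HTTP_ variables can be sketched in C as follows. The helper names header_to_env_name and request_header are hypothetical, chosen for this illustration. The rule itself (prefix HTTP_, upper-case the letters, replace "-" with "_") is how CGI servers conventionally build these names, although a server is free to leave some headers out.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Build the CGI variable name for a request header: prefix "HTTP_",
   upper-case every letter, and turn '-' into '_'.
   Returns 0 on success, or -1 if the output buffer is too small. */
int header_to_env_name(const char *header, char *out, size_t out_size)
{
    static const char prefix[] = "HTTP_";
    size_t prefix_len = sizeof prefix - 1;
    size_t header_len = strlen(header);
    size_t i;

    if (out_size < prefix_len + header_len + 1)
        return -1;
    memcpy(out, prefix, prefix_len);
    for (i = 0; i < header_len; i++) {
        unsigned char c = (unsigned char)header[i];
        out[prefix_len + i] = (c == '-') ? '_' : (char)toupper(c);
    }
    out[prefix_len + header_len] = '\0';
    return 0;
}

/* Look up a request header by its HTTP name. Returns NULL if the
   server did not pass that header on to the program. */
const char *request_header(const char *header)
{
    char name[128];

    if (header_to_env_name(header, name, sizeof name) != 0)
        return NULL;
    return getenv(name);
}
```

With this sketch, request_header("User-Agent") reads HTTP_USER_AGENT. Note that the content type and length of the request body are exceptions: they arrive in CONTENT_TYPE and CONTENT_LENGTH, without the HTTP_ prefix, as Table 9-1 shows.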
Table 9-1. CGI environment variables
Environment variable | Description
CONTENT_LENGTH | Length of the data sent by the POST or PUT method, in bytes.
CONTENT_TYPE | The MIME type of the data attached using the POST or PUT method.
GATEWAY_INTERFACE | The version number of the CGI specification supported by the server.
PATH_INFO | Additional path information supplied by the client. For example, for the request http://www.myserver.com/test.cgi/this/is/a/path?field=green the value of PATH_INFO will be /this/is/a/path.
PATH_TRANSLATED | Same as PATH_INFO, but the server performs every translation it can, for example expanding names such as "~account".
QUERY_STRING | All data following the "?" in the URL. This is also the data sent when the form's REQUEST_METHOD is GET.
REMOTE_ADDR | The IP address of the client making the request.
REMOTE_HOST | The hostname of the client machine, if available.
REMOTE_IDENT | If the web server and the client support identd-style authentication, this is the username of the account making the request.
REQUEST_METHOD | The method used by the client for the request. For the CGI programs we are about to build, this will usually be POST or GET.
SERVER_NAME | Hostname, or IP address if no name is available, of the machine on which the web server is running.
SERVER_PORT | Port number used by the web server.
SERVER_PROTOCOL | The protocol used by the client to communicate with the server. In our case, this protocol is almost always HTTP.
SERVER_SOFTWARE | Information about the version of the web server running the CGI program.
SCRIPT_NAME | The path to the script being executed, as specified by the client. It can be used when a URL refers to itself, and so that scripts referenced from different locations can behave differently depending on the location.
Here's an example Perl CGI script that prints out all the environment variables set by the server, as well as all inherited variables, such as PATH, set by the shell that started the server.
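#!/usr/bin/perl -w

print <<HTML;
Content-type: text/html

<html><head><title>Environment Variables</title></head>
<body><pre>
HTML
# Print every variable in the environment, one per line.
foreach (keys %ENV) { print "$_: $ENV{$_}<br>\n"; }
print <<HTML;
</pre></body></html>
HTML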
All of these variables can be used and even modified by your CGI program. However, these changes do not affect the web server that launched the program.
Command line
CGI allows arguments to be passed to the CGI program as command line parameters, although this is rarely used because its practical applications are few, so we will not dwell on it. The gist is that if the QUERY_STRING environment variable does not contain the "=" character, the CGI program is executed with command line parameters taken from QUERY_STRING. For example, http://www.myserver.com/cgi-bin/finger?root will run finger root on www.myserver.com.
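The rule above can be sketched in C. The sketch mimics roughly what a server does before it runs the program; the function names are hypothetical. Splitting the query on "+" follows the CGI specification rather than anything stated in this chapter, and real servers also URL-decode each argument, which is omitted here.

```c
#include <string.h>

/* Return 1 if the query string would be handed to the program as
   command line parameters: it is non-empty and contains no '='. */
int is_command_line_query(const char *query)
{
    return query != NULL && query[0] != '\0' && strchr(query, '=') == NULL;
}

/* Split a query such as "root+daemon" into separate arguments.
   The string is modified in place ('+' becomes a terminator), and
   pointers to at most max_args pieces are stored in args; anything
   beyond that is dropped. Returns the number of arguments stored. */
int split_query_args(char *query, char *args[], int max_args)
{
    int count = 0;
    char *start = query;

    while (count < max_args) {
        char *plus = strchr(start, '+');

        args[count++] = start;
        if (plus == NULL)
            break;                 /* this was the last argument */
        *plus = '\0';
        start = plus + 1;
    }
    return count;
}
```

For the finger example, is_command_line_query("root") is true and the single argument is root, whereas a form submission such as field=green is left in QUERY_STRING.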
There are two main libraries that provide a CGI interface for Perl. The first is cgi-lib.pl. The cgi-lib.pl library is very common because it was the only major library available for a long time. It was designed for Perl 4 but also works with Perl 5. The second library, CGI.pm, is newer and in many ways superior to cgi-lib.pl. CGI.pm was written for Perl 5 and uses a fully object-oriented framework for working with CGI data. The CGI.pm module parses standard input and the QUERY_STRING variable and stores the data in a CGI object. Your program only needs to create a new CGI object and use simple methods such as param() to retrieve the data you want. Example 9-2 is a short demonstration of how CGI.pm interprets the data. All Perl examples in this chapter use CGI.pm.
Example 9-2. Parsing CGI Data in Perl
#!/usr/bin/perl -w

use CGI qw(:standard);
# The CGI.pm module is used. qw(:standard) imports the
# namespace of standard CGI functions to make the code
# clearer. This can be done when the script uses only
# one CGI object.

$mycgi = new CGI;         # Create the CGI object that will be the "gateway" to the form data
@fields = $mycgi->param;  # Extract the names of all filled-in form fields

print header, start_html("CGI.pm test");
# The "header" and "start_html" methods, provided by CGI.pm,
# make it easy to produce HTML. "header" outputs the required
# HTTP header, and "start_html" outputs the HTML head with the
# given title, as well as the <body> tag.

print "<p>Form data:<br>";
foreach (@fields) { print $_, ":", $mycgi->param($_), "<br>"; }
# For each field, print the name and the value obtained using
# $mycgi->param("fieldname").

print end_html;  # Shorthand for printing the closing "</body></html>" tags.
Processing input data in C
Since the core APIs for MySQL and mSQL are written in C, we will not abandon C entirely in favor of Perl; where appropriate, we will provide some examples in C. There are three widely used C libraries for CGI programming: cgic by Tom Boutell, cgihtml by Eugene Kim, and libcgi from EIT. We believe that cgic is the most complete and the easiest to use. It lacks, however, the ability to list all the form variables when you do not know them beforehand. This can in fact be added with a simple patch, but that is beyond the scope of this chapter. Therefore, in Example 9-3 we use the cgihtml library to repeat the above Perl script in C.
Example 9-3. Parsing CGI Data in C
/* cgihtmltest.c - Generic CGI program that prints the keys and their
   values from the data received from a form */

#include <stdio.h>
#include "cgi-lib.h"   /* Contains all the CGI function definitions */
#include "html-lib.h"  /* Contains all the HTML helper function definitions */

void print_all(llist l)
/* This function prints the data supplied by the form in the same format
   as the Perl script above. cgihtml also provides a built-in function,
   print_entries(), which does the same using an HTML list format. */
{
    node* window;
    /* The "node" type is defined in the cgihtml library and refers to a
       linked list that stores all the form data. */

    window = l.head;  /* Point to the beginning of the form data */
    while (window != NULL) {  /* Walk the linked list to the end (the first empty element) */
        printf("%s:%s<br>\n", window->entry.name, replace_ltgt(window->entry.value));
        /* Print the data. replace_ltgt() is a function that HTML-encodes
           the text so that it displays correctly in the client's browser. */
        window = window->next;  /* Move to the next item in the list. */
    }
}

int main() {
    llist entries;  /* Pointer to the parsed data */
    int status;     /* An integer representing the status */

    html_header();  /* HTML helper function that prints the HTML header */
    html_begin("cgihtml test");
    /* HTML helper function that prints the start of the HTML page with
       the specified title. */
    status = read_cgi_input(&entries);  /* Reads and parses the form data */
    printf("<p>Form data:<br>");
    print_all(entries);     /* Calls the print_all() function defined above. */
    html_end();             /* HTML helper function that prints the end of the HTML page. */
    list_clear(&entries);   /* Frees the memory used by the form data. */
    return 0;
}
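To show what libraries such as CGI.pm and cgihtml take off your hands, here is a minimal sketch of form data parsing in plain C. It is not the code of either library: the struct form_field type and the function names are assumptions made for this illustration. It assumes the usual form encoding, in which fields are separated by "&", names and values by "=", spaces are sent as "+", and other special characters as %XX hexadecimal escapes.

```c
#include <string.h>

struct form_field {
    char *name;
    char *value;
};

/* Value of one hexadecimal digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode URL encoding in place: '+' becomes a space, %XX becomes a byte. */
void url_decode(char *text)
{
    char *in = text, *out = text;

    while (*in != '\0') {
        if (*in == '+') {
            *out++ = ' ';
            in++;
        } else if (in[0] == '%' && hex_value(in[1]) >= 0 && hex_value(in[2]) >= 0) {
            *out++ = (char)(hex_value(in[1]) * 16 + hex_value(in[2]));
            in += 3;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}

/* Split "a=1&b=2" into fields. The string is modified in place and the
   fields point into it. Returns the number of fields stored. */
int parse_form_data(char *data, struct form_field fields[], int max_fields)
{
    int count = 0;
    char *pair = data;

    while (pair != NULL && *pair != '\0' && count < max_fields) {
        char *next = strchr(pair, '&');
        char *equals;

        if (next != NULL)
            *next++ = '\0';            /* end this pair, remember the next */
        equals = strchr(pair, '=');
        if (equals != NULL)
            *equals++ = '\0';          /* separate the name from the value */
        fields[count].name = pair;
        /* A field without '=' gets an empty value. */
        fields[count].value = (equals != NULL) ? equals : pair + strlen(pair);
        url_decode(fields[count].name);
        url_decode(fields[count].value);
        count++;
        pair = next;
    }
    return count;
}
```

Applied to the data from Figure 9-1, the string keywords=CGI+programming yields one field named keywords with the value "CGI programming".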
Standard output
The data sent by the CGI program to standard output is read by the web server and sent to the client. If the script name starts with nph-, then the data is sent directly to the client without any intervention from the web server. In this case, the CGI program must generate the correct HTTP header that the client can understand. Otherwise, let the web server generate the HTTP header for you.
Even if you do not use an nph- script, the server must be given one directive that tells it about your output. This is usually the Content-Type HTTP header, but it can also be a Location header. The header must be followed by an empty line, that is, a line feed or a CR/LF combination.
The Content-Type header tells the server what type of data your CGI program is returning. If it is an HTML page, the line must be Content-Type: text/html. The Location header tells the server a different URL, or a different path on the same server, to which the client should be directed. The header should look like this: Location: http://www.myserver.com/another/place/.
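The two headers can be produced in C with a sketch like the following. The function names are hypothetical, and the check for line breaks is an extra precaution of ours rather than something the CGI specification requires; it keeps a value taken from user input from adding headers of its own.

```c
#include <stdio.h>
#include <string.h>

/* A header value must not contain line breaks, or a client-supplied
   string could smuggle extra headers into the response. */
static int is_safe_header_value(const char *value)
{
    return value != NULL && strpbrk(value, "\r\n") == NULL;
}

/* Print a Content-Type header followed by the empty line that ends
   the headers. Returns 0 on success, -1 if the value was rejected. */
int send_content_type(FILE *out, const char *mime_type)
{
    if (!is_safe_header_value(mime_type))
        return -1;
    fprintf(out, "Content-Type: %s\n\n", mime_type);
    return 0;
}

/* Print a Location header followed by the empty line that ends the
   headers. Returns 0 on success, -1 if the URL was rejected. */
int send_redirect(FILE *out, const char *url)
{
    if (!is_safe_header_value(url))
        return -1;
    fprintf(out, "Location: %s\n\n", url);
    return 0;
}
```

A CGI program would call send_content_type(stdout, "text/html") once, before printing any HTML, or call send_redirect(stdout, "http://www.myserver.com/another/place/") and print nothing else.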
After the HTTP headers and a blank line, you can send the actual data your program produces — an HTML page, image, text, or whatever. Among the CGI programs that come with Apache are nph-test-cgi and test-cgi, which demonstrate well the difference between nph and non-nph headers, respectively.
In this section, we will use the libraries CGI.pm and cgic, which have functions to output both HTTP and HTML headers. This will allow you to focus on displaying the actual content. These helper functions are used in the examples earlier in this chapter.
Important Features of CGI Scripting
You now know the basics of how CGI works. The client submits data, usually via a form, to the web server. The server runs the CGI program and passes the data to it. The CGI program does its processing and returns its output to the server, which forwards it to the client. Now you need to move from understanding how CGI programs work to understanding why they are so widely used.
While you already know enough from this chapter to put together a simple, working CGI program, a few more important issues must be covered before you can write a truly useful MySQL or mSQL program. First, you need to learn how to work with multiple forms. Next, you need to learn some security measures that will prevent attackers from gaining illegal access to your server's files or destroying them.
Remembering the state
Remembering state is a vital means of providing good service to your users, not just a way to fight hardened criminals, as it might sound. The problem arises because HTTP is a so-called "stateless" protocol. This means that the client sends data to the server, the server returns data to the client, and then each goes its own way. The server does not keep any data about the client that might be needed in subsequent operations. Likewise, there is no guarantee that the client will keep any data about the completed operation that could be used later. This places an immediate and significant limitation on the use of the World Wide Web.
CGI scripting with this protocol is analogous to not being able to remember a conversation. Whenever you talk to someone, no matter how often you've talked to them before, you have to introduce yourself and look for a common topic of conversation. Needless to say, this is not conducive to productivity. Figure 9-2 shows that whenever a request reaches the CGI program, it is a completely new instance of the program with no connection to the previous one.
On the client side, the arrival of Netscape Navigator brought a somewhat makeshift solution called cookies. It consists of a new HTTP header that can be sent back and forth between the client and the server, similar to the Content-Type and Location headers. When the client's browser receives the cookie header, it must store the data in the cookie, along with the name of the domain in which the cookie is valid. Thereafter, whenever a URL within that domain is visited, the cookie header must be returned to the server for use by CGI programs on that server.
The cookie method is mainly used to store the user ID. The visitor information can be saved to a file on the server machine. The unique ID of this user can be sent as a cookie to the user's browser, after which each time the user visits the site, the browser automatically sends this ID to the server. The server passes the ID to the CGI program, which opens the corresponding file and gains access to all user data. All this happens in a way that is invisible to the user.
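A user ID stored in a cookie might be read and written along these lines. This is a sketch under assumptions: the function names are ours, and the cookie name "id" used below is only an example. The header names are not given in this chapter; in practice the server sets a cookie with a Set-Cookie header and the browser returns it in a Cookie header, which a CGI program sees as the HTTP_COOKIE environment variable.

```c
#include <stdio.h>
#include <string.h>

/* Find the cookie called name in a Cookie header value such as
   "id=42; theme=dark" and copy its value into out.
   Returns 1 if found, 0 if it is missing or does not fit in out. */
int get_cookie(const char *cookies, const char *name,
               char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *p = cookies;

    while (p != NULL && *p != '\0') {
        const char *end;

        while (*p == ' ' || *p == ';')     /* skip separators */
            p++;
        end = strchr(p, ';');
        if (end == NULL)
            end = p + strlen(p);
        if ((size_t)(end - p) > name_len &&
            strncmp(p, name, name_len) == 0 && p[name_len] == '=') {
            size_t value_len = (size_t)(end - p) - name_len - 1;

            if (value_len + 1 > out_size)
                return 0;
            memcpy(out, p + name_len + 1, value_len);
            out[value_len] = '\0';
            return 1;
        }
        p = end;                           /* move on to the next cookie */
    }
    return 0;
}

/* Ask the browser to store a cookie. This must be printed together with
   the other headers, before the empty line that ends them. */
void send_cookie(FILE *out, const char *name, const char *value)
{
    fprintf(out, "Set-Cookie: %s=%s\n", name, value);
}
```

A program would typically call get_cookie(getenv("HTTP_COOKIE"), "id", buffer, sizeof buffer) and, if the ID is found, open the matching user file on the server.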
As useful as this method is, most large sites do not use it as their sole means of remembering state. There are several reasons for this. First, not all browsers support cookies. Until recently, Lynx, the main browser for visually impaired people (not to mention people with slow Internet connections), did not support cookies. It still does not "officially" support them, although some of its widely available offshoots do. Second, and more importantly, cookies tie a user to a specific machine. One of the great things about the Web is that it is accessible from anywhere in the world. Regardless of where your web page was created or is stored, it can be displayed on any machine connected to the Internet. However, if you try to access a cookie-enabled site from someone else's machine, all of the personal data maintained through the cookie will be unavailable.
Many sites still use cookies to personalize user pages, but most supplement them with a traditional login/password interface. If the site is accessed from a browser that does not support cookies, the page contains a form in which users enter the login name and password assigned to them when they first visited the site. Usually this form is small and unobtrusive, so as not to scare off the majority of users, who are not interested in personalization and simply want to move on. After the user enters the login name and password in the form, the CGI program finds the file with that user's data, just as if the name had been sent with a cookie. Using this method, a user can log in to a personalized website from anywhere in the world.
Beyond tracking user preferences and storing information about users over the long term, popular search engines provide a more subtle example of remembering state. When you search using services such as AltaVista or Yahoo, you usually get significantly more results than can be displayed in an easy-to-read format. This problem is solved by showing a small number of results, usually 10 or 20, and providing some form of navigation to view the next group of results. While this behavior may seem ordinary and expected to the average Web traveler, the actual implementation is nontrivial and requires remembering state.
When a user first queries a search engine, the engine gathers all the results, possibly up to some predefined limit. The trick is to present these results a few at a time while remembering which user requested them and which chunk that user expects next. Leaving aside the complexity of the search engine itself, we face the problem of presenting information to the user sequentially, one page at a time. Consider Example 9-4, which shows a CGI script that prints ten lines of a file and lets the user view the next or previous ten lines.
Example 9-4. Saving State in a CGI Script
#!/usr/bin/perl -w

use CGI;

open(F, "/usr/dict/words") or die("Can't open! $!");
# This is the file to be displayed; it can be anything.

$output = new CGI;

sub print_range {  # This is the main function of the program.
    my $start = shift;  # The starting line of the file.
    my $count = 0;      # Line counter.
    my $line = "";      # The current line of the file.
    $start = 0 unless defined($start) and $start =~ /^\d+$/;
    # Fall back to the first line if "start" is not a whole number.

    print $output->header,
        $output->start_html("My Dictionary");
    # Creates the HTML page with the title "My Dictionary".
    print "<pre>\n";
    while (($count < $start) and defined($line = <F>)) { $count++; }
    # Skip all lines before the starting line.
    while (($count < $start + 10) and defined($line = <F>)) {
        print $line; $count++;
    }
    # Print the next 10 lines.
    my $newnext = $start + 10; my $newprev = $start - 10;
    # Set the starting lines for the "Next" and "Previous" URLs.
    my $script = $output->script_name;  # URL path of this script.
    print "</pre><p>";
    unless ($start == 0) {  # Include the "Previous" URL unless you
                            # are already at the beginning.
        print qq%<a href="$script?start=$newprev">Previous</a> %;
    }
    unless (eof(F)) {  # Include the "Next" URL unless you are
                       # at the end of the file.
        print qq%<a href="$script?start=$newnext">Next</a>%;
    }
    print $output->end_html;  # Prints the closing "</body></html>" tags.
    exit(0);
}

# If no data is available, start from the beginning.
if (not $output->param) {
    &print_range(0);
}

# Otherwise, start from the line specified in the data.
&print_range($output->param("start"));
In this example, remembering the state is done using the simplest method. There is no problem with saving the data, since we keep it in a file on the server. We only need to know where to start the output, so the script simply includes the starting point for the next or previous group of lines in the URL — all that is needed to generate the next page.
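The same URL-based bookkeeping can be sketched in C. The names parse_start and format_page_links are hypothetical, and the script name read.cgi in the usage note below is only a placeholder. The page size of 10 and the start parameter follow Example 9-4.

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>

#define PAGE_SIZE 10  /* lines shown per page, as in Example 9-4 */

/* Convert the "start" form value to a line number. Anything that is
   not a plain non-negative number is treated as 0 (the first page). */
long parse_start(const char *text)
{
    char *end;
    long value;

    if (text == NULL || *text == '\0')
        return 0;
    value = strtol(text, &end, 10);
    if (*end != '\0' || value < 0 || value > LONG_MAX - PAGE_SIZE)
        return 0;
    return value;
}

/* Write the "Previous" and "Next" links for the page that begins at
   line start. more_lines is nonzero when the file continues past this
   page. Returns the length of the text, or -1 if out is too small. */
int format_page_links(const char *script, long start, int more_lines,
                      char *out, size_t out_size)
{
    int used = 0;
    int n;

    if (out_size == 0)
        return -1;
    out[0] = '\0';
    if (start > 0) {
        long previous = (start > PAGE_SIZE) ? start - PAGE_SIZE : 0;

        used = snprintf(out, out_size,
                        "<A HREF=\"%s?start=%ld\">Previous</a>", script, previous);
        if (used < 0 || (size_t)used >= out_size)
            return -1;
    }
    if (more_lines) {
        n = snprintf(out + used, out_size - (size_t)used,
                     "%s<A HREF=\"%s?start=%ld\">Next</a>",
                     (used > 0) ? " " : "", script, start + PAGE_SIZE);
        if (n < 0 || (size_t)n >= out_size - (size_t)used)
            return -1;
        used += n;
    }
    return used;
}
```

Calling format_page_links("read.cgi", 10, 1, buffer, sizeof buffer) produces a "Previous" link to start=0 and a "Next" link to start=20. All of the state lives in those two URLs, so the server itself remembers nothing between requests.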
However, if you need more than just paging through a file, relying on the URL can become cumbersome. This difficulty can be eased by using an HTML form and including the state information in <INPUT> tags of type HIDDEN. This technique has been used successfully on many sites, making it possible to link related CGI programs or to extend the use of a single CGI program, as in the previous example. Instead of referring to a specific object, such as a starting point, the URL data can point to an automatically generated user ID.
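A hidden field that carries state from one form to the next could be generated as in the following C sketch. The function names are hypothetical, and the escaping step is our own precaution: without it, a value containing a quote or an angle bracket would break the surrounding HTML.

```c
#include <stdio.h>
#include <string.h>

/* Copy src to dst, replacing the characters that are special in HTML.
   Returns 0 on success, or -1 if dst is too small. */
int html_escape(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;

    if (dst_size == 0)
        return -1;
    for (; *src != '\0'; src++) {
        char one[2];
        const char *rep;
        size_t len;

        switch (*src) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:
            one[0] = *src;
            one[1] = '\0';
            rep = one;
            break;
        }
        len = strlen(rep);
        if (used + len + 1 > dst_size)
            return -1;             /* no room for the text and terminator */
        memcpy(dst + used, rep, len);
        used += len;
    }
    dst[used] = '\0';
    return 0;
}

/* Format an <INPUT> tag of type HIDDEN holding one piece of state.
   Returns 0 on success, or -1 if something does not fit. */
int format_hidden_field(const char *name, const char *value,
                        char *out, size_t out_size)
{
    char safe_name[128], safe_value[512];
    int n;

    if (html_escape(name, safe_name, sizeof safe_name) != 0 ||
        html_escape(value, safe_value, sizeof safe_value) != 0)
        return -1;
    n = snprintf(out, out_size,
                 "<INPUT TYPE=\"HIDDEN\" NAME=\"%s\" VALUE=\"%s\">",
                 safe_name, safe_value);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Printed inside a form, the result of format_hidden_field("start", "20", ...) is sent back with the next submission exactly like a visible field, so the receiving CGI program reads it the same way as any other form value.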
This is how AltaVista and other search engines work. On the first search, a user ID is generated and hidden in subsequent URLs. This ID is associated with one or more files containing the query results. Two more things are included in the URL: the current position in the results file and the direction in which you want to navigate further in it. These three values are all that is needed to run the powerful navigation systems of large search engines.
However, something is still missing. The file used in our example, /usr/dict/words, is very large. What if we stop reading in the middle but want to come back to it later? Unless you remember the URL of the next page, there is no way to return; even AltaVista will not let you. If you restart your computer or switch to a different one, you cannot get back to your previous search results without re-entering your query. However, this long-term remembering of state is at the heart of the website personalization we discussed above, and it is worth looking at how you can take advantage of it. Example 9-5 is a modified version of Example 9-4.
Example 9-5. Long-Term Remembering of State
#!/usr/bin/perl -w

use CGI;

umask 0;
open(F, "/usr/dict/words") or die("Can't open! $!");
chdir("users") or die("Can't change to the directory! $!");
# This is the directory where all the data about the users
# will be stored.

$output = new CGI;

if (not $output->param) {
    print $output->header,
        $output->start_html("My Dictionary");
    print <<HTML;