Chapter 9.

CGI Programming

Including a chapter on CGI in a database book may seem as odd as including a chapter on car repair in a cookbook. Of course, you need a working car to get to the grocery store, but does that make it appropriate to discuss here? A full introduction to CGI and web programming in general is beyond the scope of this book, but a brief introduction to these topics is enough to extend the presentation capabilities of MySQL and mSQL into the realm of the Web.

This chapter is primarily intended for those who are learning about databases but would also like to acquire some knowledge of web programming. If your last name is Berners-Lee or Andreessen, you are unlikely to find anything here that you do not already know. But even if you are not new to CGI, having a quick reference at hand while diving into the secrets of MySQL and mSQL can be quite helpful.

What is CGI?

Like most acronyms, Common Gateway Interface (CGI) says little about what it actually is. An interface with what? Where is this gateway? Common to what? To answer these questions, let's step back and take a look at the WWW in general.

Tim Berners-Lee, a physicist at CERN, invented the Web in 1990, although the plan dates back to 1988. The idea was to let particle physics researchers share multimedia data (text, images, and sound) easily and quickly over the Internet. The WWW consisted of three main parts: HTML, URLs, and HTTP. HTML is the formatting language used to present content on the Web. A URL is the address used to retrieve HTML (or other) content from a web server. And finally, HTTP is the language that the web server understands and that lets clients request documents from the server.

The ability to send all types of information over the Internet was a revolution, but another possibility was soon discovered. If any text can be sent over the Web, why not send text generated by a program rather than taken from a ready-made file? This opens up a sea of possibilities. A simple example: a program that outputs the current time, so that the reader sees the correct time whenever the page is viewed. A few clever heads at the National Center for Supercomputing Applications (NCSA), who were building a web server, saw this opportunity, and CGI soon came along.

CGI is a set of rules by which programs on the server can send data to clients through the web server. The CGI specification was accompanied by changes to HTML and HTTP that introduced a new feature known as forms.

While CGI allows programs to send data to a client, forms extend this capability by allowing the client to send data to that CGI program. Now the user can not only see the current time, but also set the clock! CGI forms have opened the door to true interactivity in the world of the Web. Common CGI applications include:

  • Dynamic HTML. Entire sites can be generated by a single CGI program.
  • Search engines that find documents with user-specified words.
  • Guestbooks and message boards where users can add their posts.
  • Order forms.
  • Questionnaires.
  • Retrieving information from a database hosted on a server.

In subsequent chapters, we will discuss all of these CGI applications, as well as a few others. All of them lend themselves well to connecting CGI to a database, which is what interests us in this section.

HTML forms

Before exploring the specifics of CGI, it is helpful to look at the most common way end users interact with CGI programs: HTML forms. Forms are the part of the HTML language that presents various types of fields to the end user. The data entered in the fields can be sent to the web server. Fields can be used to enter text, or they can be buttons and boxes that the user clicks or checks. Here is an example HTML page containing a form:

<HTML><HEAD><TITLE>My Forms Page</title></head>
<BODY>
<p>This is a page with a form.
<!-- The ACTION value below is a placeholder for the URL of your CGI program. -->
<p><FORM ACTION="mycgi.cgi" METHOD=POST>
Enter your name: <INPUT NAME="firstname" SIZE=40><br>
<INPUT TYPE=SUBMIT>
</form>
</body></html>



This form creates a 40-character text field where users can enter their name. Below the input field is a button that, when pressed, sends the form data to the server. The following are the form-related tags supported by HTML 3.2, the most widely used standard today. Tag and attribute names can be entered in any case, but we adhere to the optional convention of writing opening tags in uppercase and closing tags in lowercase.

<FORM>

This tag marks the beginning of a form. A closing </form> tag is required at the end of the form. The <FORM> tag accepts three attributes: ACTION specifies the URL or relative path of the CGI program to which the data will be sent; METHOD specifies the HTTP method used to submit the form (this can be GET or POST, but we will almost always use POST); ENCTYPE specifies the method used to encode the data (it should be used only with a clear understanding of what you are doing).

<INPUT>

Provides the most flexible way of accepting user input. There are actually nine different types of <INPUT> tag. The type is specified by the TYPE attribute. The previous example uses two <INPUT> tags: one with the type SUBMIT and the other with the default type TEXT. The nine types are as follows:

TEXT

A field for the user to enter one line of text.

PASSWORD

Same as TEXT, but the text you enter is not displayed on the screen.

CHECKBOX

A check box that the user can select and clear.

RADIO

A radio button that must be combined with at least one more radio button. The user can select only one of them.

SUBMIT

The button that, when clicked, submits the form to the web server.

RESET

The button that restores the form's default values when clicked.

FILE

Similar to a text box, but assumes you enter the name of a file that will be sent to the server.

HIDDEN

An invisible field in which data can be stored.

IMAGE

Similar to the SUBMIT button, but lets you specify an image to display on the button.

In addition to the TYPE attribute, <INPUT> tags usually have a NAME attribute that associates the data entered in the field with a name. The name and data are passed to the server in the form name=value. In the previous example, the text field was named firstname. You can use the VALUE attribute to assign predefined values to fields of type TEXT, PASSWORD, FILE, and HIDDEN. The same attribute, used with buttons of type SUBMIT or RESET, sets the text displayed on them. Fields of type RADIO and CHECKBOX can be displayed as already selected by using the CHECKED attribute, which takes no value.

The SIZE attribute is used to set the length of fields of type TEXT, PASSWORD, and FILE. The MAXLENGTH attribute can be used to limit the length of the entered text. The SRC attribute specifies the URL of the image used in the IMAGE type. Finally, the ALIGN attribute specifies how the image is aligned for the IMAGE type and can be TOP, MIDDLE, BOTTOM (the default), LEFT, or RIGHT.
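As an illustration of the name=value style described above, the following sketch (in Python rather than the Perl used elsewhere in this chapter) encodes form fields roughly the way a browser does and decodes them again. The helper names encode_form and decode_form are made up for this example; only the standard urllib.parse module is assumed.

```python
from urllib.parse import parse_qsl, urlencode


def encode_form(fields):
    """Encode (name, value) pairs as a browser encodes a submitted form."""
    return urlencode(fields)


def decode_form(data):
    """Split an encoded form string back into (name, value) pairs."""
    return parse_qsl(data, keep_blank_values=True)


# The text field from the example form, filled in by a user.
encoded = encode_form([("firstname", "Tim Berners-Lee")])
print(encoded)
print(decode_form(encoded))
```

Note that the space in the value is transmitted as "+", and that fields are separated by "&" when there is more than one.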

<TEXTAREA>

The <TEXTAREA> tag requires a closing tag, and any text between the <TEXTAREA> and </textarea> tags is used as the default text, much like the VALUE attribute of the <INPUT> tag.

<!-- A TEXTAREA giving the user space to enter an essay. The data is named "essay". The text block is 70 characters wide and 10 lines deep. The space between the opening and closing tags can be used for a sample essay. -->
<TEXTAREA NAME="essay" COLS=70 ROWS=10></textarea>

<!-- Two INPUT tags of types "SUBMIT" and "RESET" respectively. The "SUBMIT" button has a redefined label, "Enter data", and the "RESET" button has the default label (defined by the browser). Clicking the "SUBMIT" button sends the data to the web server; the "RESET" button restores the form to its original state, deleting all the data entered by the user. -->
<INPUT TYPE=SUBMIT VALUE="Enter data"> <INPUT TYPE=RESET>


The only input type we have not used here is the IMAGE type of the <INPUT> tag. It could be used as an alternative way to submit the form. However, the IMAGE type rarely works well with text-based and other less capable browsers, so it is prudent to avoid it unless your site has a heavily graphical style.

Now that you are familiar with the basics of HTML forms, you can start learning about CGI itself.

CGI specification

So what exactly is the "set of rules" that allows a CGI program in, say, Batavia, Illinois, to communicate with a web browser in Outer Mongolia? The official CGI specification, along with a wealth of other CGI information, can be found on the NCSA server at http://hoohoo.ncsa.uiuc.edu/cgi/. However, this chapter exists so that you do not have to go that far to look for it yourself.

There are four ways in which CGI passes data between the CGI program and the web server, and therefore the web client:

  • Environment variables.
  • Command line.
  • Standard input.
  • Standard output.

With these four methods, the server forwards all the data sent by the client to the CGI program. The CGI program then does its magic and sends the output back to the server, which forwards it to the client.

The information in this section is given with the Apache HTTP server in mind. Apache is the most widely used web server and runs on almost any platform, including Windows 9x and Windows NT. However, the information applies to all HTTP servers that support CGI. Some proprietary servers, such as those from Microsoft and Netscape, may have additional functionality or work slightly differently. As the face of the Web continues to change at an incredible rate, standards are still evolving, and there will no doubt be changes in the future. CGI itself, however, appears to be a well-established technology; the price it pays for this is that other technologies, such as applets, have pushed it into the background. Any CGI programs you write using this information will almost certainly be able to run for years to come on most web servers.

When a CGI program is invoked through a form, the most common interface, the browser sends the server a long string beginning with the path to the CGI program and its name. This is followed by various other data, called path information, which is passed to the CGI program through the PATH_INFO environment variable (Figure 9-1). The path information is followed by a "?", which in turn is followed by the form data sent to the server using the HTTP GET method. This data is made available to the CGI program through the QUERY_STRING environment variable. Any data that the page sends using the HTTP POST method, which is the most commonly used method, is passed to the CGI program through standard input. A typical string that a server might receive from a browser is shown in Figure 9-1. A program named formread in the cgi-bin directory is called by the server with the additional path information extra/information and the query data choice=help, presumably as part of the original URL. Finally, the form data itself (the text "CGI programming" in the "keywords" field) is sent via the HTTP POST method.
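The way such a request is divided up can be sketched in a few lines. The following is an illustration in Python, not the server's actual algorithm: the function name split_cgi_request is invented here, the host name is a placeholder, and the server is assumed to already know which part of the path names the CGI program.

```python
from urllib.parse import urlsplit


def split_cgi_request(url, script_path):
    """Return (PATH_INFO, QUERY_STRING) for a request to the program at script_path."""
    parts = urlsplit(url)
    if not parts.path.startswith(script_path):
        raise ValueError("URL does not refer to " + script_path)
    path_info = parts.path[len(script_path):]
    # Reject look-alike names such as /cgi-bin/formreader.
    if path_info and not path_info.startswith("/"):
        raise ValueError("URL does not refer to " + script_path)
    return path_info, parts.query


# Placeholder host name; the path and query follow the description of Figure 9-1.
url = "http://www.myserver.com/cgi-bin/formread/extra/information?choice=help"
print(split_cgi_request(url, "/cgi-bin/formread"))
```

Data sent with POST does not appear in this string at all; it arrives separately on standard input.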

Environment variables

When the server executes a CGI program, it first passes it some data to work with in the form of environment variables. Seventeen variables are officially defined in the specification, but many more are used unofficially through the HTTP_ mechanism described below. A CGI program has access to these variables in the same way as to any shell environment variables when run from the command line. In a shell script, for example, the environment variable FOO can be accessed as $FOO; in Perl, as $ENV{'FOO'}; in C, as getenv("FOO"); and so on. Table 9-1 lists the variables that are always set by the server, even if only to an empty value. In addition to these variables, the data sent by the client in the request headers is assigned to variables of the form HTTP_FOO, where FOO is the header name. For example, most web browsers include version information in a header named User-Agent. Your CGI program can get this data from the HTTP_USER_AGENT variable.
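The naming rule for header variables can be sketched as follows, in Python for illustration only. The function names are invented for this example, and the rule shown (uppercase the header name, turn hyphens into underscores, prefix HTTP_) is the usual convention rather than something this chapter spells out.

```python
import os


def header_to_env_name(header):
    """Map a request header name to the environment variable a CGI program sees."""
    return "HTTP_" + header.upper().replace("-", "_")


def request_headers(environ=os.environ):
    """Collect all HTTP_* variables, keyed by the part of the name after HTTP_."""
    return {
        name[len("HTTP_"):]: value
        for name, value in environ.items()
        if name.startswith("HTTP_")
    }


print(header_to_env_name("User-Agent"))
print(request_headers({"HTTP_USER_AGENT": "Lynx", "PATH": "/bin"}))
```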

Table 9-1. CGI environment variables

Environment variable

Description

CONTENT_LENGTH

Length of data sent by POST or PUT methods, in bytes.

CONTENT_TYPE

The MIME type of data attached using the POST or PUT methods.

GATEWAY_INTERFACE

The version number of the CGI specification supported by the server.

PATH_INFO

Additional path information supplied by the client. For example, for the request http://www.myserver.com/test.cgi/this/is/a/path?field=green, the value of the PATH_INFO variable will be /this/is/a/path.

PATH_TRANSLATED

Same as PATH_INFO, but the server performs every possible translation, for example expanding names such as "~account".

QUERY_STRING

All data following the "?" in the URL. This is also the data sent when the form's REQUEST_METHOD is GET.

REMOTE_ADDR

The IP address of the client making the request.

REMOTE_HOST

The hostname of the client machine, if available.

REMOTE_IDENT

If the web server and client support identd authentication, this is the username of the account that is making the request.

REQUEST_METHOD

The method used by the client for the request. For the CGI programs we are about to build, this will usually be POST or GET.

SERVER_NAME

The hostname (or the IP address, if no name is available) of the machine on which the web server is running.

SERVER_PORT

The port number used by the web server.

SERVER_PROTOCOL

The protocol used by the client to communicate with the server. In our case, this protocol is almost always HTTP.

SERVER_SOFTWARE

Information about the version of the web server running the CGI program.

SCRIPT_NAME

The path to the script being executed, as specified by the client. It can be used when a script needs to refer to its own URL, and it lets a script that is referenced from different places behave differently depending on where it was called from.

Here's an example Perl CGI script that prints out all the environment variables set by the server, as well as all inherited variables, such as PATH, set by the shell that started the server.

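#!/usr/bin/perl -w

print <<HTML;
Content-type: text/html

<HTML><HEAD><TITLE>Environment variables</title></head>
<BODY><p>Environment variables<br>
HTML

# Print every variable and its value, one per line.
foreach (keys %ENV) { print "$_: $ENV{$_}<br>\n"; }

print <<HTML;
</body></html>
HTML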

All of these variables can be used and even modified by your CGI program. However, these changes do not affect the web server that launched the program.
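The fact that such changes stay inside the program that makes them can be checked with a small experiment. The sketch below, in Python for illustration, plays the role of the server by starting a child process; the variable name DEMO_VAR and the function name are invented for this example.

```python
import os
import subprocess
import sys

# Code run by the child: overwrite the variable and report what the child sees.
CHILD_CODE = (
    "import os\n"
    "os.environ['DEMO_VAR'] = 'changed by child'\n"
    "print(os.environ['DEMO_VAR'])\n"
)


def run_child_and_compare():
    """Return (value seen inside the child, value the parent still has afterwards)."""
    os.environ["DEMO_VAR"] = "set by parent"
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip(), os.environ["DEMO_VAR"]


print(run_child_and_compare())
```

The child receives a copy of the environment, so the parent's value is unchanged after the child exits.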

Command line

CGI allows arguments to be passed to the CGI program as command-line parameters. This feature is rarely used because its practical applications are few, and we will not dwell on it. The gist is that if the QUERY_STRING environment variable does not contain an "=" character, the CGI program is executed with command-line parameters taken from QUERY_STRING. For example, http://www.myserver.com/cgi-bin/finger?root runs finger root on www.myserver.com.
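A rough sketch of that rule, in Python for illustration: the function name query_to_argv is invented, and the detail that words are separated by "+" and then URL-decoded is an assumption based on the common CGI convention, not something stated in this chapter.

```python
from urllib.parse import unquote


def query_to_argv(query_string):
    """Return the command-line arguments a server would derive from QUERY_STRING."""
    # A query containing "=" is treated as form data, not as arguments.
    if not query_string or "=" in query_string:
        return []
    return [unquote(word) for word in query_string.split("+")]


print(query_to_argv("root"))
print(query_to_argv("choice=help"))
```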

There are two main libraries that provide a CGI interface for Perl. The first is cgi-lib.pl. The cgi-lib.pl library is very common because for a long time it was the only large library available. It is designed to work with Perl 4, but it also works with Perl 5. The second library, CGI.pm, is newer and in many ways superior to cgi-lib.pl. CGI.pm is written for Perl 5 and uses a fully object-oriented framework for working with CGI data. The CGI.pm module parses standard input and the QUERY_STRING variable and stores the data in a CGI object. Your program only needs to create a new CGI object and use simple methods such as param() to retrieve the data you want. Example 9-2 is a short demonstration of how CGI.pm interprets the data. All Perl examples in this chapter use CGI.pm.

Example 9-2. Parsing CGI Data in Perl

#!/usr/bin/perl -w

use CGI qw(:standard);
# The CGI.pm module is used. qw(:standard) imports the namespace of
# standard CGI functions to give clearer code. This can be done if the
# script uses only one CGI object.

$mycgi = new CGI;         # Create a CGI object that will be the "gateway" to the form data
@fields = $mycgi->param;  # Extract the names of all filled-in form fields

print header, start_html("CGI.pm test");
# The "header" and "start_html" methods, provided by CGI.pm, make it easy
# to generate HTML. "header" outputs the required HTTP header, and
# "start_html" outputs the HTML header with the given title, as well as
# the <BODY> tag.

print "<p>Form data:<br>";

foreach (@fields) { print $_, ": ", $mycgi->param($_), "<br>"; }
# For each field, print the name and the value obtained using
# $mycgi->param("fieldname").

print end_html; # Shorthand for outputting the closing "</body></html>" tags.
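What a library such as CGI.pm does behind the scenes can be approximated in a short sketch. The version below is written in Python purely for illustration and is not how CGI.pm is implemented; the function name read_form is invented, and only simple URL-encoded form data (not file uploads) is assumed.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_form(environ=None, stdin=None):
    """Collect form fields from QUERY_STRING and, for POST requests, from standard input."""
    environ = os.environ if environ is None else environ
    stdin = sys.stdin.buffer if stdin is None else stdin
    fields = parse_qs(environ.get("QUERY_STRING", ""), keep_blank_values=True)
    if environ.get("REQUEST_METHOD") == "POST":
        # Read exactly CONTENT_LENGTH bytes; a CGI program should not rely on end-of-file.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        body = stdin.read(length).decode("ascii", errors="replace")
        for name, values in parse_qs(body, keep_blank_values=True).items():
            fields.setdefault(name, []).extend(values)
    return fields


# Simulate a request instead of reading a real one from the server.
body = b"keywords=CGI+programming"
environ = {
    "REQUEST_METHOD": "POST",
    "QUERY_STRING": "choice=help",
    "CONTENT_LENGTH": str(len(body)),
}
print(read_form(environ, io.BytesIO(body)))
```

Each field maps to a list of values, because a form may legitimately send the same name more than once (several check boxes, for example).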

Processing input data in C

Since the core APIs for MySQL and mSQL are written in C, we will not abandon C entirely in favor of Perl; where appropriate, we will provide some examples in C. There are three widely used C libraries for CGI programming: cgic by Tom Boutell, cgihtml by Eugene Kim, and libcgi from EIT. We believe that cgic is the most complete and the easiest to use. It lacks, however, the ability to list all the form variables when you do not know them beforehand. In fact, this can be added with a simple patch, but that is beyond the scope of this chapter. Therefore, in Example 9-3 we use the cgihtml library to repeat the above Perl script in C.

Example 9-3. Parsing CGI Data in C

/* cgihtmltest.c - Typical CGI program for displaying keys and their values
   from the data received from a form */

#include <stdio.h>
#include "cgi-lib.h"  /* This contains all the CGI function definitions */
#include "html-lib.h" /* This contains all the HTML helper function definitions */

void print_all(llist l)
/* This function outputs the data supplied by the form in the same format as
   the Perl script above. cgihtml also provides a built-in function,
   print_entries(), which does the same using the HTML list format. */
{
    node *window;
    /* The "node" type is defined in the cgihtml library and refers to a
       linked list that stores all the form data. */

    window = l.head; /* Set a pointer to the beginning of the form data */

    while (window != NULL) { /* Step through the linked list until past the last element */
        printf("%s: %s<br>\n", window->entry.name, replace_ltgt(window->entry.value));
        /* Print the data. replace_ltgt() is a function that understands the
           HTML encoding of text and ensures that it is displayed correctly
           in the client's browser. */
        window = window->next; /* Move to the next item in the list. */
    }
}

int main()
{
    llist entries; /* The parsed form data */
    int status;    /* An integer representing the status */

    html_header(); /* HTML helper function that outputs the HTML header */
    html_begin("cgihtml test");
    /* HTML helper function that outputs the start of the HTML page with the specified title. */
    status = read_cgi_input(&entries); /* Reads and parses the form data */
    printf("<p>Form data:<br>");
    print_all(entries);    /* Calls the print_all() function defined above. */
    html_end();            /* HTML helper function that prints the end of the HTML page. */
    list_clear(&entries);  /* Releases the memory used by the form data. */
    return 0;
}
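The reason for a function like replace_ltgt() is that form values may contain characters such as "<" and "&" that a browser would otherwise interpret as markup. A minimal sketch of the same idea, in Python and using the standard html.escape function rather than anything from cgihtml (the helper name form_line is invented):

```python
from html import escape


def form_line(name, value):
    """Format one form field as an HTML line, escaping characters special to HTML."""
    return "%s: %s<br>" % (escape(name), escape(value))


print(form_line("essay", "if a < b then <b>bold</b> & done"))
```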

Standard output

The data that the CGI program sends to standard output is read by the web server and sent on to the client. If the script name starts with nph-, the data is sent directly to the client without any intervention from the web server. In that case, the CGI program must itself generate a correct HTTP header that the client can understand. Otherwise, let the web server generate the HTTP header for you.

Even if you are not using an nph- script, the server must be given one directive that tells it something about your output. This is usually the Content-Type HTTP header, but it can also be a Location header. The header must be followed by an empty line, that is, a line feed or a CR/LF combination.

The Content-Type header tells the server what type of data your CGI program is returning. If it is an HTML page, the line must be Content-Type: text/html. The Location header gives the server a different URL, or a different path on the same server, to which the client should be directed. The header should look like this: Location: http://www.myserver.com/another/place/.

After the HTTP headers and a blank line, you can send the actual data your program produces — an HTML page, image, text, or whatever. Among the CGI programs that come with Apache are nph-test-cgi and test-cgi, which demonstrate well the difference between nph and non-nph headers, respectively.
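The three kinds of output described in this section can be sketched as plain strings. This is an illustration in Python with invented function names; the status line used for the nph- case is an assumption about what a minimal successful response looks like.

```python
def cgi_response(body, content_type="text/html"):
    """Output of an ordinary CGI program: one header, a blank line, then the data."""
    return "Content-Type: %s\r\n\r\n%s" % (content_type, body)


def cgi_redirect(url):
    """Output that asks the server to point the client at another URL."""
    return "Location: %s\r\n\r\n" % url


def nph_response(body, content_type="text/html"):
    """Output of an nph- script, which has to supply the HTTP status line itself."""
    return "HTTP/1.0 200 OK\r\n" + cgi_response(body, content_type)


print(cgi_response("<p>Hello</p>"))
```

In every case the blank line is what separates the headers from the data; leaving it out is a common cause of server errors.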

In this section, we will use the libraries CGI.pm and cgic, which have functions to output both HTTP and HTML headers. This will allow you to focus on displaying the actual content. These helper functions are used in the examples earlier in this chapter.

Important Features of CGI Scripting

You already know the basics of how CGI works. The client submits data, usually via a form, to the web server. The server runs the CGI program, passing the data to it. The CGI program does its processing and returns its output to the server, which forwards it to the client. Now you need to move on from understanding how CGI programs work to understanding why they are so widely used.

While you already know enough from this chapter to put together a simple, working CGI program, there are a few more important issues to cover before you can write a truly functional MySQL or mSQL program. First, you need to learn how to work with multiple forms. Next, you need to learn some security measures that will prevent attackers from gaining illegal access to your server's files or destroying them.

Remembering state

Remembering state is a vital means of providing good service to your users, and not only a way to fight hardened criminals, as it might seem. The problem arises because HTTP is a so-called "stateless" protocol. This means that the client sends data to the server, the server returns data to the client, and then each goes its own way. The server does not keep any data about the client that might be needed in subsequent operations. Likewise, there is no certainty that the client will keep any data about the completed operation that could be used later. This imposes an immediate and significant restriction on the use of the World Wide Web.

CGI scripting with this protocol is analogous to not being able to remember a conversation. Whenever you talk to someone, no matter how often you've talked to them before, you have to introduce yourself and look for a common topic of conversation. Needless to say, this is not conducive to productivity. Figure 9-2 shows that whenever a request reaches the CGI program, it is a completely new instance of the program with no connection to the previous one.

On the client side, the arrival of Netscape Navigator brought a somewhat hasty-looking solution called cookies. It consists of a new HTTP header that can be sent back and forth between the client and the server, similar to the Content-Type and Location headers. On receiving the cookie header, the client's browser must store the data in the cookie, along with the name of the domain in which the cookie is valid. Thereafter, whenever a URL within that domain is visited, the cookie header must be returned to the server for use by CGI programs on that server.

The cookie method is mainly used to store the user ID. The visitor information can be saved to a file on the server machine. The unique ID of this user can be sent as a cookie to the user's browser, after which each time the user visits the site, the browser automatically sends this ID to the server. The server passes the ID to the CGI program, which opens the corresponding file and gains access to all user data. All this happens in a way that is invisible to the user.
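The exchange of a user ID by cookie can be sketched with Python's standard http.cookies module. This is only an illustration: the cookie name userid, the domain, and the helper names are invented for the example, and a real CGI program would read the returned cookie from the HTTP_COOKIE environment variable.

```python
from http.cookies import SimpleCookie


def set_cookie_header(name, value, domain):
    """Build the header line a CGI program prints to hand a cookie to the browser."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["domain"] = domain
    return cookie.output()


def read_cookie(http_cookie, name):
    """Extract one cookie value from the string the browser sends back, or None."""
    cookie = SimpleCookie()
    cookie.load(http_cookie)
    return cookie[name].value if name in cookie else None


print(set_cookie_header("userid", "12345", "www.myserver.com"))
print(read_cookie("userid=12345; theme=dark", "userid"))
```

The ID on its own carries no personal data; it is only the key under which the server looks up the user's file.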

As useful as this method is, most large sites do not use it as their sole means of remembering state. There are a number of reasons for this. First, not all browsers support cookies. Until recently, Lynx, the main browser for visually impaired people (not to mention people with slow Internet connections), did not support cookies. It still does not "officially" support them, although some of its widely available offshoots do. Second, and more importantly, cookies tie a user to a specific machine. One of the great things about the Web is that it is accessible from anywhere in the world. Regardless of where your web page was created or is stored, it can be displayed on any machine connected to the Internet. However, if you try to access a cookie-dependent site from someone else's machine, all of the personal data maintained through the cookie will be unavailable.

Many sites still use cookies to personalize user pages, but most complement them with a traditional login / password interface. If the site is accessed from a browser that does not support cookies, then the page contains a form in which the user enters the registration name and password assigned to him when he first visited the site. Usually this form is small and modest, so as not to scare off most users who are not interested in any personalization, but simply want to go further. After the user enters the login and password in the form, the CGI finds a file with data about this user, as if the name was sent with a cookie. Using this method, a user can register with a personalized website from anywhere in the world.

In addition to the tasks of taking into account user preferences and long-term storage of information about him, one can give a more subtle example of storing state, which is given by popular search engines. When you search using services such as AltaVista or Yahoo, you usually get significantly more results than can be displayed in an easy-to-read format. This problem is solved by showing a small number of results — usually 10 or 20 — and giving some sort of navigation to view the next group of results. While this behavior may seem common and expected to the average Web traveler, the actual implementation is non-trivial and requires statefulness.

When a user first queries a search engine, the search engine collects all the results, possibly up to some predefined limit. The trick is to deliver these results a few at a time, while remembering which user requested them and which batch that user expects next. Leaving aside the complexity of the search engine itself, we are faced with the problem of presenting information to the user one page at a time. Consider Example 9-4, which shows a CGI script that prints ten lines of a file and lets the user view the next or previous ten lines.

Example 9-4. Saving State in a CGI Script

#!/usr/bin/perl -w

use CGI;

open(F, "/usr/dict/words") or die("I can't open! $!");
# This is the file to be output; it can be anything.

$output = new CGI;

sub print_range { # This is the main function of the program.
    my $start = shift; # The starting line of the file.
    my $count = 0;     # The line counter.
    my $line = "";     # The current line of the file.

    # Accept only a line number; anything else starts from the beginning.
    $start = 0 unless defined $start and $start =~ /^\d+$/;

    print $output->header,
          $output->start_html("My Dictionary");
    # Creates HTML with the title "My Dictionary".
    print "<pre>\n";

    while (($count < $start) and defined($line = <F>)) { $count++; }
    # Skip all lines before the starting line.
    while (($count < $start+10) and defined($line = <F>)) { print $line; $count++; }
    # Print the next 10 lines.

    my $newnext = $start+10; my $newprev = $start-10;
    # Set the starting lines for the "Next" and "Previous" URLs.

    print "</pre><p>";

    unless ($start == 0) { # Include the "Previous" URL unless you
                           # are already at the beginning.
        print qq%<a href="$ENV{SCRIPT_NAME}?start=$newprev">Previous</a>%;
    }
    unless (eof(F)) { # Include the "Next" URL unless you
                      # are at the end of the file.
        print qq% <a href="$ENV{SCRIPT_NAME}?start=$newnext">Next</a>%;
    }
    print <<HTML;
</body></html>
HTML
    exit(0);
}

# If no data is available, start from the beginning.
if (not $output->param) {
    &print_range(0);
}

# Otherwise start from the line specified in the data.
&print_range($output->param("start"));

In this example, remembering the state is done using the simplest method. There is no problem with saving the data, since we keep it in a file on the server. We only need to know where to start the output, so the script simply includes the starting point for the next or previous group of lines in the URL — all that is needed to generate the next page.
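The core of this technique is nothing more than computing the two URLs. A small sketch in Python, for illustration only: the function name page_links and the script path are invented, and unlike the Perl example it is given the total number of lines instead of testing for end-of-file.

```python
from urllib.parse import urlencode

PAGE_SIZE = 10


def page_links(script_name, start, total_lines):
    """Return (label, URL) pairs for the Previous and Next links of one page."""
    links = []
    if start > 0:
        previous = max(start - PAGE_SIZE, 0)
        links.append(("Previous", script_name + "?" + urlencode({"start": previous})))
    if start + PAGE_SIZE < total_lines:
        links.append(("Next", script_name + "?" + urlencode({"start": start + PAGE_SIZE})))
    return links


# Hypothetical script path and a 25-line file, viewed from line 10.
print(page_links("/cgi-bin/read.cgi", 10, 25))
```

All the state the next request needs travels in the URL itself, so the server has nothing to remember between requests.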

However, if you need more than just paging through a file, relying on the URL can become cumbersome. This difficulty can be eased by using an HTML form and including the state information in <INPUT> tags of type HIDDEN. This technique has been used successfully on many sites, allowing links to be made between related CGI programs or extending the use of a single CGI program, as in the previous example. Instead of referring to a specific item such as a starting point, the URL data can point to an automatically generated user ID.

This is how AltaVista and other search engines work. On the first search, a user ID is generated and hidden in subsequent URLs. This ID is associated with one or more files containing the query results. Two more values are included in the URL: the current position in the results file and the direction in which you want to move through it next. These three values are all that is needed to drive the powerful navigation systems of large search engines.
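Carrying state in HIDDEN fields, as mentioned above, amounts to writing each value back into the next form. A minimal sketch in Python, with an invented function name and invented field names:

```python
from html import escape


def hidden_fields(state):
    """Render a dictionary of state values as HIDDEN input tags, one per line."""
    return "\n".join(
        '<INPUT TYPE=HIDDEN NAME="%s" VALUE="%s">'
        % (escape(str(name), quote=True), escape(str(value), quote=True))
        for name, value in state.items()
    )


# Hypothetical state: a generated user ID, a position, and a direction.
print(hidden_fields({"id": "a1b2c3", "position": 20, "direction": "next"}))
```

Because the values pass through the user's browser, the receiving CGI program should treat them like any other form input and check them before use.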

However, something is still missing. The file used in our example, /usr/dict/words, is very large. What if we stop reading partway through but want to come back to it later? Unless you remember the URL of the next page, there is no way to return; not even AltaVista allows it. If you restart your computer or start working from a different one, you cannot return to your previous search results without re-entering your query. However, this kind of long-term state is at the heart of the website personalization we discussed above, and it is worth looking at how you can take advantage of it. Example 9-5 is a modified version of Example 9-4.

Example 9-5. Persistent State

#!/usr/bin/perl -w

use CGI;

umask 0;

open(F, "/usr/dict/words") or die("I can't open! $!");

chdir("users") or die("I can't change to the directory! $!");
# This is the directory where all the data about the users will be stored.

$output = new CGI;

if (not $output->param) {
    print $output->header,
          $output->start_html("My Dictionary");
    print <<HTML;
<FORM ACTION="$ENV{SCRIPT_NAME}" METHOD=POST>
<p>Enter your username:
<INPUT NAME="username">
<INPUT TYPE=SUBMIT>
</form></body></html>
HTML
    exit(0);
}

$user = $output->param("username");
# Accept only simple names, so that the value is safe to use as a file name.
die("Invalid username") unless defined $user and $user =~ /^\w+$/;

## If there is no user file, create it and set
## the starting value to "0".
if (not -e "$user") {
    open(U, ">$user") or die("I can't open! $!");
    print U "0\n";
    close U;
    &print_range("0");
## If the user exists and no starting value is specified
## in the URL, read the last value and start from there.
} elsif (not defined($output->param("start"))) {
    open(U, "$user") or die("Can't open user! $!");
    $start = <U>; close U;
    chomp $start;
    &print_range($start);
## If the user exists and a starting value is specified
## in the URL, write the starting value to the user
## file and start outputting.
} else {
    $start = $output->param("start");
    $start = 0 unless $start =~ /^\d+$/; # Accept only a line number.
    open(U, ">$user") or die("I can't open the user for writing! $!");
    print U "$start\n";
    close U;
    &print_range($start);
}

sub print_range {
    my $start = shift;
    my $count = 0;
    my $line = "";

    print $output->header,
          $output->start_html("My Dictionary");
    print "<pre>\n";

    while (($count < $start) and defined($line = <F>)) { $count++; }
    while (($count < $start+10) and defined($line = <F>)) {
        print $line; $count++;
    }

    my $newnext = $start+10;
    my $newprev = $start-10;

    print "</pre><p>";

    unless ($start == 0) {
        print qq%<a href="$ENV{SCRIPT_NAME}?start=$newprev&username=$user">Previous</a>%;
    }
    unless (eof(F)) {
        print qq% <a href="$ENV{SCRIPT_NAME}?start=$newnext&username=$user">Next</a>%;
        # Note that the username is appended to the URL.
        # Otherwise the CGI will forget which user it was dealing with.
    }
    print $output->end_html;
    exit(0);
}
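The storage side of this example, one small file per user holding the last position, can be sketched on its own. The version below is in Python for illustration; the function names and the rule for acceptable usernames are assumptions made for this sketch. Checking the name matters because it is used directly as a file name.

```python
import os
import re
import tempfile

# Assumed rule for this sketch: 1 to 32 letters, digits, or underscores.
USERNAME_RE = re.compile(r"[A-Za-z0-9_]{1,32}")


def state_path(directory, username):
    """Return the path of the user's state file, refusing unsafe names."""
    if not USERNAME_RE.fullmatch(username):
        raise ValueError("invalid username")
    return os.path.join(directory, username)


def load_start(directory, username):
    """Return the saved starting line, or 0 for a user seen for the first time."""
    try:
        with open(state_path(directory, username)) as handle:
            return int(handle.readline().strip() or 0)
    except FileNotFoundError:
        return 0


def save_start(directory, username, start):
    """Record the user's current starting line."""
    with open(state_path(directory, username), "w") as handle:
        handle.write("%d\n" % start)


with tempfile.TemporaryDirectory() as users:
    save_start(users, "fred", 30)
    print(load_start(users, "fred"), load_start(users, "wilma"))
```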

Security measures

When running servers on the Internet, whether HTTP servers or others, security is a major concern. The exchange of data between client and server that takes place through CGI raises a number of important security issues. The CGI protocol itself is fairly secure. The CGI program receives data from the server through standard input or environment variables, both of which are safe. But once the CGI program takes control of the data, there are no limits on what it can do. A poorly written CGI program could allow an attacker to gain access to the server system. Consider the following example CGI program:

#!/usr/bin/perl -w
use CGI;
my $output = new CGI;
my $username = $output->param("username");
# INSECURE: the backticks hand $username to a shell unchecked.
print $output->header, $output->start_html("Finger Output"),
    "<pre>", `finger $username`, "</pre>", $output->end_html;

This program provides a working CGI interface to the finger command. If you run it simply as finger.cgi, it lists all current users on the server. If you run it as finger.cgi?username=fred, it prints information about the user "fred" on the server. You can even pass a username of the form user@host to display information about a remote user. However, if the username is fred; followed by a mail command, unwanted things may happen. The backtick operator (``) in Perl spawns a shell process, executes a command in it, and returns the output. In this program, `finger $username` is used as an easy way to run finger and capture its output. However, most shells allow several commands to be chained on a single line. Any shell like the Bourne shell, for example, does this with the ";" character. That is why a command line consisting of finger fred; followed by a mail command runs finger first and then mail, which can send the entire server password file to an unwanted user.

One solution is to parse the form data and look for malicious content. You can, say, search for the ";" character and remove everything that follows it. Such an attack can also be ruled out entirely by other means. The CGI program above can be rewritten like this:

#!/usr/local/bin/perl -w
use CGI;
my $output = new CGI;
my $username = $output->param("username");
$|++;
# Disable buffering so that all data is sent to the client.
print $output->header, $output->start_html("Finger Output"), "<pre>\n";
my $pid = open(C_OUT, "-|");  # This Perl idiom spawns a child process and
                              # opens a pipe between parent and child.
if ($pid) {  # This is the parent process.
    print <C_OUT>;  # Print the output of the child process.
    print "</pre>", $output->end_html;
    exit(0);  # End the program.
} elsif (defined $pid) {  # This is the child process.
    $|++;  # Disable buffering.
    exec("/usr/bin/finger", $username) or die("exec() call failed.");
    # Runs the finger program with $username as its only
    # command-line argument. No shell is involved.
} else { die("fork() failed"); }
# Error checking.

As you can see, this program is not much more complicated. But if you now run it with a username of fred; followed by a mail command, the finger program is executed with that entire string as a single username, and no shell ever gets the chance to interpret the ";".
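The other defence mentioned earlier, screening form data before using it, can be sketched in C, the second language this text uses for its listings. This is only an illustration under stated assumptions: the function name is_safe_username, the 32-character limit, and the set of permitted characters are all hypothetical choices, not part of the original program.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if name contains only characters we have
 * explicitly chosen to allow, 0 otherwise. Listing what is allowed is
 * safer than trying to list every dangerous character. */
int is_safe_username(const char *name)
{
    size_t len, i;

    if (name == NULL)
        return 0;
    len = strlen(name);
    if (len == 0 || len > 32)      /* assumed length limit */
        return 0;
    if (name[0] == '-')            /* would look like a command option */
        return 0;
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)name[i];
        if (!isalnum(c) && c != '_' && c != '-' && c != '.' && c != '@')
            return 0;              /* rejects ';', '|', '<', spaces, ... */
    }
    return 1;
}
```

A CGI program would call such a check on the username before doing anything else with it and return an error page when the check fails. It complements, rather than replaces, running the command without a shell.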

As an added security measure, the rewritten Perl script runs finger explicitly as /usr/bin/finger. In the unlikely event that the web server gives your CGI program an unusual PATH, simply running finger might cause the wrong program to execute. A further precaution is to examine the PATH environment variable and make sure it has an acceptable value. It is a good idea to remove the current working directory from PATH unless you are certain you really need to execute programs from it.
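A check of the PATH value could look roughly like the following C sketch. The function name path_is_safe and the rule it applies (every entry must be an absolute directory) are assumptions made for illustration; a real site may want a stricter rule, such as comparing against a fixed list.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if every component of a colon-separated
 * PATH value is an absolute directory, 0 otherwise. An empty component
 * and "." both mean "the current directory", so both are rejected. */
int path_is_safe(const char *path)
{
    const char *p = path;

    if (path == NULL || *path == '\0')
        return 0;
    for (;;) {
        const char *end = strchr(p, ':');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == 0 || p[0] != '/')   /* empty, ".", or relative entry */
            return 0;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 1;
}
```

A program might call path_is_safe(getenv("PATH")) at startup and, if the result is 0, replace PATH with a short fixed value before running any external command.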

Another important security consideration is related to user rights. By default, the web server runs the CGI program as the user who started the server itself. This is usually a pseudo-user such as "nobody" with limited privileges, so the CGI program also has few privileges. This is usually a good thing, because if an attacker can gain access to the server through the CGI program, he will not be able to do much harm. The sample password stealing program shows what can be done, but the actual damage to the system is usually limited.

However, working as a limited user also limits the capabilities of CGI. If a CGI program needs to read or write files, it can only do so where it has permission. For example, in the second example of remembering state, a file is maintained for each user. The CGI program must have read / write permission on the directory containing these files, not to mention the files themselves. This can be done by creating the directory as the same user as the server, with read / write access for that user only. However, for a user like "nobody", only root has this capability. If you are not a superuser, then you will have to talk to your system administrator every time you change the CGI.

Another way is to make the directory world-readable and world-writable, effectively removing all protection from it. Since these files can only be reached from the outside world through your program, the danger is not as great as it might seem. However, if a hole is found in the program, a remote user will have full access to all the files, including the ability to destroy them. In addition, legitimate users working on the server will also be able to modify these files. If you are going to use this method, all users of the server must be trustworthy. Also, use the open directory only for files that the CGI program needs; in other words, don't put unnecessary files at risk.

If this is your first foray into CGI programming, there are several ways to explore further. Dozens of books have been written on the subject, many of which assume no familiarity with programming. "CGI Programming on the World Wide Web" from O'Reilly and Associates covers everything from simple scripts in several languages to truly amazing tricks. Free information is also plentiful on the WWW. A good place to start is CGI Made Really Easy at http://www.jmarshall.com/easy/cgi/ .

CGI and databases

Since the beginning of the Internet era, databases have interacted with the development of the World Wide Web. In practice, many view the Web as just one giant multimedia database.

Search engines provide an everyday example of the benefits of databases. A search engine is not sent to roam the entire Internet looking for keywords at the moment you search for them. Instead, the site developers use other programs to create a giant index that serves as a database from which the search engine retrieves the records. Databases store information in a way that allows fast retrieval with random access.

Because of their mutability, databases give the Web even more power: they turn it into a potential interface for anything. For example, system administration can be performed remotely through a web interface instead of requiring an administrator to log in to the system in question. Database connectivity to the Web is at the heart of a new level of interactivity on the Internet.

One reason for connecting databases to the Web comes up again and again: a significant portion of the world's information is already in databases. Databases that existed before the Web are called legacy databases (as opposed to recently created databases that are not connected to the Web, which should be called a "bad idea"). Many corporations (and even individuals) now face the challenge of providing access to these legacy databases over the Web. Unless your legacy database is MySQL or mSQL, this topic is outside the scope of this book.

As stated earlier, only your imagination limits the ways databases and the Web can be connected. There are now thousands of unique and useful databases accessible from the Web. The kinds of databases behind these applications vary widely. Some use CGI programs to interface with a database server such as MySQL or mSQL. These are of the greatest interest to us. Others use commercial applications to interact with popular desktop databases such as Microsoft Access and Claris FileMaker Pro. Still others simply work with flat text files, which are the simplest databases possible.

With these three types of databases, you can develop useful websites of any size and complexity. One of our tasks over the next few chapters will be to apply the power of MySQL and mSQL to the Web using CGI programming.

Thanks to the World Wide Web, almost anyone can provide information on the Internet in a form that is pleasing to the eye and suitable for wide dissemination. You've undoubtedly surfed the Internet and seen other sites, and by now you probably know that intimidating abbreviations like "HTTP" and "HTML" are just shorthand for "Web" and "the way of expressing information on the Internet." You may already have some experience of presenting information on the Internet.

The Internet has proven to be an ideal medium for distributing information, as its immense popularity and widespread growth show. While some have questioned the usefulness of the Internet and attribute its growth and popularity mainly to intrusive advertising, the Internet is undeniably an important vehicle for presenting all kinds of information. Not only are there many services providing up-to-the-minute information (news, weather, real-time sports events) and reference materials in electronic form, there is also a significant amount of data of other kinds. The IRS, which distributed all of its 1995 tax return forms and other information via the World Wide Web, recently admitted to receiving letters from fans of its Web site. Who would have thought that the IRS would ever receive fan mail? This was not because its website was well designed, but because it turned out to be a truly useful tool for thousands, if not millions, of people.

What makes the Web unique and such an attractive information service? First of all, it provides a hypermedia interface to data. Think of your computer's hard disk drive. Typically, data is organized in a linear fashion, as in a file system. For example, you have a number of folders, and inside each folder are either documents or other folders. The Web uses a different paradigm for expressing information, called hypermedia. A hypertext interface consists of a document and links. Links are words that you click to see other documents or find other kinds of information. The Web extends the concept of hypertext to include other types of media such as graphics, sounds, and video (hence the name "hypermedia"). Selecting highlighted text or graphics in a document lets you see related information about the selected item in any number of forms.

Almost everyone can benefit from this simple and unique way of presenting and distributing information, from academics who want to immediately share data with their peers to business people who share information about their company with everyone. However, while it is extremely important to provide information, in the past few years many have felt that obtaining information is an equally important process.

While the Web provides a unique hypermedia interface to information, there are many other efficient ways to distribute data. For example, network services such as the File Transfer Protocol (FTP) and Gopher existed long before the advent of the World Wide Web. Electronic mail has been the primary medium for communication and information exchange over the Internet and most other networks almost from their very beginning. Why has the Web become such a popular way of distributing information? Its multimedia aspect has made a tangible contribution to its unprecedented success, but for the Web to be most effective it must be interactive.

Without the ability to receive input from users and provide information in response, the Web would be a completely static environment. Information would only be available in the format specified by the author. This would undermine one of the great strengths of computing in general: interactive information. For example, instead of forcing users to page through multiple documents as if they were reading a book or a dictionary, it would be better to let them specify keywords on a topic of interest. Users can then customize the presentation of the data rather than relying on a rigid structure defined by the content provider.

The term "web server" can be misleading because it can refer to both the physical machine and the software it uses to communicate with Internet browsers. When a browser requests a given Web address, it first connects to the machine over the Internet, sending a request for a document to the Web server software. This software runs continuously, waiting for such requests and responding accordingly.

Although servers can send and receive data, the server itself has limited functionality. For example, the most primitive server can only send a requested file to the browser. The server usually does not know what to do with any additional input. Unless the content provider tells the server how to handle this additional information, the server will most likely ignore it.

For the server to do anything besides locating files and sending them to the browser, you need to know how to extend the server's functionality. For example, a Web server cannot search a database for a keyword entered by a user and return the matching documents unless that capability has been programmed into the server in some way.

What is CGI?

The Common Gateway Interface (CGI) is an interface to the server that allows you to extend the functionality of the server. Using CGI, you can work interactively with users who visit your site. On a theoretical level, CGI allows you to extend the server's ability to parse (interpret) browser input and return information based on user input. On a practical level, CGI is an interface that allows a programmer to write programs that easily communicate with a server.

Ordinarily, to extend the capabilities of the server, you would have to modify the server yourself. This solution is undesirable because it requires an understanding of low-level Internet Protocol network programming. It would also require editing and recompiling the server source or writing a custom server for each task. Suppose you want to extend your server so that it acts as a Web-to-e-mail gateway, taking information entered by the user in the browser and e-mailing it to another user. You would have to insert code into the server to parse the input from the browser, e-mail it to the other user, and send a response back to the browser over the network connection.

First, such a task requires access to the server code, which is not always possible.

Secondly, it is difficult and requires extensive technical knowledge.

Third, the result works only with one specific server. If you need to move your server to another platform, you will have to start over, or at least spend a lot of time porting your code to that platform.

Why CGI?

CGI offers a portable and simple solution to these problems. The CGI protocol defines a standard way for programs to communicate with a Web server. Without any special knowledge, you can write a program in any programming language that interfaces and communicates with the Web server. This program will work with all Web servers that understand the CGI protocol.

CGI communication is done through standard input and output, which means that if you know how to print and read data in your programming language, you can write a Web server application. Apart from parsing input and formatting output, programming CGI applications is almost the same as programming any other application. For example, to write a "Hello, World!" program, you use your language's print functions and the format defined for CGI programs to print the appropriate message.

Choice of programming language

Since CGI is a generic interface, you are not limited to any particular programming language. An important question is often asked: what programming languages can you use for CGI programming? You can use any language that allows you to:

  • Print to standard output
  • Read from stdin
  • Read environment variables

Almost all programming languages and many scripting languages can do these three things, and you can use any of them.
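The three capabilities in the list above can be seen together in the following C sketch, which gathers a request's form data from whichever channel carries it. The function name read_form_data and its parameter layout are hypothetical; REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH are the standard CGI environment variables, passed in as arguments here so that the function is easy to try out on its own.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Copies the form data of a request into buf (NUL-terminated on success)
 * and returns the number of bytes stored, or -1 on error.
 * method, query and length are the values a CGI program would obtain
 * from getenv("REQUEST_METHOD"), getenv("QUERY_STRING") and
 * getenv("CONTENT_LENGTH"); in is normally stdin. */
long read_form_data(const char *method, const char *query,
                    const char *length, FILE *in,
                    char *buf, size_t bufsize)
{
    size_t n;

    if (method == NULL || buf == NULL || bufsize == 0)
        return -1;

    if (strcmp(method, "GET") == 0) {
        /* GET: the data arrives in an environment variable. */
        n = query ? strlen(query) : 0;
        if (n >= bufsize)
            return -1;
        memcpy(buf, query ? query : "", n);
        buf[n] = '\0';
        return (long)n;
    }
    if (strcmp(method, "POST") == 0) {
        /* POST: the data arrives on standard input, and CONTENT_LENGTH
         * says how many bytes to read. */
        long want = length ? strtol(length, NULL, 10) : 0;

        if (in == NULL || want < 0 || (size_t)want >= bufsize)
            return -1;
        n = fread(buf, 1, (size_t)want, in);
        buf[n] = '\0';
        return (long)n;
    }
    return -1;   /* other methods are not handled in this sketch */
}
```

Printing the reply to standard output then completes the exchange; any language that offers equivalents of getenv, fread, and printf can do the same.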

Languages fall into one of two classes: compiled and interpreted. Programs in a compiled language such as C or C++ are usually smaller and faster, while interpreted languages like Perl or Rexx sometimes require a large interpreter to be loaded at startup. Additionally, if your language is compiled, you can distribute binaries (code translated into machine language) without the source code. Distributing interpreted scripts usually means distributing the source code.

Before choosing a language, first consider your priorities. You need to weigh the speed and efficiency of one programming language against the ease of programming in another. If you want to learn a new language instead of using one you already know, carefully weigh the advantages and disadvantages of both.

The two most commonly used languages for CGI programming are C and Perl (both of which are covered in this book). Both have clear advantages and disadvantages. Perl is a very high-level yet powerful language, especially suited to parsing text. While its ease of use, flexibility, and power make it an attractive language for CGI programming, its relatively large size and slower performance sometimes make it unsuitable for some applications. C programs are smaller, more efficient, and provide lower-level system control, but they are harder to write, have no easy built-in text-processing routines, and are more difficult to debug.

Which language is most suitable for CGI programming? The one you are most comfortable programming in. Both are equally effective for programming CGI applications, and with the proper libraries, both have similar capabilities. However, if you have a heavily loaded server, you might prefer smaller compiled C programs. If you have to quickly write an application that requires a lot of text processing, you might use Perl instead.

Caveats

There are some important alternatives to CGI applications. Many servers now include a programming API, which makes it easier to program direct server extensions as opposed to standalone CGI applications. Server APIs are usually more efficient than CGI programs. Other servers include built-in functionality that can handle special features without CGI, such as database interfacing. Finally, some applications can be handled by new client-side (rather than server-side) technologies such as Java. Will CGI quickly become obsolete in the face of such rapid changes in technology?

Unlikely. CGI has several advantages over newer technologies.

  • It is versatile and portable. You can write a CGI application using almost any programming language on any platform. Some of the alternatives, such as server APIs, limit you to certain languages and are much more difficult to learn.
  • Client-side technologies such as Java are unlikely to replace CGI, because some applications are much better suited to running on the server.
  • Many of the limitations of CGI are HTML or HTTP limitations. As the standards of the Internet in general evolve, so do the capabilities of CGI.

Summary

The Common Gateway Interface is a protocol by which programs interact with Web servers. The versatility of CGI gives programmers the ability to write gateway programs in almost any language, although there are many trade-offs associated with different languages. Without this ability, creating interactive Web pages would be difficult at best: server modifications would be required, and interactivity would be out of reach for most users who are not site administrators.

Chapter 2. Basics

Several years ago, I created a page for a college at Harvard where people could submit comments about it. At the time, the Internet was young and documentation was scarce. Like many others, I relied on terse documentation and on programs written by others to learn CGI programming. Although this method of study required some research, a lot of experimentation, and a lot of questions, it was very effective. This chapter is the fruit of my early work with CGI (with a few refinements, of course).

Although it takes some time to fully understand the Common Gateway Interface and become proficient with it, the protocol itself is quite simple. Anyone who has some basic programming skills and is familiar with the Web can quickly learn to program fairly complex CGI applications, just as I and others did a few years ago.

The purpose of this chapter is to present the basics of CGI in a comprehensive, albeit concise, form. Each concept discussed here is detailed in subsequent chapters. However, after completing this chapter, you can start programming CGI applications right away. Once you reach this level, you can learn the intricacies of CGI, either by reading the rest of this book, or just experimenting on your own.

You can boil down CGI programming to two tasks: getting information from the web browser and sending the information back to the browser. This is fairly intuitive once you have mastered the usual use of CGI applications. Often the user is asked to fill out a form, for example, insert his name. As soon as the user fills out the form and presses Enter, this information is sent to the CGI program. The CGI program must then translate this information into what it understands, process it appropriately, and then send it back to the browser, be it a simple confirmation or a search result in a multipurpose database.
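As a small illustration of the "translate this information" step: browsers send form data URL-encoded, with each space replaced by "+" and most other special characters replaced by "%" followed by two hexadecimal digits. A minimal C sketch of the decoding might look like this; the names url_decode and hex_value are hypothetical, and passing malformed escapes through unchanged is an assumption of this sketch rather than a requirement.

```c
#include <ctype.h>

/* Converts one hexadecimal digit to its value; the caller has already
 * checked the character with isxdigit(). */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Decodes src into dst: '+' becomes a space and %XX becomes the byte with
 * hexadecimal value XX. dst must have room for strlen(src) + 1 bytes.
 * Malformed escapes are copied through unchanged. */
void url_decode(const char *src, char *dst)
{
    while (*src) {
        if (*src == '+') {
            *dst++ = ' ';
            src++;
        } else if (*src == '%' &&
                   isxdigit((unsigned char)src[1]) &&
                   isxdigit((unsigned char)src[2])) {
            *dst++ = (char)(hex_value(src[1]) * 16 + hex_value(src[2]));
            src += 3;
        } else {
            *dst++ = *src++;
        }
    }
    *dst = '\0';
}
```

For instance, a field submitted as John+Smith decodes to "John Smith". Libraries that parse CGI input do this for you.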

In other words, CGI programming requires understanding how to receive input from the Internet browser and how to send output back. What happens between the input and output stages of a CGI program depends on the purpose of the developer. You will find that the main difficulty in CGI programming lies in this intermediate stage; once you learn how to work with input and output, it is essentially enough to become a CGI developer.

In this chapter, you will learn the principles behind CGI input and output, as well as other basic skills required to write and use CGI, including such things as creating HTML forms and naming your CGI programs. This chapter covers the following topics:

  • Traditional program "Hello, World!";
  • CGI Output: Send information back for display in an Internet browser;
  • Configuring, installing, and running the application. You will learn about the various platforms and servers on the Web;
  • CGI Input: Interpreting information sent by the Web browser. Familiarization with some useful programming libraries for parsing such input;
  • A simple example: it covers all the lessons in this chapter;
  • Programming strategy.

Due to the nature of this chapter, I will only touch on a few topics. Do not worry; all of these topics are dealt with much deeper in other chapters.

Hello, World!

You start with a traditional introductory programming task. You will write a program that displays "Hello, World!" on your web browser. Before writing this program, you must understand what information the Web browser expects to receive from CGI programs. You also need to know how to execute this program in order to see it in action.

CGI is language independent, so you can implement this program in any language. Several different languages are used here to demonstrate this independence. In Perl, the "Hello, World!" program is shown in Listing 2.1.

Listing 2.1. Hello, World! in Perl.

#!/usr/local/bin/perl
# hello.cgi - My first CGI program
print "Content-Type: text/html\n\n";
print "<html> <head>\n";
print "<title>Hello, World!</title>";
print "</head>\n";
print "<body>\n";
print "<h1>Hello, World!</h1>\n";
print "</body> </html>\n";

Save this program as hello.cgi and install it in the appropriate location. (If you're not sure where it is, don't worry; you'll find out in the "Installing and Running a CGI Program" section later in this chapter.) For most servers, the directory you need is called cgi-bin. Now, invoke the program from your Web browser. For most, this means opening the following Uniform Resource Locator (URL):

http://hostname/directoryname/hello.cgi

Hostname is the name of your Web server, and directoryname is the directory where you put hello.cgi (probably cgi-bin).

Dissecting hello.cgi

There are a few things to note about hello.cgi.

First, you use simple print commands. CGI programs do not require any special file handles or output descriptors. To send output to the browser, simply print to stdout.

Second, note that the content of the first print statement (Content-Type: text/html) does not appear in your Web browser. You can send any kind of information back to the browser (an HTML page, graphics, or sound), but first you need to tell the browser what kind of data you are sending. This line tells the browser what kind of information to expect, in this case an HTML page.

Third, the program is called hello.cgi. You don't always need to use the .cgi extension in the name of your CGI program. Although source files in many languages carry an extension that identifies the language, the .cgi extension does not denote a language; it is a way for the server to identify a file as an executable rather than a graphic file, HTML file, or text file. Servers are often configured to execute only files that have this extension and to display the contents of all others. While using the .cgi extension is optional, it is still considered good practice.

In general, hello.cgi has two main parts:

  • It tells the browser what information to expect (Content-Type: text/html)
  • It tells the browser what to display (Hello, World!)

Hello, World! in C

To illustrate the language independence of CGI programs, Listing 2.2 shows the equivalent of the hello.cgi program written in C.

Listing 2.2. Hello, World! in C.

/* hello.cgi.c - Hello, World CGI */
#include <stdio.h>

int main()
{
    printf("Content-Type: text/html\r\n\r\n");
    printf("<html> <head>\n");
    printf("<title>Hello, World!</title>\n");
    printf("</head>\n");
    printf("<body>\n");
    printf("<h1>Hello, World!</h1>\n");
    printf("</body> </html>\n");
    return 0;
}

Note

Note that the Perl version of hello.cgi uses print "Content-Type: text/html\n\n"; whereas the C version uses printf("Content-Type: text/html\r\n\r\n");

Why does the Perl print statement end with two newlines (\n\n), while the C printf ends with two carriage return and newline pairs (\r\n\r\n)?

Formally, headers (all output before the blank line) are supposed to be separated by a carriage return and a newline. Unfortunately, on DOS and Windows machines, Perl translates \r as another newline rather than as a carriage return.

While leaving out the \r's in Perl is technically wrong, it works with almost all servers and carries over to all platforms as well. Therefore, in all of the Perl examples in this book, I separate headers with newlines rather than with carriage returns and newlines.

An appropriate solution to this problem is presented in Chapter 4, "Output."

Neither the web server nor the browser cares what language is used to write the program. While each language has advantages and disadvantages as a CGI programming language, it is best to use the language you are comfortable with. (The choice of programming language is discussed in more detail in Chapter 1, "Common Gateway Interface (CGI)").

CGI Output

Now you can take a closer look at sending information to the Web browser. From the Hello, World! example, you can see that Web browsers expect two sets of data: a header that contains information such as what kind of data to display (e.g. the Content-Type: line) and the actual information (what is displayed in the Web browser). These two blocks of information are separated by a blank line.

The header is called the HTTP header. It provides important information about the data that the browser is about to receive. There are several different types of HTTP headers, and the most common is the one you used before: the Content-Type: header. You can use different combinations of HTTP headers, separating them with a carriage return and a newline (\r\n). The blank line separating the header from the data also consists of a carriage return and a newline (why both are needed is briefly described in the preceding note and in detail in Chapter 4). You will learn about other HTTP headers in Chapter 4; for now, you work only with the Content-Type: header.

The Content-Type: header describes the type of data that the CGI program returns. The appropriate format for this header is:
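The layout just described, one header line ending in \r\n followed by an empty line that is itself just \r\n, can be sketched in C as a small formatting function. The name format_cgi_header is hypothetical, and handling only a single Content-Type: header is a simplification made for this example.

```c
#include <stdio.h>

/* Writes "Content-Type: <mime_type>" followed by the blank line that ends
 * the header block into buf. Each line ends with CR LF. Returns the
 * number of characters written, or -1 if buf is too small. */
int format_cgi_header(char *buf, size_t bufsize, const char *mime_type)
{
    int n = snprintf(buf, bufsize, "Content-Type: %s\r\n\r\n", mime_type);

    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return n;
}
```

A CGI program would write the resulting string to stdout before any of the data it wants the browser to display.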

Content-Type: type/subtype

Here type/subtype is a valid Multipurpose Internet Mail Extensions (MIME) type. The most common MIME type is the one for HTML: text/html. Table 2.1 lists a few more common MIME types that will be discussed; a more complete listing and analysis of MIME types is provided in Chapter 4.

Note

MIME was originally invented to describe the content of mail message bodies. It has become a fairly common way of representing Content-Type information. You can read more about MIME in RFC1521. RFC stands for "Request for Comments"; RFCs are summaries of decisions made by groups on the Internet that are trying to set standards. RFC1521 can be viewed at the following URL: http://andrew2.andrew.cmu.edu/rfc/rfc1521.html

Table 2.1. Some common MIME types.

MIME type      Description
text/html      Hypertext Markup Language (HTML)
text/plain     Plain text files
image/gif      GIF graphics files
image/jpeg     JPEG compressed graphics files
audio/basic    Sun *.au audio files
audio/x-wav    Windows *.wav files
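A program that returns files of several kinds needs some way to pick the right MIME type for each. One possible approach, sketched here in C, is a lookup table keyed by filename extension. The MIME types are the ones in Table 2.1; the function name mime_type_for, the choice of extensions, and the application/octet-stream fallback for unknown files are assumptions of this sketch.

```c
#include <string.h>

struct mime_entry {
    const char *ext;    /* filename extension, including the dot */
    const char *type;   /* MIME type to send in Content-Type: */
};

/* The extensions are illustrative; the types come from Table 2.1. */
static const struct mime_entry mime_table[] = {
    { ".html", "text/html"   },
    { ".txt",  "text/plain"  },
    { ".gif",  "image/gif"   },
    { ".jpeg", "image/jpeg"  },
    { ".au",   "audio/basic" },
    { ".wav",  "audio/x-wav" },
};

/* Returns the MIME type for filename based on its last extension, or a
 * generic binary type (an assumed fallback) if it is not in the table. */
const char *mime_type_for(const char *filename)
{
    const char *dot = strrchr(filename, '.');
    size_t i;

    if (dot != NULL) {
        for (i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++) {
            if (strcmp(dot, mime_table[i].ext) == 0)
                return mime_table[i].type;
        }
    }
    return "application/octet-stream";
}
```

The returned string is what goes after "Content-Type: " in the header.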

After the header and a blank line, you simply print the data in whatever form you need. If you are sending HTML, print the HTML tags and data to stdout after the header. You can also send graphics, sound, and other binary files by simply printing the contents of the file to stdout. Some examples of this are given in Chapter 4.

Installing and running a CGI program

This section deviates somewhat from CGI programming and talks about configuring your Web server to use CGI, installing and running programs. You will familiarize yourself with different servers for different platforms in more or less detail, but you will have to study your server documentation deeper in order to find the best option.

All servers require space for server files and space for HTML documents. In this book, the server area is called ServerRoot and the document area is called DocumentRoot. On UNIX machines, ServerRoot is usually /usr/local/etc/httpd/ and DocumentRoot is usually /usr/local/etc/httpd/htdocs/. Your system may well differ, however, so replace all references to ServerRoot and DocumentRoot with your own ServerRoot and DocumentRoot.

When you access files using your Web browser, you specify the file in the URL relative to the DocumentRoot. For example, if the address of your server is mymachine.org and the file index.html is in DocumentRoot, then you refer to this file with the following URL: http://mymachine.org/index.html

Server configuration for CGI

Most Web servers are preconfigured to use CGI programs. Usually one of two criteria tells the server whether a file is a CGI application or not:

  • The designated directory. Some servers allow you to specify that all files in a designated directory (usually called cgi-bin by default) are CGI.
  • File name extensions. Many servers are preconfigured to define all files ending in .cgi as CGI.
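The two criteria in the list above amount to a simple test on the requested path, which can be sketched in C as follows. The function name is_cgi_path is hypothetical, and real servers make this decision from their configuration files rather than from hard-coded rules.

```c
#include <string.h>

/* Returns 1 if a server following the two rules above would execute the
 * file, 0 if it would simply send the file's contents. cgi_dir is the
 * designated directory as it appears in URLs, for example "/cgi-bin/". */
int is_cgi_path(const char *url_path, const char *cgi_dir)
{
    static const char ext[] = ".cgi";
    size_t ext_len = sizeof ext - 1;
    size_t path_len = strlen(url_path);
    size_t dir_len = strlen(cgi_dir);

    /* Rule 1: the file lives in the designated directory. */
    if (dir_len > 0 && strncmp(url_path, cgi_dir, dir_len) == 0)
        return 1;
    /* Rule 2: the file name ends in .cgi. */
    if (path_len >= ext_len &&
        strcmp(url_path + path_len - ext_len, ext) == 0)
        return 1;
    return 0;
}
```

With "/cgi-bin/" as the designated directory, both /cgi-bin/finger and /tools/hello.cgi would be executed, while /index.html would be sent as a document.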

The designated directory method is somewhat of a relic of the past (very early servers used it as the only method to determine which files were CGI programs), but it has several advantages.

  • It keeps CGI programs centralized, preventing other directories from cluttering up.
  • You are not limited to any particular filename extension, so you can name the files whatever you want. Some servers allow you to designate several different directories as CGI directories.
  • It also gives you more control over who can record CGI. For example, if you have a server and maintain a system with multiple users and do not want them to use their own CGI scripts without first revising the program for security reasons, you can designate only those files in a limited, centralized directory as CGI. Users will then need to provide you with a CGI program to install and you can first revise the code to make sure the program does not have any major security issues.

Designating CGI programs by filename extension can be useful because of its flexibility. You are not limited to one single directory for CGI programs. Most servers can be configured to recognize CGI by filename extension, although not all are configured this way by default.
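As a rough illustration of the two recognition rules described above, the following C sketch shows how a server might decide whether a requested path is a CGI program. The function name is_cgi, the directory /cgi-bin/, and the extension .cgi are assumptions chosen for the example; a real server reads them from its configuration and applies further checks.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns 1 if url_path would be treated as a CGI
 * program under either rule, 0 otherwise. */
int is_cgi(const char *url_path)
{
    static const char cgi_dir[] = "/cgi-bin/";   /* designated directory */
    static const char cgi_ext[] = ".cgi";        /* designated extension */
    size_t len, ext_len;

    assert(url_path != NULL);

    /* Rule 1: the file lives in the designated directory. */
    if (strncmp(url_path, cgi_dir, strlen(cgi_dir)) == 0)
        return 1;

    /* Rule 2: the file name ends in the designated extension. */
    len = strlen(url_path);
    ext_len = strlen(cgi_ext);
    if (len > ext_len && strcmp(url_path + len - ext_len, cgi_ext) == 0)
        return 1;

    return 0;
}
```

With these assumed settings, is_cgi("/cgi-bin/hello") and is_cgi("/docs/form.cgi") both return 1, while is_cgi("/index.html") returns 0.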

Warning

Remember the importance of security considerations when configuring your server for CGI. Some of the tips will be covered here, and Chapter 9, "Securing CGI", covers these aspects in more detail.

Installing CGI on UNIX Servers

Regardless of how your UNIX server is configured, there are several steps to take to ensure that your CGI applications run as expected. Your web server will usually run as an unprivileged user (typically the UNIX user nobody, an account that has no file permissions and cannot be logged in to). CGI scripts (written in Perl, the Bourne shell, or another scripting language) must therefore be world-readable and world-executable.

Tip

To make your files world-readable and world-executable, use the following UNIX command: chmod 755 filename.

If you are using a scripting language like Perl or Tcl, include the full path of your interpreter on the first line of your script. For example, a Perl script using perl in the /usr/local/bin directory must begin with the following line:

#!/usr/local/bin/perl

Warning

Never put an interpreter (perl, or the Tcl wish binary) in /cgi-bin. This poses a security risk on your system. See Chapter 9 for details.

Some common UNIX servers

NCSA and Apache servers have similar configuration files because Apache was originally based on NCSA code. By default, they are configured so that any file in the cgi-bin directory (located by default in ServerRoot) is a CGI program. To change the location of the cgi-bin directory, edit the conf/srm.conf configuration file. The format for configuring this directory is

ScriptAlias fakedirectoryname realdirectoryname

where fakedirectoryname is the pseudo-name of the directory (/cgi-bin) and realdirectoryname is the full path where the CGI programs are actually stored. You can configure more than one ScriptAlias by adding more ScriptAlias lines.

The default configuration is sufficient for the needs of most users. Even so, check the ScriptAlias line in your srm.conf file to make sure realdirectoryname is correct. If, for example, your CGI programs are located in /usr/local/etc/httpd/cgi-bin, the ScriptAlias line in your srm.conf file should look like this:

ScriptAlias /cgi-bin/ /usr/local/etc/httpd/cgi-bin/

The following URL is used to access or link to the CGI programs located in this directory:

http://hostname/cgi-bin/programname

where hostname is the hostname of your Web server and programname is the name of your CGI.

For example, suppose you copied the hello.cgi program into your cgi-bin directory (for example, /usr/local/etc/httpd/cgi-bin) on your web server called www.company.com. To access your CGI, use the following URL: http://www.company.com/cgi-bin/hello.cgi
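The mapping from the pseudo-directory in the URL to the real directory on disk can be pictured with a small C sketch. This is only an illustration of the idea, not the server's actual code: the function resolve_script_alias and its parameters are hypothetical, and it ignores the safety checks (such as rejecting ".." in the path) that a real server needs.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: maps a URL path to a file path using one
 * ScriptAlias-style pair. Writes the result to out (size out_size) and
 * returns 1 on a match, 0 if the URL is outside the alias or the result
 * does not fit in out. */
int resolve_script_alias(const char *fake_dir, const char *real_dir,
                         const char *url_path, char *out, size_t out_size)
{
    size_t fake_len;
    int written;

    assert(fake_dir && real_dir && url_path && out);

    fake_len = strlen(fake_dir);
    if (strncmp(url_path, fake_dir, fake_len) != 0)
        return 0;                       /* URL is not under the alias */

    /* Swap the pseudo-directory prefix for the real directory. */
    written = snprintf(out, out_size, "%s%s", real_dir, url_path + fake_len);
    return written >= 0 && (size_t)written < out_size;
}
```

Called with "/cgi-bin/", "/usr/local/etc/httpd/cgi-bin/", and the URL path "/cgi-bin/hello.cgi", the sketch produces /usr/local/etc/httpd/cgi-bin/hello.cgi.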

If you want to configure your NCSA or Apache server to recognize any .cgi file as a CGI, you need to edit two configuration files. First, in your srm.conf file, uncomment the following line:

AddType application/x-httpd-cgi .cgi

This binds the CGI MIME type to the .cgi extension. Next, modify the access.conf file so that CGI programs can be executed in any directory. To do this, add the ExecCGI option to the Options line. It will look something like the following line:

Options Indexes FollowSymLinks ExecCGI

Now, any file with the .cgi extension is considered a CGI; access it as you would access any file on your server.

The CERN server is configured in a similar way to the Apache and NCSA servers. Instead of ScriptAlias, the CERN server uses the Exec directive. For example, in the httpd.conf file, you will see the following line:

Exec /cgi-bin/* /usr/local/etc/httpd/cgi-bin/*

Other UNIX servers can be configured in the same way; see the server documentation for more details.

Installing CGI on Windows

Most of the servers available for Windows 3.1, Windows 95, and Windows NT are configured with the filename extension method for CGI recognition. In general, changing the configuration of a Windows-based server simply requires running the server configuration program and making the appropriate changes.

Configuring a server to run a script (such as Perl) is sometimes tricky. In DOS or Windows, you cannot define an interpreter on the first line of a script, as you would with UNIX. Some servers are preconfigured to associate certain filename extensions with the interpreter. For example, many Windows Web servers assume that files ending in .pl are Perl scripts.

If the server does not perform this type of file association, you can define a wrapper batch file that invokes both an interpreter and a script. As with a UNIX server, do not install the interpreter in the cgi-bin directory or any Web accessible directory.

Installing CGI on Macintosh

The two best-known server options for Macintosh are StarNine's WebSTAR and its predecessor MacHTTP. Both recognize CGI by the filename extension.

MacHTTP understands two different extensions: .cgi and .acgi, which stands for Asynchronous CGI. Regular CGI programs installed on the Macintosh (with the .cgi extension) will keep the Web server busy until the CGI finishes, forcing the server to suspend all other requests. Asynchronous CGI, on the other hand, allows the server to accept requests even while it is running.

A Macintosh CGI developer using either of these Web servers should, whenever possible, simply use the .acgi extension rather than the .cgi extension. It should work with most CGI programs; if it doesn't, rename the program to .cgi.

CGI execution

Once you have installed the CGI, there are several ways to execute it. If your CGI is an output-only program, such as the Hello, World! program, you can execute it simply by accessing its URL.

Most programs run as a server-side application to an HTML form. Before you learn how to get information from these forms, first read a short introduction to creating such forms.

Quick tutorial on HTML forms

The two most important tags in an HTML form are the <form> and <input> tags. You can create most HTML forms using just these two tags. In this chapter, you will explore these tags and a small subset of the possible <input> types or attributes. For a complete tutorial and reference to HTML forms, see Chapter 3, HTML and Forms.

The <form> tag

The <form> tag is used to determine which part of the HTML file should be used for user-entered information. This is how most HTML pages call a CGI program. The tag attributes define the program name and location, either locally or as a full URL, the type of encoding used, and the method used to transfer the data to the program.

The next line shows the specification for the <form> tag:

<FORM ACTION="url" METHOD=[GET|POST] ENCTYPE="...">

The ENCTYPE attribute does not play a special role and is usually not included with the <form> tag. Details on the ENCTYPE attribute are given in Chapter 3. One way to use ENCTYPE is shown in Chapter 14, "Proprietary Extensions."

The ACTION attribute refers to the URL of the CGI program. After the user fills out the form and submits it, all of the information is encoded and transmitted to the CGI program. The CGI program itself is responsible for decoding and processing the information; this is discussed in "Accepting Input from the Browser," later in this chapter.

Finally, the METHOD attribute describes how the CGI program should receive input. The two methods, GET and POST, differ in how they pass information to the CGI program. Both are discussed in "Accepting Input from the Browser."

For the browser to be able to accept user input, all form tags and information must be surrounded by the <form> tag. Don't forget the closing </form> tag to indicate the end of the form. You cannot have a form within a form, although you can set up a form that allows you to present pieces of information in different places; this aspect is discussed extensively in Chapter 3.

The <input> tag

You can create text input fields, radio buttons, checkboxes, and other means of accepting input using the <input> tag. This section only covers text input fields. To implement such a field, use the <input> tag with the following attributes:

<INPUT TYPE=text NAME="..." VALUE="..." SIZE= MAXLENGTH= >

NAME is the symbolic name of the variable that contains the value entered by the user. If you include text in the VALUE attribute, that text will be placed as default in the text input field. The SIZE attribute allows you to define the horizontal length of the input field as it will appear in the browser window. Finally, MAXLENGTH defines the maximum number of characters that the user can enter into the field. Note that the VALUE, SIZE, MAXLENGTH attributes are optional.

Form submission

If you only have one text field within a form, the user can submit the form by simply typing the information and pressing Enter. Otherwise, there must be some other way for the user to submit the information. The user does so with a submit button, created with the following tag:

<INPUT TYPE=submit>

This tag creates a Submit button inside your form. When the user finishes filling out the form, he or she can submit its content to the URL specified by the ACTION attribute of the form by clicking the Submit button.

Accepting browser input

Above were examples of writing a CGI program that sends information from the server to the browser. In reality, a CGI program that only outputs data does not have many applications (some examples are given in Chapter 4). The more important ability of CGI is to retrieve information from the browser — a feature that makes the Web interactive.

The CGI program receives two kinds of information from the browser.

  • First, it receives various pieces of information about the browser (its type, what it can view, its host name, and so on), the server (its name and version, the port it runs on, and so on), and the CGI program itself (the name of the program and where it is located). The server gives all this information to the CGI program through environment variables.
  • Second, the CGI program can receive information entered by the user. This information, after being encoded by the browser, is sent either through an environment variable (the GET method) or through standard input (stdin, the POST method).

Environment Variables

It is useful to know which environment variables are available to the CGI program, both for learning and for debugging. Table 2.2 lists some of the available CGI environment variables. You can also write a CGI program that outputs the environment variables and their values to the Web browser.

Table 2.2. Some Important CGI Environment Variables

Environment Variable | Purpose
REMOTE_ADDR | The IP address of the client machine.
REMOTE_HOST | The host name of the client machine.
HTTP_ACCEPT | Lists the MIME data types that the browser can interpret.
HTTP_USER_AGENT | Browser information (browser type, version number, operating system, etc.).
REQUEST_METHOD | GET or POST.
CONTENT_LENGTH | The size of the input if sent via POST. If there is no input, or if the GET method is used, this parameter is undefined.
QUERY_STRING | Contains the input information when it is passed using the GET method.
PATH_INFO | Allows the user to specify a path after the program name in the URL (for example, http://hostname/cgi-bin/programname/path).
PATH_TRANSLATED | Translates the relative path in PATH_INFO to the actual path on the system.

To write a CGI application that displays environment variables, you need to know how to do two things:

  • Define all environment variables and their corresponding values.
  • Display results for the browser.

You already know how to perform the last operation. In Perl, environment variables are stored in the associative array %ENV, which is keyed by the name of the environment variable. Listing 2.3 contains env.cgi, a Perl program that serves our purpose.

Listing 2.3. A Perl program, env.cgi, that prints out all of the CGI environment variables.

#!/usr/local/bin/perl
# Header, then the blank line that separates it from the body
print "Content-type: text/html\n\n";
print "<html> <head>\n";
print "<title>CGI Environment</title>\n";
print "</head>\n";
print "<body>\n";
print "<h1>CGI Environment</h1>\n";
# One line per environment variable: name = value
foreach $env_var (keys %ENV) {
    print "<b>$env_var</b> = $ENV{$env_var}<br>\n";
}
print "</body> </html>\n";

A similar program could be written in C; the complete code is in Listing 2.4.

Listing 2.4. env.cgi.c in C.

/* env.cgi.c */
#include <stdio.h>

extern char **environ;

int main()
{
    char **p = environ;

    printf("Content-Type: text/html\r\n\r\n");
    printf("<html> <head>\n");
    printf("<title>CGI Environment</title>\n");
    printf("</head>\n");
    printf("<body>\n");
    printf("<h1>CGI Environment</h1>\n");
    /* each entry already has the form NAME=value */
    while (*p != NULL)
        printf("%s<br>\n", *p++);
    printf("</body> </html>\n");
    return 0;
}

GET or POST?

What's the difference between the GET and POST methods? GET passes the encoded input string through the QUERY_STRING environment variable, while POST passes it through stdin. POST is the preferred method, especially for forms with a large amount of data, because it places no limit on the amount of information sent, whereas with GET the input must fit in the limited space available for environment variables. GET has a certain useful property, however; this is covered in detail in Chapter 5, Input.

To determine which method is used, the CGI program examines the environment variable REQUEST_METHOD, which will be set to either GET or POST. If set to POST, the length of the encoded information is stored in the CONTENT_LENGTH environment variable.
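The decision described above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the code of any particular library: the function name read_encoded_input is hypothetical, and the environment values are passed in as parameters so the logic is easy to follow. In a real CGI program they would come from getenv("REQUEST_METHOD"), getenv("QUERY_STRING"), and getenv("CONTENT_LENGTH"), with stdin as the input stream.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd copy of the encoded input, or
 * NULL on error. The caller frees the result. */
char *read_encoded_input(const char *method, const char *query_string,
                         const char *content_length, FILE *in)
{
    char *buf;
    char *end;
    long len;

    assert(in != NULL);
    if (method == NULL)
        return NULL;

    if (strcmp(method, "GET") == 0) {
        /* GET: the input is the QUERY_STRING value (may be absent). */
        const char *q = query_string ? query_string : "";
        buf = malloc(strlen(q) + 1);
        if (buf != NULL)
            strcpy(buf, q);
        return buf;
    }

    if (strcmp(method, "POST") == 0) {
        /* POST: read exactly CONTENT_LENGTH bytes from the input stream. */
        if (content_length == NULL)
            return NULL;
        len = strtol(content_length, &end, 10);
        if (end == content_length || *end != '\0' || len < 0)
            return NULL;
        buf = malloc((size_t)len + 1);
        if (buf == NULL)
            return NULL;
        if (fread(buf, 1, (size_t)len, in) != (size_t)len) {
            free(buf);
            return NULL;
        }
        buf[len] = '\0';
        return buf;
    }

    return NULL;   /* neither GET nor POST */
}
```

The sketch reads only as many bytes as CONTENT_LENGTH announces, because the server is not required to close stdin after the data. A production program would probably also cap the accepted length.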

Encoded Input

When the user submits the form, the browser first encodes the information before sending it to the server, which passes it on to the CGI application. When you use the <input> tag, each field is assigned a symbolic name. The value entered by the user is represented as the value of that variable.

To do this, the browser uses the URL encoding specification, which can be described as follows:

  • Separates various fields with an ampersand (&).
  • Separates names and values with equal signs (=), with the name on the left and the value on the right.
  • Replaces spaces with plus signs (+).
  • Replaces all "abnormal" characters with a percent sign (%) followed by a two-digit hexadecimal character code.

Your final encoded string will look like the following:

name1=value1&name2=value2&name3=value3 ...

Note: The specifications for URL encoding are in RFC 1738.
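The encoding rules listed above are applied by the browser, not by your CGI program, but seeing them in code can make the format easier to recognize. The following C sketch encodes a single name or value. The function name url_encode is hypothetical, and the set of characters left unescaped (letters, digits, '-', '_', '.') is an assumption; browsers differ slightly in which characters they treat as safe.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a malloc'd, URL-encoded copy of one field
 * name or value (NULL if allocation fails). The caller frees the result. */
char *url_encode(const char *src)
{
    static const char hex[] = "0123456789ABCDEF";
    char *out, *p;

    assert(src != NULL);

    /* Worst case: every byte becomes a three-character %XX escape. */
    out = malloc(strlen(src) * 3 + 1);
    if (out == NULL)
        return NULL;

    for (p = out; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;
        if (c == ' ') {
            *p++ = '+';                     /* space -> plus sign */
        } else if (isalnum(c) || c == '-' || c == '_' || c == '.') {
            *p++ = (char)c;                 /* safe character, copy as-is */
        } else {
            *p++ = '%';                     /* anything else -> %XX */
            *p++ = hex[c >> 4];
            *p++ = hex[c & 0x0F];
        }
    }
    *p = '\0';
    return out;
}
```

Because & and = separate fields and names from values, they must be escaped when they occur inside a value: under these assumptions "a&b=c" is encoded as a%26b%3Dc.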

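For example, suppose you had a form that asked for a name and age. The HTML code used to display this form is shown in Listing 2.5.

Listing 2.5. HTML code to display name and age form.

<html> <head>
<title>Name and Age</title>
</head>

<body>
<form action="/cgi-bin/nameage.cgi" method=POST>
Enter your name: <input type=text name="name"><p>
Enter your age: <input type=text name="age"><p>
<input type=submit>
</form>
</body> </html>

Suppose the user enters Joe Schmoe in the name field and 20 in the age field. The input will be encoded into the following input string:

name=Joe+Schmoe&age=20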
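Before a CGI program can use a value such as Joe+Schmoe, it has to reverse the encoding. The following C sketch shows one way to decode a single name or value. The function names url_decode and hex_value are hypothetical, and the choice to copy a malformed % sequence through unchanged is an assumption, not a rule from the specification.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Converts one hexadecimal digit to its value; caller guarantees validity. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';
    return tolower((unsigned char)c) - 'a' + 10;
}

/* Hypothetical helper: returns a malloc'd, decoded copy of one encoded
 * name or value (NULL if allocation fails). The caller frees the result. */
char *url_decode(const char *src)
{
    char *out, *p;

    assert(src != NULL);

    out = malloc(strlen(src) + 1);   /* decoding never lengthens the text */
    if (out == NULL)
        return NULL;

    for (p = out; *src != '\0'; src++) {
        if (*src == '+') {
            *p++ = ' ';              /* plus sign -> space */
        } else if (*src == '%' && isxdigit((unsigned char)src[1])
                               && isxdigit((unsigned char)src[2])) {
            *p++ = (char)(hex_value(src[1]) * 16 + hex_value(src[2]));
            src += 2;                /* skip the two hex digits */
        } else {
            *p++ = *src;             /* ordinary character or stray '%' */
        }
    }
    *p = '\0';
    return out;
}
```

Decoding must happen after the string has been split on & and =, because a decoded value may itself contain those characters.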

Parsing Input

For this information to be useful, you need to parse it into something your CGI programs can use. The strategies for parsing input are covered in Chapter 5. In practice, you never have to think about how to parse input, because several experts have already written publicly available libraries that do the parsing. Two such libraries are presented in this chapter in the following sections: cgi-lib.pl for Perl (written by Steve Brenner) and cgihtml for C (written by me).

The common goal of most libraries written in different languages is to parse the encoded string and put the name and value pairs into a data structure. There is a clear advantage to using a language that has built-in data structures, like Perl; however, most libraries for lower-level languages like C and C++ include a data structure implementation and the routines that go with it.
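To give a feel for what such a library does internally, here is a minimal C sketch of the splitting step. It is not the code of cgi-lib.pl or cgihtml: the pair type and the split_pairs function are hypothetical, a fixed-size array stands in for the library's data structure, and the names and values it returns are still URL-encoded.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical pair type holding one name/value entry. */
typedef struct {
    char *name;
    char *value;
} pair;

/* Hypothetical helper: splits "n1=v1&n2=v2" in place and stores up to
 * max_pairs entries in pairs. Returns the number of pairs stored.
 * The input buffer is modified, and names and values point into it. */
size_t split_pairs(char *input, pair *pairs, size_t max_pairs)
{
    size_t count = 0;
    char *field = input;

    assert(input != NULL && pairs != NULL);

    while (field != NULL && count < max_pairs) {
        char *next = strchr(field, '&');   /* end of this field */
        char *eq;

        if (next != NULL)
            *next++ = '\0';

        if (*field != '\0') {              /* skip empty fields */
            eq = strchr(field, '=');       /* name/value separator */
            if (eq != NULL) {
                *eq = '\0';
                pairs[count].value = eq + 1;
            } else {
                pairs[count].value = field + strlen(field);  /* empty value */
            }
            pairs[count].name = field;
            count++;
        }
        field = next;
    }
    return count;
}
```

Given a writable copy of name=Joe+Schmoe&age=20, the sketch stores two pairs: name with the value Joe+Schmoe, and age with the value 20. Each name and value would then be URL-decoded separately.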

It is not necessary to achieve a complete understanding of the libraries; it is much more important to learn how to use them as a tool to simplify the work of the CGI programmer.

cgi-lib.pl

cgi-lib.pl uses Perl associative arrays. The &ReadParse function parses the input string and stores each name/value pair keyed by name. For example, the Perl line needed to decode the name/age input string just presented would be

&ReadParse(*input);

Now, to see the value entered for "name", you can refer to the associative array element $input{"name"}. Likewise, to access the "age" value, look at $input{"age"}.

cgihtml

C does not have any built-in data structures of this kind, so cgihtml implements its own linked list for use with its CGI parsing routines. It defines the entrytype structure as follows:

typedef struct {
    char *name;
    char *value;
} entrytype;

To parse the name/age input string in C using cgihtml, use the following:

/* declare a linked list called input */
llist input;

/* parse input and place in linked list */
read_cgi_input(&input);

To access the age information, you can either parse the list manually or use the existing cgi_val() function.

#include <stdlib.h>
#include <string.h>

char *age = malloc(sizeof(char) * (strlen(cgi_val(input, "age")) + 1));
strcpy(age, cgi_val(input, "age"));

The value for "age" is now stored in the age string.

Note: Instead of using a simple fixed-size array (like char age[N]; for some fixed N), I am dynamically allocating memory space for the age string. While this complicates programming, it is nonetheless important from a security point of view. This is discussed in more detail in Chapter 9.

Simple CGI program

You are about to write a CGI program called nameage.cgi that processes the name/age form. Data processing (what I usually call "intermediate stuff") is minimal. nameage.cgi just decodes the input and displays the user's name and age. While not particularly useful, such a tool demonstrates the most critical aspect of CGI programming: input and output.

You use the same form as described above, with the fields named name and age. Don't worry about robustness and efficiency yet; solve the existing problem in the simplest way. The Perl and C solutions are shown in Listings 2.6 and 2.7, respectively.

Listing 2.6. nameage.cgi in Perl

#!/usr/local/bin/perl
# nameage.cgi
require "cgi-lib.pl";

# parse the form input into the associative array %input
&ReadParse(*input);
print "Content-Type: text/html\r\n\r\n";
print "<html> <head>\n";
print "<title>Name and Age</title>\n";
print "</head>\n";
print "<body>\n";
print "Hello, " . $input{"name"} . ". You are\n";
print $input{"age"} . " years old.<p>\n";
print "</body> </html>\n";

Listing 2.7. nameage.cgi in C

/* nameage.cgi.c */
#include <stdio.h>
#include "cgi-lib.h"

int main()
{
    llist input;

    /* parse the form input into the linked list */
    read_cgi_input(&input);
    printf("Content-Type: text/html\r\n\r\n");
    printf("<html> <head>\n");
    printf("<title>Name and Age</title>\n");
    printf("</head>\n");
    printf("<body>\n");
    printf("Hello, %s. You are\n", cgi_val(input, "name"));
    printf("%s years old.<p>\n", cgi_val(input, "age"));
    printf("</body> </html>\n");
    return 0;
}

Note that the two programs are nearly equivalent. Both contain a parsing routine that takes only one line and processes all of the input (thanks to the respective library routines). The output is essentially a modified version of your basic Hello, World! program.

Try running the program by filling out the form and clicking the Submit button.

General programming strategy

You now know all the basic principles required for CGI programming. Once you understand how the CGI receives information and how it sends it back to the browser, the actual quality of your final product depends on your general programming ability. Namely, when you program CGI (or anything, for that matter), keep the following qualities in mind:

  • Simplicity
  • Efficiency
  • Versatility

The first two qualities are fairly common: try to make your code as readable and efficient as possible. Versatility applies more to CGI programs than other applications. When you start developing your own CGI programs, you will find that there are several basic applications that everyone wants to do. For example, one of the most common and obvious tasks of a CGI program is to process a form and email the results to a specific recipient. You could have several separate processed forms, each with a different recipient. Instead of writing a CGI program for each individual form, you can save time by writing a more general CGI program that works for all forms.

Having covered all the basic aspects of CGI, I have provided you with enough information to get started programming CGI. However, to become an effective CGI developer, you need to have a deeper understanding of how CGI communicates with the server and browser. The remainder of this book covers in detail the issues that have been mentioned in passing in this chapter, as well as the application development strategy, advantages, and limitations of the protocol.

Summary

This chapter has briefly covered the basics of CGI programming. You create output by formatting your data correctly and printing to stdout. Getting CGI input is somewhat more difficult because it must be parsed before it can be used. Fortunately, there are already several libraries out there that do the parsing.

By now, you should be fairly comfortable programming CGI applications. The remainder of this book is devoted to a more detailed specification, hints, and programming strategy for more advanced and complex applications.


Today, such things as a guestbook, server search, and a form for sending messages are an essential attribute of almost any serious site. The problem of introducing these and other bells and whistles, of course, excites the imagination of a novice webmaster in every possible way, depriving him of sleep, appetite and craving for beer. Unfortunately, studying the HTML sources of the competitors' pages gives nothing but links to a certain "cgi-bin", and even in newsgroups you sometimes come across a mention of some cgi-scripts. This article is devoted to the basics of using these same cgi scripts for the glory and prosperity of your site.

To begin with, I think we need to understand the concepts. A CGI script is a program that runs on a Web server at the request of a client (that is, a Web site visitor). This program is fundamentally no different from the usual programs that are installed on your computer - be it MS Word or a Quake game. CGI is not a programming language in which the script is written, but the Common Gateway Interface - a special interface through which the script is launched and interacted with.

A short lyrical digression about CGI

So what are CGI scripts and similar things in general? Let's start with the fact that your browser (when you type a URL) connects to the specified server over HTTP and asks it for the required file, something like this:

GET /~paaa/cgi-bin/guestbbok.cgi HTTP/1.0

This line is the most important thing in the request.

If a simple file is requested, for example an .html file, and such a file exists, the server sends a response to the browser:

HTTP/1.0 200 OK
Content-Type: text/html

After an empty line (it is needed to separate the header from the body) comes the content of the requested URL itself.
That is basically the whole WWW: you go from link to link.
And what if you want to bring something truly interactive, dynamic, beautiful and gorgeous into this dull process? Well, there is an answer to this question. What if the requested URL points to a special program (a CGI program; CGI stands for Common Gateway Interface), and whatever this program outputs is sent to the browser? The server starts the .cgi program, and the program, for example, processes the form data, enters you somewhere in its database, and tells you that you are great :)
Well, I hope I have intrigued you?

Brief information about what you need to know to write CGI scripts: first of all, you need to know what the Internet is and how it works (you do know, don't you? ;)), plus a little programming skill (this is the most important thing).
Let's write a simple script together, and then I'll tell you where the catch is.
First, create a cgi-bin directory in your home directory:

cd public_html
mkdir cgi-bin
chmod 0777 cgi-bin

The last line is very important.
Take an editor and type:
#!/usr/bin/perl
# first.cgi
print "Content-Type: text/html\n\n";
print "<html><body>";
print "<h1>Hello you !!!</h1>";
print "</body></html>";

Save it in the cgi-bin directory under the name first.cgi. Saved it?
Now make it executable (it is a program, after all):

chmod +x first.cgi

Now we come to the solemn moment. Type in the browser's address line http://www.uic.nnov.ru/~your_login/cgi-bin/first.cgi
and see what happens. One of two things will happen: either the script works and you see the page it generated (congratulations, welcome to our ranks!), or you get an Internal Server Error. In that case do not be discouraged; you did something wrong, and this bug-hunting guide will come in handy. First of all, you can check the syntax as follows:

perl -c first.cgi

Perl will immediately give you error messages, if there are any (it happens: a missed semicolon, an unclosed bracket or quote), and you can fix them right away.
A more serious logical error is omitting the empty line that separates the header from the body:
print "Content-Type: text/html\n\n"; # All right
print "Content-Type: text/html\n"; # ERROR!!!

Let's analyze the script:
The first line, #!/usr/bin/perl, simply indicates where Perl is located on the system. The second is just a comment; you can write anything after the # sign.
Then comes print "Content-Type: text/html\n\n"; This is a header indicating the type of content. Everything the script prints to its standard output (STDOUT) goes to the server for processing. An empty line separates the header from the body, which in our case is

<html><body><h1>Hello you !!!</h1></body></html>

The server processes the script's response and, based on it, forms and sends the response to the browser. (The server usually does not change the message body; it only supplements the header with the fields required by the HTTP protocol.)
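The role of the empty line can be made concrete with a small C sketch of what the server has to do with the script's output: find where the header ends and the body begins. The function find_body is hypothetical and only illustrates the idea; real servers parse the header fields more carefully.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the body inside a CGI
 * response, or NULL if the blank line that ends the header is missing. */
const char *find_body(const char *response)
{
    const char *lf;
    const char *crlf;

    assert(response != NULL);

    /* The blank line may be written as "\n\n" or as "\r\n\r\n". */
    lf = strstr(response, "\n\n");
    crlf = strstr(response, "\r\n\r\n");

    /* Use whichever separator comes first in the response. */
    if (crlf != NULL && (lf == NULL || crlf < lf))
        return crlf + 4;
    if (lf != NULL)
        return lf + 2;
    return NULL;   /* no blank line: the header never ends */
}
```

If the script prints only a single "\n" after Content-Type, the sketch finds no body at all, which mirrors why the server reports an error in that case.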

Well, the basics have been mastered, and everything is not as difficult and depressing as it might seem at first.
Now you can practice writing such simple scripts yourself to get the hang of it.

Owners of online stores know the concept of "e-commerce" firsthand and can certainly answer the question "e-commerce - what is it". But if you look at its essence, many nuances emerge and the term acquires a broader meaning.

E-commerce: what is it?

The general concept is as follows: e-commerce is an approach to doing business in which a number of the operations involved in supplying goods or providing services and works rely on digital data transmission, including over the Internet.

Thus, this is any commercial transaction that is carried out using an electronic means of communication.

The scheme of work is arranged as follows:

  • anyone (a blogger or any other owner of their own website) registers in this system;
  • gets its own link;
  • places a special code on its web page - an advertisement of the selected official partner of the e-Commerce Partners Network appears;
  • monitors the conversion of the site;
  • earns a certain percentage for every purchase made by a visitor to his site who clicked on an affiliate link.

WP e-Commerce

A large number of people are now passionate about e-commerce, primarily because of the desire to create their own website, a unique online store for selling their own products. To meet this growing demand, the developers have concentrated on creating an e-commerce template. What it is, we will consider further.

One such template example is WordPress e-commerce. It is a shopping cart plugin for WordPress (one of the best-known website management systems, intended primarily for creating and organizing blogs). It is provided completely free of charge and allows site visitors to make purchases on the web page.

In other words, this plugin allows you to create an online store (based on WordPress). This e-commerce plugin has all the necessary tools, settings and options to meet modern needs.