Web Access

The CS 240 Utilities include several classes that can be used to download web documents.

The URLConnection class is the main interface for accessing web documents. It has one static method named Open that takes the URL of the document to be accessed and returns an InputStream object that can be used to read the contents of the document. The URL can be either an HTTP URL or a file URL. Examples are "http://www.cnn.com/index.html" and "file:/public_html/index.html".

The InputStream object returned by URLConnection::Open provides methods for reading the data stored in the web document and for closing the stream when the program has finished reading the document's data. The following code fragment shows how to read a web document and print its contents to standard output.

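    #include <iostream>
    #include "URLConnection.h"

    using namespace std;

    // Open the document; the caller owns the returned stream.
    InputStream * doc = URLConnection::Open("http://www.cnn.com/index.html");

    // Read one byte at a time until the end of the stream is reached.
    while (!doc->IsDone()) {
       char c = doc->Read();
       cout << c;
    }

    // Release the system resources held by the stream.
    doc->Close();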
    delete doc;

InputStream is an abstract interface for reading data from a stream. There are two classes that implement the InputStream interface, FileInputStream and HTTPInputStream. If the URL passed to URLConnection::Open is a file URL, the InputStream object returned is a FileInputStream object. If the URL is an HTTP URL, the InputStream object returned is an HTTPInputStream object. You need not be concerned directly with the FileInputStream and HTTPInputStream classes. URLConnection::Open takes care of deciding which kind of object to create. Your code need not even know what kind of URL is being processed, file or HTTP. It just needs to make sure that the InputStream returned by URLConnection::Open is eventually closed and deleted.
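
As a rough illustration of why calling code never needs to know the concrete stream type, the sketch below writes a helper against the InputStream interface only. The CS 240 headers are not reproduced in this document, so the block declares a stand-in InputStream that models just the four documented methods. StringInputStream and ReadAll are hypothetical names invented for this example and are not part of the utilities.

```cpp
#include <cstddef>
#include <string>

// Stand-in for the interface declared in "InputStream.h" (assumption: only
// the four documented methods are modeled; the real header may differ).
class InputStream {
public:
    virtual ~InputStream() {}
    virtual bool IsOpen() = 0;
    virtual bool IsDone() = 0;
    virtual char Read() = 0;
    virtual void Close() = 0;
};

// Hypothetical in-memory stream, used only so the example runs without
// touching a file or the network. Error checking is omitted for brevity.
class StringInputStream : public InputStream {
public:
    explicit StringInputStream(const std::string & data)
        : data_(data), pos_(0), open_(true) {}
    bool IsOpen() override { return open_; }
    bool IsDone() override { return pos_ >= data_.size(); }
    char Read() override { return data_[pos_++]; }
    void Close() override { open_ = false; }
private:
    std::string data_;
    std::size_t pos_;
    bool open_;
};

// Hypothetical helper: reads every remaining byte through the abstract
// interface, without knowing which concrete class is behind it.
std::string ReadAll(InputStream & in) {
    std::string contents;
    while (!in.IsDone()) {
        contents += in.Read();
    }
    return contents;
}
```

With the real library, the same ReadAll helper would presumably accept the stream returned by URLConnection::Open for either kind of URL, since it relies only on IsDone and Read.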

There are many kinds of error conditions that can occur when a program accesses documents on the web. The methods on the web access classes throw exceptions when errors occur. Examples of exceptions that might be thrown are: InvalidURLException, FileException, NetworkException, and IllegalStateException. Your code must be prepared to handle these exceptions when they occur, or your program will terminate abnormally.
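
The sketch below shows one way to catch the exceptions named above so that a failed download is reported rather than ending the program. The definitions of the exception classes are not shown in this document, so the block declares empty stand-ins with the same names. FakeOpen and DescribeFailure are hypothetical, and FakeOpen merely simulates failures so the handlers can be exercised offline.

```cpp
#include <string>

// Stand-ins for the exception classes named in the text. Their real
// definitions (and any common base class) are assumptions here.
class InvalidURLException {};
class FileException {};
class NetworkException {};
class IllegalStateException {};

// Hypothetical stand-in for URLConnection::Open that always fails:
// HTTP URLs simulate a network failure, file URLs simulate a file
// failure, and anything else is rejected as an invalid URL.
void FakeOpen(const std::string & url) {
    if (url.compare(0, 7, "http://") == 0) {
        throw NetworkException();
    }
    if (url.compare(0, 5, "file:") == 0) {
        throw FileException();
    }
    throw InvalidURLException();
}

// Hypothetical helper: turns each exception into a short description
// instead of letting it propagate and terminate the program.
std::string DescribeFailure(const std::string & url) {
    try {
        FakeOpen(url);
        return "ok";
    } catch (const InvalidURLException &) {
        return "invalid URL";
    } catch (const FileException &) {
        return "file error";
    } catch (const NetworkException &) {
        return "network error";
    } catch (const IllegalStateException &) {
        return "illegal state";
    }
}
```

In real code the try block would wrap the call to URLConnection::Open and the read loop together, since both can throw.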

The classes that provide the web access functions are described below.


class InputStream

#include "InputStream.h"

InputStream is an abstract interface for reading the stream of bytes that are stored in the document being accessed. This abstract interface is implemented by the FileInputStream and HTTPInputStream classes.

Methods

virtual bool IsOpen()

This method returns true if the stream is open, and false if it is closed.

virtual bool IsDone()

This method returns true if the end of the stream has been reached, and false if there are still more bytes to be read.

virtual char Read()

This method returns the next byte of data from the document.

If the stream is closed, an IllegalStateException is thrown.

If the last byte has already been read from the stream, an IllegalStateException is thrown.

Depending on the type of the InputStream, this method could throw any of the following exceptions: FileException, NetworkException, IllegalStateException.

virtual void Close()

This method closes the stream if it is not already closed. All system resources used by the stream are released.
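
The state rules described for Read and Close can be sketched with a small in-memory stream. This is a minimal illustration under stated assumptions, not the library's implementation: InputStream and IllegalStateException are stand-ins declared locally, and StringInputStream and ReadThrows are hypothetical names. The behavior of IsDone on a closed stream is not specified above, so the stand-in simply reports whether all bytes have been consumed.

```cpp
#include <cstddef>
#include <string>

class IllegalStateException {};  // stand-in; the real definition is not shown

// Stand-in for the documented interface.
class InputStream {
public:
    virtual ~InputStream() {}
    virtual bool IsOpen() = 0;
    virtual bool IsDone() = 0;
    virtual char Read() = 0;
    virtual void Close() = 0;
};

// Hypothetical stream that enforces the documented rules: Read throws
// if the stream is closed or if the last byte has already been read.
class StringInputStream : public InputStream {
public:
    explicit StringInputStream(const std::string & data)
        : data_(data), pos_(0), open_(true) {}
    bool IsOpen() override { return open_; }
    bool IsDone() override { return pos_ >= data_.size(); }
    char Read() override {
        if (!open_ || IsDone()) {
            throw IllegalStateException();
        }
        return data_[pos_++];
    }
    // Closing an already closed stream is harmless.
    void Close() override { open_ = false; }
private:
    std::string data_;
    std::size_t pos_;
    bool open_;
};

// Hypothetical helper: reports whether a call to Read throws
// IllegalStateException.
bool ReadThrows(InputStream & in) {
    try {
        in.Read();
        return false;
    } catch (const IllegalStateException &) {
        return true;
    }
}
```

Checking IsDone before every call to Read, as the loop in the overview does, avoids the end-of-stream exception entirely.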


class URLConnection

#include "URLConnection.h"

The URLConnection class is used to open an InputStream that can be used to read the contents of a web document.

Methods

static InputStream * Open(const string & url)

This method takes the URL of the document to be downloaded as its only parameter, and returns an open InputStream object that can be used to read the contents of the document. The URL must be either a file URL or an HTTP URL. The caller must call the Close method on the returned InputStream when they are finished reading data from the document, and then delete the object.

Any of the following exceptions may be thrown by this method: InvalidURLException, FileException, NetworkException, IllegalStateException.
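
Because the caller must both close and delete the returned stream, and an exception thrown while reading can skip that cleanup, one option is to hand the pointer to a std::unique_ptr with a custom deleter. The sketch below is an assumption-laden illustration: InputStream is a local stand-in (it assumes a virtual destructor, which this document does not confirm), and CloseAndDelete, StreamPtr, and CountingStream are hypothetical names used only to show that cleanup runs exactly once.

```cpp
#include <memory>

// Stand-in for the documented interface (virtual destructor assumed).
class InputStream {
public:
    virtual ~InputStream() {}
    virtual bool IsOpen() = 0;
    virtual bool IsDone() = 0;
    virtual char Read() = 0;
    virtual void Close() = 0;
};

// Hypothetical deleter: performs the two documented cleanup steps in order.
struct CloseAndDelete {
    void operator()(InputStream * stream) const {
        stream->Close();
        delete stream;
    }
};

// Hypothetical alias for an owning pointer that closes, then deletes.
typedef std::unique_ptr<InputStream, CloseAndDelete> StreamPtr;

// Hypothetical test stream that counts Close calls and destructions.
class CountingStream : public InputStream {
public:
    CountingStream(int & closes, int & deletes)
        : closes_(closes), deletes_(deletes), open_(true) {}
    ~CountingStream() override { ++deletes_; }
    bool IsOpen() override { return open_; }
    bool IsDone() override { return true; }
    char Read() override { return '\0'; }
    void Close() override {
        if (open_) {
            open_ = false;
            ++closes_;
        }
    }
private:
    int & closes_;
    int & deletes_;
    bool open_;
};
```

With the real library, the equivalent usage would presumably be StreamPtr doc(URLConnection::Open(url)); the stream is then closed and deleted when doc goes out of scope, including when an exception unwinds the stack.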


Ken Rodham