Overview of the Internet

The Internet is a collection of computers connected by network cables or through satellite links. Rather than connecting every computer on the Internet with every other computer, individual computers in an organization are normally connected in a local area network (LAN). One node on this LAN, typically a router, is physically connected to the Internet. So the Internet is a network of networks. Millions of computing devices are connected to this network, either permanently or for short periods. These devices run network applications that communicate through copper or fiber optic cables, radio, or satellite transmission. The communication is governed by protocols standardized by an international body, the Internet Engineering Task Force (IETF).

Internet connectivity is provided by Internet Service Providers (ISPs). These companies dedicate computers to act as servers, which make information (such as Web pages or e-mail) available to users of the Internet.

One can look upon the structure of the Internet as having an edge and a core. At the edge are host systems that run application or client programs or provide a service through server programs. At the core of the Internet is a network of routers.

Data Transfer on the Internet

There are two ways in which data can be transferred over communication links: circuit switching and packet switching. In circuit switching there is a dedicated circuit for each communication; a good example is the telephone network. In packet switching, data is sent over the network in small chunks called packets. Each packet is passed to a router, which forwards it to another router, and so on until (hopefully) all the packets arrive at their destination.

The advantage of circuit switching is guaranteed performance: there is a dedicated link between users that is not shared. The disadvantage is that the link sits idle when it is not being used. In packet switching, different users share the same network resources, which are used only as needed. However, if there are many packets in the network, congestion occurs: packets are placed in a queue, waiting for a link to free up. When a queue is full, arriving packets are dropped, which makes the transmission of information lossy.
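To make the congestion scenario concrete, here is a minimal Python sketch of one router output link with a finite first-in, first-out buffer that drops arriving packets when the buffer is full (often called drop-tail). The traffic pattern, the link speed (packets per tick), and the buffer size are made-up parameters; real routers may use more elaborate queue management.

```python
from collections import deque


def simulate_drop_tail(arrivals, service_per_tick, buffer_size):
    """Simulate one router output link with a finite FIFO buffer.

    arrivals[t] is the number of packets arriving during tick t. Arriving
    packets join the queue if there is room and are dropped otherwise; then
    up to service_per_tick packets leave on the outgoing link.
    Returns (packets sent, packets dropped, queue length after each tick).
    """
    queue = deque()
    sent = dropped = 0
    history = []
    next_id = 0
    for count in arrivals:
        for _ in range(count):
            if len(queue) < buffer_size:
                queue.append(next_id)   # wait in the buffer
            else:
                dropped += 1            # buffer full: the packet is lost
            next_id += 1
        for _ in range(min(service_per_tick, len(queue))):
            queue.popleft()             # first in, first out
            sent += 1
        history.append(len(queue))
    return sent, dropped, history


if __name__ == "__main__":
    # A burst of 9 packets into a link that can send 1 packet per tick.
    print(simulate_drop_tail([3, 3, 3, 0, 0], service_per_tick=1, buffer_size=4))
```

With a 4-packet buffer, this burst of 9 packets results in 5 packets sent, 3 dropped, and 1 still waiting after five ticks: the link is shared efficiently, but nothing is guaranteed.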

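There are four sources of packet delay at each node (router) along the path:

- Processing delay, $d_{\text{proc}}$: the time the router needs to examine the packet header, check for bit errors, and decide on the outgoing link.
- Queuing delay, $d_{\text{queue}}$: the time the packet waits in the output buffer for the link to become free; it depends on how congested the link is.
- Transmission delay, $d_{\text{trans}}$: the time needed to push all of the packet's bits onto the link, $L/R$ for a packet of $L$ bits on a link of rate $R$ bits per second.
- Propagation delay, $d_{\text{prop}}$: the time for a bit to travel from one end of the link to the other, $D/s$ for a link of length $D$ and signal propagation speed $s$.

The total nodal delay is the sum of these delays:

$$d_{\text{nodal}} = d_{\text{proc}} + d_{\text{queue}} + d_{\text{trans}} + d_{\text{prop}}$$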
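The nodal delay formula can be turned into a small calculator. This Python sketch computes the transmission delay as L/R and the propagation delay as D/s, and takes the processing and queuing delays as given inputs. The default signal speed of 2 x 10^8 m/s is an assumed typical value for copper or fiber, and the example numbers are made up.

```python
def nodal_delay(d_proc, d_queue, packet_bits, link_rate_bps, distance_m,
                signal_speed_mps=2e8):
    """Total delay in seconds at one node: d_proc + d_queue + d_trans + d_prop."""
    d_trans = packet_bits / link_rate_bps   # time to push every bit onto the link
    d_prop = distance_m / signal_speed_mps  # time for a bit to travel the link
    return d_proc + d_queue + d_trans + d_prop


if __name__ == "__main__":
    # Hypothetical numbers: a 1500-byte packet on a 10 Mbps, 1000 km link,
    # with 0.1 ms processing delay and 2 ms queuing delay.
    total = nodal_delay(d_proc=1e-4, d_queue=2e-3, packet_bits=1500 * 8,
                        link_rate_bps=10e6, distance_m=1000e3)
    print(f"{total * 1000:.1f} ms")
```

Here the transmission delay is 1.2 ms and the propagation delay is 5 ms, so the script prints 8.3 ms. On long links propagation tends to dominate, while on slow links transmission does.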

Layers of the Internet

Physical Layer: This is the layer that actually transmits the bits of information through copper cable, fiber optic cable, satellites, and wireless transmitters (electromagnetic radiation).

Network Interface Layer: This layer is implemented in software and provides the protocols that interpret the physical bit stream. Each network card has a 6-byte (48-bit) hardware address, called its MAC address. Data moves on the local network in packets (usually called frames at this layer), each consisting of a header and data. The protocols of this layer coordinate transmission so that fast computers cannot overwhelm slower ones, and they deal with simultaneous transfers, that is, several computers trying to use the shared medium at once. This layer is concerned with how you communicate with your local network, whether it is Ethernet or token ring. If your computer is connected to an ISP through a telephone line, the link typically uses the Point-to-Point Protocol (PPP).
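As an illustration of a frame made of a header and data, here is a Python sketch that builds and parses an Ethernet-style header: a 6-byte destination MAC address, a 6-byte source MAC address, and a 2-byte type field. The addresses are hypothetical, and the preamble, minimum-size padding, and trailing error-check field of a real Ethernet frame are left out.

```python
import struct

HEADER = struct.Struct("!6s6sH")  # destination MAC, source MAC, 2-byte type


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into the 6-byte hardware address."""
    octets = mac.split(":")
    if len(octets) != 6:
        raise ValueError(f"expected 6 bytes in a MAC address: {mac!r}")
    return bytes(int(octet, 16) for octet in octets)


def bytes_to_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)


def build_frame(dst_mac, src_mac, ethertype, payload):
    """Header followed by the data it carries."""
    return HEADER.pack(mac_to_bytes(dst_mac), mac_to_bytes(src_mac), ethertype) + payload


def parse_frame(frame):
    dst, src, ethertype = HEADER.unpack(frame[:HEADER.size])
    return bytes_to_mac(dst), bytes_to_mac(src), ethertype, frame[HEADER.size:]


if __name__ == "__main__":
    # Hypothetical addresses; 0x0800 is the type value for an IPv4 payload.
    frame = build_frame("ff:ff:ff:ff:ff:ff", "00:1a:2b:3c:4d:5e", 0x0800, b"hello")
    print(len(frame), parse_frame(frame))
```

The header is always 14 bytes here, so a receiver can find the data simply by skipping past it.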

Internetwork Layer: The protocol governing this layer is the Internet Protocol (IP). In IPv4, each computer has a four-byte (32-bit) address. The newer IPv6, to which the Internet is gradually moving, uses 16-byte (128-bit) addresses. Each packet header carries the IP addresses of both the sender and the destination.
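Python's standard `ipaddress` module can confirm the two address sizes. The addresses below come from the ranges reserved for documentation and are only examples.

```python
import ipaddress


def describe(address):
    """Return the IP version and the address length in bytes."""
    ip = ipaddress.ip_address(address)
    return ip.version, len(ip.packed)


if __name__ == "__main__":
    # Both addresses come from the ranges reserved for documentation examples.
    for text in ("192.0.2.10", "2001:db8::10"):
        version, size = describe(text)
        print(f"{text}: IPv{version}, {size} bytes")
```

This prints 4 bytes for the IPv4 address and 16 bytes for the IPv6 address.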

This layer is concerned with getting packets from the sender to the destination. Along the way, packets pass through routers, which are computers running IP routing software. Each router has a buffer to store packets temporarily. Routers collect information about routes on the network and use it to select a route when forwarding packets.
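One common way for a router to use its route information is a longest-prefix match: among the forwarding-table entries whose network contains the destination address, the most specific one wins. The sketch below assumes a small hypothetical forwarding table with made-up link names; real routers fill these tables using routing protocols and use much faster lookup structures.

```python
import ipaddress

# Hypothetical forwarding table: destination prefix -> outgoing link.
FORWARDING_TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "link-0",     # default route
    ipaddress.ip_network("10.0.0.0/8"): "link-1",
    ipaddress.ip_network("10.1.0.0/16"): "link-2",
}


def choose_link(destination, table=FORWARDING_TABLE):
    """Longest-prefix match: the most specific matching prefix wins."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in table if addr in net]
    if not matches:
        raise LookupError(f"no route to {destination}")
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]


if __name__ == "__main__":
    for dest in ("10.1.2.3", "10.200.0.1", "192.0.2.1"):
        print(dest, "->", choose_link(dest))
```

The default route (`0.0.0.0/0`) matches every IPv4 address, so it is used only when nothing more specific applies.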

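IP does not guarantee delivery. Each packet has a Time To Live (TTL) field in its header: an integer that specifies the maximum number of hops (router-to-router jumps) the packet can make before it is discarded. Each router decrements the TTL, and a packet whose TTL reaches zero is dropped, so a packet cannot circulate forever. On the way to its final destination, a packet may travel through many different networks.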
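A minimal Python sketch of how the TTL limits a packet's lifetime, assuming a hypothetical path of routers that each decrement the TTL by one before forwarding:

```python
def forward(ttl, routers):
    """Send a packet with the given TTL along a hypothetical path of routers.

    Returns (routers that forwarded the packet, whether it got past the last one).
    """
    forwarded = []
    for router in routers:
        ttl -= 1                      # every hop uses up one unit of TTL
        if ttl <= 0:
            # Discarded here; a real router usually also sends an ICMP
            # "Time Exceeded" message back to the sender.
            return forwarded, False
        forwarded.append(router)
    return forwarded, True


if __name__ == "__main__":
    path = ["r1", "r2", "r3", "r4"]
    print(forward(64, path))   # plenty of TTL: delivered
    print(forward(3, path))    # dropped at r3
```

With a TTL of 3, the packet is forwarded by r1 and r2 and discarded at r3.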

Transport Layer: The main protocol of the transport layer is the Transmission Control Protocol (TCP). This layer controls the transmission of data, say a file, between two hosts. At the client end, TCP sends the server a request for a connection. If the server responds affirmatively, a connection is established, and each program accesses it through a socket.
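A minimal sketch of a TCP connection using Python's `socket` module, with both ends on the local machine. The operating system carries out TCP's connection setup when the client connects; the server here, which just sends the data back in upper case, is hypothetical and exists only for the demonstration.

```python
import socket
import threading


def serve_once(listener):
    """Accept one connection and reply with the data in upper case."""
    conn, _addr = listener.accept()   # returns a connection the OS has already set up
    with conn:
        data = conn.recv(1024)
        conn.sendall(data.upper())


def demo(message=b"hello"):
    # SOCK_STREAM selects TCP; port 0 lets the OS pick any free port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
        server.start()
        # create_connection() performs the client side of TCP connection setup.
        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)
        server.join(timeout=5)
    return reply


if __name__ == "__main__":
    print(demo())
```

Once the connection exists, both programs simply read and write bytes on their sockets; TCP takes care of delivering them reliably and in order.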

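TCP divides the file into segments (packets) and stamps each one with a sequence number and a checksum. The checksum is a 16-bit value computed from the contents of the segment (the one's complement of the one's-complement sum of its 16-bit words), which lets the receiver detect corrupted data. The segments are then handed to IP for internetwork routing. At the receiving host, IP passes the arriving packets up to TCP, which verifies each checksum, uses the sequence numbers to put the data back in order, and sends acknowledgments back to the sender. Segments that are not acknowledged are retransmitted.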
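The Python sketch below splits data into segments tagged with sequence numbers (byte offsets, as in TCP) and protects each one with the Internet checksum described in RFC 1071. It is simplified: real TCP starts its sequence numbers at a randomly chosen value, and its checksum also covers the TCP header and some fields of the IP header.

```python
import random


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: one's complement of the one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                             # pad to whole 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)    # wrap the carry around
    return ~total & 0xFFFF


def make_segments(data: bytes, size: int):
    """Split data into (sequence number, payload, checksum) triples."""
    segments = []
    for seq in range(0, len(data), size):           # sequence number = byte offset
        payload = data[seq:seq + size]
        segments.append((seq, payload, internet_checksum(payload)))
    return segments


def reassemble(segments):
    """Check every checksum, then put the payloads back in sequence order."""
    for seq, payload, checksum in segments:
        if internet_checksum(payload) != checksum:
            raise ValueError(f"segment {seq} is corrupted")
    return b"".join(payload for _seq, payload, _sum in sorted(segments))


if __name__ == "__main__":
    message = b"The quick brown fox jumps over the lazy dog"
    segments = make_segments(message, size=8)
    random.shuffle(segments)                        # packets may arrive out of order
    assert reassemble(segments) == message
    print(hex(internet_checksum(bytes.fromhex("0001f203f4f5f6f7"))))
```

The last line uses the worked example from RFC 1071 and prints `0x220d`. Note that the checksum is a sum of the data, not a count of its bits, so most single-bit errors change it.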

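Application Layer: This layer contains the protocols used directly by network applications, for example HTTP (the Web), FTP (file transfer), SMTP (e-mail), Telnet (remote login), and DNS (domain name lookup).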

Basic HTTP

Hypertext Transfer Protocol (HTTP) is a protocol for transferring files or other data (called resources) over TCP/IP sockets. A browser is an HTTP client; HTTP servers listen for requests on TCP port 80 by default.

An HTTP client opens a socket and sends a request to the HTTP server; the server sends back a response and, in HTTP/1.0, closes the connection. No connection information is maintained between transactions. The request line has three parts, separated by spaces: a method name, the path to the requested resource, and the HTTP version:

  GET /path/to/file/index.html HTTP/1.0

The initial response line has three parts: the HTTP version, the response status code, and an English description of the code:

  HTTP/1.0 200 OK

or

  HTTP/1.0 404 Not Found

A complete response would look like this:

  HTTP/1.1 200 OK
  Date: Wed, 26 Jan 2005 16:23:15 GMT
  Content-Type: text/html
  Content-Length: 1354

  <html>
  <body>
  ...
  ...
  </body>
  </html>
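The message formats above can be produced and taken apart with a few lines of Python. This is a sketch, not a full HTTP implementation: it relies on the fact that HTTP lines end with a carriage return and line feed (CR LF) and that an empty line separates the headers from the body. The host name is hypothetical.

```python
def build_get_request(path, host):
    """Request line (method, path, version), headers, then a blank line.

    HTTP lines end with CR LF.
    """
    lines = [f"GET {path} HTTP/1.0", f"Host: {host}", "", ""]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split a raw response into status line fields, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")      # blank line ends the headers
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers, body


if __name__ == "__main__":
    print(build_get_request("/path/to/file/index.html", "www.example.com"))
    sample = (b"HTTP/1.1 200 OK\r\n"
              b"Date: Wed, 26 Jan 2005 16:23:15 GMT\r\n"
              b"Content-Type: text/html\r\n"
              b"Content-Length: 26\r\n"
              b"\r\n"
              b"<html><body></body></html>")
    print(parse_response(sample))
```

Splitting the status line at most twice keeps multi-word descriptions such as "Not Found" intact.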

A GET request can also send a query string to the Web server by appending it to the URL after a question mark. A CGI-capable server places the query string in the QUERY_STRING environment variable, where the script that handles the request can read it.
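Here is a Python sketch of both sides of a GET query string: extracting the part after `?` from a URL, and decoding QUERY_STRING the way a CGI script might. The script URL and parameters are hypothetical, and the environment is simulated with a dictionary instead of a real Web server.

```python
import os
from urllib.parse import parse_qs, urlsplit


def query_from_url(url):
    """The part of a URL after '?', which the server copies into QUERY_STRING."""
    return urlsplit(url).query


def read_query(environ=None):
    """What a CGI script does with GET data: decode QUERY_STRING."""
    if environ is None:
        environ = os.environ
    return parse_qs(environ.get("QUERY_STRING", ""))


if __name__ == "__main__":
    # Hypothetical script URL and parameters.
    url = "http://www.example.com/cgi-bin/search.cgi?topic=packet+switching&page=2"
    fake_environ = {"REQUEST_METHOD": "GET", "QUERY_STRING": query_from_url(url)}
    print(read_query(fake_environ))
```

`parse_qs` returns a list of values for each name, because a form may send the same name more than once, and it decodes `+` back into a space.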

The POST request is used to send data that the server must process in some way, for example with a CGI script. The most common use of POST is to submit HTML form data to CGI scripts. The block of data is sent in the message body, and extra headers such as Content-Type and Content-Length describe it. For example, if form data is posted to a script post.cgi that just returns the data, the script reads the data from its standard input, and the QUERY_STRING environment variable is empty.
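A Python sketch of a POST round trip: building the request with its Content-Type and Content-Length headers, then reading the body the way a CGI script does, by taking exactly CONTENT_LENGTH bytes from standard input. The `/cgi-bin/` path, host name, and form fields are hypothetical, and standard input is simulated with an in-memory byte stream.

```python
import io
from urllib.parse import parse_qs, urlencode


def build_post_request(path, host, fields):
    """Form data goes in the body; headers describe its type and length."""
    body = urlencode(fields).encode("ascii")
    head = (f"POST {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            "Content-Type: application/x-www-form-urlencoded\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n")
    return head.encode("ascii") + body


def read_post_data(environ, stdin):
    """CGI side: read exactly CONTENT_LENGTH bytes of form data from stdin."""
    length = int(environ.get("CONTENT_LENGTH") or 0)
    return parse_qs(stdin.read(length).decode("ascii"))


if __name__ == "__main__":
    request = build_post_request("/cgi-bin/post.cgi", "www.example.com",
                                 {"name": "Ada Lovelace", "lang": "en"})
    print(request.decode("ascii"))
    # Simulate what the server hands to post.cgi: the body on stdin, and an
    # empty QUERY_STRING because nothing follows '?' in the URL.
    _head, _, body = request.partition(b"\r\n\r\n")
    environ = {"REQUEST_METHOD": "POST", "QUERY_STRING": "",
               "CONTENT_LENGTH": str(len(body))}
    print(read_post_data(environ, io.BytesIO(body)))
```

Reading exactly Content-Length bytes matters because the script cannot otherwise tell where the form data ends.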

World Wide Web

In 1989, Tim Berners-Lee at CERN proposed a protocol to exchange documents with colleagues around the world. The idea was that users could search for and retrieve any document on the Internet. The form of the documents was hypertext. This meant any given document could have links to other documents on the Internet. In a strict sense, the World Wide Web was this interconnected system of documents.

Web Browsers

A browser is a software program, also known as a client, that requests a specific web document from a server and renders it on the screen for the user. One of the oldest browsers is Lynx, a text-only browser. The first widely used graphical browser was Mosaic, developed at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. Developers of Mosaic later founded the company that produced Netscape Navigator. Microsoft has its own browser, Internet Explorer, which comes with the Windows operating system. Other browsers are also available, such as Mozilla, Opera, and Safari. Browsers communicate with servers using the standard Hypertext Transfer Protocol (HTTP).

Web Servers

A web server is a software program that provides documents to browsers. Apache is the most widely used web server, with 68% of the market share. Second is Microsoft's Internet Information Server (IIS), with about 21%, and the remainder is spread over a large number of other servers. A web browser initiates a request by sending the server the URL of a document. The server finds the document, retrieves it, and sends it back.
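To show the request, retrieve, and send cycle end to end, here is a toy web server built on Python's standard `http.server` module, with `urllib.request` acting as the browser. The in-memory document store and its single page are hypothetical stand-ins for files on disk; this is an illustration, not how Apache or IIS are implemented.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory "document store" standing in for files on disk.
DOCUMENTS = {"/index.html": b"<html><body>Hello</body></html>"}


class DocumentHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = DOCUMENTS.get(self.path)          # look the document up
        if body is None:
            self.send_error(404, "Not Found")
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)                   # send it back to the browser

    def log_message(self, format, *args):
        pass                                     # keep the demo quiet


def fetch(url):
    """Act as a tiny browser; return (status code, body)."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # no proxies
    try:
        with opener.open(url, timeout=5) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as err:
        return err.code, b""


def demo():
    server = HTTPServer(("127.0.0.1", 0), DocumentHandler)   # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        found = fetch(f"http://127.0.0.1:{port}/index.html")
        missing = fetch(f"http://127.0.0.1:{port}/missing.html")
    finally:
        server.shutdown()
        server.server_close()
    return found, missing


if __name__ == "__main__":
    print(demo())
```

The first request returns status 200 with the page; the second returns 404 because the document does not exist.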

URL

Uniform (or universal) resource locators (URLs) are used to identify documents or resources on the Internet. All URLs have the same general format:

scheme:object-address

The scheme names the communication protocol; examples include http, ftp, telnet, and mailto. Different schemes use different object addresses. HTTP uses an object address of the following form:

//fully-qualified-domain-name/path-to-document
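Python's `urllib.parse.urlparse` splits a URL into these pieces. The URLs below are hypothetical examples; note how an http URL has a host after `//`, while a mailto URL has no host part at all.

```python
from urllib.parse import urlparse


def split_url(url):
    """Return (scheme, host, path) for a URL of the form scheme:object-address."""
    parts = urlparse(url)
    return parts.scheme, parts.netloc, parts.path


if __name__ == "__main__":
    # Hypothetical URLs: http uses //host/path, mailto just names a mailbox.
    print(split_url("http://www.example.com/path/to/file/index.html"))
    print(split_url("mailto:someone@example.com"))
```

The first call gives `('http', 'www.example.com', '/path/to/file/index.html')`; the second gives an empty host and the mailbox as the path.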

Domain Names

Every computer on the Internet has a unique IP address. An IPv4 address has the form octet.octet.octet.octet, where an octet is a sequence of 8 binary digits, so in decimal its value ranges from 0 through 255. An IP address has two parts: the first part identifies the network and the second part identifies the host computer on that network.
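A short Python sketch of both ideas: the four octets are just one 32-bit number written in base 256, and a prefix length (written as `/24` here, an assumption for the example) says how many leading bits form the network part. The address is a hypothetical private-network address.

```python
import ipaddress


def dotted_to_int(address):
    """Four octets are one 32-bit number written in base 256."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not a dotted-quad IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet
    return value


def split_address(address_with_prefix):
    """Split 'a.b.c.d/n' into its network part and its host number."""
    iface = ipaddress.ip_interface(address_with_prefix)
    host_number = int(iface.ip) & int(iface.network.hostmask)
    return str(iface.network), host_number


if __name__ == "__main__":
    # Hypothetical host on a network whose first 24 bits are the network part.
    print(dotted_to_int("192.168.1.10"))
    print(split_address("192.168.1.10/24"))
```

For `192.168.1.10/24`, the network part is `192.168.1.0/24` and the host is number 10 on that network.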

Because the numerical values of IP addresses are difficult to remember, domain names were introduced in 1983. The Domain Name System (DNS) maps text names to IP addresses. Examples of top-level domains are .org, .edu, .gov, .net, and .com. Under a top-level domain there can be millions of second-level domains, and each second-level name must be unique within its top-level domain. A DNS server receives requests to translate a domain name into an IP address. It can return the IP address if it is in its database, query another DNS server if it does not have knowledge of that domain, or report that the domain does not exist.

When a browser asks a DNS server to resolve a domain name, that server may not have the IP address. It may then pass the query to a root DNS server. The root server sends back the IP address of another DNS server closer to the answer (for example, a server for the .com domain), and the process repeats until a server that can resolve the request is reached. There are multiple DNS servers at each level, so if one fails, other servers can handle the requests.
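The lookup process can be sketched in Python with a toy, in-memory hierarchy of servers. Everything here is hypothetical: the server names, the zones, and the address (taken from a range reserved for documentation). Each server either answers from its own records or refers the resolver to a server lower in the hierarchy; a local cache stands in for a server's own database, and running out of referrals means the domain does not exist. Real DNS uses network messages, multiple redundant servers per level, and record expiry, all omitted here.

```python
# Hypothetical DNS hierarchy: each server has records it can answer directly
# and/or referrals to servers responsible for parts of the name space.
SERVERS = {
    "root":       {"referrals": {"com": "com-tld", "org": "org-tld"}},
    "com-tld":    {"referrals": {"example.com": "example-ns"}},
    "org-tld":    {"referrals": {}},
    "example-ns": {"records": {"www.example.com": "192.0.2.80"}},
}


def in_zone(name, zone):
    """True if name is the zone itself or lies underneath it."""
    return name == zone or name.endswith("." + zone)


def resolve(name, cache):
    """Iteratively resolve name; return (address or None, servers asked)."""
    if name in cache:
        return cache[name], []            # already known locally
    asked = []
    server = "root"
    while server is not None:
        asked.append(server)
        data = SERVERS[server]
        address = data.get("records", {}).get(name)
        if address is not None:
            cache[name] = address         # remember the answer for next time
            return address, asked
        referrals = data.get("referrals", {})
        server = next((ns for zone, ns in referrals.items() if in_zone(name, zone)), None)
    return None, asked                    # no server knows it: the domain does not exist


if __name__ == "__main__":
    cache = {}
    print(resolve("www.example.com", cache))   # root -> com-tld -> example-ns
    print(resolve("www.example.com", cache))   # answered from the cache
    print(resolve("nosuch.org", cache))        # does not exist
```

The first lookup visits the root, the .com server, and the example.com server in turn; the repeat lookup is answered immediately from the cache.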

DNS and URL

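The Domain Name System is an application-layer service that translates the domain name in a URL to the actual IP address. Because URLs contain domain names rather than IP addresses, you can own your own domain name and move your web site (the actual files) from one web server to another; only the DNS entry needs to be updated.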