Basic concepts of the Web

Web applications follow the client/server (C/S) model, in which the client is called the Web client and the server is called the Web server. A typical Web application has a three-tier architecture consisting of the data layer, the logic layer, and the presentation layer, as shown in Fig. 3. In many cases the Web client is a Web browser, so this architecture is also known as the browser/server (B/S) architecture.

The basic workflow for web apps is as follows:

  • The user sends a request to a Web server through a Web client (usually a Web browser), either by entering the URL (Uniform Resource Locator) of a network resource in the address bar or by clicking a URL link on a Web page;

  • After receiving the request, the Web server locates the requested file or script on the server and returns either the file itself or the output of running the script to the Web client;

  • The Web client receives the response and renders and displays the returned result.
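The three steps above can be sketched end to end with Python's standard library. This is a minimal sketch, not a production server: `PageHandler`, its in-memory `pages` table, and the local address (127.0.0.1 with an OS-chosen port) are illustrative assumptions. The handler plays the Web server (look up the resource, return it) and `fetch` plays the Web client (send the request, receive the response).

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class PageHandler(BaseHTTPRequestHandler):
    """Web server side: look up the requested resource and return it."""

    # Hypothetical in-memory "files" standing in for resources on disk.
    pages = {"/": b"<html><body><h1>Hello, WebGIS</h1></body></html>"}

    def do_GET(self):
        body = self.pages.get(self.path)
        if body is None:
            self.send_error(404)  # requested resource does not exist
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


def fetch(url):
    """Web client side: send a GET request and return (status, body text)."""
    with urllib.request.urlopen(url) as response:
        return response.status, response.read().decode("utf-8")


# Port 0 asks the operating system for any free local port.
server = HTTPServer(("127.0.0.1", 0), PageHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

port = server.server_address[1]
status, html = fetch(f"http://127.0.0.1:{port}/")
print(status)  # 200
print(html)    # a browser would render this HTML instead of printing it

server.shutdown()
server.server_close()
```

A real browser goes one step further than `fetch`: it renders the returned HTML instead of printing it.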


Fig. 3 Three-tier architecture of a basic web application

HTTP, URL, and HTML are the three cornerstones of Web technology. They were proposed by Tim Berners-Lee, have since become international standards, and are managed and maintained by the W3C (World Wide Web Consortium). Each is briefly described below.

HTTP

Most web applications involve two or more components, so a common set of rules is needed for how those components communicate. As a protocol, HTTP defines the rules that Web servers and clients follow when sending requests and responses. For example, both HTTP requests and HTTP responses consist of a message header and a message body. Following the HTTP protocol, the web server knows which information to put in the header and which to put in the body, and the web client knows which information to read from the header and which from the body.

HTTP is an application-layer protocol that transfers data (HTML files, image files, query results, etc.) over the TCP/IP protocol suite. Because it is simple and fast, it is well suited to distributed hypermedia information systems. HTTP was proposed in 1990 and has been continuously improved and extended through years of use and development.
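To make "over TCP/IP" concrete, the sketch below opens a plain TCP socket and writes an HTTP request as ordinary text. It is a minimal illustration using only the standard library; the local server, its `HelloHandler` class, and the "hello" body are hypothetical stand-ins for a real Web server.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # answer with an HTTP/1.1 status line

    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence request logging


server = HTTPServer(("127.0.0.1", 0), HelloHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# An HTTP request is plain text: a request line, header lines, a blank line.
request = (
    "GET / HTTP/1.1\r\n"
    f"Host: {host}:{port}\r\n"
    "Connection: close\r\n"  # ask the server to close the TCP connection afterwards
    "\r\n"
)

with socket.create_connection((host, port)) as sock:  # open the TCP connection
    sock.sendall(request.encode("ascii"))
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:  # empty read: the server closed the connection
            break
        chunks.append(data)

raw_response = b"".join(chunks).decode("latin-1")
print(raw_response.splitlines()[0])  # HTTP/1.1 200 OK

server.shutdown()
server.server_close()
```

Everything the client sends and receives here is text carried by TCP; HTTP itself only fixes how that text is laid out.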

The main features of the HTTP protocol include:

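  • Simple: a request is made simply by typing a URL or clicking a link.

  • Stateless: each request is handled independently, and the server does not retain client-related information between requests, which reduces the load on the server. In early HTTP the connection was also closed as soon as the server had responded; HTTP/1.1 keeps connections open by default (persistent connections).

  • Flexible: HTTP can carry a wide range of content types, including parameters, images, PDF documents, audio, and video; the Content-Type header tells the client what kind of data it is receiving.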
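One way to see this flexibility is to look at the media types that an HTTP server typically sends in the Content-Type response header for different files. The sketch below asks Python's `mimetypes` module for those types; the file names are hypothetical, and using a fresh `MimeTypes()` instance (built-in defaults only) is an assumption made so the result does not depend on the machine's system MIME tables.

```python
import mimetypes

# Built-in default table only; system mime.types files are not consulted.
table = mimetypes.MimeTypes()

for name in ["index.html", "map.png", "report.pdf", "track.mp3", "clip.mp4"]:
    content_type, _encoding = table.guess_type(name)
    print(f"{name:12} -> {content_type}")  # e.g. report.pdf -> application/pdf
```

A web server performs a similar lookup before filling in the Content-Type header of its response.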

HTTP status code

An HTTP status code is a three-digit code that indicates the status of a web server's HTTP response. Status codes were originally defined in RFC 2616 and extended by specifications such as RFC 2518, RFC 2817, RFC 2295, RFC 2774, and RFC 4918. The first digit of every status code places the response in one of five classes: 1xx (informational), 2xx (success), 3xx (redirection), 4xx (client error), and 5xx (server error). The reason phrases shown below are typical, but a server may use any human-readable alternative. Unless otherwise stated, the status codes are part of the HTTP/1.1 standard (RFC 7231).

Table 1 lists common HTTP status codes.

Table 1 List of status codes

Status code | Meaning
200 OK | The client's request succeeded.
400 Bad Request | The request has a syntax error and cannot be understood by the server.
401 Unauthorized | The request requires authentication. A 401 response must include a WWW-Authenticate header field.
403 Forbidden | The server understood the request but refuses to fulfil it.
404 Not Found | The requested resource does not exist, e.g. a wrong URL was entered.
500 Internal Server Error | The server encountered an unexpected error.
503 Service Unavailable | The server is temporarily unable to handle the request and may recover after some time.
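Because the class of a response is encoded in the first digit, a client can classify any status code with integer division. The sketch below does this and looks up the standard reason phrase with Python's `http.HTTPStatus`; the `describe` helper and the `CLASSES` table are illustrative names, not part of any standard API.

```python
from http import HTTPStatus

# The first digit of a status code selects one of five response classes.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code: int) -> str:
    """Return 'code phrase (class)' for a registered status code."""
    phrase = HTTPStatus(code).phrase    # standard reason phrase
    status_class = CLASSES[code // 100]  # first digit of the code
    return f"{code} {phrase} ({status_class})"


for code in (200, 400, 401, 403, 404, 500, 503):
    print(describe(code))
```

Note that the standard phrase for 503 is "Service Unavailable".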

HTTP defines eight request methods: GET, POST, HEAD, PUT, DELETE, TRACE, OPTIONS, and CONNECT, of which GET and POST are the most commonly used. HTTP headers carry information such as cache control and content type, while the status code appears in the first line (the status line) of the response.

Take a request for the website http://webgis.cn as an example:

Request header:

GET / HTTP/1.1
Host: webgis.cn

Response header:

HTTP/1.1 200 OK
Date: Wed, 06 Nov 2019 00:57:18 GMT
Server: Apache/2.4.25 (Debian)
Last-Modified: Sun, 03 Nov 2019 09:14:41 GMT
ETag: "4bdd-5966da115de5d-gzip"
Accept-Ranges: bytes
Vary: Accept-Encoding
Content-Encoding: gzip
Content-Length: 6631
Keep-Alive: timeout=5, max=100
Connection: Keep-Alive
Content-Type: text/html
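A response header like the one above is easy to pick apart: the first line is the status line, and each following line is a `Name: value` pair up to the first blank line. The sketch below parses a subset of the example response; `parse_response_head` is a hypothetical, simplified helper (it ignores repeated headers and other edge cases that a real client such as `http.client` handles).

```python
# A subset of the example response header shown above.
RESPONSE_HEAD = """HTTP/1.1 200 OK
Date: Wed, 06 Nov 2019 00:57:18 GMT
Server: Apache/2.4.25 (Debian)
Content-Encoding: gzip
Content-Length: 6631
Content-Type: text/html"""


def parse_response_head(text):
    """Split a response head into status-line parts and a header dict."""
    lines = text.splitlines()  # handles both CRLF and LF line endings
    version, code, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the head; the body would follow
            break
        name, value = line.split(":", 1)  # split at the first colon only
        headers[name.strip().lower()] = value.strip()  # names are case-insensitive
    return version, int(code), reason, headers


version, code, reason, headers = parse_response_head(RESPONSE_HEAD)
print(version, code, reason)             # HTTP/1.1 200 OK
print(headers["content-type"])           # text/html
print(int(headers["content-length"]))    # 6631
```

Splitting only at the first colon matters because values such as the Date header contain colons themselves.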

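The response header carries information about the server (Server), the returned content (Content-Type, Content-Length, Content-Encoding), caching (Last-Modified, ETag), and the connection (Keep-Alive, Connection). Of the request methods, GET and POST are the most commonly used; they differ mainly in where the submitted data is placed.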

With GET, the submitted data is appended to the URL, so it travels in the request line at the start of the HTTP request rather than in the message body. A ? separates the URL from the data, and multiple parameters are joined with &. Letters and digits are sent as they are, a space is converted to + (or %20), and Chinese and other non-ASCII characters are percent-encoded: each byte of the character's encoding (usually UTF-8) is written as % followed by two hexadecimal digits.
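Python's `urllib.parse.urlencode` applies exactly these rules, which makes it a convenient way to see what a browser puts after the ?. In the sketch below the parameter names and values are hypothetical, and the `/search` path is an assumed example rather than a real page on webgis.cn.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical search parameters.
params = {"q": "web gis", "city": "北京", "page": 2}

# Spaces become '+', non-ASCII text becomes %XX escapes of its UTF-8 bytes,
# and parameters are joined with '&'.
query = urlencode(params)
url = "http://webgis.cn/search?" + query

print(query)             # q=web+gis&city=%E5%8C%97%E4%BA%AC&page=2
print(parse_qs(query))   # decoding restores the original values (as lists)
```

The two Chinese characters each become three escaped bytes, because each takes three bytes in UTF-8.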

In a GET request, the browser limits the length of the URL. Table 2 lists the URL length limits of several common browsers.

Table 2 Browser URL limit

Browser | Maximum URL length (bytes)
IE | 2083
Firefox | 65536
Chrome | 8182
Safari | 80000
Opera | 190000

POST places the submitted data in the body of the HTTP request. Data submitted with GET appears in the address bar, whereas with POST the address bar does not change, so POST is somewhat safer than GET: the data does not show up in the address bar, browser history, or server access logs. POST data is still sent as plain text, however, and is only protected from eavesdropping when HTTPS is used. Because POST does not pass values through the URL, there is in theory no limit on the amount of data; in practice, the amount that can be sent depends on the server's settings and memory.
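The difference in where the data goes can be inspected without sending anything, by building `urllib.request.Request` objects for both methods. In this sketch the form fields and the `/submit` endpoint are hypothetical; `Request` switches to POST automatically when a `data` body is supplied.

```python
from urllib.parse import urlencode
from urllib.request import Request

form = {"user": "alice", "layer": "roads"}  # hypothetical form fields
endpoint = "http://webgis.cn/submit"        # hypothetical endpoint, never contacted

# GET: the form data becomes part of the URL.
get_req = Request(endpoint + "?" + urlencode(form))

# POST: the URL stays unchanged and the data travels in the request body.
post_req = Request(
    endpoint,
    data=urlencode(form).encode("ascii"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

for req in (get_req, post_req):
    print(req.get_method(), req.full_url, req.data)
```

Printing shows the GET request with the fields in its URL and no body, and the POST request with a bare URL and the fields as bytes in `data`.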

HTTPS (Hypertext Transfer Protocol Secure) is a secure version of HTTP that runs on top of SSL/TLS (Secure Sockets Layer / Transport Layer Security). On a plain HTTP connection, data transmitted between the server and the client can be intercepted; HTTPS encrypts the data to prevent eavesdropping and is usually used to transmit sensitive data such as personal information, login passwords, and credit card information. A website that uses HTTPS must install on its server a certificate issued by a certification authority (CA).
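The certificate matters on the client side too: an HTTPS client checks that the server's certificate was issued by a trusted CA and matches the host name. The sketch below shows those defaults in Python's `ssl` module. `fetch_certificate_subject` is a hypothetical helper that is defined but not called here, because it needs network access and a host that actually serves HTTPS.

```python
import socket
import ssl

# Client-side TLS settings with Python's secure defaults.
context = ssl.create_default_context()

print(context.verify_mode == ssl.CERT_REQUIRED)  # True: a valid CA-issued certificate is required
print(context.check_hostname)                    # True: the certificate must match the host name


def fetch_certificate_subject(host, port=443):
    """Connect over TLS and return the subject of the verified server certificate."""
    with socket.create_connection((host, port), timeout=10) as sock:
        # wrap_socket performs the TLS handshake and certificate verification.
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["subject"]
```

If verification fails (for example, a self-signed certificate), `wrap_socket` raises `ssl.SSLCertVerificationError` instead of returning a connection.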

URL

HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections. A URL is a special type of URI that contains enough information to locate a resource. URL stands for Uniform Resource Locator; it is the address used to identify a resource on the Internet. In everyday terms, a URL is simply a web address, and each web page has a globally unique URL. Just as street addresses locate homes in real life, URLs locate the hundreds of millions of web pages on the World Wide Web. Without URLs, a Web client could not find resources on a Web server, and resources on the Internet could not link to each other to form the World Wide Web.

Web map services use two of the HTTP request methods, GET and POST. Traditional MapServer requests and requests based on OGC standards both define HTTP GET for invoking operations. The online resource URL used for HTTP GET requests is really just a URL prefix, to which additional parameters are appended to form a valid operation request. The URL prefix is defined as an opaque string that includes the protocol, host name, optional port number, path, and a question mark ?. It may also include one or more server-specific parameters, in which case it ends with &.
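Building an operation request therefore amounts to appending encoded parameters to the opaque prefix. The sketch below does this for a WMS 1.1.1 GetMap request. The MapServer prefix (host, `/maps/demo.map` mapfile path) and the layer names are hypothetical; `build_get_request` is an illustrative helper, not a MapServer or OGC API.

```python
from urllib.parse import urlencode

# Hypothetical online resource URL (URL prefix) of a MapServer CGI endpoint.
# It already carries a server-specific parameter (map=...), so it ends with '&'.
PREFIX = "http://localhost/cgi-bin/mapserv?map=/maps/demo.map&"


def build_get_request(prefix: str, params: dict) -> str:
    """Append OGC operation parameters to an opaque URL prefix."""
    if not prefix.endswith(("?", "&")):
        raise ValueError("URL prefix must end with '?' or '&'")
    # Leave ',', ':' and '/' unescaped because OGC uses them as delimiters
    # inside parameter values (list items, SRS codes, MIME types).
    return prefix + urlencode(params, safe=",:/")


getmap = {
    "SERVICE": "WMS",
    "VERSION": "1.1.1",
    "REQUEST": "GetMap",
    "LAYERS": "roads,rivers",  # hypothetical layer names
    "STYLES": "",              # empty value: default style for every layer
    "SRS": "EPSG:4326",
    "BBOX": "-180,-90,180,90",
    "WIDTH": 800,
    "HEIGHT": 400,
    "FORMAT": "image/png",
}

url = build_get_request(PREFIX, getmap)
print(url)
```

The prefix is treated as opaque: the helper never parses or rewrites it, it only checks the trailing ? or &.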

The basic format of a URL is as follows; its fields are described in Table 3.

Protocol://hostname[:port]/path?query_string
Table 3 What the fields of the URL represent

URL field | Meaning
Protocol | The transmission protocol between the client and the server. HTTP, HTTPS, FTP, and MMS are commonly used protocols.
hostname (host name) or IP address | The server that holds the resources.
port (port number) | Optional. When omitted, the default port of the protocol is used; for example, the default port for HTTP is 80 and for HTTPS is 443.
path (file path) | The directory and file name where the resource is stored on the Web server (these can be virtual, i.e. not real file paths).
query_string (query string) | Optional; used to send HTTP request parameters to the Web server.
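Python's `urllib.parse.urlsplit` splits a URL into exactly these fields, which is a quick way to check how a given URL breaks down. The example URL (its port 8080, path, and query) is hypothetical, and the small `DEFAULT_PORTS` table is an illustrative assumption covering only three protocols.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}

# Hypothetical URL with an explicit port and a query string.
url = "http://webgis.cn:8080/maps/index.html?layer=roads&zoom=5"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.hostname)  # webgis.cn
print(parts.port)      # 8080
print(parts.path)      # /maps/index.html
print(parts.query)     # layer=roads&zoom=5

# When the port is omitted, urlsplit reports None; fall back to the protocol's default.
no_port = urlsplit("https://webgis.cn/")
effective_port = no_port.port or DEFAULT_PORTS[no_port.scheme]
print(effective_port)  # 443
```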

The URL specification reserves certain characters and gives them special meanings; OGC GET requests also use some of them as delimiters inside parameter values, as listed in Table 4.

Table 4 Specific characters

Character | Meaning
? | Delimiter that marks the start of the query string.
& | Delimiter between parameters in the query string.
= | Separator between a parameter name and its value.
/ | Delimiter between the MIME type and subtype in the FORMAT parameter value.
: | Separator between the namespace and the identifier in the SRS parameter value.
, | Delimiter between individual values in list-type parameters (such as BBOX, LAYERS, and STYLES in GetMap requests).
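Reading a request works the same way in reverse: split the query at the URL-level delimiters first, then split individual values at the OGC-level ones. The GetMap URL below is a hypothetical example.

```python
from urllib.parse import parse_qsl, urlsplit

# Hypothetical WMS 1.1.1 GetMap request URL.
url = ("http://localhost/cgi-bin/mapserv?SERVICE=WMS&REQUEST=GetMap"
       "&LAYERS=roads,rivers&SRS=EPSG:4326&BBOX=-180,-90,180,90"
       "&FORMAT=image/png")

# '?' starts the query, '&' separates parameters, '=' splits name from value.
params = dict(parse_qsl(urlsplit(url).query))

layers = params["LAYERS"].split(",")              # ',' separates list items
namespace, code = params["SRS"].split(":")        # ':' namespace and identifier
media_type, subtype = params["FORMAT"].split("/") # '/' MIME type and subtype
bbox = [float(v) for v in params["BBOX"].split(",")]

print(layers, namespace, code, media_type, subtype, bbox)
```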

HTML

HTML stands for HyperText Markup Language. It is a markup language made up of a series of tags. These tags give documents on the network a uniform format, so that scattered Internet resources are connected into a logical whole. An HTML document is descriptive text composed of HTML commands (tags), which can describe text, graphics, animations, sounds, tables, links, and so on.

HTML is the main language used to create web pages, and most web page source code today is written in it. Like a Word document, an HTML document contains content, layout, and formatting information. When a web page is loaded, the web browser interprets the HTML code and displays the page content in the format it specifies. As a markup language, HTML is a plain-text file marked up with a set of tags such as <head>, <body>, <table>, <center>, and <font> (the last two are obsolete in HTML5). In addition, the appearance and layout of an HTML page can be defined with CSS (Cascading Style Sheets). CSS can be included directly in the HTML file or stored in a separate text file that the HTML references.
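Because HTML is plain text marked up with tags, a program can walk through it the way a browser's parser does before rendering. The sketch below uses Python's `html.parser.HTMLParser` to collect the start tags and the visible text of a small hypothetical page; `TagCollector` is an illustrative class name, and unlike a browser it does not lay out or render anything.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record start tags and non-blank text, in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        if data.strip():  # skip whitespace between tags
            self.text.append(data.strip())


page = """<html>
  <head><title>WebGIS</title></head>
  <body>
    <table><tr><td>Layer</td><td>roads</td></tr></table>
  </body>
</html>"""

collector = TagCollector()
collector.feed(page)
collector.close()
print(collector.tags)  # ['html', 'head', 'title', 'body', 'table', 'tr', 'td', 'td']
print(collector.text)  # ['WebGIS', 'Layer', 'roads']
```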

The earlier HTML standards are gradually being replaced by HTML5. HTML5 is a set of technologies that includes HTML, CSS, and JavaScript, and it aims to enable rich Internet applications without relying on plug-ins such as Adobe Flash and Microsoft Silverlight. HTML5 also adds many new syntax features.

Writing HTML documents is not very complicated, yet HTML is powerful and supports embedding files in different data formats. Its main features are as follows:

  • Simplicity: each new version of HTML is a superset of the previous one, which makes upgrading flexible and convenient.

  • Extensibility: the wide use of HTML has created demand for more functions and more tags; HTML meets this demand by adding new elements, so the language can keep expanding.

  • Platform independence: HTML can be used on a wide range of platforms.

  • Versatility: HTML is the universal language of the Web, a simple, general-purpose markup language. It lets page authors create complex pages that combine text and images, and these pages can be viewed by anyone on the Internet, regardless of the type of computer or browser used.

Open http://webgis.cn in Firefox, right-click a blank area of the page, and choose “View Page Source” from the context menu to see the HTML behind the web page.