# Bit world


A geographic information system (GIS) represents the natural world in digital form. The relationship between the real world and the mathematical model is shown in Figure 2-15.

In a computer, the real world is expressed and recorded in various symbolic forms, and when the computer operates on numbers and other symbols it represents them in binary form (the bit world). A computer-based GIS therefore cannot act on the real world directly: the real world must first be described as data. A model is a simplified representation of the real world. It is a concise reflection of the system's elements, obtained through appropriate selection and described according to defined rules of expression.

Figure 2-15: The relationship between the real world and mathematical models
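To make the "bit world" concrete, the following minimal sketch shows how a coordinate value ends up as a bit pattern, assuming it is held as an IEEE 754 double-precision number (a common but not universal choice in GIS software). The helper names `to_bits` and `from_bits` and the coordinate values are hypothetical.

```python
import struct

def to_bits(value: float) -> str:
    """Return the 64 bits of value as an IEEE 754 double, most significant bit first."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", value))
    return format(as_int, "064b")

def from_bits(bits: str) -> float:
    """Inverse of to_bits: rebuild the double from its 64-character bit string."""
    return struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0]

# Hypothetical point location (longitude, latitude) in decimal degrees.
lon, lat = 116.39, 39.91
for name, v in (("lon", lon), ("lat", lat)):
    bits = to_bits(v)
    # 1 sign bit | 11 exponent bits | 52 fraction bits
    print(f"{name}: {bits[0]} {bits[1:12]} {bits[12:]}")
    assert from_bits(bits) == v  # the round trip through the bit pattern is exact
```

A decimal value such as 116.39 has no exact binary representation, so what is stored is the nearest representable double: even at this lowest level, the digital record is a model of the value rather than the value itself.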

A map is a symbolic model, because it is a simplified description of the real world produced through cartographic processing; a computer file storing a digital map is also a symbolic model, one that represents graphic symbols as digital codes. Producing a digital map requires not only selecting the objects to be represented but also deciding how to organize the data that express them. If the rules of data organization are not well established, a digital map is of no use to anyone other than the individual or organization that produced the data.

Data are digital, symbolic records of real-world conditions; information is data reorganized so that it reveals the underlying mechanisms of the real world and supports research. If spatial data are not organized, for example through spatial attribute tables, it is difficult to extract spatial information from them. Because computers work digitally, data items must be discrete before they can be processed and computed on, so geographic space must also be expressed discretely.

Data modeling is the process of organizing real-world data into useful data sets that reflect real information, and the logical organization of data according to a given scheme is called a data model. Data modeling proceeds in three steps: first, a data model is chosen to organize the real-world data; then, a data structure is chosen to express that data model; finally, a file format suitable for recording the data structure is chosen. A given spatial modeling task may therefore have several candidate data models, and each data structure may be stored in a variety of file formats.

Spatial data can be organized with different data models depending on how they are acquired, how they are stored and what they will be used for. The vector model and the raster model are the most commonly used data organization methods in GIS: in the vector model the world is represented by points, lines and areas (polygons), while in the raster model it is represented by spatial cells, or pixels.
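The difference between the two models can be illustrated with a small sketch. It stores a hypothetical square parcel in vector form, as an ordered list of vertex coordinates, and then discretizes it into a raster by testing whether the centre of each grid cell falls inside the polygon (an even-odd ray-casting test). Cell-centre sampling is only one of several possible rasterization rules, and the grid size, cell size and coordinates are assumptions chosen for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def point_in_polygon(p: Point, poly: List[Point]) -> bool:
    """Even-odd (ray casting) test: is point p inside the polygon?"""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def rasterize(poly: List[Point], width: int, height: int, cell: float) -> List[List[int]]:
    """Sample each cell at its centre; 1 = cell belongs to the polygon.
    Row 0 is the bottom row (y increases upward)."""
    return [
        [1 if point_in_polygon(((c + 0.5) * cell, (r + 0.5) * cell), poly) else 0
         for c in range(width)]
        for r in range(height)
    ]

# Vector model: a hypothetical square parcel stored as an ordered vertex list.
parcel = [(1.0, 1.0), (4.0, 1.0), (4.0, 4.0), (1.0, 4.0)]

# Raster model: the same parcel as a 5 x 5 grid of 1-unit cells.
grid = rasterize(parcel, width=5, height=5, cell=1.0)
for row in reversed(grid):  # print the top row first
    print("".join(str(v) for v in row))
```

The vector form keeps the exact corner coordinates, while the raster form keeps only which cells the parcel occupies. A smaller cell size follows the boundary more closely at the cost of more cells, which is the basic trade-off of a discrete raster representation.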

## The Role of Models

Models, especially mathematical models, play an important role in GIS. A model abstracts or simulates the laws and processes of the objective world that bear on practical problems, so it can help people identify causal and other relationships among the factors involved, which aids problem solving. Building a model is a mathematical or technical task, but it must rest on extensive, in-depth domain research: the depth of that research determines the quality and effectiveness of the model, and the quality and number of models determine how efficiently and deeply the system's data can be used. Developing and applying a large number of models in effect consolidates and validates the experience and knowledge of many experts in the application field, which provides a basis for evolving general-purpose GIS into expert systems.

## GIS Spatial Data Modeling

GIS is an information system designed specifically for collecting, storing, managing, analyzing and displaying spatial data. It is not only a tool for representing and simulating the real spatial world and for processing and analyzing spatial data, but also a science and technology of spatial information processing and analysis. As a tool, GIS provides a set of spatial operation and analysis functions that let people express and analyze the real spatial world in digital form. These include the comprehensive storage and management of all kinds of spatial data needed to study and solve spatial problems; querying spatial distribution information according to user requirements, performing statistical calculations and producing tables and maps; and, according to the needs of planning, management and production, conducting comprehensive multi-factor studies and simulating and optimizing decision-making schemes. GIS should therefore provide users, on the one hand, with the means of spatial data modeling and analysis needed to represent and analyze spatial phenomena or problems digitally, and on the other hand with a friendly user interface for spatial data modeling, query and analysis.

The basic task of spatial data modeling is to describe how a GIS organizes spatial data and to design its spatial database model. This includes defining spatial entities and their relationships, determining the data entities or objects and their relationships, and designing the physical organization, storage paths and database structure in the computer. This work is guided by spatial data model theory. A spatial data model is a set of concepts describing spatial entities and their relationships in the real world; it provides the basic method for describing spatial data organization and designing spatial database schemas.

Generally speaking, the spatial data model of a GIS consists of a conceptual data model, a logical data model and a physical data model. The conceptual data model is an abstract set of concepts about entities and the relationships among them. The logical data model expresses the entities and relationships of the conceptual data model as data entities (or records) and the relationships among them. The physical data model describes the physical organization, storage paths and database structure of the data in the computer. The relationships among the three levels are shown in Figure 2-16.

Figure 2-16: Three levels of spatial data model
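To make the three levels concrete, the following minimal sketch carries one hypothetical entity, a water well, from a conceptual description (loosely stood in for by a Python dataclass), to a logical record (a row in an SQLite relational table), to one possible physical layout (a fixed-length binary record built with `struct`). The entity, field names and byte layout are assumptions for illustration only; a real DBMS, SQLite included, decides its own on-disk format.

```python
import sqlite3
import struct
from dataclasses import dataclass

# Conceptual level: a real-world entity type and what we choose to know about it.
@dataclass
class Well:
    """Hypothetical 'water well' entity: identifier, planar location, depth."""
    well_id: int
    x: float
    y: float
    depth_m: float

# Logical level: the entity becomes a record (row) in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE well (well_id INTEGER PRIMARY KEY, x REAL, y REAL, depth_m REAL)")
w = Well(7, 512300.0, 4306250.0, 85.5)
conn.execute("INSERT INTO well VALUES (?, ?, ?, ?)", (w.well_id, w.x, w.y, w.depth_m))
row = conn.execute("SELECT * FROM well WHERE well_id = 7").fetchone()

# Physical level: one possible fixed-length binary layout of the same record
# (4-byte int + three 8-byte doubles, little-endian, no padding).
RECORD = struct.Struct("<iddd")
blob = RECORD.pack(*row)
print(row, RECORD.size, blob.hex())
assert RECORD.unpack(blob) == row  # the bytes decode back to the same record
```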

## Concept and Classification of GIS Spatial Data Model

### GIS Spatial Conceptual Data Model

Because people differ in occupation and specialty, they have different concerns, research objects and expected results. Their descriptions and abstractions of the real world therefore differ, forming different user views, called external models. The conceptual model of a GIS spatial data model describes, synthesizes and integrates these user views in a unified language, taking into account the common needs of users. At present, the most widely used conceptual models are the vector data model, based on planar graphs, and the raster data model, based on a continuous tessellation of space.
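A planar-graph vector model is often encoded with arc-node topology, in which each arc records its end nodes and the faces on its left and right. The sketch below assumes that encoding for two hypothetical adjacent parcels, P1 and P2. It shows that polygon boundaries and adjacency can be derived from the left/right face labels alone, and checks the graph against Euler's formula for a connected planar graph, V - E + F = 2, counting the unbounded outer face.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical planar graph: two unit-square parcels, P1 (left) and P2 (right),
# sharing the edge B-E. "O" denotes the unbounded outer face.
nodes: Dict[str, Tuple[int, int]] = {
    "A": (0, 0), "B": (1, 0), "C": (2, 0),
    "D": (2, 1), "E": (1, 1), "F": (0, 1),
}
# arc id -> (from node, to node, face on the left, face on the right)
arcs: Dict[str, Tuple[str, str, str, str]] = {
    "a1": ("A", "B", "P1", "O"),
    "a2": ("B", "C", "P2", "O"),
    "a3": ("C", "D", "P2", "O"),
    "a4": ("D", "E", "P2", "O"),
    "a5": ("E", "F", "P1", "O"),
    "a6": ("F", "A", "P1", "O"),
    "a7": ("B", "E", "P1", "P2"),  # shared boundary between the parcels
}

def faces() -> Set[str]:
    """All faces mentioned as a left or right face of some arc."""
    return {f for _, _, left, right in arcs.values() for f in (left, right)}

def boundary(face: str) -> List[str]:
    """Arcs that bound a face, found from the left/right labels alone."""
    return sorted(a for a, (_, _, l, r) in arcs.items() if face in (l, r))

def neighbours(face: str) -> Set[str]:
    """Faces across a shared arc, excluding the outer face."""
    out = set()
    for _, _, l, r in arcs.values():
        if face == l:
            out.add(r)
        elif face == r:
            out.add(l)
    return out - {"O"}

V, E, F = len(nodes), len(arcs), len(faces())
assert V - E + F == 2  # Euler's formula for a connected planar graph
print(V, E, F, boundary("P1"), neighbours("P1"))
```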

### Spatial Logical Data Model

The logical data model specifies the content of the spatial database (spatial entities and their relationships) as determined by the conceptual data model. It expresses concretely the relationships among data items, records and so on, and can therefore be implemented in several different ways. Spatial logical data models fall broadly into two categories: structured models and operation-oriented models.

1） Structured Logical Data Model

Structured models use explicit links (tree or graph structures) to express the relationships between data entities. The hierarchical data model organizes data records in a tree structure to reflect membership or hierarchical relationships among the data. The network data model is a generalization of the hierarchical data model that combines several hierarchical structures. Its advantage is that it can represent many-to-many relationships, which are common in the real world; its disadvantage is complexity. In general, structured models can directly reflect the relationships between spatial entities in the real world.
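The difference between the two structured models can be sketched in a few lines, assuming a hypothetical map sheet organized into layers and features. In the hierarchical case each record has exactly one parent, so a single link leads back to the root; in the network case a record (here a boundary arc shared by two parcels) may belong to several owners. All record names are hypothetical.

```python
from collections import defaultdict

# Hierarchical model: every record has exactly one parent (a tree).
# Hypothetical hierarchy: map sheet -> layers -> features.
parent = {
    "roads": "sheet_J50", "rivers": "sheet_J50",
    "road_1": "roads", "road_2": "roads", "river_1": "rivers",
}

def path_to_root(node: str) -> list:
    """Follow the single parent link of each record up to the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

# Network model: a record may have several owners (many-to-many).
# Here each boundary arc is owned by every parcel it bounds.
owners = defaultdict(set)
for parcel, parcel_arcs in {"P1": ["a1", "a7"], "P2": ["a2", "a7"]}.items():
    for arc in parcel_arcs:
        owners[arc].add(parcel)

print(path_to_root("road_1"))  # ['road_1', 'roads', 'sheet_J50']
print(sorted(owners["a7"]))    # ['P1', 'P2']  (shared arc: two owners)
```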

2） Operation-oriented logical data model

The relational data model is an operation-oriented logical data model: it uses two-dimensional tables to express data entities and their relationships, and extracts or queries those relationships through relational operations. It is flexible and simple, but expressing complex relationships is harder than in other data models, and when data form multi-level connections, storage space is used inefficiently. One current trend is to combine the advantages of the two categories into new or improved logical data models, such as extended network models.
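A minimal relational version of a many-to-many spatial relationship, using Python's built-in `sqlite3` module, is sketched below. The table and column names are hypothetical. The link between parcels and their boundary arcs needs its own table, and finding neighbouring parcels takes a self-join: relational operations recover the relationship, but it is not stored explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE parcel (parcel_id TEXT PRIMARY KEY, land_use TEXT);
CREATE TABLE arc    (arc_id TEXT PRIMARY KEY, length_m REAL);
-- The many-to-many parcel/arc relationship needs a table of its own.
CREATE TABLE parcel_arc (parcel_id TEXT REFERENCES parcel, arc_id TEXT REFERENCES arc);
""")
conn.executemany("INSERT INTO parcel VALUES (?, ?)",
                 [("P1", "farmland"), ("P2", "forest")])
conn.executemany("INSERT INTO arc VALUES (?, ?)",
                 [("a1", 100.0), ("a2", 100.0), ("a7", 100.0)])
conn.executemany("INSERT INTO parcel_arc VALUES (?, ?)",
                 [("P1", "a1"), ("P1", "a7"), ("P2", "a2"), ("P2", "a7")])

# Relational operations (join, selection, projection) recover the relationship:
# which parcels are neighbours, i.e. share at least one boundary arc?
neighbours = conn.execute("""
    SELECT DISTINCT x.parcel_id, y.parcel_id
    FROM parcel_arc AS x JOIN parcel_arc AS y
      ON x.arc_id = y.arc_id AND x.parcel_id < y.parcel_id
""").fetchall()
print(neighbours)  # [('P1', 'P2')]
```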

### Physical Data Model

The logical data model does not deal with low-level physical implementation details, but the computer ultimately processes binary data, so the logical model must be transformed into a physical data model. This means designing the physical organization of spatial data, the spatial access methods, the overall storage structure of the database and so on.

1）Physical representation and organization

The physical representation methods for the hierarchical logical data model mainly include the physical adjacency method, the table structure method and the catalog method. Those for the network data model include the variable-length pointer table, the bitmap method, the catalog method and so on. The relational data model is physically represented by relational tables. Physical organization is mainly concerned with storing data on external storage in the best possible form, usually weighing operational efficiency, response time, space utilization and total cost.
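The bitmap method for the network model can be sketched as follows, assuming a fixed, numbered set of member records: each owner record keeps one bit per member, set when that member belongs to it. This in-memory version, with Python integers used as bit strings, only illustrates the idea and is not an on-disk layout; the record names are hypothetical.

```python
# Bitmap representation of a many-to-many (network) relationship:
# one bit string per owner record, bit j set if member record j belongs to it.
members = ["a1", "a2", "a3", "a7"]                 # hypothetical member records (arcs)
bit = {m: 1 << j for j, m in enumerate(members)}  # member -> its bit mask

owner_bits = {
    "P1": bit["a1"] | bit["a7"],
    "P2": bit["a2"] | bit["a3"] | bit["a7"],
}

def members_of(owner: str) -> list:
    """Members whose bit is set in the owner's bitmap."""
    b = owner_bits[owner]
    return [m for m in members if b & bit[m]]

def owners_of(member: str) -> list:
    """Owners whose bitmap has this member's bit set."""
    return [o for o, b in owner_bits.items() if b & bit[member]]

# A bitwise AND finds the members two owners have in common.
shared = owner_bits["P1"] & owner_bits["P2"]
shared_members = [m for m in members if shared & bit[m]]
print(members_of("P2"), owners_of("a7"), shared_members)
```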

2）Spatial Data Access

In a database, “storing” means writing a block of data from main memory to external storage, and “fetching” means reading a block of data from external storage into main memory. Common access methods are:

First, the file structure method, which includes sequential structures (searched, for example, by binary search or interpolation search), table structures (linear tables, inverted tables) and random structures.
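The two searches named for sequential structures can be sketched as below, over a hypothetical sorted list of record keys. Binary search halves the interval at each step; interpolation search instead estimates where the target should lie from the key values themselves, which helps when keys are roughly uniformly distributed and can degrade toward linear time when they are not.

```python
from typing import List, Optional

def binary_search(keys: List[int], target: int) -> Optional[int]:
    """Return the index of target in sorted keys, or None if absent."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi:
        mid = (lo + hi) // 2  # probe the middle of the interval
        if keys[mid] == target:
            return mid
        if keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None

def interpolation_search(keys: List[int], target: int) -> Optional[int]:
    """Like binary search, but probe where target would lie if the keys
    were evenly spread between keys[lo] and keys[hi]."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:  # avoid dividing by zero on a flat run
            return lo if keys[lo] == target else None
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return None

# Hypothetical sorted record keys in a sequential file: 1000, 1010, ..., 1990.
keys = list(range(1000, 2000, 10))
print(binary_search(keys, 1370), interpolation_search(keys, 1370))  # 37 37
```

On these evenly spaced keys interpolation search finds 1370 with a single probe, while binary search needs several halvings.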

Second, index files, the basic means of improving data access efficiency. Inserting or deleting index entries affects only the index records themselves, while operations on the data records depend on the specific data organization strategy. If the index itself becomes very large, the index file must in turn be indexed, giving multi-level indexes such as the B-tree and B+ tree. A B-tree indexes records by their primary key; to index by a secondary key, an inverted index table must be built. However, if searches on secondary keys are the main operation, this kind of index is not well suited.
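An inverted index on a secondary key can be sketched as a mapping from each key value to the list of primary keys that carry it. In this sketch a Python dict stands in for the primary-key index (in a DBMS this would typically be a B-tree or B+ tree); the records and the `land_use` field are hypothetical.

```python
from collections import defaultdict

# Primary-key access: records addressed directly by a unique id.
records = {
    101: {"land_use": "forest",   "area_ha": 12.5},
    102: {"land_use": "farmland", "area_ha": 30.0},
    103: {"land_use": "forest",   "area_ha": 4.2},
    104: {"land_use": "water",    "area_ha": 8.0},
}

def build_inverted_index(recs, field):
    """Map each value of a secondary key to the sorted list of primary keys."""
    index = defaultdict(list)
    for pk in sorted(recs):
        index[recs[pk][field]].append(pk)
    return dict(index)

by_use = build_inverted_index(records, "land_use")
print(by_use["forest"])                            # [101, 103]
forest = [records[pk] for pk in by_use["forest"]]  # then fetch by primary key
```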

Third, point index structures. Because the B-tree is not suited to searching on secondary keys, spatial location data and their attributes are treated as points in a multi-dimensional space and indexed with multi-dimensional point index structures such as grid indexes, k-d trees, quadtrees and R-trees. Spatial access methods and query optimization remain an important subject of GIS research.
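As one example of a multi-dimensional point index, the sketch below builds a two-dimensional k-d tree and answers a rectangular range query. It splits on the median point, alternates between the x and y axes, and prunes any subtree whose side of the splitting line cannot intersect the query rectangle. This is a simple static version chosen for brevity; the point data are hypothetical, and grid indexes, quadtrees or R-trees would answer the same query with different trade-offs.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]

class KDNode:
    """One point of the tree plus the axis (0 = x, 1 = y) it splits on."""
    def __init__(self, point: Point, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build(points: List[Point], depth: int = 0) -> Optional[KDNode]:
    """Build a balanced 2-d tree by splitting on the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return KDNode(pts[mid], axis,
                  build(pts[:mid], depth + 1),      # values <= median on this axis
                  build(pts[mid + 1:], depth + 1))  # values >= median on this axis

def range_query(node: Optional[KDNode], lo: Point, hi: Point,
                out: List[Point]) -> List[Point]:
    """Append to out every point p with lo <= p <= hi on both axes."""
    if node is None:
        return out
    p, a = node.point, node.axis
    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
        out.append(p)
    if lo[a] <= p[a]:  # left subtree only holds values <= p[a]
        range_query(node.left, lo, hi, out)
    if p[a] <= hi[a]:  # right subtree only holds values >= p[a]
        range_query(node.right, lo, hi, out)
    return out

# Hypothetical well locations, queried with the window x in [3, 8], y in [1, 5].
wells = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(wells)
print(sorted(range_query(tree, (3, 1), (8, 5), [])))  # [(5, 4), (7, 2), (8, 1)]
```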