Metadata of spatial data

Metadata is data that describes data. In geospatial data, metadata is background information that describes the content, quality, condition, and other related characteristics of the data. Metadata is not a new concept: traditional library catalog cards, the copyright pages of published books, and disk labels are all metadata. The metadata of a paper map is carried mainly by the map type and the map legend, and includes the map name, the spatial reference system and map coordinates, a description of the map content, the scale and precision, the compiling and publishing organization, the date of publication or update, and sales information. In this form the metadata is human-readable, makes communication between producer and user easy, and lets the user readily judge whether the book or map meets the needs of the intended application.

With the development of computer and GIS technology, and especially of network communication technology, spatial data sharing is becoming increasingly common. The complexity of managing and accessing large data sets is becoming a prominent issue for data producers and users. Data producers need effective methods for data management and maintenance; users need faster, more comprehensive, and more effective ways to discover, access, acquire, and use geospatial data that is current, accurate, manageable, and accessible. Under these conditions, metadata about the content, quality, and status of spatial data becomes more important and serves as an important means of managing and applying information resources effectively. Geographic information metadata standards and operational tools have become an important part of the national spatial data infrastructure [2]_.

In GIS applications, the main role of metadata can be summarized as follows:

  1. Help data production units to effectively manage and maintain spatial data, establish data files, and ensure that even when their main staff members leave, they will not lose their knowledge of the data;

  2. Provide information about the data production unit's data storage, data classification, data content, data quality, data exchange network, and data sales, so that users can search for and retrieve geospatial data;

  3. Help users to understand the data, so as to make a correct judgment on whether the data can meet their needs;

  4. Provide relevant information for the user to process and convert useful data.

Metadata is thus one of the conditions for data to realize its full value. It can be used in many ways, including data documentation, data publishing, data browsing, and data conversion, and it plays an important role in promoting data management, use, and sharing.

The concept and type of metadata

The concept of metadata

Metadata is descriptive information about data. It should reflect the characteristics of a data set as fully as possible so that users can develop and use the data set accurately, efficiently, and fully. The content of metadata varies greatly between databases in different fields. Metadata makes it possible to retrieve and access a database, to use the computer's system resources effectively, and to process and further develop the data.

So far, the scientific community agrees that the purpose of metadata is to promote the efficient use of data sets and to serve computer-aided software engineering (CASE). The contents of metadata include:

  1. Description of the data set, including the data items, data sources, data owners, and data lineage (data production history) of the data set;

  2. Description of data quality, such as data accuracy, logical consistency, data completeness, resolution, and the scale of the source data;

  3. Explanation of data processing information, such as conversion of dimensions (units);

  4. Description of data conversion methods;

  5. Description of database updating and integration.
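
The five content groups listed above can be pictured as one structured record. The following is a minimal sketch only: the class names, field names, and sample values are invented for illustration and do not come from any metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class QualityInfo:
    """Item 2: data quality descriptors (field names are illustrative)."""
    positional_accuracy_m: float
    logical_consistency: str
    completeness: str
    resolution_m: float
    source_scale_denominator: int  # e.g. 50000 for a 1:50,000 source map


@dataclass
class DatasetMetadata:
    """One metadata record covering the five content groups."""
    description: str          # item 1: what the data set is
    data_items: List[str]     # item 1: data items in the set
    source: str               # item 1: data source
    owner: str                # item 1: data owner
    lineage: List[str]        # item 1: production history, oldest first
    quality: QualityInfo      # item 2
    processing_notes: List[str] = field(default_factory=list)    # item 3
    conversion_methods: List[str] = field(default_factory=list)  # item 4
    update_log: List[str] = field(default_factory=list)          # item 5


# Hypothetical example values, invented for illustration only.
record = DatasetMetadata(
    description="Land use polygons for an example county",
    data_items=["parcel_id", "land_use_code", "area_ha"],
    source="Digitized from 1:50,000 paper maps",
    owner="Example Survey Office",
    lineage=["field survey", "map compilation", "digitization"],
    quality=QualityInfo(
        positional_accuracy_m=25.0,
        logical_consistency="topology checked, no overlapping polygons",
        completeness="all parcels larger than 1 ha",
        resolution_m=10.0,
        source_scale_denominator=50000,
    ),
    processing_notes=["areas converted from square metres to hectares"],
)

print(record.lineage[-1])                       # digitization
print(record.quality.source_scale_denominator)  # 50000
```

Keeping quality information in its own nested structure is one possible design; a flat record would work equally well for small data sets.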

Types of metadata

The purpose of classifying metadata is to understand it fully and use it better. Different classification principles produce very different classification systems and metadata content.

1) Classification according to the content of metadata

Data of different natures and from different fields requires different metadata content, and databases built for different application purposes differ greatly in their metadata content. On this basis, metadata is divided into three types:

(1.1) Scientific metadata: Its main objective is to help users obtain data from various sources together with related information. It includes not only traditional, library-catalog-style metadata such as data source name, author, and subject content, but also the topological relationships of the data. The task of this type of metadata is to help researchers access the data they need efficiently.

(1.2) Evaluative metadata: It mainly serves the evaluation of data use, and includes the initial data collection, the instruments used for data collection, the methods and basis for data acquisition, data processing procedures and algorithms, data quality control, sampling methods, data accuracy, data credibility, potential application areas of the data, etc.

(1.3) Model metadata: Metadata used to describe a data model, similar in structure to metadata that describes data. Its content includes model name, model type, modeling process, model parameters, boundary conditions, author, reference model description, modeling software, model output, etc.

2) Classification by metadata description objects

(2.1) Data layer metadata: Metadata describing each data item in the data set, including date stamp, location stamp, dimension (units), annotation, error identification, abbreviation identification, problem identification, data processing, etc.

(2.2) Attribute metadata: Metadata about attribute data, including the data dictionary and the data processing rules (protocols) used to express the data and its meaning, such as sampling instructions, data transmission lines and algebraic coding, etc.

(2.3) Entity metadata: Metadata describing the whole data set, including the regional sampling principle of the data set, the validity period of the database, the time span of the data, etc.

3) Classification based on the role of metadata in the system

(3.1) System-level metadata: Information used to implement file system features or to manage data in the file system, such as access time, data size, current location in the storage hierarchy, and how data blocks are stored to guarantee quality-of-service control.

(3.2) Application layer metadata: Information related to data users that helps them find, evaluate, access, and manage data, such as summaries of text file contents, graphic snapshots, and descriptive information about other related data files. It is often used for high-level data management, where it lets users quickly get the right data.
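
The difference between the two kinds of metadata in this classification can be seen on an ordinary file. The sketch below reads system-level metadata with Python's standard `os.stat` call and pairs it with a hand-written application-level record. The file name, its contents, and the helper function names are assumptions made for the example.

```python
import os
import tempfile
from datetime import datetime, timezone


def system_level_metadata(path: str) -> dict:
    """Read metadata that the file system itself maintains for a file."""
    info = os.stat(path)
    return {
        "size_bytes": info.st_size,
        "last_access": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
        "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


def application_level_metadata(path: str, summary: str) -> dict:
    """Build a user-facing record; the summary is written by a person."""
    return {"file_name": os.path.basename(path), "summary": summary}


with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "rivers.txt")
    # Binary mode keeps the byte count identical on every platform.
    with open(path, "wb") as handle:
        handle.write(b"river,length_km\n")

    sys_meta = system_level_metadata(path)
    app_meta = application_level_metadata(path, "River names and lengths")

print(sys_meta["size_bytes"])  # 16
print(app_meta["file_name"])   # rivers.txt
```

The system-level values are produced automatically by the operating system, whereas the application-level summary has to be supplied by whoever documents the data.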

4) Classification based on the role of metadata

(4.1) Description metadata: Metadata that serves users in their use of the data. It is generally expressed in natural language, for example the spatial coverage of the source data, the projection and scale of the source maps, and the data set description file. This kind of metadata is mostly descriptive information and focuses on describing the database.

(4.2) Control metadata: Metadata used to control the flow of computer operations. It is implemented with certain keywords and a specific syntax. Its contents include data storage and retrieval files, methods for retrieval and target matching, target retrieval and display, analysis, arrangement and display of query results, modification of the original internal order of the database, data conversion methods, integration of spatial and attribute data according to user requirements, plotting of data according to index items, construction and use of data models, and the like. This type of metadata mainly concerns methods related to database operations.

Concepts used in spatial data metadata:

Geospatial Data: Information that identifies the geographic location, attributes, and boundaries of natural or constructed geographic features;

Type: In metadata standards, a data type refers to the kind of value that a data element can take.

Object: Digital representation of part or whole of a geographical entity;

Entity Type: Definition and description of sets of geographic entities with similar geographic characteristics;

Point: A zero-dimensional geographic object for location determination;

Node: Zero-dimensional object that topologically connects two or more chains or rings;

Label Point: A reference point used to place map and chart text that helps identify a feature;

Line: A general term for a one-dimensional object;

Line Segment: Line segment between two points.

String: A connected, unbranched sequence of line segments, which may intersect itself or other lines.

Arc: A curve consisting of a set of points determined by a mathematical expression.

Link: The topological relationship between two nodes;

Chain: Directed, unbranched sequence of nonintersecting line segments or arcs bounded by nodes.

Ring: Closed sequence of nonintersecting chains or arcs;

Polygon: An area enclosed by a closed arc in a two-dimensional plane.

Universe Polygon: The outermost polygon in the data coverage area whose area is the sum of the areas of other polygons;

Interior Area: An area excluding its boundaries;

Grid: A set of grid cells forming a regular or nearly regular tessellation of a surface, or a set of points arranged in a pattern that forms a regular or nearly regular tessellation of a surface.

Grid Cell: A two-dimensional object representing the smallest element of a grid.

Vector: Combination of directional lines;

Raster: One or more overlapping layers of the same grid or digital image;

Pixel: Two-dimensional picture element, which is the smallest indivisible element of a digital image;

Raster Object: One or more images or grids. Each image or grid represents a data layer. The corresponding grid cells or pixels of the different layers coincide and are registered with each other.

Graph: A set of topologically related zero-dimensional (node), one-dimensional (link or chain), and two-dimensional (GT-polygon) objects that conform to predefined constraint rules;

Layer: An integrated, areally distributed spatial data set that represents the entities of one theme, or a union of spatial objects with a common attribute or attribute value.

Stratum: Data layer, level or gradient sequence in an ordered system;

Latitude: The angular distance from the equator, measured along a meridian;

Longitude: The angular distance between the plane of a meridian and the plane of the prime meridian through Greenwich.

Meridian: A great circle of the Earth passing through the Earth's poles.

Ordinate: Coordinate value in a plane Cartesian coordinate system, measured parallel to the Y axis;

Projection: A mathematical transformation method used to transform the spatial features (sets) of the Earth’s spherical coordinates into the plane coordinate system.

Projection Parameters: Reference features used to control projection errors and actual distribution of deformation when projecting data sets;

Map: The spatial representation of spatial phenomena, usually expressed in plane graphics.

Phenomenon: facts, events, states, etc.

Resolution: The minimum difference between two independent measurements or calculated values that can be distinguished by the measuring tools or analytical methods involved or used;

Quality: The basic or unique nature of data that meets certain usage requirements;

Explicit: A method of directly describing horizontal and three-dimensional positions by a pair of numbers or a triplet of numbers, respectively.

Media: A physical device for recording, storing, or transferring data.
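
Several of the geometric terms above build on one another: a point is a coordinate pair, a string is an ordered run of points, a ring is a closed string, and a polygon is the area a ring encloses. The sketch below illustrates that chain under simplifying assumptions: it checks closure only (not self-intersection), and it computes the enclosed area with the shoelace formula, which is a standard technique chosen for the example rather than something prescribed by the definitions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # explicit position: one pair of numbers
String = List[Point]         # ordered points joined by line segments


def is_ring(points: String) -> bool:
    """True when the string closes on itself (self-intersection not checked)."""
    return len(points) >= 4 and points[0] == points[-1]


def polygon_area(ring: String) -> float:
    """Area enclosed by a simple ring, computed with the shoelace formula."""
    if not is_ring(ring):
        raise ValueError("a polygon boundary must be a closed ring")
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(ring, ring[1:])
    )
    return abs(twice_area) / 2.0


square = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
open_string = square[:-1]

print(is_ring(square))       # True
print(is_ring(open_string))  # False
print(polygon_area(square))  # 12.0
```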

Standards for spatial data metadata

Compared with the data types used in fields such as physics and chemistry, spatial data is relatively complex: it involves descriptions of spatial features, of attribute features, and of the relationships between them. Establishing spatial data metadata standards is therefore a complex task, and for various reasons the standards developed by individual data organizations or data users have had difficulty gaining wide acceptance in the academic community. Nevertheless, spatial data metadata standards are the premise and guarantee of spatial data standardization; only with standardized metadata can spatial data be used effectively. At present, some regional or sectoral standards for spatial data metadata have been formed.

Application of spatial data metadata

Help users get data

Through metadata, users can browse, search, and study spatial databases. A complete geodatabase should provide not only spatial data and attribute data but also rich guidance information, as well as analyses, summaries, and indexes derived from the raw data. With this information, users can answer questions such as “What data is this?” and “Is this database useful?”
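
A simple way to picture metadata-driven discovery is a catalog that is searched by keyword and by area of interest without opening any data file. The following is a rough sketch, assuming each catalog entry carries a title, a keyword list, and a bounding box; the entry structure and the sample catalog are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


@dataclass
class CatalogEntry:
    title: str
    keywords: List[str]
    bbox: BBox


def overlaps(a: BBox, b: BBox) -> bool:
    """True when two bounding boxes share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def search(catalog: List[CatalogEntry], keyword: str, area: BBox) -> List[str]:
    """Return titles of entries that match the keyword and touch the area."""
    wanted = keyword.lower()
    return [
        entry.title
        for entry in catalog
        if wanted in [k.lower() for k in entry.keywords]
        and overlaps(entry.bbox, area)
    ]


# Hypothetical catalog contents.
catalog = [
    CatalogEntry("Soil types, north region", ["soil"], (100.0, 30.0, 105.0, 35.0)),
    CatalogEntry("Soil types, south region", ["soil"], (100.0, 20.0, 105.0, 25.0)),
    CatalogEntry("Road network, north region", ["roads"], (100.0, 30.0, 105.0, 35.0)),
]

hits = search(catalog, "Soil", (102.0, 31.0, 103.0, 32.0))
print(hits)  # ['Soil types, north region']
```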

Spatial data quality control

Both statistical data and spatial data have accuracy problems. Two main factors determine the accuracy of spatial data: the accuracy of the source data, and the control of accuracy and quality during data processing. Spatial data quality control includes: (1) an accurately defined data dictionary that explains the composition of the data, the name of each part, what each part represents, etc.; (2) assurance that the data is integrated in a logically sound way, for example when different sub-categories in a vegetation database are combined into broader categories, which requires the data to be combined according to a definite logical relationship; (3) sufficient information to explain the source of the data, how the data was processed, and how the data should be interpreted. These requirements can be met through metadata, which is usually compiled jointly by geoscientists and computer scientists. The logical expression of the data should be designed by geoscientists; the coding of a spatial database requires some grounding in geoscience; the staff responsible for controlling and improving data quality should have background knowledge of data entry, error detection, and data processing; and data reproduction should be carried out by people with a stronger computing background. All metadata of this kind is integrated into the database under a definite organizational structure, forming the metadata information system of the database that delivers the functions above.
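
The first quality control requirement, an accurately defined data dictionary, lends itself to an automated check. The sketch below validates one record against a small dictionary. The dictionary layout (`type`, `allowed`, `min`), the vegetation codes, and the field names are assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical data dictionary for a vegetation layer.
DATA_DICTIONARY: Dict[str, Dict[str, Any]] = {
    "veg_code": {"type": int, "allowed": {1, 2, 3}},  # 1 forest, 2 grass, 3 crop
    "area_ha": {"type": float, "min": 0.0},
}


def validate_record(record: Dict[str, Any], dictionary: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for name, rule in dictionary.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{name}: value {value!r} not in dictionary")
        if "min" in rule and value < rule["min"]:
            problems.append(f"{name}: below minimum {rule['min']}")
    for name in record:
        if name not in dictionary:
            problems.append(f"{name}: not defined in data dictionary")
    return problems


good = {"veg_code": 2, "area_ha": 12.5}
bad = {"veg_code": 9, "area_ha": -1.0, "note": "unchecked"}

print(validate_record(good, DATA_DICTIONARY))  # []
print(validate_record(bad, DATA_DICTIONARY))
```

For the second record the function reports three problems: an unknown vegetation code, a negative area, and a field that the dictionary does not define.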

Application of data integration

Metadata at the dataset level records information such as data format, spatial coordinate system, data representation, and data type; metadata at the system level and application level records information such as hardware and software environment, data usage specifications, and data standards. This information is necessary in a series of data integration processes, such as data space matching, attribute consistency processing, and conversion of data between platforms. This information enables the system to effectively control the flow of data in the system.
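
As an illustration of how such metadata can steer integration, the sketch below compares the metadata of a source and a target data set and lists the steps needed before they can be merged. The dictionary keys (`crs`, `format`, `attributes`) and the sample values are assumptions for the example; the function only plans the work and does not perform any reprojection or conversion.

```python
from typing import Dict, List


def integration_steps(source: Dict, target: Dict) -> List[str]:
    """List the preparation steps implied by differences in the metadata."""
    steps = []
    if source["crs"] != target["crs"]:
        steps.append(f"reproject from {source['crs']} to {target['crs']}")
    if source["format"] != target["format"]:
        steps.append(f"convert from {source['format']} to {target['format']}")
    missing = sorted(set(target["attributes"]) - set(source["attributes"]))
    if missing:
        steps.append("map or fill attributes: " + ", ".join(missing))
    return steps


# Hypothetical metadata records for two data sets.
source_meta = {"crs": "EPSG:4326", "format": "Shapefile", "attributes": ["id", "name"]}
target_meta = {"crs": "EPSG:3857", "format": "Shapefile", "attributes": ["id", "name", "class"]}

steps = integration_steps(source_meta, target_meta)
for step in steps:
    print(step)
# reproject from EPSG:4326 to EPSG:3857
# map or fill attributes: class
```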

Reasons for using metadata in GIS

Using metadata in geographic information systems benefits the management and sharing of spatial data and the implementation of certain specific functions. For GIS software development, it can improve both the efficiency and the quality of development.

Performance reasons

1) Completeness

One of the goals of object-oriented geographic information systems and spatial databases is to represent the data of things in the form of classes, which also include the class itself, i.e. the complex class structure. This requires a mechanism to support the verification and operation between classes, and metadata can help to implement this mechanism.

2) Extensibility

It is sometimes useful to deliberately extend the semantics of a computer language or database feature, for example by adding trace output or engine information to an operation request. This can be achieved by dynamically changing metadata information.

3) Specialization

The inheritance mechanism is realized by dynamically binding operation requests to operators. The language or database passes the operation request to the operator in a context that depends on structural and semantic information, and this information can be expressed by metadata.

4) Safety

Both well-typed languages and databases support dynamic type checking. Class information is expressed as metadata, which type checkers can access while the system is running.

Functional reasons

1) Debugging

Using metadata in error detection helps to inspect how a running application is being interpreted and how its state has been modified.

2) Browsing

When developing browsers for classes of data, displaying the data requires the ability to interpret its structure, and that structure is expressed by metadata.

3) Program Generation

If access to metadata is allowed, structural information can be used to generate programs automatically, for example to optimize database queries and to generate remote procedure call stubs.
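
A small case of program generation from structural metadata is producing a table definition from a field list. The sketch below builds a `CREATE TABLE` statement and runs it against an in-memory SQLite database using Python's standard `sqlite3` module. The field list, the type names, and the `create_table_sql` helper are hypothetical, and the table and column names are assumed to come from trusted metadata since they are not escaped.

```python
import sqlite3

# Mapping from illustrative metadata type names to SQLite column types.
SQL_TYPES = {"integer": "INTEGER", "real": "REAL", "text": "TEXT"}


def create_table_sql(table: str, fields: list) -> str:
    """Generate a CREATE TABLE statement from field metadata."""
    columns = ", ".join(f'{f["name"]} {SQL_TYPES[f["type"]]}' for f in fields)
    return f"CREATE TABLE {table} ({columns})"


# Hypothetical structural metadata for a road layer's attribute table.
road_fields = [
    {"name": "road_id", "type": "integer"},
    {"name": "name", "type": "text"},
    {"name": "length_km", "type": "real"},
]

statement = create_table_sql("roads", road_fields)
print(statement)
# CREATE TABLE roads (road_id INTEGER, name TEXT, length_km REAL)

connection = sqlite3.connect(":memory:")
connection.execute(statement)
column_names = [row[1] for row in connection.execute("PRAGMA table_info(roads)")]
connection.close()
print(column_names)  # ['road_id', 'name', 'length_km']
```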

Acquisition and management of spatial data metadata

Acquisition of spatial data metadata

The acquisition of spatial data metadata is a complicated process. Relative to the time at which the underlying data is formed, it can be divided into three stages: before data collection, during data collection, and after data collection. For model metadata, the three stages are before model formation, during model formation, and after model formation.

The metadata of the first stage is designed according to the contents of the database to be built, and includes general metadata and specific metadata. The metadata of the second stage is generated synchronously with the data. The metadata of the third stage is generated as needed after the data has been collected, and includes descriptions of the data processing procedure, data utilization, data quality assessment, browse file generation, topological relationships, index volumes and indexes of image data, data set size, data storage path, etc.

There are five main methods of acquiring spatial data metadata: keyboard input, association table, measurement, calculation, and inference. Keyboard input usually involves a heavy workload and is error-prone. The association table method obtains metadata from existing metadata or data through common items (fields). The measurement method is easy to use and less error-prone, for example measuring the location of spatial points with GPS. The calculation method derives metadata from other metadata or data, for example a horizontal position computed from instrument and time readings. The inference method acquires metadata from the characteristics of the data. The methods used differ between stages of metadata acquisition: in the first stage, mainly keyboard input and the association table method; in the second stage, mainly the measurement method; and in the third stage, mainly the calculation and inference methods.
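
To make the calculation method concrete, the sketch below derives two metadata items, a point count and a bounding box, directly from collected coordinates instead of typing them in. This particular choice of derived items is an assumption for the example, and the coordinates are invented.

```python
from typing import Dict, List, Tuple


def derive_metadata(points: List[Tuple[float, float]]) -> Dict:
    """Calculate simple metadata items from the data itself."""
    if not points:
        raise ValueError("at least one point is required")
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "point_count": len(points),
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
    }


# Hypothetical GPS fixes as (longitude, latitude) pairs.
gps_points = [(116.30, 39.90), (116.45, 39.95), (116.38, 40.02)]

derived = derive_metadata(gps_points)
print(derived["point_count"])  # 3
print(derived["bbox"])         # (116.3, 39.9, 116.45, 40.02)
```

Because these values are computed rather than keyed in, they stay consistent with the data whenever the calculation is repeated after an update.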

Management of spatial data metadata

The theory and methods of spatial data metadata involve both databases and metadata. Because metadata differs in content and form, its management depends on the field the data belongs to, and is realized through metadata information systems built for different data fields. In a metadata management information system, the physical layer stores the data and metadata and is associated with the logical layer by software through defined logical relations. Many concepts, such as entity names and aliases, are defined at the conceptual level through descriptive languages and models. Through these concepts and their constraints, and through the association with the logical layer, the metadata and data of the physical layer can be acquired and updated.

Metadata storage and functional implementation

Using a metadata system for database management avoids duplicate storage of data. A logical data index built from metadata makes it possible to query and retrieve any physically stored data in a distributed database efficiently, which reduces the time data users spend querying the database and obtaining data and thus reduces the cost of the database. The construction and management cost of a database reflects its overall performance. Metadata makes it possible to allocate spending on database design rationally and to use system resources well. Many database functions (such as database retrieval, data conversion, and data analysis) are realized by developing system resources. The development and use of such metadata therefore greatly enhances the functions of the database and reduces the cost of building it.

As understanding of the importance of digital geographic information deepens, metadata standardization has gradually become a focus of geographic information sharing. To study a metadata system, we must first analyze the theoretical basis of metadata correctly. Metadata standards rely on the theory of information sharing standards, which is interdisciplinary and touches many disciplines in the natural sciences. It involves almost all aspects of mathematics, physics, and chemistry, and depends on the development of modern science and technology. Computers are its basic platform and networks are its means of communication. Without mathematical models and a comprehensive understanding of various disciplines, it is impossible to study the mechanisms of the Earth by remote sensing and other technologies. From a macro point of view, therefore, geographic information standardization involves many fields and its theories appear numerous. From a micro point of view, however, the theory of sharing systems for digital geographic information mainly comprises the theory of building and representing geographic information models, spatial reference system theory, quality system theory, and computer communication technology. These form the basis of the digital data sharing system. Of course, other theories that can promote geographic information sharing will also become strong pillars of a metadata system based on the digital earth.

Quality system of geographic information

Quality evaluation process

The quality of geospatial data is a very important consideration for both data producers and users. It lets data producers describe correctly how far their data sets conform to production specifications, and it is the basis on which users decide whether a data set suits their application. Exploring the theory of data quality has therefore become an important part of geospatial data standardization. In metadata standards, quality information appears mainly in the identification information, data quality information, and data lineage parts of the metadata. The main elements involved are the completeness of the data set, logical consistency, positional accuracy, temporal accuracy, thematic accuracy, and so on, and each element has its own sub-elements. Users need different levels of data quality: some need high-precision information, while for others a lower level of accuracy is enough. There are therefore different criteria for evaluating the quality of data sets. Geometric accuracy, however, can be obtained through defined calculation formulas and corresponding accuracy indicators.
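
The text does not name a specific formula for geometric accuracy. One widely used indicator is the root mean square error (RMSE) between measured positions and higher-accuracy reference positions, and the sketch below computes it for horizontal coordinates. The choice of RMSE and the check-point coordinates are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def horizontal_rmse(measured: List[Point], reference: List[Point]) -> float:
    """Root mean square of the horizontal distances between paired points."""
    if not measured or len(measured) != len(reference):
        raise ValueError("need two non-empty point lists of equal length")
    squared_errors = [
        (mx - rx) ** 2 + (my - ry) ** 2
        for (mx, my), (rx, ry) in zip(measured, reference)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical check points in metres; each pair is 5 m apart.
measured_points = [(3.0, 4.0), (10.0, 10.0)]
reference_points = [(0.0, 0.0), (7.0, 14.0)]

rmse = horizontal_rmse(measured_points, reference_points)
print(rmse)  # 5.0
```

A value computed this way, together with the number of check points and the source of the reference positions, is the kind of accuracy indicator that a quality report could record.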

Basis of multi-scale evaluation

In quality assessment, generally speaking, the higher the accuracy or precision of the data the better, but in practice this cannot be generalized regardless of the object. Some data is of great significance in application (such as geodetic control points) and can itself reach a high level of accuracy, so the accuracy requirements for such data are very high. Other data cannot be very accurate in itself, such as the areas of different soil types: because the boundaries between them are blurred, the areas are relative, and the accuracy requirement cannot be very high. Still other data could be made very accurate, but only at a large cost in manpower, material resources, and time that production or application does not justify. The quality of data should therefore be evaluated against the specific needs of the practical application. The Earth is a complex system, and many objects are uncertain or fuzzy. Some objects have no clear boundaries and change gradually, so that it is difficult to fix a boundary line in the transition from quantitative to qualitative change; some objects are clearly defined but difficult to apply in practice; some data is dynamic, even instantaneous. Given this, accuracy in the description of geographic information should be analyzed dialectically: high data accuracy should be pursued, but “redundant” accuracy should be avoided so that effort is not wasted on it.

Timeliness and uniqueness of data

Some Earth system data is strongly time-sensitive, while other data changes slowly over time. For example, land use maps change significantly over time; geological maps and topographic maps, by comparison, show no obvious time sensitivity.

As a rule, the more dynamic the features a data set describes, the shorter its period of validity, and vice versa. From the perspective of studying historical change or development processes, however, data from any time is useful. The importance of data sets with different time sensitivity therefore depends on the role they play, and these factors should be reflected in the metadata system.

In addition, Earth system data includes both derivable data and non-derived data, and derived data should be avoided in the data set description. For example, in meteorological and hydrological data, daily rainfall is basic (non-derived) data, and monthly average rainfall is derived from it. The metadata description should therefore be limited to the underlying data and should not include derivable data. Special elements are then needed to describe these characteristics of the data set, and the calculation formulas involved need to be stated.
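
The rainfall example can be made concrete: if the daily values are stored, the monthly average can be recomputed at any time, so only the formula needs to be documented. The sketch below shows one way to do this, with invented rainfall values and a hypothetical `monthly_mean` helper.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Tuple


def monthly_mean(daily_mm: Dict[date, float]) -> Dict[Tuple[int, int], float]:
    """Derive mean daily rainfall per (year, month) from daily records."""
    by_month = defaultdict(list)
    for day, amount in daily_mm.items():
        by_month[(day.year, day.month)].append(amount)
    return {month: sum(values) / len(values) for month, values in by_month.items()}


# Hypothetical basic (non-derived) data: daily rainfall in millimetres.
daily_rainfall = {
    date(2020, 1, 1): 2.0,
    date(2020, 1, 2): 4.0,
    date(2020, 2, 1): 10.0,
}

derived = monthly_mean(daily_rainfall)
print(derived[(2020, 1)])  # 3.0
print(derived[(2020, 2)])  # 10.0
```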

Data accuracy testing and reporting

The aspects of data quality that users and data producers care about are related. Data set producers must satisfy the mapping specifications for the data set they produce, while users judge from the quality information whether the data set meets their application requirements. The data set information that producers provide should therefore be the information users care about, and the accuracy test method and its results should be included in the data set report.