This project has received funding from the European Community's Seventh Framework Program (FP7/2007-2013) under grant agreement nº265178

Quality metadata status

All the clearinghouse metadata records were automatically extracted, analyzed for their quality information, and imported into a database. The data were then analyzed and the results summarized. In total, 97203 metadata records were harvested from the GEOSS clearinghouse.

In ISO 19115, the data quality information relevant to the project comprises the quality indicators (DQ_Element), the lineage (LI_Lineage) and the usage information (MD_Usage). Quality indicators are conceptually classified into completeness, logical consistency, positional accuracy, thematic accuracy and temporal accuracy. Each quality indicator can have one or more measure methods, and its result can be expressed as a numerical value or as a declaration of conformance with a methodology.
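The classification above can be sketched in code. The following is a minimal illustration, not the extraction tool used in the project (which is not described here): it assumes the records are encoded as ISO 19139 XML in the `gmd` namespace, and the function name `count_quality_indicators` and the sample record are hypothetical. It maps each concrete DQ_Element subclass to its generic class and counts the indicators found in one record.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Assumption: records use the ISO 19139 XML encoding of ISO 19115.
GMD = "http://www.isotc211.org/2005/gmd"

# ISO 19115 DQ_Element subclasses grouped by generic quality class.
GENERIC_CLASS = {
    "DQ_CompletenessCommission": "completeness",
    "DQ_CompletenessOmission": "completeness",
    "DQ_ConceptualConsistency": "logical consistency",
    "DQ_DomainConsistency": "logical consistency",
    "DQ_FormatConsistency": "logical consistency",
    "DQ_TopologicalConsistency": "logical consistency",
    "DQ_AbsoluteExternalPositionalAccuracy": "positional accuracy",
    "DQ_GriddedDataPositionalAccuracy": "positional accuracy",
    "DQ_RelativeInternalPositionalAccuracy": "positional accuracy",
    "DQ_ThematicClassificationCorrectness": "thematic accuracy",
    "DQ_NonQuantitativeAttributeAccuracy": "thematic accuracy",
    "DQ_QuantitativeAttributeAccuracy": "thematic accuracy",
    "DQ_AccuracyOfATimeMeasurement": "temporal accuracy",
    "DQ_TemporalConsistency": "temporal accuracy",
    "DQ_TemporalValidity": "temporal accuracy",
}


def count_quality_indicators(xml_text):
    """Count quality indicators per generic class in one metadata record."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    # Each gmd:report element wraps one concrete DQ_Element subclass.
    for report in root.iter(f"{{{GMD}}}report"):
        for element in report:
            local_name = element.tag.rsplit("}", 1)[-1]
            generic = GENERIC_CLASS.get(local_name)
            if generic is not None:
                counts[generic] += 1
    return counts


# Hypothetical, heavily abbreviated record used only for illustration.
SAMPLE_RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:dataQualityInfo>
    <gmd:DQ_DataQuality>
      <gmd:report><gmd:DQ_AbsoluteExternalPositionalAccuracy/></gmd:report>
      <gmd:report><gmd:DQ_CompletenessOmission/></gmd:report>
      <gmd:report><gmd:DQ_CompletenessCommission/></gmd:report>
    </gmd:DQ_DataQuality>
  </gmd:dataQualityInfo>
</gmd:MD_Metadata>"""

print(dict(count_quality_indicators(SAMPLE_RECORD)))
```

Summing such per-record counters over a whole harvest would yield per-class totals of the kind reported in this section, provided the harvested records follow the assumed encoding.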

The overall number of metadata records with quality indicators is 19107 (only 19.66%, a figure far from the ideal situation mentioned above). These metadata records contain 52187 quality indicators, which represents a mean of 2.7 quality indicators per record.
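The two derived figures can be checked directly from the counts given in the text. The short sketch below only reproduces that arithmetic; the helper name `percentage` is introduced here for illustration.

```python
# Counts taken from the text.
TOTAL_RECORDS = 97203
RECORDS_WITH_INDICATORS = 19107
QUALITY_INDICATORS = 52187


def percentage(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Share of harvested records that carry at least one quality indicator.
share = percentage(RECORDS_WITH_INDICATORS, TOTAL_RECORDS)

# Mean number of quality indicators per record that has any.
mean_per_record = round(QUALITY_INDICATORS / RECORDS_WITH_INDICATORS, 1)

print(share, mean_per_record)  # 19.66 2.7
```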

The results show that, among the generic indicators, positional accuracy and completeness are the most used quality indicators, with very similar shares (36.65% and 35.72%, respectively) that together reach 72.37% of the total. Among the specific indicators, absolute external positional accuracy is clearly the most used one (34.04%). Nevertheless, the diversity of quality indicators used is worth highlighting: all 5 generic indicators are represented. A more detailed analysis reveals that there are 25944 measures (a quality indicator can sometimes be expressed by more than one measure), which can be classified into numerical measures (22275, 85.86%); flags declaring conformance to a methodology, mainly INSPIRE (3669, 14.14%); and coverage grids, as specified in the ISO 19115-2 extension [6] (5, 0.02%) (see figure 1).

Regarding lineage, there are 3771 (3.88%) metadata records containing a direct list of the data sources, 9261 (9.53%) metadata records containing a direct list of the processes, and 1226 (1.26%) metadata records that link data sources to each process (complete provenance). The usage information provides a way for producers to describe basic information about specific applications for which the dataset has been or is being used (sometimes known because users have reported back to them). There are 1133 records (1.17%) containing usage information, but only the mandatory specificUsage and userContactInfo elements were described and all the records were provided by the same institution, which is a very poor scenario.
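The three lineage situations distinguished above can be told apart mechanically. The sketch below is one possible reading, not the procedure used in the project: it assumes ISO 19139 XML, treats `gmd:source` directly under LI_Lineage as a direct list of sources, `gmd:processStep` as a direct list of processes, and a `gmd:source` inside every LI_ProcessStep as complete provenance. The function name `lineage_profile` and the sample record are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: records use the ISO 19139 XML encoding of ISO 19115.
GMD = "http://www.isotc211.org/2005/gmd"
NS = {"gmd": GMD}


def lineage_profile(xml_text):
    """Report which kinds of lineage information one record contains."""
    root = ET.fromstring(xml_text)
    profile = {"sources": False, "processes": False, "linked": False}
    for lineage in root.iter(f"{{{GMD}}}LI_Lineage"):
        # Sources listed directly under the lineage element.
        if lineage.find("gmd:source", NS) is not None:
            profile["sources"] = True
        steps = lineage.findall("gmd:processStep/gmd:LI_ProcessStep", NS)
        if steps:
            profile["processes"] = True
            # Complete provenance (as read here): every step names a source.
            if all(step.find("gmd:source", NS) is not None for step in steps):
                profile["linked"] = True
    return profile


# Hypothetical, heavily abbreviated record used only for illustration.
SAMPLE_RECORD = """<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd">
  <gmd:dataQualityInfo><gmd:DQ_DataQuality><gmd:lineage><gmd:LI_Lineage>
    <gmd:processStep><gmd:LI_ProcessStep>
      <gmd:source><gmd:LI_Source/></gmd:source>
    </gmd:LI_ProcessStep></gmd:processStep>
  </gmd:LI_Lineage></gmd:lineage></gmd:DQ_DataQuality></gmd:dataQualityInfo>
</gmd:MD_Metadata>"""

print(lineage_profile(SAMPLE_RECORD))
```

Requiring a source on every process step is a strict interpretation of "each process"; a looser check that accepts any linked step would classify more records as having complete provenance.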


