Metadata
Metadata is loosely defined as data about data. Metadata is traditionally found in the card catalogues of libraries and is today commonly used to describe three aspects of digital documents and data: 1) definition, 2) structure and 3) administration. By describing the contents and context of data files, metadata greatly increases the usefulness of the original data or files. For example, a webpage may include metadata specifying what language it is written in, what tools were used to create it, and where to go for more on the subject, allowing browsers to automatically improve the experience of users.
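The webpage example can be made concrete with a minimal sketch that reads the language and the meta tags out of a page using Python's standard `html.parser` module. The sample page, the `MetadataExtractor` class and the `extract_metadata` helper are illustrative assumptions, not part of any standard.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collect the page language and any name/content meta tags."""

    def __init__(self):
        super().__init__()
        self.language = None
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "html":
            # The lang attribute states what language the page is written in.
            self.language = attributes.get("lang")
        elif tag == "meta":
            name = attributes.get("name")
            if name:
                self.meta[name.lower()] = attributes.get("content") or ""


def extract_metadata(html_text):
    """Return a dict with the page language plus every named meta tag."""
    parser = MetadataExtractor()
    parser.feed(html_text)
    parser.close()
    return {"language": parser.language, **parser.meta}


# Hypothetical page used only for illustration.
SAMPLE_PAGE = """<!DOCTYPE html>
<html lang="en">
<head>
<meta name="generator" content="ExampleEditor 1.0">
<meta name="description" content="An introduction to metadata">
<title>Metadata</title>
</head>
<body><p>Metadata is data about data.</p></body>
</html>"""

if __name__ == "__main__":
    print(extract_metadata(SAMPLE_PAGE))
```

None of the extracted values appear in the visible body of the page, which is what distinguishes them as metadata rather than content.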
Definition

Metadata is defined as data providing information about one or more other pieces of data.
For example, a digital image may include metadata that describes how large the picture is, the color depth, the image resolution, when the image was created, and other data. A text document's metadata may contain information about how long the document is, who the author is, when the document was written, and a short summary of the document.

Metadata is data. As such, metadata can be stored and managed in a database, often called a registry or repository. However, it is impossible to identify metadata just by looking at it, because a user would not know when data is metadata or just data.[1]

Libraries

Metadata has been used in various forms as a means of cataloging archived information. The Dewey Decimal System employed by libraries for the classification of library materials is an early example of metadata usage. Library catalogues used small 3x5 inch cards to display a book's title, author, subject matter, and a brief plot synopsis, along with an abbreviated alphanumeric identification system which indicated the physical location of the book on the library's shelves. Such data helps classify, aggregate, identify, and locate a particular book. Another older form of metadata collection is the US Census Bureau's use of what is known as the "Long Form." The Long Form asks questions that are used to create demographic data and to find patterns of distribution.[2]

The term was coined in 1968 by Philip Bagley, one of the pioneers of computerized document retrieval.[3][4] Since then the fields of information management, information science, information technology, librarianship and GIS have widely adopted the term. In these fields the word metadata is defined as "data about data".[5] While this is the generally accepted definition, various disciplines have adopted their own more specific explanations and uses of the term.

For the purposes of this article, an "object" refers to any of the following:
Photographs

Metadata may be written into a digital photo file to identify who owns it, copyright and contact information, and what camera created the file, along with exposure information and descriptive information such as keywords about the photo, making the file searchable on the computer and/or the Internet. Some metadata is written by the camera and some is input by the photographer and/or software after downloading to a computer. Photographic metadata standards are governed by organizations that develop the following standards. They include, but are not limited to:
Video

Metadata is particularly useful in video, where information about its contents (such as transcripts of conversations and text descriptions of its scenes) is not directly understandable by a computer, but where efficient search is desirable.

Web pages

Web pages often include metadata in the form of meta tags. Description and keywords meta tags are commonly used to describe the Web page's content. Most search engines use this data when adding pages to their search index.

Creation of metadata

Metadata can be created either by automated information processing or by manual work. Elementary metadata captured by computers can include information about when a file was created, who created it, when it was last updated, file size and file extension.

Metadata structures

Metadata is typically structured according to a standardised concept using a well-defined metadata scheme, including metadata standards and metadata models. Tools such as controlled vocabularies, taxonomies, thesauri, data dictionaries and metadata registries can be used to apply further standardisation to the metadata.

Metadata syntax

Metadata syntax refers to the rules created to structure the fields or elements of metadata.[6] A single metadata scheme may be expressed in a number of different markup or programming languages, each of which requires a different syntax. For example, Dublin Core may be expressed in plain text, HTML, XML and RDF.[7]

Metadata types

Metadata applications are manifold and cover a large variety of fields; there are only specialised and well-accepted models for specifying types of metadata. Bretherton & Singley (1994) distinguish between two distinct classes: structural/control metadata and guide metadata.[8] Structural metadata is used to describe the structure of computer systems such as tables, columns and indexes.
Guide metadata is used to help humans find specific items and is usually expressed as a set of keywords in a natural language. According to Ralph Kimball, metadata can be divided into two similar categories: technical metadata and business metadata. Technical metadata corresponds to internal metadata, business metadata to external metadata. Kimball adds a third category named process metadata. On the other hand, NISO distinguishes between three types of metadata: descriptive, structural and administrative.[5] Descriptive metadata is the information used to search for and locate an object, such as title, author, subjects, keywords and publisher; structural metadata describes how the components of the object are organised; and administrative metadata refers to technical information, including file type. Two sub-types of administrative metadata are rights management metadata and preservation metadata.

Hierarchical, linear and planar schemata

Metadata schemas can be hierarchical in nature, where relationships exist between metadata elements and elements are nested so that parent-child relationships exist between them. An example of a hierarchical metadata schema is the IEEE LOM schema, where metadata elements may belong to a parent metadata element. Metadata schemas can also be one-dimensional, or linear, where each element is completely discrete from other elements and classified according to one dimension only. An example of a linear metadata schema is the Dublin Core schema. Metadata schemas are often two-dimensional, or planar, where each element is completely discrete from other elements but classified according to two orthogonal dimensions.[9]

Metadata hypermapping

In all cases where the metadata schemata exceed the planar depiction, some type of hypermapping is required to enable display and view of metadata according to a chosen aspect and to serve special views.
Hypermapping frequently applies to layering of geographical and geological information overlays.[10]

Granularity

Granularity is a term that applies to data as well as to metadata. The degree to which metadata is structured is referred to as its granularity. Metadata with a high granularity allows for deeper structured information and enables greater levels of technical manipulation; however, a lower level of granularity means that metadata can be created at considerably lower cost but will not provide as detailed information. The major impact of granularity is not only on creation and capture but, more importantly, on maintenance. As soon as the metadata structures become outdated, access to the referenced data becomes outdated as well. Hence the choice of granularity should take into account the effort to create the metadata as well as the effort to maintain it.

Metadata standards

International standards apply to metadata. Much work is being accomplished in the national and international standards communities, especially ANSI (American National Standards Institute) and ISO (International Organization for Standardization), to reach consensus on standardizing metadata and registries. The core standard is ISO/IEC 11179-1:2004[11] and subsequent standards (see ISO/IEC 11179). All registrations published so far according to this standard cover only the definition of metadata and serve neither the structuring of metadata storage or retrieval nor any administrative standardisation.

Metadata usage

Data Virtualization
Data Virtualization has emerged as the new software technology to complete the virtualization stack in the enterprise. Metadata is used in Data Virtualization servers, which are enterprise infrastructure components alongside database and application servers. Metadata in these servers is saved in a persistent repository and describes business objects in various enterprise systems and applications.

Statistics and census services

Standardisation work has had a large impact on efforts to build metadata systems in the statistical community. Several metadata standards are described, and their importance to statistical agencies is discussed. Applications of the standards at the Census Bureau, Environmental Protection Agency, Bureau of Labor Statistics, Statistics Canada, and many others are described. Emphasis is on the impact a metadata registry can have in a statistical agency.

Library and information science

Libraries employ metadata in library catalogues, most commonly as part of an Integrated Library Management System (ILMS). Metadata is obtained by cataloguing resources such as books, periodicals, DVDs, web pages or digital images. This data is stored in the ILMS using the MARC metadata standard. The purpose is to direct patrons to the physical or electronic location of the items or areas they seek, as well as to provide a description of the items in question. More recent and specialised instances of library metadata include the establishment of digital libraries, including e-print repositories and digital image libraries. While often based on library principles, their focus on non-librarian use, especially in providing metadata, means they do not follow traditional or common cataloguing approaches. Given the custom nature of the included materials, metadata fields are often specially created, e.g. taxonomic classification fields, location fields, keywords or copyright statements.
Standard file information such as file size and format is usually included automatically. Standardisation for library operation has been a key topic in international standardisation (ISO) for decades. Standards for metadata in digital libraries include Dublin Core, METS, MODS, DDI, the ISO standard Digital Object Identifier (DOI), the ISO standard Uniform Resource Name (URN), the PREMIS schema, and OAI-PMH. Leading libraries in the world give hints on their metadata standards strategies.[12][13]

Metadata and the law

United States

Problems involving metadata in litigation in the United States are becoming widespread. Courts have looked at various questions involving metadata, including the discoverability of metadata by parties. Although the Federal Rules of Civil Procedure have only specified rules about electronic documents, subsequent case law has elaborated on the requirement of parties to reveal metadata.[14] In October 2009, the Arizona Supreme Court ruled that metadata records are public record.[15] Document metadata has proven particularly important in legal environments in which litigation has requested metadata, which can include sensitive information detrimental to a party in court. Using metadata removal tools to "clean" documents can mitigate the risk of unwittingly sending sensitive data. This process partially (see Data remanence) protects law firms from potentially damaging leaks of sensitive data through electronic discovery.

Metadata in healthcare

Australian researchers in medicine initiated much of the metadata definition work for applications in health care. That approach offers the first recognised attempt to adhere to international standards in medical sciences instead of defining a proprietary standard under the WHO umbrella first.
The medical community has not yet approved the need to follow metadata standards, despite the respective research.[16]

Metadata and data warehousing

A data warehouse (DW) is a repository of an organization's electronically stored data. Data warehouses are designed to manage and store the data, whereas business intelligence (BI) focuses on the usage of data to facilitate reporting and analysis.[17] The purpose of a data warehouse is to house standardized, structured, consistent, integrated, correct, cleansed and timely data, extracted from various operational systems in an organization. The extracted data is integrated in the data warehouse environment in order to provide an enterprise-wide perspective, one version of the truth. Data is structured in a way that specifically addresses the reporting and analytic requirements.

An essential component of a data warehouse/business intelligence system is the metadata and the tools to manage and retrieve it. Ralph Kimball[18] describes metadata as the DNA of the data warehouse, as metadata defines the elements of the data warehouse and how they work together. Kimball et al.[19] refer to three main categories of metadata: technical metadata, business metadata and process metadata. Technical metadata is primarily definitional, while business metadata and process metadata are primarily descriptive. Keep in mind that the categories sometimes overlap.
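As a rough illustration of how the three categories might sit side by side for a single warehouse table, the following sketch groups them into Python dataclasses. All class names, field names and sample values are hypothetical and chosen only for this example; they are not taken from Kimball's work or from any particular warehouse product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TechnicalMetadata:
    """Definitional: how the table is physically defined."""
    table_name: str
    columns: dict  # column name -> declared data type


@dataclass
class BusinessMetadata:
    """Descriptive: what the data means to the people who use it."""
    description: str
    owner: str


@dataclass
class ProcessMetadata:
    """Descriptive: what happened the last time the table was loaded."""
    last_loaded: datetime
    rows_loaded: int


@dataclass
class WarehouseTableMetadata:
    """Hypothetical container tying the three categories to one table."""
    technical: TechnicalMetadata
    business: BusinessMetadata
    process: ProcessMetadata

    def summary(self):
        return (
            f"{self.technical.table_name} "
            f"({len(self.technical.columns)} columns): "
            f"{self.business.description}; "
            f"{self.process.rows_loaded} rows loaded on "
            f"{self.process.last_loaded:%Y-%m-%d}"
        )


# Illustrative values only.
sales = WarehouseTableMetadata(
    technical=TechnicalMetadata(
        "fact_sales", {"sale_id": "INTEGER", "amount": "DECIMAL(10,2)"}
    ),
    business=BusinessMetadata("One row per completed sale", "Finance team"),
    process=ProcessMetadata(datetime(2010, 1, 15, 2, 30), 1250),
)

if __name__ == "__main__":
    print(sales.summary())
```

Because the categories sometimes overlap, a real system might not separate them this cleanly; the split here is only meant to make the distinction visible.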
Metadata on the Internet

The HTML format used to define web pages allows for the inclusion of a variety of types of metadata, from basic descriptive text, dates and keywords to more advanced metadata schemes such as the Dublin Core, e-GMS, and AGLS[20] standards. Pages can also be geotagged with coordinates. Metadata may be included in the page's header or in a separate file. Microformats allow metadata to be added to on-page data in a way that users don't see, but computers can readily access. Many search engines are cautious about using metadata in their ranking algorithms because of the exploitation of metadata and the practice of search engine optimization (SEO) to improve rankings. See the Meta element article for further discussion.

Geospatial metadata

Metadata that describes geographic objects (such as datasets, maps, features, or simply documents with a geospatial component) has a history dating back to at least 1994 (refer to the MIT Library page on FGDC Metadata). This class of metadata is described more fully on the Geospatial metadata page.

Metadata on CDs and DVDs

CDs such as recordings of music carry a layer of metadata about the recordings, such as dates, artist, genre and copyright owner. The metadata, not normally displayed by CD players, can be accessed and displayed by specialized music playback and/or editing applications.

Cloud applications

With the availability of cloud applications, which include those to add metadata to content, metadata is increasingly available over the Internet.

Metadata administration and management

Metadata storage
Metadata can be stored either internally, in the same file as the data, or externally, in a separate file. Metadata that is embedded with content is called embedded metadata. A data repository typically stores the metadata detached from the data. Both approaches have advantages and disadvantages.
Moreover, there is the question of data format: storing metadata in a human-readable format such as XML can be useful because users can understand and edit it without specialized tools. On the other hand, these formats are not optimized for storage capacity; it may be useful to store metadata in a binary, non-human-readable format instead to speed up transfer and save memory.

Database management

Each relational database system has its own mechanisms for storing metadata. Examples of relational-database metadata include:
In database terminology, this set of metadata is referred to as the catalog. The SQL standard specifies a uniform means to access the catalog.
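To show what reading a catalog looks like in practice, here is a minimal sketch using SQLite through Python's standard `sqlite3` module. SQLite is used only because it ships with Python; it exposes its catalog through its own `sqlite_master` table and the `PRAGMA table_info` statement rather than through the SQL-standard mechanism, so this illustrates the idea of a catalog and not the standard interface. The `books` and `loans` tables and both helper functions are hypothetical.

```python
import sqlite3


def build_example_database():
    """Create an in-memory database with two hypothetical tables."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, author TEXT)"
    )
    connection.execute("CREATE TABLE loans (book_id INTEGER, due_date TEXT)")
    return connection


def describe_catalog(connection):
    """Return {table name: [(column name, declared type), ...]}."""
    catalog = {}
    # sqlite_master is SQLite's own catalog table; it lists every table.
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields one row per column:
        # (cid, name, type, notnull, dflt_value, pk)
        columns = connection.execute(
            f'PRAGMA table_info("{table_name}")'
        ).fetchall()
        catalog[table_name] = [(column[1], column[2]) for column in columns]
    return catalog


if __name__ == "__main__":
    for table, columns in describe_catalog(build_example_database()).items():
        print(table, columns)
```

The queries never touch the rows stored in `books` or `loans`; everything returned describes the structure of the data, which is the structural metadata the catalog holds.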
This article is based on one or more articles in Wikipedia, with modifications and
additional content by SOURCES editors. This article is covered by a Creative Commons
Attribution-Sharealike 3.0 License (CC-BY-SA) and the GNU Free Documentation License
(GFDL). The remainder of the content of this website, except where otherwise indicated,
is copyright SOURCES and may not be reproduced without written permission.