Join | Renew | Donate
Stay on top of all DAMA-RMC news and announcements here.
Another advanced architectural approach is bi-directional Metadata Architecture, which allows Metadata to change in any part of the architecture (source, data integration, user interface), and then feedback is coordinated from the repository (broker) into its original source.
Various challenges are apparent in this approach. The design forces the Metadata repository to contain the latest version of the Metadata source and forces it to manage changes to the source, as well. Changes must be trapped systematically, and then resolved. Additional sets of process interfaces to tie the repository back to the Metadata source(s) must be built and maintained.
This figure illustrates how common Metadata from different sources is collected in a centralized Metadata store. Users submit their queries to the Metadata portal, which passes the request to a centralized repository. The centralized repository will try to fulfill the user request from the common Metadata collected initially from the various sources. As the request becomes more specific or the user needs more detailed Metadata then the centralized repository will delegate down to the specific source to research the specific details. Global search across the various tools is available due to the common Metadata collected in the centralized repository.
A completely distributed architecture maintains a single access point. The Metadata retrieval engine responds to user requests by retrieving data from source systems in real time; there is no persistent repository. In this architecture, the Metadata management environment maintains the necessary source system catalogs and lookup information needed to process user queries and searches effectively. A common object request broker or similar middleware protocol accesses these source systems.
Advantages of distributed Metadata Architecture include:
Distributed architectures also have limitations:
This figure illustrates a distributed Metadata Architecture. There is no centralized Metadata repository store and the portal passes the users’ requests to the appropriate tool to execute. As there is no centralized store for the Metadata to be collected from the various tools, every request has to be delegated down to the sources; hence, no capability exist for a global search across the various Metadata sources.
A WARM DAMA Rocky Mountain Chapter welcome to our new members who joined in Q3! We wouldn't be the organization we are without you!
Professional Members:
Corporate Members:
Guest Members:
Thank you also to everyone who renewed their membership in Q3!
November 2024 Newsletter.pdf
Mandi Albano joins the DAMA-RMC board as the new VP of Data.
Amanda (Mandi) Albano is a seasoned software and database expert with a passion for leveraging technology to drive business success and improve lives. With a foundation in complex system design and data management, she began her career at StarTek Inc., developing performance-enhancing software supporting major telecommunications companies. Amanda then transitioned to consulting at Sogeti USA, where she led projects for the State of Wyoming, focusing on data integration and reporting. For the past 14 years at Market Perceptions, Inc., she has specialized in creating data-driven solutions centered around strategic and operational insights based on marking research data. Amanda combines her technical expertise with a commitment to building strong, trust-based partnerships, aiming to deliver best-in-class solutions that foster customer growth and advancement.
Please give Mandi a warm DAMA-RMC welcome.
Mandi Albano Linked In
A centralized architecture consists of a single Metadata repository that contains copies of Metadata from the various sources. Organizations with limited IT resources or those seeking to automate as much as possible, may choose to avoid this architecture option. Organizations seeking a high degree of consistency within the common Metadata repository can benefit from a centralized architecture.
Advantages of a centralized repository include:
Some limitations of the centralized approach include:
This figure shows how Metadata is collected in a standalone Metadata repository with its own internal Metadata store. The internal store is populated through a scheduled import (arrows) of the Metadata from the various tools. In turn, the centralized repository exposes a portal for the end users to submit their queries. The Metadata portal passes the request to the centralized Metadata repository. The centralized repository will fulfill the request from the collected Metadata. In this type of implementation, the capability to pass the request from the user to various tools directly is not supported. Global search across the Metadata collected from the various tool is possible due to the collection of various Metadata in the centralized repository.
The most common definition of Metadata, “data about data,” is misleadingly simple. The kind of information that can be classified as Metadata is wide-ranging. Metadata includes information about technical and business processes, data rules and constraints, and logical and physical data structures. It describes the data itself (e.g., databases, data elements, data models), the concepts the data represents (e.g., business processes, application systems, software code, technology infrastructure), and the connections (relationships) between the data and concepts. Metadata helps an organization understand its data, its systems, and its workflows. It enables Data Quality assessment and is integral to the management of databases and other applications. It contributes to the ability to process, maintain, integrate, secure, audit, and govern other data.
To understand Metadata’s vital role in data management, imagine a large library, with hundreds of thousands of books and magazines, but no card catalog. Without a card catalog, readers might not even know how to start looking for a specific book or even a specific topic. The card catalog not only provides the necessary information (which books and materials the library owns and where they are shelved) it also enables patrons to find materials using different starting points (subject area, author, or title). Without the catalog, finding a specific book would be difficult if not impossible. An organization without Metadata is like a library without a card catalog.
Metadata is essential to data management as well as data usage (see multiple references to Metadata throughout the DAMA-DMBOK). All large organizations produce and use a lot of data. Across an organization, different individuals will have different levels of data knowledge, but no individual will know everything about the data. This information must be documented or the organization risks losing valuable knowledge about itself. Metadata provides the primary means of capturing and managing organizational knowledge about data. However, Metadata management is not only a knowledge management challenge; it is also a risk management necessity. Metadata is necessary to ensure an organization can identify private or sensitive data and that it can manage the data lifecycle for its own benefit and in order to meet compliance requirements and minimize risk exposure.
Without reliable Metadata, an organization does not know what data it has, what the data represents, where it originates, how it moves through systems, who has access to it, or what it means for the data to be of high quality. Without Metadata, an organization cannot manage its data as an asset. Indeed, without Metadata, an organization may not be able to manage its data at all. As technology has evolved, the speed at which data is generated has also increased. Technical Metadata has become integral to the way in which data is moved and integrated. ISO’s Metadata Registry Standard, ISO/IEC 11179, is intended to enable Metadata-driven exchange of data in a heterogeneous environment, based on exact definitions of data. Metadata present in XML and other formats enables use of the data. Other types of Metadata tagging allow data to be exchanged while retaining signifiers of ownership, security requirements, etc. (See Chapter 8.)
Like other data, Metadata requires management. As the capacity of organizations to collect and store data increases, the role of Metadata in data management grows in importance. To be data-driven, an organization must be Metadata-driven.
Release Management is critical to an incremental development processes that grows new capabilities, enhances the production deployment, and ensures provision of regular maintenance across the deployed assets. This process will keep the warehouse up-to-date, clean, and operating at its best. However, this process requires the same alignment between IT and Business as between the Data Warehouse model and the BI capabilities. It is a continual improvement effort.
This Figure illustrates an example release process, based on a quarterly schedule. Over the year, there are three business-driven releases and one technology-based release (to address requirements internal to the warehouse). The process should enable incremental development of the warehouse and management of the backlog of requirements.
The data warehouse environment includes a collection of architectural components that need to be organized to meet the needs of the enterprise. Figure 82 depicts the architectural components of the DW/BI and Big Data Environment discussed in this section. The evolution of Big Data has changed the DW/BI landscape by adding another path through which data may be brought into an enterprise.
This Figure also depicts aspects of the data lifecycle. Data moves from source systems into a staging area where it may be cleansed and enriched as it is integrated and stored in the DW and/or an ODS. From the DW, it may be accessed via marts or cubes and used for various kinds of reporting. Big Data goes through a similar process but with a significant difference: while most warehouses integrate data before landing it in tables, Big Data solutions ingest data before integrating it. Big Data BI may include predictive analytics and data mining, as well as more traditional forms of reporting. (See Chapter 14.)
Source Systems, on the left side of this Figure, include the operational systems and external data to be brought into the DW/BI environment. These typically include operational systems such as CRM, Accounting, and Human Resources applications, as well as operational systems that differ based on industry. Data from vendors and external sources may also be included, as may DaaS, web content, and any Big Data computation results.
Data integration covers Extract, Transform, and Load (ETL), data virtualization, and other techniques of getting data into a common form and location. In a SOA environment, the data services layers are part of this component. In this Figure, all the arrows represent data integration processes. (See Chapter 8.)
Kimball’s Dimensional Data Warehouse is the other primary pattern for DW development. Kimball defines a data warehouse simply as “a copy of transaction data specifically structured for query and analysis” (Kimball, 2002). The ‘copy’ is not exact, however. Warehouse data is stored in a dimensional data model. The dimensional model is designed to enable data consumers to understand and use the data, while also enabling query performance. It is not normalized in the way an entity relationship model is.
Often referred to as Star Schema, dimensional models are comprised of facts, which contain quantitative data about business processes (e.g., sales numbers), and dimensions, which store descriptive attributes related to fact data and allow data consumers to answer questions about the facts (e.g., how many units of product X were sold this quarter?) A fact table joins with many dimension tables, and when viewed as a diagram, appears as a star. (See Chapter 5.) Multiple fact tables will share the common, or conformed, dimensions via a ‘bus’, similar to a bus in a computer. Multiple data marts can be integrated at an enterprise level by plugging into the bus of conformed dimensions.
The DW bus matrix shows the intersection of business processes that generate fact data and data subject areas that represent dimensions. Opportunities for conformed dimensions exist where multiple processes use the same data. Table 27 is a sample bus matrix. In this example, the business processes for Sales, Inventory, and Orders all require Date and Product data. Sales and Inventory both require Store data, while Inventory and Orders require Vendor data. Date, Product, Store, and Vendor are all candidates for conformed dimensions. In contrast, Warehouse is not shared; it is used only by Inventory.
The enterprise DW bus matrix can be used to represent the long-term data content requirements for the DW/BI system, independent of technology. This tool enables an organization to scope manageable development efforts. Each implementation builds an increment of the overall architecture. At some point, enough dimensional schemas exist to make good on the promise of an integrated enterprise data warehouse environment. This figure represents Kimball’s Data Warehouse Chess Pieces view of DW/BI architecture. Note that Kimball’s Data Warehouse is more expansive than Inmon’s. The DW encompasses all components in the data staging and data presentation areas.
Featured articles coming soon!
About us| Events | Learn | Join DAMA-RMC| Contacts
© DAMA-RMC 2022