Stay on top of all DAMA-RMC news and announcements here.
Mandi Albano joins the DAMA-RMC board as the new VP of Data.
Amanda (Mandi) Albano is a seasoned software and database expert with a passion for leveraging technology to drive business success and improve lives. With a foundation in complex system design and data management, she began her career at StarTek Inc., developing performance-enhancing software supporting major telecommunications companies. Amanda then transitioned to consulting at Sogeti USA, where she led projects for the State of Wyoming, focusing on data integration and reporting. For the past 14 years at Market Perceptions, Inc., she has specialized in creating data-driven solutions centered on strategic and operational insights drawn from marketing research data. Amanda combines her technical expertise with a commitment to building strong, trust-based partnerships, aiming to deliver best-in-class solutions that foster customer growth and advancement.
Please give Mandi a warm DAMA-RMC welcome.
Mandi Albano on LinkedIn
A centralized architecture consists of a single Metadata repository that contains copies of Metadata from the various sources. Organizations with limited IT resources, or those seeking to automate as much as possible, may choose to avoid this architecture option. Organizations seeking a high degree of consistency within the common Metadata repository can benefit from a centralized architecture.
Advantages of a centralized repository include:
Some limitations of the centralized approach include:
This figure shows how Metadata is collected in a standalone Metadata repository with its own internal Metadata store. The internal store is populated through a scheduled import (arrows) of the Metadata from the various tools. In turn, the centralized repository exposes a portal through which end users submit their queries. The Metadata portal passes each request to the centralized Metadata repository, which fulfills it from the collected Metadata. In this type of implementation, passing a request from the user directly to the various tools is not supported. Global search across the Metadata collected from the various tools is possible because that Metadata has been consolidated in the centralized repository.
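To make the flow concrete, here is a minimal sketch of the centralized pattern just described. It is illustrative only; the class and method names are assumptions, not a specific product. Metadata is copied from each source tool into the repository's internal store on a schedule, and portal queries are answered from that store rather than being passed through to the tools.

```python
class SourceTool:
    """Stand-in for a modeling, ETL, or BI tool that can export its Metadata."""
    def __init__(self, name, records):
        self.name = name
        self.records = records

    def export_metadata(self):
        return self.records


class CentralizedMetadataRepository:
    def __init__(self, source_tools):
        self.source_tools = source_tools
        self.internal_store = {}  # the repository's own Metadata store

    def scheduled_import(self):
        """Periodic job (the arrows in the figure): copy Metadata from each tool."""
        for tool in self.source_tools:
            for record in tool.export_metadata():
                self.internal_store[(tool.name, record["name"])] = record

    def search(self, term):
        """Global search across the Metadata collected from all tools;
        requests are never forwarded to the source tools themselves."""
        term = term.lower()
        return [r for r in self.internal_store.values()
                if term in r["name"].lower()
                or term in r.get("description", "").lower()]


# Usage: import on a schedule, then serve portal queries from the collected copies.
etl_tool = SourceTool("ETL", [{"name": "stg_customer",
                               "description": "Customer staging table"}])
repo = CentralizedMetadataRepository([etl_tool])
repo.scheduled_import()
print(repo.search("customer"))
```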
The most common definition of Metadata, “data about data,” is misleadingly simple. The kind of information that can be classified as Metadata is wide-ranging. Metadata includes information about technical and business processes, data rules and constraints, and logical and physical data structures. It describes the data itself (e.g., databases, data elements, data models), the concepts the data represents (e.g., business processes, application systems, software code, technology infrastructure), and the connections (relationships) between the data and concepts. Metadata helps an organization understand its data, its systems, and its workflows. It enables Data Quality assessment and is integral to the management of databases and other applications. It contributes to the ability to process, maintain, integrate, secure, audit, and govern other data.
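As a hypothetical illustration of how wide-ranging this is, the record below shows the kinds of business, technical, governance, and relationship Metadata that might be captured for a single data element. The field names and values are examples, not a prescribed standard.

```python
# Hypothetical Metadata for one data element, combining business, technical,
# governance, and relationship information (all names are illustrative).
customer_email_metadata = {
    "element": "customer_email",
    "business_definition": "Primary email address used to contact a customer",
    "data_type": "VARCHAR(254)",                    # technical: physical structure
    "source_system": "CRM",                         # technical: lineage / origin
    "data_steward": "Customer Data Steward",        # governance: accountability
    "sensitivity": "PII - restricted",              # security / compliance
    "quality_rule": "Must match a valid email pattern and be unique per customer",
    "used_by": ["Billing", "Marketing Analytics"],  # relationships to processes
}
```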
To understand Metadata’s vital role in data management, imagine a large library, with hundreds of thousands of books and magazines, but no card catalog. Without a card catalog, readers might not even know how to start looking for a specific book or even a specific topic. The card catalog not only provides the necessary information (which books and materials the library owns and where they are shelved) it also enables patrons to find materials using different starting points (subject area, author, or title). Without the catalog, finding a specific book would be difficult if not impossible. An organization without Metadata is like a library without a card catalog.
Metadata is essential to data management as well as data usage (see multiple references to Metadata throughout the DAMA-DMBOK). All large organizations produce and use a lot of data. Across an organization, different individuals will have different levels of data knowledge, but no individual will know everything about the data. This information must be documented or the organization risks losing valuable knowledge about itself. Metadata provides the primary means of capturing and managing organizational knowledge about data. However, Metadata management is not only a knowledge management challenge; it is also a risk management necessity. Metadata is necessary to ensure an organization can identify private or sensitive data and that it can manage the data lifecycle for its own benefit and in order to meet compliance requirements and minimize risk exposure.
Without reliable Metadata, an organization does not know what data it has, what the data represents, where it originates, how it moves through systems, who has access to it, or what it means for the data to be of high quality. Without Metadata, an organization cannot manage its data as an asset. Indeed, without Metadata, an organization may not be able to manage its data at all. As technology has evolved, the speed at which data is generated has also increased. Technical Metadata has become integral to the way in which data is moved and integrated. ISO’s Metadata Registry Standard, ISO/IEC 11179, is intended to enable Metadata-driven exchange of data in a heterogeneous environment, based on exact definitions of data. Metadata present in XML and other formats enables use of the data. Other types of Metadata tagging allow data to be exchanged while retaining signifiers of ownership, security requirements, etc. (See Chapter 8.)
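The sketch below illustrates the general idea of Metadata tagging traveling with exchanged data: a payload wrapped with tags that preserve ownership, classification, and retention requirements. The tag names are assumptions for illustration, not taken from ISO/IEC 11179 or any particular exchange format.

```python
# Illustrative only: a data payload exchanged together with Metadata tags
# that retain signifiers of ownership and security requirements.
message = {
    "metadata": {
        "owner": "Finance Department",
        "classification": "Confidential",
        "retention_period_days": 2555,   # roughly seven years (assumed policy)
        "schema_version": "2.1",
        "source_system": "General Ledger",
    },
    "data": {
        "account_id": "10-4400",
        "balance": 125000.00,
        "as_of_date": "2024-09-30",
    },
}
```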
Like other data, Metadata requires management. As the capacity of organizations to collect and store data increases, the role of Metadata in data management grows in importance. To be data-driven, an organization must be Metadata-driven.
Release Management is critical to an incremental development process that grows new capabilities, enhances the production deployment, and ensures regular maintenance across the deployed assets. This process will keep the warehouse up-to-date, clean, and operating at its best. However, it requires the same alignment between IT and Business as between the Data Warehouse model and the BI capabilities. It is a continual improvement effort.
This Figure illustrates an example release process, based on a quarterly schedule. Over the year, there are three business-driven releases and one technology-based release (to address requirements internal to the warehouse). The process should enable incremental development of the warehouse and management of the backlog of requirements.
The data warehouse environment includes a collection of architectural components that need to be organized to meet the needs of the enterprise. Figure 82 depicts the architectural components of the DW/BI and Big Data Environment discussed in this section. The evolution of Big Data has changed the DW/BI landscape by adding another path through which data may be brought into an enterprise.
This Figure also depicts aspects of the data lifecycle. Data moves from source systems into a staging area where it may be cleansed and enriched as it is integrated and stored in the DW and/or an ODS. From the DW, it may be accessed via marts or cubes and used for various kinds of reporting. Big Data goes through a similar process but with a significant difference: while most warehouses integrate data before landing it in tables, Big Data solutions ingest data before integrating it. Big Data BI may include predictive analytics and data mining, as well as more traditional forms of reporting. (See Chapter 14.)
Source Systems, on the left side of this Figure, include the operational systems and external data to be brought into the DW/BI environment. These typically include operational systems such as CRM, Accounting, and Human Resources applications, as well as operational systems that differ based on industry. Data from vendors and external sources may also be included, as may DaaS, web content, and any Big Data computation results.
Data integration covers Extract, Transform, and Load (ETL), data virtualization, and other techniques of getting data into a common form and location. In a SOA environment, the data services layers are part of this component. In this Figure, all the arrows represent data integration processes. (See Chapter 8.)
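As a simple illustration of what those arrows represent, here is a minimal ETL sketch, with hypothetical source and staging table names, that extracts rows from an operational source, transforms them into a common form, and loads them into a warehouse staging area. It assumes the source and staging tables already exist.

```python
import sqlite3


def extract(source_conn: sqlite3.Connection):
    """Pull raw customer rows from an operational source system."""
    return source_conn.execute(
        "SELECT id, name, country FROM customers").fetchall()


def transform(rows):
    """Put data into a common form: trim and title-case names,
    standardize country names to codes."""
    country_codes = {"United States": "US", "Deutschland": "DE"}
    return [(cid, name.strip().title(), country_codes.get(country, country))
            for cid, name, country in rows]


def load(warehouse_conn: sqlite3.Connection, rows):
    """Land the conformed rows in the warehouse staging area."""
    warehouse_conn.executemany(
        "INSERT INTO stg_customer (customer_id, customer_name, country_code) "
        "VALUES (?, ?, ?)", rows)
    warehouse_conn.commit()
```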
Kimball’s Dimensional Data Warehouse is the other primary pattern for DW development. Kimball defines a data warehouse simply as “a copy of transaction data specifically structured for query and analysis” (Kimball, 2002). The ‘copy’ is not exact, however. Warehouse data is stored in a dimensional data model. The dimensional model is designed to enable data consumers to understand and use the data, while also enabling query performance. It is not normalized in the way an entity relationship model is.
Often referred to as star schemas, dimensional models are composed of facts, which contain quantitative data about business processes (e.g., sales numbers), and dimensions, which store descriptive attributes related to the fact data and allow data consumers to answer questions about the facts (e.g., how many units of product X were sold this quarter?). A fact table joins with many dimension tables, and when viewed as a diagram, appears as a star. (See Chapter 5.) Multiple fact tables will share the common, or conformed, dimensions via a ‘bus’, similar to a bus in a computer. Multiple data marts can be integrated at an enterprise level by plugging into the bus of conformed dimensions.
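The following sketch shows how the example question above might be answered against a star schema. The tables, columns, and values are invented for illustration: a sales fact table joins to product and date dimensions, and the dimensions supply the attributes used to filter and aggregate the facts.

```python
import pandas as pd

# Illustrative star schema: one fact table plus two dimensions
# (table, column, and key values are assumptions, not a real warehouse).
dim_product = pd.DataFrame({
    "product_key": [1, 2],
    "product_name": ["Product X", "Product Y"],
})
dim_date = pd.DataFrame({
    "date_key": [20240701, 20240815, 20241001],
    "quarter": ["2024-Q3", "2024-Q3", "2024-Q4"],
})
fact_sales = pd.DataFrame({
    "product_key": [1, 1, 2],
    "date_key": [20240701, 20240815, 20241001],
    "units_sold": [10, 5, 7],
})

# "How many units of Product X were sold in 2024-Q3?"
result = (fact_sales
          .merge(dim_product, on="product_key")
          .merge(dim_date, on="date_key")
          .query("product_name == 'Product X' and quarter == '2024-Q3'")
          ["units_sold"].sum())
print(result)  # 15
```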
The DW bus matrix shows the intersection of business processes that generate fact data and data subject areas that represent dimensions. Opportunities for conformed dimensions exist where multiple processes use the same data. Table 27 is a sample bus matrix. In this example, the business processes for Sales, Inventory, and Orders all require Date and Product data. Sales and Inventory both require Store data, while Inventory and Orders require Vendor data. Date, Product, Store, and Vendor are all candidates for conformed dimensions. In contrast, Warehouse is not shared; it is used only by Inventory.
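The sample matrix described above can be represented as a simple mapping from business process to required dimensions; the conformed-dimension candidates are just the dimensions used by more than one process. This is a small sketch of that idea, using only the processes and dimensions named in the example.

```python
from collections import Counter

# The sample bus matrix described above: business process -> dimensions it uses.
bus_matrix = {
    "Sales":     {"Date", "Product", "Store"},
    "Inventory": {"Date", "Product", "Store", "Vendor", "Warehouse"},
    "Orders":    {"Date", "Product", "Vendor"},
}

# Conformed-dimension candidates are dimensions shared by two or more processes.
usage = Counter(dim for dims in bus_matrix.values() for dim in dims)
conformed = sorted(d for d, n in usage.items() if n > 1)
print(conformed)  # ['Date', 'Product', 'Store', 'Vendor'] -- Warehouse is not shared
```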
The enterprise DW bus matrix can be used to represent the long-term data content requirements for the DW/BI system, independent of technology. This tool enables an organization to scope manageable development efforts. Each implementation builds an increment of the overall architecture. At some point, enough dimensional schemas exist to make good on the promise of an integrated enterprise data warehouse environment. This figure represents Kimball’s Data Warehouse Chess Pieces view of DW/BI architecture. Note that Kimball’s Data Warehouse is more expansive than Inmon’s. The DW encompasses all components in the data staging and data presentation areas.
Bill Inmon’s Corporate Information Factory (CIF) is one of the two primary patterns for data warehousing. The component parts of Inmon’s definition of a data warehouse, “a subject oriented, integrated, time variant, and nonvolatile collection of summary and detailed historical data,” describe the concepts that support the CIF and point to the differences between warehouses and operational systems.
Inmon, Claudia Imhoff, and Ryan Sousa describe data warehousing in the context of the Corporate Information Factory. See this figure. CIF components include:
This figure depicts movement within the CIF, from data collection and creation via applications (on the left) to the creation of information via marts and analysis (on the right). Movement from left to right includes other changes. For example:
The data in DW and marts differs from that in applications:
The concept of the Data Warehouse emerged in the 1980s as technology enabled organizations to integrate data from a range of sources into a common data model. Integrated data promised to provide insight into operational processes and open up new possibilities for leveraging data to make decisions and create organizational value. As importantly, data warehouses were seen as a means to reduce the proliferation of decision support systems (DSS), most of which drew on the same core enterprise data. The concept of an enterprise warehouse promised a way to reduce data redundancy, improve the consistency of information, and enable an enterprise to use its data to make better decisions.
Data warehouses began to be built in earnest in the 1990s. Since then (and especially with the co-evolution of Business Intelligence as a primary driver of business decision-making), data warehouses have become ‘mainstream’. Most enterprises have data warehouses, and warehousing is the recognized core of enterprise data management. Even though well established, the data warehouse continues to evolve. As new forms of data are created with increasing velocity, new concepts, such as data lakes, are emerging that will influence the future of the data warehouse. See Chapters 8 and 15.
The primary driver for data warehousing is to support operational functions, compliance requirements, and Business Intelligence (BI) activities (though not all BI activities depend on warehouse data). Increasingly, organizations are being asked to provide data as evidence that they have complied with regulatory requirements. Because they contain historical data, warehouses are often the means to respond to such requests. Nevertheless, Business Intelligence support continues to be the primary reason for a warehouse. BI promises insight about the organization, its customers, and its products. An organization that acts on knowledge gained from BI can improve operational efficiency and competitive advantage. As more data has become available at greater velocity, BI has evolved from retrospective assessment to predictive analytics.
Since Reference Data is a shared resource, it cannot be changed arbitrarily. The key to successful Reference Data Management is organizational willingness to relinquish local control of shared data. To sustain this support, provide channels to receive and respond to requests for changes to Reference Data. The Data Governance Council should ensure that policies and procedures are implemented to handle changes to data within reference and Master Data environments.
Changes to Reference Data will need to be managed. Minor changes may affect a few rows of data. For example, when the Soviet Union broke into independent states, the code for the Soviet Union was deprecated and new codes were added. In the healthcare industry, procedure and diagnosis codes are updated annually to account for refinement of existing codes, retirement of obsolete codes, and the introduction of new codes. Major revisions to Reference Data affect data structure. For example, ICD-10 diagnostic codes are structured very differently from ICD-9 codes: ICD-10 has a different format, uses different values for the same concepts, and, more importantly, follows additional principles of organization. ICD-10 codes have a different granularity and are much more specific, so more information is conveyed in a single code. Consequently, there are many more of them (as of 2015, there were roughly 68,000 ICD-10 codes, compared with about 13,000 ICD-9 codes).
The mandated use of ICD-10 codes in the US in 2015 required significant planning. Healthcare companies needed to make system changes as well as adjustments to impacted reporting to account for the new standard.
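One common way to absorb a structural change like this is a crosswalk between the old and new code sets. The sketch below is illustrative only: real mappings (for example, the published General Equivalence Mappings) are far larger and frequently one-to-many, and the sample codes are included purely as examples.

```python
# Illustrative reference data crosswalk from ICD-9 to ICD-10 codes.
# Entries are examples only; production crosswalks are much larger and
# often map one old code to several candidate new codes.
icd9_to_icd10 = {
    "250.00": ["E11.9"],   # diabetes example
    "410.90": ["I21.9"],   # acute myocardial infarction, unspecified
}


def translate(icd9_code: str) -> list[str]:
    """Return ICD-10 candidates for an ICD-9 code, or flag it for manual review."""
    return icd9_to_icd10.get(icd9_code, ["UNMAPPED - needs manual review"])


print(translate("250.00"))  # ['E11.9']
print(translate("999.99"))  # ['UNMAPPED - needs manual review']
```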
Types of changes include:
Changes can be planned / scheduled or ad hoc. Planned changes, such as monthly or annual updates to industry standard codes, require less governance than ad hoc updates. The process to request new Reference Data sets should account for potential uses beyond those of the original requestor.
Change requests should follow a defined process, as illustrated in this figure. When requests are received, stakeholders should be notified so that impacts can be assessed. If changes need approval, discussions should be held to get that approval. Changes should be communicated.
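A minimal sketch of that request-assess-approve-communicate flow is shown below. The statuses, fields, and method names are illustrative assumptions, not a prescribed DAMA workflow.

```python
from dataclasses import dataclass, field


@dataclass
class ReferenceDataChangeRequest:
    requestor: str
    data_set: str
    description: str
    status: str = "RECEIVED"
    log: list = field(default_factory=list)

    def notify_stakeholders(self, stakeholders):
        """Record that stakeholders were asked to assess impacts."""
        self.log.append(f"Impact assessment requested from: {', '.join(stakeholders)}")
        self.status = "UNDER_REVIEW"

    def decide(self, approved: bool):
        """Record the approval decision after discussion."""
        self.status = "APPROVED" if approved else "REJECTED"
        self.log.append(f"Decision recorded: {self.status}")

    def communicate(self):
        """Communicate an approved change to affected consumers."""
        if self.status == "APPROVED":
            self.log.append("Change communicated and scheduled for implementation.")


# Usage: a request to add a new country code to a shared Reference Data set.
req = ReferenceDataChangeRequest("Order Management team", "Country Codes",
                                 "Add newly recognized country")
req.notify_stakeholders(["Sales", "Finance", "Data Governance Council"])
req.decide(approved=True)
req.communicate()
print(req.status, req.log)
```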
Featured articles coming soon!