Stay on top of all DAMA-RMC news and announcements here.
Data Quality can be defined as the degree to which the dimensions of Data Quality meet requirements. This implies that requirements should be formulated for each relevant dimension. A much shorter definition of quality data is ‘fit for purpose.’
Data that meets the requirements is of sufficient quality; data that does not meet the requirements is of insufficient quality. To keep it simple, we speak of high-quality and low-quality (or poor-quality) data, respectively.
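To make ‘dimensions meeting requirements’ concrete, here is a minimal sketch that scores two common dimensions, completeness and validity, against agreed thresholds. The sample records, format rule, and thresholds are illustrative assumptions, not prescribed values.

```python
# Minimal sketch: scoring two hypothetical Data Quality dimensions
# (completeness and validity) against agreed thresholds.
# The records, rules, and thresholds below are illustrative assumptions.
import re

records = [
    {"customer_id": "C001", "email": "a@example.com", "postal_code": "80202"},
    {"customer_id": "C002", "email": None,            "postal_code": "80301"},
    {"customer_id": "C003", "email": "c@example",     "postal_code": "8030"},
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(1 for r in rows if r.get(field)) / len(rows)

def validity(rows, field, pattern):
    """Share of populated values that match an agreed format rule."""
    values = [r[field] for r in rows if r.get(field)]
    return sum(1 for v in values if re.fullmatch(pattern, v)) / len(values)

# Requirements formulated per dimension (assumed thresholds).
requirements = {
    ("email", "completeness"): 0.95,
    ("postal_code", "validity"): 0.98,
}

scores = {
    ("email", "completeness"): completeness(records, "email"),
    ("postal_code", "validity"): validity(records, "postal_code", r"\d{5}"),
}

for key, required in requirements.items():
    actual = scores[key]
    status = "meets requirement" if actual >= required else "below requirement"
    print(f"{key[0]} {key[1]}: {actual:.2f} (required {required:.2f}) -> {status}")
```

In practice each relevant dimension would have its own measurable rule and threshold agreed with data consumers; the point of the sketch is only that ‘high quality’ becomes a testable statement once requirements are formulated per dimension.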
Effective Data Management involves a set of interrelated processes enabling an organization to use its data to achieve strategic goals. An underlying assertion is that the data itself is of high quality. Data Quality Management is the planning, implementation, and control of activities that apply quality management techniques to data in order to assure it is fit for consumption and meets the needs of data consumers.
High quality data is context driven. This means that the same data may be simultaneously viewed as high quality by some areas of an organization and as low quality by other areas. Many organizations fail to engage with this question of context; that is, they do not define what it means for their data to be fit for purpose.
If we understand organizations as data manufacturing machines, we can assert (from our experience in manufacturing) that organizations that formally manage the quality of data will be more effective, more efficient and deliver a better experience than those that leave Data Quality to chance. However, no organization has perfect business processes, technical processes, or data management practices. In reality, all organizations experience problems related to their Data Quality. Many factors undermine quality data: lack of understanding about the effects on organizational success, leadership that does not value Data Quality, poor planning, ‘siloed’ system design, inconsistent development processes, incomplete documentation, a lack of standards, or a lack of Data Governance.
As is the case with Data Governance and with Data Management as a whole, Data Quality Management is a function, not a program or project. This is because projects and even programs have starts, middles, and ends. A Data Quality function is, or should be, a continuing, business-as-usual set of activities. It will include both projects and programs (to address specific Data Quality improvements) as well as operational work, along with a commitment to communications and training. Most importantly, the long-term success of a Data Quality improvement program depends on getting an organization to change its culture and adopt a quality mindset. As stated in The Leader’s Data Manifesto, “fundamental, lasting change requires committed leadership and involvement from people at all levels in an organization.” People who use data to do their jobs – which in most organizations is a very large percentage of employees – need to drive change. One of the most critical changes to focus on is how their organizations manage and improve the quality of their data.
DAMA Rocky Mountain Chapter is happy to announce its partnership with TechYeet, a Slack community dedicated to connecting people in the wider data and technology communities. With over 5,000 TechYeet members, DAMA-RMC is using the TechYeet community platform to bring people together in the Data Management space.
If you are in TechYeet and would like to join the DAMA-RMC channel, please reach out to Greg Sheridan, PMI-ACP, VP of Partnerships & Sponsorships, at PartnershipsVP@damarmc.org.
Although a lineage graphic, such as the one in last week's figure, describes what is happening to a particular data element, not all business users will understand it. Higher levels of lineage (e.g., ‘System Lineage’) summarize movement at the system or application level. Many visualization tools provide zoom-in / zoom-out capability to show data element lineage in the context of system lineage. For example, this figure shows a sample system lineage in which general data movement can be understood and visualized at a glance at the system or application level.
As the number of data elements in a system grows, lineage discovery becomes complex and difficult to manage. To successfully achieve the business goals, a strategy for discovering and importing assets into the Metadata repository requires planning and design. Successful lineage discovery needs to account for both business and technical focus.
Many data integration tools offer lineage analysis that considers not only the developed population code but also the data model and the physical database. Some offer business-user-facing web interfaces to monitor and update definitions. These begin to look like business glossaries.
Documented lineage helps both business and technical people use data. Without it, much time is wasted investigating anomalies, potential change impacts, or unknown results. Look to implement an integrated impact and lineage tool that can understand all the moving parts involved in the load process as well as end-user reporting and analytics. Impact reports outline which components are affected by a potential change, expediting and streamlining estimating and maintenance tasks.
Gift a colleague, or yourself, a 25%-off DAMA-RMC professional membership this holiday season. Join as a professional member or upgrade from a guest membership. Professional membership includes:
Promo Code: 12HOLIDAY25
Join HERE.
Thanks to everyone who participated in DAMA-RMC's study sessions, bootcamp, and "Pay-If-You-Pass" exam prep over the last few months.
We are thrilled at the progress everyone made and excited to announce 6 new CDMPs:
Funke Bishi
John Lieto
Katrina Miyamoto
Kris New
Benjamin Seidle
Rachel Udow
Several others will be completing their tests in the next few weeks.
We wish everyone luck!
Learn more: CDMP Certification with DAMA-RMC Support
Questions?
A key benefit of discovering and documenting Metadata about physical assets is to provide information on how data is transformed as it moves between systems. Many Metadata tools carry information about what is happening to the data within their environments and provide capabilities to view the lineage across the span of the systems or applications they interface with. The current version of the lineage based on programming code is referred to as ‘As Implemented Lineage’. In contrast, lineage described in mapping specification documents is referred to as ‘As Designed Lineage’.
The limitations of a lineage build are based on the coverage of the Metadata management system. Function-specific Metadata repositories or data visualization tools have information about the data lineage within the scope of the environments they interact with, but they will not provide visibility into what is happening to the data outside their environments.
Metadata management systems import the ‘As Implemented’ lineage from the various tools that can provide this lineage detail and then augment it with the ‘As Designed’ lineage from the places where the actual implementation details are not extractable. The process of connecting the pieces of the data lineage is referred to as stitching. It results in a holistic visualization of the data as it moves from its original locations (official source or system of record) until it lands in its final destination.
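As a rough illustration of stitching, the sketch below combines ‘As Implemented’ edges (as might be scanned from tools) with ‘As Designed’ edges (as documented only in mapping specifications) into one end-to-end lineage. The system and column names are hypothetical.

```python
# Minimal sketch of lineage 'stitching': combine edges extracted from tools
# ('As Implemented') with edges taken from mapping specifications
# ('As Designed') where no implementation detail can be extracted.
# System and column names below are hypothetical.

as_implemented = {          # edges scanned from ETL code / tool repositories
    ("crm.customer", "staging.customer"),
    ("staging.customer", "warehouse.dim_customer"),
}

as_designed = {             # edges documented only in mapping specifications
    ("warehouse.dim_customer", "finance_mart.customer_view"),
}

# Stitch the two sets into one end-to-end lineage, tagging each edge
# with its provenance so analysts know how it was sourced.
stitched = (
    [{"source": s, "target": t, "provenance": "as_implemented"} for s, t in as_implemented]
    + [{"source": s, "target": t, "provenance": "as_designed"} for s, t in as_designed]
)

for edge in sorted(stitched, key=lambda e: e["source"]):
    print(f'{edge["source"]} -> {edge["target"]}  [{edge["provenance"]}]')
```

Tagging each edge with its provenance is a deliberate choice here: it preserves the distinction between lineage that was extracted from code and lineage that was only designed on paper.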
This figure shows a sample data element lineage. In reading it, the ‘Total Backorder’ business data element, which is physically implemented as column zz_total, depends on three other data elements: ‘Unit Cost in Cents’, physically implemented as ‘yy_unit_cost’; ‘Tax in Ship to State’, implemented as ‘yy_tax’; and ‘Back Order Quantity’, implemented as ‘yy_qty’.
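Represented as a simple dependency structure, that element-level lineage also supports basic impact analysis: where does zz_total come from, and what is affected if yy_unit_cost changes? The sketch below assumes lineage is held as a plain mapping from each target column to its source columns; the helper functions are illustrative, not a tool's actual API.

```python
# Minimal sketch of the element-level lineage in the figure: zz_total is
# derived from yy_unit_cost, yy_tax, and yy_qty.

lineage = {  # target column -> source columns it is derived from
    "zz_total": ["yy_unit_cost", "yy_tax", "yy_qty"],
}

def upstream(element):
    """All elements the given element is derived from, directly or indirectly."""
    found = set()
    stack = list(lineage.get(element, []))
    while stack:
        source = stack.pop()
        if source not in found:
            found.add(source)
            stack.extend(lineage.get(source, []))
    return found

def impact(element):
    """All downstream elements affected if the given element changes (a simple impact report)."""
    affected = set()
    frontier = {element}
    while frontier:
        current = frontier.pop()
        hits = {t for t, sources in lineage.items() if current in sources and t not in affected}
        affected |= hits
        frontier |= hits
    return affected

print(upstream("zz_total"))    # {'yy_unit_cost', 'yy_tax', 'yy_qty'}
print(impact("yy_unit_cost"))  # {'zz_total'}
```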
A Metadata Management system must be capable of extracting Metadata from many sources. Design the architecture to be capable of scanning the various Metadata sources and periodically updating the repository. The system must also support manual updates of Metadata, as well as requests, searches, and lookups of Metadata by various user groups.
A managed Metadata environment should isolate the end user from the various and disparate Metadata sources. The architecture should provide a single access point for the Metadata repository. The access point must supply all related Metadata resources transparently, so that users can access Metadata without being aware of the differing environments of the data sources. In analytics and Big Data solutions, the interface may rely largely on user-defined functions (UDFs) to draw on various data sets, and Metadata exposure to the end user is inherent to those customizations. In solutions with less reliance on UDFs, end users gather, inspect, and use data sets more directly, and the supporting Metadata is usually more exposed.
Design of the architecture depends on the specific requirements of the organization. Three technical architectural approaches to building a common Metadata repository mimic the approaches to designing data warehouses: centralized, distributed, and hybrid (see Section 1.3.6). These approaches all take into account implementation of the repository, and how the update mechanisms operate.
Create a data model for the Metadata repository, or metamodel, as one of the first design steps after the Metadata strategy is complete and the business requirements are understood. Different levels of metamodel may be developed as needed: a high-level conceptual model that explains the relationships between systems, and a lower-level metamodel that details the attributes needed to describe the elements and processes of a model. In addition to being a planning tool and a means of articulating requirements, the metamodel is itself a valuable source of Metadata. This figure depicts a sample Metadata repository metamodel; the boxes represent the high-level major entities, which contain the data.
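As a small illustration of what a lower-level metamodel might capture, the sketch below expresses a few repository entities and their relationships as data classes. The entity types and attributes are illustrative assumptions; a real metamodel would be derived from the organization's own requirements.

```python
# Minimal sketch of a repository metamodel expressed as Python dataclasses.
# Entity types and attributes are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Column:
    name: str
    data_type: str
    business_definition: str = ""   # e.g., a link to a business glossary term

@dataclass
class Table:
    name: str
    columns: List[Column] = field(default_factory=list)

@dataclass
class System:
    name: str
    owner: str
    tables: List[Table] = field(default_factory=list)

# A high-level metamodel relates systems to each other; a lower-level
# metamodel details the attribution of the tables and columns within them.
crm = System(
    name="CRM",
    owner="Sales Operations",
    tables=[Table(name="customer", columns=[Column("customer_id", "VARCHAR(10)")])],
)
```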
Another advanced architectural approach is bi-directional Metadata Architecture, which allows Metadata to change in any part of the architecture (source, data integration, user interface) and then coordinates feedback from the repository (broker) back to the original source.
Several challenges are apparent in this approach. The design forces the Metadata repository to contain the latest version of the Metadata source and to manage changes to that source as well. Changes must be trapped systematically and then resolved. Additional sets of process interfaces to tie the repository back to the Metadata source(s) must be built and maintained.
This figure illustrates how common Metadata from different sources is collected in a centralized Metadata store. Users submit their queries to the Metadata portal, which passes the request to the centralized repository. The centralized repository tries to fulfill the user request from the common Metadata collected initially from the various sources. When a request becomes more specific, or the user needs more detailed Metadata, the centralized repository delegates down to the specific source to retrieve the details. Global search across the various tools is available because of the common Metadata collected in the centralized repository.
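A minimal sketch of that delegation pattern follows, with hypothetical class and method names: the repository answers from the centrally collected Metadata and falls back to the owning source tool only when more detail is requested.

```python
# Minimal sketch of the centralized pattern: serve summaries from the
# central store and delegate to the owning source tool for more detail.
# All class and method names are hypothetical.

class CentralRepository:
    def __init__(self, common_metadata, sources):
        self.common_metadata = common_metadata   # Metadata harvested from all sources
        self.sources = sources                   # source name -> tool-specific client

    def query(self, asset, detail=False):
        summary = self.common_metadata.get(asset)
        if summary is None:
            return None
        if not detail:
            return summary                        # served from the central store
        # The request is more specific: delegate down to the owning source.
        return self.sources[summary["source"]].describe(asset)

class EtlToolClient:
    def describe(self, asset):
        return {"asset": asset, "detail": "full transformation logic from the ETL tool"}

repo = CentralRepository(
    common_metadata={"warehouse.dim_customer": {"source": "etl", "type": "table"}},
    sources={"etl": EtlToolClient()},
)
print(repo.query("warehouse.dim_customer"))               # summary from the central store
print(repo.query("warehouse.dim_customer", detail=True))  # delegated to the source tool
```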
A completely distributed architecture maintains a single access point. The Metadata retrieval engine responds to user requests by retrieving data from source systems in real time; there is no persistent repository. In this architecture, the Metadata management environment maintains the necessary source system catalogs and lookup information needed to process user queries and searches effectively. A common object request broker or similar middleware protocol accesses these source systems.
Distributed Metadata Architecture has both advantages and limitations. Because Metadata is retrieved from the source systems in real time, it is always as current as those sources; on the other hand, with no persistent repository there is nowhere to hold manually entered Metadata, and query capability depends on the availability of the participating source systems.
This figure illustrates a distributed Metadata Architecture. There is no centralized Metadata store, and the portal passes users’ requests to the appropriate tool to execute. Because Metadata is not collected centrally from the various tools, every request has to be delegated down to the sources; hence, no capability exists for a global search across the various Metadata sources.
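For contrast with the centralized sketch above, here is a minimal sketch of the distributed pattern, again with hypothetical names: the portal keeps only enough lookup information to route each request to the owning source and retrieves the Metadata in real time.

```python
# Minimal sketch of the distributed pattern: no central Metadata store, so
# every request is routed to the source tool that owns the asset and
# answered in real time. Class and method names are hypothetical.

class GlossaryToolClient:
    def describe(self, asset):
        return {"asset": asset, "definition": "retrieved from the glossary tool in real time"}

class DistributedPortal:
    def __init__(self, catalog, clients):
        self.catalog = catalog   # asset -> owning source (lookup info only, no Metadata)
        self.clients = clients   # source name -> tool-specific client

    def query(self, asset):
        source = self.catalog.get(asset)
        if source is None:
            return None          # no persistent store to fall back on, so no global search
        return self.clients[source].describe(asset)

portal = DistributedPortal(
    catalog={"Total Backorder": "glossary"},
    clients={"glossary": GlossaryToolClient()},
)
print(portal.query("Total Backorder"))
```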