DAMA - Rocky Mountain Chapter

June 2024 Newsletter

06/21/2024 7:00 AM | Anonymous member (Administrator)

June 2024 Newsletter.pdf

DMBOK Figure 62 Sources of Data Security Requirements

06/18/2024 1:51 PM | Anonymous member (Administrator)

Data Security includes the planning, development, and execution of security policies and procedures to provide proper authentication, authorization, access, and auditing of data and information assets. The specifics of data security (which data needs to be protected, for example) differ between industries and countries. Nevertheless, the goal of data security practices is the same: to protect information assets in alignment with privacy and confidentiality regulations, contractual agreements, and business requirements. These requirements come from:

Stakeholders: Organizations must recognize the privacy and confidentiality needs of their stakeholders, including clients, patients, students, citizens, suppliers, or business partners. Everyone in an organization must be a responsible trustee of data about stakeholders.

Government regulations: Government regulations are in place to protect the interests of some stakeholders. Regulations have different goals. Some restrict access to information, while others ensure openness, transparency, and accountability.

Proprietary business concerns: Each organization has proprietary data to protect. An organization’s data provides insight into its customers and, when leveraged effectively, can provide a competitive advantage. If confidential data is stolen or breached, an organization can lose competitive advantage.

Legitimate access needs: When securing data, organizations must also enable legitimate access. Business processes require that individuals in certain roles be able to access, use, and maintain data.

Contractual obligations: Contractual and non-disclosure agreements also influence data security requirements. For example, the PCI Standard, an agreement among credit card companies and individual business enterprises, demands that certain types of data be protected in defined ways (e.g., mandatory encryption for customer passwords).

Effective data security policies and procedures ensure that the right people can use and update data in the right way, and that all inappropriate access and updates are restricted (Ray, 2012) (see this figure). Understanding and complying with the privacy and confidentiality interests and needs of all stakeholders is in the best interest of every organization. Client, supplier, and constituent relationships all trust in, and depend on, the responsible use of data.

DMBoK Figure 61 SLAs for System and Database Performance

06/12/2024 7:00 AM | Anonymous member (Administrator)

Set Database Performance Levels

System performance, data availability and recovery expectations, and expectations for teams to respond to issues are usually governed through Service Level Agreements (SLAs) between IT data management services organizations and data owners (see this figure).

Typically, an SLA will identify the timeframes during which the database is expected to be available for use. Often an SLA will identify a specified maximum allowable execution time for a few application transactions (a mix of complex queries and updates). If the database is not available as agreed to, or if process execution times violate the SLA, the data owners will ask the DBA to identify and remediate the causes of the problem.
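The SLA checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sla` and `Observation` classes, the availability window, and the 2-second execution limit are all hypothetical and do not come from the DMBoK.

```python
from dataclasses import dataclass
from datetime import time
from typing import List, Optional


@dataclass
class Sla:
    """Hypothetical SLA terms: an availability window and an execution limit."""
    window_start: time
    window_end: time
    max_execution_seconds: float


@dataclass
class Observation:
    """One monitoring sample (assumed shape, for illustration only)."""
    at: time
    database_up: bool
    execution_seconds: Optional[float] = None


def find_violations(sla: Sla, observations: List[Observation]) -> List[str]:
    """Return a description of every sample that breaks the SLA."""
    violations = []
    for obs in observations:
        # Samples outside the agreed availability window are not covered.
        if not (sla.window_start <= obs.at <= sla.window_end):
            continue
        if not obs.database_up:
            violations.append(f"{obs.at}: database unavailable")
        elif (obs.execution_seconds is not None
              and obs.execution_seconds > sla.max_execution_seconds):
            violations.append(
                f"{obs.at}: execution took {obs.execution_seconds}s"
            )
    return violations


sla = Sla(time(6, 0), time(22, 0), max_execution_seconds=2.0)
observations = [
    Observation(time(5, 30), database_up=False),    # outside the window
    Observation(time(9, 0), True, 1.2),             # within limits
    Observation(time(10, 15), True, 3.5),           # too slow
    Observation(time(14, 0), database_up=False),    # outage in the window
]
violations = find_violations(sla, observations)
for line in violations:
    print(line)
```

A list like `violations` is the kind of evidence a data owner might bring to the DBA when asking for the causes to be identified and remediated.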

DMBoK Figure 60 Log Shipping vs. Mirroring

06/05/2024 2:33 PM | Anonymous member (Administrator)

Data replication means the same data is stored on multiple storage devices. In some situations, having duplicate databases is useful, such as in a high-availability environment where spreading the workload among identical databases on different hardware or even in different data centers can preserve functionality during peak usage times or disasters.

Replication can be active or passive:

Active replication is performed by recreating and storing the same data at every replica from every other replica.

Passive replication involves recreating and storing data on a single primary replica and then transforming its resultant state to other secondary replicas.

Replication has two dimensions of scaling:

Horizontal data scaling has more data replicas.
Vertical data scaling has data replicas located geographically farther apart.

Multi-master replication, where updates can be submitted to any database node and then ripple through to other servers, is often desired, but increases complexity and cost.

Replication transparency occurs when data is replicated between database servers so that the information remains consistent throughout the database system and users cannot tell or even know which database copy they are using.

The two primary replication patterns are mirroring and log shipping (see this figure).

In mirroring, updates to the primary database are replicated immediately (relatively speaking) to the secondary database, as part of a two-phase commit process.

In log shipping, a secondary server receives and applies copies of the primary database’s transaction logs at regular intervals.

The choice of replication method depends on how critical the data is, and how important it is that failover to the secondary server be immediate. Mirroring is usually a more expensive option than log shipping. For one secondary server, mirroring is effective; log shipping may be used to update additional secondary servers.
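The difference in timing between the two patterns can be shown with a toy model. This is a minimal sketch, not a real replication protocol: the `Primary` and `Secondary` classes and the `ship_log` function are hypothetical, and the two-phase commit used by real mirroring is reduced here to a direct call on every write.

```python
class Secondary:
    """A replica that applies change records it is given."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of log records applied so far

    def apply(self, records):
        for key, value in records:
            self.data[key] = value
        self.applied += len(records)


class Primary:
    """A primary that keeps a transaction log and an optional mirror."""

    def __init__(self):
        self.data = {}
        self.log = []        # transaction log as (key, value) records
        self.mirror = None   # secondary that is updated on every write

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))
        # Mirroring: the change reaches the mirror as part of the write.
        if self.mirror is not None:
            self.mirror.apply([(key, value)])


def ship_log(primary, secondary):
    """Log shipping: send only the records the secondary has not applied."""
    pending = primary.log[secondary.applied:]
    secondary.apply(pending)


primary = Primary()
mirror = Secondary()
standby = Secondary()
primary.mirror = mirror

primary.write("a", 1)
primary.write("b", 2)

standby_before = dict(standby.data)   # the standby lags until the next shipment
ship_log(primary, standby)            # run at a regular interval in practice

print(mirror.data, standby_before, standby.data)
```

The mirror matches the primary after every write, while the log-shipped standby is only as current as the last shipment. That lag is the trade-off to weigh against the higher cost of mirroring.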

DMBoK Figure 59 Database Organization Spectrum

05/29/2024 7:00 AM | Anonymous member (Administrator)

Data storage systems provide a way to encapsulate the instructions necessary to put data on disks and manage processing, so developers can simply use instructions to manipulate data. Databases are organized in three general ways: Hierarchical, Relational, and Non-Relational. These classes are not mutually exclusive (see this figure). Some database systems can read and write data organized in relational and non-relational structures. Hierarchical databases can be mapped to relational tables. Flat files with line delimiters can be read as tables with rows, and one or more columns can be defined to describe the row contents.
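As a small illustration of the last point, the sketch below reads a delimited flat file as a table using Python's standard `csv` module. The file contents, the pipe delimiter, and the column names are made up for the example.

```python
import csv
import io

# A hypothetical flat file: one record per line, fields separated by "|".
flat_file = "1|Ada|Denver\n2|Grace|Boulder\n"

# The column definitions describe the contents of each row.
columns = ["id", "name", "city"]

reader = csv.DictReader(io.StringIO(flat_file), fieldnames=columns, delimiter="|")
rows = list(reader)

for row in rows:
    print(row["id"], row["name"], row["city"])
```

Each line becomes a row, and the column definitions supply the structure that the file itself does not carry.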

May 2024 Newsletter

05/24/2024 2:17 PM | Anonymous member (Administrator)

May 2024 Newsletter.pdf

DMBoK Figure 58 CAP Theorem

05/22/2024 12:37 PM | Anonymous member (Administrator)

The CAP Theorem (or Brewer’s Theorem) was developed in response to a shift toward more distributed systems (Brewer, 2000). The theorem asserts that a distributed system cannot comply with all parts of ACID at all times. The larger the system, the lower the compliance. A distributed system must instead trade off between properties.

Consistency: The system must operate as designed and expected at all times.
Availability: The system must be available when requested and must respond to each request.
Partition Tolerance: The system must be able to continue operations during occasions of data loss or partial system failure.

The CAP Theorem states that at most two of the three properties can exist in any shared-data system. This is usually stated with a ‘pick two’ statement, illustrated in this figure.

An interesting use of this theorem drives the Lambda Architecture design discussed in Chapter 14. Lambda Architecture uses two paths for data: a Speed path where availability and partition tolerance are most important, and a Batch path where consistency and availability are most important.

DMBoK Figure 57 Coupling

05/15/2024 7:00 AM | Anonymous member (Administrator)

Loosely coupled systems require component databases to construct their own federated schema. A user will typically access other component database systems by using a multi-database language, but this removes any levels of location transparency, forcing the user to have direct knowledge of the federated schema. A user imports the data required from other component databases, and integrates it with their own to form a federated schema.

Tightly coupled systems consist of component systems that use independent processes to construct and publish an integrated federated schema, as illustrated in this figure. The same schema can apply to all parts of the federation, with no data replication.

DMBoK Figure 56 Federated Databases

05/08/2024 7:00 AM | Anonymous member (Administrator)

Federation provisions data without additional persistence or duplication of source data. A federated database system maps multiple autonomous database systems into a single federated database. The constituent databases, sometimes geographically separated, are interconnected via a computer network. They remain autonomous yet participate in a federation to allow partial and controlled sharing of their data. Federation provides an alternative to merging disparate databases. There is no actual data integration in the constituent databases because of data federation; instead, data interoperability manages the view of the federated databases as one large object (see Chapter 8). In contrast, a non-federated database system is an integration of component DBMSs that are not autonomous; they are controlled, managed, and governed by a centralized DBMS.
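The idea of one view over several sources, with no copy of the data, can be sketched with Python's built-in `sqlite3` module. This is a toy illustration only: the two attached in-memory databases (`east` and `west`), the `customers` tables, and the view name are hypothetical stand-ins for autonomous systems, which in practice would be separate servers reached over a network.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two separate in-memory databases stand in for autonomous source systems.
conn.execute("ATTACH DATABASE ':memory:' AS east")
conn.execute("ATTACH DATABASE ':memory:' AS west")

conn.execute("CREATE TABLE east.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE west.customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO east.customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO west.customers VALUES (1, 'Grace')")

# The federated view stores no rows of its own; it reads the sources
# each time it is queried.
conn.execute("""
    CREATE TEMP VIEW federated_customers AS
    SELECT 'east' AS source, id, name FROM east.customers
    UNION ALL
    SELECT 'west' AS source, id, name FROM west.customers
""")

query = "SELECT source, name FROM federated_customers ORDER BY source, name"
rows = conn.execute(query).fetchall()
print(rows)

# A change made in one source is visible through the view immediately,
# because nothing was duplicated.
conn.execute("INSERT INTO west.customers VALUES (2, 'Edgar')")
rows_after = conn.execute(query).fetchall()
print(rows_after)
```

Each source keeps its own table and can be changed independently, while consumers query a single object.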

Federated databases are best for heterogeneous and distributed integration projects such as enterprise information integration, data virtualization, schema matching, and Master Data Management.

Federated architectures differ based on levels of integration with the component database systems and the extent of services offered by the federation. A federated database management system (FDBMS) can be categorized as either loosely or tightly coupled.

DMBoK Figure 55 Centralized vs. Distributed

05/01/2024 7:00 AM | Anonymous member (Administrator)

A database can be classified as either centralized or distributed. A centralized system manages a single database, while a distributed system manages multiple databases on multiple systems. A distributed system’s components can be classified depending on the autonomy of the component systems into two types: federated (autonomous) or non-federated (non-autonomous). This figure illustrates the difference between centralized and distributed.

Centralized databases have all the data in one system in one place. All users come to the one system to access the data. For certain restricted data, centralization can be ideal, but for data that needs to be widely available, centralized databases have risks. For example, if the centralized system is unavailable, there are no other alternatives for accessing the data.

Distributed databases make quick access to data possible across a large number of nodes. Popular distributed database technologies are based on using commodity hardware servers. They are designed to scale out from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high availability, the database management software itself is designed to replicate data amongst the servers, thereby delivering a highly available service on top of a cluster of computers. Database management software is also designed to detect and handle failures. While any given computer may fail, the system overall is unlikely to.

Some distributed databases implement a computational paradigm named MapReduce to further improve performance. In MapReduce, the data request is divided into many small fragments of work, each of which may be executed or re-executed on any node in the cluster. In addition, data is co-located on the compute nodes, providing very high aggregate bandwidth across the cluster. Both the filesystem and the application are designed to automatically handle node failures.
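The MapReduce pattern can be sketched on a single machine with a word count, the usual teaching example. This is a simplified illustration, not a distributed implementation: the function names and sample fragments are hypothetical, and a real cluster would run the map and reduce steps in parallel on the nodes that hold the data.

```python
from collections import defaultdict


def map_fragment(fragment):
    """Map step: emit a (word, 1) pair for every word in one fragment."""
    return [(word.lower(), 1) for word in fragment.split()]


def shuffle(pairs):
    """Group the emitted values by key so each key is reduced once."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: combine all values for one key into a total."""
    return key, sum(values)


# Each fragment is an independent piece of work that any node could run.
fragments = ["data security data", "data replication", "security policy"]

mapped = [pair for fragment in fragments for pair in map_fragment(fragment)]
counts = dict(reduce_counts(key, values) for key, values in shuffle(mapped).items())
print(counts)
```

Because `map_fragment` depends only on its own fragment, a failed fragment can be re-executed on another node without affecting the rest of the job.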
