This Document Contains Chapters 12 to 13

Chapter 12 Distributed Database Management Systems

Discussion Focus

Discuss the possible data request scenarios in a distributed database environment.

1. Single request accessing a single remote database. (See Figure D12.1.)

Figure D12.1 Single Request to Single Remote DBMS

The most primitive and least effective of the distributed database scenarios is one in which a single SQL statement (a "request" or "unit of work") is directed to a single remote DBMS. (Such a request is known as a remote request.) We suggest that you remind the student of the distinction between a request and a transaction:
• A request uses a single SQL statement to request data.
• A transaction is a collection of two or more SQL statements.

2. Multiple requests accessing a single remote database. (See Figure D12.2.)

Figure D12.2 Multiple Requests to a Single Remote DBMS

A unit of work now consists of multiple SQL statements directed to a single remote DBMS. The local user defines the start/stop sequence of the units of work, using COMMIT, but the remote DBMS manages the unit of work's processing.

3. Multiple requests accessing multiple remote databases. (See Figure D12.3.)

Figure D12.3 Multiple Requests, Multiple Remote DBMSs

A unit of work may now be composed of multiple SQL statements directed to multiple remote DBMSs. However, any one SQL statement may access only one of the remote DBMSs. As was true in the second scenario, the local user defines the start/stop sequence of the units of work, using COMMIT, but the remote DBMS to which each SQL statement was directed manages the unit of work's processing. In this scenario, a two-phase COMMIT must be used to coordinate COMMIT processing for the multiple locations.

4. Multiple requests accessing any combination of multiple remote DBMSs. (See Figure D12.4.)

Figure D12.4 Multiple Requests and Any Combination of Remote Databases

A unit of work may now consist of multiple SQL statements addressed to multiple remote DBMSs, and each SQL statement may address any combination of databases. As was true in the third scenario, each local user defines the start/stop sequence of the units of work, using COMMIT, but the remote DBMS to which each SQL statement was directed manages the unit of work's processing. A two-phase COMMIT must be used to coordinate COMMIT processing for the multiple locations.

Remaining discussion focus:

The review questions cover a wide range of distributed database concepts and design issues. The most important questions to be raised are:
• What is the difference between a distributed database and distributed processing?
• What is a fully distributed database management system?
• Why is there a need for a two-phase commit protocol, and what are its two phases?
• What does "data fragmentation" mean, and what strategies are available to deal with data fragmentation?
• Why and how must data replication be addressed in a distributed database environment? What replication strategies are available, and how do they work?
• Since the current literature abounds with references to file servers and client/server architectures, what do these terms mean? How are file servers different from client/server architectures? Why would you want to know?

We have answered these questions in detail in the Answers to Review Questions section of this chapter. Note particularly the answers to questions 5, 6, 11, and 15-17.
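If a concrete board example helps the request/transaction distinction stick, the following sketch contrasts the two. It is generic SQL only; the CUSTOMER and INVOICE table and column names echo the chapter's examples, but the specific values and the CUS_BALANCE column are illustrative assumptions, and all statements are assumed to be routed to the same single remote DBMS.

-- A remote request: one SQL statement directed to one remote DBMS.
SELECT CUS_NUM, CUS_NAME
FROM   CUSTOMER
WHERE  CUS_STATE = 'TN';

-- A remote transaction: a unit of work made up of two or more SQL statements,
-- all directed to the same remote DBMS and ended by COMMIT.
INSERT INTO INVOICE (INV_NUM, CUS_NUM, INV_DATE, INV_TOTAL)
VALUES (213400, 10884, DATE '2017-03-12', 45.95);

UPDATE CUSTOMER
SET    CUS_BALANCE = CUS_BALANCE + 45.95   -- hypothetical balance column
WHERE  CUS_NUM = 10884;

COMMIT;   -- the local user ends the unit of work; the remote DBMS manages its processing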
NOTE
Many questions raised in this section are more specific -- and certainly more technical -- than the questions raised in the previous chapters. Since the chapter covers the answers to these questions in great detail, we have elected to give you section references to avoid needless duplication.

Answers to Review Questions

1. Describe the evolution from centralized DBMSs to distributed DBMSs.
This question is answered in detail in section 12-1.

2. List and discuss some of the factors that influenced the evolution of the DDBMS.
These factors are listed and discussed in section 12-1.

3. What are the advantages of the DDBMS?
See section 12-2 and Table 12.1.

4. What are the disadvantages of the DDBMS?
See section 12-2 and Table 12.1.

5. Explain the difference between distributed database and distributed processing.
See section 12-3.

6. What is a fully distributed database management system?
See section 12-4.

7. What are the components of a DDBMS?
See section 12-5.

8. List and explain the transparency features of a DDBMS.
See section 12-7.

9. Define and explain the different types of distribution transparency.
See section 12-8.

10. Describe the different types of database requests and transactions.
A database transaction is formed by one or more database requests. Each database request is the equivalent of a single SQL statement. The basic difference between a local transaction and a distributed transaction is that the latter can update or request data from several remote sites on a network. In a DDBMS, a database request and a database transaction can be of two types: remote or distributed.

NOTE
The figure references in the discussions refer to the figures found in the text. The figures are not reproduced in this manual.

A remote request accesses data located at a single remote database processor (or DP site). In other words, an SQL statement (or request) can reference data at only one remote DP site. Use Figure 12.9 to illustrate the remote request.

A remote transaction, composed of several requests, accesses data at only a single remote DP site. Use Figure 12.10 to illustrate the remote transaction. As you discuss Figure 12.10, note that both tables are located at a remote DP (site B) and that the complete transaction can reference only one remote DP. Each SQL statement (or request) can reference only one (the same) remote DP at a time; the entire transaction can reference only one remote DP; and it is executed at only one remote DP.

A distributed transaction allows a transaction to reference several different local or remote DP sites. Although each single request can reference only one local or remote DP site, the complete transaction can reference multiple DP sites because each request can reference a different site. Use Figure 12.11 to illustrate the distributed transaction.

A distributed request lets us reference data from several different DP sites. Since each request can access data from more than one DP site, a transaction can access several DP sites. The ability to execute a distributed request requires fully distributed database processing because we must be able to:
1. Partition a database table into several fragments.
2. Reference one or more of those fragments with only one request.
In other words, we must have fragmentation transparency. The location and partition of the data should be transparent to the end user. Use Figure 12.12 to illustrate the distributed request.
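Before turning to the figure, it may help to show students what such a distributed request looks like on paper. The sketch below is generic SQL with illustrative column names (an assumption, not the text's exact example); the key point is that one SELECT statement references CUSTOMER, stored at site B, and INVOICE, stored at site C, while distribution and fragmentation transparency let the statement read like ordinary local SQL.

-- One SQL statement that touches two different DP sites:
-- CUSTOMER (site B) and INVOICE (site C).
SELECT   C.CUS_NUM, C.CUS_NAME, SUM(I.INV_TOTAL) AS TOTAL_PURCHASES
FROM     CUSTOMER C
JOIN     INVOICE  I ON I.CUS_NUM = C.CUS_NUM
GROUP BY C.CUS_NUM, C.CUS_NAME;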
As you discuss Figure 12.12, note that the transaction uses a single SELECT statement to reference two tables, CUSTOMER and INVOICE. The two tables are located at two different remote DP sites, B and C.

The distributed request feature also allows a single request to reference a physically partitioned table. For example, suppose that a CUSTOMER table is divided into two fragments, C1 and C2, located at sites B and C, respectively. The end user wants to obtain a list of all customers whose balance exceeds $250.00. Use Figure 12.13 to illustrate this distributed request. Note that full fragmentation support is provided only by a DDBMS that supports distributed requests.

11. Explain the need for the two-phase commit protocol. Then describe the two phases.
See section 12-9c.

12. What is the objective of the query optimization functions?
The objective of query optimization functions is to minimize the total costs associated with the execution of a database request. The costs associated with a request are a function of:
• the access time (I/O) cost involved in accessing the physical data stored on disk
• the communication cost associated with the transmission of data among nodes in distributed database systems
• the CPU time cost.
It is difficult to separate communication and processing costs. Query optimization algorithms use different parameters, and the algorithms assign different weights to each parameter. For example, some algorithms minimize total time, others minimize the communication time, and still others do not factor in the CPU time, considering it insignificant relative to the other costs. Query optimization must provide distribution and replica transparency in distributed database systems.

13. To which transparency feature are the query optimization functions related?
Query optimization functions are associated with the performance transparency features of a DDBMS. In a DDBMS, the query optimization routines are more complicated because the DDBMS must decide where and which fragment of the database to access. Data fragments are stored at several sites, and the data fragments are replicated at several sites.

14. What issues should be considered when resolving data requests in a distributed environment?
A data request could be either a read or a write request. However, most requests tend to be read requests. In both cases, resolving data requests in a distributed data environment must consider the following issues:
• Data distribution.
• Data replication.
• Network and node availability.
A more detailed discussion of these factors can be found in section 12-10.

15. Describe the three data fragmentation strategies. Give some examples of each.
See section 12-11a.

16. What is data replication, and what are the three replication strategies?
See section 12-11b.

17. What are the two basic styles of data replication?
There are basically two styles of replication:
• Push replication. In this case, the originating DP node sends the changes to the replica nodes to ensure that all data are mutually consistent.
• Pull replication. The originating DP node notifies the replica nodes so they can pull the updates on their own time.
See section 12-11b for more information.

18. What trade-offs are involved in building highly distributed data environments?
In the year 2000, Dr. Eric Brewer stated in a presentation that “in any highly distributed data system there are three common desirable properties: consistency, availability and partition tolerance.
However, it is impossible for a system to provide all three properties at the same time.” Therefore, the system designers have to balance the trade-offs of these properties in order to provide a workable system. This is what is known as the CAP theorem. For more information on this, see section 12-12.

19. How does a BASE system differ from a traditional distributed database system?
A traditional database system enforces the ACID properties to ensure that all database transactions yield a database in a consistent state. In a centralized database system, all data reside in a centralized node. However, in a distributed database system, data are located in multiple geographically dispersed sites connected via a network. In such cases, network latency and network partitioning impose a new level of complexity. In most highly distributed systems, designers tend to emphasize availability over data consistency and partition tolerance. This trade-off has given rise to a new type of database system in which data are basically available, soft state, and eventually consistent (BASE). For more information about BASE systems, see section 12-12.

Problem Solutions

The first problem is based on the DDBMS scenario in Figure P12.1.

Figure P12.1 The DDBMS Scenario for Problem 1

1. Specify the minimum types of operations the database must support to perform the following operations. These operations should include remote requests, remote transactions, distributed transactions, and distributed requests.

NOTE
To answer the following questions, remind the students that the key to each answer is in the number of different data processors that are accessed by each request/transaction. Ask the students to first identify how many different DP sites are to be accessed by the transaction/request. Next, remind the students that a distributed request is necessary if a single SQL statement is to access more than one DP site. Use the following summary:

Operation       Number of DPs = 1     Number of DPs > 1
Request         Remote                Distributed
Transaction     Remote                Distributed

Based on this summary, the questions are answered easily.

At C:
a. SELECT * FROM CUSTOMER;
This SQL sequence represents a remote request.
b. SELECT * FROM INVOICE WHERE INV_TOTAL < 1000;
This SQL sequence represents a remote request.
c. SELECT * FROM PRODUCT WHERE PROD_QOH 1000;
This SQL sequence represents a remote request, because it accesses only one remote DP site.
h. SELECT * FROM PRODUCT WHERE PROD_QOH 1000 AND CUSTOMER.CUS_NUM = INVOICE.CUS_NUM;
This SQL sequence represents a distributed request.
k. SELECT * FROM PRODUCT WHERE PROD_QOH < 10;
This SQL sequence represents a distributed request. (See explanation for part h.)

2. The following data structure and constraints exist for a magazine publishing company.
a. The company publishes one regional magazine each in Florida (FL), South Carolina (SC), Georgia (GA), and Tennessee (TN).
b. The company has 300,000 customers (subscribers) distributed throughout the four states listed in Part 2a.
c. On the first of each month, an annual subscription INVOICE is printed and sent to each customer whose subscription is due for renewal.
The INVOICE entity contains a REGION attribute to indicate the customer's state of residence (FL, SC, GA, TN):

CUSTOMER (CUS_NUM, CUS_NAME, CUS_ADDRESS, CUS_CITY, CUS_STATE, CUS_ZIP, CUS_SUBSDATE)
INVOICE (INV_NUM, INV_REGION, CUS_NUM, INV_DATE, INV_TOTAL)

The company is aware of the problems associated with centralized management and has decided that it is time to decentralize the management of the subscriptions in its four regional subsidiaries. Each subscription site will handle its own customer and invoice data. The management at company headquarters, however, will have access to customer and invoice data to generate annual reports and to issue ad hoc queries, such as:
• List all current customers by region.
• List all new customers by region.
• Report all invoices by customer and by region.

Given these requirements, how must you partition the database?

The CUSTOMER table must be partitioned horizontally by state. (We show the partitions in the answer to 3c.)

3. Given the scenario and the requirements in Problem 2, answer the following questions:

a. What recommendations will you make regarding the type and characteristics of the required database system?

The Magazine Publishing Company requires a distributed system with distributed database capabilities. The distributed system will be distributed among the company locations in South Carolina, Georgia, Florida, and Tennessee. The DDBMS must be able to support distributed transparency features, such as fragmentation transparency, replica transparency, transaction transparency, and performance transparency. Heterogeneous capability is not a mandatory feature, since we assume there is no existing DBMS in place and that the company wants to standardize on a single DBMS.

b. What type of data fragmentation is needed for each table?

The database must be horizontally partitioned, using the STATE attribute for the CUSTOMER table and the REGION attribute for the INVOICE table.

c. What must be the criteria used to partition each database?

The following fragmentation segments reflect the criteria used to partition each database:

Horizontal Fragmentation of the CUSTOMER Table by State
Fragment Name   Location         Condition            Node Name
C1              Tennessee        CUS_STATE = 'TN'     NAS
C2              Georgia          CUS_STATE = 'GA'     ATL
C3              Florida          CUS_STATE = 'FL'     TAM
C4              South Carolina   CUS_STATE = 'SC'     CHA

Horizontal Fragmentation of the INVOICE Table by Region
Fragment Name   Location         Condition              Node Name
I1              Tennessee        REGION_CODE = 'TN'     NAS
I2              Georgia          REGION_CODE = 'GA'     ATL
I3              Florida          REGION_CODE = 'FL'     TAM
I4              South Carolina   REGION_CODE = 'SC'     CHA

d. Design the database fragments. Show an example with node names, location, fragment names, attribute names, and demonstration data.

Note the following fragments:

Fragment C1   Location: Tennessee   Node: NAS
CUS_NUM   CUS_NAME            CUS_ADDRESS           CUS_CITY     CUS_STATE   CUS_SUB_DATE
10884     James D. Burger     123 Court Avenue      Memphis      TN          8-DEC-16
10993     Lisa B. Barnette    910 Eagle Street      Nashville    TN          12-MAR-17

Fragment C2   Location: Georgia   Node: ATL
CUS_NUM   CUS_NAME            CUS_ADDRESS           CUS_CITY     CUS_STATE   CUS_SUB_DATE
11887     Ginny E. Stratton   335 Main Street       Atlanta      GA          11-AUG-16
13558     Anna H. Ariona      657 Mason Ave.        Dalton       GA          23-JUN-17

Fragment C3   Location: Florida   Node: TAM
CUS_NUM   CUS_NAME            CUS_ADDRESS           CUS_CITY     CUS_STATE   CUS_SUB_DATE
10014     John T. Chi         456 Brent Avenue      Miami        FL          18-NOV-16
15998     Lisa B. Barnette    234 Ramala Street     Tampa        FL          23-MAR-17

Fragment C4   Location: South Carolina   Node: CHA
CUS_NUM   CUS_NAME            CUS_ADDRESS           CUS_CITY     CUS_STATE   CUS_SUB_DATE
21562     Thomas F. Matto     45 N. Pratt Circle    Charleston   SC          2-DEC-16
18776     Mary B. Smith       526 Boone Pike        Charleston   SC          28-OCT-17

Fragment I1   Location: Tennessee   Node: NAS
INV_NUM   REGION_CODE   CUS_NUM   INV_DATE    INV_TOTAL
213342    TN            10884     1-NOV-15    45.95
209987    TN            10993     15-FEB-16   45.95

Fragment I2   Location: Georgia   Node: ATL
INV_NUM   REGION_CODE   CUS_NUM   INV_DATE    INV_TOTAL
198893    GA            11887     15-AUG-15   70.45
224345    GA            13558     1-JUN-16    45.95

Fragment I3   Location: Florida   Node: TAM
INV_NUM   REGION_CODE   CUS_NUM   INV_DATE    INV_TOTAL
200915    FL            10014     1-NOV-15    45.95
231148    FL            15998     1-MAR-16    24.95

Fragment I4   Location: South Carolina   Node: CHA
INV_NUM   REGION_CODE   CUS_NUM   INV_DATE    INV_TOTAL
243312    SC            21562     15-NOV-15   45.95
231156    SC            18776     1-OCT-16    45.95

e. What type of distributed database operations must be supported at each remote site?

To answer this question, you must first draw a map of the locations, the fragments at each location, and the type of transaction or request support required to access the data in the distributed database.

                         Node
Fragment      NAS     ATL     TAM     CHA     Headquarters
CUSTOMER      C1      C2      C3      C4
INVOICE       I1      I2      I3      I4
Distributed
Operations
Required      none    none    none    none    distributed request

Given the problem's specifications, you conclude that no interstate access of CUSTOMER or INVOICE data is required. Therefore, no distributed database access is required in the four nodes. For the headquarters, the manager wants to be able to access the data in all four nodes through a single SQL request. Therefore, the DDBMS must support distributed requests.

f. What type of distributed database operations must be supported at the headquarters site?

See the answer for part e.

Chapter 13 Business Intelligence and Data Warehouses

Discussion Focus

Start by discussing the need for business intelligence in a highly competitive global economy. Note that Business Intelligence (BI) describes a comprehensive, cohesive, and integrated set of applications used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information used to support business decision making. As the name implies, BI is about creating intelligence about a business. This intelligence is based on learning and understanding the facts about a business environment. BI is a framework that allows a business to transform data into information, information into knowledge, and knowledge into wisdom. BI has the potential to positively affect a company's culture by creating “business wisdom” and distributing it to all users in an organization. This business wisdom empowers users to make sound business decisions based on the accumulated knowledge of the business as reflected in recorded facts (historic operational data). Table 13.1 in the text gives some real-world examples of companies that have implemented BI tools (data warehouse, data mart, OLAP, and/or data mining tools) and shows how the use of such tools benefited the companies.

Discuss the need for data analysis and how such analysis is used to make strategic decisions. The computer systems that support strategic decision-making are known as Decision Support Systems (DSS). Explain what a DSS is and what its main functional components are. (Use Figure 13.1.) The effectiveness of a DSS depends on the quality of the data gathered at the operational level. Therefore, remind the students of the importance of proper operational database design -- and use this reminder to briefly review the major (production database) design issues that were explored in Chapters 3, 4, 5, and 6.
Next, review Section 13-3a to illustrate how operational and decision support data differ -- use the summary in Table 13.5 -- placing special emphasis on these characteristics that form the foundation for decision support analysis:
• time span
• granularity
• dimensionality
(See Section 13-3a and use Figure 13.2 to illustrate the conversion from operational data to DSS data.)

After a thorough discussion of these three characteristics, students should be able to understand what the main DSS database requirements are. Note how these three requirements match the main characteristics of a DSS and its decision support data.

After laying this foundation, introduce the data warehouse concept. A data warehouse is a database that provides support for decision making. Using Section 13-4 as the basis for your discussion, note that a data warehouse database must be:
• Integrated.
• Subject-oriented.
• Time-variant.
• Non-volatile.

After you have explained each one of these four characteristics in detail, your students should understand:
• What the characteristics are of the data likely to be found in a data warehouse.
• How the data warehouse is a part of a BI infrastructure.

Stress that the data warehouse is a major component of the BI infrastructure. Discuss the contents of Table 13.9, Twelve Rules for Data Warehouse. (See Inmon, Bill, and Chuck Kelley, "The Twelve Rules of Data Warehouse for a Client/Server World," Data Management Review, 4(5), May 1994, pp. 6-16.)

The data warehouse stores the data needed for decision support. On-Line Analytical Processing (OLAP) refers to a set of tools used by the end users to access and analyze such data. Therefore, the data warehouse and OLAP tools complement each other. By illustrating various OLAP architectures, the instructor will help students see how:
• Operational data are transformed to data warehouse data.
• Data warehouse data are extracted for analysis.
• Multidimensional tools are used to analyze the extracted data.
The OLAP architectures are yet another example of the application of client/server concepts to systems development.

Because they are the key to data warehouse design, star schemas constitute the chapter's focal point. Therefore, make sure that the following data warehouse design components are thoroughly understood:
• Facts.
• Dimensions.
• Attributes.
• Attribute hierarchies.
(See Section 13-5.)
These four concepts are used to implement data warehouses in the relational database environment. Carefully explain the construction of the chapter's Sales and Orders star schemas to help ensure that students are equipped to handle the actual design of star schemas. Illustrate the use of performance-enhancing techniques (Section 13-5f) and OLAP (Section 13-6).

Introduce data analytics and how it is used to extract knowledge from data. Explain the use of explanatory and predictive analytics (see section 13-7). Then, get students involved in doing some hands-on SQL examples using SQL analytic functions.

Finally, introduce students to data visualization (see Section 13-9). Illustrate how visualization is used to quickly identify trends, patterns, and relationships using Figures 13.26 and 13.27. Explain how the science of data visualization has evolved and how it is used to discover the “story behind the data” – use Figure 13.28 to show the importance of the science behind data visualization. Use the Vehicle Crash Analysis dashboard in Figure 13.29 to show how data visualization can be used to quickly extract information from data.
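For the hands-on portion, a small self-contained schema keeps the star schema discussion and the SQL analytic functions in one place. The sketch below is generic SQL and does not reproduce the text's Sales and Orders star schemas; all table and column names are illustrative assumptions.

-- A minimal star schema: one fact table linked to two dimension tables.
CREATE TABLE DIM_TIME (
    TIME_ID     INTEGER PRIMARY KEY,
    SALE_YEAR   INTEGER,
    SALE_QTR    INTEGER,
    SALE_MONTH  INTEGER );

CREATE TABLE DIM_PRODUCT (
    PROD_ID     INTEGER PRIMARY KEY,
    PROD_DESC   VARCHAR(50),
    PROD_TYPE   VARCHAR(20) );

CREATE TABLE FACT_SALES (
    TIME_ID     INTEGER REFERENCES DIM_TIME (TIME_ID),
    PROD_ID     INTEGER REFERENCES DIM_PRODUCT (PROD_ID),
    SALE_UNITS  INTEGER,
    SALE_AMOUNT DECIMAL(9,2),
    PRIMARY KEY (TIME_ID, PROD_ID) );

-- Roll-up along the time attribute hierarchy: facts aggregated by year, quarter, and product type.
SELECT   T.SALE_YEAR, T.SALE_QTR, P.PROD_TYPE,
         SUM(F.SALE_AMOUNT) AS TOTAL_SALES
FROM     FACT_SALES F
JOIN     DIM_TIME T    ON T.TIME_ID = F.TIME_ID
JOIN     DIM_PRODUCT P ON P.PROD_ID = F.PROD_ID
GROUP BY T.SALE_YEAR, T.SALE_QTR, P.PROD_TYPE;

-- A simple SQL analytic (window) function: a running sales total within each year.
SELECT   T.SALE_YEAR, T.SALE_MONTH,
         SUM(F.SALE_AMOUNT) AS MONTH_SALES,
         SUM(SUM(F.SALE_AMOUNT)) OVER (PARTITION BY T.SALE_YEAR
                                       ORDER BY T.SALE_MONTH) AS RUNNING_TOTAL
FROM     FACT_SALES F
JOIN     DIM_TIME T ON T.TIME_ID = F.TIME_ID
GROUP BY T.SALE_YEAR, T.SALE_MONTH;

Either query can be run against a few rows of sample data; comparing the plain GROUP BY roll-up with the windowed running total is a quick way to show why analytic functions matter for decision support queries.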
Answers to Review Questions

ONLINE CONTENT
Answers to selected Review Questions and Problems for this chapter are contained in the Premium Website for this book.

1. What is business intelligence? Give some recent examples of BI usage, using the Internet for assistance. What BI benefits have companies found?

Business intelligence (BI) is a term used to describe a comprehensive, cohesive, and integrated set of applications used to capture, collect, integrate, store, and analyze data with the purpose of generating and presenting information used to support business decision making. As the name implies, BI is about creating intelligence about a business. This intelligence is based on learning and understanding the facts about a business environment. BI is a framework that allows a business to transform data into information, information into knowledge, and knowledge into wisdom. BI has the potential to positively affect a company's culture by creating “business wisdom” and distributing it to all users in an organization. This business wisdom empowers users to make sound business decisions based on the accumulated knowledge of the business as reflected in recorded facts (historic operational data). Table 13.1 (in the text) gives some real-world examples of companies that have implemented BI tools (data warehouse, data mart, OLAP, and/or data mining tools) and shows how the use of such tools benefited the companies.

Emphasize that the main focus of BI is to gather, integrate, and store business data for the purpose of creating information. BI integrates people and processes using technology in order to add value to the business. Such value is derived from how end users use such information in their daily activities, and in particular, their daily business decision making. Also note that the BI technology components are varied.

Examples of BI usage found in web sources:
1) The Dallas Teachers Credit Union (DTCU) used geographical data analysis to increase its customer base from 250,000 professional educators to 3.5 million potential customers - virtually overnight. The increase gave the credit union the ability to compete with larger banks that had a strong presence in Dallas.
[http://www.computerworld.com/s/article/47371/Business_Intelligence?taxonomyId=120]
2) Researchers from the Rand Corporation recently applied business intelligence and analytics technology to determine the dangerous side effects of prescription drugs.
[http://www.panorama.com/industry-news/article-view.html?name=Analytics-spots-prescription-problems-508338]
3) See the Microsoft Case Study web site for hundreds of cases about Business Intelligence usage.
[http://www.microsoft.com/casestudies/]

2. Describe the BI framework. Illustrate the evolution of BI.

BI is not a product by itself, but a framework of concepts, practices, tools, and technologies that help a business better understand its core capabilities, provide snapshots of the company situation, and identify key opportunities to create competitive advantage. In practice, BI provides a well-orchestrated framework for the management of data that works across all levels of the organization. BI involves the following general steps:
1. Collecting and storing operational data
2. Aggregating the operational data into decision support data
3. Analyzing decision support data to generate information
4. Presenting such information to the end user to support business decisions
5. Making business decisions, which in turn generate more data that is collected, stored, etc.
(restarting the process).
6. Monitoring results to evaluate outcomes of the business decisions (providing more data to be collected, stored, etc.)

To implement all these steps, BI uses varied components and technologies. Section 13-2 is where you'll find a discussion of these components and technologies – see Table 13.2. Figure 13.2 illustrates the evolution of BI formats.

3. What are decision support systems, and what role do they play in the business environment?

Decision Support Systems (DSS) are based on computerized tools that are used to enhance managerial decision-making. Because complex data and the proper analysis of such data are crucial to strategic and tactical decision making, DSS are essential to the well-being and even survival of businesses that must compete in a global marketplace.

4. Explain how the main components of the BI architecture interact to form a system. Describe the evolution of BI information dissemination formats.

Refer the students to section 13-3 in the chapter. Emphasize that, actually, there is no single BI architecture; instead, it ranges from highly integrated applications from a single vendor to a loosely integrated, multi-vendor environment. However, there are some general types of functionality that all BI implementations share. Like any critical business IT infrastructure, the BI architecture is composed of data, people, processes, technology, and the management of such components. Figure 13.1 (in the text) depicts how all those components fit together within the BI framework.

Figure 13.2, in section 13-2c “Business Intelligence Evolution”, tracks the changes in business intelligence reporting and information dissemination over time. In summary:
1) 1970s: centralized reports running on mainframes, minicomputers, or even central server environments. Such reports were predefined and took considerable time to process.
2) 1980s: desktop computers, downloaded spreadsheet data from central locations.
3) 1990s: first generation DSS, centralized reporting and OLAP.
4) 2000s: BI web-based dashboards and mobile BI.
5) 2010s - Present: Big Data, NoSQL, Data Visualization

5. What are the most relevant differences between operational and decision support data?

Operational data and decision support data serve different purposes. Therefore, it is not surprising to learn that their formats and structures differ. Most operational data are stored in a relational database in which the structures (tables) tend to be highly normalized. Operational data storage is optimized to support transactions that represent daily operations. For example, each time an item is sold, it must be accounted for. Customer data, inventory data, and so on, are in a frequent update mode. To provide effective update performance, operational systems store data in many tables, each with a minimum number of fields. Thus, a simple sales transaction might be represented by five or more different tables (for example, invoice, invoice line, discount, store, and department). Although such an arrangement is excellent in an operational database, it is not efficient for query processing. For example, to extract a simple invoice, you would have to join several tables. Whereas operational data are useful for capturing daily business transactions, decision support data give tactical and strategic business meaning to the operational data. From the data analyst's point of view, decision support data differ from operational data in three main areas: time span, granularity, and dimensionality.

1. Time span.
Operational data cover a short time frame. In contrast, decision support data tend to cover a longer time frame. Managers are seldom interested in a specific sales invoice to customer X; rather, they tend to focus on sales generated during the last month, the last year, or the last five years.

2. Granularity (level of aggregation). Decision support data must be presented at different levels of aggregation, from highly summarized to near-atomic. For example, if managers must analyze sales by region, they must be able to access data showing the sales by region, by city within the region, by store within the city within the region, and so on. In that case, summarized data are required to compare the regions, but the manager also needs data in a structure that makes it possible to drill down, or decompose, the data into more atomic components (that is, finer-grained data at lower levels of aggregation). In contrast, when you roll up the data, you are aggregating the data to a higher level.

3. Dimensionality. Operational data focus on representing individual transactions rather than on the effects of the transactions over time. In contrast, data analysts tend to include many data dimensions and are interested in how the data relate over those dimensions. For example, an analyst might want to know how product X fared relative to product Z during the past six months by region, state, city, store, and customer. In that case, both place and time are part of the picture. Figure 13.3 (in the text) shows how decision support data can be examined from multiple dimensions (such as product, region, and year), using a variety of filters to produce each dimension. The ability to analyze, extract, and present information in meaningful ways is one of the differences between decision support data and transaction-at-a-time operational data.

The DSS components that form a system are shown in the text's Figure 13.1. Note that:
• The data store component is basically a DSS database that contains business data and business-model data. These data represent a snapshot of the company situation.
• The data extraction and filtering component is used to extract, consolidate, and validate the data store.
• The end user query tool is used by the data analyst to create the queries used to access the database.
• The end user presentation tool is used by the data analyst to organize and present the data.

6. What is a data warehouse, and what are its main characteristics? How does it differ from a data mart?

A data warehouse is an integrated, subject-oriented, time-variant, and non-volatile database that provides support for decision making. (See section 13-4 for an in-depth discussion of the main characteristics.) The data warehouse is usually a read-only database optimized for data analysis and query processing. Typically, data are extracted from various sources and are then transformed and integrated—in other words, passed through a data filter—before being loaded into the data warehouse. Users access the data warehouse via front-end tools and/or end-user application software to extract the data in usable form. Figure 13.4 in the text illustrates how a data warehouse is created from the data contained in an operational database.

You might be tempted to think that the data warehouse is just a big summarized database. But a good data warehouse is much more than that. A complete data warehouse architecture includes support for a decision support data store, a data extraction and integration filter, and a specialized presentation interface.
To be useful, the data warehouse must conform to uniform structures and formats to avoid data conflicts and to support decision making. In fact, before a decision support database can be considered a true data warehouse, it must conform to the twelve rules described in section 13-4b and illustrated in Table 13.9.

7. Give three examples of likely problems when operational data are integrated into the data warehouse.

Within different departments of a company, operational data may vary in terms of how they are recorded or in terms of data type and structure. For instance, the status of an order may be indicated with text labels such as "open", "received", "cancel", or "closed" in one department, while another department has it as "1", "2", "3", or "4". The student status can be defined as "Freshman", "Sophomore", "Junior", or "Senior" in the Accounting department and as "FR", "SO", "JR", or "SR" in the Computer Information Systems department. A social security number field may be stored in one database as a string of numbers and dashes ('XXX-XX-XXXX'), in another as a string of numbers without the dashes ('XXXXXXXXX'), and in yet a third as a numeric field (#########). Most of the data transformation problems are related to incompatible data formats, the use of synonyms and homonyms, and the use of different coding schemes.

Use the following scenario to answer questions 8 through 14. While working as a database analyst for a national sales organization, you are asked to be part of its data warehouse project team.

8. Prepare a high-level summary of the main requirements for evaluating DBMS products for data warehousing.

There are four primary ways to evaluate a DBMS that is tailored to provide fast answers to complex queries:
• the database schema supported by the DBMS
• the availability and sophistication of data extraction and loading tools
• the end user analytical interface
• the database size requirements
Establish the requirements based on the size of the database, the data sources, the necessary data transformations, and the end user query requirements. Determine what type of database is needed, i.e., a multidimensional database or a relational database using the star schema. Other valid evaluation criteria include the cost of acquisition and available upgrades (if any), training, technical and development support, performance, ease of use, and maintenance.

9. Your data warehousing project group is debating whether to create a prototype of a data warehouse before its implementation. The project group members are especially concerned about the need to acquire some data warehousing skills before implementing the enterprise-wide data warehouse. What would you recommend? Explain your recommendations.

Knowing that data warehousing requires time, money, and considerable managerial effort, many companies create data marts instead. Data marts use smaller, more manageable data sets that are targeted to fit the special needs of small groups within the organization. In other words, data marts are small, single-subject data warehouse subsets. Data mart development and use costs are lower, and the implementation time is shorter. Once the data marts have demonstrated their ability to serve the DSS, they can be expanded to become data warehouses or they can be migrated into larger existing data warehouses.

10. Suppose you are selling the data warehouse idea to your users. How would you define multidimensional data analysis for them? How would you explain its advantages to them?
Multidimensional data analysis refers to the processing of data in which data are viewed as part of a multidimensional structure, one in which data are related in many different ways. Business decision makers usually view data from a business perspective. That is, they tend to view business data as they relate to other business data. For example, a business data analyst might investigate the relationship between sales and other business variables such as customers, time, product line, and location. The multidimensional view is much more representative of a business perspective. A good way to visualize the data is to use tools such as pivot tables in MS Excel or data visualization products such as MS Power BI, Tableau Software's Tableau, or QlikView.

11. The data warehousing project group has invited you to provide an OLAP overview. The group's members are particularly concerned about the OLAP client/server architecture requirements and how OLAP will fit the existing environment. Your job is to explain to them the main OLAP client/server components and architectures.

OLAP systems are based on client/server technology, and they consist of these main modules:
• OLAP Graphical User Interface (GUI)
• OLAP Analytical Processing Logic
• OLAP Data Processing Logic.
The location of each of these modules is a function of different client/server architectures. How and where the modules are placed depends on hardware, software, and professional judgment. Any placement decision has its own advantages and disadvantages. However, the following constraints must be met:
• The OLAP GUI is always placed in the end user's computer. The reason it is placed at the client side is simple: this is the main point of contact between the end user and the system. Specifically, it provides the interface through which the end user queries the data warehouse's contents.
• The OLAP Analytical Processing Logic (APL) module can be placed in the client (for speed) or in the server (for better administration and better throughput). The APL performs the complex transformations required for business data analysis, such as multiple dimensions, aggregation, period comparison, and so on.
• The OLAP Data Processing Logic (DPL) maps the data analysis requests to the proper data objects in the data warehouse and is, therefore, generally placed at the server level.

12. One of your vendors recommends using an MDBMS. How would you explain this recommendation to your project leader?

Multidimensional On-Line Analytical Processing (MOLAP) provides OLAP functionality using multidimensional databases (MDBMS) to store and analyze multidimensional data. Multidimensional database systems (MDBMS) use special proprietary techniques to store data in matrix-like arrays of n dimensions.

13. The project group is ready to make a final decision between ROLAP and MOLAP. What should be the basis for this decision? Why?

The basis for the decision should be the system and end user requirements. Both ROLAP and MOLAP will provide advanced data analysis tools to enable organizations to generate required information. The selection of one or the other depends on which set of tools will fit best within the company's existing expertise base, its technology and end user requirements, and its ability to perform the job at a given cost.
The proper ROLAP/MOLAP selection criteria must include:
• purchase and installation price
• supported hardware and software
• compatibility with existing hardware, software, and DBMS
• available programming interfaces
• performance
• availability, extent, and type of administrative tools
• support for the database schema(s)
• ability to handle current and projected database size
• database architecture
• available resources
• flexibility
• scalability
• total cost of ownership.

14. The data warehouse project is in the design phase. Explain to your fellow designers how you would use a star schema in the design.

The star schema is a data modeling technique that is used to map multidimensional decision support data into a relational database. The reason for the star schema's development is that existing relational modeling techniques, E-R and normalization, did not yield a database structure that served the advanced data analysis requirements well. Star schemas yield an easily implemented model for multidimensional data analysis while still preserving the relational structures on which the operational database is built. The basic star schema has four components: facts, dimensions, attributes, and attribute hierarchies. The star schemas represent aggregated data for specific business activities. Using the schemas, we will create multiple aggregated data sources that will represent different aspects of business operations. For example, the aggregation may involve total sales by selected time periods, by products, by stores, and so on. Aggregated totals can be total product units, total sales values by products, etc.

15. Briefly discuss OLAP architectural styles with and without data marts.

Section 13-6d, “OLAP Architecture”, details the basic architectural components of an OLAP environment:
• The graphical user interface (GUI front end) – always located at the end-user side.
• The analytical processing logic – this component could be located in the back end (OLAP server) or could be split between the back-end and front-end components.
• The data processing logic – the logic used to extract data from the data warehouse; typically located in the back end.
The term OLAP “engine” is sometimes used to refer to the arrangement of the OLAP components as a whole. However, the architecture allows for the split of some of the components in a client/server arrangement, as depicted in Figures 13.16 and 13.17.

Figure 13.16 shows a typical OLAP architecture without data marts. In this architecture, the OLAP tool extracts data from the data warehouse and processes the data to be presented by the end-user GUI. The processing of the data takes place mostly on the OLAP engine. The OLAP engine could be located in each client computer, or it could be shared from an OLAP “server”.

Figure 13.17 shows a typical OLAP architecture with local data marts (located at the end-user side). The local data marts are “miniature” data warehouses that focus on a subset of the data in the data warehouse. Normally these data marts are subject oriented, such as customers, products, sales, etc. The local data marts provide faster processing but require that the data be periodically “synchronized” with the main data warehouse.

16. What is OLAP, and what are its main characteristics?

OLAP stands for On-Line Analytical Processing and uses multidimensional data analysis techniques. OLAP yields an advanced data analysis environment that provides the framework for decision making, business modeling, and operations research activities.
Its four main characteristics are:
1. Multidimensional data analysis techniques
2. Advanced database support
3. Easy-to-use end user interfaces
4. Support for client/server architecture.

17. Explain ROLAP, and list the reasons you would recommend its use in the relational database environment.

Relational On-Line Analytical Processing (ROLAP) provides OLAP functionality for relational databases. ROLAP's popularity is based on the fact that it uses familiar relational query tools to store and analyze multidimensional data. Because ROLAP is based on familiar relational technologies, it represents a natural extension for organizations that already use relational database management systems.

18. Explain the use of facts, dimensions, and attributes in the star schema.

Facts are numeric measurements (values) that represent a specific business aspect or activity. For example, sales figures are numeric measurements that represent product and/or service sales. Facts commonly used in business data analysis are units, costs, prices, and revenues. Facts are normally stored in a fact table, which is the center of the star schema. The fact table contains facts that are linked through their dimensions.

Dimensions are qualifying characteristics that provide additional perspectives to a given fact. Dimensions are of interest to us because business data are almost always viewed in relation to other data. For instance, sales may be compared by product from region to region, and from one time period to the next. The kind of problem typically addressed by DSS might be "make a comparison of the sales of product units of X by region for the first quarter from 1995 through 2005." In this example, sales have product, location, and time dimensions. Dimensions are normally stored in dimension tables.

Each dimension table contains attributes. The attributes are often used to search, filter, or classify facts. Dimensions provide descriptive characteristics about the facts through their attributes. Therefore, the data warehouse designer must define common business attributes that will be used by the data analyst to narrow down a search, group information, or describe dimensions. For example, we can identify some possible attributes for the product, location, and time dimensions:
• Product dimension: product id, description, product type, manufacturer, etc.
• Location dimension: region, state, city, and store number.
• Time dimension: year, quarter, month, week, and date.
These product, location, and time dimensions add a business perspective to the sales facts. The data analyst can now associate the sales figures for a given product, in a given region, and at a given time. The star schema, through its facts and dimensions, can provide the data when they are needed and in the required format, without imposing the burden of additional and unnecessary data (such as order #, PO #, status, etc.) that commonly exists in operational databases. In essence, dimensions are the magnifying glass through which we study the facts.

19. Explain multidimensional cubes, and describe how the slice and dice technique fits into this model.

To explain the multidimensional cube concept, let's assume a sales fact table with three dimensions: product, location, and time. In this case, the multidimensional data model for the sales example is (conceptually) best represented by a three-dimensional cube. This cube represents the view of sales dimensioned by product, location, and time.
(We have chosen a three-dimensional cube because such a cube makes it easier for humans to visualize the problem. There is, of course, no limit to the number of dimensions we can use.)

The power of multidimensional analysis resides in its ability to focus on specific slices of the cube. For example, the product manager may be interested in examining the sales of a product, thus producing a slice of the product dimension. The store manager may be interested in examining the sales of a store, thus producing a slice of the location dimension. The intersection of the slices yields smaller cubes, thereby producing the "dicing" of the multidimensional cube. By examining these smaller cubes within the multidimensional cube, we can produce very precise analyses of the variable components and interactions. In short, slice and dice refers to the process that allows us to subdivide a multidimensional cube. Such subdivisions permit a far more detailed analysis than would be possible with the conventional two-dimensional data view. The text's Section 13-5 and Figures 13.5 through 13.9 illustrate the slice and dice concept.

To gain the benefits of slice and dice, we must be able to identify each slice of the cube. Slice identification requires the use of the values of each attribute within a given dimension. For example, to slice the location dimension, we can use a STORE_ID attribute in order to focus on a given store.

20. In the star schema context, what are attribute hierarchies and aggregation levels, and what is their purpose?

Attributes within dimensions can be ordered in an attribute hierarchy. The attribute hierarchy yields a top-down data organization that permits both aggregation and drill-down/roll-up data analysis. Use Figure Q13.18 to show how the attributes of the location dimension can be organized into a hierarchy that orders that location dimension by region, state, city, and store.

Figure Q13.18 A Location Attribute Hierarchy

The attribute hierarchy gives the data warehouse the ability to perform drill-down and roll-up data searches. For example, suppose a data analyst wants an answer to the query "How does the 2005 total monthly sales performance compare to the 2000 monthly sales performance?" Having performed the query, suppose that the data analyst spots a sharp total sales decline in March 2005. Given this discovery, the data analyst may then decide to perform a drill-down procedure for the month of March to see how this year's March sales by region stack up against last year's. The drill-down results are then used to find out whether the low overall March sales were reflected in all regions or only in a particular region. This type of drill-down operation may even be extended until the data analyst is able to identify the individual store(s) that is (are) performing below the norm.

The attribute hierarchy allows the data warehouse and OLAP systems to use a carefully defined path that will govern how data are to be decomposed and aggregated for drill-down and roll-up operations. Of course, keep in mind that it is not necessary for all attributes to be part of an attribute hierarchy; some attributes exist just to provide narrative descriptions of the dimensions.

21. Discuss the most common performance improvement techniques used in star schemas.

The following four techniques are commonly used to optimize data warehouse design:
• Normalization of dimensional tables is done to achieve semantic simplicity and to facilitate end user navigation through the dimensions.
For example, if the location dimension table contains transitive dependencies between region, state, and city, we can revise these relationships to the third normal form (3NF). By normalizing the dimension tables, we simplify the data filtering operations related to the dimensions.
• We can also speed up query operations by creating and maintaining multiple fact tables related to each level of aggregation. For example, we may use region, state, and city in the location dimension. These aggregate tables are precomputed at the data loading phase, rather than at run time. The purpose of this technique is to save processor cycles at run time, thereby speeding up data analysis. An end user query tool optimized for decision analysis will then properly access the summarized fact tables, instead of computing the values by accessing a "lower level of detail" fact table.
• Denormalizing fact tables is done to improve data access performance and to save data storage space. The latter objective, storage space savings, is becoming less of a factor: data storage costs are on a steeply declining path, decreasing almost daily. DBMS limitations on database and table size, record size, and the maximum number of records in a single table are far more critical than raw storage space costs. Denormalization improves performance by storing in one single record what normally would take many records in different tables. For example, to compute the total sales for all products in all regions, we may have to access the region sales aggregates and summarize all the records in this table. If we have 300,000 product sales records, we wind up summarizing at least 300,000 rows. Although such summaries may not be a very taxing operation for a DBMS initially, a comparison of ten or twenty years' worth of sales is likely to start bogging the system down. In such cases, it will be useful to have special aggregate tables, which are denormalized. For example, a YEAR_TOTAL table may contain the following fields:
YEAR_ID, MONTH_1, MONTH_2, ... MONTH_12, YEAR_TOTAL
Such a denormalized YEAR_TOTAL table structure works well as the basis for year-to-year comparisons at the month level, the quarter level, or the year level. But keep in mind that design criteria such as frequency of use and performance requirements must be evaluated against the possible overload placed on the DBMS to manage these denormalized relations.
• Table partitioning and replication are particularly important when a DSS is implemented in widely dispersed geographic areas. Partitioning splits a table into subsets of rows or columns. These subsets can then be placed in or near the client computer to improve data access times. Replication makes a copy of a table and places it in a different location for the same reasons.

22. What is data analytics? Briefly define explanatory and predictive analytics. Give some examples.

Data analytics is a subset of BI functionality that encompasses a wide range of mathematical, statistical, and modeling techniques with the purpose of extracting knowledge from data. Data analytics is used at all levels within the BI framework, including queries and reporting, monitoring and alerting, and data visualization. Hence, data analytics is a “shared” service that is crucial to what BI adds to an organization. Data analytics represents what business managers really want from BI: the ability to extract actionable business insight from current events and foresee future problems or opportunities.
Data analytics discovers characteristics, relationships, dependencies, or trends in the organization's data, and then explains the discoveries and predicts future events based on the discoveries. Data analytics tools can be grouped into two separate (but closely related and often overlapping) areas:
• Explanatory analytics focuses on discovering and explaining data characteristics and relationships based on existing data. Explanatory analytics uses statistical tools to formulate hypotheses, test them, and answer the how and why of such relationships—for example, how do past sales relate to previous customer promotions?
• Predictive analytics focuses on predicting future data outcomes with a high degree of accuracy. Predictive analytics uses sophisticated statistical tools to help the end user create advanced models that answer questions about future data occurrences—for example, what would next month's sales be based on a given customer promotion?

23. Describe and contrast the focus of data mining and predictive analytics. Give some examples.

In practice, data analytics is better understood as a continuous spectrum of knowledge acquisition that goes from discovery to explanation to prediction. The outcomes of data analytics then become part of the information framework on which decisions are built. You can think of data mining (explanatory analytics) as explaining the past and present, while predictive analytics forecasts the future. However, you need to understand that both sciences work together; predictive analytics uses explanatory analytics as a stepping stone to create predictive models.

Data mining refers to analyzing massive amounts of data to uncover hidden trends, patterns, and relationships; to form computer models to simulate and explain the findings; and then to use such models to support business decision making. In other words, data mining focuses on the discovery and explanation stages of knowledge acquisition. However, data mining can also be used as the basis to create advanced predictive data models. For example, a predictive model could be used to predict future customer behavior, such as a customer response to a target marketing campaign.

So, what is the difference between data mining and predictive analytics? In fact, data mining and predictive analytics use similar and overlapping sets of tools, but with a slightly different focus. Data mining focuses on answering the “how” and “what” of past data, while predictive analytics focuses on creating actionable models to predict future behaviors and events. In some ways, you can think of predictive analytics as the next logical step after data mining; once you understand your data, you can use the data to predict future behaviors. In fact, most BI vendors are dropping the term data mining and replacing it with the more alluring term predictive analytics.

Predictive analytics can be traced back to the banking and credit card industries. The need to profile customers and predict customer buying patterns in these industries was a critical driving force for the evolution of many modeling methodologies used in BI data analytics today. For example, based on your demographic information and purchasing history, a credit card company can use data mining models to determine what credit limit to offer, what offers you are more likely to accept, and when to send those offers. As another example, a data mining tool could be used to analyze customer purchase history data.
The data mining tool will find many interesting purchasing patterns and correlations involving customer demographics, the timing of purchases, and the types of items purchased together. The predictive analytics tool will use those findings to build a model that predicts, with a high degree of accuracy, when a certain type of customer will purchase certain items and what items are likely to be purchased together on certain nights and at certain times.
24. How does data mining work? Discuss the different phases in the data mining process.
Data mining involves four phases:
• In the data preparation phase, the main data sets to be used by the data mining operation are identified and cleansed of any data impurities. Because the data in the data warehouse are already integrated and filtered, the data warehouse usually is the target data set for data mining operations.
• The objective of the data analysis and classification phase is to study the data to identify common data characteristics or patterns. During this phase, the data mining tool applies specific algorithms to find data groupings, classifications, clusters, or sequences; data dependencies, links, or relationships; and data patterns, trends, and deviations.
• The knowledge acquisition phase uses the results of the data analysis and classification phase. During this phase, the data mining tool (with possible intervention by the end user) selects the appropriate modeling or knowledge acquisition algorithms. The most typical algorithms used in data mining are based on neural networks, decision trees, rule induction, genetic algorithms, classification and regression trees, memory-based reasoning (nearest neighbor), and data visualization. A data mining tool may use many of these algorithms in any combination to generate a computer model that reflects the behavior of the target data set.
• Although some data mining tools stop at the knowledge acquisition phase, others continue to the prognosis phase. In this phase, the data mining findings are used to predict future behavior and forecast business outcomes. Examples of data mining findings might be:
– 65% of customers who did not use the credit card in six months are 88% likely to cancel their account.
– 82% of customers who bought a new TV 27" or bigger are 90% likely to buy an entertainment center within the next 4 weeks.
– If age < 30 and income <= 25,000 and the credit amount requested is over 25,000, the minimum term is 10 years.
The complete set of findings can be represented in a decision tree, a neural net, a forecasting model, or a visual presentation interface, which is then used to project future events or results. For example, the prognosis phase may project the likely outcome of a new product roll-out or a new marketing promotion.
25. Describe the characteristics of predictive analytics. What is the impact of Big Data in predictive analytics?
Predictive analytics employs mathematical and statistical algorithms, neural networks, artificial intelligence, and other advanced modeling tools to create actionable predictive models based on available data. The algorithms used to build the predictive model are specific to certain types of problems and work with certain types of data. Therefore, it is important that the end user, who typically is trained in statistics and understands business, applies the proper algorithms to the problem at hand. However, thanks to constant technology advances, modern BI tools automatically apply multiple algorithms to find the optimum model.
Most predictive analytics models are used in areas such as customer relationships, customer service, customer retention, fraud detection, targeted marketing, and optimized pricing. Predictive analytics can add value to an organization in many different ways; for example, it can help optimize existing processes, identify hidden problems, and anticipate future problems or opportunities. However, predictive analytics is not the “secret sauce” to fix all business problems. Managers should carefully monitor and evaluate the value of predictive analytics models to determine their return on investment.
Predictive analytics received a big stimulus with the advent of social media. Companies turned to data mining and predictive analytics as a way to harvest the mountains of data stored on social media sites. Google was one of the first companies to offer targeted ads as a way to enhance and personalize search experiences. Similar initiatives were used by all types of organizations to increase customer loyalty and drive up sales. Take the example of the airline and credit card industries and their frequent flyer and affinity card programs. Nowadays, many organizations use predictive analytics to profile customers in an attempt to get and keep the right ones, which in turn will increase loyalty and sales.
26. Describe data visualization. What is the goal of data visualization?
Data visualization is the process of abstracting data to provide a visual data representation that enhances the user’s ability to comprehend the meaning of the data. The goal of data visualization is to allow the user to quickly and efficiently see the data’s big picture by identifying trends, patterns, and relationships.
27. Is data visualization only useful when used with Big Data? Explain and expand.
It is a mistake to think that data visualization is useful only when dealing with Big Data. Any organization (regardless of size) that collects and uses data in its daily activities can benefit from the use of data analytics and visualization techniques. We all have heard the saying “a picture is worth a thousand words,” and this has never been more accurate than in data visualization. Tables with hundreds, thousands, or millions of rows of data cannot be processed by the human mind in a meaningful way. Providing summarized tabular data to managers does not give them enough insight into the meaning of the data to make informed decisions. Data visualization encodes the data into visually rich formats (mostly graphical) that provide at-a-glance insight into overall trends, patterns, and possible relationships.
Data visualization techniques range from simple to very complex, and many are familiar. Such techniques include pie charts, line graphs, bar charts, bubble charts, bubble maps, donut charts, scatter plots, Gantt charts, heat maps, histograms, time series plots, step charts, waterfall charts, and many more. The tools used in data visualization range from a simple spreadsheet (such as MS Excel) to advanced data visualization software such as Tableau, Microsoft Power BI, Domo, and Qlik. Common productivity tools such as Microsoft Excel can often provide surprisingly powerful data visualizations. Excel has long included basic charting and PivotTable and PivotChart capabilities for visualizing spreadsheet data. More recently, the introduction of the PowerPivot add-in has eliminated row and column data limitations and allows for the integration of data from multiple sources.
This puts powerful data visualization capabilities within reach of most business users.
28. As a discipline, data visualization can be studied as ________________.
As a discipline, data visualization can be studied as a group of visual communication techniques used to explore and discover data insights by applying pattern recognition, spatial awareness, and aesthetics. These techniques help transform complex data into understandable and actionable visual representations.
29. Describe the different types of data and how they map to star schemas and data analysis. Give some examples of the different data types.
In general, there are two types of data:
• Qualitative: describes qualities of the data. This type of data can be subdivided into two subtypes:
–– Nominal: This is data that can be counted but not ordered or aggregated. Examples: sex (male or female); student class (graduate or undergraduate).
–– Ordinal: This is data that can be counted and ordered but not aggregated. Examples: rate your teacher (excellent, good, fair, poor); what is your family income (under 20,000, 20,001 to 40,000, 40,001 to 60,000, 60,001 or more).
• Quantitative: describes numeric facts or measures of the data. This type of data can be counted, ordered, and aggregated. Statisticians refer to this data as “interval and ratio” data. Examples of quantitative data include age, GPA, number of accidents, etc.
You can think of qualitative data as being the dimensions on a star schema and the quantitative data as being the facts of a star schema. This is important because it means that you must use the correct type of functions and operations with each data type, including the proper way to visually represent it.
30. What five graphical data characteristics does data visualization use to highlight and contrast data findings and convey a story?
Data visualization uses shape, color, size, position, and group/order to represent and highlight data in certain ways. The way you visualize the data tells a story and has an impact on the end users. Some data visualizations can reveal previously unknown insights, and others can be a way to draw attention to an issue. When used correctly, data visualization can tell the story behind the data. For example, you can use data visualization to explore data and provide some useful data insights using vehicle crash data for the state of Iowa, available at https://catalog.data.gov/. The data set contains data on car accidents in Iowa from 2010 to early 2015. See Figure 13.29 in the textbook and explain how you can quickly identify some trends and characteristics using either Excel or a tool such as Tableau.
NOTE: The data files for this chapter have a Dashboards folder that contains two complete data sets: H1B Visa data and Vehicle Crash data. In addition, there are sample dashboards in Excel, MS Power BI, and Tableau. Read the included documentation explaining some of the data transformations applied to the raw data.
Problem Solutions
ONLINE CONTENT The databases used for this problem set are available at www.cengagebrain.com. These databases are stored in Microsoft Access 2000 format. The databases, named Ch13_P1.mdb, Ch13_P3.mdb, and Ch13_P4.mdb, contain the data for Problems 1, 3, and 4, respectively. The data for Problem 2 are stored in Microsoft Excel format at www.cengagebrain.com. The spreadsheet filename is Ch13_P2.xls.
1. The university computer lab's director keeps track of the lab usage, as measured by the number of students using the lab. This particular function is very important for budgeting purposes. The computer lab director assigns you the task of developing a data warehouse in which to keep track of the lab usage statistics. The main requirements for this database are to:
• Show the total number of users by different time periods.
• Show usage numbers by time period, by major, and by student classification.
• Compare usage for different majors and different semesters.
Use the Ch13_P1.mdb database, which includes the following tables:
USELOG contains the student lab access data.
STUDENT is a dimension table containing student data.
Given the three preceding requirements, and using the Ch13_P1.mdb data, complete the following problems:
a. Define the main facts to be analyzed. (Hint: These facts become the source for the design of the fact table.)
b. Define and describe the appropriate dimensions. (Hint: These dimensions become the source for the design of the dimension tables.)
c. Draw the lab usage star schema, using the fact and dimension structures you defined in Problems 1a and 1b.
d. Define the attributes for each of the dimensions in Problem 1b.
e. Recommend the appropriate attribute hierarchies.
f. Implement your data warehouse design, using the star schema you created in Problem 1c and the attributes you defined in Problem 1d.
g. Create the reports that will meet the requirements listed in this problem’s introduction.
Before Problems 1a-1g can be answered, the students must create the time and semester dimensions. Looking at the data in the USELOG table, the students should be able to figure out that the data belong to the Fall 2017 and Spring 2018 semesters, so the semester dimension must contain entries for at least these two semesters. The time dimension can be defined in several different ways. It will be very useful to provide class time during which students can explore the different benefits derived from various ways to represent the time dimension. Regardless of which time dimension representation is selected, it is clear that the date and time entries in USELOG must be transformed to match the TIME and SEMESTER codes. For data analysis purposes, we suggest using the TIME and SEMESTER dimension table configurations shown in Tables P13.1A and P13.1B. (We have used these configurations in the Ch13_P1sol.mdb database that is located on the CD.)
Table P13.1A The TIME Dimension Table Structure
TIME_ID   TIME_DESCRIPTION   BEGIN_TIME   END_TIME
1         Morning            6:01AM       12:00PM
2         Afternoon          12:01PM      6:00PM
3         Night              6:01PM       6:00AM
Table P13.1B The SEMESTER Dimension Table Structure
SEMESTER_ID   SEMESTER_DESCRIPTION   BEGIN_DATE     END_DATE
FA17          Fall 2017              15-Aug-2017    18-Dec-2017
SP18          Spring 2018            08-Jan-2018    15-May-2018
The USELOG table contains only the date and time of the access, rather than the semester or time IDs. The student must create the TIME and SEMESTER dimension tables and assign the proper TIME_ID and SEMESTER_ID keys to match the USELOG's time and date. The students should also create the MAJOR dimension table, using the data already stored in the STUDENT table. Using Microsoft Access, we used a make-table query to produce the MAJOR table. A make-table query lets you create a new table, MAJOR, from query output. In this case, the query must select all unique major codes and descriptions.
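As a minimal sketch of these transformation steps in Access-style SQL (the field names MAJOR_CODE, MAJOR_NAME, and ACC_DATE are assumed here for illustration only; the actual column names in Ch13_P1.mdb may differ), the make-table query for MAJOR and one of the key-assignment updates might look like this:
Make MAJOR table (make-table query):
SELECT DISTINCT MAJOR_CODE, MAJOR_NAME
INTO MAJOR
FROM STUDENT;
Assign SEMESTER_ID in the temporary work table (named TEST in our solution), one update per semester, using the date ranges in Table P13.1B:
UPDATE TEST
SET SEMESTER_ID = 'FA17'
WHERE ACC_DATE BETWEEN #8/15/2017# AND #12/18/2017#;
A similar set of updates (or a single query using the Access Switch function, as noted later in this solution) assigns the TIME_ID values based on the time-of-day ranges in Table P13.1A.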
The same technique can be used to create the student classification dimension table. (In our solution, we have named the student classification dimension table CLASS.) Naturally, you can use some front-end tool other than Access, but we have found Access to be particularly effective in this environment. To produce the solution we have stored in the Ch13_P1sol.mdb database, we used the queries listed in Table P13.1C.
Table P13.1C The Queries in the Ch13_P1sol.mdb Database
• Update DATE format in USELOG: The DATE field in USELOG was originally given to us as a character field. This query converts the date text to a date field we can use for date comparisons.
• Update STUDENT_ID format in STUDENT: This query changes the STUDENT_ID format to make it compatible with the format used in USELOG.
• Update STUDENT_ID format in USELOG: This query changes the STUDENT_ID format to make it compatible with the format used in STUDENT.
• Append TEST records from USELOG & STUDENT: Creates a temporary storage table (TEST) used to make some data transformations prior to the creation of the fact table. The TEST table contains the fields that will be used in the USEFACT table, plus other fields used for data transformation purposes.
• Update TIME_ID and SEMESTER_ID in TEST: Before we create the USEFACT table, we must transform the dates and times to match the SEMESTER_ID and TIME_ID keys used in our SEMESTER and TIME dimension tables. This query does that.
• Count STUDENTS sort by Fact Keys (SEM, MAJOR, CLASS, TIME): This query performs data aggregation over the data in the TEST table. The query output will be used to create the new USEFACT table.
• Populate USEFACT: This query uses the results of the previous query to populate our USEFACT table.
• Compares usage by Semesters by Times: Used to generate Report1.
• Usage by Time, Major and Classification: Used to generate Report2.
• Usage by Major and Semester: Used to generate Report3.
Having completed the preliminary work, we can now present the solutions to the seven problems:
a. Define the main facts to be analyzed. (Hint: These facts become the source for the design of the fact table.)
The main facts are the total number of students by time, major, semester, and student classification.
b. Define and describe the possible dimensions. (Hint: These dimensions become the source for the design of the dimension tables.)
The possible dimensions are semester, major, classification, and time. Each of these dimensions provides an additional perspective on the student count facts. The dimension table names and attributes are shown in the screen shot that illustrates the answer to Problem 1c.
c. Draw the lab usage star schema, using the fact and dimension structures you defined in Problems 1a and 1b.
Figure P13.1c shows the MS Access relational diagram (see the Ch13_P1sol.mdb database in the Student Online Companion) that illustrates the star schema, the relationships, the table names, and the field names used in our solution. The students are given only the USELOG and STUDENT tables, and they must produce the fact table and dimension tables.
Figure P13.1c The Microsoft Access Relational Diagram
d. Define the attributes for each of the dimensions in Problem 1b.
Given Problem 1c's star schema, the dimension attributes are easily defined:
Semester dimension: semester_id, semester_description, begin_date, and end_date.
Major dimension: major_code and major_name.
Class dimension: class_id and class_description.
Time dimension: time_id, time_description, begin_time, and end_time.
e. Recommend the appropriate attribute hierarchies.
See the answer to question 18 and the dimensions shown in Problems 1c and 1d to develop the appropriate attribute hierarchies.
NOTE To create the dimension tables in MS Access, we had to modify the data. These modifications can be examined in the update queries stored in the Ch13_P1sol.mdb database. We used the Switch function in MS Access to assign the proper SEMESTER_ID and TIME_ID values to the USEFACT table.
f. Implement your data warehouse design, using the star schema you created in Problem 1c and the attributes you defined in Problem 1d.
The solution is included in the Ch13_P1sol.mdb database on the Instructor's CD.
g. Create the reports that will meet the requirements listed in this problem’s introduction.
Use the Ch13_P1sol.mdb database on the Instructor's CD as the basis for the reports. Keep in mind that the Microsoft Access export function can be used to put the Access tables into a different database such as Oracle or DB2.
2. Victoria Ephanor manages a small product distribution company. Because the business is growing fast, Ephanor recognizes that it is time to manage the vast information pool to help guide the accelerating growth. Ephanor, who is familiar with spreadsheet software, currently employs a sales force of four people. She asks you to develop a data warehouse application prototype that will enable her to study sales figures by year, region, salesperson, and product. (This prototype is to be used as the basis for a future data warehouse database.) Using the data supplied in the Ch13_P2.xls file, complete the following seven problems:
a. Identify the appropriate fact table components.
The dimensions for this star schema are: Year, Region, Agent, and Product. (These are shown in Figure P13.2c.)
b. Identify the appropriate dimension tables.
(These are shown in Figure P13.2c.)
c. Draw a star schema diagram for this data warehouse.
See Figure P13.2c.
Figure P13.2C The Star Schema for the Ephanor Distribution Company
d. Identify the attributes for the dimension tables that will be required to solve this problem.
The solution to this problem is presented in the Ch13_P2sol.xls file in the Student Online Companion.
e. Using Microsoft Excel or any other spreadsheet program capable of producing pivot tables, generate a pivot table to show the sales by product and by region. The end user must be able to specify the display of sales for any given year. The sample output is shown in the first pivot table in Figure P13.2E.
FIGURE P13.2E Using a pivot table
The solution to this problem is presented in the Ch13_P2sol.xls file in the Student Online Companion.
f. Using Problem 2e as your base, add a second pivot table (see Figure P13.2E) to show the sales by salesperson and by region. The end user must be able to specify sales for a given year or for all years, and for a given product or for all products.
The solution to this problem is presented in the Ch13_P2sol.xls file in the Student Online Companion.
g. Create a 3-D bar graph to show sales by salesperson, by product, and by region. (See the sample output in Figure P13.2G.)
FIGURE P13.2G 3-D bar graph showing the relationships among salesperson, product, and region
The solution to this problem is presented in the Ch13_P2sol.xls file in the Student Online Companion.
3. David Suker, the inventory manager for a marketing research company, wants to study the use of supplies within the different company departments.
Suker has heard that his friend, Ephanor, has developed a spreadsheet-based data warehouse model that she uses in her analysis of sales data (see Problem 2). Suker is interested in developing a data warehouse model like Ephanor’s so he can analyze orders by department and by product. He will use Microsoft Access as the data warehouse DBMS and Microsoft Excel as the analysis tool.
a. Develop the order star schema.
Figure P13.3A's MS Access relational diagram reflects the star schema and its relationships. Note that the students are given only the ORDERS table. The student must study the data set and make the queries necessary to create the dimension tables (TIME, DEPT, VENDOR, and PRODUCT) and the ORDFACT fact table.
Figure P13.3A The Marketing Research Company Relational Diagram
b. Identify the appropriate dimension attributes.
The dimensions are TIME, DEPT, VENDOR, and PRODUCT. (See Figure P13.3A.)
c. Identify the attribute hierarchies required to support the model.
The main hierarchy used for data drilling purposes is represented by the TIME-DEPT-VENDOR-PRODUCT sequence. (See Figure P13.3A.) Within this hierarchy, the user can analyze data at different aggregation levels. Additional hierarchies can be constructed in the TIME dimension to account for quarters or, if necessary, daily aggregates. The VENDOR dimension could also be expanded to include geographic information that could be used for drill-down purposes.
d. Develop a crosstab report (in Microsoft Access), using a 3-D bar graph to show sales by product and by department. (The sample output is shown in Figure P13.3.)
FIGURE P13.3 A Crosstab Report: Sales by Product and Department
The solution to this problem is included in the Ch13_P3sol.mdb database in the Student Online Companion.
4. ROBCOR, Inc., whose sample data are contained in the database named Ch13_P4.mdb, provides "on demand" aviation charters using a mix of different aircraft and aircraft types. Because ROBCOR has grown rapidly, its owner has hired you to be its first database manager. The company's database, developed by an outside consulting team, already has a charter database in place to help manage all of its operations. Your first and critical assignment is to develop a decision support system to analyze the charter data. (Review the company’s operations in Problems 24-31 in Chapter 3, The Relational Database Model.) The charter operations manager wants to be able to analyze charter data such as cost, hours flown, fuel used, and revenue. She also wants to be able to drill down by pilot, type of airplane, and time periods. Given those requirements, complete the following:
a. Create a star schema for the charter data.
NOTE The students must first create the queries required to filter, integrate, and consolidate the data prior to their inclusion in the data warehouse. The Ch13_P4.mdb database contains the data to be used by the students. The Ch13_P4sol.mdb database contains the data and the solutions to the problems.
The problem requires the creation of the time dimension. Looking at the data in the CHARTER table, the students should figure out that the two attributes in the time dimension should be year and month. Another possible attribute could be day, but since no one pilot or airplane was used more than once a day, including it as an attribute would only reduce the database's efficiency. The analysis to be done on the time dimension can be done on a monthly or yearly basis. The CHARTER table contains the date of the charter.
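The transformation queries listed in Table P13.4-1 below handle this step. As a minimal sketch of two of those queries in Access-style SQL (CHAR_DATE is an assumed name for the charter date field, and the yyyymm-style TIME_ID shown here is only one possible key format; the actual names in Ch13_P4.mdb may differ):
Update YEAR, MONTH, and TIME_ID in the TEMP work table:
UPDATE TEMP
SET [YEAR] = Year(CHAR_DATE), [MONTH] = Month(CHAR_DATE), TIME_ID = Format(CHAR_DATE, "yyyymm");
Make the TIME dimension table from TEMP:
SELECT DISTINCT TIME_ID, [YEAR], [MONTH]
INTO [TIME]
FROM TEMP;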
Because no time IDs exist in the source data and the date is contained within a single field, the student must create the TIME dimension table and assign the proper TIME_ID keys and attributes. A temporary table is created to aid in the creation of the CHARTER_FACT table. The queries in Table P13.4-1 are used in the transformation process:
Table P13.4-1 The ROBCOR Data Warehouse Queries
• Make a TEMP table from CHARTER, PILOT, and MODEL: Creates a temporary storage table used to make the necessary data transformations before the creation of the fact table.
• Update TIME_ID in TEMP: Used to create the TIME_ID key used in the TIME dimension table.
• Update YEAR and MONTH in TEMP: In order to get the year and month attributes into the TIME dimension, it is necessary to separate that data in the temporary table first. The date is in the TEMP table but will not be in the fact table.
• Make TIME table from TEMP: This query creates the TIME table using the appropriate data from the TEMP table.
• Aggregate TEMP table by fact keys: This query performs data aggregation over the data in the TEMP table. The query output will be used to create the new CHARTER_FACT table.
• Populate CHARTER_FACT table: This query uses the results of the previous query to populate our CHARTER_FACT table.
The MS Access relational diagram in Figure P13.4a reflects the star schema, the relationships, the table names, and the field names used in our solution. The student is given only the CHARTER, AIRCRAFT, MODEL, EMPLOYEE, PILOT, and CUSTOMER tables, and they must produce the fact table and the dimension tables.
Figure P13.4A The ROBCOR Relational Diagram
b. Define the dimensions and attributes for the charter operation’s star schema.
The dimensions are TIME, MODEL, and PILOT. Each of these dimensions is depicted in Figure P13.4a's star schema. The attributes are:
Time dimension: time ID, year, and month.
Model dimension: model code, manufacturer, name, number of seats, etc.
Pilot dimension: employee number, pilot license, pilot ratings, etc.
c. Define the necessary attribute hierarchies.
The main attribute hierarchy is based on the sequence year-month-model-pilot. The aggregate analysis is based on this hierarchy. We can produce a query to generate revenue, hours flown, and fuel used on a yearly basis. We can then drill down to a monthly time period to generate the aggregate information for each model of airplane. We can also drill down to get that information about each pilot.
d. Implement the data warehouse design, using the design components you developed in Problems 4a-4c.
The Ch13_P4sol.mdb database contains the data and solutions for Problems 4a-4c.
e. Generate the reports that will illustrate that your data warehouse is able to meet the specified information requirements.
The Ch13_P4sol.mdb database contains the solution for Problem 4e.
Using the data provided in the Ch13_SaleCo_DW database, solve the following problems.
ONLINE CONTENT The script files used to populate the database are available at www.cengagebrain.com. The script files are available in Oracle, MySQL, and SQL Server formats. MS Access does not have SQL support for the complex grouping required.
5. What is the SQL command to list the total sales by customer and by product, with subtotals by customer and a grand total for all product sales?
Oracle:
SELECT CUS_CODE, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT
GROUP BY ROLLUP (CUS_CODE, P_CODE);
SQL Server and MySQL:
SELECT CUS_CODE, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT
GROUP BY CUS_CODE, P_CODE WITH ROLLUP;
6. What is the SQL command to list the total sales by customer, month and product, with subtotals by customer and by month and a grand total for all product sales?
Oracle:
SELECT CUS_CODE, TM_MONTH, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY ROLLUP (CUS_CODE, TM_MONTH, P_CODE);
SQL Server and MySQL:
SELECT CUS_CODE, TM_MONTH, P_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY CUS_CODE, TM_MONTH, P_CODE WITH ROLLUP;
7. What is the SQL command to list the total sales by region and customer, with subtotals by region and a grand total for all sales?
Oracle:
SELECT REG_ID, CUS_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWCUSTOMER C ON S.CUS_CODE = C.CUS_CODE
GROUP BY ROLLUP (REG_ID, CUS_CODE);
SQL Server and MySQL:
SELECT REG_ID, CUS_CODE, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWCUSTOMER C ON S.CUS_CODE = C.CUS_CODE
GROUP BY REG_ID, CUS_CODE WITH ROLLUP;
8. What is the SQL command to list the total sales by month and product category, with subtotals by month and a grand total for all sales?
Oracle:
SELECT TM_MONTH, P_CATEGORY, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE
  JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY ROLLUP (TM_MONTH, P_CATEGORY);
SQL Server and MySQL:
SELECT TM_MONTH, P_CATEGORY, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE
  JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY TM_MONTH, P_CATEGORY WITH ROLLUP;
9. What is the SQL command to list the number of product sales (number of rows) and total sales by month, with subtotals by month and a grand total for all sales?
Oracle:
SELECT TM_MONTH, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY ROLLUP (TM_MONTH);
SQL Server and MySQL:
SELECT TM_MONTH, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY TM_MONTH WITH ROLLUP;
10. What is the SQL command to list the number of product sales (number of rows) and total sales by month and product category with subtotals by month and product category and a grand total for all sales?
Oracle:
SELECT TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE
  JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY ROLLUP (TM_MONTH, P_CATEGORY);
SQL Server and MySQL:
SELECT TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE
  JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY TM_MONTH, P_CATEGORY WITH ROLLUP;
11. What is the SQL command to list the number of product sales (number of rows) and total sales by month, product category and product, with subtotals by month and product category and a grand total for all sales?
Oracle:
SELECT TM_MONTH, P_CATEGORY, P_CODE, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID
  JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE
GROUP BY ROLLUP (TM_MONTH, P_CATEGORY, P_CODE);
SQL Server and MySQL:
SELECT TM_MONTH, P_CATEGORY, P_CODE, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWTIME T ON S.TM_ID = T.TM_ID
  JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE
GROUP BY TM_MONTH, P_CATEGORY, P_CODE WITH ROLLUP;
12. Using the answer to Problem 10 as your base, what command would you need to generate the same output but with subtotals in all columns? (Hint: Use the CUBE command.)
Oracle:
SELECT TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE
  JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY CUBE (TM_MONTH, P_CATEGORY);
SQL Server:
SELECT TM_MONTH, P_CATEGORY, COUNT(*) AS NUMPROD, SUM(SALE_UNITS*SALE_PRICE) AS TOTSALES
FROM DWDAYSALESFACT S JOIN DWPRODUCT P ON S.P_CODE = P.P_CODE
  JOIN DWTIME T ON S.TM_ID = T.TM_ID
GROUP BY TM_MONTH, P_CATEGORY WITH CUBE;
13. Create your own data analysis and visualization presentation. The purpose of this project is for you to search for a publicly available data set using the Internet and create your own presentation using what you have learned in this chapter.
a. Search for a data set that may interest you and download it. Some examples of public data set sources are:
• http://www.data.gov
• http://data.worldbank.org
• http://aws.amazon.com/datasets
• http://usgovxml.com/
• https://data.medicare.gov/
• http://www.faa.gov/data_research/
b. Use any tool available to you to analyze the data. You can use tools such as MS Excel PivotTables and PivotCharts, or other free tools such as Google Fusion Tables, the Tableau free trial, IBM Many Eyes, etc.
c. Create a short presentation to explain some of your findings (what the data sources are, where the data comes from, what the data represents, etc.).
There are an incredible number of possible visualizations that students can create for an exercise like this. Most students enjoy the opportunity to express their creativity in producing visually interesting solutions. Attempt to keep the focus on how the visualization might make the data actionable: what can we learn from the visualization, and how might a decision maker be influenced by it?
Data sources available: There are several public sources of large data sets that could be used by students to practice visualizations. Some of the most common sources are:
http://catalog.data.gov
http://data.worldbank.org
http://aws.amazon.com/datasets
http://usgovxml.com
https://data.medicare.gov
http://www.faa.gov/data_research/
https://www.cdc.gov/nchs/data_access/
https://data.world/
For some good examples of data visualizations, see the Centers for Disease Control and Prevention's Data Visualization Gallery at https://www.cdc.gov/nchs/data-visualization/
NOTE: The data files for this chapter have a Dashboards folder that contains two complete data visualization examples, including two data sets: H1B Visa data and Vehicle Crash data. In addition, there are sample dashboards built in Excel, MS Power BI, and Tableau. Read the included documentation explaining some of the data transformations applied to the raw data. These should serve as a good starting point for students on how to create some simple dashboards. See the sample figures below.
Figure P13.13A H1b Visa Applications Dashboard (Excel)
Figure P13.13B H1b Visa Applications Dashboard (PowerBi)
Figure P13.13C H1b Visa Applications Dashboard (Tableau)