This Document Contains Chapters 1 to 2 Chapter 1 Database Systems Discussion Focus How often have your students heard that “you have only one chance to make a good first impression?” That’s why it’s so important to sell the importance of databases and the desirability of good database design during the first class session. Start by showing your students that they interact with databases on a daily basis. For example, how many of them have bought anything using a credit card during the past day, week, month, or year? None of those transactions would be possible without a database. How many have shipped a document or a package via an overnight service or via certified or registered mail? How many have checked course catalogs and class schedules online? And surely all of your students registered for your class? Did anybody use a web search engine to look for – and find – information about almost anything? This point is easy to make: Databases are important because we depend on their existence to perform countless transactions and to provide information. If you are teaching in a classroom equipped with computers, give some “live” performances. For example, you can use the web to look up a few insurance quotes or compare car prices and models. Incidentally, this is a good place to make the very important distinction between data and information. In short, spend some time discussing the points made in Section 1.1, "Why Databases?" and Section 1.2 “Data vs. Information.” After demonstrating that modern daily life is almost inconceivable without the ever-present databases, discuss how important it is that the (database) transactions are made successfully, accurately, and quickly. That part of the discussion points to the importance of database design, which is at the heart of this book. If you want to have the keys to the information kingdom, you’ll want to know about database design and implementation. And, of course, databases don’t manage themselves … and that point leads to the importance of the database administration (DBA) function. There is a world of exciting database employment opportunities out there. After discussing why databases, database design, and database administration are important, you can move through the remainder of the chapter to develop the necessary vocabulary and concepts. The review questions help you do that … and the problems provide the chance to test the newfound knowledge. Answers to Review Questions 1. Define each of the following terms: a. data Raw facts from which the required information is derived. Data have little meaning unless they are grouped in a logical manner. b. field A character or a group of characters (numeric or alphanumeric) that describes a specific characteristic. A field may define a telephone number, a date, or other specific characteristics that the end user wants to keep track of. c. record A logically connected set of one or more fields that describes a person, place, event, or thing. For example, a CUSTOMER record may be composed of the fields CUST_NUMBER, CUST_LNAME, CUST_FNAME, CUST_INITIAL, CUST_ADDRESS, CUST_CITY, CUST_STATE, CUST_ZIPCODE, CUST_AREACODE, and CUST_PHONE. d. file Historically, a collection of file folders, properly tagged and kept in a filing cabinet. Although such manual files still exist, we more commonly think of a (computer) file as a collection of related records that contain information of interest to the end user. For example, a sales organization is likely to keep a file containing customer data. Keep in mind that the phrase related records reflects a relationship based on function. For example, customer data are kept in a file named CUSTOMER. The records in this customer file are related by the fact that they all pertain to customers. Similarly, a file named PRODUCT would contain records that describe products – the records in this file are all related by the fact that they all pertain to products. You would not expect to find customer data in a product file, or vice versa. NOTE Note: Field, record, and file are computer terms, created to help describe how data are stored in secondary memory. Emphasize that computer file data storage does not match the human perception of such data storage. 2. What is data redundancy, and which characteristics of the file system can lead to it? Data redundancy exists when unnecessarily duplicated data are found in the database. For example, a customer's telephone number may be found in the customer file, in the sales agent file, and in the invoice file. Data redundancy is symptomatic of a (computer) file system, given its inability to represent and manage data relationships. Data redundancy may also be the result of poorly designed databases that allow the same data to be kept in different locations. (Here's another opportunity to emphasize the need for good database design!) 3. What is data independence, and why is it lacking in file systems? Data independence is a condition in which the programs that access data are not dependent on the data storage characteristics of the data. Systems that lack data independence are said to exhibit data dependence. File systems exhibit data dependence because file access is dependent on a file's data characteristics. Therefore, any time the file data characteristics are changed, the programs that access the data within those files must be modified. Data independence exists when changes in the data characteristics don't require changes in the programs that access those data. File systems lack data independence because all data access programs are subject to change when any of the file system’s data storage characteristics – such as changing a data type -- change. 4. What is a DBMS, and what are its functions? A DBMS is best described as a collection of programs that manage the database structure and that control shared access to the data in the database. Current DBMSes also store the relationships between the database components; they also take care of defining the required access paths to those components. The functions of a current-generation DBMS may be summarized as follows: • The DBMS stores the definitions of data and their relationships (metadata) in a data dictionary; any changes made are automatically recorded in the data dictionary. • The DBMS creates the complex structures required for data storage. • The DBMS transforms entered data to conform to the data structures in item 2. • The DBMS creates a security system and enforces security within that system. • The DBMS creates complex structures that allow multiple user access to the data. • The DBMS performs backup and data recovery procedures to ensure data safety. • The DBMS promotes and enforces integrity rules to eliminate data integrity problems. • The DBMS provides access to the data via utility programs and from programming languages interfaces. • The DBMS provides end-user access to data within a computer network environment. 5. What is structual independence, and why is it important? Structural independence exists when data access programs are not subject to change when the file's structural characteristics, such as the number or order of the columns in a table, change. Structural independence is important because it substantially decreases programming effort and program maintenance costs. 6. Explain the differences between data, information, and a database Data are raw facts. Information is processed data to reveal the meaning behind the facts. Let’s summarize some key points: • Data constitute the building bocks of information. • Information is produced by processing data. • Information is used to reveal the meaning of data. • Good, relevant, and timely information is the key to good decision making. • Good decision making is the key to organizational survival in a global environment. A database is a computer structure for storing data in a shared, integrated fashion so that the data can be transformed into information as needed. 7. What is the role of a DBMS, and what are its advantages? What are its disadvantages? A database management system (DBMS) is a collection of programs that manages the database structure and controls access to the data stored in the database. Figure 1.2 (shown in the text) illustrates that the DBMS serves as the intermediary between the user and the database. The DBMS receives all application requests and translates them into the complex operations required to fulfill those requests. The DBMS hides much of the database’s internal complexity from the application programs and users. The application program might be written by a programmer using a programming language such as COBOL, Visual Basic, or C++, or it might be created through a DBMS utility program. Having a DBMS between the end user’s applications and the database offers some important advantages. First, the DBMS enables the data in the database to be shared among multiple applications or users. Second, the DBMS integrates the many different users’ views of the data into a single all-encompassing data repository. Because data are the crucial raw material from which information is derived, you must have a good way of managing such data. As you will discover in this book, the DBMS helps make data management more efficient and effective. In particular, a DBMS provides advantages such as: • Improved data sharing. The DBMS helps create an environment in which end users have better access to more and better-managed data. Such access makes it possible for end users to respond quickly to changes in their environment. • Better data integration. Wider access to well-managed data promotes an integrated view of the organization’s operations and a clearer view of the big picture. It becomes much easier to see how actions in one segment of the company affect other segments. • Minimized data inconsistency. Data inconsistency exists when different versions of the same data appear in different places. For example, data inconsistency exists when a company’s sales department stores a sales representative’s name as “Bill Brown” and the company’s personnel department stores that same person’s name as “William G. Brown” or when the company’s regional sales office shows the price of product “X” as $45.95 and its national sales office shows the same product’s price as $43.95. The probability of data inconsistency is greatly reduced in a properly designed database. • Improved data access. The DBMS makes it possible to produce quick answers to ad hoc queries. From a database perspective, a query is a specific request for data manipulation (for example, to read or update the data) issued to the DBMS. Simply put, a query is a question and an ad hoc query is a spur-of-the-moment question. The DBMS sends back an answer (called the query result set) to the application. For example, end users, when dealing with large amounts of sales data, might want quick answers to questions (ad hoc queries) such as: What was the dollar volume of sales by product during the past six months? What is the sales bonus figure for each of our salespeople during the past three months? How many of our customers have credit balances of $3,000 or more? • Improved decision making. Better-managed data and improved data access make it possible to generate better quality information, on which better decisions are based. • Increased end-user productivity. The availability of data, combined with the tools that transform data into usable information, empowers end users to make quick, informed decisions that can make the difference between success and failure in the global economy. The advantages of using a DBMS are not limited to the few just listed. In fact, you will discover many more advantages as you learn more about the technical details of databases and their proper design. Although the database system yields considerable advantages over previous data management approaches, database systems do carry significant disadvantages. For example: • Increased costs. Database systems require sophisticated hardware and software and highly skilled personnel. The cost of maintaining the hardware, software, and personnel required to operate and manage a database system can be substantial. Training, licensing, and regulation compliance costs are often overlooked when database systems are implemented. • Management complexity. Database systems interface with many different technologies and have a significant impact on a company’s resources and culture. The changes introduced by the adoption of a database system must be properly managed to ensure that they help advance the company’s objectives. Given the fact that databases systems hold crucial company data that are accessed from multiple sources, security issues must be assessed constantly. • Maintaining currency. To maximize the efficiency of the database system, you must keep your system current. Therefore, you must perform frequent updates and apply the latest patches and security measures to all components. Because database technology advances rapidly, personnel training costs tend to be significant. • Vendor dependence. Given the heavy investment in technology and personnel training, companies might be reluctant to change database vendors. As a consequence, vendors are less likely to offer pricing point advantages to existing customers, and those customers might be limited in their choice of database system components. • Frequent upgrade/replacement cycles. DBMS vendors frequently upgrade their products by adding new functionality. Such new features often come bundled in new upgrade versions of the software. Some of these versions require hardware upgrades. Not only do the upgrades themselves cost money, but it also costs money to train database users and administrators to properly use and manage the new features. 8. List and describe the different types of databases. The focus is on Section 1-3b, TYPES OF DATABASES. Organize the discussion around the number of users, database site location, and data use: • Number of users o Single-user o Multiuser o Workgroup o Enterprise • Database site location o Centralized o Distributed o Cloud-based • Type of data o General-purpose o Discipline-specific • Database use o Transactional (production) database (OLTP) o Data warehouse database (OLAP) • Degree of data structure o Unstructured data o Structured data For a description of each type of database, please see section 1-3b. 9. What are the main components of a database system? The basis of this discussion is Section 1-7a, THE DATABASE SYSTEM ENVIRONMENT. Figure 1.10 provides a good bird’s eye view of the components. Note that the system’s components are hardware, software, people, procedures, and data. 10. What is metadata? Metadata is data about data. That is, metadata define the data characteristics such as the data type (such as character or numeric) and the relationships that link the data. Relationships are an important component of database design. What makes relationships especially interesting is that they are often defined by their environment. For instance, the relationship between EMPLOYEE and JOB is likely to depend on the organization’s definition of the work environment. For example, in some organizations an employee can have multiple job assignments, while in other organizations – or even in other divisions within the same organization – an employee can have only one job assignment. The details of relationship types and the roles played by those relationships in data models are defined and described in Chapter 2, Data Models.”. Relationships will play a key role in subsequent chapters. You cannot effectively deal with database design issues unless you address relationships. 11. Explain why database design is important. The focus is on Section 1-4, WHY DATABASE DESIGN IS IMPORTANT. Explain that modern database and applications development software is so easy to use that many people can quickly learn to implement a simple database and develop simple applications within a week or so, without giving design much thought. As data and reporting requirements become more complex, those same people will simply (and quickly!) produce the required add-ons. That's how data redundancies and all their attendant anomalies develop, thus reducing the "database" and its applications to a status worse than useless. Stress these points: • Good applications can't overcome bad database designs. • The existence of a DBMS does not guarantee good data management, nor does it ensure that the database will be able to generate correct and timely information. • Ultimately, the end user and the designer decide what data will be stored in the database. A database created without the benefit of a detailed blueprint is unlikely to be satisfactory. Pose this question: would you think it smart to build a house without the benefit of a blueprint? So why would you want to create a database without a blueprint? (Perhaps it would be OK to build a chicken coop without a blueprint, but would you want your house to be built the same way?) 12. What are the potential costs of implementing a database system? Although the database system yields considerable advantages over previous data management approaches, database systems do impose significant costs. For example: • Increased acquisition and operating costs. Database systems require sophisticated hardware and software and highly skilled personnel. The cost of maintaining the hardware, software, and personnel required to operate and manage a database system can be substantial. • Management complexity. Database systems interface with many different technologies and have a significant impact on a company's resources and culture. The changes introduced by the adoption of a database system must be properly managed to ensure that they help advance the company's objectives. Given the fact that databases systems hold crucial company data that are accessed from multiple sources, security issues must be assessed constantly. • Maintaining currency. To maximize the efficiency of the database system, you must keep your system current. Therefore, you must perform frequent updates and apply the latest patches and security measures to all components. Because database technology advances rapidly, personnel training costs tend to be significant. • Vendor dependence. Given the heavy investment in technology and personnel training, companies may be reluctant to change database vendors. As a consequence, vendors are less likely to offer pricing point advantages to existing customers and those customers may be limited in their choice of database system components. 13. Use examples to compare and contrast unstructured and structured data. Which type is more prevalent in a typical business environment? Unstructured data are data that exist in their original (raw) state, that is, in the format in which they were collected. Therefore, unstructured data exist in a format that does not lend itself to the processing that yields information. Structured data are the result of taking unstructured data and formatting (structuring) such data to facilitate storage, use, and the generation of information. You apply structure (format) based on the type of processing that you intend to perform on the data. Some data might be not ready (unstructured) for some types of processing, but they might be ready (structured) for other types of processing. For example, the data value 37890 might refer to a zip code, a sales value, or a product code. If this value represents a zip code or a product code and is stored as text, you cannot perform mathematical computations with it. On the other hand, if this value represents a sales transaction, it is necessary to format it as numeric. If invoices are stored as images for future retrieval and display, you can scan them and save them in a graphic format. On the other hand, if you want to derive information such as monthly totals and average sales, such graphic storage would not be useful. Instead, you could store the invoice data in a (structured) spreadsheet format so that you can perform the requisite computations. Based on sheer volume, most data is unstructured or semistructured. Data for conducting actual business transactions is usually structured. 14. What are some basic database functions that a spreadsheet cannot perform. Spreadsheets do not support self-documentation through metadata, enforcement of data types or domains to ensure consistency of data within a column, defined relationships among tables, or constraints to ensure consistency of data across related tables. 15. What common problems do a collection of spreadsheets created by end users share with the typical file system? A collection of spreadsheets shares several problems with the typical file system. First problem is that end users create their own, private, copies of the data, which creates issues of data ownership. This situation also creates islands of information where changes to one set of data are not reflected in all of the copies of the data. This leads to the second problem – lack of data consistency. Because the data in various spreadsheets may be intended to represent a view of the business environment, a lack of consistency in the data may lead to faulty decision making based on inaccurate data. 16. Explain the significance of the loss of direct, hands-on access to business data that users experienced with the advent of computerized data repositories. Users lost direct, hands-on access to the business data when computerized data repositories were developed because the IT skills necessary to directly access and manipulate the data were beyond the average user's abilities, and because security precautions restricted access to the shared data. This was significant because it removed users from the direct manipulation of data and introduced significant time delays for data access. When users need answers to business questions from the data, necessity often does not give them the luxury of time to wait days, weeks, or even months for the required reports. The desire to return hands-on access to the data to the users, among other drivers, helped to propel the development of database systems. While database systems have greatly improved the ability of users to directly access data, the need to quickly manipulate data for themselves has lead to the problems of spreadsheets being used when databases are needed. 17. Explain why the cost of ownership may be lower with a cloud database than with a traditional, company database. Cloud databases reside on the Internet instead of within the organization’s own network infrastructure. This can reduce costs because the organization is not required to purchase and maintain the hardware and software necessary to house the database and support the necessary levels of system performance. Problem Solutions ONLINE CONTENT The file structures you see in this problem set are simulated in a Microsoft Access database named Ch01_Problems, available www.cengagebrain.com. Given the file structure shown in Figure P1.1, answer Problems 1 - 4. FIGURE P1.1 The File Structure for Problems 1-4 1. How many records does the file contain? How many fields are there per record? The file contains seven records (21-5Z through 31-7P) and each of the records is composed of five fields (PROJECT_CODE through PROJECT_BID_PRICE.) 2. What problem would you encounter if you wanted to produce a listing by city? How would you solve this problem by altering the file structure? The city names are contained within the MANAGER_ADDRESS attribute and decomposing this character (string) field at the application level is cumbersome at best. (Queries become much more difficult to write and take longer to execute when internal string searches must be conducted.) If the ability to produce city listings is important, it is best to store the city name as a separate attribute. 3. If you wanted to produce a listing of the file contents by last name, area code, city, state, or zip code, how would you alter the file structure? The more we divide the address into its component parts, the greater its information capabilities. For example, by dividing MANAGER_ADDRESS into its component parts (MGR_STREET, MGR_CITY, MGR_STATE, and MGR_ZIP), we gain the ability to easily select records on the basis of zip codes, city names, and states. Similarly, by subdividing the MANAGER name into its components MGR_LASTNAME, MGR_FIRSTNAME, and MGR_INITIAL, we gain the ability to produce more efficient searches and listings. For example, creating a phone directory is easy when you can sort by last name, first name, and initial. Finally, separating the area code and the phone number will yield the ability to efficiently group data by area codes. Thus MGR_PHONE might be decomposed into MGR_AREA_CODE and MGR_PHONE. The more you decompose the data into their component parts, the greater the search flexibility. Data that are decomposed into their most basic components are said to be atomic. 4. What data redundancies do you detect? How could those redundancies lead to anomalies? Note that the manager named Holly B. Parker occurs three times, indicating that she manages three projects coded 21-5Z, 25-9T, and 29-2D, respectively. (The occurrences indicate that there is a 1:M relationship between PROJECT and MANAGER: each project is managed by only one manager but, apparently, a manager may manage more than one project.) Ms. Parker's phone number and address also occur three times. If Ms. Parker moves and/or changes her phone number, these changes must be made more than once and they must all be made correctly... without missing a single occurrence. If any occurrence is missed during the change, the data are "different" for the same person. After some time, it may become difficult to determine what the correct data are. In addition, multiple occurrences invite misspellings and digit transpositions, thus producing the same anomalies. The same problems exist for the multiple occurrences of George F. Dorts. 5. Identify and discuss the serious data redundancy problems exhibited by the file structure shown in Figure P1.5. FIGURE P1.5 The File Structure for Problems 5-8 NOTE It is not too early to begin discussing proper structure. For example, you may focus student attention on the fact that, ideally, each row should represent a single entity. Therefore, each row's fields should define the characteristics of one entity, rather than include characteristics of several entities. The file structure shown here includes characteristics of multiple entities. For example, the JOB_CODE is likely to be a characteristic of a JOB entity. PROJ_NUM and PROJ_NAME are clearly characteristics of a PROJECT entity. Also, since (apparently) each project has more than one employee assigned to it, the file structure shown here shows multiple occurrences for each of the projects. (Hurricane occurs three times, Coast occurs twice, and Satellite occurs four times.) Given the file's poor structure, the stage is set for multiple anomalies. For example, if the charge for JOB_CODE = EE changes from $85.00 to $90.00, that change must be made twice. Also, if employee June H. Sattlemeier is deleted from the file, you also lose information about the existence of her JOB_CODE = EE, its hourly charge of $85.00, and the PROJ_HOURS = 17.5. The loss of the PROJ_HOURS value will ultimately mean that the Coast project costs are not being charged properly, thus causing a loss of PROJ_HOURS*JOB_CHG_HOUR = 17.5 x $85.00 = $1,487.50 to the company. Incidentally, note that the file contains different JOB_CHG_HOUR values for the same CT job code, thus illustrating the effect of changes in the hourly charge rate over time. The file structure appears to represent transactions that charge project hours to each project. However, the structure of this file makes it difficult to avoid update anomalies and it is not possible to determine whether a charge change is accurately reflected in each record. Ideally, a change in the hourly charge rate would be made in only one place and this change would then be passed on to the transaction based on the hourly charge. Such a structural change would ensure the historical accuracy of the transactions. You might want to emphasize that the recommended changes require a lot of work in a file system. 6. Looking at the EMP_NAME and EMP_PHONE contents in Figure P1.5, what change(s) would you recommend? A good recommendation would be to make the data more atomic. That is, break up the data componnts whenever possible. For example, separate the EMP_NAME into its componenst EMP_FNAME, EMP_INITIAL, and EMP_LNAME. This change will make it much easier to organize employee data through the employee name component. Similarly, the EMP_PHONE data should be decomposed into EMP_AREACODE and EMP_PHONE. For example, breaking up the phone number 653-234-3245 into the area code 653 and the phone number 234-3245 will make it much easier to organize the phone numbers by area code. (If you want to print an employee phone directory, the more atomic employee name data will make the job much easier.) 7. Identify the various data sources in the file you examined in Problem 5. Given their answers to problem 5 and some additional scrutiny of Figure 1.5, your students should be able to identify these data sources: • Employee data such as names and phone numbers. • Project data such as project names. If you start with an EMPLOYEE file, the project names clearly do not belong in that file. (Project names are clearly not employee characteristics.) • Job data such as the job charge per hour. If you start with an EMPLOYEE file, the job charge per hour clearly does not belong in that file. (Hourly charges are clearly not employee characteristics.) • The project hours, which are most likely the hours worked by the employee for that project. (Such hours are associated with a work product, not the employee per se.) 8. Given your answer to Problem 7, what new files should you create to help eliminate the data redundancies found in the file shown in Figure P1.5? The data sources are probably the PROJECT, EMPLOYEE, JOB, and CHARGE. The PROJECT file should contain project characteristics such as the project name, the project manager/coordinator, the project budget, and so on. The EMPLOYEE file might contain the employee names, phone number, address, and so on. The JOB file would contain the billing charge per hour for each of the job types – a database designer, an applications developer, and an accountant would generate different billing charges per hour. The CHARGE file would be used to keep track of the number of hours by job type that will be billed for each employee who worked on the project. 9. Identify and discuss the serious data redundancy problems exhibited by the file structure shown in Figure P1.9. (The file is meant to be used as a teacher class assignment schedule. One of the many problems with data redundancy is the likely occurrence of data inconsistencies – that two different initials have been entered for the teacher named Maria Cordoza.) FIGURE P1.9 The File Structure for Problems 9-10 Note that the teacher characteristics occur multiple times in this file. For example, the teacher named Maria Cordoza’s first name, last name, and initial occur three times. If changes must be made for any given teacher, those changes must be made multiple times. All it takes is one incorrect entry or one forgotten change to create data inconsistencies. Redundant data are not a luxury you can afford in a data environment. 10. Given the file structure shown in Figure P1.9, what problem(s) might you encounter if building KOM were deleted? You would lose all the time assignment data about teachers Williston, Cordoza, and Hawkins, as well as the KOM rooms 204E, 123, and 34. Here is yet another good reason for keeping data about specific entities in their own tables! This kind of an anomaly is known as a deletion anomaly. 11. Using your school’s student information system, print your class schedule. The schedule probably would contain the student identification number, student name, class code, class name, class credit hours, class instructor name, the class meeting days and times, and the class room number. Use Figure P1.11 as a template to complete the following actions. a. Create a spreadsheet using the template shown in Figure P1.11 and enter your current class schedule. b. Enter the class schedule of two of your classmates into the same spreadsheet. c. Discuss the redundancies and anomalies caused by this design. This could be a good “mini-group” problem – groups of 3 students max. Ask them to create their individual class schedules in separate spreadsheets and then, a single spreadsheet containing all their class schedules. This exercise should incentivate “group discussion” and discover data anomalies and brain storm better wayd to store the class schedule data. Students are likely to use MS Excel or Google Sheets to create a simple tabular spreasheet containing the data outlined in Figure P1.11. The rows of the spreadsheet(s) will represent each one of the classes they are taking. Students are likely to identify the redundancies around the class information since all three schedules (the student’s own schedule plus the schedules of the two classmates) will have at least the database class in common. This easily leads to discussions of separating the data into at least two tables in a database. However, that still leaves the redundancies of redundant student data with each class that they are taking. Astute students might realize that this is analogous to the Employee Skill Certifications shown in Figures 1.4 and Figure 1.5, such that a table for student data, a table for class data, and a table to relate the students and classes is appropriate. Chapter 2 Data Models Discussion Focus Although all of the topics covered in this chapter are important, our students have given us consistent feedback: If you can write precise business rules from a description of operations, database design is not that difficult. Therefore, once data modeling (Sections 2-1, "Data Modeling and Data Models", Section 2-2 "The Importance of Data Models,” and 2-3, “Data Model Basic Building Blocks,”) has been examined in detail, Section 2-4, “Business Rules,” should receive a lot of class time and attention. Perhaps it is useful to argue that the answers to questions 2 and 3 in the Review Questions section are the key to successful design. That’s why we have found it particularly important to focus on business rules and their impact on the database design process. What are business rules, what is their source, and why are they crucial? Business rules are precisely written and unambiguous statements that are derived from a detailed description of an organization's operations. When written properly, business rules define one or more of the following modeling components: • entities • relationships • attributes • connectivities • cardinalities – these will be examined in detail in Chapter 3, “The Relational Database Model.” Basically, the cardinalities yield the minimum and maximum number of entity occurrences in an entity. For example, the relationship decribed by “a professor teaches one or more classes” means that the PROFESSOR entity is referenced at least once and no more than four times in the CLASS entity. • constraints Because the business rules form the basis of the data modeling process, their precise statement is crucial to the success of the database design. And, because the business rules are derived from a precise description of operations, much of the design's success depends on the accuracy of the description of operations. Examples of business rules are: • An invoice contains one or more invoice lines. • Each invoice line is associated with a single invoice. • A store employs many employees. • Each employee is employed by only one store. • A college has many departments. • Each department belongs to a single college. (This business rule reflects a university that has multiple colleges such as Business, Liberal Arts, Education, Engineering, etc.) • A driver may be assigned to drive many different vehicles. • Each vehicle can be driven by many drivers. (Note: Keep in mind that this business rule reflects the assignment of drivers during some period of time.) • A client may sign many contracts. • Each contract is signed by only one client. • A sales representative may write many contracts. • Each contract is written by one sales representative. Note that each relationship definition requires the definition of two business rules. For example, the relationship between the INVOICE and (invoice) LINE entities is defined by the first two business rules in the bulleted list. This two-way requirement exists because there is always a two-way relationship between any two related entities. (This two-way relationship description also reflects the implementation by many of the available database design tools.) Keep in mind that the ER diagrams cannot always reflect all of the business rules. For example, examine the following business rule: A customer cannot be given a credit line over $10,000 unless that customer has maintained a satisfactory credit history (as determined by the credit manager) during the past two years. This business rule describes a constraint that cannot be shown in the ER diagram. The business rule reflected in this constraint would be handled at the applications software level through the use of a trigger or a stored procedure. (Your students will learn about triggers and stored procedures in Chapter 8, “Advanced SQL.”) Given their importance to successful design, we cannot overstate the importance of business rules and their derivation from properly written description of operations. It is not too early to start asking students to write business rules for simple descriptions of operations. Begin by using familiar operational scenarios, such as buying a book at the book store, registering for a class, paying a parking ticket, or renting a DVD. Also, try reversing the process: Give the students a chance to write the business rules from a basic data model such as the one represented by the text’s Figure 2.1 and 2.2. Ask your students to write the business rules that are the foundation of the relational diagram in Figure 2.2 and then point their attention to the relational tables in Figure 2.1 to indicate that an AGENT occurrence can occur multiple times in the CUSTOMER entity, thus illustrating the implementation impact of the business rules An agent can serve many customers. Each customer is served by one agent. Answers to Review Questions 1. Discuss the importance of data modeling. A data model is a relatively simple representation, usually graphical, of a more complex real world object event. The data model’s main function is to help us understand the complexities of the real-world environment. The database designer uses data models to facilitate the interaction among designers, application programmers, and end users. In short, a good data model is a communications device that helps eliminate (or at least substantially reduce) discrepancies between the database design’s components and the real world data environment. The development of data models, bolstered by powerful database design tools, has made it possible to substantially diminish the database design error potential. (Review Sections 2.1 and 2.2 in detail.) 2. What is a business rule, and what is its purpose in data modeling? A business rule is a brief, precise, and unambigous description of a policy, procedure, or principle within a specific organization’s environment. In a sense, business rules are misnamed: they apply to any organization -- a business, a government unit, a religious group, or a research laboratory; large or small -- that stores and uses data to generate information. Business rules are derived from a description of operations. As its name implies, a description of operations is a detailed narrative that describes the operational environment of an organization. Such a description requires great precision and detail. If the description of operations is incorrect or inomplete, the business rules derived from it will not reflect the real world data environment accurately, thus leading to poorly defined data models, which lead to poor database designs. In turn, poor database designs lead to poor applications, thus setting the stage for poor decision making – which may ultimately lead to the demise of the organization. Note especially that business rules help to create and enforce actions within that organization’s environment. Business rules must be rendered in writing and updated to reflect any change in the organization’s operational environment. Properly written business rules are used to define entities, attributes, relationships, and constraints. Because these components form the basis for a database design, the careful derivation and definition of business rules is crucial to good database design. 3. How do you translate business rules into data model components? As a general rule, a noun in a business rule will translate into an entity in the model, and a verb (active or passive) associating nouns will translate into a relationship among the entities. For example, the business rule “a customer may generate many invoices” contains two nouns (customer and invoice) and a verb (“generate”) that associates them. 4. Describe the basic features of the relational data model and discuss their importance to the end user and the designer. A relational database is a single data repository that provides both structural and data independence while maintaining conceptual simplicity. The relational database model is perceived by the user to be a collection of tables in which data are stored. Each table resembles a matrix composed of row and columns. Tables are related to each other by sharing a common value in one of their columns. The relational model represents a breakthrough for users and designers because it lets them operate in a simpler conceptual environment. End users find it easier to visualize their data as a collection of data organized as a matrix. Designers find it easier to deal with conceptual data representation, freeing them from the complexities associated with physical data representation. 5. Explain how the entity relationship (ER) model helped produce a more structured relational database design environment. An entity relationship model, also known as an ERM, helps identify the database's main entities and their relationships. Because the ERM components are graphically represented, their role is more easily understood. Using the ER diagram, it’s easy to map the ERM to the relational database model’s tables and attributes. This mapping process uses a series of well-defined steps to generate all the required database structures. (This structures mapping approach is augmented by a process known as normalization, which is covered in detail in Chapter 6 “Normalization of Database Tables.”) 6. Consider the scenario described by the statement “A customer can make many payments, but each payment is made by only one customer” as the basis for an entity relationship diagram (ERD) representation. This scenario yields the ERDs shown in Figure Q2.6. (Note the use of the PowerPoint Crow’s Foot template. We will start using the Visio Professional-generated Crow’s Foot ERDs in Chapter 3, but you can, of course, continue to use the template if you do not have access to Visio Professional.) Figure Q2.6 The Chen and Crow’s Foot ERDs for Question 6 NOTE Remind your students again that we have not (yet) illustrated the effect of optional relationships on the ERD’s presentation. Optional relationships and their treatment are covered in detail in Chapter 4, “Entity Relationship (ER) Modeling.” 7. Why is an object said to have greater semantic content than an entity? An object has greater semantic content because it embodies both data and behavior. That is, the object contains, in addition to data, also the description of the operations that may be performed by the object. 8. What is the difference between an object and a class in the object oriented data model (OODM)? An object is an instance of a specific class. It is useful to point out that the object is a run-time concept, while the class is a more static description. Objects that share similar characteristics are grouped in classes. A class is a collection of similar objects with shared structure (attributes) and behavior (methods.) Therefore, a class resembles an entity set. However, a class also includes a set of procedures known as methods. 9. How would you model Question 6 with an OODM? (Use Figure 2.4 as your guide.) The OODM that corresponds to question 6’s ERD is shown in Figure Q1.9: Figure Q2.9 The OODM Model for Question 9 10. What is an ERDM, and what role does it play in the modern (production) database environment? The Extended Relational Data Model (ERDM) is the relational data model’s response to the Object Oriented Data Model (OODM.) Most current RDBMSes support at least a few of the ERDM’s extensions. For example, support for large binary objects (BLOBs) is now common. Although the "ERDM" label has frequently been used in the database literature to describe the relational database model's response to the OODM's challenges, C. J. Date objects to the ERDM label for the following reasons: • The useful contribution of "the object model" is its ability to let users define their own -- and often very complex -- data types. However, mathematical structures known as "domains" in the relational model also provide this ability. Therefore, a relational DBMS that properly supports such domains greatly diminishes the reason for using the object model. Given proper support for domains, relational database models are quite capable of handling the complex data encountered in time series, engineering design, office automation, financial modeling, and so on. Because the relational model can support complex data types, the notion of an "extended relational database model" or ERDM is "extremely inappropriate and inaccurate" and "it should be firmly resisted." (The capability that is supposedly being extended is already there!) • Even the label object/relational model (O/RDM) is not quite accurate, because the relational database model's domain is not an object model structure. However, there are already quite a few O/R products -- also known as Universal Database Servers -- on the market. Therefore, Date concedes that we are probably stuck with the O/R label. In fact, Date believes that "an O/R system is in everyone's future." More precisely, Date argues that a true O/R system would be "nothing more nor less than a true relational system -- which is to say, a system that supports the relational model, with all that such support entails." C. J. Date concludes his discussion by observing that "We need do nothing to the relational model achieve object functionality. (Nothing, that is, except implement it, something that doesn't yet seem to have been tried in the commercial world.)" 11. What is a relationship, and what three types of relationships exist? A relationship is an association among (two or more) entities. Three types of relationships exist: one-to-one (1:1), one-to-many (1:M), and many-to-many (M:N or M:M.) 12. Give an example of each of the three types of relationships. 1:1 An academic department is chaired by one professor; a professor may chair only one academic department. 1:M A customer may generate many invoices; each invoice is generated by one customer. M:N An employee may have earned many degrees; a degree may have been earned by many employees. 13. What is a table, and what role does it play in the relational model? Strictly speaking, the relational data model bases data storage on relations. These relations are based on algebraic set theory. However, the user perceives the relations to be tables. In the relational database environment, designers and users perceive a table to be a matrix consisting of a series of row/column intersections.Tables, also called relations, are related to each other by sharing a common entity characteristic. For example, an INVOICE table would contain a customer number that points to that same number in the CUSTOMER table. This feature enables the RDBMS to link invoices to the customers who generated them. Tables are especially useful from the modeling and implementation perspecectives. Because tables are used to describe the entities they represent, they provide ane asy way to summarize entity characteristics and relationships among entities. And, because they are purely conceptual constructs, the designer does not need to be concerned about the physical implementation aspects of the database design. 14. What is a relational diagram? Give an example. A relational diagram is a visual representation of the relational database’s entities, the attributes within those entities, and the relationships between those entities. Therefore, it is easy to see what the entities represent and to see what types of relationships (1:1, 1:M, M:N) exist among the entities and how those relationships are implemented. An example of a relational diagram is found in the text’s Figure 2.2. MS Access, Database Tools, “Relationships”option on the main Acces menu could be used to illustrate simple relational diagrams. 15. What is connectivity? (Use a Crow’s Foot ERD to illustrate connectivity.) Connectivity is the relational term to describe the types of relationships (1:1, 1:M, M:N). In the figure, the businesss rule that an advisor can advise many students and a student has only one assigned advisor is shown with in a relationship with a connectivity of 1:M. The business rule that a student can register only one vehicle to park on campus and a vehicle can be registered by only one student is shown with a relationship with a connectivity of 1:1. Finally, the rule that a student can register for many classes, and a class can be registered for by many students, is shown by the relationship with a connectivity of M:N. 16. Describe the Big Data phenomenon. Over the last few years, a new wave of data has “emerged” to the limelight. Such data have alsways exsisted but did not recive the attention that is receiving today. These data are characterized for being high volume (petabyte size and beyond), high frequency (data are generated almost constantly), and mostly semi-structured. These data come from multiple and vatied sources such as web site logs, web site posts in social sites, and machine generated information (GPS, sensors, etc.) Such data; have been accumulated over the years and companies are now awakining to the fact that it contains a lot of hidden information that could help the day-to-day business (such as browsing patterns, purchasing preferences, behaivor patterns, etc.) The need to manage and leverage this data has triggered a phenomenon labeled “Big Data”. Big Data refers to a movement to find new and better ways to manage large amounts of web-generated data and derive business insight from it, while, at the same time, providing high performance and scalability at a reasonable cost. 17. What does the term “3 vs” refers to? The term “3 Vs” refers to the 3 basic characteristics of Big Data databases, they are: • Volume: Refers to the amounts of data being stored. With the adoption and growth of the Internet and social media, companies have multiplied the ways to reach customers. Over the years, and with the benefit of technological advances, data for millions of e-transactions were being stored daily on company databases. Furthermore, organizations are using multiple technologies to interact with end users and those technologies are generating mountains of data. This ever-growing volume of data quickly reached petabytes in size and it's still growing. • Velocity: Refers not only to the speed with which data grows but also to the need to process these data quickly in order to generate information and insight. With the advent of the Internet and social media, business responses times have shrunk considerably. Organizations need not only to store large volumes of quickly accumulating data, but also need to process such data quickly. The velocity of data growth is also due to the increase in the number of different data streams from which data is being piped to the organization (via the web, e-commerce, Tweets, Facebook posts, emails, sensors, GPS, and so on). • Variety: Refers to the fact that the data being collected comes in multiple different data formats. A great portion of these data comes in formats not suitable to be handled by the typical operational databases based on the relational model. The 3 Vs framework illustrates what companies now know, that the amount of data being collected in their databases has been growing exponentially in size and complexity. Traditional relational databases are good at managing structured data but are not well suited to managing and processing the amounts and types of data being collected in today's business environment. 18. What is Haddop and what are its basic components? In order to create value from their previously unused Big Data stores, companies are using new Big Data technologies. These emerging technologies allow organizations to process massive data stores of multiple formats in cost-effective ways. Some of the most frequently used Big Data technologies are Hadoop and MapReduce. • Hadoop is a Java based, open source, high speed, fault-tolerant distributed storage and computational framework. Hadoop uses low-cost hardware to create clusters of thousands of computer nodes to store and process data. Hadoop originated from Google's work on distributed file systems and parallel processing and is currently supported by the Apache Software Foundation. Hadoop has several modules, but the two main components are Hadoop Distributed File System (HDFS) and MapReduce. • Hadoop Distributed File System (HDFS) is a highly distributed, fault-tolerant file storage system designed to manage large amounts of data at high speeds. In order to achieve high throughput, HDFS uses the write-once, read many model. This means that once the data is written, it cannot be modified. HDFS uses three types of nodes: a name node that stores all the metadata about the file system; a data node that stores fixed-size data blocks (that could be replicated to other data nodes) and a client node that acts as the interface between the user application and the HDFS. • MapReduce is an open source application programming interface (API) that provides fast data analytics services. MapReduce distributes the processing of the data among thousands of nodes in parallel. MapReduce works with structured and nonstructured data. The MapReduce framework provides two main functions, Map and Reduce. In general terms, the Map function takes a job and divides it into smaller units of work; the Reduce function collects all the output results generated from the nodes and integrates them into a single result set. 19. Define and describe the basic characteristics of a NoSQL database. Every time you search for a product on Amazon, send messages to friends in Facebook, watch a video in YouTube or search for directions in Google Maps, you are using a NoSQL database. NoSQL refers to a new generation of databases that address the very specific challenges of the “big data” era and have the following general characteristics: • Not based on the relational model. These databases are generally based on a variation of the key-value data model rather than in the relational model, hence the NoSQL name. The key-value data model is based on a structure composed of two data elements: a key and a value; in which for every key there is a corresponding value (or a set of values). The key-value data model is also referred to as the attribute-value or associative data model. In the key-value data model, each row represents one attribute of one entity instance. The “key” column points to an attribute and the “value” column contains the actual value for the attribute. The data type of the “value” column is generally a long string to accommodate the variety of actual data types of the values that are placed in the column. • Support distributed database architectures. One of the big advantages of NoSQL databases is that they generally use a distributed architecture. In fact, several of them (Cassandra, Big Table) are designed to use low cost commodity servers to form a complex network of distributed database nodes • Provide high scalability, high availability and fault tolerance. NoSQL databases are designed to support the ability to add capacity (add database nodes to the distributed database) when the demand is high and to do it transparently and without downtime. Fault tolerant means that if one of the nodes in the distributed database fails, the database will keep operating as normal. • Support very large amounts of sparse data. Because NoSQL databases use the key-value data model, they are suited to handle very high volumes of sparse data; that is for cases where the number of attributes is very large but the number of actual data instances is low. • Geared toward performance rather than transaction consistency. One of the biggest problems of very large distributed databases is to enforce data consistency. Distributed databases automatically make copies of data elements at multiple nodes – to ensure high availability and fault tolerance. If the node with the requested data goes down, the request can be served from any other node with a copy of the data. However, what happen if the network goes down during a data update? In a relational database, transaction updates are guaranteed to be consistent or the transaction is rolled back. NoSQL databases sacrifice consistency in order to attain high levels of performance. NoSQL databases provide eventual consistency. Eventual consistency is a feature of NoSQL databases that indicates that data are not guaranteed to be consistent immediately after an update (across all copies of the data) but rather, that updates will propagate through the system and eventually all data copies will be consistent. 20. Using the example of a medical clinic with patients and tests, provide a simple representation of how to model this example using the relational model and how it wold be represented using the key-value data modeling technique. As you can see in Figure Q2.20, the relational model stores data in a tabular format in which each row represents a “record” for a given patient. While, the key-value data model uses three differnet fields to represent each data element in the record. Therefore, for each patient row, there are three rows in the key-value model. 21. What is logical independence? Logical independence exists when you can change the internal model without affecting the conceptual model. When you discuss logical and other types of independence, it’s worthwhile to discuss and review some basic modeling concepts and terminology: • In general terms, a model is an abstraction of a more complex real-world object or event. A model’s main function is to help you understand the complexities of the real-world environment. Within the database environment, a data model represents data structures and their characteristics, relations, constraints, and transformations. As its name implies, a purely conceptual model stands at the highest level of abstraction and focuses on the basic ideas (concepts) that are explored in the model, without specifying the details that will enable the designer to implement the model. For example, a conceptual model would include entities and their relationships and it may even include at least some of the attributes that define the entities, but it would not include attribute details such as the nature of the attributes (text, numeric, etc.) or the physical storage requirements of those atttributes. • The terms data model and database model are often used interchangeably. In the text, the term database model is be used to refer to the implementation of a data model in a specific database system. • Data models (relatively simple representations, usually graphical, of more complex real-world data structures), bolstered by powerful database design tools, have made it possible to substantially diminish the potential for errors in database design. • The internal model is the representation of the database as “seen” by the DBMS. In other words, the internal model requires the designer to match the conceptual model’s characteristics and constraints to those of the selected implementation model. • An internal schema depicts a specific representation of an internal model, using the database constructs supported by the chosen database. • The external model is the end users’ view of the data environment. 22. What is physical independence? You have physical independence when you can change the physical model without affecting the internal model. Therefore, a change in storage devices or methods and even a change in operating system will not affect the internal model. The terms physical model and internal model may require a bit of additional discussion: • The physical model operates at the lowest level of abstraction, describing the way data are saved on storage media such as disks or tapes. The physical model requires the definition of both the physical storage devices and the (physical) access methods required to reach the data within those storage devices, making it both software- and hardware-dependent. The storage structures used are dependent on the software (DBMS, operating system) and on the type of storage devices that the computer can handle. The precision required in the physical model’s definition demands that database designers who work at this level have a detailed knowledge of the hardware and software used to implement the database design. • The internal model is the representation of the database as “seen” by the DBMS. In other words, the internal model requires the designer to match the conceptual model’s characteristics and constraints to those of the selected implementation model. An internal schema depicts a specific representation of an internal model, using the database constructs supported by the chosen database. Problem Solutions Use the contents of Figure 2.1 to work problems 1-3. 1. Write the business rule(s) that governs the relationship between AGENT and CUSTOMER. Given the data in the two tables, you can see that an AGENT – through AGENT_CODE -- can occur many times in the CUSTOMER table. But each customer has only one agent. Therefore, the business rules may be written as follows: One agent can have many customers. Each customer has only one agent. Given these business rules, you can conclude that there is a 1:M relationship between AGENT and CUSTOMER. 2. Given the business rule(s) you wrote in Problem 1, create the basic Crow’s Foot ERD. The Crow’s Foot ERD is shown in Figure P2.2a. Figure P2.2a The Crow’s Foot ERD for Problem 3 For discussion purposes, you might use the Chen model shown in Figure P2.2b. Compare the two representations of the business rules by noting the different ways in which connectivities (1,M) are represented. The Chen ERD is shown in Figure P2.2b. Figure P2.2b The Chen ERD for Problem 2 3. Using the ERD you drew in Problem 2, create the equivalent Object representation and UML class diagram. (Use Figure 2.4 as your guide.) The OO model is shown in Figure P2.3. Figure P2.3a The OO Model for Problem 3 Figure P.3b The UML Model for Problem 3 Using Figure P2.4 as your guide, work Problems 4–5. The DealCo relational diagram shows the initial entities and attributes for the DealCo stores, located in two regions of the country. Figure P2.4 The DealCo relational diagram 4. Identify each relationship type and write all of the business rules. One region can be the location for many stores. Each store is located in only one region. Therefore, the relationship between REGION and STORE is 1:M. Each store employs one or more employees. Each employee is employed by one store. (In this case, we are assuming that the business rule specifies that an employee cannot work in more than one store at a time.) Therefore, the relationship between STORE and EMPLOYEE is 1:M. A job – such as accountant or sales representative -- can be assigned to many employees. (For example, one would reasonably assume that a store can have more than one sales representative. Therefore, the job title “Sales Representative” can be assigned to more than one employee at a time.) Each employee can have only one job assignment. (In this case, we are assuming that the business rule specifies that an employee cannot have more than one job assignment at a time.) Therefore, the relationship between JOB and EMPLOYEE is 1:M. 5. Create the basic Crow’s Foot ERD for DealCo. The Crow’s Foot ERD is shown in Figure P2.5a. Figure P2.5a The Crow’s Foot ERD for DealCo The Chen model is shown in Figure P2.5b. (Note that you always read the relationship from the “1” to the “M” side.) Figure P2.5b The Chen ERD for DealCo Using Figure P2.6 as your guide, work Problems 6−8 The Tiny College relational diagram shows the initial entities and attributes for Tiny College. Figure P2.6 The Tiny College relational diagram 6. Identify each relationship type and write all of the business rules. The simplest way to illustrate the relationship between ENROLL, CLASS, and STUDENT is to discuss the data shown in Table P2.6. As you examine the Table P2.6 contents and compare the attributes to relational schema shown in Figure P2.6, note these features: • We have added an attribute, ENROLL_SEMESTER, to identify the enrollment period. • Naturally, no grade has yet been assigned when the student is first enrolled, so we have entered a default value “NA” for “Not Applicable.” The letter grade – A, B, C, D, F, I (Incomplete), or W (Withdrawal) -- will be entered at the conclusion of the enrollment period, the SPRING-12 semester. • Student 11324 is enrolled in two classes; student 11892 is enrolled in three classes, and student 10345 is enrolled in one class. Table P2.6 Sample Contents of an ENROLL Table STU_NUM CLASS_CODE ENROLL_SEMESTER ENROLL_GRADE 11324 MATH345-04 SPRING-14 NA 11324 ENG322-11 SPRING-14 NA 11892 CHEM218-05 SPRING-14 NA 11892 ENG322-11 SPRING-14 NA 11892 CIS431-01 SPRING-14 NA 10345 ENG322-07 SPRING-14 NA All of the relationships are 1:M. The relationships may be written as follows: COURSE generates CLASS. One course can generate many classes. Each class is generated by one course. CLASS is referenced in ENROLL. One class can be referenced in enrollment many times. Each individual enrollment references one class. Note that the ENROLL entity is also related to STUDENT. Each entry in the ENROLL entity references one student and the class for which that student has enrolled. A student cannot enroll in the same class more than once. If a student enrolls in four classes, that student will appear in the ENROLL entity four times, each time for a different class. STUDENT is shown in ENROLL. One student can be shown in enrollment many times. (In database design terms, “many” simply means “more than once.”) Each individual enrollment entry shows one student. 7. Create the basic Crow’s Foot ERD for Tiny College. The Crow’s Foot model is shown in Figure P2.7a. Figure P2.7a The Crow’s Foot Model for Tiny College The Chen model is shown in Figure P2.7b. Figure P2.7b The Chen Model for Tiny College 8. Create the UML class diagram that reflects the entities and relationships you identified in the relational diagram. The OO model is shown in Figure P2.8. Figure P2.8a The OO Model for Tiny College Figure P2.8b The UML Model for Tiny College 9. Typically, a patient staying in a hospital receives medications that have been ordered by a particular doctor. Because the patient often receives several medications per day, there is a 1:M relationship between PATIENT and ORDER. Similarly, each order can include several medications, creating a 1:M relationship between ORDER and MEDICATION. a. Identify the business rules for PATIENT, ORDER, and MEDICATION. The business rules reflected in thePATIENT description are: A patient can have many (medical) orders written for him or her. Each (medical) order is written for a single patient. The business rules refected in the ORDER description are: Each (medical) order can prescribe many medications. Each medication can be prescribed in many orders. The business rules refected in the MEDICATION description are: Each medication can be prescribed in many orders. Each (medical) order can prescribe many medications. b. Create a Crow's Foot ERD that depicts a relational database model to capture these business rules. Figure P2.9 Crow's foot ERD for Problem 9 10. United Broke Artists (UBA) is a broker for not-so-famous painters. UBA maintains a small network database to track painters, paintings, and galleries. A painting is painted by a particular artist, and that painting is exhibited in a particular gallery. A gallery can exhibit many paintings, but each painting can be exhibited in only one gallery. Similarly, a painting is painted by a single painter, but each painter can paint many paintings. Using PAINTER, PAINTING, and GALLERY, in terms of a relational database: a. What tables would you create, and what would the table components be? We would create the three tables shown in Figure P2.10a. (Use the teacher’s Ch02_UBA database in your instructor's resources to illustrate the table contents.) FIGURE P2.10a The UBA Database Tables As you discuss the UBA database contents, note in particular the following business rules that are reflected in the tables and their contents: • A painter can paint may paintings. • Each painting is painted by only one painter. • A gallery can exhibit many paintings. • A painter can exhibit paintings at more than one gallery at a time. (For example, if a painter has painted six paintings, two may be exhibited in one gallery, one at another, and three at the third gallery. Naturally, if galleries specify exclusive contracts, the database must be changed to reflect that business rule.) • Each painting is exhibited in only one gallery. The last business rule reflects the fact that a painting can be physically located in only one gallery at a time. If the painter decides to move a painting to a different gallery, the database must be updated to remove the painting from one gallery and add it to the different gallery. b. How might the (independent) tables be related to one another? Figure P2.10b shows the relationships. FIGURE P2.10b The UBA Relational Model 11. Using the ERD from Problem 10, create the relational schema. (Create an appropriate collection of attributes for each of the entities. Make sure you use the appropriate naming conventions to name the attributes.) The relational diagram is shown in Figure P2.11. FIGURE P2.11 The Relational Diagram for Problem 11 12. Convert the ERD from Problem 10 into the corresponding UML class diagram. The basic UML solution is shown in Figure P2.12. FIGURE P2.12 The UML for Problem 12 13. Describe the relationships (identify the business rules) depicted in the Crow’s Foot ERD shown in Figure P2.13. Figure P2.13 The Crow’s Foot ERD for Problem 13 The business rules may be written as follows: • A professor can teach many classes. • Each class is taught by one professor. • A professor can advise many students. • Each student is advised by one professor. 14. Create a Crow’s Foot ERD to include the following business rules for the ProdCo company: a. Each sales representative writes many invoices. b. Each invoice is written by one sales representative. c. Each sales representative is assigned to one department. d. Each department has many sales representatives. e. Each customer can generate many invoices. f. Each invoice is generated by one customer. The Crow’s Foot ERD is shown in Figure P2.23. Note that a 1:M relationship is always read from the one (1) to the many (M) side. Therefore, the customer-invoice relationship is read as “one customer generates many invoices.” Figure P2.14 Crow’s Foot ERD for the ProdCo Company 15. Write the business rules that are reflected in the ERD shown in Figure P2.15. (Note that the ERD reflects some simplifying assumptions. For example, each book is written by only one author. Also, remember that the ERD is always read from the “1” to the “M” side, regardless of the orientation of the ERD components.) FIGURE P2.15 The Crow’s Foot ERD for Problem 15 The relationships are best described through a set of business rules: • One publisher can publish many books. • Each book is published by one publisher. • A publisher can submit many (book) contracts. • Each (book) contract is submitted by one publisher. • One author can sign many contracts. • Each contract is signed by one author. • One author can write many books. • Each book is written by one author. This ERD will be a good basis for a discussion about what happens when more realistic assumptions are made. For example, a book – such as this one – may be written by more than one author. Therefore, a contract may be signed by more than one author. Your students will learn how to model such relationships after they have become familiar with the material in Chapter 3. 16. Create a Crow’s Foot ERD for each of the following descriptions. (Note: The word many merely means “more than one” in the database modeling environment.) a. Each of the MegaCo Corporation’s divisions is composed of many departments. Each of those departments has many employees assigned to it, but each employee works for only one department. Each department is managed by one employee, and each of those managers can manage only one department at a time. The Crow’s Foot ERD is shown in Figure P2.16a. FIGURE P2.16a The MegaCo Crow’s Foot ERD As you discuss the contents of Figure P2.16a, note the 1:1 relationship between the EMPLOYEE and the DEPARTMENT in the “manages” relationship and the 1:M relationship between the DEPARTMENT and the EMPLOYEE in the “is assigned to” relationship. b. During some period of time, a customer can download many ebooks from BooksOnline. Each of the ebooks can be downloaded by many customers during that period of time. The solution is presented in Figure P2.16b. Note the M:N relationship between CUSTOMER and EBOOK. Such a relationship is not implementable in a relational model. FIGURE P2.16b The BigVid Crow’s Foot ERD If you want to let the students convert Figure P2.16b’s ERD into an implementable ERD, add a third DOWNLOAD entity to create a 1:M relationship between CUSTOMER and DOWNLOAD and a 1:M relationship between EBOOK and DOWNLOAD. (Note that such a conversion has been shown in the next problem solution.) c. An airliner can be assigned to fly many flights, but each flight is flown by only one airliner. FIGURE P2.16c The Airline Crow’s Foot ERD We have created a small Ch02_Airline database to let you explore the implementation of the model. (Check the data files available for Instructors at www.cengagebrain.com.) The tables and the relational diagram are shown in the following two figures. FIGURE P2.16c The Airline Database Tables FIGURE P2.16c The Airline Relational Diagram d. The KwikTite Corporation operates many factories. Each factory is located in a region. Each region can be “home” to many of KwikTite’s factories. Each factory employs many employees, but each of those employees is employed by only one factory. The solution is shown in Figure P2.16d. FIGURE P2.16d The KwikTite Crow’s Foot ERD e. An employee may have earned many degrees, and each degree may have been earned by many employees. The solution is shown in Figure P2.16e. FIGURE P2.16e The Earned Degree Crow’s Foot ERD Note that this M:N relationship must be broken up into two 1:M relationships before it can be implemented in a relational database. Use the Airline ERD’s decomposition in Figure P2.16c as the focal point in your discussion. 17. Write the business rules that are reflected in the ERD shown in Figure P2.17. A theater show many movies. A movie can be shown in many theaters. A movie can receive many reviews. Each review is for a single movie. A reviewer can write many reviews. Each review is written by a single reviewer. Note that the M:N relationship between theater and movie must be broken into two 1:M relationships using a bridge table before it can be implemented in a relational database. Solution Manual for Database Systems: Design, Implementation, and Management Carlos Coronel, Steven Morris 9781337627900, 9781305627482
Close