Chapter 9 Designing Databases Chapter Overview The purpose of this chapter is to present a detailed description of the concepts and techniques used to translate a conceptual data model into a form necessary for database design. This chapter introduces the relational data model—the most common notation used for representing detailed data requirements necessary for database design. Concepts of the relational data model, normalization principles for creating relational models with desirable properties, a process for combining different relational data models into a consolidated one, and how to translate an entity-relationship data model into a relational data model are presented. This chapter provides a transition from typical systems analysis to data analysis methodologies, often discussed in a database course. You should coordinate the teaching of this chapter with the faculty member(s) teaching database-related courses so that purposeful redundancy occurs and important topics are not missed across the courses in your curriculum. Instructional Objectives Specific student learning objectives are included in the beginning of the chapter. From an instructor’s point of view, the objectives of this chapter are to: 1. Show the relationship between systems analysis and design and logical database design. The philosophy of this textbook is that logical database design is a topic of joint interest to both systems analysts and database specialists. In most cases, there is considerable interaction between conceptual and logical database design, which combines the top-down approach of conceptual data modeling with a bottom-up approach using logical data modeling tools. Application area knowledge as well as enterprise database knowledge (often found in a data administration group) are necessary to converge to a viable logical design for an application’s database. Remember, an application’s logical database design does not imply a separate physical database for that application, but rather only a separate view of data, which may be part of a more comprehensive physical database. 2. Present the relational data model as a logical data model that can be used to capture the structure of data in a very fundamental, stable form and that suggests some ways to organize data during physical database design that will result in desirable data maintenance properties (which avoid certain data anomalies). 3. Show students how a conceptual data model can obscure some of the details about data requirements that must be better understood to do physical database design. 4. Show students, using an example from Hoosier Burger, how to translate a conceptual data model into a logical data model and how to incorporate the data requirements of specific system outputs into the process of forming a logical data model. Classroom Ideas 1. This chapter should be read following the discussion on the design of the human interface (Chapter 8), because some logical database design activities, (specifically view integration and merging normalized views of individual system inputs and outputs) build on the designs of system interfaces. Since an entity-relationship diagram is often modified once specific system inputs and outputs have been studied and a more precise understanding of attributes, entities, and relationships is reached, logical database design will begin with a possibly incomplete view of data if done before the design of system inputs and outputs. 2. This chapter, like Chapter 6, covers a topic also addressed in most database management courses. Depending on your curriculum, this chapter may review previously covered material or may be covered (in more depth) in a subsequent course. We believe that logical data modeling is not strictly a database topic, but is essential for thorough systems analysis, thus it is an activity that should not be assigned to only specialists (database analysts). Although we strongly encourage you to cover this chapter in your systems analysis and design course, you should coordinate how you address this topic with those who teach database courses. Even though this chapter has considerable similarity with the chapter on logical data modeling in the companion text Modern Database Management by Hoffer, Prescott, and McFadden, Chapter 9 is carefully written for the systems analysis and design student. We believe that this chapter is an excellent refresher for those who have already studied the relational data model and normalization, and will provide a solid introduction to these topics for those students who will address this topic later in a database course. 3. You should emphasize with your students that logical data modeling is still technology independent. A logical data model is not necessarily mapped on a one-to-one basis into a physical database design. The purpose of logical data modeling is to prepare the description of stored data requirements into a format that makes it easier for physical database design decisions to be made. Students may want to go directly from conceptual data modeling to physical database design, so spend some time motivating logical data modeling. 4. Be sure to review for your students the key steps of logical data modeling outlined in the section “The Process of Database Design” and discussed throughout the chapter. It is important that novice data modelers understand that all four steps are necessary to produce a thorough, logical data model as input to physical database design. 5. Understanding the relational data model is critical for doing logical data modeling. The relational data model is fairly simple, and since some students will have had experience with a PC database management system, this model will be rather intuitive for them. Emphasize the five properties of relations and the concept of anomalies. Use figures in the chapter to illustrate anomalies. 6. A discussion of anomalies naturally leads to normalization and functional dependencies. You can introduce normalization to your students from two perspectives. One approach is to introduce first, second, and third normal forms and teach your students to transform unnormalized data into third normal form by stepping through each normal form in sequence. Another approach is to emphasize functional dependencies and determinants (see Problem and Exercise 6). Use whichever approach is most comfortable to you; the chapter supports either approach. 7. We suggest spending a large portion of your class periods (allocated to Chapter 9) working problems to translate between E-R and relational data models. Table 9–1 is a compact summary of how to map E-R constructs into relational constructs. Your students should become competent with translating in either direction. Problems and Exercises 9-18, 9-19, 9-20,9-21 and 9-22 are suitable for in-class exercises, but you may want to create many other examples. Work a few examples for your students, and then have them work (either individually or in small teams) on some problems in class and then share their answers. Practice is absolutely the best teacher of both normalization and translating between relational and E-R models. 8. Emphasize with your students that most E-R models developed during analysis are incomplete since system inputs and outputs are not designed in detail until the logical design phase. Use this to motivate the need for view integration. Be sure to discuss the potential pitfalls (view integration problems) that make view integration more than a mechanical process. Again, use many examples; Problem and Exercise 9-20 is a fairly simple one that students could work on inside or outside of class. Ask your students why view integration problems arise; for example, sample problems include independent analysts or project teams of different subsystems with slightly different data semantics, the integration of multiple independently-developed applications coming together to create an enterprise data model, and imprecision or lack of naming standards by analysts. 9. You may want to review in class the Hoosier Burger example presented in the chapter. This example illustrates how to deal with both translating an E-R model into 3NF relations as well as integrating specific system requirements into the data model during logical database design. Lecture Notes As Figure 9–1 shows, designing databases is one of the primary design phase activities. The five purposes of logical and physical database design are to: (1) structure the data in stable structures that are not likely to change over time and that have minimal redundancy; (2) develop a logical database design that reflects the actual data requirements that exist in the forms and reports of an information system; (3) develop a logical database design from which we can do physical database design; (4) translate a relational database model into a technical file and database design; and (5) choose data-storage technologies that will efficiently, accurately, and securely process database activities. Database Design Logical design and physical design are the two primary file and database design steps. Although database modeling and design activities are present in all SDLC phases, this chapter focuses primarily on the logical data model, physical file, and database design. Figure 9–2 associates the various database models with their corresponding SDLC phases. The four steps in logical database modeling and design are: (1) develop a logical data model for each known user interface for the application using normalization principles; (2) combine normalized data requirements from all user interfaces into one consolidated logical database model; (3) translate the conceptual E-R data model for the application into normalized data requirements; and (4) compare the consolidated logical database design with the translated E-R model and produce one final logical database model for the application. Important physical database design decisions include: (1) choosing the data type for each attribute from the logical database model; (2) grouping attributes from the logical database model into physical records; (3) arranging related records in secondary memory so that individual and groups of records can be stored, retrieved, and update rapidly; and (4) selecting media and structures for storing data to make access more efficient. A primary key is an attribute whose value is unique across all occurrences of a relation. Figure 9–3 provides an example of logical data modeling. The primary deliverable from logical database design is a set of normalized relations. During physical database design, the normalized relations are translated into computer file specifications. Coding the database definitions is performed during implementation; Figure 9–4 shows a table in design view. Relational Database Model The relational database model represents data as a set of related tables or relations. Figure 9–5 shows an example of a relation. A shorthand notation is often used to represent the relation’s structure. A relation has several properties which include: (1) entries in cells are simple; (2) entries in columns are from the same set of values; (3) each row is unique; (4) the sequence of columns can be interchanged without changing the meaning or use of the relation; and (5) the rows may be interchanged or stored in any sequence. A well-structured relation contains a minimum amount of redundancy and allows users to insert, modify, and delete the rows without errors or inconsistencies. Figure 9–6 shows a relation with redundancy. Normalizing the relation in Figure 9–6 results in two relations; these relations are shown in Figure 9–5 and Figure 9–7. Normalization Normalization converts complex data structures into simple, stable data structures. The process of normalizing relations requires the relations to be placed in first, second, and third normal forms. Normalization results in every nonprimary key attribute depending on the whole primary key and not on other nonkey attributes. Normalization analyzes the functional dependencies existing among attributes. A functional dependency is noted by using an arrow, such as AB. If an attribute is functionally dependent on two attributes, the notation is A, BC. Figure 9–8 can be used to illustrate the presence or nonpresence of functional dependencies. A relation is in second normal form (2NF) when the nonkey attributes are functionally dependent on the entire primary key, not just part of it. A relation is in second normal form when: (1) the primary key consists of only one attribute; (2) no nonprimary key attributes exist in the relation, and (3) every nonprimary key attribute is functionally dependent on the full set of primary key attributes. Converting a relation to second normal form requires decomposing the relation into new relations and using the determinants as the primary keys for the new relations. A relation is in third normal form (3NF) if it is in second normal form and does not have transitive dependencies. Figure 9–9 illustrates the removal of transitive dependencies from a relation. A foreign key is an attribute that appears as a nonprimary key attribute in one relation and as a primary key attribute (or part of a primary key) in another relation. Referential integrity is an integrity constraint specifying that the value of an attribute in one relation depends on the value of the same attribute in another relation. Transforming E-R Diagrams Into Relations Transforming the conceptual E-R diagram into normalized relations requires four steps; (1) representing entities; (2) representing relationships; (3) normalizing the relations; and (4) merging the relations. Each regular entity in an E-R diagram becomes a relation; the entity type’s identifier becomes the primary key for the new relation; the entity type’s nonkey attributes become nonkey attributes of the relation. The new primary key must uniquely identify every row in the relation and be nonredundant. Figure 9–10 shows how to transform an entity type to a relation. Relationship representation depends on the degree of the relationship and the cardinalities of the relationship. A binary 1:N relationship requires adding the primary key attribute of the entity on the one side of the relationship as a foreign key in the relation on the many side of the relationship. Figure 9–11 illustrates this rule. A binary or unary 1:1 relationship is represented by: (1) adding the primary key of A as a foreign key of B; (2) adding the primary key of B as a foreign key of A; or (3) doing both of the above. For binary and higher-degree M:N relationships, a separate relation is created, using as its primary key a composite of the primary keys of the entities participating in the relationship. Figure 9–12 shows how to represent an M:N relationship. Figure 9–13 provides two examples of unary relationships. To represent a unary 1:N relationship, use a recursive foreign key. For a unary M:N relationship, a separate relation represents the relationship. Table 9–1 summarizes the rules for transforming E-R diagrams into equivalent relations. Merging Relations View integration is the last step in logical database design. View integration involves merging relations that describe the same entity type. View integration problems include synonyms, homonyms, and dependencies between nonkeys. Logical Database Design for Hoosier Burger Hoosier Burger is used to illustrate logical database design. Figures 9–14 to 9–16 support this example. Physical File and Database Design Designing physical files and databases requires information about: (1) normalized relations, including volume estimates; (2) definitions of each attribute; (3) descriptions of where and when data are used; (4) expectations or requirements for response time and data integrity; and (5) descriptions of the technologies used for implementing the files and database. Designing Fields A field is the smallest unit of named application data recognized by system software; an attribute is represented by one or more fields. Basic decisions address data type and data integrity controls. Table 9–2 identifies several Microsoft Access data types. When selecting a data type, the analyst must balance four objectives: (1) minimizing storage space; (2) representing all possible values of the field; (3) improving data integrity for the field; and (4) supporting all data manipulations desired on the field. A calculated field is a field that can be derived from other database fields. Coding and compression techniques can reduce storage space and increase data integrity. Five popular data integrity control methods are default value, input mask, range control, referential integrity, and null value control. Figure 9–17 provides examples of referential integrity field controls. A default value is a value a field will assume unless an explicit value is entered for that field. An input mask is a pattern of codes that restricts the width and possible values for each position of a field. A null value is a special field value, distinct from 0, blank, or any other value, that indicates that the value for the field is missing or otherwise unknown. Designing Physical Tables While a relation contains attributes describing some unifying business concept, a physical table is a named set of rows and columns that specifies the fields in each row of a table. A physical table may or may not correspond to one relation. The efficient use of secondary storage space and data processing speed are the two goals of physical table design. Physical table design may require the denormalization of relations. Figure 9–18 illustrates the denormalization process. An analyst may choose to denormalize data when there are two entities with a one-to-one relationship, is a many-to-many relationship with nonkey attributes, and is reference data. Denormalization results in one or more physical files. Figure 9–19 shows possible denormalization situations. A file organization is a technique for physically arranging the records of a file. When selecting a file organization, the choice is based on fast data retrieval, high throughput for processing transactions, efficient use of storage space, protection from failures or data loss, minimizing need for reorganization, accommodating growth, and security from unauthorized use. Many file organizations use a pointer, a field of data that can be used to locate a related field or row of data. Three file organizations are discussed in the chapter; these are sequential, indexed, and hashed. Figure 9–20 compares these three file organizations. Table 9–3 summarizes the comparative features of the sequential, indexed, and hashed file organization methods. When specifying indexes for a relational database, the analyst should specify a unique index for the primary key of each table, an index for foreign keys, and an index for nonkey fields that are referenced in qualification, sorting, and grouping commands for the purpose of retrieving data. A secondary key refers to one or a combination of fields for which more than one row may have the same combination of values. Physical Database Design for Hoosier Burger The authors use Hoosier Burger to illustrate how to translate a logical database design into a physical database design. References are made to several figures that appeared earlier in the chapter. Tables 9–4 — 9–6 support this discussion. PVF WebStore: Designing Databases Important database design concepts are illustrated through Valley Furniture’s WebStore. As suggested in the chapter, database design for an Internet-based electronic commerce application is similar to the process followed for other types of applications. Key Term Checkpoint Solutions Answers for the Key Terms Checkpoint section are provided below. The number following each key term indicates its location in the key term list. Review Questions Solutions 9-1. What is the purpose of normalization? Answer: The purpose of normalization is to rid relations of anomalies. The goal is to form well-structured relations that are simple and stable when data values change or data are added or deleted. 9-2. List five properties of relations. Answer: The five properties of relations are: (1) entries in cells are simple, (2) entries in columns are from the same set of values, (3) each row is unique, (4) the sequence of columns is insignificant, and (5) the sequence of rows is insignificant. 9-3. What problems can arise during view integration or merging relations? Answer: Synonyms, homonyms, and dependencies between nonkeys can arise during view integration. Synonyms occur when two or more different names are used for the same attribute from different user views. Homonyms occur when two or more attributes from different user views have the same name. Functional dependencies between nonkey attributes arise when functionally dependent nonkeys come from different user views. 9-4. How are relationships between entities represented in the relational data model? Answer: Relationships between entities are represented in several ways in the relational data model. A binary 1:M relationship is represented by placing a foreign key (the primary key of the entity on the one-side of the relationship) in the relation for the entity on the many-side of the relationship. In a binary 1:1 relationship, a foreign key is placed in the relation on either side of the relationship or on both sides. For a binary and higher degree M:N relationship, a relation is created with a primary key which is the concatenation of the primary keys from the related entities. In a unary relationship, a recursive foreign key is added to the relation. 9-5. What is the relationship between the primary key of a relation and the functional dependencies among all attributes within that relation? Answer: The fundamental rule of normalization is that each non-key attribute must be a fully functionally dependent on the whole primary key attribute (a nonkey is dependent on the whole key and nothing but the key). Thus, there can be no functional dependencies between nonkeys. 9-6. How is a foreign key represented in relational notation? Answer: A foreign key is identified by using a dashed underline. 9-7. Can instances of a relation (sample data) prove the existence of a functional dependency? Why or why not? Answer: Instances in a relation cannot prove that a functional dependency exists; however, you can use sample data to demonstrate that a functional dependency does not exist. The sample data does not show you every possible instance, only a sampling. Knowledge of the problem domain is a reliable method for identifying functional dependencies. 9-8. In what way does the choice of a data type for a field help to control the integrity of that field? Answer: The choice of data type often limits the possible values that may be stored for a field. For example, a numeric data type forbids alphabetic characters. Some data types have an assumed length, which places an implicit range control on values. Data type may also limit the kinds of data manipulations possible, thus further controlling the integrity of the data or results from manipulating the data. For example, a DATE data type causes addition and subtraction to be limited by rules about dates. 9-9. Contrast the differences between range control and referential integrity when controlling data integrity. Answer: Range controls identify a set of permissible values, such as college ranking (senior, junior, sophomore, and freshman). Referential integrity requires a foreign key to match a primary key in another relation. Alternatively, referential integrity would require a recursive foreign key to match the primary key of that same relation. A referential integrity control requires that the data management software access other data records to determine if the value is permitted, whereas a range control is checked by looking up values outside the files and database, in a repository or other source of metadata. 9-10. What is the purpose of denormalization? Why might you not want to create one physical table or file for each relation in a logical data model? Answer: Denormalization provides for more efficient data processing. Denormalization enables data to be stored in physical tables based on affinity of use; this feature reduces the number of I/O operations. 9-11. What factors influence the decision to create an index on a field? Answer: The factors that influence the decision to create an index are the data retrieval, insertion, deletion, and updating costs with and without the index. Indexes allow for rapid random retrieval and sorting of data, but indexes create additional storage and maintenance costs. The guidelines presented in the chapter offer suggestions for when to use an index; these guidelines include specifying a unique index for the primary key of each table; specifying an index for foreign keys, and specifying an index for nonkey fields that are referenced in qualification, sorting, and grouping commands for the purpose of retrieving data. 9-12. Explain the purpose of data compression techniques. Answer: Coding and compression techniques reduce storage space and can increase data integrity. Data compression techniques are pattern matching and other methods that replace repeating strings of characters with codes of shorter lengths, thus reducing data storage requirements. 9-13. What are the goals of designing physical tables? Answer: The two goals mentioned in the chapter are efficient use of secondary storage space and data processing speed. 9-14. What are the seven factors that should be considered in selecting a file organization? Answer: When selecting a file organization, the analyst should consider fast data retrieval, high throughput for processing transactions, efficient use of storage space, protection from failures or data loss, minimizing need for reorganization, accommodating growth, and security from unauthorized use. 9-15. What are the four steps in logical database modeling and design? Answer: The four steps in logical database modeling and design are: (1) develop a logical data model for each known user interface for the application using normalization principles; (2) combine normalized data requirements from all user interfaces into one consolidated logical database model; (3) translate the conceptual E-R data model for the application into normalized data requirements, and (4) compare the consolidated logical database design with the translated E-R model and produce one final logical database model for the application. 9-16. What are the four steps in transforming an E-R diagram into normalized relations? Answer: The four steps are (1) represent entities; (2) represent relationships; (3) normalize relations; and (4) merge the relations. Problems and Exercises Solutions 9-17. Assume that at Pine Valley Furniture products are comprised of components, products are assigned to salespersons, and components are produced by vendors. Also assume that in the relation PRODUCT(Prodname, Salesperson, Compname, Vendor) Vendor is functionally dependent on Compname, and Compname is functionally dependent on Prodname. Eliminate the transitive dependency in this relation and form 3NF relations. Answer: The wording in the problem clarifies some of the relationships and associated cardinalities. Since VENDOR is functionally dependent on COMPNAME, there can be at most one vendor for each component (and we will assume that there may be no vendor for a given component). Also, since COMPNAME is functionally dependent on PRODNAME, there can be at most one component per product (a rather odd situation, but that is what the wording says; we will assume that some products have no components). Although not clarified in the problem, we will assume that a product is assigned to exactly one salesperson, while a salesperson can be assigned to one-to-many products. Given these clarifications, the 3NF relations are (foreign keys are shown with a dashed underline): PRODUCT (Prodname, Salesperson, Compname) SALESPERSON (Salesperson) COMPONENT (Compname, Vendor) VENDOR (Vendor) 9-18. Transform the E-R Diagram of Figure 7-20 into a set of 3NF relations. Make up a primary key and one or more nonkeys for each entity that does not already have them listed. Answer: Listed below are 3NF relations for the E–R diagram. Foreign keys are shown with a dashed underline. The SHOPPING_CART entity is called a weak or attributive entity, since its existence depends on instances of all three entities CUSTOMER, ORDER and INVENTORY; however, the SHOPPING_CART entity has its own primary key. The 3NF relations are: CUSTOMER (Cust_ID, Name, Address) ORDER (Order_ID, Date) INVENTORY (Inv_ID, Item_name, Weight) SHOPPING_CART (Cart_ID Cust_ID, Order ID, INV ID, Qty) 9-19. Transform the E-R diagram of Figure 9–21 into a set of 3NF relations. Answer: Listed below are 3NF relations for the E–R diagram. Foreign keys are shown with a dashed underline. QUOTE QUANTITY is not sufficient as the primary key of the PRICEQUOTE relation, but this attribute along with the primary keys of the two associated entities is a sufficient concatenated (or composite) primary key for this associative entity. The PART RECEIPT entity is called a weak or attributive entity, since its existence depends on a PRICE QUOTE entity instance. In this case, however, the PART RECEIPT entity has its own primary key, ORDER NO. The 3NF relations are: VENDOR (Vendor No, Address) PRICEQUOTE (Vendor No, Item No, Quote Quantity, Price) PART (Item No, Desc) PART RECEIPT (Order No, Vendor No, Item No, Quote Quantity, Date, Order_Quantity) 9-20. Consider the list of individual 3NF relations that follow. These relations were developed from several separate normalization activities. PATIENT(Patient_ID, Room_Number, Admit_Date, Address) ROOM(Room_Number, Phone, Daily_Rate) PATIENT(Patient_Number, Treatment_Description, Address) TREATMENT(Treatment_ID, Description, Cost) PHYSICIAN(Physician_ID, Name, Department) PHYSICIAN(Physician_ID, Name, Supervisor_ID) a. Merge these relations into a consolidated set of 3NF relations. Make and state whatever assumptions you consider necessary to resolve any potential problems you identify in the merging process. b. Draw an E-R diagram for your answer to part a. Answer: Listed below are merged 3NF relations for this hospital example. This is an interesting exercise because it points out how semantically lacking the relational data model is, since questions arise about functional dependencies across separately developed relations. One observation is clear: The second 3NF PATIENT relation in the exercise has only one value of TREATMENT DESCRIPTION, so each patient must be associated with only one treatment, otherwise this relation would not be in 3NF. But, we must also assume that each department has only one supervisor and each supervisor can supervise only one department. This last assumption means that we could create only a supervisor or a department relation, but not both. This is sufficient because, if the original set of six 3NF relations is comprehensive, there are no nonkey attributes dependent on either DEPARTMENT or SUPERVISOR ID. We, however, create both supervisor and department relations, with a one-to-one relationship between them, to allow for some evolution of the data model. One additional assumption about supervisors: SUPERVISOR is a separate entity from PHYSICIAN. We also assume that the attribute ADDRESS means the same address in both PATIENT relations, and further there are no other synonyms or homonyms across the relations. Interestingly, there is no relationship between patient and physician implied in the original 3NF relations, and we assume none exists. With these assumptions, the merged relations are (foreign keys are shown with a dashed underline): PATIENT (Patient Number, Address, Room No, Admit Date, Treatment ID) ROOM (Room No, Phone, Daily Rate) PHYSICIAN (Physician ID, Name, Department ID) TREATMENT (Treatment ID, Description, Cost) SUPERVISOR (Supervisor ID, Department ID) DEPARTMENT (Department ID, Supervisor ID) To create the E-R diagram from these 3NF relations, we have to make additional assumptions about minimum cardinalities. We assume that every patient is assigned a room, but a room may be empty; not all treatments have to be associated with a patient, but a patient has to have a treatment; and that each department has one supervisor and each supervisor has one department. We show relationships from both a department and a supervisor to a physician, but only one is necessary; we also assume that a physician must be associated with both a department and a supervisor. This is an interesting E-R diagram since it contains two, disconnected parts. This is possible, although rare in actual organizations. The following is an E-R diagram, including attributes, for this situation. Microsoft Visio was used to create the E-R diagram. 9.21 Consider the following 3NF relations about a sorority or fraternity: MEMBER(Member_ID, Name, Address, Dues_Owed) OFFICE(Office_Name, Officer_ID, Term_Start_Date, Budget) EXPENSE(Ledger_Number, Office_Name, Expense_Date, Amt_Owed) PAYMENT(Check_Number, Expense_Ledger_Number, Amt_Paid) RECEIPT(Member_ID, Receipt_Date, Dues_Received) COMMITTEE(Committee_ID, Officer_in_Charge) WORKERS(Committee_ID, Member_ID) a. Foreign keys are not indicated in these relations. Decide which attributes are foreign keys and justify your decisions. b. Draw an E-R diagram for these relations, using your answer to part a. c. Explain the assumptions you made about cardinalities in your answer to part b. Explain why it is said that the E-R data model is more expressive or more semantically rich than the relational data model. Answer: There are several foreign keys in these relations. OFFICER_ID is a foreign key in OFFICE referencing MEMBER_ID from the MEMBER relation. OFFICE_NAME is a foreign key in EXPENSE referencing OFFICE_NAME in the OFFICE relation. OFFICER_IN_CHARGE is a foreign key in COMMITTEE referring to OFFICER_ID or OFFICE_NAME (which is not clear from simply the relations) in the OFFICE relation. EXPENSE_LEDGER_NUMBER is a foreign key in PAYMENT referencing LEDGER_NUMBER in the EXPENSE relation. MEMBER_ID in both RECEIPT and WORKERS cross references MEMBER_ID in the MEMBER relation. COMMITTEE_ID in WORKERS cross references COMMITTEE_ID in COMMITTEE. It is inferred that a member sometimes has many receipts, but a receipt must have a member. An expense sometimes has multiple payments, but each payment must have an expense. Each office sometimes has multiple expenses, but each expense must have an office. Each office may have a member as an officer–in–charge, and each member sometimes holds many offices. An office sometimes is responsible for many committees, and each committee must have an office in charge (although that office may not have a member assigned as officer). Committees sometimes have many workers, and each worker sometimes works on many committees. The E-R diagram is more expressive in that it displays explicitly the minimum cardinalities of relationships and shows exactly which entities are related. A suggested E-R diagram is provided below. Microsoft Visio was used to prepare this diagram. 9-22. Consider the following functional dependencies: Applicant_ID Applicant_Name Applicant_ID Applicant_Address Position_ID Position_Title Position_ID Date_Position_Opens Position_ID Department Applicant_ID + Position_IDDate_Applied Applicant_ID + Position_ID + Date_Interviewed a. Represent these attributes with 3NF relations. Provide meaningful relation names. b. Represent these attributes using an E-R diagram. Provide meaningful entity and relationship names. Answer: Since there are four determinants among the functional dependencies, there will be four relations. The last functional dependency, the one with only a three-key composite determinant, signifies all the dates on which a particular applicant interviewed for a particular position. This functional dependency does not signify a many-to-many relationship, like many composite keys do, since date interviewed is itself not a determinant. It signifies an entity with a three-component composite key. The four 3NF relations are: APPLICANT (Applicant ID, Applicant Name, Applicant Address) POSITION (Position ID, Position Title, Date Position Opens, Department) APPLICATION (Applicant ID, Position ID, Date Applied) INTERVIEW (Applicant ID, Position ID, Date Interviewed) See the accompanying E-R diagram corresponding to these four relations. Microsoft Visio was used to prepare the following E-R diagram. 9-23. Suppose you were designing a file of student records for your university's placement office. One of the fields that would likely be in this file is the student's major. Develop a coding scheme for this field that achieves the objectives outlined in this chapter for field coding. Answer: The objectives of a good coding scheme are to minimize storage space and to increase data integrity. Student major is a classical example of a sparse field that could benefit from being codified for storage and data entry. For online data entry, we would probably provide a list of possible majors from which the data entry person must choose; this legal list of codes will need to be updated, but this can probably be done separately from data entry of student data. A fundamental coding choice is whether to use codes that are as dense as possible or to try to use reasonably dense codes that have some meaning to most users (e.g., MIS vs. 31 for a Management Information Systems major). Short character string codes (e.g., three alphabetic characters) may take as little storage as a two digit numeric code, so a short character string may achieve both objectives. An interesting way to approach this question is to have your students identify how their university and at least two other universities represent a student’s major. Have your students compare and contrast these coding schemes. It is likely that they will locate universities where the code is numeric, alphabetic, or alphanumeric. 9-24. In Problem and Exercise 9-19 you developed integrated normalized relations. Choose primary keys for the files that would hold the data for these relations. Did you use attributes from the relations for primary keys or did you design new fields? Why or why not? Answer: Suggestions for the primary keys are underlined in the following relations. Foreign keys are shown with dashed underlines. VENDOR(Vendor No, Address) PRICEQUOTE(Vendor No, Item No, Quote Quantity, Price) PART(Item No, Desc) PART RECEIPT(Order No, Vendor No, Item No, Quote Quantity, Date, Order Quantity) If Vendor No, Item No, and Order No are numeric values assigned without any relationship to the associated entities, then these numbers would not change as the real world changes, and they would be acceptable as primary keys or components of composite primary keys. Quote Quantity, on the other hand, is very likely to be volatile, and is not suitable as part of the primary key for the PRICEQUOTE table. We need to create a nonintelligent primary key for the PRICEQUOTE table. Also, we still may need individual secondary index keys on Vendor No and Item No attributes from this relation to facilitate joining the PRICEQUOTE table with the VENDOR and PART tables, respectively. 9.25. Suppose you created a file for each relation in your answer to Problem and Exercise 9-19. If the following queries represented the complete set of accesses to this database, suggest and justify what primary and secondary key indexes you would build. a. For each PART in Item_Number order list in Vendor_ID, sequence all the vendors and their associated prices for that part. b. List all PART RECEIPTs, including related PART fields for all the parts received on a particular day. c. For a particular VENDOR, list all the PARTs and their associated prices that VENDOR can supply. Answer: The guidelines for identifying keys for indexing suggest that attributes used for selection, sorting, grouping, and joining are potential candidates. We are asked to consider the three queries as the only accesses to the database, so we do not implicitly need primary key indexes. We do not know the frequency of the three queries compared to update operations, so it is impossible to make precise, optimal decisions on the most economical indexes. Space does not seem to be an issue since, as we will see, none of the qualifying fields are very long. So, we indicate all possible indexes that might speed query processing. The first query, Query A, appears to need to access the PART and PRICEQUOTE tables, but actually the PRICEQUOTE table is sufficient. The E-R diagram indicates that every part has at least one vendor, so every part has at least one price quote. Thus, all parts appear at least once in the PRICEQUOTE table. The data to be reported in Query A (Item No, Vendor No, Quote Quantity, and Price) can all be found in the PRICEQUOTE table. The query asks for the results to be sorted by Item No and by Vendor No (but not by quote quantity). A composite index on these two attributes would be the most efficient way to directly produce this sorted output, thus avoiding the need to do a sort of the data once it is retrieved. This assumes that the DBMS or file system can utilize a composite index when it sees the compound sorting condition. Thus, for Query A, a secondary composite index on first Item No and then Vendor No would be ideal. The second query, Query B, asks to display all the attributes from the PART RECEIPT and PART tables; since ITEM NO is an attribute in the PART RECEIPT relation, we do not need the PRICEQUOTE relation to link the PART RECEIPT and PART tables. No sorting is done, but the query wants data for only a specified part receipt date. Assuming that many days of part receipt data are kept, then we would want to create a secondary key index on DATE and a secondary key index on ITEM NO in the PART RECEIPT table and a primary key index on ITEM NO in the PART table to efficiently support the selection and joining needed for Query B. The third query, Query C, involves selection of a particular vendor and display of attributes from all associated PRICEQUOTE rows for that vendor; thus, all the data needed for this query can be found in the PRICEQUOTE table. No sorting is mentioned. Thus, for Query C, only a secondary key index on Vendor No in the PRICEQUOTE table would support efficient processing. 9-26. Suppose you were designing a default value for the marriage status field in a student record at your university. What possible values would you consider and why? How would the default value change depending on other factors, such as type of student (undergraduate, graduate, professional)? Answer: You might set the default value to Single for undergrad student (maybe even grad students). Perhaps if your university has an executive or professional program (such as an executive MBA), the default marital status might change. 9-27. Consider Figure 9–19(B). Explain a query that would likely be processed more quickly using the denormalized relations rather than the normalized relations. Answer: One suggested answer is a situation where an employee needs to know a contact name for a vendor, a quoted price for a particular item, and a description for the item. Since the tables are denormalized, this query would require the joining of only two tables; however, if we are using normalized relations, this query would require the joining of all three relations. A query that would likely be processed more quickly using denormalized relations rather than normalized relations is one that involves complex joins across multiple tables. For example: Query Example: Retrieving a detailed customer order report that includes customer details, order details, and product information. In a denormalized database, this query can be processed more quickly because the data is consolidated into fewer tables, reducing the need for multiple joins and complex retrieval operations. In contrast, a normalized database requires joining multiple related tables, which can slow down query performance. 9-28. Consider your answers to parts a and b of Problem and Exercise 7-27 in Chapter 7. a. Transform the E-R diagram you developed in part a into a set of 3NF relations. Clearly identify primary and foreign keys. Explain how you determined the primary key for any many-to-many relationships or associative entities. b. Transform the E-R diagram you developed in part b into a set of 3NF relations. Clearly identify primary and foreign keys. Explain how you determined the primary key for any many-to-many relationships or associative entities. Answer: Suggested answers are provided below. Primary keys and foreign keys were identified using the guidelines provided in the chapter. The ASSIGNMENT relation includes Project_ID and Employee_ID as foreign keys. These attributes enable relationships to be established between the PROJECT and CHEMIST relations. For the ASSIGNMENT relation, Assignment_No is used as the primary key, guaranteeing a unique value for each record. Part a: 1. Identify Entities and Relationships: • Entities: Convert each entity into a table. • Attributes: Include attributes as columns in the table. • Primary Key: Select a unique identifier for each table. 2. Transform Many-to-Many Relationships: • Create an associative table with foreign keys referencing the primary keys of the related entities. • Primary Key: Typically, a composite key of the foreign keys or a unique surrogate key. 3. Example: • Entity: `Student` (Primary Key: `StudentID`) • Entity: `Course` (Primary Key: `CourseID`) • Many-to-Many: `Enrollment` (Primary Key: `EnrollmentID`, Foreign Keys: `StudentID`, `CourseID`) Part b: 1. Identify Entities and Relationships: • Follow the same process as in part a. 2. Handle Complex Relationships: • If there are more complex relationships or additional attributes, incorporate these into the relevant tables or associative tables. 3. Example: • Entity: `Project` (Primary Key: `ProjectID`) • Entity: `Employee` (Primary Key: `EmployeeID`) • Many-to-Many: `ProjectAssignment` (Primary Key: `AssignmentID`, Foreign Keys: `ProjectID`, `EmployeeID`) Explanation: • Primary Key for Many-to-Many: Use a unique surrogate key or a composite key of foreign keys. • Foreign Keys: Ensure they correctly reference the primary keys of the related entities. These transformations ensure the database is in 3NF, minimizing redundancy and dependency issues. 9-29. Model a set of typical family relationships—spouse, father, and mother—in a single 3NF relation. Also include nonkey attributes name and birth date. Assume that each person has only one spouse, one father, and one mother. Show foreign keys with dotted underlining. Answer: Person (Person_ID, Name, Birth_date, Spouse_ID, Father_ID, Mother_ID) Discussion Questions Solutions 9-30. Many database management systems offer the ability to enforce referential integrity. Why would using such a feature be a good idea? Are there any situations in which referential integrity might not be important? Answer: If the DBMS offers this ability, it should always be used. The possibility that referential integrity will never be violated as engineers interact with a database is very low, and having the system enforce the principle will reduce human error. There should be no situations in which referential integrity is not important. 9-31. Assume you are part of the systems development team at a medium-sized organization. You have just completed the database design portion of the systems design phase, and the project sponsor would like a status update. Assuming the project sponsor is a VP in the marketing department, with only a high-level understanding of technical subjects, how would you go about presenting the database design you have just completed? How would your presentation approach change if the project sponsor were the manager of the database team? Answer: Answers will vary. Presenting to a non-technical manager should be much more high-level, with few if any technical details. The focus should be on how the database design will benefit the business (for example, with faster report generation). Presenting to a technical manager would be focused much more on the technical aspects of the database, with less of a focus on the business benefits of the design. For the VP in Marketing: • Focus on High-Level Overview: Present the database design using simple terms and visual aids like diagrams or charts. • Emphasize Benefits: Highlight how the design supports business goals, improves efficiency, and enhances data accessibility. • Avoid Technical Jargon: Use layman’s language and relate the design to marketing outcomes and strategic goals. For the Database Team Manager: • Provide Technical Details: Include detailed information about schema design, table structures, relationships, and constraints. • Discuss Implementation: Address technical aspects such as indexing strategies, normalization, and data integrity. • Engage in Technical Dialogue: Be prepared for in-depth discussions and address technical concerns or suggestions. Summary: • For a VP, keep it high-level and business-focused. • For a database manager, provide detailed technical information and engage in technical discussions. 9-32. Discuss what additional information should be collected during requirements analysis that is needed for file and database design and that is not very useful for earlier phases of systems development. Answer: Students should identify several types of information that are most useful for file and database design. Information as it relates to primary keys, secondary keys, data types, business rules, relationships, data integrity control methods, volume, present and future required data storage space, and file organization should be collected. During requirements analysis, additional information needed for file and database design includes: 1. Data Relationships: Detailed information on how data entities are related (e.g., one-to-many, many-to-many). 2. Data Integrity Rules: Specific constraints and rules for data validation and consistency (e.g., unique keys, foreign keys). 3. Data Volume Estimates: Expected size and growth of data to plan for performance and storage. 4. Access Patterns: How and by whom data will be accessed, including read/write frequency and query types. 5. Normalization Needs: Specific requirements for data normalization to minimize redundancy. This detailed information is crucial for creating a robust database design but is less relevant during earlier phases like initial concept or feasibility analysis. 9-33. Find out what database management systems are available at your university for student use. Investigate which data types these DBMSs support. Compare these DBMSs based upon data types supported and suggest which types of applications each DBMS is best suited for based on this comparison. Answer: There are probably several DBMSs available at the student’s university. For example, the student’s PC laboratory is likely to have at least one DBMS available; perhaps Microsoft Access for Windows is available. A client-server or object-oriented DBMS might be available as well. In addition, the university’s administrative database applications are probably run on a mainframe or minicomputer. A DBMS such as IBM’s DB2 or the Oracle DBMS are probably being used. The data types supported by these DBMSs will provide important criteria for determining what types of applications each of these DBMSs is best suited. PC DBMSs tend to have fewer physical file and database design options than do mini- or mainframe computer-based DBMSs. In addition, students may also have to consider the number of records and physical file organizations that the DBMS allows, the available storage space on physical storage mediums, the record locking and other security features that the DBMS provides, the availability of a CASE tool, and other factors. Available DBMSs: 1. MySQL: • Data Types: INT, VARCHAR, DATE, FLOAT, TEXT • Best For: Web applications and e-commerce. 2. PostgreSQL: • Data Types: INTEGER, VARCHAR, DATE, NUMERIC, JSONB, ARRAY • Best For: Complex applications with advanced data needs (e.g., GIS, JSON). 3. Microsoft SQL Server: • Data Types: INT, NVARCHAR, DATE, FLOAT, XML • Best For: Enterprise applications, business intelligence. 4. SQLite: • Data Types: INTEGER, TEXT, BLOB, REAL • Best For: Lightweight, mobile, and desktop applications. Comparison: • MySQL and PostgreSQL are versatile; PostgreSQL handles complex data types. • SQL Server integrates well with Microsoft tools; SQLite is ideal for simple, portable applications. 9-34. Find out what database management systems are available at your university for student use. Investigate what physical file and database design decisions need to be made. Compare this list of decisions to those discussed in this chapter. For physical database and design decisions (or options) not discussed in the chapter, investigate what choices you have and how you should choose among these choices. Submit a report to your instructor with your findings. Answer: Your students should find that many of the physical file and database design issues presented in the chapter are dealt with at their university as well. For instance, storage space, file organizations, and data types are issues that must be addressed. Available DBMSs at University: 1. MySQL 2. PostgreSQL 3. Microsoft SQL Server 4. SQLite Physical File and Database Design Decisions: 1. Storage Structure: Deciding on table organization (heap, clustered, or non-clustered indexing). 2. Indexing: Choosing appropriate indexes for performance (single-column, composite, unique). 3. Partitioning: Implementing table partitioning for large datasets to improve performance. 4. Data Redundancy: Managing data duplication and normalization to avoid redundancy. 5. Backup and Recovery: Setting up backup strategies and recovery plans. 6. Concurrency Control: Deciding on methods to handle concurrent data access (locking mechanisms, transaction isolation levels). Comparison to Chapter Discussions: • Chapter Focus: Typically covers fundamental concepts like normalization and basic indexing. • Additional Decisions: Consideration of advanced indexing, partitioning, and concurrency control may not be detailed in basic chapters. Additional Options: • Storage Structure: Choose based on query patterns and performance requirements. • Indexing: Select indexes that align with query needs and performance goals. • Partitioning: Use for large datasets to manage and improve query performance. • Backup and Recovery: Implement robust strategies to ensure data safety and availability. • Concurrency Control: Choose based on application requirements and database load. Report Findings: • Include: Overview of available DBMSs, key physical design decisions, comparison with chapter content, and additional design options and recommendations. Case Problems Solutions 9-35. Pine Valley Case Exercises Solutions a. Develop logical data models for the form and two reports mentioned in the case scenario. Answer: Students should use the Customer Profile Form, Products By Demographics Summary Report, and the Customer Purchasing Frequency Report that they created in the previous chapter. Therefore, the answers to this question are dependent on the data collected on the form and shown on the two reports. For discussion purposes, you can use the following entities. For the Customer Profile Form, the following relations are identified. Also note that only one profile relation is identified. However, separate profile relations could be created based on the type of customer. For clarification purposes, the ChildNo references the birth order of the customer’s children and is unique for each customer; however, this attribute’s value is not unique within the DEPENDENTS relation. b. Perform view integration on the logical models developed for part a. A suggested answer is provided below. c. What view integration problems, if any, exist? How should you correct these problems? Answer: Student answers for this question will vary. However, your students should identify several occurrences of synonyms, homonyms, and dependencies between nonkeys. Students could easily use the attribute name ‘description’ within the CUSTOMERTYPE, REGION, ADVERTISING, and CATEGORY relations, thus creating a homonym. The attribute ‘address’ may also exist as a homonym. The customer identification number can be represented in several ways, such as CID, CustNo, and Cust#. View Integration Problems: 1. Data Redundancy: Duplicate data across different views can lead to inconsistencies. 2. Conflicting Data Definitions: Different views may use varying terminologies or formats. 3. Inconsistent Constraints: Different views might impose conflicting data constraints or validation rules. 4. Overlapping Data: Multiple views may cover overlapping data, leading to integration issues. Correction Methods: 1. Normalize Data: Eliminate redundancy by designing a unified schema with shared definitions. 2. Standardize Definitions: Ensure consistent terminology and data formats across all views. 3. Harmonize Constraints: Align constraints and validation rules across views. 4. Merge Overlaps: Integrate overlapping data into a consolidated view to prevent conflicts. These steps help create a coherent, consistent database schema that addresses integration issues. d. Have a fellow classmate critique your logical data model. Make any necessary corrections. Answer: Student answers for this question will vary. However, this exercise provides a valuable learning experience for your students. Fellow students may recognize synonyms, homonyms, and data dependencies between nonkeys that were overlooked by their designers. If this is the case, ask your students why they think this happened. Classmate's Critique: 1. Entity Redundancy: Identified duplicate entities or attributes that should be consolidated. 2. Relationship Clarity: Noted unclear or missing relationships between entities. 3. Attribute Completeness: Pointed out attributes that were missing or incomplete. Corrections Made: 1. Consolidated Entities: Merged redundant entities or attributes to avoid duplication. 2. Clarified Relationships: Defined and illustrated missing or ambiguous relationships clearly. 3. Enhanced Attributes: Added missing attributes and ensured completeness and accuracy. Result: The logical data model is now more accurate, consistent, and clear, addressing the critique and improving overall design. 9-36. Hoosier Burger Case Exercises Solutions a. Develop logical models for each of the interfaces mentioned in the case scenario. Answer: Students should use the Hoosier Burger forms and reports that they created in the last chapter. Therefore, their answers for this question will vary. However, to facilitate discussion a suggested answer is provided below. Also, attributes whose values are derived are not indicated in the following relations. b. Integrate the logical models prepared for part a into a consolidated logical model. c. What types of problems can arise from view integration? Did you encounter any of these problems when preparing the consolidated logical model? Answer: View integration can create several problems, including synonyms, homonyms, and dependencies between nonkey attributes. Students may recognize date as a homonym; they may also recognize several synonyms for the customer’s identification number. One area of concern is how to treat the Customer’s account information. Students will probably identify alternative ways for modeling this data. d. Using your newly constructed logical model, determine which fields should be indexed. Which fields should be designated as calculated fields? Answer: Student recommendations will vary. However, the chapter's guidelines suggest specifying a unique index for the primary key of each table, foreign keys, and nonkey fields that are referenced in qualification, sorting, and grouping commands for the purpose of retrieving data. At a minimum indexes should be established for the primary keys and foreign keys. The current balance, dollar sales, and reorder amount fields are calculated fields. Fields to be Indexed: 1. Primary Keys: Automatically indexed for efficient record retrieval. 2. Foreign Keys: Indexed to speed up joins and ensure referential integrity. 3. Frequently Queried Fields: Columns often used in search conditions or as filters (e.g., `EmailAddress` in a user table). Fields to be Designated as Calculated Fields: 1. Derived Data: Fields that are calculated from other fields, such as `TotalPrice` derived from `Quantity` and `UnitPrice`. 2. Aggregate Data: Fields storing summary information like `AverageScore` or `TotalSales`, based on aggregation of other data. Summary: • Index: Primary and foreign keys, and frequently queried fields. • Calculated Fields: Derived or aggregate data to optimize performance and avoid redundant calculations. 9-37. Plow Masters Case Exercises Solutions a. What are the four steps in logical database modeling and design? Answer: (See items 1-4 under “The Process of Database Design” in the first section of the chapter.) The four steps in logical database modeling and design are: 1. Conceptual Design: Develop an ER diagram to represent entities, relationships, and attributes. 2. Logical Design: Convert the ER diagram into a logical schema, specifying tables, columns, and relationships. 3. Normalization: Organize the schema to reduce redundancy and improve data integrity through normalization processes. 4. Schema Refinement: Review and refine the schema to ensure it meets all requirements and optimizes performance. These steps help ensure a well-structured and efficient database design. b. Several relations have been identified for this project, including removal technician, customer, service provided, equipment inventory, and services offered. What relationships exist among these relations? How should these relationships be represented? Answer: Service provided would be related to one or more removal technicians, to one (and only one) customer, one or more pieces of equipment, and to one or more services offered. Services offered might also be related to one or more pieces of equipment. c. Think of the attributes that would most likely be associated with the relations identified in the part b. For each data integrity control method discussed in the chapter, provide a specific example. Answer: Default value: service provided would have a date attribute, and the default value should logically be the current date. Input mask: the customer phone number field in the customer relation would have an input mask, forcing all numbers to follow a certain format for uniformity. Range control: the date attribute in the service provided relation should be controlled such that dates in the future are not permitted. Referential integrity: since a service provided must be related to a customer, the attribute referencing the customer (e.g., customer_id) should be constrained the set of customer_id values from the customer relation. Null value control: some null values would be disallowed (like the customer_id field from the previous example). Others might make sense, like a fuel type attribute in the equipment relation. Not all pieces of equipment will take fuel (e.g., a shovel), so leaving the fuel type attribute as null would be acceptable. d. What are the guidelines for choosing indexes? Identify several fields that should be indexed. Answer: For guidelines for choosing indexes, see the three steps in the “Indexed File Organizations” subsection in the “Arranging Table Rows” section. There should be an index for the each of the primary keys. There should be an index for the foreign keys. And if other data fields are used often, an index can be specified to aid in the retrieval of that information. For example, if there is a frequent need to know the date on which a service was provided for a given customer name, then indexes for the date attribute in service provided, and the name attribute in customer should be created. Petrie’s Electronics Case Question Solutions 9-38. In the questions associated with the Petrie’s Electronics case at the end of Chapter 7, you were asked to modify the E-R diagram given in Figure 7-1 to include any other entities and the attributes you identified from the Petrie’s cases. Review your answers to these questions, and add any additional needed relations to the document in Figure 9-1. Answer: Additional Entities and Attributes: 1. Customer: CustomerID, Name, Address, PhoneNumber, Email 2. Product: ProductID, Name, Description, Price, StockQuantity 3. Order: OrderID, CustomerID, OrderDate, TotalAmount 4. OrderDetail: OrderID, ProductID, Quantity, UnitPrice 5. Supplier: SupplierID, Name, ContactInfo 6. PurchaseOrder: PurchaseOrderID, SupplierID, OrderDate, TotalAmount 7. PurchaseOrderDetail: PurchaseOrderID, ProductID, Quantity, UnitPrice Additional Relations: 1. Customer -< Order 2. Order -< OrderDetail 3. Product -< OrderDetail 4. Supplier -< PurchaseOrder 5. PurchaseOrder -< PurchaseOrderDetail 6. Product -< PurchaseOrderDetail 9-39. Verify that the relations you say represent the Petrie’s Electronics database are in third normal form. If they are, explain why. If they are not, change them so that they are. Answer: 3NF Verification: 1. Customer: In 3NF (all attributes depend on CustomerID). 2. Product: In 3NF (all attributes depend on ProductID). 3. Order: In 3NF (attributes depend on OrderID; CustomerID is a foreign key). 4. OrderDetail: In 3NF (attributes depend on composite key: OrderID, ProductID). 5. Supplier: In 3NF (all attributes depend on SupplierID). 6. PurchaseOrder: In 3NF (attributes depend on PurchaseOrderID; SupplierID is a foreign key). 7. PurchaseOrderDetail: In 3NF (attributes depend on composite key: PurchaseOrderID, ProductID). Summary: All relations are in 3NF. 9-40. The E-R diagram you developed in questions in the Petrie’s Electronics case at the end of Chapter 7 should have shown minimum cardinalities on both ends of each relationship. Are minimum cardinalities represented in some way in the relations in your answer to Question 9-39? If not, how are minimum cardinalities enforced in the database? Answer: Answers will vary, according to previous answers in the case at the end of Chapter 7. Minimum Cardinalities in Relations: • Not Explicitly Represented: The relations themselves don’t explicitly show minimum cardinalities. Enforcement Methods: 1. Database Constraints: Use constraints to enforce minimum cardinalities, such as: • NOT NULL Constraints: Ensure that essential attributes (like foreign keys) are always populated. • Foreign Key Constraints: Ensure required relationships between tables, implying minimum cardinalities. 2. Application Logic: Implement rules in the application layer to enforce minimum cardinalities and ensure data integrity. Summary: Minimum cardinalities are enforced through constraints and application logic rather than being explicitly represented in the relational schema. 9-41. Using your answer to Case Question 9-39, select data types, formats, and lengths for each attribute of each relation. Use the data types and formats supported by Microsoft Access. What data type should be used for nonintelligent primary keys? Answer: Answers will vary, according to previous answers in the case at the end of Chapter 7. Nonintelligent primary keys should be numeric data types. Data Types for Microsoft Access: 1. Customer: • CustomerID: AutoNumber • Name, Address, Email: Short Text • PhoneNumber: Short Text 2. Product: • ProductID: AutoNumber • Name: Short Text • Description: Long Text • Price: Currency • StockQuantity: Number (Integer) 3. Order: • OrderID: AutoNumber • CustomerID: Number (Long Integer) • OrderDate: Date/Time • TotalAmount: Currency 4. OrderDetail: • OrderID, ProductID: Number (Long Integer) • Quantity: Number (Integer) • UnitPrice: Currency 5. Supplier: • SupplierID: AutoNumber • Name: Short Text • ContactInfo: Long Text 6. PurchaseOrder: • PurchaseOrderID: AutoNumber • SupplierID: Number (Long Integer) • OrderDate: Date/Time • TotalAmount: Currency 7. PurchaseOrderDetail: • PurchaseOrderID, ProductID: Number (Long Integer) • Quantity: Number (Integer) • UnitPrice: Currency Non-intelligent Primary Keys: Use AutoNumber. 9-42. Complete all table and field definitions for the Petrie’s Electronics case database using Microsoft Access. Besides the decisions you have made in answers to the preceding questions, fill in all other field definition parameters for each field of each table. Answer: Table and Field Definitions for Petrie’s Electronics in Microsoft Access: 1. Customer: • CustomerID: AutoNumber, Primary Key • Name: Short Text, 255 characters, Required • Address: Short Text, 255 characters, Optional • PhoneNumber: Short Text, 20 characters, Optional • Email: Short Text, 255 characters, Optional 2. Product: • ProductID: AutoNumber, Primary Key • Name: Short Text, 255 characters, Required • Description: Long Text, 65,535 characters, Optional • Price: Currency, Format: $#,##0.00, Required • StockQuantity: Number (Integer), Required 3. Order: • OrderID: AutoNumber, Primary Key • CustomerID: Number (Long Integer), Foreign Key, Required • OrderDate: Date/Time, Format: mm/dd/yyyy, Required • TotalAmount: Currency, Format: $#,##0.00, Required 4. OrderDetail: • OrderID: Number (Long Integer), Foreign Key, part of Composite Key • ProductID: Number (Long Integer), Foreign Key, part of Composite Key • Quantity: Number (Integer), Required • UnitPrice: Currency, Format: $#,##0.00, Required 5. Supplier: • SupplierID: AutoNumber, Primary Key • Name: Short Text, 255 characters, Required • ContactInfo: Long Text, 65,535 characters, Optional 6. PurchaseOrder: • PurchaseOrderID: AutoNumber, Primary Key • SupplierID: Number (Long Integer), Foreign Key, Required • OrderDate: Date/Time, Format: mm/dd/yyyy, Required • TotalAmount: Currency, Format: $#,##0.00, Required 7. PurchaseOrderDetail: • PurchaseOrderID: Number (Long Integer), Foreign Key, part of Composite Key • ProductID: Number (Long Integer), Foreign Key, part of Composite Key • Quantity: Number (Integer), Required • UnitPrice: Currency, Format: $#,##0.00, Required Summary: Define each field with appropriate data types, sizes, formats, and constraints to ensure data integrity and usability. 9-43. The one decision for a relational database that usually influences efficiency the most is index definition. What indexes do you recommend for this database? Justify your selection of each index. Answer: Recommended Indexes for Petrie’s Electronics Database: 1. CustomerID (Customer table): • Index: Primary Key (AutoNumber) • Justification: Ensures unique identification and fast lookups for customer records. 2. ProductID (Product table): • Index: Primary Key (AutoNumber) • Justification: Ensures unique identification and fast lookups for product records. 3. OrderID (Order table): • Index: Primary Key (AutoNumber) • Justification: Ensures unique identification and fast lookups for order records. 4. CustomerID (Order table): • Index: Non-clustered • Justification: Speeds up queries and joins based on customer information. 5. OrderID and ProductID (OrderDetail table): • Index: Composite Index (OrderID, ProductID) • Justification: Improves query performance for accessing specific order details and efficiently handles joins with Order and Product tables. 6. SupplierID (PurchaseOrder table): • Index: Non-clustered • Justification: Speeds up queries and joins involving supplier information. 7. PurchaseOrderID and ProductID (PurchaseOrderDetail table): • Index: Composite Index (PurchaseOrderID, ProductID) • Justification: Enhances performance for queries and joins involving purchase orders and products. Summary: Indexes are chosen to optimize query performance and ensure quick access to frequently searched or joined fields. 9-44. Using Microsoft Visio, develop an E-R diagram with all the supporting database properties for decisions you made in Case Questions 9-38 to 9-43. Can all the database design decisions you made be documented in Visio? Finally, use Visio to generate Microsoft Access table definitions. Did the table generation create the table definitions you would create manually? Answer: E-R Diagram in Microsoft Visio: 1. Entities and Attributes: Include all entities (Customer, Product, Order, etc.) with their attributes and data types as defined. 2. Relationships: Show all relationships, including primary and foreign keys, and cardinalities. 3. Supporting Properties: Document attributes such as data types, formats, and constraints. Documentation in Visio: • Possible to Document: Most database design decisions can be documented in Visio, including entities, attributes, relationships, and constraints. Table Generation in Visio: • Automatic Generation: Visio can generate Microsoft Access table definitions from the E-R diagram. • Comparison: The generated table definitions typically align with manual definitions but may require minor adjustments for specifics like field properties or constraints. Summary: Visio can effectively document and generate E-R diagrams and corresponding Access table definitions, with a close match to manually created definitions. Solution Manual for Essentials of Systems Analysis and Design Joseph S. Valacich, Joey F. George, Jeffrey A. Hoffer 9780133546231
Close