Chapter 10 Transaction Management and Concurrency Control

Discussion Focus

Why does a multi-user database environment give us a special interest in transaction management and concurrency control? Begin by exploring what a transaction is, what its components are, and why it must be managed carefully even in a single-user database environment. Then explain why a multi-user database environment makes transaction management even more critical. Emphasize the following points:
• A transaction represents a real-world event such as the sale of a product.
• A transaction must be a logical unit of work. That is, no portion of a transaction stands by itself. For example, the product sale has an effect on inventory and, if it is a credit sale, it has an effect on customer balances.
• A transaction must take a database from one consistent state to another. Therefore, all parts of a transaction must be executed or the transaction must be aborted. (A consistent state of the database is one in which all data integrity constraints are satisfied.)

All transactions have four properties: atomicity, consistency, isolation, and durability. (These four properties are also known as the ACID test of transactions.) In addition, multiple transactions must conform to the property of serializability. Table IM10.1 provides a good summary of transaction properties.

Table IM10.1 Transaction Properties
• Atomicity: unless all parts of the transaction are executed, the transaction is aborted.
• Consistency: indicates the permanence of the database's consistent state.
• Durability: once a transaction is committed, it cannot be rolled back.
• Isolation: data used by one transaction cannot be used by another transaction until the first transaction is completed.
• Serializability: the result of the concurrent execution of transactions is the same as though the transactions were executed in serial order.
The first four properties apply to single-user and multi-user databases alike; serializability matters only when transactions execute concurrently in a multi-user database.

Note that SQL provides transaction support through COMMIT (permanently saves changes to disk) and ROLLBACK (restores the previous database state). Each SQL transaction is composed of several database requests, each one of which yields I/O operations. A transaction log keeps track of all transactions that modify the database. The transaction log data may be used for recovery (ROLLBACK) purposes.

Next, explain that concurrency control is the management of concurrent transaction execution. (Therefore, a single-user database does not require concurrency control!) Specifically, explore the concurrency control issues concerning lost updates, uncommitted data, and inconsistent retrievals. Note that multi-user DBMSs use a process known as a scheduler to enforce concurrency control. Since concurrency control is made possible through the use of locks, examine the various levels and types of locks. Because the use of locks creates a possibility of producing deadlocks, discuss the detection, prevention, and avoidance strategies that enable the system to manage deadlock conditions.
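Before turning to the review questions, the credit-sale scenario above can be tied together with a minimal SQL sketch. The table names, column names, and values below are illustrative assumptions (they mirror the sample schemas used later in this manual), not a prescribed design:

	BEGIN TRANSACTION;
	-- 1. Record the sale
	INSERT INTO INVOICE (INV_NUM, CUS_CODE, INV_DATE, INV_TOTAL)
	VALUES (1009, '10011', '15-Jan-2018', 118.80);
	-- 2. Reduce the product's quantity on hand
	UPDATE PRODUCT
	SET P_QOH = P_QOH - 1
	WHERE P_CODE = '11QER/31';
	-- 3. Increase the customer's balance for the credit sale
	UPDATE CUSTOMER
	SET CUS_BALANCE = CUS_BALANCE + 118.80
	WHERE CUS_CODE = '10011';
	COMMIT;

If any statement fails, issuing ROLLBACK instead of COMMIT returns the database to the consistent state that existed before the transaction began; only COMMIT makes all three changes permanent as a single unit.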
Answers to Review Questions

1. Explain the following statement: a transaction is a logical unit of work.
A transaction is a logical unit of work that must be entirely completed or aborted; no intermediate states are accepted. In other words, a transaction, composed of several database requests, is treated by the DBMS as a unit of work in which all transaction steps must be fully completed if the transaction is to be accepted by the DBMS. Acceptance of an incomplete transaction will yield an inconsistent database state. To avoid such a state, the DBMS ensures that all of a transaction's database operations are completed before they are committed to the database. For example, a credit sale requires a minimum of three database operations:
1. An invoice is created for the sold product.
2. The product's inventory quantity on hand is reduced.
3. The customer accounts payable balance is increased by the amount listed on the invoice.
If only parts 1 and 2 are completed, the database will be left in an inconsistent state. Unless all three parts (1, 2, and 3) are completed, the entire sales transaction is canceled.

2. What is a consistent database state, and how is it achieved?
A consistent database state is one in which all data integrity constraints are satisfied. To achieve a consistent database state, a transaction must take the database from one consistent state to another. (See the answer to question 1.)

3. The DBMS does not guarantee that the semantic meaning of the transaction truly represents the real-world event. What are the possible consequences of that limitation? Give an example.
The DBMS is designed to verify the syntactic accuracy of the database commands given by the user. The DBMS will check that the database exists, that the referenced attributes exist in the selected tables, that the attribute data types are correct, and so on. Unfortunately, the DBMS is not designed to guarantee that a syntactically correct transaction accurately represents the real-world event. For example, if the end user sells 10 units of product 100179 (Crystal Vases), the DBMS cannot detect errors such as the operator entering 10 units of product 100197 (Crystal Glasses). The DBMS will execute the transaction, and the database will end up in a technically consistent state but in a real-world inconsistent state because the wrong product was updated.

4. List and discuss the five transaction properties.
The five transaction properties are:
• Atomicity requires that all parts of a transaction must be completed or the transaction is aborted. This property ensures that the database will remain in a consistent state.
• Consistency indicates the permanence of the database's consistent state.
• Isolation means that the data required by an executing transaction cannot be accessed by any other transaction until the first transaction finishes. This property ensures data consistency for concurrently executing transactions.
• Durability indicates that the database will be in a permanent consistent state after the execution of a transaction. In other words, once a consistent state is reached, it cannot be lost.
• Serializability means that a series of concurrent transactions will yield the same result as if they were executed one after another.
All five transaction properties work together to make sure that a database maintains data integrity and consistency for either a single-user or a multi-user DBMS.

5. What does serializability of transactions mean?
Serializability of transactions means that a series of concurrent transactions will yield the same result as if they were executed one after another.

6. What is a transaction log, and what is its function?
The transaction log is a special DBMS table that contains a description of all the database transactions executed by the DBMS. The database transaction log plays a crucial role in maintaining database concurrency control and integrity.
The information stored in the log is used by the DBMS to recover the database after a transaction is aborted or after a system failure. The transaction log is usually stored on a different hard disk or on different media (tape) so that a media failure does not destroy both the database and its log.

7. What is a scheduler, what does it do, and why is its activity important to concurrency control?
The scheduler is the DBMS component that establishes the order in which concurrent database operations are executed. The scheduler interleaves the execution of the database operations (belonging to several concurrent transactions) to ensure the serializability of transactions. In other words, the scheduler guarantees that the execution of concurrent transactions will yield the same result as though the transactions were executed one after another. The scheduler is important because it is the DBMS component that ensures transaction serializability; it allows the concurrent execution of transactions while giving end users the impression that they are the DBMS's only users.

8. What is a lock, and how, in general, does it work?
A lock is a mechanism used in concurrency control to guarantee the exclusive use of a data element to the transaction that owns the lock. For example, if the data element X is currently locked by transaction T1, transaction T2 will not have access to the data element X until T1 releases its lock. Generally speaking, a data item can be in only two states: locked (being used by some transaction) or unlocked (not in use by any transaction). To access a data element X, a transaction T1 first must request the lock from the DBMS. If the data element is not in use, the DBMS will lock X for the exclusive use of T1. No other transaction will have access to X while T1 executes.

9. What are the different levels of lock granularity?
Lock granularity refers to the size of the database object that a single lock is placed upon. Lock granularity can be:
• Database-level, meaning the entire database is locked by one lock.
• Table-level, meaning a table is locked by one lock.
• Page-level, meaning a disk page is locked by one lock.
• Row-level, meaning one row is locked by one lock.
• Field-level, meaning one field in one row is locked by one lock.

10. Why might a page-level lock be preferred over a field-level lock?
Finer lock granularity improves database concurrency by reducing contention for locked database objects. However, finer lock granularity also means that more locks must be maintained and managed by the DBMS, requiring more processing overhead and system resources for lock management. Concurrency demands and system resource usage must be balanced to ensure the best overall transaction performance. In some circumstances, page-level locks, which require fewer system resources, may produce better overall performance than field-level locks, which require more system resources.
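The trade-off between lock levels in questions 8 through 10 can be demonstrated with two common SQL statements. The sketch below uses Oracle-style syntax (LOCK TABLE and SELECT ... FOR UPDATE); exact syntax and default behavior vary by DBMS, and the table and product code are the sample values used elsewhere in this manual:

	-- Coarse granularity: reserve the entire INVOICE table for this
	-- transaction (low lock-management overhead, low concurrency)
	LOCK TABLE INVOICE IN EXCLUSIVE MODE;

	-- Fine granularity: reserve only the one PRODUCT row being read,
	-- in anticipation of an update (higher overhead, high concurrency)
	SELECT P_QOH
	FROM PRODUCT
	WHERE P_CODE = '11QER/31'
	FOR UPDATE;

	-- Ending the transaction releases both locks
	COMMIT;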
11. What is concurrency control, and what is its objective?
Concurrency control is the activity of coordinating the simultaneous execution of transactions in a multiprocessing or multi-user database management system. The objective of concurrency control is to ensure the serializability of transactions in a multi-user database management system. (The DBMS's scheduler is in charge of maintaining concurrency control.) Because it helps to guarantee data integrity and consistency in a database system, concurrency control is one of the most critical activities performed by a DBMS. If concurrency control is not maintained, three serious problems may be caused by concurrent transaction execution: lost updates, uncommitted data, and inconsistent retrievals.

12. What is an exclusive lock, and under what circumstances is it granted?
An exclusive lock is one of two lock types used to enforce concurrency control. (A lock can have three states: unlocked, shared (read) lock, and exclusive (write) lock. The "shared" and "exclusive" labels indicate the nature of the lock.) An exclusive lock exists when access to a data item is specifically reserved for the transaction that locked the object. The exclusive lock must be used when a potential for conflict exists, e.g., when one or more transactions must update (WRITE) a data item. Therefore, an exclusive lock is issued only when a transaction must WRITE (update) a data item and no locks are currently held on that data item by any other transaction. To understand the reasons for having an exclusive lock, look at its counterpart, the shared lock. Shared locks are appropriate when concurrent transactions are granted READ access on the basis of a common lock, because concurrent transactions based on a READ cannot produce a conflict. A shared lock is issued when a transaction must read data from the database and no exclusive locks are held on the data to be read.

13. What is a deadlock, and how can it be avoided? Discuss several strategies for dealing with deadlocks.
Base your discussion on Chapter 10's Section 10-3d, Deadlocks. Start by pointing out that, although locks prevent serious data inconsistencies, their use may lead to two major problems:
1. The transaction schedule dictated by the locking requirements may not be serializable, thus causing data integrity and consistency problems.
2. The schedule may create deadlocks. Database deadlocks are the equivalent of a traffic gridlock in a big city and are caused by two transactions waiting for each other to unlock data.
Use Table 10.13 in the text (How a Deadlock Condition Is Created) to illustrate the scenario that leads to a deadlock; that scenario is also sketched in SQL form after this answer. In a real-world DBMS, many more transactions can be executed simultaneously, thereby increasing the probability of generating deadlocks. Note that deadlocks are possible only if one of the transactions wants to obtain an exclusive lock on a data item; no deadlock condition can exist among shared locks.
Three basic techniques exist to control deadlocks:
• Deadlock prevention. A transaction requesting a new lock is aborted if there is a possibility that a deadlock may occur. If the transaction is aborted, all the changes made by this transaction are rolled back and all locks are released. The transaction is then rescheduled for execution. Deadlock prevention works because it avoids the conditions that lead to deadlocking.
• Deadlock detection. The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of the transactions (the "victim") is aborted (rolled back and rescheduled) and the other transaction continues. Note particularly the discussion in Section 10-4a, Wait/Die and Wound/Wait Schemes.
• Deadlock avoidance. The transaction must obtain all the locks it needs before it can be executed. This technique avoids the rollback of conflicting transactions by requiring that locks be obtained in succession. However, the serial lock assignment required by deadlock avoidance increases response times.
The best deadlock control method depends on the database environment. For example, if the probability of deadlocks is low, deadlock detection is recommended. However, if the probability of deadlocks is high, deadlock prevention is recommended. If response time is not high on the system priority list, deadlock avoidance may be employed.
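The deadlock condition in Table 10.13 can be paraphrased as the following interleaved sketch of two sessions. The table name, rows, and amounts are illustrative assumptions:

	-- Time 1, transaction T1: obtains an exclusive lock on row X
	UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE ACCT_NUM = 'X';
	-- Time 2, transaction T2: obtains an exclusive lock on row Y
	UPDATE ACCOUNT SET BALANCE = BALANCE + 100 WHERE ACCT_NUM = 'Y';
	-- Time 3, T1: requests row Y and must wait for T2 to release it
	UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ACCT_NUM = 'Y';
	-- Time 4, T2: requests row X and must wait for T1 to release it.
	-- T1 waits for T2 and T2 waits for T1: a deadlock. The DBMS must
	-- abort one transaction (the victim) to break the cycle.
	UPDATE ACCOUNT SET BALANCE = BALANCE - 100 WHERE ACCT_NUM = 'X';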
14. What are some disadvantages of time-stamping methods for concurrency control?
The disadvantages are: 1) each value stored in the database requires two additional time-stamp fields, one for the last time the field was read and one for the last time it was updated; 2) increased memory and processing overhead requirements; and 3) many transactions may have to be stopped, rescheduled, and restamped.

15. Why might it take a long time to complete transactions when an optimistic approach to concurrency control is used?
Because the optimistic approach makes the assumption that conflict from concurrent transactions is unlikely, it does nothing to avoid or control conflicts. The only test for conflict occurs during the validation phase. If a conflict is detected, the entire transaction restarts. In an environment with few concurrency conflicts, this type of single-check scheme works well. In an environment where conflicts are common, a transaction may have to be restarted numerous times before it can be written to the database.

16. What are the three types of database critical events that can trigger the database recovery process? Give some examples for each one.
Backup and recovery functions constitute a very important component of today's DBMSs. Some DBMSs provide functions that allow the database administrator to perform and schedule automatic database backups to permanent secondary storage devices, such as disks or tapes. Critical events include:
• Hardware/software failures, such as a hard disk media failure, a bad capacitor on a motherboard, or a failing memory bank. Other causes of errors under this category include application program or operating system errors that cause data to be overwritten, deleted, or lost.
• Human-caused incidents. This type of event can be categorized as unintentional or intentional.
- An unintentional failure is caused by carelessness by end users. Such errors include deleting the wrong rows from a table, pressing the wrong key on the keyboard, or shutting down the main database server by accident.
- Intentional events are of a more severe nature and normally indicate that the company data are at serious risk. Under this category are security threats caused by hackers trying to gain unauthorized access to data resources and virus attacks caused by disgruntled employees trying to compromise the database operation and damage the company.
• Natural disasters. This category includes fires, earthquakes, floods, and power failures.

17. What are the four ANSI transaction isolation levels? What type of reads does each level allow?
The four ANSI transaction isolation levels are 1) read uncommitted, 2) read committed, 3) repeatable read, and 4) serializable. These levels allow different "questionable" reads. A read is questionable if it can produce inconsistent results. Read uncommitted isolation will allow dirty reads, non-repeatable reads, and phantom reads. Read committed isolation will allow non-repeatable reads and phantom reads. Repeatable read isolation will allow phantom reads. Serializable does not allow any questionable reads.
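Most SQL dialects expose these isolation levels through the SET TRANSACTION statement. A short sketch using standard SQL syntax (default levels and exact behavior vary by DBMS):

	-- Strictest level: no dirty, non-repeatable, or phantom reads
	SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;

	-- A common default: prevents dirty reads, but non-repeatable
	-- and phantom reads remain possible
	SET TRANSACTION ISOLATION LEVEL READ COMMITTED;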
Problem Solutions

1. Suppose you are a manufacturer of product ABC, which is composed of parts A, B, and C. Each time a new product is created, it must be added to the product inventory, using the PROD_QOH in a table named PRODUCT. And each time the product ABC is created, the parts inventory, using PART_QOH in a table named PART, must be reduced by one for each of parts A, B, and C. The sample database contents are shown in Table P10.1.

Table P10.1 The Database for Problem 1

Table name: PRODUCT          Table name: PART
PROD_CODE   PROD_QOH         PART_CODE   PART_QOH
ABC         1,205            A           567
                             B           98
                             C           549

Given that information, answer Questions a through e.

a. How many database requests can you identify for an inventory update for both PRODUCT and PART?
Depending on how the SQL statements are written, there are two correct answers: 4 or 2.

b. Using SQL, write each database request you identified in step a.
Using four SQL statements:

	UPDATE PRODUCT
	SET PROD_QOH = PROD_QOH + 1
	WHERE PROD_CODE = 'ABC';
	UPDATE PART
	SET PART_QOH = PART_QOH - 1
	WHERE PART_CODE = 'A';
	UPDATE PART
	SET PART_QOH = PART_QOH - 1
	WHERE PART_CODE = 'B';
	UPDATE PART
	SET PART_QOH = PART_QOH - 1
	WHERE PART_CODE = 'C';

Using two SQL statements:

	UPDATE PRODUCT
	SET PROD_QOH = PROD_QOH + 1
	WHERE PROD_CODE = 'ABC';
	UPDATE PART
	SET PART_QOH = PART_QOH - 1
	WHERE PART_CODE = 'A' OR PART_CODE = 'B' OR PART_CODE = 'C';

c. Write the complete transaction(s).
Using four SQL statements:

	BEGIN TRANSACTION;
	UPDATE PRODUCT
	SET PROD_QOH = PROD_QOH + 1
	WHERE PROD_CODE = 'ABC';
	UPDATE PART
	SET PART_QOH = PART_QOH - 1
	WHERE PART_CODE = 'A';
	UPDATE PART
	SET PART_QOH = PART_QOH - 1
	WHERE PART_CODE = 'B';
	UPDATE PART
	SET PART_QOH = PART_QOH - 1
	WHERE PART_CODE = 'C';
	COMMIT;

Using two SQL statements:

	BEGIN TRANSACTION;
	UPDATE PRODUCT
	SET PROD_QOH = PROD_QOH + 1
	WHERE PROD_CODE = 'ABC';
	UPDATE PART
	SET PART_QOH = PART_QOH - 1
	WHERE PART_CODE = 'A' OR PART_CODE = 'B' OR PART_CODE = 'C';
	COMMIT;

d. Write the transaction log, using Table 10.1 as your template.
The transaction represents the addition of one new unit of product ABC. Using the quantities shown in Table P10.1, product 'ABC' has PROD_QOH = 1,205 at the start of the transaction, and parts 'A', 'B', and 'C' have PART_QOH values of 567, 98, and 549, respectively.

TRL_ID  TRX_NUM  PREV_PTR  NEXT_PTR  OPERATION  TABLE    ROW_ID  ATTRIBUTE  BEFORE VALUE  AFTER VALUE
1       1A3      NULL      2         START      ** START TRANSACTION
2       1A3      1         3         UPDATE     PRODUCT  'ABC'   PROD_QOH   1205          1206
3       1A3      2         4         UPDATE     PART     'A'     PART_QOH   567           566
4       1A3      3         5         UPDATE     PART     'B'     PART_QOH   98            97
5       1A3      4         6         UPDATE     PART     'C'     PART_QOH   549           548
6       1A3      5         NULL      COMMIT     ** END TRANSACTION

e. Using the transaction log you created in Step d, trace its use in database recovery.
Begin with the last TRL_ID (TRL_ID 6) for the transaction (TRX_NUM 1A3) and work backward, using the PREV_PTR to identify the next step to undo, moving from the end of the transaction back to the beginning.
TRL_ID 6: Nothing to change because it is an end-of-transaction marker.
TRL_ID 5: Change PART_QOH from 548 to 549 for ROW_ID 'C' in the PART table.
TRL_ID 4: Change PART_QOH from 97 to 98 for ROW_ID 'B' in the PART table.
TRL_ID 3: Change PART_QOH from 566 to 567 for ROW_ID 'A' in the PART table.
TRL_ID 2: Change PROD_QOH from 1206 to 1205 for ROW_ID 'ABC' in the PRODUCT table.
TRL_ID 1: Nothing to change because it is a beginning-of-transaction marker.

2. Describe the three most common problems with concurrent transaction execution. Explain how concurrency control can be used to avoid those problems.
The three main concurrency control problems are triggered by lost updates, uncommitted data, and inconsistent retrievals. These control problems are discussed in detail in Section 10-2. Note particularly Section 10-2a, Lost Updates, Section 10-2b, Uncommitted Data, and Section 10-2c, Inconsistent Retrievals.
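The first of these problems, the lost update, is easy to sketch. Suppose two concurrent transactions both read PROD_QOH = 35 for the same product and neither one locks the row (the product code and quantities are illustrative assumptions):

	-- T1 and T2 both read PROD_QOH = 35 for product 'ABC'
	-- T1 records a purchase of 100 units and writes 35 + 100:
	UPDATE PRODUCT SET PROD_QOH = 135 WHERE PROD_CODE = 'ABC';
	-- T2, still working from the stale value 35, records a sale
	-- of 30 units and writes 35 - 30, overwriting T1's update:
	UPDATE PRODUCT SET PROD_QOH = 5 WHERE PROD_CODE = 'ABC';
	-- Correct ending value: 105. T1's update has been lost. With
	-- proper locking, T2 would have waited and reread the value 135.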
3. What DBMS component is responsible for concurrency control? How is this feature used to resolve conflicts?
Severe database integrity and consistency problems can arise when two or more concurrent transactions are executed. In order to avoid such problems, the DBMS must exercise concurrency control. The DBMS component in charge of concurrency control is the scheduler. The scheduler is discussed in Section 10-2d. Note particularly the Read/Write conflict scenarios illustrated with the help of Table 10.11, Read/Write Conflict Scenarios: Conflicting Database Operations Matrix.

4. Using a simple example, explain the use of binary and shared/exclusive locks in a DBMS.
Discuss Section 10-3, Concurrency Control with Locking Methods. Binary locks are discussed in Section 10-3b, Lock Types. In short, a binary lock has only two states, locked and unlocked, so only one transaction at a time can access a locked object, even if both transactions merely want to read it. A shared/exclusive lock adds a shared (read) mode: several transactions may hold shared locks on the same object and read it concurrently, while an exclusive (write) lock still reserves the object for a single updating transaction.

5. Suppose that your database system has failed. Describe the database recovery process and the use of deferred-write and write-through techniques.
Recovery restores a database from a given state, usually inconsistent, to a previously consistent state. Depending on the type and the extent of the failure, the recovery process ranges from a minor short-term inconvenience to a major long-term rebuild action. Regardless of the extent of the required recovery process, recovery is not possible without backup.
The database recovery process generally follows a predictable scenario:
1. Determine the type and the extent of the required recovery.
2. If the entire database needs to be recovered to a consistent state, the recovery uses the most recent backup copy of the database in a known consistent state.
3. The backup copy is then rolled forward to restore all subsequent transactions by using the transaction log information.
4. If the database needs to be recovered, but the committed portion of the database is usable, the recovery process uses the transaction log to "undo" all the transactions that were not committed.
Recovery procedures generally make use of deferred-write and write-through techniques. In the case of the deferred-write (or deferred-update) technique, the transaction operations do not immediately update the database. Instead:
• All changes (previous and new data values) are first written to the transaction log.
• The database is updated only after the transaction reaches its commit point.
• If the transaction fails before it reaches its commit point, no changes (no rollback or undo) need to be made to the database because the database was never updated.
In contrast, if the write-through (or immediate-update) technique is used:
• The database is immediately updated by transaction operations during the transaction's execution, even before the transaction reaches its commit point.
• The transaction log is also updated, so if a transaction fails, the database uses the log information to roll back ("undo") the database to its previous state.

ONLINE CONTENT
The Ch10_ABC_Markets database is available at www.cengagebrain.com. This database is stored in Microsoft Access format.

6. ABC Markets sells products to customers. The relational diagram shown in Figure P10.6 represents the main entities for ABC's database. Note the following important characteristics:
• A customer may make many purchases, each one represented by an invoice.
- The CUS_BALANCE is updated with each credit purchase or payment and represents the amount the customer owes.
- The CUS_BALANCE is increased (+) with every credit purchase and decreased (-) with every customer payment.
- The date of last purchase is updated with each new purchase made by the customer.
- The date of last payment is updated with each new payment made by the customer.
• An invoice represents a product purchase by a customer.
- An INVOICE can have many invoice LINEs, one for each product purchased.
- The INV_TOTAL represents the total cost of the invoice, including taxes.
- The INV_TERMS can be "30," "60," or "90" (representing the number of days of credit) or "CASH," "CHECK," or "CC."
- The invoice status can be "OPEN," "PAID," or "CANCEL."
• A product's quantity on hand (P_QTYOH) is updated (decreased) with each product sale.
• A customer may make many payments. The payment type (PMT_TYPE) can be one of the following:
- "CASH" for cash payments.
- "CHECK" for check payments.
- "CC" for credit card payments.
• The payment details (PMT_DETAILS) are used to record data about check or credit card payments:
- The bank, account number, and check number for check payments.
- The issuer, credit card number, and expiration date for credit card payments.
Note: Not all entities and attributes are represented in this example. Use only the attributes indicated.

FIGURE P10.6 The ABC Markets Relational Diagram

Using this database, write the SQL code to represent each one of the following transactions. Use BEGIN TRANSACTION and COMMIT to group the SQL statements in logical transactions.

a. On May 11, 2016, customer '10010' makes a credit purchase (30 days) of one unit of product '11QER/31' with a unit price of $110.00; the tax rate is 8 percent. The invoice number is 10983, and this invoice has only one product line.

	BEGIN TRANSACTION;
	INSERT INTO INVOICE
	VALUES (10983, '10010', '11-May-2016', 118.80, '30', 'OPEN');
	INSERT INTO LINE
	VALUES (10983, 1, '11QER/31', 1, 110.00);
	UPDATE PRODUCT
	SET P_QTYOH = P_QTYOH - 1
	WHERE P_CODE = '11QER/31';
	UPDATE CUSTOMER
	SET CUS_DATELSTPUR = '11-May-2016', CUS_BALANCE = CUS_BALANCE + 118.80
	WHERE CUS_CODE = '10010';
	COMMIT;

b. On June 3, 2016, customer '10010' makes a payment of $100 in cash. The payment ID is 3428.

	BEGIN TRANSACTION;
	INSERT INTO PAYMENT
	VALUES (3428, '03-Jun-2016', '10010', 100.00, 'CASH', 'None');
	UPDATE CUSTOMER
	SET CUS_DATELSTPMT = '03-Jun-2016', CUS_BALANCE = CUS_BALANCE - 100.00
	WHERE CUS_CODE = '10010';
	COMMIT;

7. Create a simple transaction log (using the format shown in Table 10.1) to represent the actions of the two previous transactions.
The transaction log is shown in Table P10.7.

Table P10.7 The ABC Markets Transaction Log

TRL_ID  TRX_NUM  PREV_PTR  NEXT_PTR  OPERATION  TABLE     ROW_ID    ATTRIBUTE       BEFORE VALUE  AFTER VALUE
987     101      Null      1023      START      * Start Trx. *
1023    101      987       1026      INSERT     INVOICE   10983                                   10983, 10010, 11-May-2016, 118.80, 30, OPEN
1026    101      1023      1029      INSERT     LINE      10983, 1                                10983, 1, 11QER/31, 1, 110.00
1029    101      1026      1031      UPDATE     PRODUCT   11QER/31  P_QTYOH         47            46
1031    101      1029      1032      UPDATE     CUSTOMER  10010     CUS_BALANCE     345.67        464.47
1032    101      1031      1034      UPDATE     CUSTOMER  10010     CUS_DATELSTPUR  5-May-2014    11-May-2016
1034    101      1032      Null      COMMIT     * End Trx. *
1089    102      Null      1091      START      * Start Trx. *
1091    102      1089      1095      INSERT     PAYMENT   3428                                    3428, 3-Jun-2016, 10010, 100.00, CASH, None
1095    102      1091      1096      UPDATE     CUSTOMER  10010     CUS_BALANCE     464.47        364.47
1096    102      1095      1097      UPDATE     CUSTOMER  10010     CUS_DATELSTPMT  2-May-2014    3-Jun-2016
1097    102      1096      Null      COMMIT     * End Trx. *
Note: Because we have not shown the table contents, the "before" values in the transaction log are assumed. Each "after" value is computed from the assumed "before" value, plus or minus the transaction value. Also, in order to save some space, we have combined the "after" values for the INSERT statements into a single cell; each value could instead be entered in an individual row.

8. Assuming that pessimistic locking is being used, but the two-phase locking protocol is not, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6a.
Time 1: Lock INVOICE.
Time 2: Insert row 10983 into INVOICE.
Time 3: Unlock INVOICE.
Time 4: Lock LINE.
Time 5: Insert row 10983, 1 into LINE.
Time 6: Unlock LINE.
Time 7: Lock PRODUCT.
Time 8: Update PRODUCT 11QER/31, P_QTYOH from 47 to 46.
Time 9: Unlock PRODUCT.
Time 10: Lock CUSTOMER.
Time 11: Update CUSTOMER 10010, CUS_BALANCE from 345.67 to 464.47.
Time 12: Update CUSTOMER 10010, CUS_DATELSTPUR from 5-May-2014 to 11-May-2016.
Time 13: Unlock CUSTOMER.

9. Assuming that pessimistic locking with the two-phase locking protocol is being used, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6a.
Time 1: Lock INVOICE.
Time 2: Lock LINE.
Time 3: Lock PRODUCT.
Time 4: Lock CUSTOMER.
Time 5: Insert row 10983 into INVOICE.
Time 6: Insert row 10983, 1 into LINE.
Time 7: Update PRODUCT 11QER/31, P_QTYOH from 47 to 46.
Time 8: Update CUSTOMER 10010, CUS_BALANCE from 345.67 to 464.47.
Time 9: Update CUSTOMER 10010, CUS_DATELSTPUR from 5-May-2014 to 11-May-2016.
Time 10: Unlock INVOICE.
Time 11: Unlock LINE.
Time 12: Unlock PRODUCT.
Time 13: Unlock CUSTOMER.

10. Assuming that pessimistic locking is being used, but the two-phase locking protocol is not, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6b.
Time 1: Lock PAYMENT.
Time 2: Insert row 3428 into PAYMENT.
Time 3: Unlock PAYMENT.
Time 4: Lock CUSTOMER.
Time 5: Update CUSTOMER 10010, CUS_BALANCE from 464.47 to 364.47.
Time 6: Update CUSTOMER 10010, CUS_DATELSTPMT from 2-May-2014 to 3-Jun-2016.
Time 7: Unlock CUSTOMER.

11. Assuming that pessimistic locking with the two-phase locking protocol is being used, create a chronological list of the locking, unlocking, and data manipulation activities that would occur during the complete processing of the transaction described in Problem 6b.
Time 1: Lock PAYMENT.
Time 2: Lock CUSTOMER.
Time 3: Insert row 3428 into PAYMENT.
Time 4: Update CUSTOMER 10010, CUS_BALANCE from 464.47 to 364.47.
Time 5: Update CUSTOMER 10010, CUS_DATELSTPMT from 2-May-2014 to 3-Jun-2016.
Time 6: Unlock PAYMENT.
Time 7: Unlock CUSTOMER.

Additional Problems and Answers

The following problems are designed to give your students additional practice … or you can use them as test questions.

1. Write the SQL statements that might be used in transaction management and explain how they work.
The following transaction registers the credit sale of a product to a customer:

	BEGIN TRANSACTION;                          -- start the transaction
	INSERT INTO INVOICE                         -- add a record to INVOICE
	(INV_NUM, INV_DATE, ACCNUM, TOTAL)
	VALUES (1020, '15-Mar-2018', '60120010', 3500);
	UPDATE INVENTORY                            -- update the product's quantity on hand
	SET ON_HAND = ON_HAND - 100
	WHERE PROD_CODE = '345TYX';
	UPDATE ACCREC                               -- update the customer balance account
	SET BALANCE = BALANCE + 3500
	WHERE ACCNUM = '60120010';
	COMMIT;

The credit sale transaction must do all of the following:
1. Create the invoice record.
2. Update the inventory data.
3. Update the customer account data.
In SQL, the transaction begins automatically with the first SQL statement, or the user can start it with the BEGIN TRANSACTION statement. The SQL transaction ends when:
• the last SQL statement is found and/or the program ends,
• the user cancels the transaction, or
• a COMMIT or ROLLBACK statement is found.
The DBMS will ensure that all SQL statements are executed and completed before committing all work. If, for any reason, one or more of the SQL statements in the transaction cannot be completed, the entire transaction is aborted and the database is rolled back to its previous consistent state.

2. Starting with a consistent database state, trace the activities that are required to execute a set of transactions to produce an updated consistent database state.
The following example traces the execution of problem 1's credit sale transaction. We will assume that the transaction is based on a currently consistent database state. We will also assume that the transaction is the only transaction being executed by the DBMS.
Time 0: The database is in a consistent state.
Time 1: Write INVOICE, INV_NUM = 1020 (INSERT the invoice row into the INVOICE table).
Time 2: Read INVENTORY, ON_HAND = 134.
Time 3: ON_HAND = 134 - 100 (UPDATE the quantity on hand of product 345TYX).
Time 4: Write ON_HAND = 34.
Time 5: Read ACCREC, BALANCE = 900.
Time 6: BALANCE = 900 + 3500.
Time 7: Write BALANCE = 4400.
Time 8: COMMIT. All changes are permanently saved to the database, and the database is again in a consistent state.

3. Write an example of a database request.
A database transaction is composed of one or more database requests. A database request is the equivalent of a single SQL statement (SELECT, INSERT, UPDATE, DELETE, and so on). Some database transactions are as simple as:

	SELECT * FROM ACCOUNT;

A database request can include references to one or more tables. For example, the request

	SELECT ACCT_NUM, CUSTOMER.CUS_NUM, INV_NUM, INV_DATE, INV_TOTAL
	FROM ACCOUNT, INVOICE
	WHERE ACCOUNT.ACCT_NUM = INVOICE.ACCT_NUM
	AND ACCOUNT.ACCT_NUM = '60120010';

will list all the invoices for customer '60120010'. Note that the preceding query accesses two tables: ACCOUNT and INVOICE. Note also that, if an attribute shows up in more than one place (as a primary key and as a foreign key), its source table must be specified to avoid an ambiguity error message.
A database request may also update a database or insert a new row in a table. For example, the database request

	INSERT INTO INVOICE (INV_NUM, INV_DATE, ACCT_NUM, INV_TOTAL)
	VALUES (1020, '10-Feb-2018', '60120010', 3500);

will insert a new row into the INVOICE table.

4. Trace the use of the transaction log in database recovery.
The following transaction log traces the database transaction explained in problem 1.

Transaction Log
TRL_ID  TRX_NUM  PREV_PTR  NEXT_PTR  OPERATION  TABLE      ROW_ID    ATTRIBUTE  BEFORE VALUE  AFTER VALUE
1       101      NULL      2         START      * Start Trx. *
2       101      1         3         INSERT     INVOICE    1020                               1020, 15-Mar-2018, 60120010, 3500
3       101      2         4         UPDATE     INVENTORY  345TYX    ON_HAND    134           34
4       101      3         5         UPDATE     ACCREC     60120010  BALANCE    900           4,400
5       101      4         NULL      COMMIT     * End Trx. *

The transaction number (TRX_NUM) is automatically assigned by the DBMS. The transaction log maintains a record of all database transactions that changed the database.
For example, the preceding transaction log records:
• the insertion of a new row into the INVOICE table,
• the update to the ON_HAND attribute for the row identified by '345TYX' in the INVENTORY table, and
• the update to the BALANCE attribute for the row identified by '60120010' in the ACCREC table.
The transaction log also records the transaction's beginning and end in order to help the DBMS determine the operations that must be rolled back if the transaction is aborted. Note: Only the current transaction may be rolled back, not all the previous transactions. If the database must be recovered, the DBMS will:
• Change the BALANCE attribute of the row '60120010' from 4400 to 900 in the ACCREC table.
• Change the ON_HAND attribute of the row '345TYX' from 34 to 134 in the INVENTORY table.
• Delete row 1020 of the INVOICE table.

5. Suppose you are asked to evaluate a DBMS in terms of lock granularity and the different locking levels. Create a simple database environment in which these features would be important.
Lock granularity describes the different lock levels supported by a DBMS. The lock levels are:
Database-level
• The DBMS locks the entire database. If transaction T1 locks the database, transaction T2 cannot access any database tables until T1 releases the lock.
• Database-level locks work well in a batch processing environment in which the bulk of the transactions are executed offline and a batch process is used to update the master database at off-peak times, such as midnight or weekends.
Table-level
• The DBMS locks an entire table within a database. This lock level prevents access to any row by a transaction T2 while transaction T1 is using the table. However, two transactions can access the database as long as they access different tables.
• Table-level locks are only appropriate when table sharing is not an issue. For example, if researchers pursue unrelated research projects within a research database, the DBMS will be able to allow such users to access their data without affecting the work of other users.
Page-level
• The DBMS will lock an entire disk page. A disk page, or page, is the equivalent of a disk block, which may be described as a (referenced) section of a disk. A page has a fixed size, such as 4K, 8K, or 16K. A table may span several pages, and a page may contain several rows of one or more tables. Page-level locks have long been among the most frequently used multi-user DBMS locking devices.
• Page-level locks are particularly appropriate in a multi-user DBMS in which data sharing is a crucial component. For example, page-level locks are common in accounting systems, sales systems, and payroll systems. In fact, just about any business DBMS application that runs in a multi-user environment can benefit from page-level locking.
Row-level
• The row-level lock is much less restrictive than the locks we have just discussed. Row-level locks permit the DBMS to allow concurrent transactions to access different rows of the same table, even if these rows are located on the same page. Although the row-level locking approach improves the availability of data, its management requires higher overhead because a lock exists for each row in each table of the database.
• Row-level locking is now implemented in most major DBMSs, despite that overhead.
Its very high degree of shareability makes row-level locking well suited to multi-user database applications, including many of the ones that once relied on page-level locking.
Field-level
• The field-level locking approach allows concurrent transactions to access the same row as long as they use different attributes within that row. Although field-level locking clearly yields the most flexible multi-user data access, it requires too much computer overhead to be practical at this point.

Chapter 11 Database Performance Tuning and Query Optimization

Discussion Focus

This chapter focuses on the factors that directly affect database performance. Because performance-tuning techniques can be DBMS-specific, the material in this chapter may not be applicable under all circumstances, nor will it necessarily pertain to all DBMS types. This chapter is designed to build a foundation for the general understanding of database performance-tuning issues and to help you choose appropriate performance-tuning strategies. (For the most current information about tuning your database, consult the vendor's documentation.)
• Start by covering the basic database performance-tuning concepts. Encourage students to use the web to search for information about the internal architecture (internal process and database storage formats) of various database systems. Focus on the similarities to lay a common foundation.
• Explain how a DBMS processes SQL queries in general terms and stress the importance of indexes in query processing. Emphasize the generation of database statistics for optimum query processing.
• Step through the query processing example in Section 11-4, Optimizer Choices.
• Discuss the common practices used to write more efficient SQL code. Emphasize that some practices are DBMS-specific. As technology advances, the query optimization logic becomes increasingly sophisticated and effective; therefore, some of the SQL practices illustrated in this chapter may not improve query performance as dramatically as they do in older systems.
• Finally, illustrate the chapter material using the query optimization example in Section 11-8.

Answers to Review Questions

1. What is SQL performance tuning?
SQL performance tuning describes a process, on the client side, that will generate an SQL query to return the correct answer in the least amount of time, using the minimum amount of resources at the server end.

2. What is database performance tuning?
DBMS performance tuning describes a process, on the server side, that will properly configure the DBMS environment to respond to clients' requests in the fastest way possible, while making optimum use of existing resources.

3. What is the focus of most performance tuning activities, and why does that focus exist?
Most performance-tuning activities focus on minimizing the number of I/O operations, because I/O operations are much slower than reading data from the data cache.

4. What are database statistics, and why are they important?
The term database statistics refers to a number of measurements gathered by the DBMS to describe a snapshot of the database objects' characteristics. The DBMS gathers statistics about objects such as tables, indexes, and available resources, such as the number of processors used, processor speed, and temporary space available. Such statistics are used to make critical decisions about improving query processing efficiency.
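Once statistics have been gathered, many DBMSs let you inspect them directly. For example, in an Oracle-style data dictionary (the view and column names below are Oracle's; other DBMSs expose equivalent catalogs):

	-- Table statistics: row count and number of storage blocks
	SELECT TABLE_NAME, NUM_ROWS, BLOCKS
	FROM USER_TABLES
	WHERE TABLE_NAME = 'PRODUCT';

	-- Index statistics: number of distinct keys in each index
	SELECT INDEX_NAME, DISTINCT_KEYS
	FROM USER_INDEXES
	WHERE TABLE_NAME = 'PRODUCT';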
5. How are database statistics obtained?
Database statistics can be gathered manually by the DBA or automatically by the DBMS. For example, many DBMS vendors support SQL's ANALYZE command to gather statistics. In addition, many vendors have their own routines to gather statistics. For example, IBM's DB2 uses the RUNSTATS procedure, while Microsoft's SQL Server uses the UPDATE STATISTICS procedure and provides the Auto-Update and Auto-Create Statistics options in its initialization parameters.

6. What database statistics measurements are typical of tables, indexes, and resources?
For tables, typical measurements include the number of rows, the number of disk blocks used, row length, the number of columns in each row, the number of distinct values in each column, the maximum value in each column, the minimum value in each column, which columns have indexes, and so on.
For indexes, typical measurements include the number and name of columns in the index key, the number of key values in the index, the number of distinct key values in the index key, a histogram of key values in an index, and so on.
For resources, typical measurements include the logical and physical disk block size, the location and size of data files, the number of extents per data file, and so on.

7. How is the processing of SQL DDL statements (such as CREATE TABLE) different from the processing required by DML statements?
A DDL statement actually updates the data dictionary tables or system catalog, while a DML statement (SELECT, INSERT, UPDATE, or DELETE) mostly manipulates end-user data.

8. In simple terms, the DBMS processes queries in three phases. What are those phases, and what is accomplished in each phase?
The three phases are:
1. Parsing. The DBMS parses the SQL query and chooses the most efficient access/execution plan.
2. Execution. The DBMS executes the SQL query using the chosen execution plan.
3. Fetching. The DBMS fetches the data and sends the result set back to the client.
Parsing involves breaking the query into smaller units and transforming the original SQL query into a slightly different version of the original SQL code, one that is "fully equivalent" and more efficient. Fully equivalent means that the optimized query results are always the same as the original query. More efficient means that the optimized query will almost always execute faster than the original query. (Note that we say almost always because many factors affect the performance of a database. These factors include the network, the client's computer resources, and even other queries running concurrently in the same database.) After the parsing and execution phases are completed, all rows that match the specified condition(s) have been retrieved, sorted, grouped, and/or, if required, aggregated. During the fetching phase, the rows of the resulting query result set are returned to the client. During this phase, the DBMS may use temporary table space to store temporary data.
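The plan chosen during the parsing phase can usually be examined without running the query. A sketch using Oracle-style commands (EXPLAIN PLAN and the DBMS_XPLAN package; other DBMSs offer EXPLAIN or SHOWPLAN equivalents):

	-- Refresh the statistics the optimizer will rely on
	ANALYZE TABLE PRODUCT COMPUTE STATISTICS;

	-- Generate the execution plan without executing the query
	EXPLAIN PLAN FOR
	SELECT P_CODE, P_PRICE
	FROM PRODUCT
	WHERE P_PRICE > 10.00;

	-- Display the plan that the optimizer chose
	SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);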
9. If indexes are so important, why not index every column in every table? (Include a brief discussion of the role played by data sparsity.)
Indexing every column in every table will tax the DBMS too much in terms of index-maintenance processing, especially if the table has many attributes, many rows, and/or requires many inserts, updates, and/or deletes. One measure that determines the need for an index is the data sparsity of the column you want to index. Data sparsity refers to the number of different values a column could possibly have. For example, a STU_SEX column in a STUDENT table can have only two possible values, "M" or "F"; therefore, this column is said to have low sparsity. In contrast, the STU_DOB column that stores the student's date of birth can have many different date values; therefore, this column is said to have high sparsity. Knowing the sparsity helps you decide whether or not the use of an index is appropriate. For example, when you perform a search in a column with low sparsity, you are very likely to read a high percentage of the table rows anyway; therefore, index processing may be unnecessary work.

10. What is the difference between a rule-based optimizer and a cost-based optimizer?
A rule-based optimizer uses a set of preset rules and points to determine the best approach to execute a query. The rules assign a "cost" to each SQL operation; the costs are then added to yield the cost of the execution plan. A cost-based optimizer uses sophisticated algorithms based on the statistics about the objects being accessed to determine the best approach to execute a query. In this case, the optimizer process adds up the processing cost, the I/O costs, and the resource costs (RAM and temporary space) to come up with the total cost of a given execution plan.

11. What are optimizer hints, and how are they used?
Hints are special instructions for the optimizer that are embedded inside the SQL command text. Although the optimizer generally performs very well under most circumstances, there are some circumstances in which the optimizer may not choose the best execution plan. Remember, the optimizer makes decisions based on the existing statistics. If the statistics are old, the optimizer may not do a good job of selecting the best execution plan. Even with current statistics, the optimizer's choice may not be the most efficient one. On such occasions, the end user can change the optimizer mode for the current SQL statement by using hints.

12. What are some general guidelines for creating and using indexes?
Create indexes for each single attribute used in a WHERE, HAVING, ORDER BY, or GROUP BY clause. If you create indexes on all single attributes used in search conditions, the DBMS will access the table using an index scan instead of a full table scan. For example, if you have an index for P_PRICE, the condition P_PRICE > 10.00 can be solved by accessing the index instead of sequentially scanning all table rows and evaluating P_PRICE for each row. Indexes are also used in join expressions, such as in CUSTOMER.CUS_CODE = INVOICE.CUS_CODE.
Do not use indexes on small tables or tables with low sparsity. Remember, small tables and low-sparsity tables are not the same thing. A search condition in a table with low sparsity may return a high percentage of table rows anyway, making the index operation too costly and making the full table scan a viable option. Using the same logic, do not create indexes for tables with few rows and few attributes, unless you must ensure the existence of unique values in a column.
Declare primary and foreign keys so the optimizer can use the indexes in join operations. All natural joins and old-style joins will benefit if you declare primary keys and foreign keys, because the optimizer will use the available indexes at join time. (The declaration of a PK or FK will automatically create an index for the declared column.) Also, for the same reason, it is better to write joins using the SQL JOIN syntax. (See Chapter 8, "Advanced SQL.")
Declare indexes in join columns other than PK/FK. If you do join operations on columns other than the primary and foreign key, you may be better off declaring indexes on such columns.
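Applied to the sample tables used in this chapter's problems, these guidelines translate into statements such as the following (the index names are illustrative):

	-- Index a single attribute that appears in search conditions
	CREATE INDEX PROD_QOH_NDX ON PRODUCT(P_QOH);

	-- Index a join column that is not covered by a PK/FK declaration
	-- (declared keys are typically indexed automatically)
	CREATE INDEX LINE_PCODE_NDX ON LINE(P_CODE);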
13. Most query optimization techniques are designed to make the optimizer's work easier. What factors should you keep in mind if you intend to write conditional expressions in SQL code?
Use simple columns or literals as operands in a conditional expression; avoid the use of conditional expressions with functions whenever possible. Comparing the contents of a single column to a literal is faster than comparing to expressions.
Numeric field comparisons are faster than character, date, and NULL comparisons. In search conditions, comparing a numeric attribute to a numeric literal is faster than comparing a character attribute to a character literal. In general, numeric comparisons (integer, decimal) are handled faster by the CPU than character and date comparisons. Because indexes do not store references to null values, NULL conditions involve additional processing and therefore tend to be the slowest of all conditional operands.
Equality comparisons are faster than inequality comparisons. As a general rule, equality comparisons are processed faster than inequality comparisons. For example, P_PRICE = 10.00 is processed faster because the DBMS can do a direct search using the index on the column. If there are no exact matches, the condition is evaluated as false. However, if you use an inequality symbol (>, >=, <, <=), the DBMS must perform additional processing to complete the request, because there would almost always be more "greater than" or "less than" values, and perhaps only a few exactly "equal" values, in the index. The slowest (with the exception of NULL) of all comparison operators is LIKE with wildcard symbols, such as in V_CONTACT LIKE "%glo%". Also, using the "not equal" symbol (<>) yields slower searches, especially if the sparsity of the data is high; that is, if there are many more different values than equal values.
Whenever possible, transform conditional expressions to use literals. For example, if your condition is P_PRICE - 10 = 7, change it to read P_PRICE = 17. Also, if you have a composite condition such as P_QOH < P_MIN AND P_MIN = 10, change it to read P_MIN = 10 AND P_QOH < 10.
When using multiple conditional expressions, write the equality conditions first. (Note that we did this in the previous example.) Remember, equality conditions are faster to process than inequality conditions. Although most RDBMSs will automatically do this for you, paying attention to this detail lightens the load for the query optimizer. (The optimizer won't have to do what you have already done.)
If you use multiple AND conditions, write the condition most likely to be false first. If you use this technique, the DBMS will stop evaluating the rest of the conditions as soon as it finds a conditional expression that evaluates to false. Remember, for multiple AND conditions to be found true, all conditions must evaluate to true; if one condition evaluates to false, the entire AND expression is false. Therefore, if you use this technique, the DBMS won't waste time unnecessarily evaluating additional conditions. Naturally, the use of this technique implies an implicit knowledge of the sparsity of the data set.
Whenever possible, try to avoid the use of the NOT logical operator. It is best to transform a SQL expression containing a NOT logical operator into an equivalent expression. For example, NOT (P_PRICE > 10.00) can be written as P_PRICE <= 10.00.

Problems 7-10 are based on the following query:

	SELECT P_CODE, P_DESCRIPT, P_PRICE
	FROM PRODUCT
	WHERE P_PRICE >= (SELECT AVG(P_PRICE) FROM PRODUCT);

7. Assuming that there are no table statistics, what type of optimization will the DBMS use?
The DBMS will use rule-based optimization.

8. What type of database I/O operations will likely be used by the query? (See Table 11.3.)
The DBMS will likely use a full table scan to compute the average price in the inner subquery. The DBMS is also very likely to use another full table scan of PRODUCT to execute the outer query. (Table 11.3 is reproduced below for your convenience.)

TABLE 11.3 Sample DBMS Access Plan I/O Operations
• Table Scan (Full): reads the entire table sequentially, from the first row to the last row, one row at a time (slowest).
• Table Access (Row ID): reads a table row directly, using the row ID value (fastest).
• Index Scan (Range): reads the index first to obtain the row IDs and then accesses the table rows directly (faster than a full table scan).
• Index Access (Unique): used when a table has a unique index on a column.
• Nested Loop: reads and compares a set of values to another set of values, using a nested-loop style (slow).
• Merge: merges two data sets (slow).
• Sort: sorts a data set (slow).

9. What is the likely data sparsity of the P_PRICE column?
Because each product is likely to have a different price, the P_PRICE column is likely to have high sparsity.

10. Should you create an index? Why or why not?
Yes, you should create an index, because the P_PRICE column has high sparsity and the column is very likely to be used in many different SQL queries as part of a conditional expression.
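Following the pattern used in the other index answers in this chapter, the command for Problem 10's recommendation would be (the index name is illustrative):

	CREATE INDEX PROD_PRICE_NDX ON PRODUCT(P_PRICE);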
Problems 11-14 are based on the following query:

	SELECT P_CODE, SUM(LINE_UNITS)
	FROM LINE
	GROUP BY P_CODE
	HAVING SUM(LINE_UNITS) > (SELECT MAX(LINE_UNITS) FROM LINE);

11. What is the likely data sparsity of the LINE_UNITS column?
The LINE_UNITS column in the LINE table represents the quantity purchased of a given product in a given invoice. This column is likely to have many different values; therefore, the column is very likely to have high sparsity.

12. Should you create an index? If so, what would the index column(s) be, and why would you create that index? If not, explain your reasoning.
Yes, you should create an index on LINE_UNITS. This index is likely to help in the execution of the inner query that computes the maximum value of LINE_UNITS.

13. Should you create an index on P_CODE? If so, write the SQL command to create that index. If not, explain your reasoning.
Yes, creating an index on P_CODE will help in query execution. However, most DBMSs automatically index foreign key columns. If this is not the case in your DBMS, you can manually create an index using the command CREATE INDEX LINE_NDX1 ON LINE(P_CODE). (Note that we have named the index LINE_NDX1.)

14. Write the command to create statistics for this table.

	ANALYZE TABLE LINE COMPUTE STATISTICS;

Problems 15-16 are based on the following query:

	SELECT P_CODE, P_QOH*P_PRICE
	FROM PRODUCT
	WHERE P_QOH*P_PRICE > (SELECT AVG(P_QOH*P_PRICE) FROM PRODUCT);

15. What is the likely data sparsity of the P_QOH and P_PRICE columns?
The P_QOH and P_PRICE columns are likely to have high data sparsity.

16. Should you create an index? If so, what would the index column(s) be, and why should you create that index?
In this case, creating an index on P_QOH or on P_PRICE will not help the query execute faster, for two reasons: first, the WHERE condition in the outer query uses an expression, and second, the aggregate function also uses an expression. When expressions are used as the operands of a conditional expression, the DBMS will not use indexes available on the columns that appear in those expressions.

Problems 17-19 are based on the following query:

	SELECT V_CODE, V_NAME, V_CONTACT, V_STATE
	FROM VENDOR
	WHERE V_STATE = 'TN'
	ORDER BY V_NAME;

17. What indexes should you create and why? Write the SQL command to create the indexes.
You should create an index on the V_STATE column in the VENDOR table. This new index will help in the execution of this query because the conditional operation uses the V_STATE column in the conditional criteria. In addition, you should create an index on V_NAME, because it is used in the ORDER BY clause. The commands to create the indexes are:

	CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE);
	CREATE INDEX VEND_NDX2 ON VENDOR(V_NAME);

Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively.

18. Assume that 10,000 vendors are distributed as shown in Table P11.18. What percentage of rows will be returned by the query?

Table P11.18 Distribution of Vendors by State
State  Number of Vendors    State  Number of Vendors
AK     15                   MS     47
AL     55                   NC     358
AZ     100                  NH     25
CA     3244                 NJ     645
CO     345                  NV     16
FL     995                  OH     821
GA     75                   OK     62
HI     68                   PA     425
IL     89                   RI     12
IN     12                   SC     65
KS     19                   SD     74
KY     45                   TN     113
LA     29                   TX     589
MD     208                  UT     36
MI     745                  VA     375
MO     35                   WA     258

Given the distribution of values in Table P11.18, the query will return 113 of the 10,000 rows, or 1.13% of the total table rows.

19. What type of I/O database operations would be most likely to be used to execute that query?
Assuming that you create the index on V_STATE and that you generate the statistics on the VENDOR table, the DBMS is very likely to use an Index Scan (Range) to access the index data and then use Table Access (Row ID) to retrieve the VENDOR rows.

20. Using Table 11.4 as an example, create two alternative access plans.
The two access plans are shown in Table P11.20.
9. What is the likely data sparsity of the P_PRICE column?

Because each product is likely to have a different price, the P_PRICE column is likely to have high sparsity.

10. Should you create an index? Why or why not?

Yes, you should create an index because the P_PRICE column has high sparsity and is very likely to be used in many different SQL queries as part of a conditional expression.

Problems 11-14 are based on the following query:

SELECT P_CODE, SUM(LINE_UNITS)
FROM LINE
GROUP BY P_CODE
HAVING SUM(LINE_UNITS) > (SELECT MAX(LINE_UNITS) FROM LINE);

11. What is the likely data sparsity of the LINE_UNITS column?

The LINE_UNITS column in the LINE table represents the quantity purchased of a given product in a given invoice. This column is likely to have many different values; therefore, the column is very likely to have high sparsity.

12. Should you create an index? If so, what would the index column(s) be, and why would you create that index? If not, explain your reasoning.

Yes, you should create an index on LINE_UNITS. This index is likely to help in the execution of the inner query that computes the maximum value of LINE_UNITS.

13. Should you create an index on P_CODE? If so, write the SQL command to create that index. If not, explain your reasoning.

Yes, creating an index on P_CODE will help in query execution. However, most DBMSs automatically index foreign key columns. If that is not the case in your DBMS, you can create the index manually with the command CREATE INDEX LINE_NDX1 ON LINE(P_CODE); (Note that we have named the index LINE_NDX1.)

14. Write the command to create statistics for this table.

ANALYZE TABLE LINE COMPUTE STATISTICS;

Problems 15-16 are based on the following query:

SELECT P_CODE, P_QOH*P_PRICE
FROM PRODUCT
WHERE P_QOH*P_PRICE > (SELECT AVG(P_QOH*P_PRICE) FROM PRODUCT);

15. What is the likely data sparsity of the P_QOH and P_PRICE columns?

The P_QOH and P_PRICE columns are likely to have high data sparsity.

16. Should you create an index? If so, what would the index column(s) be, and why should you create that index?

In this case, creating an index on P_QOH or on P_PRICE will not help the query execute faster, for two reasons: first, the WHERE condition of the outer query uses an expression, and second, the aggregate function also uses an expression. When the operands of a conditional expression are themselves expressions, the DBMS will not use the indexes that exist on the individual columns within those expressions. (A possible workaround is sketched below.)
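Some DBMSs (Oracle and PostgreSQL among them) do, however, support function-based indexes, that is, indexes built on an expression rather than on a single column. Assuming such support, a minimal sketch in Oracle syntax (the index name is hypothetical) would be:

CREATE INDEX PROD_TOTVAL_NDX ON PRODUCT(P_QOH * P_PRICE);

Because the WHERE clause and the subquery use exactly the same expression, the optimizer could then use this index rather than scanning the entire table.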
Problems 17-19 are based on the following query:

SELECT V_CODE, V_NAME, V_CONTACT, V_STATE
FROM VENDOR
WHERE V_STATE = 'TN'
ORDER BY V_NAME;

17. What indexes should you create and why? Write the SQL command to create the indexes.

You should create an index on the V_STATE column of the VENDOR table. This new index will help in the execution of this query because the conditional operation uses the V_STATE column in its criteria. In addition, you should create an index on V_NAME because it is used in the ORDER BY clause. The commands to create the indexes are:

CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE);
CREATE INDEX VEND_NDX2 ON VENDOR(V_NAME);

Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively.

18. Assume that 10,000 vendors are distributed as shown in Table P11.18. What percentage of rows will be returned by the query?

Table P11.18 Distribution of Vendors by State

State  Number of Vendors   State  Number of Vendors
AK     15                  MS     47
AL     55                  NC     358
AZ     100                 NH     25
CA     3244                NJ     645
CO     345                 NV     16
FL     995                 OH     821
GA     75                  OK     62
HI     68                  PA     425
IL     89                  RI     12
IN     12                  SC     65
KS     19                  SD     74
KY     45                  TN     113
LA     29                  TX     589
MD     208                 UT     36
MI     745                 VA     375
MO     35                  WA     258

Given the distribution of values in Table P11.18, the query will return 113 of the 10,000 rows, or 1.13% of the total table rows.

19. What type of database I/O operations would most likely be used to execute that query?

Assuming that you create the index on V_STATE and that you generate the statistics on the VENDOR table, the DBMS is very likely to use an index scan (range) to access the index data and then use table access by row ID to retrieve the VENDOR rows.

20. Using Table 11.4 as an example, create two alternative access plans.

The two access plans are shown in Table P11.20.

Table P11.20 Comparing Access Plans and I/O Costs

Plan  Step  Operation                                                       I/O Operations  I/O Cost  Resulting Set Rows  Total I/O Cost
A     A1    Full table scan of VENDOR; select only rows with V_STATE='TN'   10,000          10,000    113                 10,000
A     A2    SORT operation                                                  113             113       113                 10,113
B     B1    Index scan (range) of VEND_NDX1                                 113             113       113                 113
B     B2    Table access by row ID on VENDOR                                113             113       113                 226
B     B3    SORT operation                                                  113             113       113                 339

In Plan A, the DBMS uses a full table scan of VENDOR; the SORT operation orders the output by vendor name. In Plan B, the DBMS uses an index scan (range) of the VEND_NDX1 index to get the VENDOR row IDs, uses those row IDs to retrieve the VENDOR rows, and finally sorts the result set by V_NAME.

21. Assume that you have 10,000 different products stored in the PRODUCT table and that you are writing a Web-based interface to list all products with a quantity on hand (P_QOH) that is less than or equal to the minimum quantity, P_MIN. What optimizer hint would you use to ensure that your query returns the result set to the Web interface in the least time possible? Write the SQL code.

You would write the query using the FIRST_ROWS hint to minimize the time it takes to return the first set of rows to the application. The query would be:

SELECT /*+ FIRST_ROWS */ *
FROM PRODUCT
WHERE P_QOH <= P_MIN;

Problems 22-24 are based on the following query:

SELECT P_CODE, P_DESCRIPT, P_PRICE, P.V_CODE, V_STATE
FROM PRODUCT P, VENDOR V
WHERE P.V_CODE = V.V_CODE
AND V_STATE = 'NY'
AND V_AREACODE = '212'
ORDER BY P_PRICE;

22. What indexes would you recommend?

This query uses the V_STATE and V_AREACODE attributes in its conditional criteria, and both criteria are equality comparisons. Given these conditions, an index on V_STATE and another index on V_AREACODE are highly recommended.

23. Write the commands required to create the indexes you recommended in Problem 22.

CREATE INDEX VEND_NDX1 ON VENDOR(V_STATE);
CREATE INDEX VEND_NDX2 ON VENDOR(V_AREACODE);

Note that we have used the index names VEND_NDX1 and VEND_NDX2, respectively.

24. Write the command(s) used to generate the statistics for the PRODUCT and VENDOR tables.

ANALYZE TABLE PRODUCT COMPUTE STATISTICS;
ANALYZE TABLE VENDOR COMPUTE STATISTICS;

Problems 25 and 26 are based on the following query:

SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = '21344'
ORDER BY P_CODE;

25. What index would you recommend, and what command would you use?

This query uses one WHERE condition and one ORDER BY clause. The conditional expression uses the V_CODE column in an equality comparison, so creating an index on the V_CODE attribute is recommended. If V_CODE is declared to be a foreign key, the DBMS may already have created such an index automatically; if it does not, create one manually. Creating an index on the columns used in an ORDER BY clause is also generally recommended. However, because the P_CODE column used in this ORDER BY clause is the primary key of the PRODUCT table, a unique index already exists for it, and it is not necessary to create another index on that column.

26. How should you rewrite the query to ensure that it uses the index you created in your solution to Problem 25?

In this case, the only index that should be created is the index on the V_CODE column. Assuming that such an index is called PROD_NDX1, you could use an optimizer hint as shown next:

SELECT /*+ INDEX(PROD_NDX1) */ P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
FROM PRODUCT
WHERE V_CODE = '21344'
ORDER BY P_CODE;

Problems 27 and 28 are based on the following query:

SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
FROM PRODUCT
WHERE P_QOH < P_MIN
AND P_MIN = P_REORDER
AND P_REORDER = 50
ORDER BY P_QOH;

27. Use the recommendations given in Section 11-5b to rewrite the query to produce the required results more efficiently.

SELECT P_CODE, P_DESCRIPT, P_QOH, P_PRICE, V_CODE
FROM PRODUCT
WHERE P_REORDER = 50
AND P_MIN = 50
AND P_QOH < 50
ORDER BY P_QOH;

The rewritten query transforms some conditions as follows:
• Because P_REORDER must equal 50, it replaces P_MIN = P_REORDER with P_MIN = 50.
• Because P_MIN must therefore also equal 50, it replaces P_QOH < P_MIN with P_QOH < 50.
Using literals in the query conditions makes the query more efficient. Note that you still need all three conditions in the query.

28. What indexes would you recommend? Write the commands to create those indexes.

Because the query uses P_REORDER, P_MIN, and P_QOH in its conditional criteria, you should have indexes on those columns. The commands to create the indexes are:

CREATE INDEX PROD_NDX1 ON PRODUCT(P_REORDER);
CREATE INDEX PROD_NDX2 ON PRODUCT(P_MIN);
CREATE INDEX PROD_NDX3 ON PRODUCT(P_QOH);

Problems 29-32 are based on the following query:

SELECT CUS_CODE, MAX(LINE_UNITS*LINE_PRICE)
FROM CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
WHERE CUS_AREACODE = '615'
GROUP BY CUS_CODE;

29. Assuming that you generate 15,000 invoices per month, what recommendation would you give the designer about the use of derived attributes?

This query uses the MAX aggregate function to compute the maximum invoice line value by customer. Because the table grows at a rate of 15,000 invoices per month, the query would take a considerable amount of time to run as the number of invoice rows increases. Furthermore, because the MAX aggregate function uses an expression (LINE_UNITS*LINE_PRICE) instead of a simple table column, the query optimizer is very likely to perform a full table scan to compute the maximum invoice line value. One way to speed up the query would be to store the derived attribute LINE_TOTAL in the LINE table and create an index on LINE_TOTAL. The query would then benefit from using that index, as shown in the sketch below.
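As a minimal sketch of that recommendation (Oracle syntax; the column precision and index name are assumptions, since the text does not specify them):

ALTER TABLE LINE ADD (LINE_TOTAL NUMBER(9,2));
UPDATE LINE SET LINE_TOTAL = LINE_UNITS * LINE_PRICE;
CREATE INDEX LINE_TOTAL_NDX ON LINE(LINE_TOTAL);

Keep in mind that a stored derived attribute must be kept current: the application (or a trigger) must update LINE_TOTAL whenever LINE_UNITS or LINE_PRICE changes.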
30. Assuming that you follow the recommendations you gave in Problem 29, how would you rewrite the query?

SELECT CUS_CODE, MAX(LINE_TOTAL)
FROM CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
WHERE CUS_AREACODE = '615'
GROUP BY CUS_CODE;

31. What indexes would you recommend for the query you wrote in Problem 30, and what SQL commands would you use?

The query will benefit from having an index on CUS_AREACODE and an index on CUS_CODE. Because CUS_CODE is a foreign key in INVOICE, it is very likely that an index on it already exists. In any case, the query uses CUS_AREACODE in an equality comparison; therefore, an index on this column is highly recommended. The command to create this index would be:

CREATE INDEX CUS_NDX1 ON CUSTOMER(CUS_AREACODE);

32. How would you rewrite the query to ensure that the index you created in Problem 31 is used?

You need to use the INDEX optimizer hint:

SELECT /*+ INDEX(CUS_NDX1) */ CUS_CODE, MAX(LINE_TOTAL)
FROM CUSTOMER NATURAL JOIN INVOICE NATURAL JOIN LINE
WHERE CUS_AREACODE = '615'
GROUP BY CUS_CODE;