Chapter 14 Big Data Analytics and NoSQL

Discussion Focus

Start by explaining that Big Data is a nebulous term. Its definition, and the composition of the techniques and technologies covered under this umbrella term, are constantly changing and being redefined. There is no standardizing body for Big Data or NoSQL, so no one is in a position to make a definitive statement about exactly what qualifies as Big Data. This is made worse by the fact that most technologies for Big Data problems and the NoSQL movement are open source, so even the developers working in the arena are often a loose community without hierarchy or structure. As a generic definition, Big Data is data of such volume, velocity, and/or variety that it is difficult for traditional relational database technologies to store and process it.

Students need to understand that the definition of Big Data is relative, not absolute. We cannot look at a collection of data and state categorically that it is Big Data now and for all time. We may categorize a set of data, or a data storage and processing requirement, as a Big Data problem today. In three years, or even in one year, relational database technologies may have advanced to the point where that same problem is no longer a Big Data problem. NoSQL has the same problem in terms of its definition. Since Big Data and NoSQL are both defined in terms of a negative statement that says what they are not, instead of a positive statement that says what they are, they both suffer from being ill-defined and overly broad.

Discuss the many Vs of Big Data. The basic Vs (volume, velocity, and variety) are key to Big Data. Again, because of the lack of an authority to define what Big Data is, other Vs are added by writers and thinkers who like to build on the alliteration of the 3 Vs. Beyond the 3 Vs, the other Vs proposed by various sources are often not really unique to Big Data. For example, all data have volume; Big Data problems require volumes too large for relational database technologies to support. Veracity is the trustworthiness of the data, and all data needs to be trustworthy; Big Data problems do not require support for a higher level of trustworthiness than relational database technologies can provide. Therefore, the argument can be made that veracity is a characteristic of all data, not just Big Data. Students should understand that critical thinking about Big Data is necessary when assessing claims and technologies in this fast-changing arena.

Discuss that Hadoop has been the beneficiary of great marketing and widespread buy-in from pundits. Hadoop has become synonymous with Big Data in the minds of many people who are passingly familiar with data management. However, Hadoop is a very specialized technology aimed at very specific tasks associated with storing and processing very large data sets in non-integrative ways. This makes the Hadoop ecosystem very important, because the ecosystem can expand the basic HDFS and MapReduce capabilities to support a wider range of needs and allow greater integration of the data.

Stress to students that the NoSQL landscape is constantly changing. There are about 100 products competing in the NoSQL environment at any point in time, with new entrants emerging almost daily and other products disappearing at about the same rate.
The text follows the standard categories of NoSQL databases that appear in the literature, as shown below, but many products do not fit neatly into only one category:
• Key-value
• Document
• Column family
• Graph
Each category attempts to deal with non-relational data in different ways. Data analysis focuses on attempting to generate knowledge to expand and inform the organization's decision-making processes. These topics were covered to a great extent in Chapter 13 when analyzing data from transactional databases integrated into data warehouses. In this chapter, exploratory and predictive analytics are applied to non-relational databases.

Answers to Review Questions

1. What is Big Data? Give a brief definition.

Big Data is data of such volume, velocity, and/or variety that it is difficult for traditional relational database technologies to store and process it.

2. What are the traditional 3 Vs of Big Data? Briefly define each.

Volume, velocity, and variety are the traditional 3 Vs of Big Data. Volume refers to the quantity of data that must be stored. Velocity refers to the speed with which new data is being generated and entering the system. Variety refers to the variations in the structure, or the lack of structure, in the data being captured.

3. Explain why companies like Google and Amazon were among the first to address the Big Data problem.

In the 1990s, the use of the Internet exploded and commercial websites helped attract millions of new consumers to online transactions. When the dot-com bubble burst at the end of the 1990s, the millions of new consumers remained, but the number of companies providing them services was reduced dramatically. As a result, the surviving companies, like Google and Amazon, experienced exponential growth in a very short time. This led to these companies being among the first to experience the volume, velocity, and variety of data that is associated with Big Data.

4. Explain the difference between scaling up and scaling out.

Scaling up involves improving storage and processing capabilities through the use of improved hardware, software, and techniques without changing the quantity of servers. Scaling out involves improving storage and processing capabilities through the use of more servers.

5. What is stream processing, and why is it sometimes necessary?

Stream processing is the processing of data inputs to make decisions about which data should be stored and which data should be discarded. In some situations, large volumes of data can enter the system at such a rapid pace that it is not feasible to try to store all of the data. The data must be processed and filtered as it enters the system to determine which data to keep and which data to discard.

6. How is stream processing different from feedback loop processing?

Stream processing focuses on inputs, while feedback loop processing focuses on outputs. Stream processing is performed on the data as it enters the system to decide which data should be stored and which should be discarded. Feedback loop processing uses data after it has been stored to conduct analysis for the purpose of making the data actionable by decision makers. A small sketch of the stream-processing idea follows.
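The filter-as-it-arrives idea in Questions 5 and 6 can be shown in a few lines of plain Java. This is a minimal sketch, not from the text; the readings and the threshold are invented for illustration.

    import java.util.List;

    public class StreamFilterSketch {
        public static void main(String[] args) {
            // Simulated input stream: readings arriving too fast to store in full.
            List<Double> arriving = List.of(98.2, 97.9, 104.6, 98.1, 110.3);

            // Stream processing: inspect each reading as it arrives and keep only
            // the values of interest; everything else is discarded, never stored.
            arriving.stream()
                    .filter(reading -> reading > 100.0)
                    .forEach(reading -> System.out.println("store: " + reading));
        }
    }

A feedback loop, by contrast, would run after the kept readings were stored, analyzing them to make the data actionable for a decision maker.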
7. Explain why veracity, value, and visualization can also be said to apply to relational databases as well as Big Data.

Veracity of data is an issue with even the smallest of data stores, which is why data management is so important in relational databases. Value of data also applies to traditional, structured data in a relational database. One of the keys to data modeling is that only the data that is of interest to the users should be included in the data model. Data that is not of value should not be recorded in any data store, Big Data or not. Visualization was discussed and illustrated at length in Chapter 13 as an important tool in working with data warehouses, which are often maintained as structured data stores in relational DBMS products.

8. What is polyglot persistence, and why is it considered a new approach?

Polyglot persistence is the idea that an organization's data storage solutions will consist of a range of data storage technologies. This is a new approach because the relational database has previously dominated the data management landscape to the point that the use of a relational DBMS for data storage was taken for granted in most cases. With Big Data problems, the reliance on only relational databases is no longer valid.

9. What are the key assumptions made by the Hadoop Distributed File System approach?

HDFS is designed around the following assumptions:
• High volume
• Write-once, read-many
• Streaming access
• Fault tolerance
HDFS assumes that massive volumes of data will need to be stored and retrieved. HDFS assumes that data will be written once; that is, there will very rarely be a need to update the data once it has been written to disk. However, the data will need to be retrieved many times. HDFS assumes that when a file is retrieved, the entire contents of the file will need to be streamed in a sequential fashion; HDFS does not work well when only small parts of a file are needed. Finally, HDFS assumes that failures in the servers will be frequent. As the number of servers increases, the probability of a failure increases significantly, so the data must be stored redundantly to avoid loss of data when servers fail.

10. What is the difference between a name node and a data node in HDFS?

The name node stores the metadata that tracks where all of the actual data blocks reside in the system. The name node is responsible for coordinating tasks across multiple data nodes to ensure sufficient redundancy of the data. The name node does not store any of the actual user data. The data nodes store the actual user data. A data node does not store metadata about the contents of any data node other than itself.

11. Explain the basic steps of MapReduce processing.

• A client node submits a job to the Job Tracker.
• The Job Tracker determines where the data to be processed resides.
• The Job Tracker contacts the Task Tracker on the nodes as close as possible to the data.
• Each Task Tracker creates mappers and reducers as needed to complete the processing of each block of data and consolidate that data into a result.
• Task Trackers report results back to the Job Tracker when the mappers and reducers are finished.
• The Job Tracker updates the status of the job to indicate when it is complete.
A single-process sketch of the map and reduce phases appears below.
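The division of labor among mappers and reducers is easiest to see with a word count, the canonical MapReduce example. The sketch below simulates the map, shuffle, and reduce phases in a single plain-Java process; it illustrates the concept only and does not use the actual Hadoop API.

    import java.util.*;
    import java.util.stream.*;

    public class MapReduceSketch {
        // Map phase: emit a (word, 1) pair for every word in a line of input.
        static Stream<Map.Entry<String, Integer>> map(String line) {
            return Arrays.stream(line.toLowerCase().split("\\W+"))
                         .filter(w -> !w.isEmpty())
                         .map(w -> Map.entry(w, 1));
        }

        public static void main(String[] args) {
            // Two "blocks" of input text, as if read from HDFS.
            List<String> blocks = List.of("big data is big", "data is data");

            // Shuffle phase: group the intermediate pairs by key (the word).
            Map<String, List<Integer>> grouped = blocks.stream()
                    .flatMap(MapReduceSketch::map)
                    .collect(Collectors.groupingBy(Map.Entry::getKey,
                            Collectors.mapping(Map.Entry::getValue, Collectors.toList())));

            // Reduce phase: sum the counts emitted for each word.
            grouped.forEach((word, counts) -> System.out.println(
                    word + " = " + counts.stream().mapToInt(Integer::intValue).sum()));
        }
    }

In a real cluster, the map calls would run on the data nodes holding each block, and the reduce work would be distributed across nodes as well; only the coordination differs, not the logic.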
12. Briefly explain how HDFS and MapReduce are complementary to each other.

Both HDFS and MapReduce rely on the concept of massive, relatively independent, distributions. HDFS decomposes data into large, independent chunks that are then distributed across a number of independent servers. MapReduce decomposes processing into independent tasks that are distributed across a number of independent servers. The distribution of data in HDFS is coordinated by a name node server that collects data from each server about the state of the data that it holds. The distribution of processing in MapReduce is coordinated by a job tracker that collects data from each server about the state of the processing it is performing.

13. What are the four basic categories of NoSQL databases?

Key-value databases, document databases, column family databases, and graph databases.

14. How are the value components of a key-value database and a document database different?

In a key-value database, the value component is unintelligible to the database. In other words, the DBMS is unaware of the meaning of any of the data in the value component; it is treated as an indecipherable mass of data, and all processing of the data in the value component must be accomplished by the application logic. In a document database, the value component is partially interpretable by the DBMS. The DBMS can identify and search for specific tags, or subdivisions, within the value component.

15. Briefly explain the difference between row-centric and column-centric data storage.

Row-centric storage treats a row as the smallest data storage unit. All of the column values associated with a particular row of data are stored together in physical storage. This is the optimal storage approach for operations that manipulate and retrieve all columns in a row, but only a small number of rows in a table. Column-centric storage treats a row as a divisible collection of values, with the values of a single column across many rows being physically stored together. This is optimal when operations manipulate and retrieve a small number of columns for all rows in the table.

16. What is the difference between a column and a super column in a column family database?

Columns in a column family database are relatively independent of each other. A super column is a group of columns that are logically related. This relationship can be based on the nature of the data in the columns, such as a group of columns that comprise an address, or it can be based on application processing requirements.

17. Explain why graph databases tend to struggle with scaling out.

Graph databases are designed to address problems with highly related data. The data that appears in a graph database is tightly integrated, and queries that traverse a graph focus on the relationships among the data. Scaling out requires moving data to a number of different servers. As a general rule, scaling out is recommended when the data on each server is relatively independent of the data on other servers. Due to the dependencies among the data on different servers in a graph database, the inter-server communication overhead is very high, which has a significant negative impact on the performance of graph databases in a scaled-out environment.

18. Explain what it means for a database to be aggregate aware.

Aggregate aware means that the designer of the database has to be aware of the way the data in the database will be used, and then design the database around whichever component is central to that usage. Instead of decomposing the data structures to eliminate redundancy, an aggregate aware database collects, or aggregates, all of the data around a central component to minimize the structures required during processing. A small sketch of an aggregate appears below.
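To make the contrast with normalization concrete, the minimal sketch below builds an order aggregate as nested Java collections: the order, its customer, and its lines travel together, so one read retrieves the whole unit of work. All names and values are invented for illustration and do not come from any particular NoSQL product.

    import java.util.List;
    import java.util.Map;

    public class AggregateSketch {
        public static void main(String[] args) {
            // One aggregate, stored and retrieved as a single unit.
            Map<String, Object> order = Map.of(
                "orderNumber", 1001,
                "customer", Map.of("code", 10010, "name", "A. Smith"),
                "lines", List.of(
                    Map.of("product", "Widget", "qty", 2, "price", 9.95),
                    Map.of("product", "Bracket", "qty", 1, "price", 4.50)));
            System.out.println(order);
        }
    }

A normalized relational design would decompose the same data into ORDER, CUSTOMER, and LINE tables and reassemble them with joins at query time.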
Chapter 15 Database Connectivity and Web Technologies

Discussion Focus

Begin by making sure that the students are familiar with the basic vocabulary. The following two questions are a good place to start. (You may want to examine the contents of Appendix F, "Client/Server Systems.") There is some irony in the Web development arena: the microcomputer was supposed to liberate the end user from the mainframe computer's "fat server, thin client" environment. However, the Web has, in effect, brought us back to the old mainframe structure in which most processing is done on the server side, while the client is the source and recipient of data/information requests and returns.

1. Describe the following: TCP/IP, Router, HTML, HTTP, and URL.

Transmission Control Protocol/Internet Protocol (TCP/IP) is the basic network protocol that determines the rules used to create and route "packets" of data between computers in the same or different networks. Each computer connected to the Internet has a unique TCP/IP address. The TCP/IP address is divided into two parts that identify the network and the computer (or host).

A router is a special piece of hardware/software equipment that connects multiple and diverse networks. The router is in charge of delivering packets of data from a local network to a remote network. Routers are the traffic cops of the Internet, monitoring all traffic and moving data from one network to another.

HTML (Hyper Text Markup Language) is the standard document-formatting language for Web pages. HTML allows documents to be presented in a Web browser in a standard manner.

URL stands for Uniform Resource Locator. A URL identifies the address of a resource on the Internet. The URL is an abbreviation (ideally easily remembered) that uniquely identifies an Internet resource, for example www.amazon.com, www.faa.gov, and www.mtsu.edu.

HTTP stands for Hyper Text Transfer Protocol. HTTP is the standard protocol used by the Web browser and Web server to communicate, that is, to send requests and replies between servers and browsers. HTTP uses TCP/IP to transmit the data between computers.

2. Describe the client/server model for application processing.

Client/server is a term used to describe a computing model for the development of computerized systems. This model is based on the distribution of functions between two types of independent and autonomous processes: servers and clients. A client is any process that requests specific services from server processes. A server is a process that provides requested services for clients. Client and server processes can reside in the same computer or in different computers connected by a network. The client/server model makes possible the division of application processing tasks into three main components: presentation logic, processing logic, and data storage.
• The presentation logic formats and presents data on output devices, such as the screen, and manages end-user input. The application uses presentation logic to manage the graphical user interface at the client end.
• The processing logic component refers to the application code that performs data validation, error checking, and business logic processing. The processing logic component represents the business rules. For example, the processing logic "knows" that a sales transaction generates an invoice record, an inventory update, and a customer's account receivable update. The processing logic performs several functions, including enforcement of business rules, managing information flows within the business, and mapping real-world business transactions to the actual computer database.
• The data storage component deals with the actual data storage and retrieval from permanent storage devices. For example, the data manipulation logic is used to access the data in a database and to enforce data integrity.

Although there is no methodology to dictate the precise distribution of the logic components among clients and servers, the client/server architectural principles of process distribution (autonomy, resource maximization, scalability, and interoperability) and hardware and software independence facilitate the creation of distributed applications running across multiple servers. Those applications provide services that communicate with each other in order to carry out specific functions, hence the term multi-tier applications. So, where should the services be placed? With the probable exception of the presentation logic, which should go on the client side, each of the remaining service components may be placed on the server side, thus becoming a service for many clients.

Answers to Review Questions

1. Give some examples of database connectivity options and what they are used for.

Database connectivity refers to the mechanisms through which application programs connect and communicate with data repositories. Database connectivity software is also known as database middleware because it is software that interfaces between the application program and the database. The data repository is also known as the data source because it represents the data management application (e.g., an Oracle RDBMS, SQL Server DBMS, or IBM DBMS) that will be used to store the data generated by the application program. Ideally, a data source or data repository could be located anywhere and hold any type of data. For example, the data source could be a relational database, a hierarchical database, a spreadsheet, a text data file, and so on. Although there are many different ways to achieve database connectivity, this section covers only the following interfaces: native SQL connectivity (vendor provided), Microsoft's Open Database Connectivity (ODBC), Data Access Objects (DAO) and Remote Data Objects (RDO), Microsoft's Object Linking and Embedding for Databases (OLE-DB), Microsoft's ActiveX Data Objects (ADO.NET), and Java Database Connectivity (JDBC).

2. What are ODBC, DAO, and RDO? How are they related?

Open Database Connectivity (ODBC) is Microsoft's implementation of a superset of the SQL Access Group Call Level Interface (CLI) standard for database access. ODBC allows any Windows application to access relational data sources using SQL via a standard application programming interface (API). ODBC was the first widely adopted database middleware standard and enjoyed rapid adoption in Windows applications. As programming languages evolved, ODBC did not provide significant functionality beyond the ability to execute SQL to manipulate relational-style data, so programmers needed a better way to access data. To answer this need, Microsoft developed two other data access interfaces:
• Data Access Objects (DAO) is an object-oriented API used to access MS Access, MS FoxPro, and dBase databases (using the Jet data engine) from Visual Basic programs. DAO provided an optimized interface that exposed the functionality of the Jet data engine (on which the MS Access database is based) to programmers. The DAO interface can also be used to access other relational-style data sources.
• Remote Data Objects (RDO) is a higher-level object-oriented application interface used to access remote database servers. RDO uses the lower-level DAO and ODBC for direct access to databases. RDO was optimized to deal with server-based databases, such as MS SQL Server, Oracle, DB2, and so on.

3. What is the difference between DAO and RDO?

DAO uses the MS Jet engine to access file-based relational databases such as MS Access, MS FoxPro, and dBase. In contrast, RDO provides access to relational database servers such as SQL Server, DB2, and Oracle. RDO uses DAO and ODBC to access remote database server data.

4. What are the three basic components of the ODBC architecture?

The basic ODBC architecture is composed of three main components:
• A high-level ODBC API through which application programs access ODBC functionality.
• A Driver Manager component that is in charge of managing all database connections.
• An ODBC Driver component that talks directly to the DBMS (data source).

5. What steps are required to create an ODBC data source name?

To define a data source you must create a data source name (DSN) for the data source. To create a DSN you have to provide:
• An ODBC driver. You must identify the driver to use to connect to the data source. The ODBC driver is normally provided by the database vendor, although Microsoft provides several drivers that connect to the most common databases. For example, if you are using an Oracle DBMS, you will select the Oracle ODBC driver provided by Oracle or, if desired, the Microsoft-provided ODBC Driver for Oracle.
• A DSN name. This is a unique name by which the data source will be known to ODBC and, therefore, to the applications. ODBC offers two types of data sources: User and System. User data sources are available only to the user. System data sources are available to all users, including operating system services.
• ODBC driver parameters. Most ODBC drivers require some specific parameters in order to establish a connection to the database. For example, if you are using an MS Access database, you must point to the location of the MS Access (.mdb) file and, if necessary, provide the user name and password. If you are using a DBMS server, you must provide the server name, the database name, and the user name and password used to connect to the database.
Figure 15.3 shows the ODBC screens required to create a System ODBC data source for an Oracle DBMS. Note that some ODBC drivers use the native driver provided by the DBMS vendor.

6. What is OLE-DB used for, and how does it differ from ODBC?

Although ODBC, DAO, and RDO were widely used, they did not provide support for non-relational data. To answer the need for non-relational data access and to simplify data connectivity, Microsoft developed Object Linking and Embedding for Database (OLE-DB). Based on Microsoft's Component Object Model (COM), OLE-DB is database middleware that was developed to add object-oriented functionality for access to relational and non-relational data. OLE-DB was the first piece of Microsoft's strategy to provide a unified object-oriented framework for the development of next-generation applications.

7. Explain the OLE-DB model based on its two types of objects.

OLE-DB is composed of a series of COM objects that provide low-level database connectivity for applications. Because OLE-DB is based on the COM object model, the objects contain data and methods (also known as the interface).
The OLE-DB model is better understood when you divide its functionality into two types of objects:
• Consumers are all those objects (applications or processes) that request and use data. Data consumers request data by invoking the methods exposed by the data provider objects (the public interface) and passing the required parameters.
• Providers are the objects that manage the connection with a data source and provide data to the consumers. Providers are divided into two categories: data providers and service providers. Data providers provide data to other processes; database vendors create data provider objects that expose the functionality of the underlying data source (relational, object-oriented, text, and so on). Service providers provide additional functionality to consumers. The service provider is located between the data provider and the consumer: the service provider requests data from the data provider, transforms the data, and provides the transformed data to the data consumer. In other words, the service provider acts as a data consumer of the data provider and as a data provider for the data consumer (the end-user application). For example, a service provider could offer cursor management services, transaction management services, query processing services, indexing services, and so on.

8. How does ADO complement OLE-DB?

OLE-DB provided additional capabilities for the applications accessing the data. However, it did not provide support for scripting languages, especially the ones used for Web development, such as Active Server Pages (ASP) and ActiveX. To provide such support, Microsoft developed a new object framework called ActiveX Data Objects (ADO). ADO provides a high-level, application-oriented interface to interact with OLE-DB, DAO, and RDO, and a unified interface to access data from any programming language that uses the underlying OLE-DB objects. Figure 15.5 in the text (OLE-DB Architecture) illustrates the ADO/OLE-DB architecture and how it interacts with ODBC and native connectivity options.

9. What is ADO.NET, and what two new features make it important for application development?

ADO.NET is the data access component of Microsoft's .NET application development framework. Microsoft's .NET framework is a component-based platform for the development of distributed, heterogeneous, interoperable applications aimed at manipulating any type of data, over any network, and under any operating system and programming language. The .NET framework is beyond the scope of this book; therefore, this section introduces only the basic data access component of the .NET architecture, ADO.NET. ADO.NET introduced two new features critical for the development of distributed applications: DataSets and XML support.
• A DataSet is a disconnected, memory-resident representation of the database.
• ADO.NET stores all its internal data in XML format.

10. What is a DataSet, and why is it considered to be disconnected?

A DataSet is a disconnected, memory-resident representation of the database. That is, the DataSet contains tables, columns, rows, relationships, and constraints. Once the data are read from a data provider, the data are placed in a memory-resident DataSet, and the DataSet is then disconnected from the data provider. The data consumer application interacts with the data in the DataSet object to make changes (inserts, updates, and deletes) to the DataSet.
Once the processing is done, the DataSet data are synchronized with the data source and the changes are made permanent. A DataSet is, in fact, a simple database with tables, rows, and constraints. Even more important, the DataSet does not require a permanent connection to the data source. The DataAdapter uses the SelectCommand to populate the DataSet from a data source. However, once the DataSet is populated, it is completely independent of the data source, which is why it is called "disconnected."

11. What are Web server interfaces used for? Give some examples.

Web server interfaces are used to extend the functionality of the Web server to provide more services. If a Web server is to successfully communicate with other external programs to provide a service, both programs must use a standard way to exchange messages and respond to requests. A Web server interface defines how a Web server communicates with external programs. Currently there are two well-defined Web server interfaces:
• Common Gateway Interface (CGI)
• Application Programming Interface (API)
Web server interfaces can be used to extend the services of a Web server and provide support for access to external databases, fax services, telephony services, directory services, etc.

12. Search the Internet for Web application servers. Choose one and prepare a short presentation for your class.

You are encouraged to use any Web search engine to list multiple vendors. Examples of such vendors are Oracle Application Server, IBM WebSphere, Sun Java, Microsoft, JBoss, etc. We encourage the student to visit the Web pages of the products and compare the features of at least two products. Some of the many other Web application servers, as of this writing, include Oracle Application Server by Oracle Corp., WebLogic by BEA Systems, NetDynamics by Sun Microsystems, NetObjects' Fusion, Microsoft's Visual Studio .NET, and WebObjects by Apple.

13. What does this statement mean: The Web is a stateless system? What implications does a stateless system have for database applications developers?

Simply put, the label stateless system indicates that, at any given time, a Web server does not know the status of any of the clients communicating with it. That is, there is no open communications line between the server and each client accessing it; that, of course, would be impractical on a worldwide Web. Instead, client and server computers interact in very short "conversations" that follow the request-reply model. For example, the browser is concerned only with the current page, so there is no way for the second page to know what was done in the first page. The only time the client and server computers communicate is when the client requests a page (when the user clicks a link) and the server sends the requested page to the client. Once the client receives the page and its components, the client/server communication is ended. Therefore, although you may be browsing a page and think that the communication is open, you are actually just browsing the HTML document stored in the local cache (temporary directory) of the client browser. The server does not have any idea what the end user is doing with the document, what data is entered in a form, what option is selected, and so on. On the Web, if we want to act on a client's selection, we need to jump to a new page (go back to the Web server), therefore losing track of whatever was done before!
Not knowing what was done before, or what a client selected before it got to this page, makes adding business logic to the Web cumbersome. For example, suppose that you need to write a program that performs the following steps: display a data entry screen, capture data, validate data, and save data. This entire sequence can be completed in a single COBOL program because COBOL uses a working storage section that holds in memory all variables used in the program. Now imagine the same COBOL program, but with each section (PERFORM statement) as a separate program. That is precisely how the Web works. In short, the Web's stateless nature means that the extensive processing required by a program's execution cannot be done directly in a single Web page; the client browser lacks both the processing ability and a working storage area to hold the variables used by all pages in a Web site. The browser does not have computational abilities beyond formatting output text and accepting form field inputs. Even when the browser accepts form field data, there is no way to perform immediate data entry validation. Therefore, to perform such crucial processing on the client, the Web defers to other Web programming languages such as Java, JavaScript, and VBScript.

14. What is a Web application server, and how does it work from a database perspective?

A Web application server extends the functionality of a Web server and provides features such as:
• An integrated development environment with session management and support for persistent application variables.
• Security and authentication of users through user IDs and passwords.
• Computational languages to represent and store business logic in the application server.
• Automatic generation of HTML pages integrated with Java, JavaScript, VBScript, ASP, and so on.
• Performance and fault-tolerance features.
• Database access with transaction management capabilities.
• Access to multiple services, such as file transfers (FTP), database connectivity, electronic mail, and directory services.
The Web application server interfaces with the database connectivity standards to access databases using any of the supported APIs. So, a Web page will be processed by the Web application server, and the application server will connect to the database using the ADO, OLE-DB, or ODBC standard (or any other standard supported by the application server).

15. What are scripts, and what is their function? (Think in terms of database applications development!)

A script is a series of instructions executed in interpreter mode. The script is a plain text file that is not compiled like COBOL, C++, or Java. Scripts are normally used in Web application development environments. For instance, ColdFusion scripts contain the code that is required to connect, query, and update a database from a Web front end.

16. What is XML, and why is it important?

Extensible Markup Language (XML) is a metalanguage used to represent and manipulate data elements. XML is designed to facilitate the exchange of structured documents, such as orders and invoices, over the Internet. The World Wide Web Consortium (W3C) published the first XML 1.0 standard definition in 1998. This standard set the stage for giving XML the real-world appeal of being a true vendor-independent platform. Therefore, it is not surprising that XML is rapidly becoming the data exchange standard for e-commerce applications.
XML is important because it provides the semantics that facilitate the sharing, exchange, and manipulation of structured documents across organizational boundaries.

17. What are document type definition (DTD) documents and what do they do?

Companies that exchange data using XML must have a way to understand and validate each other's tags. One way to accomplish that task is through the use of Document Type Definitions. A Document Type Definition (DTD) is a file with a .dtd extension that describes XML elements; in effect, a DTD file provides the composition of the database's logical model and defines the syntax rules or valid tags for each type of XML document. (The DTD component is very similar to having a public data dictionary for business data.)

18. What are XML schema definition (XSD) documents and what do they do?

An XML Schema Definition (XSD) document is an advanced data definition language that is used to describe the structure (elements, data types, relationship types, ranges, and default values) of XML data documents. Unlike a DTD document, which uses a unique syntax, an XML Schema Definition (XSD) file uses a syntax that resembles an XML document. One of the main advantages of an XML schema is that it maps more closely to database terminology and features. For example, an XML schema can define common database types such as date, integer, or decimal; minimum and maximum values; lists of valid values; and required elements. Using an XML schema, a company would be able to validate the data for values that may be out of range, incorrect dates, valid values, and so on. For example, a university application must be able to specify that a GPA value must be between zero and 4.0, and it must be able to detect an invalid birth date such as "14/13/1987." (There is no 14th month.) Many vendors are rapidly adopting this new standard and are supplying tools to translate DTD documents into XML Schema Definition (XSD) documents. It is widely expected that XML schemas will replace DTDs as the method used to describe XML data. A side-by-side sketch of the two notations follows.
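To make the syntactic difference visible, here is a minimal, hypothetical definition of a customer element in both notations; the element and type names are invented for illustration. First, the DTD version, which uses its own non-XML syntax and treats every element as text:

    <!ELEMENT customer (cus_code, cus_name)>
    <!ELEMENT cus_code (#PCDATA)>
    <!ELEMENT cus_name (#PCDATA)>

The equivalent XSD is itself an XML document and can attach real data types to the elements:

    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
      <xs:element name="customer">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="cus_code" type="xs:integer"/>
            <xs:element name="cus_name" type="xs:string"/>
          </xs:sequence>
        </xs:complexType>
      </xs:element>
    </xs:schema>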
19. What is JDBC, and what is it used for?

JDBC (Java Database Connectivity) is discussed in detail in Section 15-1e. Java is an object-oriented programming language developed by Sun Microsystems whose applets can run inside Web browser software. Java is one of the most common programming languages for Web development. Sun Microsystems created Java as a "write once, run anywhere" environment. That means that a programmer can write a Java application once and then, without any modification, run the application in multiple environments (Microsoft Windows, Apple OS X, IBM AIX, etc.). The cross-platform capabilities of Java are based on its portable architecture. Java code is normally stored in pre-processed chunks known as applets that run in a virtual machine environment in the host operating system. This environment has well-defined boundaries, and all interactivity with the host operating system is closely monitored. Sun provides Java runtime environments for most operating systems (from computers to hand-held devices to TV set-top boxes). Another advantage of Java is its "on-demand" architecture: when a Java application loads, it can dynamically download all of its modules or required components via the Internet. When Java applications want to access data outside the Java runtime environment, they use predefined application programming interfaces.

Java Database Connectivity (JDBC) is an application programming interface that allows a Java program to interact with a wide range of data sources (relational databases, tabular data sources, spreadsheets, and text files). JDBC allows a Java program to establish a connection with a data source, prepare and send SQL code to the database server, and process the result set. One of the main advantages of JDBC is that it allows a company to leverage its existing investment in technology and personnel training. JDBC allows programmers to use their SQL skills to manipulate the data in the company's databases. As a matter of fact, JDBC allows direct access to a database server or access via database middleware. Furthermore, JDBC provides a way to connect to databases through an ODBC driver. (Figure 15.7 in the text illustrates the basic JDBC architecture and the various database access styles.)

The database access architecture in JDBC is very similar to the ODBC/OLE-DB/ADO.NET architecture; all database access middleware shares similar components and functionality. One advantage of JDBC over other middleware is that it requires no configuration on the client side. The JDBC driver is automatically downloaded and installed as part of the Java applet download. Because Java is a Web-based technology, applications can connect to a database directly using a simple URL. Once the URL is invoked, the Java architecture comes into place, the necessary applets are downloaded to the client (including the JDBC database driver and all configuration information), and the applets are executed securely in the client's runtime environment. Every day, more companies are investing resources in developing and expanding their Web presence and finding ways to do more business on the Internet. Such business will generate increasing amounts of data that will be stored in databases. Java and the .NET framework are part of the trend toward increasing reliance on the Internet as a critical business resource. In fact, it has been said that the Internet will become the development platform of the future. In the next section you will learn more about Internet databases and how they are used. A minimal JDBC example follows.
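The connect, query, and process pattern just described looks like this in practice. The sketch uses only the standard java.sql API; the connection URL, credentials, and AGENT table are illustrative assumptions rather than a worked example from the text.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class JdbcSketch {
        public static void main(String[] args) throws Exception {
            // JDBC URLs are driver-specific; this one assumes a hypothetical
            // local Oracle instance named orcl.
            String url = "jdbc:oracle:thin:@localhost:1521:orcl";
            try (Connection conn = DriverManager.getConnection(url, "user", "password");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT agent_code, agent_lname FROM agent")) {
                // Process the result set row by row.
                while (rs.next()) {
                    System.out.println(rs.getInt("agent_code") + " " + rs.getString("agent_lname"));
                }
            }
        }
    }

The same program would run against any DBMS for which a JDBC driver is available; only the URL and the driver change.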
20. What is cloud computing, and why is it a "game changer"?

According to the National Institute of Standards and Technology (NIST), cloud computing is "a computing model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computer resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction." The term cloud services is used in this book to refer to the services provided by cloud computing. Cloud services allow any organization to quickly and economically add information technology services such as applications, storage, servers, processing power, databases, and infrastructure to its IT portfolio (see Figure 15.21). Cloud computing is important for database technologies because it has the potential to become a "game changer." Cloud computing eliminates financial and technological barriers so organizations can leverage database technologies in their business processes with minimal effort and cost. In fact, cloud services have the potential to turn basic IT services into "commodity" services such as electricity, gas, and water, and to enable a revolution that could change not only the way that companies do business, but the IT business itself. As Nicholas Carr put it so vividly: "Cloud computing is for IT what the invention of the power grid was for electricity."

For example, imagine that the chief technology officer of a nonprofit organization wants to add e-mail services to the IT portfolio. A few years ago, this proposition would have implied building the e-mail system's infrastructure from the ground up, including hardware, software, setup, configuration, operation, and maintenance. However, in today's cloud computing era, you can use Google Apps for Business or Microsoft Exchange Online and get a scalable, flexible, and more reliable e-mail solution for a fraction of the cost. The best part is that you do not have to worry about the daily chores of managing and maintaining the IT infrastructure, such as OS updates, patches, security, fault tolerance, and recovery. What used to take months or years to implement can now be done in a matter of minutes.

21. Name and contrast the types of cloud computing implementations.

Section 15-4a describes this item in detail. There are basically three cloud computing implementation types (based on who the target customers are):
• Public cloud. This type of cloud infrastructure is built by a third-party organization to sell cloud services to the general public. The public cloud is the most common type of cloud implementation; examples include Amazon Web Services (AWS), Google App Engine, and Microsoft Azure. In this model, cloud consumers share resources with other consumers transparently. The public cloud infrastructure is managed exclusively by the third-party provider.
• Private cloud. This type of internal cloud is built by an organization for the sole purpose of servicing its own needs. Private clouds are often used by large, geographically dispersed organizations to add agility and flexibility to internal IT services. The cloud infrastructure could be managed by internal IT staff or an external third party.
• Community cloud. This type of cloud is built by and for a specific group of organizations that share a common trade, such as agencies of the federal government, the military, or higher education. The cloud infrastructure could be managed by internal IT staff or an external third party.

22. Name and describe the most prevalent characteristics of cloud computing services.

Section 15-4b describes the basic characteristics of cloud computing services. In summary, the characteristics are:
• Ubiquitous access via the Internet.
• Shared infrastructure.
• Lower costs and variable pricing.
• Flexible and scalable services.
• Dynamic provisioning.
• Service orientation.
• Managed operations.

23. Using the Internet, search for providers of cloud services. Then, classify the types of services they provide (SaaS, PaaS, and IaaS).

A starting point will be the examples shown in Figure 15.22. Further examples are:
• Dropbox.com, a cloud storage service provider (IaaS)
• Carbonite.com, which provides online backup of data (SaaS)
• iCloud.com (Apple), which provides storage and synchronization of Apple device data such as contacts, music, apps, photos, documents, and backups (IaaS and SaaS)
• GoodData.com, a business intelligence platform (PaaS)
• Heroku.com, a Ruby Web programming environment service (PaaS)

24. Summarize the main advantages and disadvantages of cloud computing services.

Table 15.4 summarizes the main advantages and disadvantages of cloud computing services.

25. Define SQL data services and list their advantages.
SQL data services refer to Internet-based data management services that provide access to hosted relational data management using standard protocols and common programming interfaces. The advantages of SQL data services include:
• High reliability and scalability of relational database capabilities at a low cost
• A high level of failure tolerance
• Dynamic and automatic load balancing
• Automated data backup and recovery
• Dynamic creation and allocation of database processes and storage

Problem Solutions

ONLINE CONTENT The databases used in the Problems for this chapter can be found at www.cengagebrain.com.

PROBLEMS

In the following exercises, you set up database connectivity using MS Excel.

NOTE: Although the precise steps to set up data connectivity vary slightly according to the version of Excel and operating system platform you are using, in general terms the steps are outlined as indicated in the sections below.

1. Use MS Excel to connect to the Ch02_InsureCo MS Access database using ODBC, and retrieve all of the AGENTs.

To perform this task, complete the following steps:
• From Excel, select the Data, From Other Sources, From Microsoft Query options to retrieve data from an ODBC data source.
• Select the MS Access Database* option and click OK.
• Select the database file location and click OK.
• Select the table and columns to use in the query (select all columns) and click Next.
• On the Query Wizard – Filter Data screen, click Next.
• On the Query Wizard – Sort Order screen, click Next.
• Select Return Data to Microsoft Office Excel.
• Position the cursor where you want the data to be placed on your spreadsheet and click OK.
The solution is shown in Figure P15.1.
Figure P15.1 Solution to Problem 1 – Retrieve all AGENTs

2. Use MS Excel to connect to the Ch02_InsureCo MS Access database using ODBC, and retrieve all of the CUSTOMERs.

To perform this task, complete the following steps:
• From Excel, select the Data, From Other Sources, From Microsoft Query options to retrieve data from an ODBC data source.
• Select the MS Access Database* option and click OK.
• Select the database file location and click OK.
• Select the table and columns to use in the query (select all columns) and click Next.
• On the Query Wizard – Filter Data screen, click Next.
• On the Query Wizard – Sort Order screen, click Next.
• Select Return Data to Microsoft Office Excel.
• Position the cursor where you want the data to be placed on your spreadsheet and click OK.
The solution is shown in Figure P15.2.
Figure P15.2 Solution to Problem 2 – Retrieve all CUSTOMERs

3. Use MS Excel to connect to the Ch02_InsureCo MS Access database using ODBC, and retrieve the customers whose AGENT_CODE is equal to 503.

To perform this task, complete the following steps:
• From Excel, select the Data, From Other Sources, From Microsoft Query options to retrieve data from an ODBC data source.
• Select the MS Access Database* option and click OK.
• Select the database file location and click OK.
• Select the table and columns to use in the query (select all columns) and click Next.
• On the Query Wizard – Filter Data screen, select the AGENT_CODE column, select "equals" from the left drop-down box, select "503" from the right drop-down box, and then click Next.
• On the Query Wizard – Sort Order screen, click Next.
• Select Return Data to Microsoft Office Excel.
• Position the cursor where you want the data to be placed on your spreadsheet and click OK.
The results are shown in Figure P15.3.
Figure P15.3 Solution to Problem 3 – Retrieve all CUSTOMERs with AGENT_CODE=503
4. Create a System DSN ODBC connection called Ch02_SaleCo using the Administrative Tools section of the Windows Control Panel.

To create the DSN, complete the following steps:
• Using Windows XP, open the Control Panel, open Administrative Tools, and open Data Sources (ODBC).
• Click the System DSN tab, click Add, select the Microsoft Access Driver (*.mdb) driver, and click Finish.
• On the ODBC Microsoft Access Setup window, enter Ch02_SaleCo in the Data Source Name field.
• Under Database, click the Select button, browse to the location of the MS Access file, and click OK twice.
• The new System DSN now appears in the list of system data sources.
The results are shown in Figure P15.4.
Figure P15.4 Solution to Problem 4 – Create Ch02_SaleCo System DSN

5. Use MS Excel to list all of the invoice lines for Invoice 1003 using the Ch02_SaleCo System DSN.

To perform this task, complete the following steps:
• From Excel, select the Data, Import External Data, and New Database Query options to retrieve data from an ODBC data source.
• Select the Ch02_SaleCo data source and click OK.
• Select the LINE table, select all columns, and click Next.
• On the Query Wizard – Filter Data screen, select the INV_NUMBER column, select "equals" from the left drop-down box, select "1003" from the right drop-down box, and then click Next.
• On the Query Wizard – Sort Order screen, click Next.
• Select Return Data to Microsoft Office Excel.
• Position the cursor where you want the data to be placed on your spreadsheet and click OK.
The results are shown in Figure P15.5.
Figure P15.5 Solution to Problem 5 – Retrieve all invoice LINEs with INV_NUMBER=1003

6. Create a System DSN ODBC connection called Ch02_TinyCollege using the Administrative Tools section of the Windows Control Panel.

To perform this task, complete the following steps:
• Using Windows XP, open the Control Panel, open Administrative Tools, and open Data Sources (ODBC).
• Click the System DSN tab, click Add, select the Microsoft Access Driver (*.mdb) driver, and click Finish.
• On the ODBC Microsoft Access Setup window, enter Ch02_TinyCollege in the Data Source Name field.
• Under Database, click the Select button, browse to the location of the MS Access file, and click OK twice.
• The new System DSN now appears in the list of system data sources.

7. Use MS Excel to list all classes taught in room KLR200 using the Ch02_TinyCollege System DSN.

To perform this task, complete the following steps:
• From Excel, select the Data, Import External Data, and New Database Query options to retrieve data from an ODBC data source.
• Select the Ch02_TinyCollege data source and click OK.
• Select the CLASS table, select all columns, and click Next.
• On the Query Wizard – Filter Data screen, select the CLASS_ROOM column, select "equals" from the left drop-down box, select "KLR200" from the right drop-down box, and then click Next.
• On the Query Wizard – Sort Order screen, click Next.
• Select Return Data to Microsoft Office Excel.
• Position the cursor where you want the data to be placed on your spreadsheet and click OK.
The results of these actions are shown in Figure P15.7.
Figure P15.7 Solution to Problem 7 – Retrieve all classes taught in room KLR200

8. Create a sample XML document and DTD for the exchange of customer data.

The solutions are shown in Figures P15.8a and P15.8b.
Figure P15.8a Customer DTD Solution
Figure P15.8b Customer XML Solution
A minimal sample along the same lines appears below.
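Because the solution figures are not reproduced in this preview, here is one minimal, hypothetical answer; the element names and sample values are invented, and many other valid designs exist. The DTD (CustomerList.dtd):

    <!ELEMENT customerlist (customer+)>
    <!ELEMENT customer (cus_code, cus_name, cus_phone)>
    <!ELEMENT cus_code (#PCDATA)>
    <!ELEMENT cus_name (#PCDATA)>
    <!ELEMENT cus_phone (#PCDATA)>

A matching XML document:

    <?xml version="1.0"?>
    <!DOCTYPE customerlist SYSTEM "CustomerList.dtd">
    <customerlist>
      <customer>
        <cus_code>10010</cus_code>
        <cus_name>A. Smith</cus_name>
        <cus_phone>615-555-0100</cus_phone>
      </customer>
    </customerlist>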
9. Create a sample XML document and DTD for the exchange of product and pricing data.

The solutions are shown in Figures P15.9a and P15.9b.
Figure P15.9a Product DTD Solution
Figure P15.9b Product XML Solution

10. Create a sample XML document and DTD for the exchange of order data.

The solutions are shown in Figures P15.10a and P15.10b.
Figure P15.10a Order DTD Solution
Figure P15.10b Order XML Solution

11. Create a sample XML document and DTD for the exchange of student transcript data. Use your college transcript as a sample.

The solution to Problem 11 will follow the same format as the previous solutions. However, because Problem 11 requires the students to do some research regarding the information that goes into the transcript data, we have not included a specific solution here. Encourage the students to use their creativity and analytical skills to research and create a simple XML file containing the data that is customary at your university. Not all fields in the student transcript must be included in this exercise; allow the students to represent just the most important fields.

Chapter 16 Database Administration and Security

Discussion Focus

The following discussion sequence is designed to fit the chapter objectives:
• Illustrate the value of data as a corporate asset to be managed.
• Explain the data-information-decision cycle and demonstrate how this cycle may be supported through the use of a DBMS.
• Emphasize the role of databases within an organization and relate this role to the data-information-decision cycle; then show how this role is essential at all managerial levels.
• Discuss the evolution of the data administration (DA) function, starting with the DP department and ending with the MIS department. During this discussion, emphasize the change in managerial emphasis from an operational orientation to a more tactical and strategic orientation. Illustrate how a DBMS can foster a company's success; examples from companies involved in banking, air services, and financial services are particularly illustrative.
• Show the different ways of positioning the DBA function within an organization; emphasize how such positioning is a function of the company's internal organization.
• Contrast the DBA and DA functions.
• Discuss the DBA's technical and managerial roles.
• Explain the importance of data security and database security.
• Show how data dictionaries and CASE tools fit into data administration.

Answers to Review Questions

Note: To ensure in-depth chapter coverage, most of the following questions cover the same material that we covered in detail in the text. Therefore, in most cases, we merely cite the specific section rather than duplicate the text presentation.

1. Explain the difference between data and information. Give some examples of raw data and information.

Given the importance of the distinction between data and information, we addressed the topic in several chapters. This question was first addressed in Chapter 1, "Database Concepts," Section 1-1, "Data vs. Information." Emphasize that one of the key purposes of having an information system is to facilitate the transformation of data into information. In turn, information becomes the basis for decision making.
(See Figure 16.1, "The data-information-decision making cycle.") We revisit the data/information transformation in Chapter 13, "Business Intelligence and Data Warehouses," Section 13-1, "The Need for Data Analysis." Section 13-2, "Business Intelligence," addresses the use of a comprehensive, cohesive, and integrated framework that is designed to assist managerial decision making within an organization and that, therefore, includes an extensive data-to-information transformation component. Figures 13.1 (Business Intelligence Framework) and 13.2 (Business Intelligence Components) illustrate BI's main components, so use these figures as the focus for discussion. Finally, review the transformation of operational data into decision support data, using Figure 13.3, "Transforming Operational Data into Decision Support Data," as the basis for discussion.

Data are raw facts of interest to an end user. Examples of data include a person's date of birth, an employee name, the number of pencils in stock, etc. Data represent a static aspect of a real-world object, event, or thing. Information is processed data. That is, information is the product of applying some analytical process to data. Typically, we represent the information generation process as shown in Figure Q16.1.
Figure Q16.1 Transformation of Data Into Information

For example, invoice data may include the invoice number, customer, items purchased, invoice total, etc. The end user can generate information by tabulating such data and computing totals by customer, cash purchase summaries, credit purchase summaries, a list of most frequently purchased items, etc. Since the data-information transformation is crucial, it is important to keep emphasizing that data stored in the database constitute the raw material for the creation of information. For example, data in a CUSTOMER table might be transformed to provide customer information about age distribution and gender, as shown in Figure Q16.2:
Figure Q16.2 Customer Information Summary

Similarly, data in a CAR table might be transformed to provide information that relates displacement to horsepower, as shown in Figure Q16.3:
Figure Q16.3 Engine Horsepower vs. Displacement

Data transformations into information can be accomplished in many ways, using simple tabulation, graphics, statistical modeling, etc.

2. Define dirty data and identify some of its sources.

Dirty data is data that contains inaccuracies or inconsistencies (i.e., data that lacks integrity). Dirty data may result from a lack of enforcement of integrity constraints, typographical errors, the use of synonyms and homonyms across systems, the use of nonstandard abbreviations, or differences in the decomposition of composite attributes.

3. What is data quality, and why is it important?

Data quality is a comprehensive approach to ensuring the accuracy, validity, and timeliness of the data. Data quality is important because without quality data, accurate and timely information cannot be produced. Without accurate and timely information, it is difficult (if not impossible) to make good decisions; and without good decisions, organizations will fail in their competitive environments.

4. Explain the interactions among end user, data, information, and decision-making. Draw a diagram and explain the interactions.

See Section 16-1. The interactions are illustrated in Figure 16.1. Emphasize the end user's role throughout the process.
4. Explain the interactions among end user, data, information, and decision-making. Draw a diagram and explain the interactions.

See Section 16-1. The interactions are illustrated in Figure 16.1. Emphasize the end user's role throughout the process. It is the end user who must analyze data to produce the information that is later used in decision making. Most business decisions create additional data that will be used to monitor and evaluate the company situation. Thus, data will be, or should be, recycled in order to produce feedback concerning an action's effectiveness and efficiency.

5. Suppose that you are a DBA. What data dimensions would you describe to top-level managers to obtain their support for endorsing the data administration function?

The first step is to emphasize the importance of data as a company asset, to be managed as any other asset. Top-level managers must understand this crucial notion and must be willing to commit company resources to manage data as an organizational asset. The next step is to identify and define the need for, and role of, the DBMS in the organization. Refer the student to Section 16-2 and apply the concepts discussed in this section to a teacher-selected organization. Managers and end users must understand how the DBMS can enhance and support the work of the organization at all levels (top management, middle management, and operational). Finally, the impact of a DBMS introduction into an organization must be illustrated and explained. Refer to Section 16-3 to accomplish this task. Note particularly the technical, managerial, and cultural aspects of the process.

6. How and why did database management systems become the organizational data management standard in organizations? Discuss some of the advantages of the database approach over the file-system approach.

Contrast the file system's "single-ownership" approach with the DBMS's "shared-ownership" approach. Make sure that students are aware of the change in focus or function when the shift from file system to DBMS occurs. In other words, show what happens when the data processing (DP) department becomes a management information systems (MIS) department. Using Section 16-3, discuss how the change from DP to MIS shifts data management from an operational level to a tactical or strategic level.

7. Using a single sentence, explain the role of databases in organizations. Then explain your answer.

The single sentence will be: The database's predominant role is to support managerial decision making at all levels in the organization. Refer to Section 16-2 for a complete explanation of the role(s) played by an organization's DBMS.

8. Define security and privacy. How are these two concepts related?

Security means protecting the data against accidental or intentional use by unauthorized users. Privacy deals with the rights of people and organizations to determine who accesses the data and when, where, and how the data are to be used. The two concepts are closely related. In a shared system, individual users must ensure that the data are protected from unauthorized use by other individuals. Also, the individual user must have the right to determine who, when, where, and how other users use the data. The DBMS must provide the tools to allow such flexible management of data security and access rights in a company database. (A brief SQL illustration of these tools is sketched after question 9.)

9. Describe and contrast the information needs at the strategic, tactical, and operational levels in an organization. Use examples to explain your answer.

See Section 16-2 to contrast the different DBMS roles at each managerial level. Use Figures 16.3-16.5 as the basis for your discussions.
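As a concrete illustration of the tools mentioned in the answer to question 8, the following minimal sketch uses the standard SQL authorization commands that most relational DBMSs support. The user names and tables are hypothetical:

-- Let a clerk read and add customer rows, but not change or delete them
GRANT SELECT, INSERT ON CUSTOMER TO clerk01;

-- Let a manager read invoices and delegate that read access to others
GRANT SELECT ON INVOICE TO mgr02 WITH GRANT OPTION;

-- Withdraw a privilege when job duties change
REVOKE INSERT ON CUSTOMER FROM clerk01;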
10. What special considerations must you take into account when introducing a DBMS into an organization?

See Section 16-3. We suggest that you start a discussion about the special considerations (managerial, technical, and cultural) to be taken into account when a new DBMS is to be introduced into an organization. For example, focus the discussion on questions such as:
• What are the retraining requirements for the new system? Who needs to be retrained? What must be the type and extent of the retraining?
• Is it reasonable to expect some resistance to change from the computer services department administrator(s)? From secretaries? From technical support personnel? From other departmental end users?
• How will such resistance be manifested?
• How will you deal with such resistance?

11. Describe the DBA's responsibilities.

The database administrator (DBA) is the person responsible for the control and management of the shared database within an organization; the DBA controls the database administration function within the organization. The data administrator (DA), in contrast, is responsible for managing the overall corporate data resource, both computerized and non-computerized. Therefore, the DA is given a higher degree of responsibility and authority than the DBA. Depending on organizational style, the DBA and DA roles may overlap and may even be combined in a single position or person. The DBA position requires both managerial and technical skills. Refer to Section 16-5 and Table 16.1 to explain and illustrate the general responsibilities of the DA and DBA functions.

12. How can the DBA function be placed within the organization chart? What effect(s) will such placement have on the DBA function?

The DBA function's placement varies from company to company and may be either a staff or line position. In a staff position, the DBA function creates a consulting environment in which the DBA is able to devise the overall data administration strategy but does not have the authority to enforce it. In a line position, the DBA function has both the responsibility and the authority to plan, define, implement, and enforce the policies, standards, and procedures.

13. Why and how are new technological advances in computers and databases changing the DBA's role?

See Section 16-5, particularly Section 16-5b, "The DBA's Technical Role." Then tie this discussion to the increasing use of web applications. The DBA function is probably one of the most dynamic functions of any organization. New technological developments constantly change the DBA function. For example, note how each of the following has an effect on the DBA function:
• the development of the DDBMS
• the development of the OODBMS
• the increasing use of LANs
• the rapid integration of intranet and extranet applications and their effects on database design, implementation, and management (security issues become especially important!)

14. Explain the DBA department's internal organization, based on the DBLC approach.

See Section 16-4, especially Figures 16.4 and 16.5.

15. Explain and contrast the differences and similarities between the DBA and DA.

See Section 16-5, especially Table 16.1.

16. Explain how the DBA plays an arbitration role for an organization's two main assets. Draw a diagram to facilitate your explanation.

See Section 16-5, especially Figure 16.6.

17. Describe and characterize the skills desired for a DBA.

See Section 16-5, especially Table 16.2.

18. What are the DBA's managerial roles? Describe the managerial activities and services provided by the DBA.

See Section 16-5a, especially Table 16.3.

19. What DBA activities are used to support end users?
See Section 16-5a.

20. Explain the DBA's managerial role in the definition and enforcement of policies, procedures, and standards.

See Section 16-5a.

21. Protecting data security, privacy, and integrity are important database functions. What activities are required in the DBA's managerial role of enforcing these functions?

See Section 16-5a.

22. Discuss the importance and characteristics of database data backup and recovery procedures. Then describe the actions that must be detailed in backup and recovery plans.

See Section 16-5a.

23. Assume that your company assigned you the responsibility of selecting the corporate DBMS. Develop a checklist for the technical and other aspects involved in the selection process.

See Section 16-5b. The checklist is shown in the "DBMS and Utilities Evaluation, Selection, and Installation" segment.

24. Describe the activities that are typically associated with the design and implementation services of the DBA technical function. What technical skills are desirable in the DBA's personnel?

See Section 16-5b.

25. Why are testing and evaluation of the database and applications not done by the same people who are responsible for the design and implementation? What minimum standards must be met during the testing and evaluation process?

See Section 16-5b. Note particularly the material in the "Testing and Evaluation of Databases and Applications" segment.

26. Identify some bottlenecks in DBMS performance, and then propose some solutions used in DBMS performance tuning.

See Section 16-5b. Also see Chapter 11, "Database Performance Tuning and Query Optimization."

27. What are the typical activities involved in the maintenance of the DBMS and its utilities and applications? Would you consider application performance tuning to be part of the maintenance activities? Explain your answer.

See Section 16-5b. Yes, performance tuning is part of the maintenance activities. As the database system enters into operation, the database starts to grow. The resources initially assigned to the application are sufficient for the initial loading of the database, but as the system grows, the database becomes bigger and the DBMS requires additional resources to satisfy the demands on the larger database. Database performance will decrease as the database grows and more users access it.

28. How do you normally define security? How is your definition of security similar to or different from the definition of database security in this chapter?

See Section 16-6.

29. What are the levels of data confidentiality?

See Section 16-6. The levels are highly restricted, confidential, and unrestricted.

30. What are security vulnerabilities? What is a security threat? Give some examples of security vulnerabilities that exist in different IS components.

See Section 16-6b.

31. Define the concept of a data dictionary, and discuss the different types of data dictionaries. If you were to manage an organization's entire data set, what characteristics would you look for in the data dictionary?

See Section 16-7a.

32. Using SQL statements, give some examples of how you would use the data dictionary to monitor the security of the database.

NOTE: If you use IBM's DB2, the names of the main tables are SYSTABLES, SYSCOLUMNS, and SYSTABAUTH.

See Section 16-7a.
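For question 32, the exact queries depend on the DBMS. As one hedged illustration, the following sketch uses Oracle's data dictionary views (DBA_USERS, DBA_TAB_PRIVS, and DBA_ROLE_PRIVS); other products expose equivalent information under different names, such as the DB2 catalog tables mentioned in the note above. The table name PAYROLL is a hypothetical example:

-- List the database users and the status of their accounts
SELECT USERNAME, ACCOUNT_STATUS, CREATED
FROM   DBA_USERS;

-- Find out who holds privileges on a sensitive table and whether
-- they can pass those privileges on to others
SELECT GRANTEE, PRIVILEGE, GRANTABLE
FROM   DBA_TAB_PRIVS
WHERE  TABLE_NAME = 'PAYROLL';

-- Identify every account that has been granted the powerful DBA role
SELECT GRANTEE
FROM   DBA_ROLE_PRIVS
WHERE  GRANTED_ROLE = 'DBA';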
33. What characteristics do a CASE tool and a DBMS have in common? How can these characteristics be used to enhance data administration?

See Section 16-7b.

34. Briefly explain the concepts of information engineering (IE) and information systems architecture (ISA). How do these concepts affect the data administration strategy?

See Section 16-8.

35. Identify and explain some of the critical success factors in the development and implementation of a good data administration strategy.

See Section 16-8.

36. How have cloud-based data services affected the DBA's role?

The answer to this question is explained in detail in Section 16-9.

37. What is the tool used by Oracle to create users?

See Section 16-10d. Note the Oracle Security Manager screen in Figure 16.13 and the Create User dialog box in Figure 16.14.

38. In Oracle, what is a tablespace?

See Section 16-10c. The following summary is useful:
• A tablespace is a logical storage space.
• Tablespaces are primarily used to logically group related data.
• Tablespace data are physically stored in one or more datafiles.

39. In Oracle, what is a database role?

See Section 16-10d. A database role is a named collection of database access privileges that authorize a user to perform specified actions on the database. Examples of roles are CONNECT, RESOURCE, and DBA.

40. In Oracle, what is a datafile? How does it differ from a file system file?

See Section 16-10c. The following summary will be useful:
• A database is composed of one or more tablespaces. Therefore, there is a 1:M relationship between the database and its tablespaces.
• Tablespace data are physically stored in one or more datafiles. Therefore, there is a 1:M relationship between tablespaces and datafiles.
• A datafile physically stores the database data.
• Each datafile is associated with one and only one tablespace. (But each datafile can reside in a different directory on the same hard disk, or even on different disks.)

In contrast to the datafile, a file system's file is created to store data about a single entity, and the programmer can directly access the file. But file access requires the end user to know the structure of the data that are stored in the file. While a database is stored as a file, this file is created by the DBMS rather than by the end user. Because the DBMS handles all file operations, the end user does not know, nor does that end user need to know, the database's file structure. When the DBA creates a database, or, more accurately, uses the Oracle Storage Manager to let Oracle create a database, Oracle automatically creates the necessary tablespaces and datafiles. We have summarized the basic database components logically in Figure Q16.39.

Figure Q16.39 The Logical Tablespace and Datafile Components of an Oracle Database

41. In Oracle, what is a database profile?

See Section 16-10d. A profile is a named collection of database settings that control how much of the database resource can be used by a given user.
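To tie questions 37 through 41 together, the following minimal sketch shows the SQL statements that sit behind Oracle's graphical tools. All object names, file names, sizes, and resource limits are illustrative assumptions only:

-- Create a tablespace whose data are physically stored in one datafile (questions 38 and 40)
CREATE TABLESPACE sales_ts
   DATAFILE 'sales01.dbf' SIZE 100M;

-- Create a profile that limits how much of the database resource a user may consume (question 41)
CREATE PROFILE clerk_profile LIMIT
   SESSIONS_PER_USER 2
   CONNECT_TIME      480;

-- Create a user assigned to the tablespace and the resource-limiting profile (question 37)
CREATE USER jdoe IDENTIFIED BY ChangeMe2024
   DEFAULT TABLESPACE sales_ts
   PROFILE clerk_profile;

-- Grant predefined roles so the user can connect and create objects (question 39)
GRANT CONNECT, RESOURCE TO jdoe;

Note that Oracle enforces a profile's resource limits only when the RESOURCE_LIMIT initialization parameter is set to TRUE; this detail is a useful discussion point when contrasting what is defined with what is enforced.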