Huwebes, Agosto 4, 2011

Quiz 8 part 1

1. Define the term, database, and explain how a database interacts with data
and information.

database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality (for example, the availability of rooms in hotels), in a way that supports processes requiring this information (for example, finding a hotel with vacancies). The term "database" refers both to the way its users view it, and to the logical and physical materialization of its data, content, in files, computer memory, and computer data storage. This definition is very general, and is independent of the technology used. However, not every collection of data is a database; the term database implies that the data is managed to some level of quality (measured in terms of accuracy, availability, usability, and resilience) and this in turn often implies the use of a general-purpose Database management system (DBMS). A general-purpose DBMS is typically a complex software system that meets many usage requirements, and the databases that it maintains are often large and complex.
The term database is correctly applied to the data and data structures, and not to the DBMS which is a software system used to manage the data. The structure of a database is generally too complex to be handled without its DBMS, and any attempt to do otherwise is very likely to result in database corruption. DBMSs are packaged as computer software products: well-known and highly utilized products include the Oracle DBMS, Access and SQL Server from Microsoft, DB2 from IBM and the Open source DBMS MySQL. Each such DBMS product currently supports many thousands of databases all over the world. The stored data in a database is not generally portable across different DBMS, but can inter-operate to some degree (while each DBMS type controls a database of its own database type) using standards likeSQL and ODBC. A successful general-purpose DBMS is designed in such a way that it can satisfy as many different applications and application designers as possible. A DBMS also needs to provide effective run-time execution to properly support (e.g., in terms ofperformance, availability, and security) as many end-users (the database's application users) as needed. Sometimes the combination of a database and its respective DBMS is referred to as a Database system (DBS).


Database integrity ensures that data entered into the database is accurate, valid, and consistent. Any applicable integrity constraints anddata validation rules must be satisfied before permitting a change to the database.
Three basic types of database integrity constraints are:
  • Entity integrity, not allowing multiple rows to have the same identity within a table.
  • Domain integrity, restricting data to predefined data types, e.g.: dates.
  • Referential integrity, requiring the existence of a related row in another table, e.g. a customer for a given customer ID.


2. Describe file maintenance techniques (adding records, modifying records,deleting records) and validation techniques.

Database storage

Database storage is the container of the physical materialization of a database. It comprises the Internal (physical) level in the Database architecture above. It also contains all the information needed (e.g., metadata, "data about the data", and internal data structures) to reconstruct the Conceptual level and External level from the Internal level when needed. Though typically accessed by a DBMS through the underlying Operating system (and often utilizing the operating systems' File systems as intermediates for storage layout), storage properties and configuration setting are extremely important for the efficient operation of the DBMS, and thus are closely maintained by database administrators. A DBMS, while in operation, always has its database residing in several types of storage (e.g., Computer memory and external Computer data storage), as dictated by contemporary computer technology. The database data and the additional needed information, possibly in very large amounts, are coded into bits. Data typically reside in the storage in structures that look completely different from the way the data look in the conceptual and external levels, but in ways that attempt to optimize (the best possible) these levels' reconstruction when needed by users and programs, as well as for computing additional types of needed information from the data (e.g., when querying the database).
In principle the database storage (as computer data storage in general) can be viewed as a linear address space (a tree-like is a more accurate description), where every bit of data has its unique address in this address space. Practically only a very small percentage of addresses is kept as initial reference points (which also requires storage), and most of the database data is accessed by indirection using displacement calculations (distance in bits from the reference points) and data structures (see below) which define access paths (using pointers) to all needed data in effective manner, optimized for the needed data access operations.


The data

  • Data is encoded by assigning a bit pattern to each language alphabet character, digit, other numerical patterns, and multimedia object. Many standards exist for encoding (e.g., ASCII, JPEG, MPEG-4).
  • By adding bits to each encoded unit, the redundancy allows both to detect errors in coded data and to correct them based on mathematical algorithms. Errors occur regularly in low probabilities due to random bit value flipping, or "physical bit fatigue," loss of the physical bit in storage its ability to maintain distinguishable value (0 or 1), or due to errors in inter or intra-computer communication. A group of malfunctioning physical bits (not always the specific defective bit is known; group definition depends on specific storage device) is typically automatically fenced-out, taken out of use by the device, and replaced with another functioning equivalent group in the device, where the corrected bit values are restored (if possible; typically using the Cyclic redundancy check (CRC) method).

Data compression
Data compression methods allow in many cases to represent a string of bits by a shorter bit string ("compress") and reconstruct the original string ("decompress") when needed. This allows to utilize substantially less storage (tens of percents) for many types of data at the cost of more computation (compress and decompress when needed). Analysis of trade-off between storage cost saving and costs of related computations and possible delays in data availability is done before deciding whether to keep certain data in a database compressed or not.
Data compression is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.

Data encryption
For security reasons certain types of data (e.g., credit-card information) may be kept encrypted in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots (taken either via unforeseen vulnerabilities in a DBMS, or more likely, by bypassing it).
Data encryption is typically controlled through the DBMS's data definition interface, but in some cases may be a default and automatic.


Data storage types

As it is common in current computer technology the database consists of bits (a binary-digit which has two states: either 0 or 1). This collection of bits describes both the contained database data and its related metadata (i.e., data that describes the contained data and allows computer programs to manipulate the database data correctly). The size of a database can nowadays be tens of Terabytes, where a byte is eight bits. The physical materialization of a bit can employ various existing technologies, while new and improved technologies are constantly under development. Common examples are:
  • Magnetic medium (e.g., in Magnetic disk) - Orientation of magnetic regions on a surface of material (two directions, for 0 and 1).
  • Dynamic random-access memory (DRAM) - State of a miniature electronic circuit consists of few transistors (two states for 0 and 1).
These two examples are respectively for two major storage types:
  • Nonvolatile storage can maintain its bit states (0s and 1s) without electrical power supply, or when power supply is interrupted;
  • Volatile storage loses its bit values when power supply is interrupted (i.e., its content is erased).
Sophisticated storage units, which can, in fact, be effective dedicated parallel computers that support a large amount of nonvolatile storage, typically must include also components with volatile storage. Some such units employ batteries that can provide power for several hours in case of external power interruption (e.g., see the EMC Symmetrix) and thus maintain the content of the volatile storage parts intact. Just before such a device's batteries lose their power the device typically automatically backs-up its volatile content portion (into nonvolatile) and shuts off to protect its data.
Databases are usually too expensive (in terms of importance and needed investment in resources, e.g., time, money, to build them) to be lost by a power interruption. Thus at any point in time most of their content resides in nonvolatile storage. Even if for operational reason very large portions of them reside in volatile storage (e.g., tens of Gigabytes in computer volatile memory, for in-memory databases), most of this is backed-up in nonvolatile storage. A relatively small portion of this, which temporarily may not have nonvolatile backup, can be reconstructed by proper automatic database recovery procedures after volatile storage content loss.
More examples of storage types:
  • Volatile storage can be found in processors, computer memory (e.g., DRAM), etc.
  • Non-volatile storage types include ROM, EPROM, Hard disk drives, Flash memory and drives, Storage arrays, etc.

3. Discuss the terms character, field, record, and file

In computer and machine-based telecommunications terminology, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language.
Examples of characters include letters, numerical digits, and common punctuation marks (such as '.' or '-'). The concept also includes control characters, which do not correspond to symbols in a particular natural language, but rather to other bits of information used to process text in one or more languages. Examples of control characters include carriage return or tab, as well as instructions to printers or other devices that display or otherwise process text.
Characters are typically combined into strings.

In computer science, data that has several parts can be divided into fields. Relational databases arrange data as sets of database records, also called rows. Each record consists of several fields; the fields of all records form the columns.
In object-oriented programming, field (also called data member or member variable) is the data encapsulated within a class or object. In the case of a regular field (also called instance variable), for each instance of the object there is an instance variable: for example, an Employee class has a Name field and there is one distinct name per employee. A static field (also called class variable) is one variable, which is shared by all instances.

In computer science, records (also called tuplesstructs, or compound data) are among the simplest data structures. A record is a value that contains other values, typically in fixed number and sequence and typically indexed by names. The elements of records are usually called fields or members.
For example, a date may be stored as a record containing a numeric year field, a month field represented as a string, and a numeric day-of-month field. As another example, a Personnel record might contain a name, a salary, and a rank. As yet another example, a Circle might contain a center and a radius. In this instance, the center itself might be represented as a Point record containing x and y coordinates.
Records are distinguished from arrays by the fact that their number of fields is typically fixed, each field has a name, and that each field may have a different type.

 file is a block of arbitrary information, or resource for storing information, which is available to a computer program and is usually based on some kind of durable storage. A file is durable in the sense that it remains available for programs to use after the current program has finished. Computer files can be considered as the modern counterpart of paper documents which traditionally are kept in offices' and libraries' files, and this is the source of the term.

Walang komento:

Mag-post ng isang Komento