Readings in Database Systems, 3rd Edition is the most up-to-date edition of the compilation of papers first published as the now-classic "Red Book" in 1988. Dr. Stonebraker and Dr. Hellerstein have selected a spectrum of papers spanning the roots of the field, from classic papers on the relational model from the 1970s to timely discourses on future directions. This new streamlined edition includes 46 papers that cover much of the significant research and development in the database field, organized by area of technology. Each section of the book opens with expert introductory analysis of its topic by leaders of the DBMS field, along with a discussion of each reading.
From the Preface: "The main purpose of this collection is to present a technical context for research contributions and to make them accessible to anyone who is interested in database research. This book is intended as an introduction for students and professionals wanting an overview of the field. It is also designed to be a reference volume for anyone already active in database systems. This set of readings represents what we perceive to be the most important issues in the database field: the core material for any DBMS professional to study."
* Third edition is completely revised and streamlined to include the most significant new and classic papers, along with introductory materials
* Coverage spans the entire database field, including relational implementation, transaction management, distributed databases, parallel databases, objects and databases, data analysis, and benchmarking
* Offers a new section on objects and databases, including selections on object-oriented databases as well as object-relational databases
* Lecture notes, updated by the authors to cover each paper, available on the Morgan Kaufmann Web site
* The definitive book on DBMS applications
Perfect. I wish there were more frequent updates and more extended commentary.
* JSON is good for sparse data but not ideal for general hierarchical data, so RDBMSs will subsume it as a data type.
* SQL could have been cleaned up, but there was no time for it, so it is COBOL for 2020. SQL won against natural language.
* ODBC isn't the best interface for embedding queries into programming languages: open db, open cursor, bind query, run fetches, etc. Look at LINQ.
* PostgreSQL's open source educated many people who are now influential in new systems.
* Query planning (best-effort): cost estimation (catalog), equivalence rules, cost-based search (dynamic programming). Concurrency control: serializability, but that is generally more than enough, so finer-grained locks (the default today). There is no best scheme; it totally depends on the workload.
* Any performance test without a crossover point is uninteresting (at worst a trade-off under an unlimited-resources assumption). Resolving conflicts via blocking might make more sense, since every system is limited by nature.
* Durability: ARIES (no forcing of dirty pages at commit time; dirty pages can be flushed at any time).
* Distribution brings its own set of problems. 2PC (atomic commit): presumed commit or presumed abort. Consensus (Paxos, Raft, etc.) is generally used for replication, where the master executes transactions by itself and a new master is elected on failure.
* New architectures: column stores, main-memory systems (with their own concurrency control and recovery), semi-structured data (JSON), and dataflow (Hadoop, Spark, Naiad, etc.).
* Dataflow started with MapReduce (only 2 stages). Nowadays, higher-level query languages (SQL), general graphs (not only 2 stages), and indexing (leveraging structured parts) are supported (Spark, Flink, etc.). The influential points are schema, interface, and architecture flexibility.
* Non-serializable isolation is on by default, and the remedies for it are difficult to use. A clear research interest in weak isolation is finding simpler ways to maintain semantics while keeping the ease of programming of serializability.
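The cursor dance the ODBC point complains about (open db, open cursor, bind query, run fetches) can be sketched with Python's standard `sqlite3` module; the table and rows here are made up purely for illustration:

```python
import sqlite3

# Hypothetical employee table, just to walk through the steps.
conn = sqlite3.connect(":memory:")                # 1. open the database
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT)")
conn.executemany("INSERT INTO emp VALUES (?, ?)",
                 [("alice", "db"), ("bob", "os"), ("carol", "db")])

cur = conn.cursor()                               # 2. open a cursor
cur.execute("SELECT name FROM emp WHERE dept = ?", ("db",))  # 3. bind + run query
names = sorted(row[0] for row in cur.fetchall())  # 4. fetch the results
conn.close()
print(names)  # ['alice', 'carol']
```

A LINQ-style embedding folds these steps into ordinary expressions of the host language instead of exposing them as an explicit handle/cursor protocol.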
* Rethinking the query optimizer due to streaming, errors in estimation, data outside the RDBMS, user-defined aggregates, etc. Extract the optimizer from execution: the planner generates a dataflow, which is executed by the executor later. There are two kinds of adaptive optimization: inter-operator (due to blocking in the nature of operators such as hash join) and intra-operator (feedback from execution to self-adapt the plan).
* OLAP: sampling (online or materialized, e.g. BlinkDB; count-min sketches, HyperLogLog, Bloom filters, etc.), precomputation (all cuboids, or just a critical subset, since the cube is a lattice and the others can be generated from it), and online aggregation (feedback to the user, who can stop when satisfied).
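To make the sketching side of that OLAP point concrete, here is a minimal count-min sketch in Python (width, depth, and hashing scheme are arbitrary choices for illustration). Hash collisions can only inflate a counter, never deflate it, so the minimum across rows never undercounts:

```python
import hashlib

class CountMinSketch:
    """Minimal count-min sketch: approximate frequencies in sublinear
    space. Estimates are upper bounds on the true counts."""

    def __init__(self, width=256, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One (deterministic) hash bucket per row, derived from sha256.
        for row in range(self.depth):
            h = hashlib.sha256(f"{row}:{item}".encode()).digest()
            yield row, int.from_bytes(h[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions only add, so min over rows is the tightest bound.
        return min(self.table[row][col] for row, col in self._buckets(item))

cms = CountMinSketch()
for w in ["a"] * 100 + ["b"] * 5:
    cms.add(w)
print(cms.estimate("a"))  # never less than the true count of 100
```

HyperLogLog and Bloom filters play the analogous trick for distinct counts and membership, trading exactness for tiny, mergeable state.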
A relatively quick read: a collection of commentaries about foundational and state-of-the-art papers on database design and problems. A mostly easy read for application programmers like myself. Those interested in low-level details can optionally go to the cited sources.
A book that, I believe, is still used at MIT (since 1988) for reading on topics related to databases.
It helped me understand the basic architecture of databases, especially units 1: Data Models and DBMS Architecture, 2: Query Processing, and 4: Transaction Management.
The Data Warehousing part was not as interesting as I expected. You can get more out of the book if you complement it with lectures such as:
A fast and eye-opening read. It's a brief booklet of around 50 pages: about 11 commentary articles on selected important papers in major cutting-edge areas of databases. It is for database professionals, not for newbies like me with very limited knowledge of databases beyond the few everyone uses. So I didn't understand most of it. 🤨 But it did introduce me to lots of new concepts and terms.
The standard text for "introduction to database systems and theory" classes in an ICS environment. Even though it presumes a lot more technical background than many IS/LIS students have, that too is useful as it provides a great primary text for syntagmatic analysis. "What Goes Around Comes Around" and "Anatomy of a Database System" should be required reading in any Information Studies course focused on databases.