Database - Part 4: NoSQL

Overview

In contrast to the ACID model of RDBMS mentioned earlier, NoSQL’s consistency model is expressed as BASE.

ACID in Transactions

  • Atomicity
    • A transaction is all-or-nothing: either every operation in it takes effect or none does. If it is interrupted partway, the database must return to the state it was in before the transaction began.
  • Consistency
    • All data must remain consistent whenever a transaction succeeds. If integrity constraints are in place, transactions that violate those constraints are canceled.
  • Isolation
    • Concurrent transactions cannot interfere with one another; the intermediate state of one transaction is not visible to the others.
  • Durability
    • Successful transactions must be permanently reflected, even if system problems occur.
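
As a rough illustration of atomicity and consistency from the list above, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, the names alice and bob, and the helper functions are hypothetical, chosen only for this example. The database is in-memory, so durability is not demonstrated.

```python
import sqlite3

def make_bank():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # integrity constraint
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction. Returns True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            # Credit first, so a failed debit leaves partial work to undo.
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint was violated; the credit above was rolled back.
        return False

def balances(conn):
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

Calling transfer(conn, "alice", "bob", 500) on a fresh database fails the balance check, and the credit already applied to bob is rolled back with it, so neither balance changes.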

BASE

  • BA ( Basically Available )
    • The database works most of the time.
  • Soft-state
    • Stores do not have to be write-consistent, nor do different replicas have to be mutually consistent all the time.
  • Eventual consistency
    • The store becomes consistent eventually.
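
The soft-state and eventual-consistency ideas above can be sketched with a toy model. The Store and Replica classes are hypothetical and do not correspond to any particular database: a write lands on one replica immediately and reaches the others only when they apply their pending updates.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}
        self.pending = deque()  # updates received but not yet applied

    def apply_pending(self):
        while self.pending:
            key, value = self.pending.popleft()
            self.data[key] = value

class Store:
    """Toy store: a write is applied to replica 0 and queued for the rest."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]

    def write(self, key, value):
        self.replicas[0].data[key] = value
        for replica in self.replicas[1:]:
            replica.pending.append((key, value))

    def read(self, key, replica=0):
        # May return stale data if this replica has not caught up yet.
        return self.replicas[replica].data.get(key)

    def sync(self):
        """Let every replica catch up (stands in for background replication)."""
        for replica in self.replicas:
            replica.apply_pending()

    def is_consistent(self):
        first = self.replicas[0].data
        return all(replica.data == first for replica in self.replicas)
```

Between write() and sync(), a read from replica 2 can miss the new value (soft state). After sync(), all replicas agree (eventual consistency).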

Keeping the above characteristics in mind, let’s talk about the NoSQL databases below.

NoSQL Databases

Redis ( Remote Dictionary Server )

  • Open source
  • In-memory data structure store
  • NoSQL/Cache Solution (Also used as a database)
  • Supports snapshotting ( RDB ) / AOF ( Append Only File ) persistence
    • RDB : Snapshots the whole Redis dataset. ( SAVE, BGSAVE )
      • Good : Fast. Restarts are fast because it loads the snapshot directly.
      • Bad : Loss. Data written after the snapshot point may be lost.
    • AOF : Logs every write/update operation.
      • Good : Near-lossless. Operations are logged right up until the machine goes down, so little or no data is lost, depending on how often the log is flushed to disk ( the appendfsync setting ).
      • Bad : Slow. Because it records every write/update operation, it requires more space than RDB, and restarts are slow because all recorded operations must be replayed.
    • Hybrid ( Recommended ) : Mixes the two.
      • RDB + AOF : Load the snapshot, then replay only the AOF entries written after the snapshot.
  • Pub/Sub model
    • Supports both 1:1 queue and 1:N messaging forms.
    • A single pattern subscription can receive messages from several channels.
      • ex) Subscribing to the pattern music.* ( PSUBSCRIBE ) receives messages published to both music.jazz and music.classic.
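
To make the RDB / AOF trade-off in the list above concrete, here is a toy model in Python. It is only a sketch: ToyStore and recover are hypothetical names, and the JSON strings stand in for Redis's real file formats, which are not reproduced here.

```python
import json

class ToyStore:
    """In-memory key-value store with a snapshot and an append-only log."""

    def __init__(self):
        self.data = {}
        self.snapshot = "{}"  # stands in for the RDB file
        self.aof = []         # stands in for the AOF file, one entry per write

    def set(self, key, value):
        self.data[key] = value
        self.aof.append(json.dumps(["SET", key, value]))

    def save(self):
        """Snapshot the whole dataset; the log restarts from this point."""
        self.snapshot = json.dumps(self.data)
        self.aof.clear()

def recover(snapshot, aof, use_aof=True):
    """Rebuild the dataset after a crash from what was persisted."""
    data = json.loads(snapshot)        # fast: load the snapshot directly
    if use_aof:
        for line in aof:               # slower: replay each logged write
            op, key, value = json.loads(line)
            if op == "SET":
                data[key] = value
    return data
```

Recovering from the snapshot alone loses every write made after save(), while snapshot plus log restores them at the cost of replaying the log.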

Memcached ( not NoSQL )

  • Open source
  • Distributed memory caching system
  • (Note) When memory is full, Memcached uses the LRU ( least recently used ) algorithm to evict existing data and reuse the memory.
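
The LRU idea in the note above can be sketched in a few lines of Python. This LRUCache class is a hypothetical illustration of the eviction policy only. It does not model Memcached's internal memory management.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # reading a key makes it most recent
        return self.items[key]

    def set(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # drop the least recently used
```

With a capacity of 2, setting a and b, reading a, and then setting c evicts b, because a was touched more recently.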

Cassandra

Apache Cassandra is a massively scalable distributed NoSQL DB that started inside Facebook and was later released as open source. Its nodes communicate over a P2P gossip protocol: every second, each node exchanges state messages with up to 3 other nodes in the cluster, so all nodes quickly learn about the others. For clusters that span multiple data centers, it is recommended to designate two or more seed nodes per data center for fault tolerance.
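
A small simulation can show why gossip spreads state quickly. The sketch below is not Cassandra's implementation: the function names are hypothetical, one loop iteration stands in for one second, and each node is assumed to be able to contact any other node while only tracking whose state it has heard about so far.

```python
import random

def gossip_round(views, rng, fanout=3):
    """Each node exchanges its view with up to `fanout` random peers."""
    nodes = list(views)
    for node in nodes:
        peers = [n for n in nodes if n != node]
        for peer in rng.sample(peers, min(fanout, len(peers))):
            merged = views[node] | views[peer]  # both sides learn everything
            views[node] = merged
            views[peer] = set(merged)

def rounds_until_converged(n_nodes=10, seed=0, max_rounds=100):
    """Return how many rounds pass before every node knows every node."""
    rng = random.Random(seed)
    views = {i: {i} for i in range(n_nodes)}  # each node starts knowing itself
    everyone = set(views)
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        if all(view == everyone for view in views.values()):
            return round_no
    return None  # did not converge within max_rounds
```

Calling rounds_until_converged() with different n_nodes values gives a feel for how few rounds are needed even as the cluster grows.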

  • Distributed : A single Cassandra cluster can be operated even across physically separated data centers.
  • Decentralized : All nodes play the same role; there is no master node.
  • Scalability : Can be expanded and contracted without cluster downtime.
  • High Performance : Designed to fully utilize multiprocessor/multicore machines and run across hundreds of machines installed in multiple data centers.
  • Row Oriented
    • Cassandra's data model is not relational; it represents data as a sparse multi-dimensional hash table.
    • “Sparse” means that while a row can have one or more columns, each row does not need to have all the same columns as other rows.
  • Companies using it
    • Twitter : Used for analytics
    • Facebook : Used for inbox search
    • Reddit : Used as a persistent cache
    • Ooyala : Used for near real-time video analytics data service and storage
  • Disadvantages
    • Does not support joins or general ACID transactions.
    • Paging as in an RDBMS is difficult to implement, and memory overflow may occur if too many keyspaces or tables are created.
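
The "sparse multi-dimensional hash table" from the Row Oriented item above can be pictured as a hash table of hash tables. The SparseTable class and the row keys below are hypothetical and only illustrate the shape of the data, not how Cassandra stores it on disk.

```python
class SparseTable:
    """row key -> {column name: value}; a row stores only the columns it has."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column, default=None):
        return self.rows.get(row_key, {}).get(column, default)

    def columns(self, row_key):
        """Return the set of column names present in this row."""
        return set(self.rows.get(row_key, {}))

table = SparseTable()
table.put("user:1", "name", "Ann")
table.put("user:1", "email", "ann@example.com")
table.put("user:2", "name", "Bob")  # no email column for this row
```

Here user:1 has two columns and user:2 has one. A missing column takes no space, rather than being stored as a NULL.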

Additionally, setting up and configuring a Cassandra cluster is much easier than doing so for an HBase cluster. In that comparison, Cassandra is generally reported to perform more than 5x better on writes and more than 4x better on reads.

Thoughts

Most NoSQL databases are designed with large-scale processing in mind. Perhaps because of this, they usually do not fully provide the consistency that an RDBMS guarantees. Instead, consistency seems to be treated as a trade-off that is tuned against scale and availability.

Appendix