Spark, Cassandra and Python

In this post we touch briefly on Apache Spark as a cluster computing framework that supports a number of drivers to pipe data in, and that its stunning performance thanks much to resilient distributed dataset (RDD) as its architectural foundation. In this hands-on guide, we expand on how to configure Spark, and use Python to … Read moreSpark, Cassandra and Python

Intro to Big Data Projects

Modern applications produce super large datasets beyond what traditional data-processing application can handle. Big data is a discipline that specialize in processing such data. For example, analysis, information extraction etc. The scale of large dataset grows well beyond the capacity of a single computer, which calls for computing power delivered by multi-node clustered systems. Intensive … Read moreIntro to Big Data Projects

DataStax Python Driver

For someone with relational database background, analyzing data in Cassandra isn’t intuitive. There are two reasons. First, Cassandra data table is hardly updated or deleted in avoidance of tombstones. Insertion is the only action on the table resulting in multiple versions of each record all stored in the same table, thus a much longer table … Read moreDataStax Python Driver

Cassandra data model (as opposed to relational model)

Bad data model design with Cassandra causes chronic pains as application scales. I had to re-read about data model design in “Cassandra – the Definitive Guide” and keep my notes and thoughts in this post. The data modelling in the relational world is indoctrinated to every students out of university. It embraces several things: Entity-Relation: … Read moreCassandra data model (as opposed to relational model)

Cassandra Architecture Summary

Disclaimer: many contents here are from Cassandra The Definitive Guide Gossip and Failure Detection Cassandra uses a gossip protocol that allows each node to keep track of state information about the other nodes in the cluster. The gossiper runs every second on a timer. Gossip protocols assumes a faulty network, are commonly commonly employed in … Read moreCassandra Architecture Summary