Hosting database on Kubernetes

Background: “We want to host a Postgres database on Kubernetes. Can you help us?” The client appeared assertive and reluctant to resort to managed services, so I did some homework and went through this tutorial. My thought: it’s doable, but don’t do it unless operating databases as a service is your main business. I believed that …

Census Data from Statistics Canada

Statistics Canada carries out a census every 5 years, with 2016 being the most recent. The census data provides a wealth of insight but is published in raw format. Post-processing is needed to derive information such as the median income of a neighbourhood or the age distribution of a city. For someone like myself without …

How imaging devices talk to each other (in DICOM)

Overview: In the previous post I briefly touched on DICOM as the crucial standard in medical imaging for both data exchange and data storage. It is important to understand that DICOM is such a massive standard, having expanded beyond data exchange and storage into many different areas around imaging, that no device (or information …

Spark, Cassandra and Python

In this post we touch briefly on Apache Spark as a cluster computing framework that supports a number of drivers to pipe data in, and whose stunning performance owes much to the resilient distributed dataset (RDD) as its architectural foundation. In this hands-on guide, we expand on how to configure Spark and use Python to …

DataStax Python Driver

For someone with a relational database background, analyzing data in Cassandra isn’t intuitive. There are two reasons. First, rows in a Cassandra table are rarely updated or deleted, in order to avoid tombstones. Insertion is the only action on the table, resulting in multiple versions of each record all stored in the same table, thus a much longer table …

Cassandra data model (as opposed to relational model)

Bad data model design with Cassandra causes chronic pain as the application scales. I had to re-read the material on data model design in “Cassandra: The Definitive Guide” and keep my notes and thoughts in this post. Data modelling in the relational world is drilled into every student coming out of university. It embraces several things: Anybody …