Intro to Data Analytics Platform on Azure

Having worked in the transactional data world for almost my entire career, I recently had to pick up quite a few things to catch up on analytical workloads. The main purpose of a data analytics project is to build analysis services models and manage deployed databases. Later in this post I’ll discuss some useful Azure resources for …

Census Data from Statistics Canada

Statistics Canada conducts a census every 5 years, with 2016 being the last run. The census data from Statistics Canada offers a wealth of insights but is published in raw format. Post-processing work is needed to derive information such as the median income of a neighbourhood or the age distribution of a city. For someone like myself without …

Spark, Cassandra and Python

In this post we touch briefly on Apache Spark as a cluster computing framework that supports a number of drivers to pipe data in, and note that its strong performance owes much to the resilient distributed dataset (RDD) as its architectural foundation. In this hands-on guide, we expand on how to configure Spark and use Python to …

DataStax Python Driver

For someone with a relational database background, analyzing data in Cassandra isn’t intuitive. There are two reasons. First, rows in a Cassandra table are rarely updated or deleted, in order to avoid tombstones. Insertion is the only action on the table, resulting in multiple versions of each record all stored in the same table, thus a much longer table …