Tag: hadoop

  • Intro to Data Analytics Platform on Azure

    Having been in the transactional data world for almost my entire career, I recently had to pick up quite a few things to catch up on…

    Read
  • High Performance Computing

    Overview: High Performance Computing (HPC) has recently been commoditized with the advent of commodity server hardware (x86 servers), virtualization technology and the cloud delivery model. It…

    Read
  • Spark, Cassandra and Python

    In this post we touch briefly on Apache Spark as a cluster computing framework that supports a number of drivers to pipe data in, and…

    Read
  • Intro to Big Data Projects

    Modern applications produce extremely large datasets, beyond what traditional data-processing applications can handle. Big data is a discipline that specializes in processing such data. For…

    Read
  • Zookeeper Summary

    Distributed systems: A distributed system involves independent computing entities linked together by a network. The components communicate and coordinate with each other to achieve a common goal.…

    Read
  • Kafka high-level Overview

    Zookeeper: A distributed system is generally defined as a software system composed of independent computing entities linked together by a computer network, whose components communicate…

    Read