[Update] Some security improvements were introduced in May 2021. Here are the details.
In this project we introduce a medical imaging web service based on Orthanc, an open-source DICOM server, and a pipeline to deploy such a server automatically and consistently. This small project touches on a number of DevOps technical details needed to deliver a web application prototype with an automated deployment pipeline.
A brief on imaging
In medical imaging, scanning devices are the data collectors. They come in various categories, such as Computed Tomography (CT) and Ultrasound (US). They are collectively referred to as modalities, but vary significantly in how they generate images and in their hardware. The challenge of exchanging data between these heterogeneous scanning devices and centralized computers arose as early as the 1980s, which led to the ACR-NEMA standard in 1985, a joint initiative of the American College of Radiology (ACR) and the National Electrical Manufacturers Association (NEMA). The standard later evolved into DICOM (Digital Imaging and Communications in Medicine), a comprehensive set of standards, also recognized by ISO as ISO 12052, that governs modern imaging data storage and exchange across the medical disciplines that revolve around images (radiology, cardiology, pathology, etc.).
In addition to defining a file format to store imaging data, DICOM also includes an upper layer protocol that dictates how two compliant devices (referred to as application entities, each identified by an AE title) negotiate a common syntax to transfer objects (e.g. an image, a report or a finding). Upper layer refers to layers 5-7 in the OSI model, or the application layer in the TCP/IP model.
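To make the negotiation concrete, here is a minimal Python sketch of the fixed part of an A-ASSOCIATE-RQ PDU, the first message a calling AE sends, following the byte layout in DICOM PS3.8 as I understand it. It carries only the application context item; a real request also needs presentation context and user information items, so an actual server such as Orthanc would reject this one. The AE titles "ORTHANC" and "MODALITY1" are hypothetical.

```python
import struct

# DICOM application context name defined by the standard
APPLICATION_CONTEXT_UID = b"1.2.840.10008.3.1.1.1"


def pad_ae_title(title: str) -> bytes:
    """AE titles are 1 to 16 ASCII characters, padded with spaces to 16 bytes."""
    raw = title.encode("ascii")
    if not 0 < len(raw) <= 16:
        raise ValueError("AE title must be 1 to 16 ASCII characters")
    return raw.ljust(16, b" ")


def application_context_item() -> bytes:
    # Item type 0x10, one reserved byte, 2-byte big-endian length, then the UID
    header = struct.pack(">BBH", 0x10, 0x00, len(APPLICATION_CONTEXT_UID))
    return header + APPLICATION_CONTEXT_UID


def associate_rq(called_ae: str, calling_ae: str, items: bytes) -> bytes:
    """Wrap variable items in an A-ASSOCIATE-RQ PDU (PDU type 0x01)."""
    body = (
        struct.pack(">HH", 0x0001, 0x0000)  # protocol version 1, reserved
        + pad_ae_title(called_ae)           # the AE we want to talk to
        + pad_ae_title(calling_ae)          # our own AE title
        + bytes(32)                         # reserved, all zeros
        + items                             # application/presentation/user items
    )
    # PDU header: type, reserved byte, 4-byte big-endian length of the body
    return struct.pack(">BBI", 0x01, 0x00, len(body)) + body


# Hypothetical AE titles for illustration
pdu = associate_rq("ORTHANC", "MODALITY1", application_context_item())
print(len(pdu), pdu[:6].hex())  # prints: 99 01000000005d
```

The fixed-width, space-padded AE title fields are why AE titles are limited to 16 characters: both sides identify each other by these fields before any image data is exchanged.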
Once a scanner acquires images from a patient, it sends the exams to an imaging server for persistent storage. The functionality of such servers has expanded since the 1990s, so they have gone by different names in different eras, such as PACS (Picture Archiving and Communication System), VNA (Vendor Neutral Archive) and EI (Enterprise Imaging) archive. Regardless of naming, they can generally be seen as a highly specialized variation of an enterprise content management system. They usually rely on a centralized database that indexes clinical information at the patient, exam and image levels. The other key component is persistent storage, usually in the form of a NAS.
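The patient, exam and image levels map onto DICOM's patient/study/series/instance hierarchy. The following Python sketch models such an index with the standard-library sqlite3 module. The table and column names are illustrative assumptions, not Orthanc's actual schema, and the sample rows are made up; the index only records where each file lives, while the files themselves sit on the storage device.

```python
import sqlite3

# Illustrative schema; table and column names are assumptions, not Orthanc's.
SCHEMA = """
CREATE TABLE patients  (patient_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE studies   (study_uid  TEXT PRIMARY KEY,
                        patient_id TEXT NOT NULL REFERENCES patients(patient_id),
                        study_date TEXT);
CREATE TABLE series    (series_uid TEXT PRIMARY KEY,
                        study_uid  TEXT NOT NULL REFERENCES studies(study_uid),
                        modality   TEXT);
CREATE TABLE instances (sop_uid    TEXT PRIMARY KEY,
                        series_uid TEXT NOT NULL REFERENCES series(series_uid),
                        file_path  TEXT NOT NULL);  -- the file itself lives on the NAS
"""


def build_index() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    return conn


def instances_for_patient(conn: sqlite3.Connection, patient_id: str) -> list:
    """Walk patient -> study -> series -> instance and return SOP Instance UIDs."""
    rows = conn.execute(
        """
        SELECT i.sop_uid
        FROM instances i
        JOIN series  s  ON i.series_uid = s.series_uid
        JOIN studies st ON s.study_uid  = st.study_uid
        WHERE st.patient_id = ?
        ORDER BY i.sop_uid
        """,
        (patient_id,),
    )
    return [row[0] for row in rows]


# Made-up sample rows for illustration only
conn = build_index()
conn.execute("INSERT INTO patients VALUES ('P001', 'DOE^JANE')")
conn.execute("INSERT INTO studies VALUES ('1.2.3', 'P001', '20201001')")
conn.execute("INSERT INTO series VALUES ('1.2.3.1', '1.2.3', 'CT')")
conn.executemany(
    "INSERT INTO instances VALUES (?, '1.2.3.1', ?)",
    [("1.2.3.1.1", "/nas/ct/0001.dcm"), ("1.2.3.1.2", "/nas/ct/0002.dcm")],
)
print(instances_for_patient(conn, "P001"))  # prints: ['1.2.3.1.1', '1.2.3.1.2']
```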
Orthanc is an open-source imaging server of this kind. It provides a DICOM endpoint to which scanning devices can send medical images, and a web viewer that lets users browse the stored images. It is released for many platforms, including as Docker images.
Infrastructure as code
We use Amazon Web Services (AWS) for infrastructure as a service, and Terraform as the tool to provision AWS resources in a reliable and consistent way, a practice known as Infrastructure-as-Code. Terraform is an alternative to CloudFormation, AWS's proprietary infrastructure-as-code technology. Terraform is developed by HashiCorp as an open-source project and is therefore vendor neutral. It supports multiple public cloud vendors through different providers, each of which calls the vendor-specific SDK. For example, the AWS provider integrates with the AWS SDK. As a result, code written for one vendor cannot simply be applied to a different vendor without a major overhaul. As of Oct 2020 the current version of Terraform is 0.13, and its syntax has changed in several ways since version 0.11. By default, Terraform also keeps its state files locally in the working directory.
When executing, Terraform combines all configuration files (.tf and .tf.json) in the working directory to evaluate variables and create the required resources. It supports most AWS resources. For example, you can render user data from templates when creating EC2 instances, and you can create managed service instances as long as the provider supports them.
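Since Terraform also reads .tf.json files, the resources described here can be sketched as a Python dictionary and dumped as JSON. This is a minimal, hypothetical configuration, not the project's actual Terraform code: the AMI ID, instance sizes, the userdata.tpl file name and the db_host template variable are all assumptions. It shows an EC2 instance whose user data is rendered with templatefile, and an RDS PostgreSQL instance with multi_az enabled, matching the two-AZ setup described in the next paragraph.

```python
import json


def orthanc_stack(region: str, ami: str) -> dict:
    """Return a Terraform configuration in JSON syntax (save as main.tf.json)."""
    # Terraform evaluates ${...} interpolation inside JSON strings as well;
    # userdata.tpl and db_host are hypothetical names.
    user_data = (
        '${templatefile("userdata.tpl", '
        '{ db_host = aws_db_instance.orthanc.address })}'
    )
    return {
        "terraform": {"required_providers": {"aws": {"source": "hashicorp/aws"}}},
        "provider": {"aws": {"region": region}},
        "variable": {"db_password": {"type": "string"}},
        "resource": {
            "aws_db_instance": {
                "orthanc": {
                    "engine": "postgres",
                    "instance_class": "db.t3.micro",  # assumed size
                    "allocated_storage": 20,
                    "multi_az": True,                 # standby in a second AZ
                    "storage_encrypted": True,        # encryption at rest
                    "username": "orthanc",
                    "password": "${var.db_password}",
                    "skip_final_snapshot": True,      # demo only
                }
            },
            "aws_instance": {
                "orthanc": {
                    "ami": ami,
                    "instance_type": "t3.small",      # assumed size
                    "user_data": user_data,
                }
            },
        },
    }


if __name__ == "__main__":
    # "ami-xxxxxxxx" is a placeholder; redirect the output to main.tf.json
    print(json.dumps(orthanc_stack("us-east-1", "ami-xxxxxxxx"), indent=2))
```

Generating JSON like this is optional; the same resources can be written directly in HCL, which is what most Terraform projects do.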
By default, Orthanc keeps its index in SQLite and its DICOM files on the local filesystem, but it also has a plugin for PostgreSQL, an open-source relational database. AWS offers a managed PostgreSQL service through RDS. In this project, we create an RDS instance that spans two availability zones for a minimal level of high availability. The PostgreSQL plugin can also store the imaging data itself, pixels included, in the database, which removes the need for a dedicated file storage system.
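A sketch of an Orthanc configuration that moves both the index and the pixel storage into PostgreSQL might look like the following. The key names follow the Orthanc PostgreSQL plugin documentation as I understand it and should be checked against the plugin version you deploy; the host name, credentials and display name are placeholders, and the plugin itself still has to be loaded (for example through Orthanc's "Plugins" option or by using an image that bundles it).

```python
import json


def orthanc_config(db_host: str, db_password: str) -> dict:
    """Orthanc configuration with both index and storage in PostgreSQL."""
    return {
        "Name": "Orthanc on AWS",      # hypothetical display name
        "DicomAet": "ORTHANC",         # AE title that modalities will call
        "DicomPort": 4242,             # Orthanc's default DICOM port
        "HttpPort": 8042,              # Orthanc's default HTTP port
        "PostgreSQL": {
            "EnableIndex": True,       # patient/study/series/instance index
            "EnableStorage": True,     # DICOM files, pixel data included
            "Host": db_host,
            "Port": 5432,
            "Database": "orthanc",
            "Username": "orthanc",
            "Password": db_password,
        },
    }


if __name__ == "__main__":
    # Placeholder values; in practice the host is the RDS endpoint address
    print(json.dumps(orthanc_config("orthanc-db.example.internal", "change-me"), indent=2))
```

Setting EnableStorage to false would keep the index in PostgreSQL while leaving the files on disk, which is the trade-off the paragraph above refers to.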
We deploy the application in Docker containers for compatibility and portability. The Orthanc server is shipped as Docker images, available from the Docker Hub registry. The Docker environment is configured as part of EC2 instance bootstrapping, which installs packages with YUM and initializes and customizes environment variables. The docker-compose file and the auxiliary configuration files are provided in the repo; the bootstrapping script installs git and pulls the required files from this GitHub repo.
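The bootstrapping steps can be sketched as a user-data script. Here it is rendered with Python's string.Template so the substitution can be previewed locally; Terraform's templatefile uses ${...} placeholders instead of $name, so this is an illustration rather than a drop-in template. The /opt/orthanc path, the DB_HOST variable and the repository URL are hypothetical, and the script assumes docker-compose is already available on the AMI.

```python
import shlex
from string import Template

# Hypothetical user-data template; $repo_url and $db_host are filled in below.
USER_DATA = Template("""#!/bin/bash
set -euo pipefail
yum install -y docker git
systemctl enable docker
systemctl start docker
git clone $repo_url /opt/orthanc
cd /opt/orthanc
# docker-compose reads variables from .env in the project directory
printf 'DB_HOST=%s\\n' $db_host > .env
# Assumes docker-compose is already installed on the AMI
docker-compose up -d
""")


def render_user_data(repo_url: str, db_host: str) -> str:
    """Substitute values, shell-quoting them so they cannot break the script."""
    return USER_DATA.substitute(
        repo_url=shlex.quote(repo_url),
        db_host=shlex.quote(db_host),
    )


print(render_user_data("https://github.com/<user>/<repo>.git",
                       "orthanc-db.example.internal"))
```

Template.substitute raises KeyError if a placeholder is left unfilled, which catches a missing value before the script ever reaches an instance.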
This demo project does not include load balancing, DNS management, or container orchestration.
Orthanc's web server natively supports HTTPS. However, at the time of writing, its DICOM port did not support TLS natively, as its developers made clear in the FAQ. This leaves a severe security vulnerability, because all patient data (protected health information in the HIPAA context) would be sent across the Internet in cleartext, visible to every network interface along the route. To address this issue, we brought in Nginx as a reverse proxy working at the TCP layer to terminate encrypted traffic for Orthanc's DICOM endpoint. This works because the DICOM upper layer runs on top of TCP.
In the Nginx documentation, this use case is referred to as SSL Termination for TCP Upstream Servers. Note that Nginx provides layer 4 capability in this use case, so the certificate and key configuration belongs in the stream section of the configuration file, not the http section. This layer 4 capability in fact makes it possible to secure any protocol that runs on top of TCP, so it applies to a broad range of situations. It is also noteworthy that Nginx can re-encrypt the traffic on its way to the upstream server, for even tighter security, as outlined in this use case.
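To show where the TLS directives go, here is a small Python function that renders such a stream block. The directives (listen ... ssl, proxy_pass, ssl_certificate, ssl_certificate_key, ssl_protocols, proxy_ssl) are standard Nginx stream directives and require an Nginx build with the stream modules; the upstream name orthanc:4242 assumes a docker-compose service called "orthanc" on Orthanc's default DICOM port, and the public port 11112 and certificate paths are placeholders. The optional proxy_ssl line re-encrypts toward the upstream, which only makes sense if the upstream itself accepts TLS.

```python
def nginx_stream_block(listen_port: int, upstream: str, cert: str, key: str,
                       reencrypt: bool = False) -> str:
    """Render an Nginx `stream` block that terminates TLS for a TCP upstream.

    The ssl_* directives live in the stream context (layer 4), not in http.
    """
    lines = [
        "stream {",
        "    upstream orthanc_dicom {",
        f"        server {upstream};",
        "    }",
        "    server {",
        f"        listen {listen_port} ssl;",
        "        proxy_pass orthanc_dicom;",
        f"        ssl_certificate     {cert};",
        f"        ssl_certificate_key {key};",
        "        ssl_protocols       TLSv1.2 TLSv1.3;",
    ]
    if reencrypt:
        # Re-encrypt traffic on the way to the upstream (upstream must speak TLS)
        lines.append("        proxy_ssl on;")
    lines += ["    }", "}"]
    return "\n".join(lines)


# Hypothetical values: compose service "orthanc", default DICOM port 4242,
# arbitrary public port 11112, placeholder certificate paths.
print(nginx_stream_block(11112, "orthanc:4242",
                         "/etc/nginx/certs/server.crt",
                         "/etc/nginx/certs/server.key"))
```

The rendered block sits at the top level of nginx.conf, next to (not inside) any http block.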
It is also helpful to use Nginx to terminate HTTPS traffic, using a certificate and key pair. When testing with a self-signed certificate, I realized that Chrome has specific requirements for self-signed certificates, or it won't load the page. So the certificate has to be created as instructed here.
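The Chrome requirement is most likely the subjectAltName one: Chrome matches the host name against the certificate's Subject Alternative Name and ignores the Common Name. A sketch that builds a suitable openssl command is below; the file names and validity period are assumptions, and -addext needs OpenSSL 1.1.1 or later (older OpenSSL and some LibreSSL builds need a config file instead). The certificate still has to be trusted by the operating system or browser before Chrome accepts it.

```python
import shlex


def self_signed_cmd(hostname: str, key_path: str = "server.key",
                    cert_path: str = "server.crt", days: int = 365) -> list:
    """Build an openssl command for a self-signed cert with a SAN entry."""
    return [
        "openssl", "req", "-x509",
        "-newkey", "rsa:2048", "-nodes",  # new RSA key, not passphrase-protected
        "-sha256", "-days", str(days),
        "-keyout", key_path, "-out", cert_path,
        "-subj", f"/CN={hostname}",
        # Chrome checks the host name here, not in the CN
        "-addext", f"subjectAltName=DNS:{hostname}",
    ]


# Print the command for review; pass the list to subprocess.run to execute it.
print(shlex.join(self_signed_cmd("localhost")))
```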
For better security, the RDS instance should be provisioned in a private subnet, with its data encrypted both in transit and at rest. The Docker service should also manage sensitive information, such as database passwords, as secrets.
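Both recommendations can be sketched from the application side. Docker mounts each secret as a file under /run/secrets, so a password can be read from there rather than from an environment variable, and a libpq connection URI with sslmode=verify-full requires TLS and verifies the server certificate against a CA bundle. The secret name, host name and CA path in the usage comment are hypothetical.

```python
from pathlib import Path
from urllib.parse import quote, urlencode


def read_secret(name: str, secrets_dir: str = "/run/secrets") -> str:
    """Read a Docker secret, which Docker mounts as a file under /run/secrets."""
    return (Path(secrets_dir) / name).read_text().strip()


def postgres_uri(host: str, user: str, password: str, dbname: str,
                 ca_file: str, port: int = 5432) -> str:
    """libpq connection URI that requires TLS and verifies the server cert."""
    query = urlencode({"sslmode": "verify-full", "sslrootcert": ca_file})
    # Percent-encode credentials so characters like '@' or '/' stay intact
    return (f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}:{port}/{quote(dbname, safe='')}?{query}")


# Usage inside a container (hypothetical secret name, host and CA path):
#   uri = postgres_uri("orthanc-db.example.internal", "orthanc",
#                      read_secret("db_password"), "orthanc",
#                      "/etc/ssl/rds-ca.pem")
```

Encryption at rest, by contrast, is a property of the RDS instance itself and is switched on when it is provisioned, not in the client.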
The deliverable is stored in this GitHub repo. The Docker part of it can run on a MacBook with PostgreSQL. The entire infrastructure, expressed as Terraform code, can be applied against AWS to create the required resources. Check out the README for further instructions. To emulate a modality, you will need a DICOM application entity that supports TLS. Horos is a great project on macOS for this purpose, acting as both a DICOM-compliant sender and a viewer. Alternatively, consider a command-line DICOM toolkit such as DCMTK or Grassroots DICOM (GDCM).
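For a scripted check of the TLS leg without a full DICOM toolkit, the client side can be sketched with Python's standard ssl module: trust the self-signed certificate, verify the host name, require TLS 1.2 or newer, and then hand the socket to whatever speaks DICOM. This sketch only establishes the encrypted channel; the host name, port and certificate file in the usage comment are placeholders.

```python
import socket
import ssl
from typing import Optional


def client_context(cafile: Optional[str] = None) -> ssl.SSLContext:
    """TLS client context; pass the self-signed cert as cafile to trust it."""
    ctx = ssl.create_default_context(cafile=cafile)  # verifies cert and host name
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def open_dicom_tls(host: str, port: int,
                   cafile: Optional[str] = None) -> ssl.SSLSocket:
    """Open a TLS connection to the Nginx-terminated DICOM port."""
    ctx = client_context(cafile)
    raw = socket.create_connection((host, port), timeout=10)
    try:
        return ctx.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()  # do not leak the TCP socket if the handshake fails
        raise


# Usage (placeholder host, port and certificate):
#   with open_dicom_tls("orthanc.example.org", 11112, "server.crt") as s:
#       s.sendall(association_request_bytes)
```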