DICOM data encoding

DICOM is a standard for medical imaging exchange. It originated in radiology and later expanded into other departments that acquire large volumes of imaging data, such as cardiology. One part of the DICOM standard defines how to lay out the data, but the standard provides no official code implementation. It is up to each vendor to implement their application and to declare, in their own conformance statement, which parts of the DICOM standard they comply with. It is therefore common for vendors to disagree about whether each other’s implementation is compliant.

Many arguments on this topic revolve around the encoding of DICOM data (or, in strict terms, Information Object Definition, IOD). In this article I try to clarify DICOM IOD encoding and share best practices from a vendor support perspective. A more comprehensive treatment of this topic is chapter 5 of Oleg Pianykh’s book, which discusses the basics such as implicit vs explicit VR and big vs little endian.

The very purpose of a standard is to define a common protocol through which two applications communicate with each other. A strong standard leaves no room for ambiguity in implementation; unfortunately, many healthcare IT standards are weak standards. When two devices from two vendors fail to communicate properly, the healthcare provider (the device buyer) should take the lead in moderating, because they suffer the most from proprietary implementations and benefit the most from good interoperability. In reality, however, healthcare organizations whose information technology teams lack technical competency usually leave it in the vendors’ hands to configure the integration, with minimal supervision of standard conformance. This allows vendors to put in technologies that are just made to work, but are not fully up to standard. This is not optimal. Remember: Proprietary technology = Vendor Lock-In

From the vendor’s perspective, the implementation should simply comply with the DICOM standard. Vendors should not accommodate third party applications that are incorrectly implemented. The vendor’s responsibility to the customer is simply to prove that the data encoding is compliant with DICOM or, if it turns out not to be, to escalate to engineering with low-level technical detail. Investigating and advising on the integrity of externally sourced DICOM data is a courtesy at the discretion of the vendor’s support operation. In reality, however, vendors are pressured to just make it work.

DICOM Objects

In DICOM encoding, an IOD, whether sent in a C-Store or stored in a part 10 file, consists of hundreds of data elements. A data element (uniquely identified by a tag) can be either:

  • A single item;
  • A sequence (SQ) of multiple items.

Transfer Syntax

The table below summarizes the metadata and pixel data encoding under different transfer syntax UIDs, as specified in (0002,0010). It is not meant to be a complete list.

| Transfer syntax UID | Transfer syntax name | Metadata encoding | Pixel data encoding |
|---|---|---|---|
| 1.2.840.10008.1.2 | Implicit VR Little Endian: Default Transfer Syntax for DICOM | Implicit VR Little Endian | Implicit VR Little Endian |
| 1.2.840.10008.1.2.1 | Explicit VR Little Endian | Explicit VR Little Endian | Explicit VR Little Endian |
| 1.2.840.10008. | Lossless, Nonhierarchical, First-Order Prediction (Process 14 [Selection Value 1]): Default Transfer Syntax for Lossless JPEG Image Compression | Explicit VR Little Endian | JPEG Lossless Compression |
| 1.2.840.10008. | Lossless Image Compression | Explicit VR Little Endian | JPEG-LS Lossless Compression |

In practice, Big Endian encoding is rarely used in DICOM, and neither is .99, so they are not covered. Little Endian means that the least significant byte of each multi-byte numeric value is stored first. The rest of this article only discusses metadata encoding.
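To make the byte ordering concrete, here is a minimal sketch using Python's standard `struct` module. It packs the tag (0010,0010) and a value length of 0x000A, both taken from the example later in this article, in each byte order. The variable names are illustrative only.

```python
import struct

# Tag (0010,0010): group and element are each a 16-bit unsigned integer.
group, element = 0x0010, 0x0010
value_length = 0x000A

# "<" selects little endian, ">" selects big endian; "H" is a 2-byte unsigned integer.
little = struct.pack("<HHH", group, element, value_length)
big = struct.pack(">HHH", group, element, value_length)

print(little.hex(" "))  # 10 00 10 00 0a 00
print(big.hex(" "))     # 00 10 00 10 00 0a

# A 4-byte length such as 0F00H ("I" is a 4-byte unsigned integer).
length32_le = struct.pack("<I", 0x00000F00)
print(length32_le.hex(" "))  # 00 0f 00 00
```

Only numeric fields are affected by byte order; character strings such as the VR code or a patient name are stored in reading order either way.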

Encoding of Data Element of single item

Most DICOM parsers on the market have no problem with data elements that hold a single item. However, it is important to understand the encoding of a single item before trying to understand sequences. Regardless of implicit or explicit VR, big or little endian, a single item is always encoded in the following order:

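| Field | Length | Data format | Example |
|---|---|---|---|
| Tag: Group number | 2-byte | unsigned integer | 0010 (first half of 0010,0010) |
| Tag: Element number | 2-byte | unsigned integer | 0010 (second half of 0010,0010) |
| VR (present only for explicit VR) | 2-byte | 2 ASCII characters | PN |
| Length of Value | 2-byte | an even integer | 0x000A |
| Value | determined by length of value | determined by VR | Smith^Joe  |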

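Note that in the example the length of value is 10 in decimal, and the value “Smith^Joe ” contains a trailing space to pad it to 10 characters. DICOM requires the value length to be an even number of bytes; some implementations omit the padding, and others tolerate the omission. Also note that the 2-byte length field shown applies to Explicit VR with short VRs such as PN; Implicit VR, and Explicit VR with VRs such as OB, OW, SQ and UN, use a 4-byte length field. The corresponding guideline is Section 7.1 of DICOM PS3.5. Please make sure you understand Figure 7.1-1 and Tables 7.1-1 and 7.1-2 in that section before reading on.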
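The layout above can be sketched in a few lines of Python. This is a minimal illustration, assuming Explicit VR Little Endian and a VR that uses a 2-byte length field (PN here). The helper names `encode_short_element` and `decode_short_element` are hypothetical and do not come from any DICOM library.

```python
import struct

def encode_short_element(group, element, vr, value):
    """Encode one data element in Explicit VR Little Endian.

    Illustrative helper: it only covers VRs with a 2-byte length field
    (for example PN, LO, CS), not OB/OW/SQ/UN and similar.
    """
    if len(value) % 2:   # DICOM requires an even value length
        value += b" "    # text VRs such as PN are padded with a trailing space
    # Tag group, tag element, 2-character VR, 2-byte length: 8 bytes in total.
    header = struct.pack("<HH2sH", group, element, vr.encode("ascii"), len(value))
    return header + value

def decode_short_element(data):
    """Decode the element produced by encode_short_element."""
    group, element, vr, length = struct.unpack_from("<HH2sH", data, 0)
    value = data[8:8 + length]
    return (group, element), vr.decode("ascii"), length, value

encoded = encode_short_element(0x0010, 0x0010, "PN", b"Smith^Joe")
print(encoded.hex(" "))
print(decode_short_element(encoded))
```

The 9-character name is padded to 10 bytes, so the length field reads 0x000A, matching the example in the table.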

Encoding of data element with SQ type

When it comes to SQ (sequence), there is much confusion about which options are valid for sequence encoding. There is also a good chance that a third party DICOM interpreter is incompletely implemented and mistakenly reports a correctly encoded sequence as bad data. Symptoms include, but are not limited to, aborting the association with an A-ABORT, silently dropping the TCP connection, and complaining in the logs that the data is “corrupted”.

The valid options for sequence encoding are defined in Section 7.5 of DICOM PS3.5. The language there is fairly abstract, so I am adding some explanation to elucidate it:

  1. When determining the sequence encoding, DICOM needs to address two problems:
    1. Define how to start and end a data item;
    2. Define how to start and end the entire sequence;
  2. You can explicitly specify the length of a data item, or leave it undefined; similarly, for the entire sequence, you can explicitly specify the length upfront, or leave it undefined. This leads to four possible combinations, but one of them is invalid. The following table points to an example of each, based on the tables in the DICOM document:

| | Sequence length is explicit | Sequence length is undefined |
|---|---|---|
| Data item length is explicit | Valid Format A | Valid Format B |
| Data item length is undefined | NOT a valid encoding option | Valid Format C |

Valid Format A
Exemplified in Table 7.5-1. When the sequence length is explicit, the length of each individual data item must be explicit as well.
– Sequence length is 0F00H, 3840 bytes
– Data item length is 04F8H, 1272 bytes
– No delimiter (FFFE,E0DD or FFFE,E00D) is needed for the sequence or its data items
– Sum of unit lengths equals the total length: (1272 + 4 + 4) x 3 = 3840, where each item adds 4 bytes for its FFFE,E000 tag and 4 bytes for its length field

Even though this particular example is implicit VR, the parser should know this is a sequence by the length calculation:
a. the length of the data element (sequence) is 0F00H

Valid Format B
Exemplified in Table 7.5-2, as well as the first item in Table 7.5-3.
– Sequence length is undefined, marked by FFFFFFFFH as the length value of the data element
– Data item length is explicitly defined, as follows:
  – 98A5 and B321 for the two items in Table 7.5-2
  – 17B6 for the first item in Table 7.5-3
– FFFE,E000 marks the start of a data item

Sequence length is explicit, data item length is undefined
This is NOT a valid encoding option. It would be error prone if the total length were explicitly defined but the unit length were not.

Valid Format C
Exemplified in the second item in Table 7.5-3.
– Sequence length is undefined, marked by FFFFFFFFH as the length value of the data element
– Data item length is also undefined
– FFFE,E00D followed by 00000000H marks the end of the data item
– FFFE,E0DD followed by 00000000H marks the end of the sequence
  3. As shown above, there are multiple valid options (A, B and C) to encode the data items in a sequence and the sequence itself. If the sequence length is undefined, data items of explicit and undefined length can even co-exist within the same sequence (Table 7.5-3).
  4. If a length is left undefined at the beginning, you must clearly mark the end of the data item or sequence using one of the special data elements.
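The three valid formats, and the length arithmetic of Format A, can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it assumes Implicit VR Little Endian (so every header is a tag plus a 4-byte length), the function names are hypothetical, the sequence tag (0008,1140) is an arbitrary choice, and the item contents are placeholder bytes rather than real data sets. Rejecting the fourth combination follows the table in this article.

```python
import struct

UNDEFINED = 0xFFFFFFFF
ITEM = (0xFFFE, 0xE000)        # start of a data item
ITEM_DELIM = (0xFFFE, 0xE00D)  # end of an undefined-length item
SEQ_DELIM = (0xFFFE, 0xE0DD)   # end of an undefined-length sequence

def header(tag, length):
    """Tag (group, element) followed by a 4-byte length, little endian."""
    return struct.pack("<HHI", tag[0], tag[1], length)

def encode_item(content, explicit):
    if explicit:
        return header(ITEM, len(content)) + content
    return header(ITEM, UNDEFINED) + content + header(ITEM_DELIM, 0)

def encode_sequence(tag, items, explicit):
    """items is a list of (content_bytes, item_length_is_explicit) pairs."""
    if explicit and not all(item_explicit for _, item_explicit in items):
        # Treated as invalid here, following the table in this article.
        raise ValueError("explicit sequence length needs explicit item lengths")
    body = b"".join(encode_item(c, e) for c, e in items)
    if explicit:
        return header(tag, len(body)) + body
    return header(tag, UNDEFINED) + body + header(SEQ_DELIM, 0)

SQ_TAG = (0x0008, 0x1140)  # arbitrary sequence tag chosen for the example
content = bytes(0x04F8)    # 1272 placeholder bytes per item

# Format A: explicit sequence length, three explicit-length items.
format_a = encode_sequence(SQ_TAG, [(content, True)] * 3, explicit=True)
seq_length = struct.unpack_from("<I", format_a, 4)[0]
print(hex(seq_length), seq_length)  # 0xf00 3840

# Format B: undefined sequence length, explicit item lengths.
format_b = encode_sequence(SQ_TAG, [(content, True)] * 2, explicit=False)
print(format_b[-8:].hex(" "))  # fe ff dd e0 00 00 00 00

# Mixed items as in Table 7.5-3: first item explicit, second undefined (Format C).
mixed = encode_sequence(SQ_TAG, [(content, True), (content, False)], explicit=False)
```

The printed sequence length reproduces the (1272 + 4 + 4) x 3 = 3840 calculation, and the last 8 bytes of the undefined-length sequence are the Sequence Delimitation element with a zero length.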

Special data elements used in SQ encoding:

FFFE,E000 (Data Item) – marks the start of each data item inside an SQ element; it shall be followed by a 4-byte field that indicates the length of the data item (either an explicit value, or FFFFFFFFH to indicate undefined length)

FFFE,E00D (Item Delimitation) – marks the end of a data item, only if the length of that data item is undefined; it shall immediately follow the data item, and its own length field shall be set to 00000000H

FFFE,E0DD (Sequence Delimitation) – marks the end of an entire sequence, only if the length of that sequence is undefined; it shall follow the last item of the SQ element, and its own length field shall be set to 00000000H
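On the reading side, a parser has to honor all three special elements. The sketch below walks the value of a sequence and splits it into items, whether lengths are explicit or undefined. It is a simplified illustration under stated assumptions: the nested elements are Implicit VR Little Endian, there is no error recovery, and the function names (`read_items`, `skip_to_item_delimiter`) are hypothetical.

```python
import struct

UNDEFINED = 0xFFFFFFFF
ITEM, ITEM_DELIM, SEQ_DELIM = (0xFFFE, 0xE000), (0xFFFE, 0xE00D), (0xFFFE, 0xE0DD)

def read_header(buf, pos):
    """Read a tag and 4-byte length; return (tag, length, offset after header)."""
    group, element, length = struct.unpack_from("<HHI", buf, pos)
    return (group, element), length, pos + 8

def skip_to_item_delimiter(buf, pos):
    """Walk the elements of an undefined-length item; return the delimiter's offset."""
    while True:
        tag, length, next_pos = read_header(buf, pos)
        if tag == ITEM_DELIM:
            return pos
        if length == UNDEFINED:  # nested sequence of undefined length
            _, pos = read_items(buf, next_pos, UNDEFINED)
        else:
            pos = next_pos + length

def read_items(buf, pos, seq_length):
    """Return (items, next_pos) for a sequence value that starts at pos."""
    end = None if seq_length == UNDEFINED else pos + seq_length
    items = []
    while end is None or pos < end:
        tag, length, pos = read_header(buf, pos)
        if tag == SEQ_DELIM:     # only present when the sequence length is undefined
            break
        if tag != ITEM:
            raise ValueError(f"expected item tag, got {tag}")
        if length != UNDEFINED:  # explicit item length: just slice
            items.append(buf[pos:pos + length])
            pos += length
        else:                    # undefined item length: find FFFE,E00D
            start = pos
            pos = skip_to_item_delimiter(buf, pos)
            items.append(buf[start:pos])
            pos += 8             # step over the item delimiter itself
    return items, pos

def element(tag, value):
    return struct.pack("<HHI", *tag, len(value)) + value

# One explicit-length item and one undefined-length item in the same sequence.
name = element((0x0010, 0x0010), b"Smith^Joe ")
value = (struct.pack("<HHI", *ITEM, len(name)) + name
         + struct.pack("<HHI", *ITEM, UNDEFINED) + name
         + struct.pack("<HHI", *ITEM_DELIM, 0)
         + struct.pack("<HHI", *SEQ_DELIM, 0))
items, end = read_items(value, 0, UNDEFINED)
print(len(items), items[0] == items[1] == name, end == len(value))
```

Note that the undefined-length item is parsed element by element rather than by scanning for the delimiter bytes, because the same byte pattern could legitimately appear inside a value.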