dcm2xml(1)                        OFFIS DCMTK                       dcm2xml(1)

NAME
       dcm2xml - Convert DICOM file and data set to XML


SYNOPSIS
       dcm2xml [options] dcmfile-in [xmlfile-out]

DESCRIPTION
       The  dcm2xml utility converts the contents of a DICOM file (file format
       or raw data set) to XML (Extensible Markup  Language).  There  are  two
       output  formats.  The  first  one  is  specific  to  DCMTK with its DTD
       (Document Type Definition)  described  in  the  file  dcm2xml.dtd.  The
       second  one  refers  to the 'Native DICOM Model' which is specified for
       the DICOM Application Hosting service found in DICOM part 19.

       If dcm2xml reads a raw data set (DICOM data without a file format
       meta-header), it attempts to guess the transfer syntax by examining the
       first few bytes of the file. This guess is not always correct, so it is
       better to convert a data set to a file format whenever possible (using
       the dcmconv utility). Alternatively, the -f and -t[ieb] options force
       dcm2xml to read a data set with a particular transfer syntax.
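
       The distinction between a file format and a raw data set can be
       sketched in Python. This is an illustration only, not dcm2xml's own
       detection code: it relies on the DICOM part 10 layout (a 128-byte
       preamble followed by the prefix 'DICM'). The function names are
       hypothetical, and forcing implicit VR little endian (-ti) for raw data
       sets is an assumption made for this example.

```python
def has_file_meta_header(data):
    """Return True if data starts with a 128-byte preamble plus b'DICM'."""
    return len(data) >= 132 and data[128:132] == b"DICM"


def input_options(data):
    """Choose dcm2xml input options from the leading bytes of a file.

    Assumption for this sketch: a raw data set is read as implicit VR
    little endian; use "-te" or "-tb" instead where that is known to apply.
    """
    if has_file_meta_header(data):
        return ["+f"]       # file format, transfer syntax from meta-header
    return ["-f", "-ti"]    # raw data set with a forced transfer syntax
```

       Reading the first 132 bytes of a file is enough for this check.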

PARAMETERS
       dcmfile-in   DICOM input filename to be converted ("-" for stdin)

       xmlfile-out  XML output filename (default: stdout)

OPTIONS
   general options
         -h    --help
                 print this help text and exit

               --version
                 print version information and exit

               --arguments
                 print expanded command line arguments

         -q    --quiet
                 quiet mode, print no warnings and errors

         -v    --verbose
                 verbose mode, print processing details

         -d    --debug
                 debug mode, print debug information

         -ll   --log-level  [l]evel: string constant
                 (fatal, error, warn, info, debug, trace)
                 use level l for the logger

         -lc   --log-config  [f]ilename: string
                 use config file f for the logger

   input options
       input file format:

         +f    --read-file
                 read file format or data set (default)

         +fo   --read-file-only
                 read file format only

         -f    --read-dataset
                 read data set without file meta information

       input transfer syntax:

         -t=   --read-xfer-auto
                 use TS recognition (default)

         -td   --read-xfer-detect
                 ignore TS specified in the file meta header

         -te   --read-xfer-little
                 read with explicit VR little endian TS

         -tb   --read-xfer-big
                 read with explicit VR big endian TS

         -ti   --read-xfer-implicit
                 read with implicit VR little endian TS

       long tag values:

         +M    --load-all
                 load very long tag values (e.g. pixel data)

         -M    --load-short
                 do not load very long values (default)

         +R    --max-read-length  [k]bytes: integer (4..4194302, default: 4)
                 set threshold for long values to k kbytes

   processing options
       specific character set:

         +Cr   --charset-require
                 require declaration of extended charset (default)

         +Ca   --charset-assume  [c]harset: string
                 assume charset c if no extended charset declared

         +Cc   --charset-check-all
                 check all data elements with string values
                 (default: only PN, LO, LT, SH, ST, UC and UT)

                 # this option is only used for the extended check whether
                 # the Specific Character Set (0008,0005) attribute should be
                 # present, but not for the conversion of unaffected element
                 # values to UTF-8 (e.g. element values with a VR of CS)

         +U8   --convert-to-utf8
                 convert all element values that are affected
                 by Specific Character Set (0008,0005) to UTF-8

                 # requires support from an underlying character encoding
                 # library (see output of --version on which one is available)

   output options
       general XML format:

         -dtk  --dcmtk-format
                 output in DCMTK-specific format (default)

         -nat  --native-format
                 output in Native DICOM Model format (part 19)

         +Xn   --use-xml-namespace
                 add XML namespace declaration to root element

       DCMTK-specific format (not with --native-format):

         +Xd   --add-dtd-reference
                 add reference to document type definition (DTD)

         +Xe   --embed-dtd-content
                 embed document type definition into XML document

         +Xf   --use-dtd-file  [f]ilename: string
                 use specified DTD file (only with +Xe)
                 (default: /usr/local/share/dcmtk-<VERSION>/dcm2xml.dtd)

         +Wn   --write-element-name
                 write name of the DICOM data elements (default)

         -Wn   --no-element-name
                 do not write name of the DICOM data elements

         +Wb   --write-binary-data
                 write binary data of OB and OW elements
                 (default: off, be careful with --load-all)

       encoding of binary data:

         +Eh   --encode-hex
                 encode binary data as hex numbers
                 (default for DCMTK-specific format)

         +Eu   --encode-uuid
                 encode binary data as a UUID reference
                 (default for Native DICOM Model)

         +Eb   --encode-base64
                 encode binary data as Base64 (RFC 2045, MIME)
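
       As a minimal sketch, the options above could be assembled from Python
       as follows. The helper name and its parameters are hypothetical; only
       the option names are taken from this manual page. Long values that are
       not loaded stay unloaded unless --load-all is added by the caller.

```python
def build_dcm2xml_command(infile, outfile=None, native=False, base64=False):
    """Build an argument list for dcm2xml (hypothetical helper)."""
    cmd = ["dcm2xml"]
    if native:
        cmd.append("--native-format")
        if base64:
            cmd.append("--encode-base64")
    else:
        cmd.append("--dcmtk-format")
        if base64:
            # --write-binary-data is listed for the DCMTK-specific format
            # only, so it is not combined with --native-format here.
            cmd += ["--write-binary-data", "--encode-base64"]
    cmd.append(infile)
    if outfile is not None:
        cmd.append(outfile)   # without it, dcm2xml writes to stdout
    return cmd
```

       The resulting list can be passed to subprocess.run() where dcm2xml is
       installed.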

DCMTK Format
       The  basic  structure  of  the DCMTK-specific XML output created from a
       DICOM file looks like the following:

       <?xml version="1.0" encoding="ISO-8859-1"?>
       <!DOCTYPE file-format SYSTEM "dcm2xml.dtd">
       <file-format xmlns="http://dicom.offis.de/dcmtk">
         <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
           <element tag="0002,0000" vr="UL" vm="1" len="4"
                    name="MetaElementGroupLength">
             166
           </element>
           ...
           <element tag="0002,0013" vr="SH" vm="1" len="16"
                    name="ImplementationVersionName">
             OFFIS_DCMTK_353
           </element>
         </meta-header>
         <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
           <element tag="0008,0005" vr="CS" vm="1" len="10"
                    name="SpecificCharacterSet">
             ISO_IR 100
           </element>
           ...
           <sequence tag="0028,3010" vr="SQ" card="2" name="VOILUTSequence">
             <item card="3">
               <element tag="0028,3002" vr="xs" vm="3" len="6"
                        name="LUTDescriptor">
               256\0\8
               </element>
               ...
             </item>
             ...
           </sequence>
           ...
           <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
                    name="PixelData" loaded="no" binary="hidden">
           </element>
         </data-set>
       </file-format>

       The 'file-format' and 'meta-header' tags  are  absent  for  DICOM  data
       sets.
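
       A consumer of this structure could be sketched in Python with the
       standard xml.etree.ElementTree module. The sample document is a
       hand-written abbreviation of the example above, and the function names
       are hypothetical. The sketch compares local tag names so that it works
       with and without the namespace declaration, and it accepts a bare
       'data-set' root as produced for DICOM data sets.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<file-format xmlns="http://dicom.offis.de/dcmtk">
  <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit">
    <element tag="0002,0013" vr="SH" vm="1" len="16"
             name="ImplementationVersionName">OFFIS_DCMTK_353</element>
  </meta-header>
  <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
    <element tag="0008,0005" vr="CS" vm="1" len="10"
             name="SpecificCharacterSet">ISO_IR 100</element>
  </data-set>
</file-format>
"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def top_level_elements(xml_bytes):
    """Map 'gggg,eeee' tags to text for the top-level data set elements."""
    root = ET.fromstring(xml_bytes)
    if local_name(root.tag) == "data-set":
        data_set = root            # raw data set: no file-format wrapper
    else:
        data_set = next(c for c in root if local_name(c.tag) == "data-set")
    return {
        child.get("tag"): (child.text or "").strip()
        for child in data_set
        if local_name(child.tag) == "element"
    }
```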

   XML Encoding
       Attributes with very large value fields (e.g. pixel data) are not
       loaded by default. They can be identified by the additional attribute
       'loaded' with a value of 'no' (see example above). The command line
       option --load-all forces all value fields to be loaded, including the
       very long ones.

       Furthermore, binary data of OB and OW attributes is not written to the
       XML output file by default. These elements can be identified by the
       additional attribute 'binary' with a value of 'hidden' (default is
       'no'). The command line option --write-binary-data causes binary value
       fields to be printed as well (attribute value is 'yes' or 'base64'). Be
       careful when combining this option with --load-all, because large
       amounts of pixel data might then be written to the output. Please note
       that in this context element values with a VR of OD, OF, OL and OV are
       not regarded as 'binary data'.
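
       The two marker attributes can be evaluated as in the following Python
       sketch. The sample data set is hand-written for illustration and the
       function name is hypothetical; only the attribute names and values
       ('loaded', 'binary', 'no', 'hidden') come from the text above.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit">
  <element tag="0010,0010" vr="PN" vm="1" len="8"
           name="PatientName">Doe^John</element>
  <element tag="0042,0011" vr="OB" vm="1" len="1024"
           name="EncapsulatedDocument" binary="hidden"></element>
  <element tag="7fe0,0010" vr="OW" vm="1" len="262144"
           name="PixelData" loaded="no" binary="hidden"></element>
</data-set>"""


def missing_values(xml_bytes):
    """List (tag, reason) for elements whose value is absent from the XML."""
    result = []
    for el in ET.fromstring(xml_bytes).iter():
        if el.tag.rsplit("}", 1)[-1] != "element":
            continue
        if el.get("loaded") == "no":
            result.append((el.get("tag"), "not loaded"))
        elif el.get("binary") == "hidden":
            result.append((el.get("tag"), "binary hidden"))
    return result
```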

       Multiple values (i.e. where the DICOM  value  multiplicity  is  greater
       than  1)  are  separated  by a backslash '\' (except for Base64 encoded
       data). The 'len' attribute  indicates  the  number  of  bytes  for  the
       particular  value  field as stored in the DICOM data set, i.e. it might
       deviate from  the  XML  encoded  value  length  e.g.  because  of  non-
       significant padding that has been removed. If this attribute is missing
       in 'sequence' or 'item' start tags, the corresponding DICOM element has
       been stored with undefined length.
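
       These rules might be applied as in the following Python sketch. Both
       function names are hypothetical. The sketch splits on the backslash
       only when the 'vm' attribute is greater than 1, on the assumption that
       a backslash in a single-valued text element belongs to the value.

```python
import xml.etree.ElementTree as ET


def element_values(el):
    """Return the individual values of a DCMTK-format 'element' node."""
    text = (el.text or "").strip()
    if not text:
        return []
    # Base64 data is not backslash-separated; single values are kept whole.
    if el.get("binary") == "base64" or int(el.get("vm", "1")) <= 1:
        return [text]
    return text.split("\\")


def has_undefined_length(el):
    """True for a 'sequence' or 'item' node without a 'len' attribute."""
    name = el.tag.rsplit("}", 1)[-1]
    return name in ("sequence", "item") and el.get("len") is None
```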

Native DICOM Model Format
       The  description  of  the Native DICOM Model format can be found in the
       DICOM standard, part 19 ('Application Hosting').

   Bulk Data
       Binary data, i.e. DICOM element values with Value Representations (VR)
       of OB or OW, as well as OD, OF, OL, OV and UN values are by default not
       written to the XML output because of their size. Instead, for each
       element, a new Universally Unique Identifier (UUID) is generated and
       written as an attribute of a <BulkData> XML element. dcm2xml cannot yet
       write additional files that hold the binary data referenced by these
       UUIDs. The standard does not require this, but it might be useful for
       implementing an Application Hosting interface, so this feature may be
       available in future versions of dcm2xml.

       In addition, Supplement 163 (Store Over  the  Web  by  Representational
       State  Transfer  Services)  introduces a new <InlineBinary> XML element
       that allows for encoding binary data as Base64. Currently, the  command
       line  option  --encode-base64  enables  this encoding for the following
       VRs: OB, OD, OF, OL, OV, OW and UN.
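
       A reader for both representations could look like the following Python
       sketch. The sample is hand-written, not captured dcm2xml output, and
       the element and attribute names other than <BulkData> and
       <InlineBinary> (NativeDicomModel, DicomAttribute, Value, 'uuid') are
       assumptions based on the Native DICOM Model in part 19. The function
       name is hypothetical.

```python
import base64
import xml.etree.ElementTree as ET

SAMPLE = b"""<NativeDicomModel>
  <DicomAttribute tag="00080005" vr="CS" keyword="SpecificCharacterSet">
    <Value number="1">ISO_IR 100</Value>
  </DicomAttribute>
  <DicomAttribute tag="00420011" vr="OB" keyword="EncapsulatedDocument">
    <InlineBinary>SGVsbG8=</InlineBinary>
  </DicomAttribute>
  <DicomAttribute tag="7FE00010" vr="OW" keyword="PixelData">
    <BulkData uuid="6ba7b810-9dad-11d1-80b4-00c04fd430c8"/>
  </DicomAttribute>
</NativeDicomModel>"""


def binary_payloads(xml_bytes):
    """Return (inline, bulk): decoded Base64 bytes and UUID references."""
    inline, bulk = {}, {}
    for attr in ET.fromstring(xml_bytes).iter():
        if attr.tag.rsplit("}", 1)[-1] != "DicomAttribute":
            continue
        for child in attr:
            name = child.tag.rsplit("}", 1)[-1]
            if name == "InlineBinary":
                inline[attr.get("tag")] = base64.b64decode(child.text or "")
            elif name == "BulkData":
                # Only the reference is available; the bytes are not in the XML.
                bulk[attr.get("tag")] = child.get("uuid")
    return inline, bulk
```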

   Known Issues
       In addition to what is written in the above  section  on  'Bulk  Data',
       there  are  further known issues with the current implementation of the
       Native DICOM Model format. For example, large element values with a  VR
       other  than OB, OD, OF, OL, OV, OW or UN are currently never written as
       bulk data, although it  might  be  useful,  e.g.  for  very  long  text
       elements (especially UT) or very long numeric fields (of various VRs).

NOTES
   Character Encoding
       The  XML  character encoding is determined automatically from the DICOM
       attribute (0008,0005) 'Specific  Character  Set'  using  the  following
       mapping:

       ASCII         "ISO_IR 6"    =>  "UTF-8"
       UTF-8         "ISO_IR 192"  =>  "UTF-8"
       ISO Latin 1   "ISO_IR 100"  =>  "ISO-8859-1"
       ISO Latin 2   "ISO_IR 101"  =>  "ISO-8859-2"
       ISO Latin 3   "ISO_IR 109"  =>  "ISO-8859-3"
       ISO Latin 4   "ISO_IR 110"  =>  "ISO-8859-4"
       ISO Latin 5   "ISO_IR 148"  =>  "ISO-8859-9"
       ISO Latin 9   "ISO_IR 203"  =>  "ISO-8859-15"
       Cyrillic      "ISO_IR 144"  =>  "ISO-8859-5"
       Arabic        "ISO_IR 127"  =>  "ISO-8859-6"
       Greek         "ISO_IR 126"  =>  "ISO-8859-7"
       Hebrew        "ISO_IR 138"  =>  "ISO-8859-8"
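
       For scripts that post-process the output, the table can be expressed
       as a Python dictionary. The names XML_ENCODING and xml_encoding_for
       are hypothetical; the entries are copied from the table above.

```python
XML_ENCODING = {
    "ISO_IR 6": "UTF-8",
    "ISO_IR 192": "UTF-8",
    "ISO_IR 100": "ISO-8859-1",
    "ISO_IR 101": "ISO-8859-2",
    "ISO_IR 109": "ISO-8859-3",
    "ISO_IR 110": "ISO-8859-4",
    "ISO_IR 148": "ISO-8859-9",
    "ISO_IR 203": "ISO-8859-15",
    "ISO_IR 144": "ISO-8859-5",
    "ISO_IR 127": "ISO-8859-6",
    "ISO_IR 126": "ISO-8859-7",
    "ISO_IR 138": "ISO-8859-8",
}


def xml_encoding_for(specific_character_set):
    """Return the XML encoding for a defined term, or None if unmapped."""
    return XML_ENCODING.get(specific_character_set.strip())
```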

       If this DICOM attribute is missing from the input file although it is
       needed, option --charset-assume can be used to specify an appropriate
       character set manually (using one of the DICOM defined terms). For
       reasons of backward compatibility with previous versions of this tool,
       the following terms are also supported and mapped automatically to the
       associated DICOM defined terms: latin-1, latin-2, latin-3, latin-4,
       latin-5, latin-9, cyrillic, arabic, greek, hebrew.
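
       The alias handling might be mirrored in a wrapper script as sketched
       below. The pairing of each legacy term with a DICOM defined term is
       inferred from the character set names used in this manual page, not
       quoted from the DCMTK sources, and the names LEGACY_TERMS and
       defined_term are hypothetical.

```python
LEGACY_TERMS = {
    "latin-1": "ISO_IR 100",
    "latin-2": "ISO_IR 101",
    "latin-3": "ISO_IR 109",
    "latin-4": "ISO_IR 110",
    "latin-5": "ISO_IR 148",
    "latin-9": "ISO_IR 203",
    "cyrillic": "ISO_IR 144",
    "arabic": "ISO_IR 127",
    "greek": "ISO_IR 126",
    "hebrew": "ISO_IR 138",
}


def defined_term(charset):
    """Translate a legacy term; pass DICOM defined terms through unchanged."""
    return LEGACY_TERMS.get(charset, charset)
```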

       Multiple  character  sets  using  code  extension  techniques  are  not
       supported.  If  needed, option --convert-to-utf8 can be used to convert
       the DICOM file or data set to UTF-8 encoding prior to the conversion to
       XML format. This is also useful for DICOMDIR files where each directory
       record can have a different character set.

       If no mapping is defined and option --convert-to-utf8 is not used, non-
       ASCII characters and those below #32 are stored as '&#nnn;' where 'nnn'
       refers to the numeric  character  code.  This  might  lead  to  invalid
       character  entity  references  (such as '&#27;' for ESC) and will cause
       most XML parsers to reject the document.
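
       The effect can be reproduced with Python's standard XML parser. The
       following sketch (hypothetical function names) checks whether a
       document is well-formed and lists decimal character references that
       fall outside the character range permitted by XML 1.0.

```python
import re
import xml.etree.ElementTree as ET

_CHAR_REF = re.compile(r"&#(\d+);")


def is_well_formed(xml_text):
    """Return True if the standard library parser accepts the document."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def invalid_char_refs(xml_text):
    """Return decimal character references that XML 1.0 does not allow."""
    def allowed(code):
        return (code in (0x9, 0xA, 0xD)
                or 0x20 <= code <= 0xD7FF
                or 0xE000 <= code <= 0xFFFD
                or 0x10000 <= code <= 0x10FFFF)
    return [m.group(0) for m in _CHAR_REF.finditer(xml_text)
            if not allowed(int(m.group(1)))]
```

       Running invalid_char_refs() on the output before parsing it gives an
       early hint that --convert-to-utf8 or --charset-assume may be needed.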

LOGGING
       The level of logging output of the various command line tools and
       underlying libraries can be specified by the user. By default, only
       errors and warnings are written to the standard error stream. With
       option --verbose, informational messages such as processing details are
       also reported. Option --debug can be used to get more details on the
       internal activity, e.g. for debugging purposes. Other logging levels
       can be selected using option --log-level. In --quiet mode only fatal
       errors are reported. In such very severe error events, the application
       will usually terminate. For more details on the different logging
       levels, see documentation of module 'oflog'.

       If the logging output should be written to a file (optionally with
       logfile rotation), to syslog (Unix) or to the event log (Windows),
       option --log-config can be used. This configuration file also allows
       for directing only certain messages to a particular output stream and
       for filtering certain messages based on the module or application where
       they are generated. An example configuration file is provided in
       <etcdir>/logger.cfg.

COMMAND LINE
       All  command  line  tools  use  the  following notation for parameters:
       square brackets enclose optional  values  (0-1),  three  trailing  dots
       indicate  that multiple values are allowed (1-n), a combination of both
       means 0 to n values.

       Command line options are distinguished from parameters by a leading '+'
       or '-' sign. Usually, order and position of command line options are
       arbitrary (i.e. they can appear anywhere). However, if options are
       mutually exclusive, the rightmost appearance is used. This behavior
       conforms to the standard evaluation rules of common Unix shells.
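
       The 'rightmost appearance wins' rule can be illustrated with a short
       Python sketch, using the input transfer syntax options of dcm2xml as
       the mutually exclusive group. This models the documented behavior
       only; it is not the DCMTK command line parser, and the names used are
       hypothetical.

```python
XFER_OPTIONS = {
    "-t=": "auto", "--read-xfer-auto": "auto",
    "-td": "detect", "--read-xfer-detect": "detect",
    "-te": "little", "--read-xfer-little": "little",
    "-tb": "big", "--read-xfer-big": "big",
    "-ti": "implicit", "--read-xfer-implicit": "implicit",
}


def effective_option(args, group=XFER_OPTIONS, default="auto"):
    """Return the choice made by the rightmost option of the group."""
    choice = default
    for arg in args:
        # Arguments outside the group leave the current choice untouched.
        choice = group.get(arg, choice)
    return choice
```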

       In addition, one or more command files can be specified using an '@'
       sign as a prefix to the filename (e.g. @command.txt). Such a command
       argument is replaced by the content of the corresponding text file
       (multiple whitespace characters are treated as a single separator
       unless they appear between two quotation marks) prior to any further
       evaluation. Please note that a command file cannot contain another
       command file. This simple but effective approach allows one to
       summarize common combinations of options/parameters and avoids long and
       confusing command lines (an example is provided in file
       <datadir>/dumppat.txt).
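
       The expansion of command files can be approximated in Python as
       sketched below. shlex.split() is used as a stand-in for the quoting
       rule described above; the exact tokenization performed by DCMTK may
       differ. The function names and the read_text parameter are
       hypothetical.

```python
import shlex


def _read_file(name):
    with open(name, encoding="utf-8") as handle:
        return handle.read()


def expand_command_files(args, read_text=_read_file):
    """Replace each '@file' argument by the tokens stored in that file."""
    expanded = []
    for arg in args:
        if arg.startswith("@") and len(arg) > 1:
            # Tokens from the file are taken as they are: a command file
            # cannot contain another command file.
            expanded.extend(shlex.split(read_text(arg[1:])))
        else:
            expanded.append(arg)
    return expanded
```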

ENVIRONMENT
       The dcm2xml utility  will  attempt  to  load  DICOM  data  dictionaries
       specified  in the DCMDICTPATH environment variable. By default, i.e. if
       the  DCMDICTPATH  environment   variable   is   not   set,   the   file
       <datadir>/dicom.dic  will be loaded unless the dictionary is built into
       the application (default for Windows).

       The  default  behavior  should  be  preferred   and   the   DCMDICTPATH
       environment  variable  only used when alternative data dictionaries are
       required. The DCMDICTPATH environment variable has the same  format  as
       the  Unix  shell PATH variable in that a colon (':') separates entries.
       On Windows systems, a semicolon (';') is used as a separator. The  data
       dictionary  code  will  attempt  to  load  each  file  specified in the
       DCMDICTPATH environment variable. It is an error if no data  dictionary
       can be loaded.
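
       When dcm2xml is started from a Python script, the variable can be
       composed with os.pathsep, which is ':' on Unix and ';' on Windows and
       therefore matches the separators described above. The function name
       and the dictionary file names in the usage note are hypothetical.

```python
import os


def dcmdictpath_env(dictionaries, base_env=None):
    """Return an environment mapping with DCMDICTPATH set."""
    env = dict(os.environ if base_env is None else base_env)
    env["DCMDICTPATH"] = os.pathsep.join(dictionaries)
    return env
```

       The result can be passed as the env argument of subprocess.run(), for
       example dcmdictpath_env(["dicom.dic", "private.dic"]).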

       Depending  on  the  command line options specified, the dcm2xml utility
       will attempt to load character set mapping tables.  This  happens  when
       DCMTK  was compiled with the oficonv library (which is the default) and
       the mapping tables are not built into the library (default  when  DCMTK
       uses shared libraries).

       The  mapping  table  files  are  expected  in  DCMTK's  <datadir>.  The
       DCMICONVPATH environment variable can be used to  specify  a  different
       location.  If  a  different location is specified, those mapping tables
       also replace any built-in tables.

FILES
       <datadir>/dcm2xml.dtd - Document Type Definition (DTD) file

SEE ALSO
       xml2dcm(1), dcmconv(1)

COPYRIGHT
       Copyright (C) 2002-2025 by OFFIS e.V., Escherweg  2,  26121  Oldenburg,
       Germany.

Version 3.7.0                   Mon Dec 15 2025                     dcm2xml(1)
