This tutorial explains how to escape special characters in XML. An XML document is made up of the following components:
tags, text, attributes, CDATA sections, and comments.
Learn which characters must be escaped in each of these components.
Why are XML Escape Characters required?
Let’s go through some examples of why escaping is required in XML.
- An XML tag whose content contains special characters
<result>
example <Text> with "special" characters content
</result>
In the example above, the content of the `result` tag contains the characters `<`, `>`, and `"`.
A processor parsing this content assumes that `<Text>` is a start tag, expects a matching end tag, and fails with a parse error when it does not find one. The `"` characters are harmless in text content, but they cause the same kind of failure inside attribute values.
These characters therefore need to be escaped.
- XML attributes that contain quotation marks
For example, an attribute value may contain both single and double quotes.
<user message='sales' and "hr" department'>
Some text content.
</user>
Here the `message` attribute has the value `sales' and "hr" department`, which contains both single and double quotes.
An attribute value is enclosed in either single or double quotes, and everything between the quotes is treated as the value. Because the value itself contains the delimiting quote, the processor cannot tell where the value ends.
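Both failures are easy to reproduce. The following is a minimal sketch using the `xml.etree.ElementTree` parser from Python's standard library; the helper name `is_well_formed` is made up for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Unescaped < and > in text content: <Text> is read as an unclosed start tag.
text_example = '<result>example <Text> with "special" characters content</result>'

# A single quote inside a single-quoted attribute value ends the value early.
attr_example = """<user message='sales' and "hr" department'>Some text content.</user>"""

print(is_well_formed(text_example))  # False
print(is_well_formed(attr_example))  # False
```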
To avoid these processing errors, some characters must be escaped.
Which characters need to be escaped in XML documents?
The following characters need to be escaped.
Name | Character | Escape sequence |
---|---|---|
Double quotation | " | &quot; |
Single quotation | ' | &apos; |
Less than | < | &lt; |
Greater than | > | &gt; |
Ampersand | & | &amp; |
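The table can be applied programmatically. Below is a small sketch that uses `escape` and `unescape` from Python's standard `xml.sax.saxutils` module; the names `QUOTE_ENTITIES`, `escape_all`, and `unescape_all` are hypothetical helpers introduced only for this example.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; the quote mappings are passed in explicitly.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_all(text):
    """Replace all five special characters with their predefined entities."""
    return escape(text, QUOTE_ENTITIES)


def unescape_all(text):
    """Reverse escape_all()."""
    return unescape(text, {"&quot;": '"', "&apos;": "'"})


print(escape_all("""<a href="x">Tom & Jerry's</a>"""))
# &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&apos;s&lt;/a&gt;
```

Escaping every character everywhere is always safe, although, as the following sections show, it is often more than a given context requires.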
Whether a character must be escaped depends on where it is used: text content, attribute values, CDATA sections, or comments.
XML Escape Characters Examples
Let’s see some examples where escaping is required and where it is not.
Escape Characters in XML Text Content
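- The character `<` must be escaped as `&lt;`; otherwise the parser assumes it is the start of a tag such as `<users/>`.
- Always replace `&` with `&amp;`, except where the `&` is the first character of an entity reference such as `&entity;`.
- The remaining characters (`"`, `'`, `>`) do not have to be escaped in text content; escaping them is optional. The one exception is the sequence `]]>`, which must be written as `]]&gt;`.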
<result>
example &lt;Text&gt; with &quot;special&quot; characters content
</result>
<errors>
<!-- Valid text content: escaping these characters is optional -->
"'>
</errors>
In the example above, the `<`, `>`, and `"` characters are replaced with the escape sequences `&lt;`, `&gt;`, and `&quot;`.
The `"'>` characters in the `errors` tag do not need to be escaped.
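As an illustration of these rules in practice, here is a short sketch using Python's standard `xml.etree.ElementTree`. It serializes an element whose text contains the special characters; this particular serializer escapes `&`, `<`, and `>` in text and leaves the optional quotes alone.

```python
import xml.etree.ElementTree as ET

result = ET.Element("result")
result.text = 'example <Text> with "special" characters content'

# The serializer escapes &, < and > in text and leaves the quotes untouched.
serialized = ET.tostring(result, encoding="unicode")
print(serialized)
# <result>example &lt;Text&gt; with "special" characters content</result>

# Parsing turns the entities back into the original characters.
assert ET.fromstring(serialized).text == result.text
```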
Escape Characters in XML Attributes
In the following cases, escaping is not required.
- If an attribute value is enclosed in double quotes, single quotes inside the value are valid.
<user message="'"/> <!-- Valid and escape not required for single quote -->
- If an attribute value is enclosed in single quotes, double quotes inside the value are valid.
<user message='"'/> <!-- Valid and escape not required for double quote -->
- Escaping is not required for the character `>`.
<user message='>'/> <!-- Valid and escape not required for greater than -->
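- In the remaining cases, that is, when the value contains the same quote character that encloses it, the quote must be escaped as `&apos;` or `&quot;`. The characters `<` and `&` must always be escaped in attribute values.
In the example below, the value is enclosed in double quotes, so the double quotes inside it are replaced with `&quot;` and the single quote is not escaped.
<user message="sales' and &quot;hr&quot; department">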
text example
</user>
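Choosing the delimiter and escaping only the matching quote can be automated. A small sketch using `quoteattr` from Python's standard `xml.sax.saxutils` module, which returns the value already wrapped in quotes. Note that this function also escapes `>`, which is allowed but optional.

```python
from xml.sax.saxutils import quoteattr

# Only single quotes in the value: double-quote delimiters, nothing to escape.
print(quoteattr("it's"))
# "it's"

# Only double quotes in the value: single-quote delimiters, nothing to escape.
print(quoteattr('say "hi"'))
# 'say "hi"'

# Both kinds of quotes: double-quote delimiters, inner double quotes escaped.
print(quoteattr("""sales' and "hr" department"""))
# "sales' and &quot;hr&quot; department"

# &, < and > are escaped regardless of the delimiter.
print(quoteattr("a > b & c < d"))
# "a &gt; b &amp; c &lt; d"
```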
CDATA Content Escape Characters
Characters inside a CDATA section do not need to be escaped. The only sequence that cannot appear there is `]]>`, which ends the section.
<?xml version="1.0"?>
<user>
<![CDATA[ Characters not required to escape"'<>&]]>
</user>
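To check that a parser hands back CDATA content unchanged, here is a brief sketch with Python's standard `xml.etree.ElementTree`. The document is the example above, written on two lines so that the element text contains no extra whitespace.

```python
import xml.etree.ElementTree as ET

doc = """<?xml version="1.0"?>
<user><![CDATA[ Characters not required to escape"'<>&]]></user>"""

user = ET.fromstring(doc)

# The CDATA markers are removed and the content is returned as plain text.
print(repr(user.text))
```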
Comments
None of these five characters needs to be escaped inside comments. The only restriction is that `--` may not appear within a comment.
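A quick way to confirm this is to parse a document whose comment contains all five characters. This sketch again assumes Python's standard `xml.etree.ElementTree`, which skips comments by default.

```python
import xml.etree.ElementTree as ET

# The comment holds all five special characters, unescaped.
doc = """<user>Some text content.<!-- " ' < > & --></user>"""

# Parsing succeeds; the comment is not part of the element text.
user = ET.fromstring(doc)
print(user.text)
# Some text content.
```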