Is there any difference in the behavior of URLs with & versus &amp; in the query parameters?
To help you understand the difference: the ampersand (&) is a special character in HTML that signifies the start of a character reference. When you see &amp; in a URL written in HTML, it is the character reference for the ampersand itself (&).
On the other hand, &current; is not a standard character reference and is an error in HTML. Browsers may attempt to recover from this error, but you should not rely on that behavior.
If you were to use a valid character reference, such as &trade; for the trademark symbol (™), that symbol would appear in the URL instead of the string you intended.
It’s worth noting that in HTML, a semicolon (;) is typically required to end a character reference, although HTML 4 allows it to be omitted in certain cases. However, some browsers, like Internet Explorer, may have issues with this omission.
Another common error occurs when including a URL that contains an ampersand (“&”) in HTML:
This is invalid:
<a href="foo.cgi?chapter=1&section=2&copy=3&lang=en">
Explanation:
This example generates an error for “unknown entity section” because the “&” is assumed to begin an entity reference. Although browsers often recover safely from this kind of error, real problems can occur in some cases.
For instance, many browsers may incorrectly convert &copy=3 to ©=3, which can cause the link to fail. Similarly, since &lang; is the HTML entity for the left-pointing angle bracket (“〈”), some browsers might convert &lang=en to 〈=en. Additionally, one old browser might even find the entity &sect;, converting &section=2 to §ion=2.
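The failure mode described above can be reproduced with a minimal Python sketch. It assumes the standard library function html.unescape, which decodes character references using the HTML5 rules for text content; it is only a stand-in for a lenient browser, not a model of how any particular browser treats attribute values. The fragments are taken from the invalid link above.

```python
import html

# Fragments of the query string from the invalid href above.
fragments = ["&copy=3", "&section=2", "&lang=en"]

# html.unescape follows the HTML5 rules for text content: a few legacy
# names (such as copy and sect) are recognized even without a semicolon.
decoded = {fragment: html.unescape(fragment) for fragment in fragments}

for fragment, result in decoded.items():
    print(f"{fragment!r} -> {result!r}")
```

Under these rules &copy=3 and &section=2 are both mangled, while &lang=en is left alone because lang is only recognized with its semicolon. Older browsers were not necessarily that strict.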
To avoid these problems, both with validators and with browsers, you should replace each ampersand with &amp; when writing a URL in your HTML markup.
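If the markup is generated by a program, the replacement can be done mechanically. The following is a minimal sketch using Python's standard html.escape; the helper name href_attribute is hypothetical and exists only for this illustration.

```python
import html


def href_attribute(url: str) -> str:
    """Escape a URL for use inside a double-quoted HTML attribute.

    Hypothetical helper for illustration; it wraps html.escape, which
    turns & into &amp; (and also escapes <, > and quotes).
    """
    return html.escape(url, quote=True)


raw_url = "foo.cgi?chapter=1&section=2&copy=3&lang=en"
markup = f'<a href="{href_attribute(raw_url)}">'
print(markup)
```

The URL itself is unchanged; only its representation inside the HTML source differs.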
Note that replacing “&” with &amp; is only necessary when writing the URL in HTML, where “&” is a special character (along with “<” and “>”). When writing the same URL in a plain text email message or in the location bar of your browser, you should use “&” and not &amp;. HTML interprets &amp; as “&”, so the web server only sees “&” and not &amp; in the query string of the request.
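A short Python sketch of that round trip, under the assumption that html.unescape approximates what the HTML parser does to the attribute value before the request is made. The variable names are illustrative only.

```python
import html
from urllib.parse import parse_qs, urlsplit

# The href value as it is written in the HTML source.
href_in_markup = "foo.cgi?chapter=1&amp;section=2&amp;copy=3&amp;lang=en"

# Decoding the character references yields the URL that is actually requested.
requested_url = html.unescape(href_in_markup)

# The server splits the query string on plain ampersands.
params = parse_qs(urlsplit(requested_url).query)

print(requested_url)
print(params)
```

The decoded URL contains only plain ampersands, so the query string splits into the four intended parameters.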
Here’s a fascinating example: when &current is parsed into a text node, it is converted to ¤t, because &curren is recognized as the currency sign (¤) even without a semicolon. However, when parsed into an attribute value, it is left as &current.
If you want &current to appear as text in your document, you should write &amp;current in your markup.
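The text-node half of this behavior can be checked with Python's html.unescape, which follows the HTML5 rules for text content. This is a sketch only: it does not model the separate rule for attribute values.

```python
import html

# Unescaped: the legacy reference &curren is matched without a semicolon,
# and the trailing "t" is left behind.
text_node = html.unescape("&current")

# Escaped: &amp; decodes to a literal ampersand, and the result is not
# scanned again for further references.
escaped_text_node = html.unescape("&amp;current")

print(text_node)
print(escaped_text_node)
```

The first call yields ¤t and the second yields the literal string &current.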
For a detailed explanation, you can refer to the HTML5 parsing specification under the “Named Character Reference State” section.