Discussion:
About UTF-8, XHTML and Character Encoding
AmirBehzad Eslami
2003-10-28 18:08:11 UTC
E-Greetings everyone,

I'm developing a web site using XHTML in Farsi (Persian, 'fa'). The page is encoded in UTF-8 and declares this using the following syntax in XHTML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fa-IR">

The web page contains non-US-ASCII characters, such as Farsi and Arabic characters.
My question is:

Should I use "Character References" when writing the content of an XHTML (UTF-8) web page?
Or is it valid to use "Literal UTF-8" characters? (I mean, is it unnecessary to define each character using a Numeric Character Reference?)


Thanks in advance,
Behzad
Addison Phillips [wM]
2003-10-28 18:39:14 UTC
Hi Behzad,

You can use UTF-8 literal characters (UTF-8 byte sequences) in your web
pages as long as:

1. the page is declared to be UTF-8 (which you've done)
2. the page actually is encoded using UTF-8. Generally you must explicitly
save the file as UTF-8, because the default for many text editors is some
legacy, non-Unicode encoding. Declaring the file to be UTF-8 doesn't make it
so, and many people struggle with their pages as a result.
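
One way to check the second condition is to see whether the saved bytes decode cleanly as UTF-8. The following is only a minimal sketch in Python; the helper name `is_valid_utf8` and the sample word are illustrative assumptions, not part of any tool mentioned in this thread, and `cp1256` merely stands in for "some legacy encoding" an editor might use by default.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


sample = "سلام"  # a short Farsi word, used only as test data

# Saved as real UTF-8: the declaration and the bytes agree.
print(is_valid_utf8(sample.encode("utf-8")))   # True

# Saved in a legacy Arabic code page: declaring UTF-8 would be wrong.
print(is_valid_utf8(sample.encode("cp1256")))  # False
```

In practice the same function could be applied to the contents of a page read in binary mode, e.g. `open(path, "rb").read()`. A clean decode does not prove the text is what you intended, but a failure reliably shows the file was not saved as UTF-8.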

In fact, the use of character references is a way to get various Unicode
characters into a non-Unicode encoded page. One of the nice things about
using a Unicode encoding is that you can enter and work with the text in the
page in a normal manner, using real characters and not worry so much about
it.
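
To illustrate how character references carry Unicode characters through a non-Unicode encoding, here is a small Python sketch. It assumes a page that must be stored as ISO-8859-1; the sample string is made up for the example. Python's standard `xmlcharrefreplace` error handler replaces anything the target encoding cannot represent with a decimal numeric character reference.

```python
import html

text = "café سلام"  # Latin text plus a Farsi word (sample data)

# Characters ISO-8859-1 supports stay as single bytes; the rest become NCRs.
legacy_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
print(legacy_bytes)  # b'caf\xe9 &#1587;&#1604;&#1575;&#1605;'

# A parser that resolves the references recovers the original text.
recovered = html.unescape(legacy_bytes.decode("iso-8859-1"))
print(recovered == text)  # True
```

With a UTF-8 page none of this is needed, since every character can be written directly.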

Hope that helps.

Best Regards,

Addison
Addison P. Phillips
Director, Globalization Architecture
webMethods | Delivering Global Business Visibility
http://www.webMethods.com
Chair, W3C Internationalization (I18N) Working Group
Chair, W3C-I18N-WG, Web Services Task Force
http://www.w3.org/International

Internationalization is an architecture.
It is not a feature.

Richard Ishida
2003-10-28 19:31:35 UTC
Behzad,

Please use normal characters. It makes it much easier to maintain the
source code in editors that display the characters, and will reduce file
size considerably.

NCRs are mainly for use when the encoding doesn't support the character
you need (not the case here) or the author is unable to type in the
actual character (again, I'm assuming that's not the case).
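
The effect on file size can be estimated with a short Python sketch. The helper `to_ncrs` is a hypothetical function written for this comparison, and the sample word is arbitrary; actual savings depend on how much of a page is non-ASCII text.

```python
def to_ncrs(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal NCR (illustrative helper)."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)


literal = "سلام"        # four Farsi letters as normal characters
as_ncrs = to_ncrs(literal)

print(as_ncrs)                        # &#x633;&#x644;&#x627;&#x645;
print(len(literal.encode("utf-8")))   # 8
print(len(as_ncrs.encode("utf-8")))   # 28
```

Each of these letters takes 2 bytes in UTF-8 but 7 bytes as a hexadecimal NCR, and the NCR form is unreadable in an editor.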

Hope that helps,
RI


============
Richard Ishida
W3C

contact info: http://www.w3.org/People/Ishida/

http://www.w3.org/International/
http://www.w3.org/International/geo/

See the W3C Internationalization FAQ page
http://www.w3.org/International/questions.html

