[reportlab-users] utf-8 characters

Discussion:

David Bourillot

2004-04-29 09:09:55 UTC

Hello,

I use reportlab to generate documents, and my problem is that some special
characters are not displayed correctly.
I use strings encoded in UTF-8.
How can I get these characters to display correctly?

Thanks in advance,
David

Amit Mongia

2004-04-29 09:57:55 UTC

Hi,
Create a TTF font object and render the text with it. Go
through the example for rina.ttf that comes with the
user guide.
You can use the popular Windows font Times New Roman
instead, or some other TTF font of your choice.
This works through font embedding.
Regards,
Amit Mongia

David Bourillot

2004-04-30 09:17:29 UTC

Hello,

Thanks for your help. I use the Times New Roman font and it works fine for
most of the documents.
But I have a problem with one that contains this string: "UNIVERSITà DI
NAPOLI"
The string is encoded in UTF-8, and when I generate the PDF I get this
error:

  File "c:/MaKaC/indico/code/code\MaKaC\webinterface\rh\base.py", line 204, in process
    res = self._process()
  File "c:/MaKaC/indico/code/code\MaKaC\webinterface\rh\abstractModif.py", line 88, in _process
    data = pdf.getPDFBin()
  File "c:/MaKaC/indico/code/code\MaKaC\PDFinterface\base.py", line 137, in getPDFBin
    self._doc.build(self._story, onFirstPage=self.firstPage, onLaterPages=self.laterPages)
  File "C:\Python23\lib\site-packages\reportlab\platypus\doctemplate.py", line 801, in build
    BaseDocTemplate.build(self,flowables)
  File "C:\Python23\lib\site-packages\reportlab\platypus\doctemplate.py", line 631, in build
    self.handle_flowable(flowables)
  File "C:\Python23\lib\site-packages\reportlab\platypus\doctemplate.py", line 549, in handle_flowable
    if self.frame.add(f, self.canv, trySplit=self.allowSplitting):
  File "C:\Python23\lib\site-packages\reportlab\platypus\frames.py", line 120, in _add
    w, h = flowable.wrap(self._getAvailableWidth(), h)
  File "C:\Python23\lib\site-packages\reportlab\platypus\paragraph.py", line 421, in wrap
    self.blPara = self.breakLines([first_line_width, later_widths])
  File "C:\Python23\lib\site-packages\reportlab\platypus\paragraph.py", line 564, in breakLines
    for w in _getFragWords(frags):
  File "C:\Python23\lib\site-packages\reportlab\platypus\paragraph.py", line 199, in _getFragWords
    n = n + stringWidth(w, f.fontName, f.fontSize)
  File "C:\Python23\lib\site-packages\reportlab\pdfbase\pdfmetrics.py", line 632, in _slowStringWidth
    return font.stringWidth(text, fontSize)
  File "C:\Python23\lib\site-packages\reportlab\pdfbase\ttfonts.py", line 987, in stringWidth
    for code in parse_utf8(text):
  File "C:\Python23\lib\site-packages\reportlab\pdfbase\ttfonts.py", line 82, in <lambda>
    parse_utf8=lambda x, decode=codecs.lookup('utf8')[1]: map(ord,decode(x)[0])

Exception type: exceptions.UnicodeDecodeError
Exception message: 'utf8' codec can't decode byte 0xc3 in position 9: unexpected end of data

After a little investigation, it seems to me that when the string is
split, it is cut between the two bytes of the encoded character 'à'.

Am I right? Is it a known bug?
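
For readers who want to check the byte arithmetic behind this hypothesis, here is a minimal sketch using only the Python 3 standard library (the thread itself concerns Python 2.3, so this is an illustration, not a reproduction of the ReportLab code path). It shows that the lead byte 0xC3 of 'à' sits at offset 9 of the encoded string, and that cutting the byte string right after it produces the same kind of decode failure as in the traceback. The variable names are illustrative only.

```python
# Illustration only: plain Python 3, no ReportLab involved.
text = "UNIVERSITà DI NAPOLI"
data = text.encode("utf-8")

# 'à' (U+00E0) is encoded as the two bytes 0xC3 0xA0.
lead_offset = data.index(b"\xc3")
print(lead_offset, data[lead_offset:lead_offset + 2].hex())

# Cut the byte string just after the lead byte, dropping 0xA0.
truncated = data[:lead_offset + 1]

error = None
try:
    truncated.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc
    print(exc.start, exc.reason)
```

The offset and the "unexpected end of data" reason match the reported exception, which is consistent with a word ending in a lone 0xC3 being passed to the decoder.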

Best regards,
David

Chris Withers

2004-05-03 07:34:01 UTC

Post by David Bourillot
Exception type: exceptions.UnicodeDecodeError
unexpected end of data
After a little investigation, it seems to me that when the string is
split, it is cut between the two bytes of the encoded character 'à'

That would seem unlikely, but maybe ask on the python list for confirmation.

Could it be that you have non-UTF-8 data in your UTF-8 string?
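
One way to test that possibility before the data reaches the PDF code is to try decoding it up front. The sketch below is written for Python 3 and uses only the standard library; `find_invalid_utf8` is a hypothetical helper name, not part of ReportLab or of the application in the traceback.

```python
def find_invalid_utf8(data: bytes):
    """Return (offset, reason) for the first invalid UTF-8 sequence, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start, exc.reason
    return None


# The same text encoded two ways: correctly, and as ISO-8859-1 by mistake.
good = "UNIVERSITà DI NAPOLI".encode("utf-8")
latin1 = "UNIVERSITà DI NAPOLI".encode("iso-8859-1")

print(find_invalid_utf8(good))
print(find_invalid_utf8(latin1))
```

If the input passes this check and the error still occurs later, the bytes are probably being damaged somewhere between the input and the decoder rather than being wrong from the start.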

Chris

--
Simplistix - Content Management, Zope & Python Consulting
- http://www.simplistix.co.uk

Marius Gedminas

2004-05-03 09:34:18 UTC

Post by Chris Withers
That would seem unlikely, but maybe ask on the python list for confirmation.

Could it be that you have non-UTF-8 data in your UTF-8 string?

I'm pretty sure the problem is in the line wrapping algorithm used by
Platypus.
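
As an illustration of how a byte-oriented word splitter could produce exactly this failure, consider the following sketch. It rests on an assumption that nobody in this thread confirms: that the splitter treats byte 0xA0 (the no-break space in ISO-8859-1, and also the second byte of UTF-8 'à') as whitespace. The regular expression is a hypothetical stand-in, not Platypus code, and the example is Python 3.

```python
import re

data = "UNIVERSITà DI NAPOLI".encode("utf-8")

# Hypothetical splitter: treats 0xA0 as whitespace, the way an
# ISO-8859-1 aware isspace() would. This is NOT ReportLab's code.
naive_words = [w for w in re.split(rb"[ \t\r\n\xa0]+", data) if w]
print(naive_words)

# bytes.split() with no argument splits on ASCII whitespace only,
# so both bytes of 'à' stay in the same word.
safe_words = data.split()
print(safe_words)
```

Under that assumption the first word ends in a lone 0xC3, which would explain an "unexpected end of data" error at position 9 when its width is measured.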

There have been plans to ditch Python 1.5.2 support and switch to
unicode objects instead of str objects with UTF-8 data everywhere.
When this is done, this problem will disappear, as there's no way to
split a unicode string incorrectly [1][2].

[1] AFAIU Python does not use UTF-16 surrogate pairs, right? If you
want to use characters outside the BMP, you're supposed to compile
your Python interpreter with 32-bit Unicode support.

[2] There are also combining characters that might pose problems with
line wrapping. And I'm not talking about BiDi or other exotic
things that Reportlab does not support yet.
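
A small sketch of both points, in Python 3 with the standard library only (so it illustrates the idea rather than the planned ReportLab change). Slicing decoded text at any position gives pieces that re-encode cleanly, while the combining-character caveat in [2] still applies: in decomposed form, a slice can leave the accent behind.

```python
import unicodedata

raw = "UNIVERSITà DI NAPOLI".encode("utf-8")

# Decode once at the boundary, then split text instead of bytes.
text = raw.decode("utf-8")
halves = (text[:10], text[10:])
encoded = [half.encode("utf-8") for half in halves]  # always valid UTF-8

# Footnote [2]: in NFD form 'à' becomes 'a' followed by U+0300,
# so slicing by code point can still separate base and accent.
decomposed = unicodedata.normalize("NFD", text)
base_only = decomposed[:10]

# Normalising to NFC before wrapping recombines the pair here.
composed = unicodedata.normalize("NFC", decomposed)
print(halves, base_only, composed == text)
```

Normalising to NFC helps only for characters that have a precomposed form; other combining sequences would need splitting by grapheme cluster, which this sketch does not attempt.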

Marius Gedminas
--
Stupidity management for the superuser is a user space issue in Unix
systems.
-- Alan Cox

Amit Mongia

2004-05-03 05:35:39 UTC

Sorry friend,
No idea about this.
Regards,
Amit Mongia

David Bourillot

2004-05-04 07:49:45 UTC

Thanks for your answers

Cheers,
David
