Anyone know of any good resource on this.....

I'm having fun with Perl and £ signs, and working through the Perl documentation initially (it seems quite good so far).

After much pondering I set the locale manually for the current user (export LANG=en_GB.UTF-8), and relevant things have changed.

    locale
    LANG=en_GB.UTF-8
    LC_CTYPE="en_GB.UTF-8"
    LC_NUMERIC="en_GB.UTF-8"
    LC_TIME="en_GB.UTF-8"
    LC_COLLATE="en_GB.UTF-8"
    LC_MONETARY="en_GB.UTF-8"
    LC_MESSAGES="en_GB.UTF-8"
    LC_PAPER="en_GB.UTF-8"
    LC_NAME="en_GB.UTF-8"
    LC_ADDRESS="en_GB.UTF-8"
    LC_TELEPHONE="en_GB.UTF-8"
    LC_MEASUREMENT="en_GB.UTF-8"
    LC_IDENTIFICATION="en_GB.UTF-8"
    LC_ALL=

    vi test.pl

    #!/usr/bin/perl
    use strict;
    use warnings;
    use utf8;
    binmode(STDOUT,":utf8");
    print "£\n";

In "vim" that looks like a £ sign ('cat' and 'less' want to use the hexagonal ? symbol).

    ./test.pl
    Malformed UTF-8 character (unexpected continuation byte 0xa3, with no preceding start byte) at ./test.pl line 8.

My first question is: what is going wrong here? Is this the wrong way to do a Unicode string literal (pressing shift + "3")? I'm not so concerned with the "right way" or a "working way", but I wanted to understand what is going wrong (perl/vim/my brain(likely)/Debian).

With Perl 5.8 (on Debian Sarge) I understood that "use utf8" should still be used to allow Unicode literals to be used.

It all started with a Java applet, and I'm just working my way through trying to establish consistent handling. I suspect there is more than one bug, in fact I know there is more than one bug, but I'll try fixing them one at a time.

Usually I muddle along in POSIX or Latin-1, and rarely encounter any issues with locale, but here we have web pages that have to be in UTF-8 and data coming from them in UTF-8 (we hope). So this locale is generated, but not used much on this box.
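A note on the "Malformed UTF-8 character" message: the byte 0xa3 it complains about is how ISO-8859-1 (Latin-1) encodes £, whereas UTF-8 encodes it as the two bytes 0xc2 0xa3. So one possibility, and it is only a guess, is that the editor saved test.pl as Latin-1 even though the locale is now UTF-8. The sketch below is one way to check what bytes are actually in a file. It assumes od and iconv are installed; the file names and the helpers hex_bytes and is_valid_utf8 are made up for illustration.

```shell
#!/bin/sh
# Sketch only: file names and helper functions are hypothetical.
dir=$(mktemp -d)

# Write the pound sign with octal escapes so that no editor or
# terminal encoding is involved.
printf '\243\n'     > "$dir/latin1.txt"   # 0xa3: Latin-1 pound sign
printf '\302\243\n' > "$dir/utf8.txt"     # 0xc2 0xa3: UTF-8 pound sign

# Print a file's bytes as one run of hex digits.
hex_bytes() {
    od -An -tx1 "$1" | tr -d ' \n'
}

# Succeed only if the file decodes cleanly as UTF-8.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

for f in latin1.txt utf8.txt; do
    if is_valid_utf8 "$dir/$f"; then
        echo "$f: $(hex_bytes "$dir/$f") valid UTF-8"
    else
        echo "$f: $(hex_bytes "$dir/$f") not valid UTF-8"
    fi
done

# Re-encode the Latin-1 file as UTF-8 and show the result.
iconv -f ISO-8859-1 -t UTF-8 "$dir/latin1.txt" > "$dir/fixed.txt"
echo "fixed.txt: $(hex_bytes "$dir/fixed.txt")"
```

Running the same hex dump on test.pl would show whether the £ is stored as a3 or as c2 a3. In vim, ":set fileencoding?" reports the encoding the buffer will be written in, and ":set fileencoding=utf-8" followed by ":w" changes it.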
We don't actually have Unicode literals in the Perl code, but it is going to have to handle them. I was writing a new test case for one of the Perl Template Toolkit filters, and I couldn't get it to run at all because it needed some UTF-8 character (so it could be taught not to mangle them as badly as it does currently; I suspect the real mangling is done in JavaScript).

--
The Mailing List for the Devon & Cornwall LUG
http://mailman.dclug.org.uk/listinfo/list
FAQ: http://www.dcglug.org.uk/linux_adm/list-faq.html