UTF8 security reminder
Lightning Talk
Working with Unicode
1. read, decode
2. ...do something with it...
3. encode, output
See also: perlunitut, perlunifaq
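A minimal sketch of the three steps, assuming the Encode module and a hypothetical input string (the UTF-8 bytes for "café"):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the UTF-8 bytes for "café" (c3 a9 encodes U+00E9).
my $bytes = "caf\xc3\xa9";

# 1. read, decode: bytes become characters.
my $text = decode("UTF-8", $bytes);

# 2. ...do something with it: string functions now work on characters.
my $chars = length $text;   # 4 characters (the input was 5 bytes)
my $upper = uc $text;

# 3. encode, output: characters become bytes again, as late as possible.
my $out = encode("UTF-8", $upper);
print $out, "\n";
```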
Decoding: manually
* decode("UTF-8", $input)
* decode("UTF8", $input)
* decode_utf8($input)
* utf8::decode($input)
* _utf8_on($input) ?
NO!
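A sketch of why the real decoders are preferable, assuming Encode's FB_CROAK check mode: decode() inspects the bytes and can refuse malformed input, while _utf8_on only flips the flag. The input is the 7-byte sequence used later in this talk.

```perl
use strict;
use warnings;
use Encode qw(decode);

# c9 starts a two-byte sequence, but 3b (";") is not a continuation byte.
my $input = "foo\xc9;id";

# FB_CROAK makes decode() die on malformed input.
# LEAVE_SRC keeps decode() from modifying $input.
# Without a check argument, decode() substitutes U+FFFD instead of dying.
my $text = eval {
    decode("UTF-8", $input, Encode::FB_CROAK() | Encode::LEAVE_SRC());
};

my $rejected = !defined $text;
print $rejected ? "rejected: malformed UTF-8\n" : "decoded OK\n";
```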
Decoding: via PerlIO
open $fh, "<:layer", ...;
binmode $fh, ":layer";
* :encoding(UTF-8)
* :encoding(UTF8)
* :utf8 ?
NO!
Not a shortcut!
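A sketch of the validating layer in use. An in-memory file stands in for a real one (an assumption, to keep the example self-contained). What exactly replaces the malformed byte depends on Perl's fallback settings, so the sketch only checks that the line no longer passes as a single word.

```perl
use strict;
use warnings;

# Malformed bytes: c9 followed by 3b (";").
my $bytes = "foo\xc9;id\n";

# :encoding(UTF-8) validates while decoding.
open my $fh, "<:encoding(UTF-8)", \$bytes or die $!;

# Perl warns about the malformed byte; silence that here for clean output.
my $word = do { local $SIG{__WARN__} = sub { }; readline $fh };
close $fh;
chomp $word if defined $word;

# The semicolon stays a semicolon, so the line is not one word.
my $matched = (defined $word && $word =~ /^\w+$/) ? 1 : 0;
print "matched: $matched\n";
```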
Setting the UTF8 flag yourself is BAD
The UTF8 flag does not exist.
Except when you're actually dealing with Perl internals.
Perl does Unicode by default
(Except for a bug. See Unicode::Semantics.)
* \w matches a huge number of characters.
* Same for \d, [[:alpha:]], and so on.
* Write [A-Za-z0-9_] explicitly if that's what you mean.
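A small sketch of the difference, using two example characters chosen for illustration: U+027B (a letter) and U+0663 (an Arabic-Indic digit).

```perl
use strict;
use warnings;

my $turned_r = "\x{27B}";   # LATIN SMALL LETTER TURNED R WITH HOOK
my $arabic_3 = "\x{663}";   # ARABIC-INDIC DIGIT THREE

# Unicode-aware classes match both characters.
my $w_unicode = $turned_r =~ /^\w$/ ? 1 : 0;
my $d_unicode = $arabic_3 =~ /^\d$/ ? 1 : 0;

# Explicit ASCII classes match neither.
my $w_ascii = $turned_r =~ /^[A-Za-z0-9_]$/ ? 1 : 0;
my $d_ascii = $arabic_3 =~ /^[0-9]$/        ? 1 : 0;

printf "\\w: %d  \\d: %d  explicit: %d %d\n",
    $w_unicode, $d_unicode, $w_ascii, $d_ascii;
# prints: \w: 1  \d: 1  explicit: 0 0
```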
#!/usr/bin/perl -T
use strict;
%ENV = (PATH => '/usr/bin');
# The :utf8 layer does not validate its input.
open my $fh, "< :utf8", "test.bin" or die $!;
my $word = readline $fh;
# A successful capture untaints the data.
my ($untainted) = $word =~ /^(\w+)$/;
if ($untainted) {
    # It passed the regex, so it is "safe".
    system "echo $untainted";
}
Input
* We input 7 bytes:
66 6f 6f c9 3b 69 64
f  o  o  ***** i  d
* c9 3b is not valid UTF-8, but it is interpreted as U+027B, a letter that \w matches
* But 3b in ASCII is a semicolon!
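The arithmetic behind that interpretation, as a sketch: this is what a decoder computes if it skips the continuation-byte check. It illustrates the bit layout of a two-byte sequence and is not Perl's actual implementation.

```perl
use strict;
use warnings;

my ($lead, $next) = (0xc9, 0x3b);

# Two-byte UTF-8 layout: 110xxxxx 10yyyyyy -> code point xxxxxyyyyyy.
# A lax decoder takes the low 6 bits of the second byte without checking it.
my $cp = (($lead & 0x1f) << 6) | ($next & 0x3f);
printf "U+%04X\n", $cp;   # prints: U+027B

# A strict decoder first checks that the second byte looks like 10yyyyyy.
my $is_continuation = ($next & 0xc0) == 0x80;
print "3b is ", ($is_continuation ? "" : "not "), "a continuation byte\n";
```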
foo�;id
bash: foo�: command not found
uid=1000(juerd) gid=1000(juerd) groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),109(lpadmin),111(scanner),113(admin),123(vboxusers),1000(juerd)
* Command execution security hole!
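One possible way to close the hole, sketched under assumptions: safe_word is a hypothetical helper, not part of the talk. It combines strict decoding, an explicit ASCII character class, and the list form of system, which bypasses the shell.

```perl
use strict;
use warnings;
use Encode qw(decode);

$| = 1;   # keep our output in order with the child's output

# Hypothetical helper: returns the validated word, or undef if rejected.
sub safe_word {
    my ($bytes) = @_;

    # Strict decoding: die on malformed UTF-8, caught by eval.
    my $text = eval {
        decode("UTF-8", $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
    };
    return unless defined $text;

    # Say exactly which characters are allowed.
    my ($word) = $text =~ /\A([A-Za-z0-9_]+)\z/;
    return $word;
}

for my $input ("foo\xc9;id", "foo") {
    my $word = safe_word($input);
    if (defined $word) {
        # List form: the argument never passes through a shell.
        system("echo", $word) == 0 or warn "echo failed: $?";
    }
    else {
        print "rejected\n";
    }
}
```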