UTF8 security reminder

    Lightning Talk

    Working with Unicode

    1. read, decode
    2. ...do something with it...
    3. encode, output

    See also: perlunitut, perlunifaq
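    A minimal sketch of the three steps, assuming the input bytes are
    already in a scalar (in a real program they would come from a file,
    socket or similar). The variable names are illustrative only.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# 1. read, decode: these bytes stand in for input from a file or socket.
my $input = "caf\xC3\xA9";            # UTF-8 bytes for "café"
my $text  = decode('UTF-8', $input);  # 4 characters

# 2. ...do something with it: string functions now work on characters.
my $shout = uc($text) . '!';          # "CAFÉ!", 5 characters

# 3. encode, output: turn characters back into bytes at the edge.
my $output = encode('UTF-8', $shout);
binmode STDOUT;                       # we print bytes we encoded ourselves
print $output, "\n";
```

    Everything between step 1 and step 3 deals only in characters; bytes
    exist only at the edges of the program.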

    Encoding

    * Hard to do wrong
    * ... if the recipient decodes properly!
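    A small sketch of why encoding is the easy half: the bytes come out
    the same regardless, but the recipient still has to decode them with
    the right encoding. The Latin-1 decode below stands in for a
    hypothetical recipient that guesses wrong.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $text  = "caf\x{E9}";              # 4 characters
my $bytes = encode('UTF-8', $text);   # 5 bytes: 63 61 66 c3 a9

# A recipient that decodes as UTF-8 gets the original text back...
my $right = decode('UTF-8', $bytes);
# ...one that assumes Latin-1 sees 5 characters ("cafÃ©").
my $wrong = decode('ISO-8859-1', $bytes);

printf "%d bytes, right=%d chars, wrong=%d chars\n",
    length $bytes, length $right, length $wrong;
```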

    Decoding: manually

    * decode("UTF-8", $input)
    * decode("UTF8", $input)
    * decode_utf8($input)
    * utf8::decode($input)
    * _utf8_on($input) ?
        NO!
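    A sketch comparing the approved functions on valid input, and showing
    what the strict ones do with the malformed bytes from the demo
    exploit. It assumes Encode exports FB_CROAK on request; `$good` and
    `$bad` are just example inputs.

```perl
use strict;
use warnings;
use Encode qw(decode decode_utf8 FB_CROAK);

my $good = "caf\xC3\xA9";     # valid UTF-8 bytes for "café"
my $bad  = "foo\xC9\x3Bid";   # malformed bytes, as in the demo exploit

# The approved functions decode valid input to the same characters.
my $via_decode = decode('UTF-8', $good);
my $via_utf8   = decode_utf8($good);
my $in_place   = $good;
my $ok_good    = utf8::decode($in_place);   # decodes in place, true on success

# Strict decoding rejects the malformed input instead of guessing.
my $copy   = $bad;            # decode() with a CHECK flag may consume its argument
my $strict = eval { decode('UTF-8', $copy, FB_CROAK); 1 } ? 'accepted' : 'rejected';

# utf8::decode reports failure by returning false.
my $bad_in_place = $bad;
my $ok_bad       = utf8::decode($bad_in_place);

# _utf8_on($bad) would relabel the bytes as text with no check at all.
printf "strict: %s, utf8::decode good: %s, bad: %s\n",
    $strict, ($ok_good ? 'true' : 'false'), ($ok_bad ? 'true' : 'false');
```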

    Decoding: via PerlIO

    open $fh, "<:layer", ...;
    binmode $fh, ":layer";
    * :encoding(UTF-8)
    * :encoding(UTF8)
    * :utf8 ?
        NO!
        Not a shortcut! It does not validate the input.
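    A sketch of reading through the :encoding(UTF-8) layer, using an
    in-memory filehandle so nothing has to exist on disk. The input is the
    same malformed byte sequence as in the demo exploit. The exact
    replacement text for the bad byte depends on PerlIO::encoding's
    fallback setting, so the sketch only checks that the untainting regex
    no longer matches.

```perl
use strict;
use warnings;

my $bytes = "foo\xC9\x3Bid";   # same malformed input as the demo

# In-memory filehandle, so the sketch needs no file on disk.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "open: $!";
my $word = do {
    local $SIG{__WARN__} = sub { };   # silence the "does not map" warning
    readline $fh;
};
close $fh;

# The malformed byte does not become a \w character, so the match fails.
my ($untainted) = $word =~ /^(\w+)$/;
print defined $untainted ? "matched\n" : "rejected\n";
```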

    Setting the UTF8 flag yourself is BAD

    The UTF8 flag does not exist.
    Except when you're actually dealing with Perl internals.
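    To illustrate, a sketch showing that ordinary string operations cannot
    see the flag; only internals-level functions such as utf8::is_utf8
    can.

```perl
use strict;
use warnings;

my $plain    = "caf\xE9";   # 4 characters, stored as single bytes
my $upgraded = $plain;
utf8::upgrade($upgraded);   # same 4 characters, stored internally as UTF-8

# Perl-level string operations see the same characters either way.
my $same_text   = ($plain eq $upgraded) ? 1 : 0;
my $same_length = (length($plain) == length($upgraded)) ? 1 : 0;

# Only internals-level tools can tell the difference.
my $flags = join ',', map { utf8::is_utf8($_) ? 1 : 0 } $plain, $upgraded;
print "$same_text $same_length $flags\n";
```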

    Perl does Unicode by default

    (Except for a bug. See Unicode::Semantics.)
    * \w matches letters and digits from every script, not just ASCII.
    * Same for \d (any decimal digit), [[:alpha:]], etc.
    * Write [A-Za-z0-9_] explicitly if that's what you mean.
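    A sketch of how much wider the built-in classes are than their ASCII
    spellings, using U+027B (the character from the demo exploit) and
    ARABIC-INDIC DIGIT THREE as sample inputs. The last lines show the bug
    mentioned above: without the unicode_strings feature, whether \w
    matches "\xE9" depends on the internal flag.

```perl
use strict;
use warnings;

my $turned_r = "\x{027B}";   # LATIN SMALL LETTER TURNED R WITH HOOK
my $digit    = "\x{0663}";   # ARABIC-INDIC DIGIT THREE

my @checks = (
    [ '\w',           $turned_r =~ /^\w$/           ? 1 : 0 ],
    [ '[A-Za-z0-9_]', $turned_r =~ /^[A-Za-z0-9_]$/ ? 1 : 0 ],
    [ '\d',           $digit    =~ /^\d$/           ? 1 : 0 ],
    [ '[0-9]',        $digit    =~ /^[0-9]$/        ? 1 : 0 ],
);
printf "%-14s %d\n", @$_ for @checks;

# The Unicode bug: same character, different answer depending on the flag.
my $e_acute  = "\xE9";
my $upgraded = $e_acute;
utf8::upgrade($upgraded);
my $bug = join ',', map { $_ =~ /\w/ ? 1 : 0 } $e_acute, $upgraded;
print "\\w on \\xE9 (plain, upgraded): $bug\n";
```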

    Demo exploit

    * Released at T-DOSE 2 weeks ago
    * Actual security risk
    #!/usr/bin/perl -T
    use strict;
    %ENV = (PATH => '/usr/bin');
    
    open my $fh, "< :utf8", "test.bin" or die $!;
    my $word = readline $fh;
    
    my ($untainted) = $word =~ /^(\w+)$/;
    
    if ($untainted) {
        # It passed the regex, so it is "safe".
        system "echo $untainted";
    }
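    For contrast, a hedged sketch of the same check done safely: decode
    strictly, then untaint with an explicit ASCII class. `safe_word` is a
    hypothetical helper, and the bytes are passed in directly rather than
    read from test.bin; in the real script they would be read through a
    `:raw` layer, and under -T the regex capture is what untaints the
    value.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK);

# Hypothetical helper: strictly decode raw input bytes, then untaint with
# an explicit ASCII class. Returns undef if the bytes are not valid UTF-8
# or the word contains anything outside [A-Za-z0-9_].
sub safe_word {
    my ($bytes) = @_;
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK) };
    return undef unless defined $text;
    my ($word) = $text =~ /^([A-Za-z0-9_]+)$/;
    return $word;
}

for my $input ("foo\xC9\x3Bid", "foo") {
    my $word = safe_word($input);
    print defined $word ? "ok: $word\n" : "rejected\n";
}
```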
    

    Input

    * We input 7 bytes:
    66 6f 6f c9 3b 69 64
    f  o  o  ***** i  d
    * Perl reads c9 3b as U+027B, a letter, so \w matches it
    * But the string still holds the raw bytes, and 3b in ASCII is a semicolon!
    foo�;id
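    The arithmetic behind "c9 3b is read as U+027B", as a sketch: a
    decoder that skips the continuation-byte check simply combines the
    payload bits of both bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

my ($lead, $next) = (0xC9, 0x3B);

# A 2-byte UTF-8 sequence is 110xxxxx 10yyyyyy. Continuation bytes must
# start with the bits 10; 0x3B (00111011) does not.
my $is_continuation = (($next & 0xC0) == 0x80) ? 1 : 0;

# A decoder that skips that check just glues the payload bits together.
my $codepoint = (($lead & 0x1F) << 6) | ($next & 0x3F);
printf "continuation byte? %d, code point U+%04X\n", $is_continuation, $codepoint;

# The real UTF-8 encoding of U+027B uses a proper continuation byte.
my $proper = encode('UTF-8', chr $codepoint);
printf "U+027B as UTF-8: %s\n", join ' ', map { sprintf '%02x', ord } split //, $proper;
```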

    Output

    bash: foo�: command not found
    uid=1000(juerd) gid=1000(juerd) groups=4(adm),20(dialout),24(cdrom),25(floppy),29(audio),30(dip),44(video),46(plugdev),109(lpadmin),111(scanner),113(admin),123(vboxusers),1000(juerd)
    

    * Command execution security hole!
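    The real fix is proper decoding, but as extra hardening (not part of
    the talk) the shell can be taken out of the picture: with the list
    form of open or system, arguments go to the program directly and ";"
    has no special meaning. A sketch, assuming a Unix-like system with
    echo in PATH:

```perl
use strict;
use warnings;

my $word = "foo;id";   # hostile input containing a shell metacharacter

# List form: the command and its arguments are passed to the program
# without a shell, so ";" is just a character in the argument.
open my $out, '-|', 'echo', $word or die "cannot run echo: $!";
my $printed = do { local $/; <$out> };
close $out;
print $printed;   # the literal text foo;id, and id is never run
```

    The same holds for `system 'echo', $word`: with more than one
    argument, system does not invoke the shell either.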