=head1 TITLE

A Sane Parrot

=head2 AUTHOR

Leopold "leo" Toetsch

=head1 ABSTRACT

* Parrot is a virtual machine (VM) - a software CPU
* Parrot is the interpreter for Perl6, but not only for Perl6
* Parrot provides everything needed to implement an interpreted
  language, except the parser-related stuff
* Parrot is an interpreter language construction kit

=head1 Introduction to Parrot

=head2 Why Parrot?

* April Fool's joke 2001 by Simon Cozens: "Programming Parrot", but
* it became reality a few months later:
* HLL-independent interpreter for dynamic languages, especially Perl6
* reusable library (C6PAN)
* HLL interoperability

=head2 Structure

  +-----------+        +-----------+        +-----------+
  | Compiler/ |        |           |        |           |
  | Assembler |----+   | Optimizer |---+    |Interpreter|
  | PASM/PIR  |    |   |           |   |    |           |
  +-----------+    |   +-----------+   |    +-----------+
                   |                   |          |
  +-----------+    |   +-----------+   |          |
  |           |    |   |           |   |          |
  |    AST    |----+   | Bytecode  |---+----------+
  |           |        |           |
  +-----------+        +-----------+

=head1 Features

=head2 Register based

* 4 distinct register sets: I, S, N, and P registers for
* Integers, Strings, Numbers, and PMCs
* none of the artificial restrictions that hardware registers have
* (0..n) I, S, N, P registers per call frame

=head2 Opcodes

* ~1200 opcodes (the count is decreasing; it was 1500)
* 3-register machine

  op dest, src, src

  add I, I, I     # dest = src1 + src2
  add I, Ic, I
  add I, I, Ic
  add I, Ic, Ic   # optimized away to set I, const
  add N, N, N
  add N, N, Nc
  add N, Nc, N
  add N, Nc, Nc   # optimized away to set N, const
  ...
  add N, N, I     # => add N, N, N
  ...
  add P, P, P     # => infix .MMD_ADD, P, P, P
  add P, P, N
  add P, P, I
  ...
  add I, I        # dest += src
  add N, N
  add P, P        # => i_infix .MMD_ADD, P, P

(C<Ic> and C<Nc> denote constant operands.) Actually there are 10
variants of B<add> now.

A short PASM example:

  $ cat sane-1.pasm
  set I0, 2
  set I1, 5
  add I2, I0, I1
  print I2
  print "\n"
  end

  $ ./parrot sane-1.pasm
  7

=head2 DOD/GC

* no reference counts
* much simpler programming, not so error-prone

=head2 HLL string functions with Unicode support

* concat
* substr
* index
* length
* ord
* sprintf
* join
...
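The three-operand register design from the "Register based" and "Opcodes" sections can be modelled in a few lines of C. This is a hypothetical sketch, not Parrot's run core: the opcode names (C<OP_SET_I_IC>, C<OP_ADD_I_I_I>, ...) and the fixed 32-entry integer register file are assumptions made for illustration. It runs the equivalent of F<sane-1.pasm>.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical opcode numbers for a tiny subset of the PASM ops above. */
enum opcode {
    OP_SET_I_IC,    /* set I(a), const                 */
    OP_ADD_I_I_I,   /* add I(a), I(b), I(c): a = b + c */
    OP_PRINT_I,     /* print I(a), then a newline      */
    OP_END          /* end                             */
};

struct insn {
    enum opcode op;
    int a, b, c;    /* register numbers, or a constant in b for OP_SET_I_IC */
};

/* Simplification: Parrot sizes its register sets per call frame. */
#define NUM_I_REGS 32

/* Three-operand dispatch loop: every instruction names its destination
   and source registers explicitly, so there is no operand stack. */
static void run(const struct insn *pc, long regs[NUM_I_REGS])
{
    for (;; pc++) {
        assert(pc->a >= 0 && pc->a < NUM_I_REGS);
        switch (pc->op) {
        case OP_SET_I_IC:
            regs[pc->a] = pc->b;
            break;
        case OP_ADD_I_I_I:
            assert(pc->b >= 0 && pc->b < NUM_I_REGS);
            assert(pc->c >= 0 && pc->c < NUM_I_REGS);
            regs[pc->a] = regs[pc->b] + regs[pc->c];
            break;
        case OP_PRINT_I:
            printf("%ld\n", regs[pc->a]);
            break;
        case OP_END:
            return;
        }
    }
}

/* The same program as sane-1.pasm ("print I2" and "print \n" folded into one op). */
static const struct insn sane1[] = {
    { OP_SET_I_IC,  0, 2, 0 },   /* set I0, 2      */
    { OP_SET_I_IC,  1, 5, 0 },   /* set I1, 5      */
    { OP_ADD_I_I_I, 2, 0, 1 },   /* add I2, I0, I1 */
    { OP_PRINT_I,   2, 0, 0 },   /* print I2       */
    { OP_END,       0, 0, 0 },   /* end            */
};
```

Calling C<run(sane1, regs)> with a zeroed C<long regs[NUM_I_REGS]> prints 7 and leaves 7 in C<regs[2]>, matching the output of C<./parrot sane-1.pasm>.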

=head2 NCI - Native Call Interface

* a simple-to-use scheme to access C functions and variables

E.g. calling the I<ncurses> library functions is implemented entirely in
Parrot:

  .local pmc lib, func
  loadlib lib, 'libncurses'
  dlfunc func, lib, 'addstr', 'it'   # signature: returns int, takes a C string
  $I0 = func("hello world")

=head2 Continuation based Function Calls

The caller passes a snapshot of the interpreter's current context on to
the called function. Invoking this continuation restores that state of
the interpreter. That's just like a "plain" function return, but much
more flexible.

=head3 Continuation

- the rest of the computation
- a "super" goto
- setjmp/longjmp

A function call leaves a "hole" in the instructions, to be filled either
by return results or by invoking a continuation.

* continuations, closures, subs, ... are first-class objects

=head2 Exception Handling

* try/catch - a limited escape continuation

=head2 I/O

* I/O layers
* asynchronous

Ahem, that's The Plan, but it's simpler to provide synchronous I/O on
top of asynchronous I/O than the reverse.

=head2 Events

* timer
* signals
* asynchronous callbacks from C

=head2 Threads

* multithreaded, at least where the platform supports it

=head2 HLL Object Support

* instance variables (attributes)
* multiple inheritance
* MMD

=head1 A Sane Parrot

=head2 HLL object support

  $ cat sane-oo.pir
  .sub main :main
      .local pmc cl, obj, p
      cl = newclass "Point"         # create new class
      addattribute cl, ".x"         # define attributes
      addattribute cl, ".y"
      obj = new "Point"             # create instance of class
      p = new .Integer
      p = 2
      setattribute obj, ".x", p     # set the attribute
      p = new .Integer
      p = 3
      setattribute obj, ".y", p     # set the attribute
      ($P0, $P1) = obj."get_point"()   # a method call
      print_item $P0
      print_item $P1
      print_newline
  .end

  .namespace ["Point"]
  .sub get_point :method
      $P0 = getattribute self, ".x"
      $P1 = getattribute self, ".y"
      .return ($P0, $P1)
  .end

  $ ./parrot sane-oo.pir
  2 3

=head2 MMD - Multi Method Dispatch

A traditional implementation that handles different operand types, e.g. in

  add Px, Integer, Float
  add Px, Integer, Integer

uses cascades of B<if>/B<else> sequences like:

  if left_type == Integer
      if right_type == Integer
          ...
      else if right_type == Float
          ...
  else if left_type == Float
      ...
  else if ... String, Complex, BigInt, ...

(We had that in earlier times.)

With MMD there is one specialized function to handle each pair of types.
Multi method dispatch makes this basically faster and easier to
implement.

Additionally, this scheme allows for more fine-grained inheritance:
Python or Tcl can, e.g., inherit the (Float, Integer) function but
provide its own (Integer, Integer) implementation.

=head2 Operator Overloading

With the help of the B<overload> module, Perl5 can replace almost all
core operations with user-provided code, e.g.:

  use overload '+' => \&myadd;

For Perl6 the syntax is:

  multi sub *infix:<+> ...

And Python provides overloading by setting the B<__add__> attribute of
the object's class or metaclass, e.g.:

  class I(int):
      def __add__(l, r):
          ...

=head2 Operators are Methods/Subs

On the surface, an HLL compiler just emits a plain opcode, e.g.:

  add Px, Py, Pz      # add Py to Pz giving Px
  Px = add Py, Pz     # the same
  Px = Py + Pz        # the same

But as the B<add> operation can be any user-provided code, as we've seen
above, its internal representation at the opcode level in the Parrot
runloop is more like:

  Px = Py."__add"(Pz)     # multi method

or, better and more correctly:

  Px = "__add"(Py, Pz)    # multi sub

That is a (multi) sub call of the B<__add> function. As operations like
adding two numbers are pretty common, this is optimized into a special
opcode named "infix":

  Px = infix .MMD_ADD, Py, Pz

As the B<infix> opcode knows its operands, we save the overhead of the
general function call syntax, which would additionally need the
specification of the function signature. But the functionality is the
same: find the multi sub that is capable of adding two PMCs of the
classes of B<Py> and B<Pz>, and call it.
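The difference between cascaded type checks and MMD can be sketched in C as a two-dimensional table indexed by the left and right operand types, so that finding "the multi sub capable of adding" two values is one lookup plus one call. This is a hypothetical illustration: the type ids, C<struct value>, C<mmd_add> and C<mmd_register_add> are invented here and do not reflect Parrot's actual MMD tables or PMC vtables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type ids; a real VM has many more (String, BigInt, ...). */
enum type { T_INTEGER, T_FLOAT, T_NUM_TYPES };

struct value {
    enum type type;
    union { long i; double f; } u;
};

/* Every multi sub for one operator shares this signature. */
typedef struct value (*binop_fn)(struct value left, struct value right);

static struct value make_int(long i)     { struct value v; v.type = T_INTEGER; v.u.i = i; return v; }
static struct value make_float(double f) { struct value v; v.type = T_FLOAT;   v.u.f = f; return v; }

static double as_float(struct value v)
{
    return v.type == T_INTEGER ? (double)v.u.i : v.u.f;
}

/* (Integer, Integer): the result stays an integer. */
static struct value add_int_int(struct value l, struct value r)
{
    return make_int(l.u.i + r.u.i);
}

/* Any pair involving a Float: promote both operands to double. */
static struct value add_float_any(struct value l, struct value r)
{
    return make_float(as_float(l) + as_float(r));
}

/* One specialized function per (left type, right type) pair,
   instead of nested if/else chains on both types. */
static binop_fn add_table[T_NUM_TYPES][T_NUM_TYPES] = {
    [T_INTEGER] = { [T_INTEGER] = add_int_int,   [T_FLOAT] = add_float_any },
    [T_FLOAT]   = { [T_INTEGER] = add_float_any, [T_FLOAT] = add_float_any },
};

/* A language overrides one type pair and inherits all other entries. */
static void mmd_register_add(enum type l, enum type r, binop_fn f)
{
    add_table[l][r] = f;
}

/* Dispatch: a single table lookup on both operand types, then the call. */
static struct value mmd_add(struct value l, struct value r)
{
    binop_fn f = add_table[l.type][r.type];
    assert(f != NULL);
    return f(l, r);
}
```

For example, C<mmd_add(make_int(2), make_float(0.5))> selects C<add_float_any> and yields a Float 2.5, while a language wanting its own (Integer, Integer) semantics would call C<mmd_register_add(T_INTEGER, T_INTEGER, my_int_add)> with a hypothetical C<my_int_add> and leave the (Float, Integer) entries untouched.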

=head2 PIC - Polymorphic Inline Cache

Due to MMD, the function finally executed depends on the types of B<Py>
and B<Pz>. Doing a full method search every time is of course
undesirable and slow, so the method lookup is done just once. Then the
above opcode replaces itself with something like:

  Px = pic_infix cache, Py, Pz

And the implementation is basically:

  if (cache.types == ((Py.type << 16) | Pz.type))
      Px = (cache.func)(Py, Pz)
  else
      // slow path

The B<cache> is a small structure created for each bytecode location
that needs caching, hence the term I<inline cache>.

The fast path shown is taken in about 95% of the cases, as the types
involved in an operation at one specific bytecode location change only
rarely. To accommodate the cases where they do change, a few more
(typically three) type combinations are kept in an extended cache
structure.

This scheme is fully dynamic: it can cope with overloaded operations, as
a full method lookup is performed initially. But when running in a loop,
it's around 30% faster than a static MMD lookup that has fixed entries
for the called function.

=head2 Tail calls

  .sub foo
      ...
      x = bar(a, b, c)
      .return(x)
  .end

* a new call frame is created just for calling 'bar' and then returning
* bad for highly recursive tail calls (FP languages!)

=head3 Don't bogart that frame my friend, pass it over to me.

  .return bar(a, b, c)    # tail call function "bar"
  .return obj.meth(x)     # tail method call "meth"

For example:

  $ cat sane-tail.pir
  .sub main :main
      trace 1
      $S0 = foo()
      print $S0
  .end

  .sub foo
      print "in foo\n"
      .return bar()
      print "never\n"
  .end

  .sub bar
      print "in bar\n"
      .return ("back\n")
  .end

  $ ./parrot sane-tail.pir
       2 set_args PMC_C[3]
       4 set P0, PMC_C[11]              - P0=PMCNULL,
       7 get_results PMC_C[7] (1), S0   - ,
      10 invokecc P0                    - P0=Sub=PMC(0xfcf1b8 pc:15)
      15 print "in foo\n"
  in foo
      17 set_args PMC_C[3]
      19 set P0, PMC_C[15]              - P0=PMCNULL,
      22 tailcall P0                    - P0=Sub=PMC(0xfcf1a0 pc:29)
      29 print "in bar\n"
  in bar
      31 set_returns PMC_C[16] (1), "back\n"
      34 returncc
      12 print S0                       - S0="back\n"
  back
      14 end

=head2 Dynamic Loading / More HLL support

  $ cat sane-tcl.pir
  .HLL "Tcl", "tcl_group"
  .pragma n_operators 1
  .sub "main" :main
      .local pmc x, y, z
      y = new "TclInt"
      y = 7
      x = new "TclInt"
      x = 2
      z = y / x          # Tcl overrides div as //
      print z
      print "\n"
      $S0 = typeof z
      print $S0
      print "\n"
  .end

  $ ./parrot sane-tcl.pir
  3
  TclInt

The first line pulls in the dynamic Tcl library and makes all Tcl types
available for further use. With the B<.pragma n_operators> directive,
all unary and binary operations return a new destination result in the
flavour of the HLL currently in use.

=head2 PGE - Parrot Grammar Engine

The grammar engine provides compilers for Perl5 regular expressions,
Perl6 rules, glob patterns, and Text::Balanced matching. The result of
the match is a compiled coroutine object that provides an iterator
interface to extract matching pieces and captured substrings.

  $ cat sane-pge.pir
  .sub test :main
      load_bytecode "PGE.pbc"
      load_bytecode "PGE/Dumper.pbc"
      load_bytecode "dumper.pbc"
      .local pmc comp, rule, match, dump
      comp = find_global "PGE", "p6rule"
      rule = comp("a(b)b+")
      match = rule("xxabbbc")
      unless match goto no_match
      dump = global "_dumper"
      dump(match, "$/")
      end
  no_match:
      print "match failed\n"
  .end

  $ ./parrot sane-pge.pir
  "$/" => PMC 'PGE::Rule' => "abbb" @ 2 {
      [0] => PMC 'PGE::Rule' => "b" @ 3
  }

=head1 Networking

* socket, sockaddr, connect, send
* bind, listen, accept, recv

A detailed example follows in a minute, in combination with:

=head1 Freeze/Thaw

* freeze = serialize a data structure to a binary string
* thaw = create a data structure from a frozen string
* arbitrarily deep nesting
* self-referential structures
* machine-independent format

  # at remote PC (x86 linux - little endian)
  $ ./parrot sane-srv.pir feather.perl6.nl
  Foo
  hello from here

  # at local PC (OS/X ppc - big endian)
  $ ./parrot sane-ft.pir feather.perl6.nl "hello from here"
  done.
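Why a string frozen on a big-endian PPC thaws correctly on a little-endian x86 can be illustrated with a small C sketch: multi-byte integers are written in one fixed byte order (big-endian here) by shifting, never by copying the host's in-memory representation. This is a hypothetical encoding (a 4-byte length prefix followed by the string bytes), not Parrot's actual freeze format, and unlike Parrot's freeze it handles neither nesting nor self-referential structures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write v as 4 bytes, most significant first, regardless of host byte order. */
static void put_u32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Read 4 big-endian bytes back into a host integer. */
static uint32_t get_u32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Freeze s as a length prefix plus its bytes.
   Returns the number of bytes written, or 0 if buf is too small. */
static size_t freeze_str(unsigned char *buf, size_t cap, const char *s)
{
    size_t n = strlen(s);
    if (n > UINT32_MAX || cap < 4 || cap - 4 < n)
        return 0;
    put_u32(buf, (uint32_t)n);
    memcpy(buf + 4, s, n);
    return 4 + n;
}

/* Thaw a frozen string into out (NUL-terminated).
   Returns the number of bytes consumed, or 0 on truncated input or small out. */
static size_t thaw_str(const unsigned char *buf, size_t len, char *out, size_t outcap)
{
    uint32_t n;
    if (len < 4)
        return 0;
    n = get_u32(buf);
    if (len - 4 < n || outcap <= n)
        return 0;
    memcpy(out, buf + 4, n);
    out[n] = '\0';
    return 4 + (size_t)n;
}
```

Freezing C<"hello from here"> this way yields 19 bytes beginning C<00 00 00 0F> on either host, so the receiving side reads the same length no matter which byte order it uses internally.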

The client code:

  $ cat sane-ft.pir
  .sub main :main
      .param pmc argv
      .local int argc
      .local string srv, message
      argc = argv
      srv = "localhost"
      message = "hello\n"
      if argc < 2 goto def_srv
      srv = argv[1]
      if argc < 3 goto def_msg
      message = argv[2]
  def_msg:
  def_srv:
      .local pmc sock
      .local string address, buf
      .local int ret
      socket sock, 2, 1, 0          # PF_INET (IPv4), SOCK_STREAM, default protocol
      unless sock goto ERR
      # Pack a sockaddr_in structure with IP and port
      sockaddr address, 2007, srv
      connect ret, sock, address
      unless ret goto SEND
      $S0 = err
      printerr $S0
      printerr "\n"
      exit 1
  SEND:
      .local pmc o, cl, s
      cl = newclass "Foo"
      addattribute cl, "x"
      o = new "Foo"
      s = new String
      s = message
      setattribute o, "x", s
      buf = freeze o
      send ret, sock, buf
      close sock
      print "done.\n"
      end
  ERR:
      printerr "error\n"
  .end

The server code:

  $ cat sane-srv.pir
  .sub main :main
      .param pmc argv
      .local int argc
      .local string srv
      argc = argv
      srv = "localhost"
      if argc < 2 goto def_srv
      srv = argv[1]
  def_srv:
      .local pmc sock, work
      .local string address, buf
      .local int ret
      socket sock, 2, 1, 0          # PF_INET (IPv4), SOCK_STREAM, default protocol
      unless sock goto ERR_NO_SOCKET
      # Pack a sockaddr_in structure with IP and port
      address = sockaddr 2007, srv
      ret = bind sock, address
      listen ret, sock, 5
      accept work, sock
      .local string got
      got = ""
  MORE:
      recv ret, work, buf
      if ret <= 0 goto SERVE_REQ
      got .= buf
      goto MORE
  SERVE_REQ:
      close work
      close sock
      .local pmc o, a
      o = thaw got
      $S0 = typeof o
      print $S0
      print "\n"
      a = getattribute o, "x"
      print a
  .end

=head1 Parrot Performance - MOPs

Million Opcodes Per Second

Nicholas Clark - "When Perl is not quite fast enough"

* opcode dispatch is a limiting factor of interpreter execution speed

Opcode dispatch is timed with the following code:

  loop:
      a = a - 1
      if a goto loop

We have these MOPs numbers, normalized to perl5 := 1:

  VM MOPs
    perl5, python, ruby      1.0

  parrot PMC MOPs
    -f                       4.0
    -g                       4.5
    -S, -j, -C               5.0
    PIC-C                    7.5
    PIC-S-inl                9.0
    PIC-C-inl               13.0

  parrot int MOPs
    -g, -f                  11.5
    -S                      20.5
    -C                      46.0
    -j                     400.0

  gcc 2.95.2, -O3         ~150.0

Parrot runcore switches:

  -f          fast function core
  -g          cgoto (computed goto)
  -S          switched prederefed
  -C          CGP, direct-threaded
  -j          JIT
  PIC-C       CGP with PIC
  PIC-C-inl   CGP with inlined PIC opcode
  PIC-S-inl   switched with inlined PIC opcode

* md5sum.pir currently runs at half the speed of perl5/XS on x86.

=head1 AUTHOR

Leopold "leo" Toetsch

  http://www.parrotcode.org
  http://www.poniecode.org
  http://www.pugscode.org
  http://toetsch.at/parrot/

  Perl6 Essentials, O'Reilly
  Perl6 and Parrot Essentials, 2nd Ed, O'Reilly
  (Allison Randal, Dan Sugalski & Leopold Toetsch)

Thanks. Obrigado. Danke. Fin. Konec. The end.

=head2 Questions

Any questions?