Synopsis 2: Bits and Pieces
Larry Wall <larry@wall.org>
Maintainer: Larry Wall <larry@wall.org>
Date: 10 Aug 2004
Last Modified: 2 Apr 2008
Number: 2
Version: 132
This document summarizes Apocalypse 2, which covers small-scale lexical items and typological issues. (These Synopses also contain updates to reflect the evolving design of Perl 6 over time, unlike the Apocalypses, which are frozen in time as "historical documents". These updates are not marked--if a Synopsis disagrees with its Apocalypse, assume the Synopsis is correct.)
From t/regex/smartparse.t lines 9–14 (0 √, 1 ×):
| L<S02/"One-pass parsing"> |
| |
| =cut |
| |
× | ok(eval('regex foo { <[ } > ]> }; 1'), |
| "can parse non-backslashed curly and right bracket in cclass"); |
To the extent allowed by sublanguages' parsers, Perl is parsed using a one-pass, predictive parser. That is, lookahead of more than one "longest token" is discouraged. The currently known exceptions to this are where the parser must:
[...] composer.
From t/syntax/unicode.t lines 7–93 (30 √, 0 ×):
| # L<S02/"Lexical Conventions"/"Perl is written in Unicode"> |
| |
| # Unicode variables |
| # english ;-) |
√ | ok(try {my $foo; sub foo {}; 1}, "ascii declaration"); |
√ | is(try {my $bar = 2; sub id ($x) { $x }; id($bar)}, 2, "evaluation"); |
| |
| # umlauts |
√ | ok(try {my $übervar; sub fü {}; 1}, "umlauts declaration"); |
√ | is(try {my $schloß = 2; sub öok ($x) { $x }; öok($schloß)}, 2, "evaluation"); |
| |
| # monty python |
√ | ok(try {my $møøse; sub bïte {}; 1}, "a møøse once bit my sister"); |
√ | is(try {my $møøse = 2; sub såck ($x) { $x }; såck($møøse)}, 2, |
| "møøse bites kan be preti nasti"); |
| |
| # french |
√ | ok(try {my $un_variable_français; sub blâ {}; 1}, "french declaration"); |
√ | is(try {my $frénch = 2; sub bléch ($x) { $x }; bléch($frénch)}, 2, "evaluation"); |
| |
| # Some Chinese Characters |
√ | ok(try {my $一; 1}, "chinese declaration"); |
√ | is(try {my $二 = 2; sub 恆等($x) {$x}; 恆等($二)}, 2, "evaluation"); |
| |
| # Tibetan Characters |
√ | ok(try {my $ཀ; 1}, "tibetan declaration"); |
√ | is(try {my $ཁ = 2; $ཁ}, 2, "evaluation"); |
| |
| # Japanese |
√ | ok(try {my $い; 1}, "japanese declaration"); |
√ | is(try {my $に = 2; $に}, 2, "evaluation"); |
| |
| # arabic |
√ | ok(try {my $الصفحة ; 1}, "arabic declaration"); |
√ | is(try {my $الصفحة = 2; $الصفحة}, 2, "evaluation"); |
| |
| # hebrew |
√ | ok(try {my $פוו; sub לה {}; 1}, "hebrew declaration"); |
√ | is(try {my $באר = 2; sub זהות ($x) { $x }; זהות($באר)}, 2, "evaluation"); |
| |
| # magyar |
√ | ok(try {my $aáeéiíoóöőuúüű ; 1}, "magyar declaration"); |
√ | is(try {my $áéóőöúűüí = 42; sub űáéóőöúüí ($óőöú) { $óőöú }; űáéóőöúüí($áéóőöúűüí)}, |
| 42, "evaluation"); |
| |
| # russian |
√ | ok(try {my $один; sub раз {}; 1}, "russian declaration"); |
√ | is( |
| try {my $два = 2; sub идентичный ($x) { $x }; идентичный($два)}, |
| 2, |
| "evaluation" |
| ); |
| |
√ | ok(try { my $पहला = 1; }, "hindi declaration"); |
√ | is(try { my $दूसरा = 2; sub टोटल ($x) { $x + 2 }; टोटल($दूसरा) }, 4, "evaluation"); |
| |
| # Unicode subs |
| { |
| my sub äöü () { 42 } |
√ | is äöü, 42, "Unicode subs with no parameters"; |
| } |
| { |
| my sub äöü ($x) { 1000 + $x } |
√ | is äöü 17, 1017, "Unicode subs with one parameter (parsed as prefix ops)"; |
| } |
| |
| # Unicode parameters |
| { |
| my sub abc (:$äöü) { 1000 + $äöü } |
| |
√ | is abc(äöü => 42), 1042, "Unicode named params (1)"; |
√ | is abc(:äöü(42)), 1042, "Unicode named params (2)"; |
| } |
| |
| # Unicode placeholder variables |
| { |
√ | is |
| ~(< foostraße barstraße fakestraße >.map:{ ucfirst $^straßenname }), |
| "Foostraße Barstraße Fakestraße", |
| "Unicode placeholder variables"; |
| } |
| |
| # Unicode methods |
| { |
| class Str is also { method äöü { self.ucfirst } }; |
√ | is "pugs".äöü(), "Pugs", "Unicode methods"; |
| } |
From t/spec/S02-whitespace_and_comments/unicode-whitespace.t lines 7–137 (4 √, 22 ×):
| # L<S02/"Lexical Conventions"/"Unicode horizontal whitespace"> |
| |
√ | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "CHARACTER TABULATION"); |
| |
√ | is(eval(' |
| my |
| @x |
| = |
| <a |
| b |
| c>; |
| sub |
| y |
| (@z) |
| { |
| @z[1] |
| }; |
| y(@x) |
| '), "b", "LINE FEED (LF)"); |
| |
× | is(eval(' |
| my@x=<abc>;suby(@z){@z[1]};y(@x) |
| '), "b", "LINE TABULATION"); |
| |
× | is(eval(' |
| my@x=<abc>;suby(@z){@z[1]};y(@x) |
| '), "b", "FORM FEED (FF)"); |
| |
√ | is(eval(' |
| my
@x
=
<a
b
c>;
sub
y
(@z)
{
@z[1]
};
y(@x) |
| '), "b", "CARRIAGE RETURN (CR)"); |
| |
√ | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "SPACE"); |
| |
× | is(eval(' |
| my
@x
=
<a
b
c>;
sub
y
(@z)
{
@z[1]
};
y(@x) |
| '), "b", "NEXT LINE (NEL)"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "NO-BREAK SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "OGHAM SPACE MARK"); |
| |
× | is(eval(' |
| my@x=<abc>;suby(@z){@z[1]};y(@x) |
| '), "b", "MONGOLIAN VOWEL SEPARATOR"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "EN QUAD"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "EM QUAD"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "EN SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "EM SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "THREE-PER-EM SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "FOUR-PER-EM SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "SIX-PER-EM SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "FIGURE SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "PUNCTUATION SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "THIN SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "HAIR SPACE"); |
| |
× | is(eval(' |
| my
@x
=
<a
b
c>;
sub
y
(@z)
{
@z[1]
};
y(@x) |
| '), "b", "LINE SEPARATOR"); |
| |
× | is(eval(' |
| my
@x
=
<a
b
c>;
sub
y
(@z)
{
@z[1]
};
y(@x) |
| '), "b", "PARAGRAPH SEPARATOR"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "NARROW NO-BREAK SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "MEDIUM MATHEMATICAL SPACE"); |
| |
× | is(eval(' |
| my @x = <a b c>; sub y (@z) { @z[1] }; y(@x) |
| '), "b", "IDEOGRAPHIC SPACE"); |
| |
| #Long dot whitespace tests |
| #These currently get different results than the above |
| |
| class Str is also { |
| method id($x:) { $x } |
| } |
| |
| #This makes 'foo.id' and 'foo .id' mean different things |
| multi foo() { 'a' } |
| multi foo($x) { $x } |
| |
| $_ = 'b'; |
| |
From t/spec/S02-whitespace_and_comments/unicode-whitespace.t lines 138–165 (23 √, 3 ×):
| # L<S02/"Lexical Conventions"/"Unicode horizontal whitespace"> |
√ | is(eval('foo\ .id'), 'a', 'long dot with CHARACTER TABULATION'); |
√ | is(eval('foo\ |
| .id'), 'a', 'long dot with LINE FEED (LF)'); |
√ | is(eval('foo\.id'), 'a', 'long dot with LINE TABULATION'); |
√ | is(eval('foo\.id'), 'a', 'long dot with FORM FEED (FF)'); |
√ | is(eval('foo\
.id'), 'a', 'long dot with CARRIAGE RETURN (CR)'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with SPACE'); |
× | is(eval('foo\
.id'), 'a', 'long dot with NEXT LINE (NEL)'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with NO-BREAK SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with OGHAM SPACE MARK'); |
√ | is(eval('foo\.id'), 'a', 'long dot with MONGOLIAN VOWEL SEPARATOR'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with EN QUAD'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with EM QUAD'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with EN SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with EM SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with THREE-PER-EM SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with FOUR-PER-EM SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with SIX-PER-EM SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with FIGURE SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with PUNCTUATION SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with THIN SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with HAIR SPACE'); |
× | is(eval('foo\
.id'), 'a', 'long dot with LINE SEPARATOR'); |
× | is(eval('foo\
.id'), 'a', 'long dot with PARAGRAPH SEPARATOR'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with NARROW NO-BREAK SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with MEDIUM MATHEMATICAL SPACE'); |
√ | is(eval('foo\ .id'), 'a', 'long dot with IDEOGRAPHIC SPACE'); |
From t/operators/quoting.t lines 26–96 (7 √, 0 ×):
| # L<S02/Lexical Conventions/"bidirectional mirrorings" or "Ps/Pe properties"> |
| { |
| my $s = q{ foo bar }; |
√ | is $s, ' foo bar ', 'string using q{}'; |
| } |
| |
| { |
| my $s = q「this is a string」; |
√ | is $s, 'this is a string', |
| 'q-style string with LEFT/RIGHT CORNER BRACKET'; |
| } |
| |
| { |
| my $s = q『blah blah blah』; |
√ | is $s, 'blah blah blah', |
| 'q-style string with LEFT/RIGHT WHITE CORNER BRACKET'; |
| } |
| |
| { |
| my $s = q⦍blah blah blah⦎; |
| is $s, 'blah blah blah', |
| 'q-style string with LEFT SQUARE BRACKET WITH TICK IN TOP CORNER and |
| RIGHT SQUARE BRACKET WITH TICK IN BOTTOM CORNER(U+298D/U+298E)'; |
| } |
| |
| { |
| my $s = q〝blah blah blah〞; |
| is $s, 'blah blah blah', |
| 'q-style string with REVERSED DOUBLE PRIME QUOTATION MARK and |
| DOUBLE PRIME QUOTATION MARK(U+301D/U+301E)'; |
| } |
| |
| { |
| my %ps_pe = ( |
| '(' => ')', '[' => ']', '{' => '}', '༺' => '༻', '༼' => '༽', |
| '᚛' => '᚜', '⁅' => '⁆', '⁽' => '⁾', '₍' => '₎', '〈' => '〉', |
| '❨' => '❩', '❪' => '❫', '❬' => '❭', '❮' => '❯', '❰' => '❱', |
| '❲' => '❳', '❴' => '❵', '⟅' => '⟆', '⟦' => '⟧', '⟨' => '⟩', |
| '⟪' => '⟫', '⦃' => '⦄', '⦅' => '⦆', '⦇' => '⦈', '⦉' => '⦊', |
| '⦋' => '⦌', '⦍' => '⦎', '⦏' => '⦐', '⦑' => '⦒', '⦓' => '⦔', |
| '⦕' => '⦖', '⦗' => '⦘', '⧘' => '⧙', '⧚' => '⧛', '⧼' => '⧽', |
| '〈' => '〉', '《' => '》', '「' => '」', '『' => '』', |
| '【' => '】', '〔' => '〕', '〖' => '〗', '〘' => '〙', |
| '〚' => '〛', '〝' => '〞', '﴾' => '﴿', '︗' => '︘', '︵' => '︶', |
| '︷' => '︸', '︹' => '︺', '︻' => '︼', '︽' => '︾', |
| '︿' => '﹀', '﹁' => '﹂', '﹃' => '﹄', '﹇' => '﹈', |
| '﹙' => '﹚', '﹛' => '﹜', '﹝' => '﹞', '(' => ')', |
| '[' => ']', '{' => '}', '⦅' => '⦆', '「' => '」', |
| ); |
| for keys %ps_pe { |
| next if $_ eq '('; # skip '(' => ')' because q() is a sub call |
| my $string = 'q' ~ $_ ~ 'abc' ~ %ps_pe{$_}; |
√ | is eval($string), 'abc', $string; |
| } |
| } |
| |
| { |
| my @list = 'a'..'c'; |
| |
| my $var = @list[ q(2) ]; |
√ | is $var, 'c', |
| 'q-style string with FULLWIDTH LEFT/RIGHT PARENTHESIS'; |
| |
| $var = @list[ q《0》]; |
√ | is $var, 'a', |
| 'q-style string with LEFT/RIGHT DOUBLE ANGLE BRACKET'; |
| |
| $var = @list[q〈1〉]; |
√ | is $var, 'b', 'q-style string with LEFT/RIGHT ANGLE BRACKET'; |
| } |
| |
In practice, you're safest using matching characters with Ps/Pe properties. ASCII angle brackets are a notable exception, since they're bidirectional but not in the Ps/Pe set.
Characters with no corresponding closing character do not qualify as opening brackets. This includes the second section of the Unicode BidiMirroring data table, as well as U+201A and U+201E.
If a character is already used in Ps/Pe mappings, then any entry in BidiMirroring is ignored (both forward and backward mappings). For any given Ps character, the next Pe codepoint (in numerical order) is assumed to be its matching character, even if that is not what you might guess using left-right symmetry. Therefore U+298D maps to U+298E, not U+2990, and U+298F maps to U+2990, not U+298E. Neither U+298E nor U+2990 is a valid bracket opener, despite having reverse mappings in the BidiMirroring table.
The U+301D codepoint has two closing alternatives, U+301E and U+301F; Perl 6 recognizes only the one with the lower codepoint, U+301E, as the closing bracket. This policy also applies to any new one-to-many mappings introduced in the future.
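The "next Pe codepoint" rule can be modelled in a few lines of Python using the standard `unicodedata` module. This is only a sketch of the lookup, not how any Perl 6 implementation performs it. The names `closing_bracket_for`, `NO_CLOSER`, and `max_gap` are assumptions made up for illustration, and the BidiMirroring half of the rule is left out entirely.

```python
import unicodedata

# U+201A and U+201E are named in the text as having no closing character.
NO_CLOSER = {"\u201a", "\u201e"}

def closing_bracket_for(opener, max_gap=16):
    """Return the closer for a Ps character, or None if it is not an opener.

    The closer is taken to be the next codepoint, in numerical order,
    whose general category is Pe. `max_gap` is an illustrative bound on
    how far to scan; it is not part of the rule in the text.
    """
    if opener in NO_CLOSER or unicodedata.category(opener) != "Ps":
        return None
    start = ord(opener)
    for cp in range(start + 1, start + 1 + max_gap):
        if unicodedata.category(chr(cp)) == "Pe":
            return chr(cp)
    return None

# Show the pairings discussed in the text as codepoints.
for ch in ("\u298d", "\u298f", "\u301d"):
    print(f"U+{ord(ch):04X} -> U+{ord(closing_bracket_for(ch)):04X}")
```

Because the scan stops at the first Pe codepoint, U+301D pairs with U+301E and never reaches U+301F, which matches the one-to-many policy stated above.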
From t/spec/S02-whitespace_and_comments/unspace.t lines 250–253 (no results):
| # L<S02/"Lexical Conventions"/"U+301D codepoint has two closing alternatives">
| is(eval('foo\#〝 comment 〞.id'), 'a', 'unspace with U+301D/U+301E comment');
| is(eval('foo\#〝 comment 〟.id'), undef, 'unspace with U+301D/U+301F is invalid');
|
=begin comment and =end comment delimit a POD block correctly without the need for =cut. (In fact, =cut is now gone.) The format name does not have to be comment -- any unrecognized format name will do to make it a comment. (However, bare =begin and =end probably aren't good enough, because all comments in them will show up in the formatted output.)
From t/syntax/comments.t lines 159–191 (2 √, 0 ×):
| # L<S02/"Whitespace and Comments"/POD sections may be> |
| { |
| my $a = eval q{ |
| my $var = 1; |
| |
| =begin comment |
| |
| This is a comment with |
| a "=cut". |
| |
| =cut |
| |
| "foo"; |
| }; |
√ | is $a, 'foo', '=begin comment with =cut works'; |
| } |
| |
| { |
| my $a = eval q{ |
| my $var = 1; |
| |
| =begin comment |
| |
| This is a comment without |
| a "=cut". |
| |
| =end comment |
| |
| "bar"; |
| }; |
√ | is $a, 'bar', '=begin comment without =cut works'; |
| } |
| |
We have single paragraph comments with =for comment as well. That lets =for keep its meaning as the equivalent of a =begin and =end combined. As with =begin and =end, a comment started in code reverts to code afterwards.
From t/syntax/comments.t lines 192–215 (2 √, 0 ×):
| # L<S02/Whitespace and Comments/"single paragraph comments" |
| # =for comment> |
| |
| { |
√ | is eval(q{ |
| my $var = 1; |
| |
| =for comment TimToady is here! |
| |
| 32; |
| }), 32, '=for comment works'; |
| } |
| |
| { |
√ | is eval(q{ |
| my $var = 1; |
| |
| =for comment TimToady and audreyt |
| are both here, yay! |
| |
| 17; |
| }), 17, '=for comment works'; |
| } |
| |
Since there is a newline before the first =, the POD form of comment counts as whitespace equivalent to a newline.
# character always introduces a comment in Perl 6. There are two forms of comment based on #. Embedded comments require the # to be followed by one or more opening bracketing characters.
All other uses of # are interpreted as single-line comments that work just as in Perl 5, starting with a # character and ending at the subsequent newline. They count as whitespace equivalent to newline for purposes of separation. Unlike in Perl 5, # may not be used as the delimiter in quoting constructs.
From t/syntax/comments.t lines 147–158 (4 √, 0 ×):
| # L<S02/Whitespace and Comments/"# may not be used as" |
| # delimiter quoting> |
| { |
| my $a; |
√ | ok eval '$a = q{ 32 }', 'sanity check'; |
√ | is $a, ' 32 ', 'sanity check'; |
| |
| $a = undef; |
√ | ok !eval '$a = q# 32 #;', 'misuse of # as quote delimiters'; |
√ | is $a, undef, "``#'' can't be used as quote delimiters"; |
| } |
| |
# plus any user-selected bracket characters (as defined in "Lexical Conventions" above):
From t/syntax/comments.t lines 9–49 (9 √, 0 ×):
| # L<S02/"Whitespace and Comments"/"Embedded comments" |
| # "#" plus any bracket> |
| { |
| |
√ | ok #[ |
| Multiline |
| comments |
| is fine |
| ] 1, 'multiline embedded comment with #[]'; |
| |
√ | ok #( |
| Parens works also |
| ) 1, 'multiline embedded comment with #()'; |
| |
√ | ok eval("2 * 3\n #<<<\n comment>>>"), "multiline comment with <<<"; |
| |
| my $var = #{ foo bar } 32; |
√ | is $var, 32, 'embedded comment with #{}'; |
| |
| $var = 3 + #「 this is a comment 」 56; |
√ | is $var, 59, 'embedded comment with LEFT/RIGHT CORNER BRACKET'; |
| |
| $var = 2 #『 blah blah blah 』 * 3; |
√ | is $var, 6, 'embedded comment with LEFT/RIGHT WHITE CORNER BRACKET'; |
| |
| my @list = 'a'..'c'; |
| |
| $var = @list[ #(注释)2 ]; |
√ | is $var, 'c', 'embedded comment with FULLWIDTH LEFT/RIGHT PARENTHESIS'; |
| |
| $var = @list[ 0 #《注释》]; |
√ | is $var, 'a', 'embedded comment with LEFT/RIGHT DOUBLE ANGLE BRACKET'; |
| |
| $var = @list[#〈注释〉1]; |
√ | is $var, 'b', 'embedded comment with LEFT/RIGHT ANGLE BRACKET'; |
| |
| # Note that 'LEFT/RIGHT SINGLE QUOTATION MARK' (i.e. ‘’) and |
| # LEFT/RIGHT DOUBLE QUOTATION MARK (i.e. “”) are not valid delimiter |
| # characters. |
| } |
| |
say #( embedded comment ) "hello, world!";
$object\#{ embedded comments }.say;
$object\ #「
embedded comments
」.say;
Brackets may be nested, following the same policy as ordinary quote brackets.
From t/syntax/comments.t lines 78–114 (4 √, 1 ×):
| # L<S02/"Whitespace and Comments"/"Brackets may be nested"> |
| { |
√ | is 3, #( |
| (Nested parens) works also |
| ) 3, 'nested parens #(...(...)...)'; |
| |
√ | is 3, #{ |
| {Nested braces} works also {} |
| } 3, 'nested braces #{...{...}...}'; |
| } |
| |
| # I am not sure if this is speced somewhere: |
| # comments can be nested |
| { |
√ | is 3, #( |
| comment |
| #{ |
| internal comment |
| } |
| more comment |
| ) 3, 'comments can be nested with different brackets'; |
√ | is 3, #( |
| comment |
| #( |
| internal comment |
| ) |
| more |
| ) 3, 'comments can be nested with same brackets'; |
| |
| # TODO: |
| # ok eval(" #{ comment }") failes with an error as it tries to execute |
| # comment() before seeing that I meant #{ comment within this string. |
| |
× | ok eval(" #<<\n comment\n # >>\n >> 3"), |
| 'single line comment cannot correctly nested within multiline', :todo<bug>; |
| } |
| |
There must be no space between the # and the opening bracket character. (There may be the visual appearance of space for some double-wide characters, however, such as the corner quotes above.)
From t/syntax/comments.t lines 50–59 (4 √, 0 ×):
| # L<S02/"Whitespace and Comments"/"no space" between "#" and bracket> |
| { |
| |
√ | ok !eval("3 * # (invalid comment) 2"), "no space allowed between '#' and '('"; |
√ | ok !eval("3 * #\t[invalid comment] 2"), "no tab allowed between '#' and '['"; |
√ | ok !eval("3 * # \{invalid comment\} 2"), "no spaces allowed between '#' and '\{'"; |
√ | ok !eval("3 * #\n<invalid comment> 2"), "no spaces allowed between '#' and '<'"; |
| |
| } |
| |
An embedded comment is not allowed as the first thing on the line.
#sub foo # line-end comment
#{ # ILLEGAL, syntax error
# ...
#}
If you wish to have a comment there, you must disambiguate it to either an embedded comment or a line-end comment. You can put a space in front of it to make it an embedded comment:
#sub foo # line end comment
 #{ # okay, comment
   ... # extends
 } # to here
Or you can put something other than a single # to make it a line-end comment. Therefore, if you are commenting out a block of code using the line-comment form, we recommend that you use ##, or # followed by some whitespace, preferably a tab to keep any tab formatting consistent:
##sub foo
##{ # okay
## ...
##}
# sub foo
# { # okay
# ...
# }
#	sub foo
#	{ # okay
#	...
#	}
However, it's often better to use pod comments because they are implicitly line-oriented. And if you have an intelligent syntax highlighter that will mark pod comments in a different color, there's less visual need for a # on every line.
From t/syntax/comments.t lines 60–77 (4 √, 0 ×):
| # L<S02/"Whitespace and Comments"/"closed by" "same number of" |
| # "closing brackets"> |
| { |
| |
√ | ok #<<< |
| Or this <also> works... |
| >>> 1, '#<<<...>>>'; |
| |
| my $var = \#((( comment ))) 12; |
√ | is $var, 12, '#(((...)))'; |
| |
| $var #<< < >> = 25; |
√ | is $var, 25, '#<< < >>'; |
| |
| $var = #<< > >> 36; |
√ | is $var, 36, '#<< > >>'; |
| } |
| |
From t/syntax/comments.t lines 115–127 (2 √, 0 ×):
| # L<S02/"Whitespace and Comments"/"Counting of nested brackets" |
| # "applies only to" "pairs of brackets of the same length"> |
| { |
√ | is -1 #<<< |
| Even <this> <<< also >>> works... |
| >>>, -1, 'nested brackets in embedded comment'; |
| |
√ | is 'cat', #{{ |
| This comment contains unmatched } and { { { { (ignored) |
| Plus a nested {{ ... }} pair (counted) |
| }} 'cat', 'embedded comments with nested/unmatched bracket chars'; |
| } |
| |
say #{{
This comment contains unmatched } and { { { { (ignored)
Plus a nested {{ ... }} pair (counted)
}} q<< <<woot>> >> # says " <<woot>> "
Note however that bare circumfix or postcircumfix <<...>> is not a user-selected bracket, but the ASCII variant of the «...» interpolating word list. Only # and the q-style quoters (including m, s, tr, and rx) enable subsequent user-selected brackets.
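The bracket counting seen in the `#{{ ... }}` example can be modelled with a small scanner. The following Python sketch is a simplified model under stated assumptions, not the Perl 6 grammar: the function name `end_of_embedded_comment` and the table `PAIRS` are hypothetical, only ASCII bracket pairs are handled, and a longer run of brackets inside the comment is scanned left to right in opener-sized chunks.

```python
# Hypothetical bracket table: ASCII pairs only, for illustration.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}

def end_of_embedded_comment(src, start):
    """Return the index just past the embedded comment beginning at src[start].

    Raises ValueError if src[start:] is not a well-formed embedded comment.
    """
    if not src.startswith("#", start):
        raise ValueError("expected '#'")
    pos = start + 1
    # No whitespace is allowed between '#' and the opening bracket.
    if pos >= len(src) or src[pos] not in PAIRS:
        raise ValueError("not an embedded comment")
    opener = src[pos]
    closer = PAIRS[opener]
    # Measure the run of repeated opening brackets, e.g. '{{' or '<<<'.
    run = 0
    while pos + run < len(src) and src[pos + run] == opener:
        run += 1
    open_run, close_run = opener * run, closer * run
    pos += run
    depth = 1
    while pos < len(src):
        if src.startswith(open_run, pos):
            # A nested opener of the same length is counted.
            depth += 1
            pos += run
        elif src.startswith(close_run, pos):
            depth -= 1
            pos += run
            if depth == 0:
                return pos
        else:
            # Shorter or unmatched brackets are ordinary comment text.
            pos += 1
    raise ValueError("unterminated embedded comment")

src = "say #{{ a } {{ b }} c }} 42"
print(src[end_of_embedded_comment(src, src.index("#")):])
```

Only bracket runs of the same length as the opener affect the depth, which is why the lone `}` in the sample string is skipped while the inner `{{ b }}` pair is counted.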
\. This is known as the "unspace". An unspace can suppress any of several whitespace dependencies in Perl. For example, since Perl requires an absence of whitespace between a noun and a postfix operator, using unspace lets you line up postfix operators:
From t/spec/S02-whitespace_and_comments/unspace.t lines 7–167 (no results):
| # L<S02/"Whitespace and Comments"/This is known as the "unspace">
|
| is(4\ .sqrt, 2, 'unspace with numbers');
| is(4\#(quux).sqrt, 2, 'unspace with comments');
| is("x"\ .bytes, 1, 'unspace with strings');
| is("x"\ .bytes(), 1, 'unspace with strings + parens');
|
| my $foo = 4;
| is(eval('$foo.++'), 4, '(short) unspace with postfix inc');
| is($foo, 5, '(short) unspace with postfix inc really postfix');
| is(eval('$foo\ .++'), 5, 'unspace with postfix inc');
| is($foo, 6, 'unspace with postfix inc really postfix');
| is(eval('$foo\ .--'), 6, 'unspace with postfix dec');
| is($foo, 5, 'unspace with postfix dec really postfix');
|
| is("xxxxxx"\.bytes, 6, 'unspace without spaces');
| is("xxxxxx"\
| .bytes, 6, 'unspace with newline');
|
| is((:foo\ .("bar")), ('foo' => "bar"), 'unspace with adverb');
|
| is( ~([1,2,3]\ .[2,1,0]), "3 2 1", 'unspace on postfix subscript');
|
| my @array = 1,2,3;
|
| eval "
| @array\ .>>++;
| @array>>\ .++;
| @array\ .>>\ .++;
| @array\ .»++;
| @array»\ .++;
| @array\ .»\ .++;
| ";
| #?pugs todo 'unimpl'
| is( ~@array, "7 8 9", 'unspace with postfix hyperops');
|
|
| #Test the "unspace" and unspace syntax
|
| class Str is also {
| method id($x:) { $x }
| }
|
| #This makes 'foo.id' and 'foo .id' mean different things
| multi foo() { 'a' }
| multi foo($x) { $x }
|
| #This should do the same, but currently doesn't
| sub bar($x? = 'a') { $x }
|
| $_ = 'b';
|
| #XXX why is eval required here?
| is(eval('foo.id'), 'a', 'sanity - foo.id');
| is(eval('foo .id'), 'b', 'sanity - foo .id');
| is(eval('bar.id'), 'a', 'sanity - bar.id');
| is(eval('bar .id'), 'b', 'sanity - bar .id');
| is(eval('foo\.id'), 'a', 'short unspace');
| is(eval('foo\ .id'), 'a', 'unspace');
| is(eval('foo \ .id'), 'b', 'not a unspace');
| is(eval('fo\ o.id'), undef, 'unspace not allowed in identifier');
| is(eval('foo\ .id'), 'a', 'longer dot');
| is(eval('foo\#( comment ).id'), 'a', 'unspace with embedded comment');
| is(eval('foo\#\ ( comment ).id'), undef, 'unspace can\'t hide space between # and opening bracket');
| is(eval('foo\ # comment
| .id'), 'a', 'unspace with end-of-line comment');
| is(eval(':foo\ <bar>'), (:foo<bar>), 'unspace in colonpair');
| is(eval('foo\ .\ ("x")'), 'x', 'unspace is allowed both before and after method .');
| is(eval('foo\
| =begin comment
| blah blah blah
| =end comment
| .id'), 'a', 'unspace with pod =begin/=end comment');
| is(eval('foo\
| =for comment
| blah
| blah
| blah
|
| .id'), 'a', 'unspace with pod =for comment');
| is(eval('foo\
| =comment blah blah blah
| .id'), 'a', 'unspace with pod =comment');
| #This is pretty strange: according to Perl-6.0.0-STD.pm,
| #unspace is allowed after a pod = ... which means pod is
| #syntactically recursive, i.e. you can put pod comments
| #inside pod directives recursively!
| is(eval('foo\
| =\ begin comment
| blah blah blah
| =\ end comment
| .id'), 'a', 'unspace with pod =begin/=end comment w/ pod unspace');
| is(eval('foo\
| =\ for comment
| blah
| blah
| blah
|
| .id'), 'a', 'unspace with pod =for comment w/ pod unspace');
| is(eval('foo\
| =\ comment blah blah blah
| .id'), 'a', 'unspace with pod =comment w/ pod unspace');
| is(eval('foo\
| =\
| =begin nested pod
| blah blah blah
| =end nested pod
| begin comment
| blah blah blah
| =\
| =begin nested pod
| blah blah blah
| =end nested pod
| end comment
| .id'), 'a', 'unspace with pod =begin/=end comment w/ pod-in-pod');
| is(eval('foo\
| =\
| =for nested pod
| blah
| blah
| blah
|
| for comment
| blah
| blah
| blah
|
| .id'), 'a', 'unspace with pod =for commenti w/ pod-in-pod');
| is(eval('foo\
| =\
| =nested pod blah blah blah
| comment blah blah blah
| .id'), 'a', 'unspace with pod =comment w/ pod-in-pod');
| is(eval('foo\
| =\ #1
| =\ #2
| =\ #3
| =comment blah blah blah
| for comment #3
| blah
| blah
| blah
|
| begin comment #2
| blah blah blah
| =\ #4
| =comment blah blah blah
| end comment #4
| begin comment #1
| blah blah blah
| =\ #5
| =\ #6
| =for comment
| blah
| blah
| blah
|
| comment blah blah blah #6
| end comment #5
| .id'), 'a', 'hideous nested pod torture test');
|
%hash\ {$key}
@array\ [$ix]
$subref\($arg)
As a special case to support the use above, a backslash where a postfix is expected is considered a degenerate form of unspace. Note that whitespace is not allowed before that, hence
$subref \($arg)
is a syntax error (two terms in a row). And
foo \($arg)
will be parsed as a list operator with a Capture argument:
foo(\($arg))
However, other forms of unspace may usefully be preceded by whitespace. (Unary uses of backslash may therefore never be followed by whitespace or they would be taken as an unspace.)
Other postfix operators may also make use of unspace:
$number\ ++;
$number\ --;
1+3\ i;
$object\ .say();
$object\#{ your ad here }.say
Another common use of a you-don't-see-this-space is to put a dotted postfix on the next line:
$object\ # comment
.say
$object\#[ comment
].say
$object\
.say
But unspace is mainly about language extensibility: it lets you continue the line in any situation where a newline might confuse the parser, regardless of your currently installed parser. (Unless, of course, you override the unspace rule itself...)
Although we say that the unspace hides the whitespace from the parser, it does not hide whitespace from the lexer. As a result, unspace is not allowed within a token. Additionally, line numbers are still counted if the unspace contains one or more newlines. A # following such a newline is always an end-of-line comment, as described above. Since Pod chunks count as whitespace to the language, they are also swallowed up by unspace. Heredoc boundaries are suppressed, however, so you can split excessively long heredoc intro lines like this:
ok(q:to'CODE', q:to'OUTPUT', \
"Here is a long description", \ # --more--
todo(:parrøt<0.42>, :dötnet<1.2>));
...
CODE
...
OUTPUT
To the heredoc parser that just looks like:
ok(q:to'CODE', q:to'OUTPUT', "Here is a long description", todo(:parrøt<0.42>, :dötnet<1.2>));
...
CODE
...
OUTPUT
Note that this is one of those cases in which it is fine to have whitespace before the unspace, since we're only trying to suppress the newline transition, not all whitespace as in the case of postfix parsing. (Note also that the example above is not meant to spec how the test suite works. :)
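The collapsing of the heredoc intro lines shown above can be approximated as a text rewrite. The Python sketch below is a deliberately crude model, assuming that an unspace is a backslash followed by whitespace and, optionally, end-of-line comments. The name `strip_unspace` is hypothetical. The model ignores embedded comments and Pod blocks, knows nothing about string literals, and would wrongly join the two halves of a token, which the real rule forbids.

```python
import re

# A backslash, at least one whitespace character, then any number of
# "# ... newline" comments, each followed by optional indentation.
_UNSPACE = re.compile(r"\\\s+(?:#[^\n]*\n\s*)*")

def strip_unspace(src):
    """Remove each unspace from src, leaving surrounding text untouched."""
    return _UNSPACE.sub("", src)

intro = (
    "ok(q:to'CODE', q:to'OUTPUT', \\\n"
    "    \"Here is a long description\", \\ # --more--\n"
    "    todo(:parrøt<0.42>, :dötnet<1.2>));"
)
print(strip_unspace(intro))
```

Whitespace that precedes the backslash survives the rewrite, which is why the arguments in the collapsed line are still separated by a single space.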
From t/syntax/comments.t lines 140–146 (2 √, 0 ×):
| # L<S02/Whitespace and Comments/"comment may not contain an unspace"> |
| { |
| my $a; |
√ | ok !eval '$a = #\ (comment) 32', "comments can't contain unspace"; |
√ | is $a, undef, '$a remains undef'; |
| } |
| |
#\ (...
it is an end-of-line comment, not an embedded comment. Write:
\ #(
...
)
to mean the other thing.
This is an unchanging deep rule, but the surface ramifications of it change as various operators and macros are added to or removed from the language, which we expect to happen because Perl 6 is designed to be a mutable language. In particular, there is a natural conflict between postfix operators and infix operators, either of which may occur after a term. If a given token may be interpreted as either a postfix operator or an infix operator, the infix operator requires space before it. Postfix operators may never have intervening space, though they may have an intervening dot. If further separation is desired, an embedded comment may be used as described above, as long as no whitespace occurs outside the embedded comment.
From t/spec/S02-whitespace_and_comments/unspace.t lines 191–249 (no results):
| # L<S02/"Whitespace and Comments"/"natural conflict between postfix operators and infix operators">
|
| #This creates syntactic ambuguity between
| #($n) ++ ($m)
| #($n++) $m
| #($n) (++$m)
| #($n) + (+$m)
|
| my $n = 1;
| my $m = 2;
| sub infix:<++>($x, $y) { 42 }
|
| #'$n++$m' should be a syntax error
| is(eval('$n++$m'), undef, 'infix requires space when ambiguous with postfix');
| is($n, 1, 'check $n');
| is($m, 2, 'check $m');
|
| #'$n ++$m' should be infix:<++>
| #no, really: http://moritz.faui2k3.org/irclog/out.pl?channel=perl6;date=2007-05-09#id_l328
| $n = 1; $m = 2;
| is(eval('$n ++$m'), 42, '$n ++$m with infix:<++> is $n ++ $m');
| is($n, 1, 'check $n');
| is($m, 2, 'check $m');
|
| #'$n ++ $m' should be infix:<++>
| $n = 1; $m = 2;
| is(eval('$n ++ $m'), 42, 'postfix requires no space w/ infix ambiguity');
| is($n, 1, 'check $n');
| is($m, 2, 'check $m');
|
| #These should all be postfix syntax errors
| $n = 1; $m = 2;
| is(eval('$n.++ $m'), undef, 'postfix dot w/ infix ambiguity');
| is(eval('$n\ ++ $m'), undef, 'postfix unspace w/ infix ambiguity');
| is(eval('$n\ .++ $m'), undef, 'postfix unspace w/ infix ambiguity');
| is($n, 1, 'check $n');
| is($m, 2, 'check $m');
|
| #Unspace inside operator splits it
| $n = 1; $m = 2;
| is(eval('$n+\ +$m'), 3, 'unspace inside operator splits it');
| is($n, 1, 'check $n');
| is($m, 2, 'check $m');
|
| $n = 1;
| is(eval('$n ++'), undef, 'postfix requires no space');
| is($n, 1, 'check $n');
|
| $n = 1;
| is(eval('$n.++'), 1, 'postfix dot');
| is($n, 2, 'check $n');
|
| $n = 1;
| is(eval('$n\ ++'), 1, 'postfix unspace');
| is($n, 2, 'check $n');
|
| $n = 1;
| is(eval('$n\ .++'), 1, 'postfix unspace');
| is($n, 2, 'check $n');
|
For instance, if you were to add your own infix:<++> operator, then it must have space before it. The normal autoincrementing postfix:<++> operator may never have space before it, but may be written in any of these forms:
From t/01-sanity/02-counter.t lines 1–19 (no results):
| # L<S02/"Whitespace and Comments"/space before it, but may be written in any> |
| use v6-alpha; |
| |
| # Checking that testing is sane: counted tests |
| |
| |
| say '1..4'; |
| |
| my $counter = 1; |
| say "ok $counter"; |
| |
| $counter++; |
| say "ok $counter"; |
| |
| ++$counter; |
| say 'ok ', $counter; |
| |
| ++$counter; |
| say 'ok ' ~ $counter; |
$x++
$x.++
$x\ ++
$x\ .++
$x\#( comment ).++
$x\#((( comment ))).++
$x\
.++
$x\ # comment
# inside unspace
.++
$x\ # comment
# inside unspace
++ # (but without the optional postfix dot)
$x\#『 comment
more comment
』.++
$x\#[ comment 1
comment 2
=begin podstuff
whatever (pod comments ignore current parser state)
=end podstuff
comment 3
].++
A consequence of the postfix rule is that (except when delimiting a quote or terminating an unspace) a dot with whitespace in front of it is always considered a method call on $_ where a term is expected. If a term is not expected at this point, it is a syntax error. (Unless, of course, there is an infix operator of that name beginning with dot. You could, for instance, define a Fortranly infix:<.EQ.> if the fit took you. But you'll have to be sure to always put whitespace in front of it, or it would be interpreted as a postfix method call instead.)
For example,
foo .method
and
foo
.method
will always be interpreted as
foo $_.method
but never as
foo.method
Use some variant of
foo\
.method
if you mean the postfix method call.
One consequence of all this is that you may no longer write a Num as 42. with just a trailing dot. You must instead say either 42 or 42.0. In other words, a dot following a number can only be a decimal point if the following character is a digit. Otherwise the postfix dot will be taken to be the start of some kind of method call syntax, whether long-dotty or not. (The .123 form with a leading dot is still allowed, however, when a term is expected, and is equivalent to 0.123 rather than $_.123.)
From t/spec/S02-whitespace_and_comments/unspace.t lines 254–259 (no results): (skip)
| # L<S02/"Whitespace and Comments"/".123">
| # .123 is equal to 0.123
|
| is eval(' .123'), 0.123, ' .123 is equal to 0.123';
| is eval('.123'), 0.123, '.123 is equal to 0.123';
|
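The decimal-point rule above can be sketched as a tiny tokenizer. This is a Python illustration rather than Perl 6, and the names NUMBER and leading_number are hypothetical; it models only the position where a term is expected and is not how any Perl 6 parser is actually written.

```python
import re

# Hypothetical mini-tokenizer: a dot is part of a number only when
# a digit follows it. The second alternative covers the .123 form.
NUMBER = re.compile(r"\d+(?:\.\d+)?|\.\d+")

def leading_number(src):
    """Return (number_text, rest) for a numeric literal at the start of src."""
    m = NUMBER.match(src)
    if m is None:
        return None, src
    return m.group(0), src[m.end():]

# "42." yields the number "42" and leaves the dot behind, so whatever
# follows is free to be parsed as the start of a postfix or method call.
print(leading_number("42."))
print(leading_number("42.abs"))
print(leading_number(".123"))
```

Leaving the trailing dot unconsumed is the point of the sketch: the dot stays available for the postfix syntax instead of being absorbed into the literal.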
Array of Int is equivalent to another Array of Int. (Another way to look at it is that the type instantiation "factory" is memoized.) Typename aliases are considered equivalent to the original type.
This name equivalence of parametric types extends only to parameters that can be considered immutable (or that at least can have an immutable snapshot taken of them). Two distinct classes are never considered equivalent, even if they have the same attributes, because classes are not considered immutable.
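As a rough analogy in Python (not Perl 6), the "memoized factory" view of parametric types might look like the following. The factory array_of and the classes PointA and PointB are hypothetical names invented for this sketch; the cache works only because the parameter (a type object) is hashable and is treated as immutable.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def array_of(elem_type):
    """Hypothetical factory: build, once per parameter, a list subclass
    whose append accepts only elem_type."""
    class TypedArray(list):
        element_type = elem_type

        def append(self, item):
            if not isinstance(item, elem_type):
                raise TypeError(f"expected {elem_type.__name__}")
            super().append(item)

    return TypedArray

# A typename alias refers to the very same type object.
IntArray = array_of(int)

# Two separately declared classes with identical attributes stay distinct.
class PointA:
    def __init__(self):
        self.x = 0

class PointB:
    def __init__(self):
        self.x = 0

print(array_of(int) is array_of(int))   # same instantiation both times
print(PointA is PointB)                 # distinct classes
```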
According to S12, properties are actually implemented by a kind of mixin mechanism, and such mixins are accomplished by the generation of an individual anonymous class for the object (unless an identical anonymous class already exists and can safely be shared).
From t/spec/S02-builtin_data_types/type.t lines 10–36 (no results): (skip)
| # L<S02/"Built-In Data Types"/"A variable's type is a constraint indicating what sorts">
|
| plan 8;
|
| #?rakudo skip 'unimpl: try block does not return its value yet'
| {
| ok(try{my Int $foo; 1}, 'compile my Int $foo');
| ok(try{my Str $bar; 1}, 'compile my Str $bar');
| }
|
| ok(do{my Int $foo; $foo ~~ Int}, 'Int $foo isa Int');
| ok(do{my Str $bar; $bar ~~ Str}, 'Str $bar isa Str');
|
| my Int $foo;
| my Str $bar;
|
| #?rakudo skip 'type checking unimpl'
| {
| #?pugs 1 todo
| is(try{$foo = 'xyz'}, undef, 'Int restricts to integers');
| is(try{$foo = 42}, 42, 'Int is an integer');
|
| #?pugs 1 todo
| is(try{$bar = 42}, undef, 'Str restricts to strings');
| is(try{$bar = 'xyz'}, 'xyz', 'Str is a strings');
| }
|
# $x can contain only Int objects
my Int $x;
A variable may itself be bound to a container type that specifies how the container works, without specifying what kinds of things it contains.
# $x is implemented by the MyScalar class
my $x is MyScalar;
Constraints and container types can be used together:
# $x can contain only Int objects,
# and is implemented by the MyScalar class
my Int $x is MyScalar;
Note that $x is also initialized to ::Int. See below for more on this.
my Dog $spot by itself does not automatically call a Dog constructor. It merely assigns an undefined Dog prototype object to $spot:
my Dog $spot; # $spot is initialized with ::Dog
my Dog $spot = Dog; # same thing
$spot.defined; # False
say $spot; # "Dog"
Any class name used as a value by itself is an undefined instance of that class's prototype, or protoobject. See S12 for more on that. (Any type name in rvalue context is parsed as a list operator indicating a typecast, but an argumentless one of these degenerates to a typecast of undef, producing the protoobject.)
To get a real Dog object, call a constructor method such as new:
my Dog $spot .= new;
my Dog $spot = $spot.new; # .= is rewritten into this
You can pass in arguments to the constructor as well:
my Dog $cerberus .= new(heads => 3);
my Dog $cerberus = $cerberus.new(heads => 3); # same thing
my int @array is MyArray;
you are declaring that the elements of @array are native integers, but that the array itself is implemented by the MyArray class. Untyped arrays and hashes are still perfectly acceptable, but have the same performance issues they have in Perl 5.
.elems method. You can also ask for the total string length of an array's elements, in bytes, codepoints or graphemes, using the methods .bytes, .codes or .graphs, respectively, on the array. The same methods apply to strings as well. (Note that .bytes is not guaranteed to be well-defined when the encoding is unknown. Similarly, .codes is not well-defined unless you know which canonicalization is in effect. Hence, both methods allow an optional argument to specify the meaning exactly if it cannot be known from context.)
From t/builtins/strings/unicode.t lines 6–31 (2 √, 0 ×): (skip)
| #L<S02/"Built-In Data Types"/".bytes, .codes or .graphs"> |
| |
| # LATIN CAPITAL LETTER A, COMBINING GRAVE ACCENT |
| my Str $u = "\x[0041,0300]"; |
| ok $u does utf8, 'can specify Str is utf8'; |
| is $u.bytes, 3, 'combining À is three bytes as utf8'; |
| is $u.codes, 2, 'combining À is two codes'; |
| is $u.graphs, 1, 'combining À is one graph'; |
√ | is "foo\r\nbar".codes, 8, 'CRLF is 2 codes'; |
| is "foo\r\nbar".graphs, 7, 'CRLF is 1 graph'; |
| |
| # Speculation, .chars is unspecced, also use Bytes etc. |
| is $u.chars, 1, '.chars defaults to .graphs'; |
| |
| #?pugs 10 todo '' |
| use_ok 'Bytes', 'use bytes works'; |
√ | is $u.chars, 3, '.chars as bytes'; |
| use_ok 'Codes', 'use codes works'; |
| is $u.chars, 2, '.chars as codes'; |
| use_ok 'Graphs', 'use graphs works'; |
| is $u.chars, 1, '.chars as graphs'; |
| ok $u does utf16, 'can specify Str is utf16'; |
| is $u.bytes, 4, '.bytes in utf16'; |
| ok $u does utf32, 'can specify Str is utf32'; |
| is $u.bytes, 8, '.bytes in utf32'; |
| |
There is no .length method for either arrays or strings, because length does not specify a unit.
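To make the three units concrete, here is a small Python sketch (an analogy, not Perl 6) that measures the same string in bytes, codepoints and graphemes. The function names nbytes, codes and graphs are chosen to echo the methods above but are otherwise hypothetical, and the grapheme count is a deliberate simplification that handles only combining marks and CRLF, not the full Unicode segmentation rules.

```python
import unicodedata

def nbytes(s, encoding="utf-8"):
    """Length in bytes; meaningless until an encoding is chosen."""
    return len(s.encode(encoding))

def codes(s):
    """Length in codepoints (Python 3 strings are codepoint sequences)."""
    return len(s)

def graphs(s):
    """Approximate grapheme count: a combining mark joins the previous
    character, and CR followed by LF counts once. Simplified on purpose."""
    count = 0
    prev = ""
    for ch in s:
        joins_prev = (prev != "" and unicodedata.combining(ch) != 0) or (
            prev == "\r" and ch == "\n"
        )
        if not joins_prev:
            count += 1
        prev = ch
    return count

# LATIN CAPITAL LETTER A followed by COMBINING GRAVE ACCENT
u = "A\u0300"
print(nbytes(u, "utf-8"), nbytes(u, "utf-16-le"), nbytes(u, "utf-32-le"))
print(codes(u), graphs(u))
```

The same two-codepoint string gives a different answer for every unit and encoding, which is why a bare "length" would be ambiguous. No normalization is applied here, so the codepoint count reflects the string exactly as written.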
Int, Num, Complex, Rat, Str, Bit, Regex, Set, Junction, Code, Block, List, Seq), as well as mutable (container) types, such as Scalar, Array, Hash, Buf, Routine, Module, etc.
Non-object (native) types are lowercase: int, num, complex, rat, buf, bit. Native types are primarily intended for declaring compact array storage. However, Perl will try to make those look like their corresponding uppercase types if you treat them that way. (In other words, it does autoboxing. Note, however, that sometimes repeated autoboxing can slow your program more than the native type can speed it up.)
Some object types can behave as value types. Every object can produce a "WHICH" value that uniquely identifies the object for hashing and other value-based comparisons. Normal objects just use their address in memory, but if a class wishes to behave as a value type, it can define a .WHICH method that makes different objects look like the same object if they happen to have the same contents.
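The distinction can be sketched in Python (as an analogy only). The method name which and the classes Plain and Money are hypothetical; Python's id() stands in for "address in memory", and a set stands in for hashing and other value-based comparison.

```python
class Plain:
    """Ordinary object: its identity value is derived from where it lives."""
    def __init__(self, n):
        self.n = n

    def which(self):
        return ("Plain", id(self))

class Money:
    """Behaves as a value type: its identity value is derived from contents."""
    def __init__(self, amount, currency):
        self.amount = amount
        self.currency = currency

    def which(self):
        return ("Money", self.amount, self.currency)

def count_distinct(objs):
    """Count objects that differ under value-based comparison of which()."""
    return len({o.which() for o in objs})

# Two Plain objects with equal contents remain two objects...
print(count_distinct([Plain(1), Plain(1)]))
# ...while two Money objects with equal contents look like the same object.
print(count_distinct([Money(5, "EUR"), Money(5, "EUR"), Money(6, "EUR")]))
```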
Object, Whatever and Failure objects. See S04 for more about failures (i.e. unthrown exceptions):
my Int $x = undef; # works
Variables with native types do not support undefinedness: it is an error to assign an undefined value to them:
my int $y = undef; # dies
Conjecture: num might support the autoconversion of undef to NaN, since the floating-point form can represent this concept. Might be better to make that conversion optional though, so that the rocket designer can decide whether to self-destruct immediately or shortly thereafter.
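Python's array module offers a loose analogy for the difference between object-typed and native-typed storage; the sketch below is an illustration in Python, not a statement about how Perl 6 implements native types. The helper try_store_undefined is a hypothetical name. The typecode "i" gives compact machine integers with no slot for an undefined value, whereas "d" gives machine floats, which can at least represent NaN.

```python
from array import array
import math

# Object storage: any slot may hold an undefined value.
boxed = [1, None, 3]

# Native storage: compact machine integers.
native_ints = array("i", [1, 2, 3])

def try_store_undefined(arr):
    """Attempt to append None; report whether the container accepted it."""
    try:
        arr.append(None)
        return True
    except TypeError:
        return False

# Native floats cannot hold None either, but they can hold NaN.
native_floats = array("d", [1.5])
native_floats.append(float("nan"))

print(try_store_undefined(native_ints))
print(math.isnan(native_floats[1]))
```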
Variables of non-native types start out containing an undefined value unless explicitly initialized to a defined value.
HOW function/method that returns the metaclass instance managing it, regardless of whether the object is defined:
'x'.HOW.methods; # get available methods for strings
Str.HOW.methods; # same thing with the prototype object Str
HOW(Str).methods; # same thing as function call
'x'.methods; # this is likely an error - not a meta object
Str.methods; # same thing
(For a prototype system (a non-class-based object system), all objects are merely managed by the same meta object.)
Int automatically supports promotion to arbitrary precision, as well as holding Inf and NaN values. Note that Int assumes 2's complement arithmetic, so +^1 == -2 is guaranteed. (Native int operations need not support this on machines that are not natively 2's complement. You must convert to and from Int to do portable bitops on such ancient hardware.)
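Python integers happen to share two of these properties, so they make a convenient illustration (in Python, not Perl 6): they promote to arbitrary precision, and their bitwise operators behave as if on an infinitely sign-extended 2's complement representation. Python's ~ plays the role of the +^ prefix here. The sketch does not model Inf or NaN, which Python integers cannot hold.

```python
def bit_not(n):
    """Bitwise negation under 2's complement semantics: always -n - 1."""
    return ~n

# A value that no longer fits in 64 bits is still an ordinary integer.
big = 2**64 + 1

print(bit_not(1))
print(bit_not(big) == -big - 1)
print(big * big == 2**128 + 2**65 + 1)
```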
From t/data_types/num.t lines 11–22 (4 √, 0 ×):