Synopsis 2: Bits and Pieces
Larry Wall <larry@wall.org>
Created: 10 Aug 2004
Last Modified: 1 Feb 2009
Version: 198
This document summarizes Apocalypse 2, which covers small-scale lexical items and typological issues. (These Synopses also contain updates to reflect the evolving design of Perl 6 over time, unlike the Apocalypses, which are frozen in time as "historical documents". These updates are not marked--if a Synopsis disagrees with its Apocalypse, assume the Synopsis is correct.)
To the extent allowed by sublanguages' parsers, Perl is parsed using a one-pass, predictive parser. That is, lookahead of more than one "longest token" is discouraged. The currently known exceptions to this are where the parser must:
[...] composer.
In practice, though, you're safest using matching characters with Ps/Pe/Pi/Pf properties, though ASCII angle brackets are a notable exception, since they're bidirectional but not in the Ps/Pe/Pi/Pf sets.
Characters with no corresponding closing character do not qualify as opening brackets. This includes the second section of the Unicode BidiMirroring data table.
If a character is already used in Ps/Pe/Pi/Pf mappings, then any entry in BidiMirroring is ignored (both forward and backward mappings). For any given Ps character, the next Pe codepoint (in numerical order) is assumed to be its matching character even if that is not what you might guess using left-right symmetry. Therefore U+298D maps to U+298E, not U+2990, and U+298F maps to U+2990, not U+298E. Neither U+298E nor U+2990 is a valid bracket opener, despite having reverse mappings in the BidiMirroring table.
The U+301D codepoint has two closing alternatives, U+301E and U+301F; Perl 6 only recognizes the one with lower code point number, U+301E, as the closing brace. This policy also applies to new one-to-many mappings introduced in the future.
However, many-to-one mappings are fine; multiple opening characters may map to the same closing character. For instance, U+2018, U+201A, and U+201B may all be used as the opener for the U+2019 closer. Constructs that count openers and closers assume that only the given opener is special. That is, if you open with one of the alternatives, all other alternatives are treated as non-bracketing characters within that construct.
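The Ps/Pe pairing rule described above can be modeled outside of Perl. The following is a minimal Python sketch, not the reference algorithm: it relies on Python's unicodedata module for general categories, and the helper name closer_for is hypothetical. It covers only Ps openers and ignores the Pi/Pf and BidiMirroring cases.

```python
import unicodedata

def closer_for(opener):
    """Return the closer for a Ps opener: the next Pe codepoint in
    numerical order. Returns None if the character is not a Ps opener."""
    if unicodedata.category(opener) != "Ps":
        return None
    for cp in range(ord(opener) + 1, 0x110000):
        if unicodedata.category(chr(cp)) == "Pe":
            return chr(cp)
    return None

# U+298D pairs with U+298E rather than the visually symmetric U+2990.
print(hex(ord(closer_for("\u298d"))))
# Scanning upward picks the lower of the two candidate closers for U+301D.
print(hex(ord(closer_for("\u301d"))))
```

Because the scan runs upward from the opener, a one-to-many mapping resolves to the closer with the lowest codepoint, which is the policy the text states for U+301D.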
=begin comment and =end comment delimit a Pod block correctly without the need for =cut. (In fact, =cut is now gone.) The format name does not have to be comment -- any unrecognized format name will do to make it a comment. (However, bare =begin and =end probably aren't good enough, because all comments in them will show up in the formatted output.)
We have single paragraph comments with =for comment as well. That lets =for keep its meaning as the equivalent of a =begin and =end combined. As with =begin and =end, a comment started in code reverts to code afterwards.
Since there is a newline before the first =, the Pod form of comment counts as whitespace equivalent to a newline. See S26 for more on embedded documentation.
The # character always introduces a comment in Perl 6. There are two forms of comment based on #. Embedded comments require the # to be followed by a backtick (`) plus one or more opening bracketing characters.
All other uses of # are interpreted as single-line comments that work just as in Perl 5, starting with a # character and ending at the subsequent newline. They count as whitespace equivalent to newline for purposes of separation. Unlike in Perl 5, # may not be used as the delimiter in quoting constructs.
#` plus any user-selected bracket characters (as defined in "Lexical Conventions" above):
say #`( embedded comment ) "hello, world!";
$object\#`{ embedded comments }.say;
$object\ #`「
embedded comments
」.say;
Brackets may be nested, following the same policy as ordinary quote brackets.
There must be no space between the #` and the opening bracket character. (There may be the visual appearance of space for some double-wide characters, however, such as the corner quotes above.)
For multiline comments it is recommended (but not required) to use two or more brackets both for visual clarity and to avoid relying too much on internal bracket counting heuristics when commenting code that may accidentally miscount single brackets:
#`{{
say "here is an unmatched } character";
}}
However, it's sometimes better to use Pod comments because they are implicitly line-oriented.
say #`{{
This comment contains unmatched } and { { { { (ignored)
Plus a nested {{ ... }} pair (counted)
}} q<< <<woot>> >> # says " <<woot>> "
Note however that bare circumfix or postcircumfix <<...>> is not a user-selected bracket, but the ASCII variant of the «...» interpolating word list. Only #` and the q-style quoters (including m, s, tr, and rx) enable subsequent user-selected brackets.
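The bracket counting for embedded comments can be sketched as a small scanner. This is an illustrative Python model under stated assumptions, not the actual parser: the function name embedded_comment_end and the PAIRS table are hypothetical, only a handful of bracket pairs are listed, and the handling of bracket runs longer than the opening run is left unspecified.

```python
# Hypothetical subset of the user-selectable bracket pairs.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">", "「": "」"}

def embedded_comment_end(src, start):
    """Given src[start:] beginning with '#`' plus one or more identical
    opening brackets, return the index just past the matching closer."""
    if not src.startswith("#`", start):
        raise ValueError("not an embedded comment")
    i = start + 2
    opener = src[i]
    closer = PAIRS[opener]
    n = 0
    while i < len(src) and src[i] == opener:
        n += 1
        i += 1
    open_tok, close_tok = opener * n, closer * n
    depth = 1
    while i < len(src):
        if src.startswith(open_tok, i):      # nested opener of the same width
            depth += 1
            i += n
        elif src.startswith(close_tok, i):   # closer of the same width
            depth -= 1
            i += n
            if depth == 0:
                return i
        else:
            i += 1                           # anything else is comment text
    raise SyntaxError("unterminated embedded comment")

src = '#`{{\nsay "here is an unmatched } character";\n}}\nsay 42;'
print(repr(src[embedded_comment_end(src, 0):]))
```

With a doubled opener, a lone } inside the comment never matches the }} closer, which is the reason the text recommends two or more brackets for multiline comments.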
\. This is known as the "unspace". An unspace can suppress any of several whitespace dependencies in Perl. For example, since Perl requires an absence of whitespace between a noun and a postfix operator, using unspace lets you line up postfix operators:
%hash\  {$key}
@array\ [$ix]
$subref\($arg)
As a special case to support the use above, a backslash where a postfix is expected is considered a degenerate form of unspace. Note that whitespace is not allowed before that, hence
$subref \($arg)
is a syntax error (two terms in a row). And
foo \($arg)
will be parsed as a list operator with a Capture argument:
foo(\($arg))
However, other forms of unspace may usefully be preceded by whitespace. (Unary uses of backslash may therefore never be followed by whitespace or they would be taken as an unspace.)
Other postfix operators may also make use of unspace:
$number\ ++;
$number\ --;
1+3\ i;
$object\ .say();
$object\#`{ your ad here }.say
Another normal use of a you-don't-see-this-space is typically to put a dotted postfix on the next line:
$object\ # comment
.say
$object\#`[ comment
].say
$object\
.say
But unspace is mainly about language extensibility: it lets you continue the line in any situation where a newline might confuse the parser, regardless of your currently installed parser. (Unless, of course, you override the unspace rule itself...)
Although we say that the unspace hides the whitespace from the parser, it does not hide whitespace from the lexer. As a result, unspace is not allowed within a token. Additionally, line numbers are still counted if the unspace contains one or more newlines. Since Pod chunks count as whitespace to the language, they are also swallowed up by unspace. Heredoc boundaries are suppressed, however, so you can split excessively long heredoc intro lines like this:
ok(q:to'CODE', q:to'OUTPUT', \
"Here is a long description", \ # --more--
todo(:parrøt<0.42>, :dötnet<1.2>));
...
CODE
...
OUTPUT
To the heredoc parser that just looks like:
ok(q:to'CODE', q:to'OUTPUT', "Here is a long description", todo(:parrøt<0.42>, :dötnet<1.2>));
...
CODE
...
OUTPUT
Note that this is one of those cases in which it is fine to have whitespace before the unspace, since we're only trying to suppress the newline transition, not all whitespace as in the case of postfix parsing. (Note also that the example above is not meant to spec how the test suite works.)
#`\ (...
or
#\ `(...
it is an end-of-line comment, not an embedded comment. Write:
\ #`(
...
)
to mean the other thing.
This is an unchanging deep rule, but the surface ramifications of it change as various operators and macros are added to or removed from the language, which we expect to happen because Perl 6 is designed to be a mutable language. In particular, there is a natural conflict between postfix operators and infix operators, either of which may occur after a term. If a given token may be interpreted as either a postfix operator or an infix operator, the infix operator requires space before it. Postfix operators may never have intervening space, though they may have an intervening dot. If further separation is desired, an unspace or embedded comment may be used as described above, as long as no whitespace occurs outside the unspace or embedded comment.
For instance, if you were to add your own infix:<++> operator, then it must have space before it. The normal autoincrementing postfix:<++> operator may never have space before it, but may be written in any of these forms:
$x++
$x\++
$x.++
$x\ ++
$x\ .++
$x\#`( comment ).++
$x\#`((( comment ))).++
$x\
.++
$x\ # comment
# inside unspace
.++
$x\ # comment
# inside unspace
++ # (but without the optional postfix dot)
$x\#`『 comment
more comment
』.++
$x\#`[ comment 1
comment 2
=begin Podstuff
whatever (Pod comments ignore current parser state)
=end Podstuff
comment 3
].++
A consequence of the postfix rule is that (except when delimiting a quote or terminating an unspace) a dot with whitespace in front of it is always considered a method call on $_ where a term is expected. If a term is not expected at this point, it is a syntax error. (Unless, of course, there is an infix operator of that name beginning with dot. You could, for instance, define a Fortranly infix:<.EQ.> if the fit took you. But you'll have to be sure to always put whitespace in front of it, or it would be interpreted as a postfix method call instead.)
For example,
foo .method
and
foo
.method
will always be interpreted as
foo $_.method
but never as
foo.method
Use some variant of
foo\
.method
if you mean the postfix method call.
One consequence of all this is that you may no longer write a Num as 42. with just a trailing dot. You must instead say either 42 or 42.0. In other words, a dot following a number can only be a decimal point if the following character is a digit. Otherwise the postfix dot will be taken to be the start of some kind of method call syntax. (The .123 form with a leading dot is still allowed however when a term is expected, and is equivalent to 0.123 rather than $_.123.)
Array of Int is equivalent to another Array of Int. (Another way to look at it is that the type instantiation "factory" is memoized.) Typename aliases are considered equivalent to the original type. In particular, the Array of Int syntax is just sugar for Array:of(Int), which is the canonical form of an instantiated generic type.
This name equivalence of parametric types extends only to parameters that can be considered immutable (or that at least can have an immutable snapshot taken of them). Two distinct classes are never considered equivalent even if they have the same attributes because classes are not considered immutable.
According to S12, properties are actually implemented by a kind of mixin mechanism, and such mixins are accomplished by the generation of an individual anonymous class for the object (unless an identical anonymous class already exists and can safely be shared).
# $x can contain only Int objects
my Int $x;
A variable may itself be bound to a container type that specifies how the container works, without specifying what kinds of things it contains.
# $x is implemented by the MyScalar class
my $x is MyScalar;
Constraints and container types can be used together:
# $x can contain only Int objects,
# and is implemented by the MyScalar class
my Int $x is MyScalar;
Note that $x is also initialized to the Int type object. See below for more on this.
my Dog $spot by itself does not automatically call a Dog constructor. It merely assigns an undefined Dog prototype object to $spot:
my Dog $spot; # $spot is initialized with ::Dog
my Dog $spot = Dog; # same thing
$spot.defined; # False
say $spot; # "Dog"
Any type name used as a value is an undefined instance of that type's prototype object, or type object for short. See S12 for more on that.
Any type name in rvalue context is parsed as a single type value and expects no arguments following it. However, a type object responds to the function call interface, so you may use the name of a type with parentheses as if it were a function, and any argument supplied to the call is coerced to the type indicated by the type object. If there is no argument in the parentheses, the type object returns itself:
my $type = Num; # type object as a value
$num = $type($string) # coerce to Num
To get a real Dog object, call a constructor method such as new:
my Dog $spot .= new;
my Dog $spot = $spot.new; # .= is rewritten into this
You can pass in arguments to the constructor as well:
my Dog $cerberus .= new(heads => 3);
my Dog $cerberus = $cerberus.new(heads => 3); # same thing
my int @array is MyArray;
you are declaring that the elements of @array are native integers, but that the array itself is implemented by the MyArray class. Untyped arrays and hashes are still perfectly acceptable, but have the same performance issues they have in Perl 5.
.elems method. You can also ask for the total string length of an array's elements, in bytes, codepoints or graphemes, using these methods .bytes, .codes or .graphs respectively on the array. The same methods apply to strings as well. (Note that .bytes is not guaranteed to be well-defined when the encoding is unknown. Similarly, .codes is not well-defined unless you know which canonicalization is in effect. Hence, both methods allow an optional argument to specify the meaning exactly if it cannot be known from context.)
There is no .length method for either arrays or strings, because length does not specify a unit.
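To see why a single length is ambiguous, the three units can be modeled in Python. This is only a sketch: the function names mirror .bytes, .codes and .graphs but are hypothetical, the encoding and form parameters stand in for the optional arguments mentioned above, and the grapheme count is a simplification that treats a base character plus its following combining marks as one grapheme.

```python
import unicodedata

def bytes_len(s, encoding="utf-8"):
    """Length in bytes; only meaningful once an encoding is chosen."""
    return len(s.encode(encoding))

def codes_len(s, form=None):
    """Length in codepoints, optionally after normalizing to a given form."""
    if form is not None:
        s = unicodedata.normalize(form, s)
    return len(s)

def graphs_len(s):
    """Simplified grapheme count: each non-combining character starts one."""
    count = 0
    for i, ch in enumerate(s):
        if i == 0 or unicodedata.combining(ch) == 0:
            count += 1
    return count

s = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT, then 'a'
print(bytes_len(s), bytes_len(s, "utf-16-le"))
print(codes_len(s), codes_len(s, "NFC"))
print(graphs_len(s))
```

The same string yields a different number for each unit, and even for the same unit under a different encoding or normalization form.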
Int, Num, Complex, Rat, Str, Bit, Regex, Set, Block, Iterator, Seq), as well as mutable (container) types, such as Scalar, Array, Hash, Buf, Routine, Module, and non-instantiable Roles such as Callable, Failure, and Integral.
Non-object (native) types are lowercase: int, num, complex, rat, buf, bit. Native types are primarily intended for declaring compact array storage. However, Perl will try to make those look like their corresponding uppercase types if you treat them that way. (In other words, it does autoboxing. Note, however, that sometimes repeated autoboxing can slow your program more than the native type can speed it up.)
The junction type is considered a native type because its internal representation is fixed, and you may not usefully derive from it because the intent of junctions is to autothread any method calls on them.
Some object types can behave as value types. Every object can produce a "WHICH" value that uniquely identifies the object for hashing and other value-based comparisons. Normal objects just use their address in memory, but if a class wishes to behave as a value type, it can define a .WHICH method that makes different objects look like the same object if they happen to have the same contents.
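The idea of a "WHICH" value can be illustrated with a small Python model. The class names Plain and Point, the which method, and the unique_by_which helper are all hypothetical; Python's id() stands in for "address in memory".

```python
class Plain:
    """Ordinary object: identity is its address in memory."""
    def __init__(self, value):
        self.value = value

    def which(self):
        return (type(self).__name__, id(self))

class Point:
    """Value type: identity is derived from the contents."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def which(self):
        return (type(self).__name__, self.x, self.y)

def unique_by_which(objects):
    """Keep the first object seen for each distinct which() value."""
    seen = {}
    for obj in objects:
        seen.setdefault(obj.which(), obj)
    return list(seen.values())

a, b = Plain(1), Plain(1)
items = [Point(1, 2), Point(1, 2), a, b]
print(len(unique_by_which(items)))  # the two Points collapse into one entry
```

Two Point objects with equal contents hash to the same key, while the two Plain objects stay distinct even though their contents match.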
Mu, Whatever and Failure objects. See S04 for more about failures (i.e. unthrown exceptions):
my Int $x = Int; # works
Variables with native types do not support undefinedness: it is an error to assign an undefined value to them:
my int $y = Int; # dies
Since num can support the value NaN but not the general concept of undefinedness, you can coerce an undefined value like this:
my num $n = computation() // NaN;
Variables of non-native types start out containing an undefined value unless explicitly initialized to a defined value.
HOW function/method that returns the metaclass instance managing it, regardless of whether the object is defined:
'x'.HOW.methods('x'); # get available methods for strings
Str.HOW.methods(Str); # same thing with the prototype object Str
HOW(Str).methods(Str); # same thing as function call
'x'.methods; # this is likely an error - not a meta object
Str.methods; # same thing
(For a prototype system (a non-class-based object system), all objects are merely managed by the same meta object.)
Numeric role, and all string types perform the Stringy role, but there's no such thing as a "Numeric" object, since these are generic types that must be instantiated with extra arguments to produce normal object types. Common roles include:
Stringy
Numeric
Real
Integral
Rational
Callable
Positional
Associative
Int automatically supports promotion to arbitrary precision, as well as holding Inf and NaN values. Note that Int assumes 2's complement arithmetic, so +^1 == -2 is guaranteed. (Native int operations need not support this on machines that are not natively 2's complement. You must convert to and from Int to do portable bitops on such ancient hardware.)
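The 2's complement guarantee can be checked against any arbitrary-precision integer type that follows the same convention. Python integers happen to do so, which makes for a quick sketch; the function names bit_not and as_unsigned are hypothetical, and Python's ~ is used here as a stand-in for +^.

```python
def bit_not(n):
    """Bitwise negation under 2's complement semantics: ~n == -n - 1."""
    return ~n

def as_unsigned(n, bits):
    """View the low `bits` bits of n as an unsigned fixed-width value."""
    return n & ((1 << bits) - 1)

print(bit_not(1))                      # -2
print(bit_not(2**100) == -(2**100) - 1)
print(hex(as_unsigned(bit_not(1), 8)))
```

The identity holds at any magnitude because there is no fixed word size; a fixed-width view only appears once the value is explicitly masked.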
Num must support the largest native floating point format that runs at full speed. It may be bound to an arbitrary precision type, but by default it is the same type as a native num. See below.
Rat supports extended precision rational arithmetic. Dividing two Integral objects using infix:</> produces a Rat, which is generally usable anywhere a Num is usable, but may also be explicitly cast to Num. (Also, if either side is Num already, infix:</> gives you a Num instead of a Rat.)
Rat and Num both do the Real role.
Lower-case types like int and num imply the native machine representation for integers and floating-point numbers, respectively, and do not promote to arbitrary precision, though larger representations are always allowed for temporary values. Unless qualified with a number of bits, int and num types represent the largest native integer and floating-point types that run at full speed.
Numeric values in untyped variables use Int and Num semantics rather than int and num.
However, for pragmatic reasons, Rat values are guaranteed to be exact only up to a certain point. By default, this is the precision that would be represented by the Rat64 type, which is an alias for Rational[Int,uint64], which has a numerator of Int but is limited to a denominator of uint64. A Rat64 that would require more than 64 bits of storage in the denominator is automatically converted either to a Num or to a lesser-precision Rat, at the discretion of the implementation. (Native types such as rat64 limit the size of both numerator and denominator, though not to the same size. The numerator should in general be twice the size of the denominator to support user expectations. For instance, a rat8 actually supports Rational[int16,uint8], allowing numbers like 100.01 to be represented, and a rat64, defined as Rational[int128,int64], can hold the number of seconds since the Big Bang with attosecond precision. Though perhaps not with attosecond accuracy...)
The limitation on Rat values is intended to be enforced only on user-visible types. Intermediate values used internally in calculating the values of Rat operators may exceed this precision, or represent negative denominators. That is, the temporaries used in calculating the new numerator and denominator are (at least in the abstract) of Int type. After a new numerator and denominator are determined, any sign is forced to be represented only by the numerator. Then if the denominator exceeds the storage size of the unsigned integer used, the fraction is reduced via gcd. If the resulting denominator is still larger than the storage size, then and only then may the precision be reduced to fit into a Rat or Num.
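The sequence of steps above can be sketched in Python, whose integers play the role of the abstract Int temporaries. This is a model under assumptions, not an implementation: finish_rat and MAX_DENOMINATOR are hypothetical names, a (numerator, denominator) tuple stands in for a Rat, the fallback always goes to a float (the Num option) rather than a lesser-precision Rat, and a zero denominator is not handled.

```python
from math import gcd

MAX_DENOMINATOR = 2**64 - 1  # assumes a uint64 denominator, as for Rat64

def finish_rat(nu, de):
    """Turn unbounded intermediate nu/de into a user-visible value."""
    # 1. Any sign is carried by the numerator only.
    if de < 0:
        nu, de = -nu, -de
    # 2. Reduce via gcd only if the denominator does not fit.
    if de > MAX_DENOMINATOR:
        g = gcd(nu, de)
        nu, de = nu // g, de // g
    # 3. Only if it still does not fit may precision be given up.
    if de > MAX_DENOMINATOR:
        return nu / de           # float, standing in for Num
    return (nu, de)              # exact, possibly unreduced

print(finish_rat(1, -3))
print(finish_rat(50, 100))       # fits, so it is left unreduced
print(finish_rat(2**65, 2**66))  # reduced only because it did not fit
```

Note that 50/100 comes back unreduced, which matches the lazy normalization policy: reduction is triggered by the storage limit, not performed eagerly.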
Rat addition and subtraction should attempt to preserve the denominator of the more precise argument if that denominator is an integral multiple of the less precise denominator. That is, in practical terms, adding a column of dollars and cents should generally end up with a result that has a denominator of 100, even if values like 42 and 3.5 were added in. With other operators, this guarantee cannot be made; in such cases, the user should probably be explicitly rounding to a particular denominator anyway.
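A possible reading of the denominator-preserving rule, sketched in Python with (numerator, denominator) tuples standing in for Rats. The name add_rat is hypothetical, and the fallback to the product of the denominators for unrelated denominators is an assumption; the text only specifies the integral-multiple case.

```python
from functools import reduce

def add_rat(a, b):
    """Add two (nu, de) pairs, keeping the more precise denominator when
    it is an integral multiple of the other one."""
    (an, ad), (bn, bd) = a, b
    if ad % bd == 0:
        return (an + bn * (ad // bd), ad)
    if bd % ad == 0:
        return (an * (bd // ad) + bn, bd)
    return (an * bd + bn * ad, ad * bd)  # assumed fallback, unreduced

# A column of dollars and cents with 42 and 3.5 mixed in.
column = [(1999, 100), (42, 1), (35, 10)]
print(reduce(add_rat, column))  # denominator stays 100
```

The total is 65.49, and it is still expressed in hundredths even though two of the addends were less precise.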
For applications that really need arbitrary precision denominators as well as numerators at the cost of performance, FatRat may be used, which is defined as Rational[Int,Int], that is, as arbitrary precision in both parts. There is no literal form for a FatRat, so it must be constructed using FatRat.new($nu,$de). In general, only math operators with at least one FatRat argument will return another FatRat, to prevent accidental promotion of reasonably fast Rat values into arbitrarily slow FatRat values.
Although most rational implementations normalize or "reduce" fractions to their smallest representation immediately through a gcd algorithm, Perl allows a rational datatype to do so lazily at need, such as whenever the denominator would run out of precision, but avoid the overhead otherwise. Hence, if you are adding a bunch of Rats that represent, say, dollars and cents, the denominator may stay 100 the entire way through. The .nu and .de methods will return these unreduced values. You can use $rat.=norm to normalize the fraction. (This also forces the sign on the denominator to be positive.) The .perl method will produce a decimal number if the denominator is a power of 10, or normalizable to a power of 10 (that is, having factors of only 2 and 5 (and -1)). Otherwise it will normalize and return a rational literal of the form -47/3. Stringifying a rational does a similar calculation, with the same result on decimal-normalizable fractions, but where .perl would produce the -47/3 form, stringification instead converts to Num and stringifies that, so the rational internal form is somewhat hidden from the casual user, who would generally prefer to see pure decimal notation.
say 1/5; # 0.2 exactly (not via Num)
say 1/3; # 0.333333333333333 via Num
say <2/6>.perl
# 1/3
say 3.14159_26535_89793
# 3.141592653589793 including last digit
say 111111111111111111111111111111111111111111111.123
# 111111111111111111111111111111111111111111111.123
say 555555555555555555555555555555555555555555555/5
# 111111111111111111111111111111111111111111111
say <555555555555555555555555555555555555555555555/5>.perl
# 111111111111111111111111111111111111111111111/1
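The stringification rule can be modeled in Python using fractions.Fraction for exact arithmetic. This sketch covers stringification only, not .perl; the names is_decimal_normalizable and rat_to_str are hypothetical, and the float fallback prints with Python's digit count, which need not match the number of digits shown in the examples above.

```python
from fractions import Fraction

def is_decimal_normalizable(de):
    """True if the denominator has no prime factors other than 2 and 5."""
    de = abs(de)
    for p in (2, 5):
        while de % p == 0:
            de //= p
    return de == 1

def rat_to_str(nu, de):
    """Exact decimal when possible, otherwise fall back to a float."""
    f = Fraction(nu, de)  # normalizes and moves the sign to the numerator
    if not is_decimal_normalizable(f.denominator):
        return str(float(f))
    k = 0                 # smallest k such that the denominator divides 10**k
    while (10**k) % f.denominator != 0:
        k += 1
    scaled = abs(f.numerator) * 10**k // f.denominator
    sign = "-" if f.numerator < 0 else ""
    if k == 0:
        return f"{sign}{scaled}"
    digits = str(scaled).rjust(k + 1, "0")
    return f"{sign}{digits[:-k]}.{digits[-k:]}"

print(rat_to_str(1, 5))   # exact, no float involved
print(rat_to_str(-1, 8))
print(rat_to_str(1, 3))   # goes through a float
```

Because the decimal branch works on integers throughout, large values keep every digit, which a detour through floating point would not.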
Inf (infinity) and NaN (not a number). Within a lexical scope, pragmas may specify the nature of temporary values, and how floating point is to behave under various circumstances. All IEEE modes must be lexically available via pragma except in cases where that would entail heroic efforts to bypass a braindead platform.
The default floating-point modes do not throw exceptions but rather propagate Inf and NaN. The boxed object types may carry more detailed information on where overflow or underflow occurred. Numerics in Perl are not designed to give the identical answer everywhere. They are designed to give the typical programmer the tools to achieve a good enough answer most of the time. (Really good programmers may occasionally do even better.) Mostly this just involves using enough bits that the stupidities of the algorithm don't matter much.
Str is a Unicode string object. There is no corresponding native str type. However, since a Str object may fill multiple roles, we say that a Str keeps track of its minimum and maximum Unicode abstraction levels, and plays along nicely with the current lexical scope's idea of the ideal character, whether that is bytes, codepoints, graphemes, or characters in some language. For all builtin operations, all Str positions are reported as position objects, not integers. These StrPos objects point into a particular string at a particular location independent of abstraction level, either by tracking the string and position directly, or by generating an abstraction-level independent representation of the offset from the beginning of the string that will give the same results if applied to the same string in any context. This is assuming the string isn't modified in the meanwhile; a StrPos is not a "marker" and is not required to follow changes to a mutable string. For instance, if you ask for the positions of matches done by a substitution, the answers are reported in terms of the original string (which may now be inaccessible!), not as positions within the modified string.
The subtraction of two StrPos objects gives a StrLen object, which is also not an integer, because the string between two positions also has multiple integer interpretations depending on the units. A given StrLen may know that it represents 18 bytes, 7 codepoints, 3 graphemes, and 1 letter in Malayalam, but it might only know this lazily because it actually just hangs onto the two StrPos endpoints within the string that in turn may or may not just lazily point into the string. (The lazy implementation of StrLen is much like a Range object in that respect.)
If you use integers as arguments where position objects are expected, it will be assumed that you mean the units of the current lexically scoped Unicode abstraction level. (Which defaults to graphemes.) Otherwise you'll need to coerce to the proper units:
substr($string, Bytes(42), ArabicChars(1))
Of course, such a dimensional number will fail if used on a string that doesn't provide the appropriate abstraction level.
If a StrPos or StrLen is forced into a numeric context, it will assume the units of the current Unicode abstraction level. It is erroneous to pass such a non-dimensional number to a routine that would interpret it with the wrong units.
Implementation note: since Perl 6 mandates that the default Unicode processing level must view graphemes as the fundamental unit rather than codepoints, this has some implications regarding efficient implementation. It is suggested that all graphemes be translated on input to unique grapheme numbers and represented as integers within some kind of uniform array for fast substr access. For those graphemes that have a precomposed form, use of that codepoint is suggested. (Note that this means Latin-1 can still be represented internally with 8-bit integers.)
For graphemes that have no precomposed form, a temporary private id should be assigned that uniquely identifies the grapheme. If such ids are assigned consistently throughout the process, comparison of two graphemes is no more difficult than the comparison of two integers, and comparison of base characters no more difficult than a direct lookup into the id-to-NFD table.
Obviously, any temporary grapheme ids must be translated back to some universal form (such as NFD) on output, and normal precomposed graphemes may turn into either NFC or NFD forms depending on the desired output. Maintaining a particular grapheme/id mapping over the life of the process may have some GC implications for long-running processes, but most processes will likely see a limited number of non-precomposed graphemes.
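The suggested scheme might look roughly like the following Python sketch. Everything here is an assumption made for illustration: the GraphemeTable class and clusters helper are hypothetical, negative integers are used as the temporary private ids so they cannot collide with codepoints, grapheme segmentation is simplified to a base character plus its following combining marks, and output is always translated to NFD.

```python
import unicodedata

def clusters(text):
    """Simplified segmentation: a base character plus trailing combiners."""
    cluster = ""
    for ch in text:
        if cluster and unicodedata.combining(ch) == 0:
            yield cluster
            cluster = ""
        cluster += ch
    if cluster:
        yield cluster

class GraphemeTable:
    def __init__(self):
        self._id_for = {}    # NFD text -> temporary id
        self._text_for = {}  # temporary id -> NFD text
        self._next_id = -1

    def encode(self, text):
        """Translate input text to one integer per grapheme."""
        out = []
        for cluster in clusters(text):
            nfc = unicodedata.normalize("NFC", cluster)
            if len(nfc) == 1:               # precomposed form exists
                out.append(ord(nfc))
                continue
            key = unicodedata.normalize("NFD", cluster)
            if key not in self._id_for:     # assign ids consistently
                self._id_for[key] = self._next_id
                self._text_for[self._next_id] = key
                self._next_id -= 1
            out.append(self._id_for[key])
        return out

    def decode(self, ids):
        """Translate back to a universal form (NFD here) for output."""
        return "".join(
            unicodedata.normalize("NFD", chr(i)) if i >= 0 else self._text_for[i]
            for i in ids
        )

table = GraphemeTable()
print(table.encode("e\u0301"))         # has a precomposed codepoint
print(table.encode("q\u0303xq\u0303")) # q + tilde has none, so it gets an id
```

Since the same grapheme always receives the same id within one table, comparing two graphemes reduces to comparing two integers, as the text describes.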
If the program has a scope that wants a codepoint view rather than a grapheme view, the string visible to that lexical scope must also be translated to universal form, just as with output translation. Alternately, the temporary grapheme ids may be hidden behind an abstraction layer. In any case, codepoint scope should never see any temporary grapheme ids. (The lexical codepoint declaration should probably specify which normalization form it prefers to view strings under. Such a declaration could be applied to input translation as well.)
Buf is a stringish view of an array of integers, and has no Unicode or character properties without explicit conversion to some kind of Str. (A buf is the native counterpart.) Typically it's an array of bytes serving as a buffer. Bitwise operations on a Buf treat the entire buffer as a single large integer. Bitwise operations on a Str generally fail unless the Str in question can provide an abstract Buf interface somehow. Coercion to Buf should generally invalidate the Str interface. As a generic type Buf may be instantiated as (or bound to) any of buf8, buf16, or buf32 (or to any type that provides the appropriate Buf interface), but when used to create a buffer Buf defaults to buf8.
Unlike Str types, Buf types prefer to deal with integer string positions, and map these directly to the underlying compact array as indices. That is, these are not necessarily byte positions--an integer position just counts over the number of underlying positions, where one position means one cell of the underlying integer type. Builtin string operations on Buf types return integers and expect integers when dealing with positions. As a limiting case, buf8 is just an old-school byte string, and the positions are byte positions. Note, though, that if you remap a section of buf32 memory to be buf8, you'll have to multiply all your positions by 4.
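The position arithmetic in the last sentence can be illustrated with Python's struct module. This is only a sketch: the names cells, raw and cell_at are hypothetical, and a little-endian layout is assumed for the sake of a deterministic example.

```python
import struct

# Three 32-bit cells, as a buf32 might hold them (little-endian assumed).
cells = [10, 20, 30]
raw = struct.pack("<3I", *cells)  # the same memory viewed as a byte string

def cell_at(raw, pos32):
    """Read the 32-bit cell at buf32 position pos32 from the byte view."""
    byte_pos = pos32 * 4          # one 32-bit cell spans four byte positions
    return struct.unpack_from("<I", raw, byte_pos)[0]

print(len(cells), len(raw))       # 3 positions as buf32, 12 as buf8
print(cell_at(raw, 2))
```

Position 2 in the 32-bit view corresponds to byte position 8 in the 8-bit view, so every index must be scaled by the cell width when the view changes.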
The utf8 type is derived from buf8, with the additional constraint that it may only contain validly encoded UTF-8. Likewise, utf16 is derived from buf16, and utf32 from buf32.
Note that since these are type names, parentheses must always be used to call them as coercers, since the listop form is not allowed for coercions. That is:
utf8 op $x
is always parsed as
(utf8) op $x
and never as
utf8(op $x)
The * character as a standalone term captures the notion of "Whatever", which is applied lazily by whatever operator it is an argument to. Generally it can just be thought of as a "glob" that gives you everything it can in that argument position. For instance:
if $x ~~ 1..* {...} # if 1 <= $x <= +Inf
my ($a,$b,$c) = "foo" xx *; # an arbitrarily long list of "foo"
if /foo/ ff * {...} # a latching flipflop
@slice = @x[*;0;*]; # any Int
@slice = %x{*;'foo'}; # any keys in domain of 1st dimension
@array[*] # flattens, unlike @array[]
(*, *, $x) = (1, 2, 3); # skip first two elements
# (same as lvalue "undef" in Perl 5)
Whatever is an undefined prototype object derived from Any. As a type it is abstract, and may not be instantiated as a defined object. If for a particular MMD dispatch, nothing in the MMD system claims it, it dispatches as an Any with an undefined value, and usually blows up constructively. If you say
say 1 + *;
you should probably not expect it to yield a reasonable answer, unless you think an exception is reasonable. Since the Whatever object is effectively immutable, the optimizer is free to recognize * and optimize in the context of what operator it is being passed to.
Most of the built-in numeric operators treat an argument of * as indicating the desire to create a function of a single unknown, so:
* - 1
produces a closure of a single argument:
{ $_ - 1 }
This closure is officially returned at run time, so it is not subject to the rule that bare closures execute immediately when used as a statement. However, in most cases the result of a multiple dispatch can be determined at compile time, so the compiler is expected to optimize away the run-time call. Hence, despite the fact that the inside of parentheses is considered a statement, if you say
(* + 7)(3) # 10
the generated { $_ + 7 } closure is returned uncalled by those parentheses and then invoked by the .(3) postfix. In contrast,
( { $_ + 7 } )(3)
evaluates the bare block immediately with whatever $_ is already in scope, and then fails because a number doesn't know how to respond to the .(3) invocation.
Likewise, the single dispatcher officially recognizes *.meth at run time and returns { $_.meth }, so it can be used where patterns are expected:
@primes = grep *.prime, 2..*;
This also should be optimized to a closure by the compiler. Basically, dispatches to Whatever are assumed to be subject to constant folding.
If multiple * appear as terms within a single expression, the resulting closure binds them all to the same argument, so * * * returns the closure { $_ * $_ }.
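The closure-building behavior described above can be modeled with operator overloading. The Python sketch below is an analogy only: the names _Term, Whatever, WhateverCode, and star are choices of this sketch, only three operators are covered, and nothing here reflects how a Perl 6 compiler actually dispatches or optimizes these cases.

```python
import operator


class _Term:
    """Operator overloads shared by Whatever and WhateverCode (hypothetical)."""

    def _eval(self, x):
        raise NotImplementedError

    def _binop(self, op, other, swap=False):
        def fn(x):
            mine = self._eval(x)
            # Any other * in the expression is bound to the same argument x.
            theirs = other._eval(x) if isinstance(other, _Term) else other
            return op(theirs, mine) if swap else op(mine, theirs)
        return WhateverCode(fn)

    def __add__(self, other):
        return self._binop(operator.add, other)

    def __radd__(self, other):
        return self._binop(operator.add, other, swap=True)

    def __sub__(self, other):
        return self._binop(operator.sub, other)

    def __rsub__(self, other):
        return self._binop(operator.sub, other, swap=True)

    def __mul__(self, other):
        return self._binop(operator.mul, other)

    def __rmul__(self, other):
        return self._binop(operator.mul, other, swap=True)


class Whatever(_Term):
    """The bare term: evaluates to the argument itself, but is not callable."""

    def _eval(self, x):
        return x


class WhateverCode(_Term):
    """The one-argument closure produced by applying an operator to Whatever."""

    def __init__(self, fn):
        self._fn = fn

    def _eval(self, x):
        return self._fn(x)

    def __call__(self, x):
        return self._fn(x)


star = Whatever()
minus_one = star - 1       # models * - 1, i.e. { $_ - 1 }
square = star * star       # models * * *, i.e. { $_ * $_ }
```

Keeping Whatever and WhateverCode as distinct classes is what lets a caller tell a bare * from a closure built out of one.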
These returned closures are of type WhateverCode, not Whatever, so that constructs can distinguish via multiple dispatch:
1,2,3 ... *
1,2,3 ... *+1
A bare * which is immediately followed by a (...) or .(...) is parsed as the unary identity closure:
*(42) == 42
(* + 1)(42) == 43
But note that this is not what is happening above, or
1,2,3 ... *
would end up meaning:
1,2,3,3,3,3,3,3...
The ... operator is instead dispatching bare * to a routine that does dwimmery, and in this case decides to supply a function { * + 1 }. There is no requirement that an operator return a closure when Whatever is used as an argument; that's just the typical behavior for functions that have no intrinsic "globbish" meaning for *.
The final element of an array is subscripted as @a[*-1], which means that when the subscripting operation discovers a WhateverCode object for a subscript, it calls it and supplies an argument indicating the number of elements in (that dimension of) the array. See S09.
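A subscript operation that calls a closure with the element count can be sketched as follows. This is a Python model with a hypothetical class name (SpecArray); rejecting negative integers is an assumption of the sketch, made so that only the closure form counts from the end.

```python
class SpecArray:
    """A one-dimensional array whose subscript may be a one-argument closure."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, subscript):
        if callable(subscript):
            # Supply the number of elements, as the subscripting operation
            # does when it finds a WhateverCode object.
            subscript = subscript(len(self._items))
        if subscript < 0:
            raise IndexError("negative subscript (sketch assumption)")
        return self._items[subscript]


letters = SpecArray("abcde")
last = letters[lambda n: n - 1]     # models @a[*-1]
```

The closure never needs to know about the array; it only receives the size of the dimension being indexed.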
A variant of * is the ** term, which is of type HyperWhatever. It is generally understood to be a multidimensional form of * when that makes sense. When modified by an operator that would turn * into a function of one argument, ** instead turns into a function with a slurpy argument, of type HyperWhateverCode. That is:
* - 1 means -> $x { $x - 1 }
** - 1 means -> *@x { map -> $x { $x - 1 }, @x }
Therefore @array[^**] represents @array[{ map { ^* }, @_ }], that is to say, every element of the array, no matter how many dimensions. (However, @array[**] means the same thing because, as with ... above, the subscript operator will interpret bare ** as meaning all the subscripts, not the list of dimension sizes. The meaning of Whatever is always controlled by the first context it is bound into.)
Other uses for * and ** will doubtless suggest themselves over time. These can be given meaning via the MMD system, if not the compiler. In general a Whatever should be interpreted as maximizing the degrees of freedom in a dwimmy way, not as a nihilistic "don't care anymore--just shoot me".
Values with these types autobox to their uppercase counterparts when you treat them as objects:
bit         single native bit
int         native signed integer
uint        native unsigned integer (autoboxes to Int)
buf         native buffer (finite seq of native ints or uints, no Unicode)
rat         native rational
num         native floating point
complex     native complex number
bool        native boolean
Since native types cannot represent Perl's concept of undefined values, in the absence of explicit initialization, native floating-point types default to NaN, while integer types (including bit) default to 0. The complex type defaults to NaN + NaN\i. A buf type of known size defaults to a sequence of 0 values. If any native type is explicitly initialized to * (the Whatever type), no initialization is attempted and you'll get whatever was already there when the memory was allocated.
If a buf type is initialized with a Unicode string value, the string is decomposed into Unicode codepoints, and each codepoint shoved into an integer element. If the size of the buf type is not specified, it takes its length from the initializing string. If the size is specified, the initializing string is truncated or 0-padded as necessary. If a codepoint doesn't fit into a buf's integer type, a parse error is issued if this can be detected at compile time; otherwise a warning is issued at run time and the overflowed buffer element is filled with an appropriate replacement character, either U+FFFD (REPLACEMENT CHARACTER) if the element's integer type is at least 16 bits, or U+007F (DELETE) if the larger value would not fit. If any other conversion is desired, it must be specified explicitly. In particular, no conversion to UTF-8 or UTF-16 is attempted; that must be specified explicitly. (As it happens, conversion to a buf type based on 32-bit integers produces valid UTF-32 in the native endianness.)
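The initialization rules for a buf can be modeled roughly as below. This Python sketch makes several assumptions: the function name init_buf and its parameters are hypothetical, the compile-time parse error is not modeled, and the run-time warning is represented by returning the list of positions that overflowed.

```python
def init_buf(text, bits=8, size=None):
    """Return (cells, overflowed) for a buffer of unsigned `bits`-wide cells."""
    limit = 1 << bits
    # U+FFFD needs at least 16 bits; narrower cells fall back to U+007F.
    replacement = 0xFFFD if bits >= 16 else 0x7F
    chars = text if size is None else text[:size]   # truncate if too long
    cells, overflowed = [], []
    for pos, ch in enumerate(chars):
        codepoint = ord(ch)
        if codepoint >= limit:
            overflowed.append(pos)                  # stands in for the warning
            codepoint = replacement
        cells.append(codepoint)
    if size is not None:
        cells += [0] * (size - len(cells))          # 0-pad if too short
    return cells, overflowed


narrow, narrow_overflow = init_buf("h\u00e9\u20ac", bits=8)
padded, _ = init_buf("ab", bits=8, size=5)
```

Here the euro sign does not fit in 8 bits, so its cell holds 0x7F, while the same string in 16-bit cells is stored unchanged.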
The Mu type
Among other things, Mu is named after the eastern concept of "Mu" or 無 (see http://en.wikipedia.org/wiki/MU, especially the "Mu_(negative)" entry), so in Perl 6 it stands in for Perl 5's concept of "undef" when that is used as a noun. However, Mu is also the "nothing" from which everything else is derived via the undefined type objects, so it stands in for the concept of "Object" as used in languages like Java. Or think of it as a "micro" or µ-object that is the basis for all other objects, something atomic like a Muon. Or if acronyms make you happy, there are a variety to pick from:
Most Universal
More Undefined
Modern Undef
Master Union
Meta Ur
Mega Up
...
Or just think of it as a sound a cow makes, which simultaneously means everything and nothing.
Perl 6 does not have a single value representing undefinedness. Instead, objects of various types can carry type information while nevertheless remaining undefined themselves. Whether an object is defined is determined by whether .defined returns true or not. These typed objects typically represent uninitialized values. Failure objects are also officially undefined despite carrying exception information; these may be created using the fail function, or by direct construction of an exception object of some sort. (See S04 for how failures are handled.)
Mu          Most Undefined
Failure     Failure (lazy exceptions, thrown if not handled properly)
Whenever you declare any kind of type, class, module, or package, you're automatically declaring an undefined prototype value with the same name, known as the type object. The name itself returns that type object:
Mu          Perl 6 object (default block parameter type, either Any or junction)
Any         Perl 6 object (default routine parameter type, excludes junction)
Cool        Perl 6 Convenient OO Loopbacks
Whatever    Wildcard (like Any, but subject to do-what-I-mean via MMD)
Int         Any Int object
Widget      Any Widget object
Type objects stringify to their name with empty parens concatenated. Note that type objects are not classes, but may be used to name classes:
Widget.new() # create a new Widget
Whenever a Failure value is put into a typed container, it takes on the type specified by the container but continues to carry the Failure role. Use fail to return specific failures. Use Mu for the most generic non-failure undefined value. The Any type, derived from Mu, is also undefined, but excludes junctions so that autothreading may be dispatched using normal multiple dispatch rules. All user-defined classes derive from the Any class by default. The Whatever type is derived from Any but nothing else is derived from it.
Objects with these types behave like values, i.e. $x === $y is true if and only if their types and contents are identical (that is, if $x.WHICH eqv $y.WHICH).
Str          Perl string (finite sequence of Unicode characters)
Bit          Perl single bit (allows traits, aliasing, undefinedness, etc.)
Int          Perl integer (allows Inf/NaN, arbitrary precision, etc.)
Num          Perl number (approximate Real, generally via floating point)
Rat          Perl rational (exact Real, limited denominator)
FatRat       Perl rational (unlimited precision in both parts)
Complex      Perl complex number
Bool         Perl boolean
Exception    Perl exception
Block        Executable objects that have lexical scopes
Seq          A list of values (can be generated lazily)
Range        A pair of Ordered endpoints
Set          Unordered collection of values that allows no duplicates
Bag          Unordered collection of values that allows duplicates
Enum         An immutable Pair
EnumMap      A mapping of Enums with no duplicate keys
Signature    Function parameters (left-hand side of a binding)
Parcel       List of syntactic objects
Capture      Function call arguments (right-hand side of a binding)
Blob         An undifferentiated mass of bits
Instant      A point on the continuous atomic timeline (TAI)
Duration     The difference between two Instants
HardRoutine  A routine that is committed to not changing
Instants and Durations are measured in atomic seconds with fractions. Notionally they are real numbers which may be implemented in either Num or Rat types. (Fixed-point implementations are strongly discouraged.) Interfaces that take Duration arguments, such as sleep(), may also take Num arguments, but Instant arguments must be explicitly created via any of various culturally aware time specification APIs that, by and large, are outside the CORE of Perl 6, with the possible exception of a constructor taking a native TAI value. In numeric context a Duration happily returns a Num representing seconds. If pressed for a number, an Instant will return the length of time in atomic seconds from the TAI epoch, but it will be unhappy about it. (The time will be returned as a Rat to preserve maximal precision and accuracy.) Systems which cannot provide a steady time base, such as POSIX systems, will simply have to make their best guess as to the correct atomic time.
These types do (at least) the following roles:
Class        Roles
=====        =====
Str          Stringy
Bit          Numeric Boolean Integral
Int          Numeric Integral
Num          Numeric Real
Rat          Numeric Real Rational
FatRat       Numeric Real Rational
Complex      Numeric
Bool         Boolean
Exception    Failure
Block        Callable
Seq          Iterable
Range        Iterable
Set          Associative[Bool]
Bag          Associative[UInt]
Enum         Associative
EnumMap      Associative Positional Iterable
Signature
Parcel       Positional
Capture      Positional Associative
Blob         Stringy
Instant      Real
Duration     Real
HardRoutine  Routine
[Conjecture: Stringy may best be split into 2 roles where both Str and Blob compose the more general one and just Str composes a less general one. The more general of those would apply to what is common to any dense sequence ("string") that Str and Blob both are (either of characters or bits or integers etc), and the string operators like catenation (~) and replication (x, xx) would be part of the more general role. The more specific role would apply to Str but not Blob and includes any specific operators that are specific to characters and don't apply to bits or integers etc. The other alternative is to more clearly distance character strings from bit strings, keeping ~/etc for character strings only and adding an analogy for bit strings.]
The Iterable role indicates not that you can iterate the type directly, but that you can request the type to return an iterator. Iterable types may have multiple iterators (lists) running across them simultaneously, but an iterator/list itself has only one thread of consumption. Every time you call get on an iterator, a value disappears from its list.
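The distinction between an Iterable and its iterators exists in other languages too. The Python sketch below uses that language's own protocol (iter and next) as an analogy; the Countdown class is hypothetical and stands in for any Iterable type.

```python
class Countdown:
    """Iterable: hands out fresh iterators but keeps no cursor of its own."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))


source = Countdown(3)
first = iter(source)         # two independent iterators over one Iterable
second = iter(source)

a1 = next(first)             # consumes 3 from the first iterator only
a2 = next(first)             # consumes 2
b1 = next(second)            # the second iterator still starts at 3
remaining = list(first)      # only 1 is left on the first iterator
```

Each iterator has a single thread of consumption, while the Iterable itself is untouched by any of them.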
Objects with these types have distinct .WHICH values that do not change even if the object's contents change. (Routines are considered mutable because they can be wrapped in place.)
Iterator     Perl list
SeqIter      Iterator over a Seq
RangeIter    Iterator over a Range
Scalar       Perl scalar
Array        Perl array
Hash         Perl hash
KeySet       KeyHash of Bool (does Set in list/array context)
KeyBag       KeyHash of UInt (does Bag in list/array context)
Pair         A single key-to-value association
PairSeq      A Seq of Pairs
Buf          Perl buffer (a stringish array of memory locations)
IO           Perl filehandle
Routine      Base class for all wrappable executable objects
Sub          Perl subroutine
Method       Perl method
Submethod    Perl subroutine acting like a method
Macro        Perl compile-time subroutine
Regex        Perl pattern
Match        Perl match, usually produced by applying a pattern
Stash        A symbol table hash (package, module, class, lexpad, etc)
SoftRoutine  A routine that is committed to staying mutable
The KeyHash role differs from a normal Associative hash in how it handles default values. If the value of a KeyHash element is set to the default value for the KeyHash, the element is deleted. If undeclared, the default default for a KeyHash is 0 for numeric types, False for boolean types, and the null string for string and buffer types. A KeyHash of an object type defaults to the undefined prototype for that type. More generally, the default default is whatever defined value a Nil would convert to for that value type. A KeyHash of Scalar deletes elements that go to either 0 or the null string. A KeyHash also autodeletes keys for normal undefined values (that is, those undefined values that do not contain an unthrown exception).
A KeySet is a KeyHash of booleans with a default of False. If you use the Hash interface and increment an element of a KeySet its value becomes true (creating the element if it doesn't exist already). If you decrement the element it becomes false and is automatically deleted. Decrementing a non-existing value results in a False value. Incrementing an existing value results in True. When not used as a Hash (that is, when used as an Array or list or Set object) a KeySet behaves as a Set of its keys. (Since the only possible value of a KeySet is the True value, it need not be represented in the actual implementation with any bits at all.)
A KeyBag is a KeyHash of UInt with default of 0. If you use the Hash interface and increment an element of a KeyBag its value is increased by one (creating the element if it doesn't exist already). If you decrement the element the value is decreased by one; if the value goes to 0 the element is automatically deleted. An attempt to decrement a non-existing value results in a Failure value. When not used as a Hash (that is, when used as an Array or list or Bag object) a KeyBag behaves as a Bag of its keys, with each key replicated the number of times specified by its corresponding value. (Use .kv or .pairs to suppress this behavior in list context.)
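The autodeletion rules for KeyHash, KeySet, and KeyBag can be modeled in a few lines. The Python classes below borrow the Perl 6 type names but are otherwise a sketch: the increment, decrement, and as_bag method names are hypothetical, and the Failure returned for decrementing a missing KeyBag element is represented by raising KeyError.

```python
class KeyHash:
    """A mapping that deletes any element set to its default value."""

    def __init__(self, default):
        self.default = default
        self._data = {}

    def __getitem__(self, key):
        return self._data.get(key, self.default)

    def __setitem__(self, key, value):
        if value == self.default:
            self._data.pop(key, None)       # going to the default deletes
        else:
            self._data[key] = value

    def __contains__(self, key):
        return key in self._data


class KeySet(KeyHash):
    def __init__(self):
        super().__init__(default=False)

    def increment(self, key):
        self[key] = True                    # creates the element if needed

    def decrement(self, key):
        self[key] = False                   # False is the default: deleted


class KeyBag(KeyHash):
    def __init__(self):
        super().__init__(default=0)

    def increment(self, key):
        self[key] = self[key] + 1

    def decrement(self, key):
        if key not in self:
            raise KeyError(key)             # stands in for the Failure value
        self[key] = self[key] - 1           # reaching 0 deletes the element

    def as_bag(self):
        """Each key replicated by its count, as in list context."""
        return [key for key, count in self._data.items() for _ in range(count)]


bag = KeyBag()
for word in ["a", "b", "a"]:
    bag.increment(word)
bag.decrement("b")                          # count reaches 0, "b" disappears
```

After these operations the bag holds only "a" with a count of 2, and asking for "b" yields the default of 0 without recreating the element.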
As with Hash types, Pair and PairSeq are mutable in their values but not in their keys. (A key can be a reference to a mutable object, but cannot change its .WHICH identity. In contrast, the value may be rebound to a different object, just as a hash element may.)
The following roles are supported:
Iterator     List
Scalar
Array        Positional Iterable
Hash         Associative
KeySet       KeyHash[Bool]
KeyBag       KeyHash[UInt]
KeyHash      Associative
Pair         Associative
PairSeq      Associative Positional Iterable
Buf          Stringy
IO
Routine      Callable
Sub          Callable
Method       Callable
Submethod    Callable
Macro        Callable
Regex        Callable
Match        Positional Associative
Stash        Associative
SoftRoutine  Routine
Types that do the List role are generally hidden from casual view, since iteration is typically triggered by context rather than by explicit call to the iterator's .get method. Filehandles are a notable exception.
See "Wrapping" in S06 for a discussion of soft vs. hard routines.
Explicit types are optional. Perl variables have two associated types: their "value type" and their "implementation type". (More generally, any container has an implementation type, including subroutines and modules.) The value type is stored as its of property, while the implementation type of the container is just the object type of the container itself. The word returns is allowed as an alias for of.
The value type specifies what kinds of values may be stored in the variable. A value type is given as a prefix or with the of keyword:
my Dog $spot;
my $spot of Dog;
In either case this sets the of property of the container to Dog.
Subroutines have a variant of the of property, as, that sets the as property instead. The as property specifies a constraint (or perhaps coercion) to be enforced on the return value (either by explicit call to return or by implicit fall-off-the-end return). This constraint, unlike the of property, is not advertised as the type of the routine. You can think of it as the implicit type signature of the (possibly implicit) return statement. It's therefore available for type inferencing within the routine but not outside it. If no as type is declared, it is assumed to be the same as the of type, if declared.
sub get_pet() of Animal {...} # of type, obviously
sub get_pet() returns Animal {...} # of type
our Animal sub get_pet() {...} # of type
sub get_pet() as Animal {...} # as type
A value type on an array or hash specifies the type stored by each element:
my Dog @pound; # each element of the array stores a Dog
my Rat %ship; # the value of each entry stores a Rat
The key type of a hash may be specified as a shape trait--see S09.
The implementation type specifies how the variable itself is implemented. It is given as a trait of the variable:
my $spot is Scalar; # this is the default
my $spot is PersistentScalar;
my $spot is DataBase;
Defining an implementation type is the Perl 6 equivalent to tying a variable in Perl 5. But Perl 6 variables are tied directly at declaration time, and for performance reasons may not be tied with a run-time tie statement unless the variable is explicitly declared with an implementation type that does the Tieable role.
However, package variables are always considered Tieable by default. As a consequence, all named packages are also Tieable by default. Classes and modules may be viewed as differently tied packages. Looking at it from the other direction, classes and modules that wish to be bound to a global package name must be able to do the Package role.
A non-scalar type may be qualified, in order to specify what type of value each of its elements stores:
my Egg $cup; # the value is an Egg
my Egg @carton; # each elem is an Egg
my Array of Egg @box; # each elem is an array of Eggs
my Array of Array of Egg @crate; # each elem is an array of arrays of Eggs
my Hash of Array of Recipe %book; # each value is a hash of arrays of Recipes
Each successive of makes the type on its right a parameter of the type on its left. Parametric types are named using square brackets, so:
my Hash of Array of Recipe %book;
actually means:
my Hash:of(Array:of(Recipe)) %book;
Because the actual variable can be hard to find when complex types are specified, there is a postfix form as well:
my Hash of Array of Recipe %book; # HoHoAoRecipe
my %book of Hash of Array of Recipe; # same thing
The as form may be used in subroutines:
my sub get_book ($key) as Hash of Array of Recipe {...}
Alternately, the return type may be specified within the signature:
my sub get_book ($key --> Hash of Array of Recipe) {...}
There is a slight difference, insofar as the type inferencer will ignore an as but pay attention to --> or prefix type declarations, also known as the of type. Only the inside of the subroutine pays attention to as, and essentially coerces the return value to the indicated type, just as if you'd coerced each return expression.
You may also specify the of type as the of trait (with returns allowed as a synonym):
my Hash of Array of Recipe sub get_book ($key) {...}
my sub get_book ($key) of Hash of Array of Recipe {...}
my sub get_book ($key) returns Hash of Array of Recipe {...}
Anywhere you can use a single type you can use a set of types, for convenience specifiable as if it were an "or" junction:
my Int|Str $error = $val; # can assign if $val~~Int or $val~~Str
Fancier type constraints may be expressed through a subtype:
subset Shinola of Any where {.does(DessertWax) and .does(FloorTopping)};
if $shimmer ~~ Shinola {...} # $shimmer must do both interfaces
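Both the "or" set of types and the subset with a where clause amount to a base-type check plus an optional predicate. The Python sketch below models that idea; the subset helper, the Both class, and the use of a tuple of types for the "or" junction are assumptions of the sketch rather than anything defined by Perl 6.

```python
def subset(base, where=lambda value: True):
    """Build a matcher: value must be of type `base` and satisfy `where`."""
    def accepts(value):
        return isinstance(value, base) and bool(where(value))
    return accepts


class DessertWax:
    pass


class FloorTopping:
    pass


class Both(DessertWax, FloorTopping):
    pass


# Models Int|Str: a tuple of types acts as an "or" of the alternatives.
int_or_str = subset((int, str))

# Models the Shinola subset: any object that does both interfaces.
shinola = subset(
    object,
    lambda v: isinstance(v, DessertWax) and isinstance(v, FloorTopping),
)
```

A matcher built this way plays the role of the right-hand side of a smartmatch: shinola(Both()) succeeds, while an object doing only one of the two interfaces is rejected.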
Since the terms in a parameter could be viewed as a set of constraints that are implicitly "anded" together (the variable itself supplies type constraints, and where clauses or tree matching just add more constraints), we relax this to allow juxtaposition of types to act like an "and" junction:
# Anything assigned to the variable $mitsy must conform
# to the type Fish and either the Squirrel or Dog type...
my Squirrel|Dog Fish $mitsy = new Fish but { Bool.pick ?? .does Squirrel
!! .does Dog };
[Note: the above is a slight lie, insofar as parameters are currently restricted for 6.0.0 to having only a single main type for the formal variable until we understand MMD a bit better.]
Parameters may be given types, just like any other variable:
sub max (int @array is rw) {...}
sub max (@array of int is rw) {...}
Within a declaration, a class variable (either by itself or following an existing type name) declares a new type name and takes its parametric value from the actual type of the parameter it is associated with. It declares the new type name in the same scope as the associated declaration.
sub max (Num ::X @array) {
push @array, X.new();
}
The new type name is introduced immediately, so two such types in the same signature must unify compatibly if they have the same name:
sub compare (Any ::T $x, T $y) {
return $x eqv $y;
}
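The unification rule can be approximated at run time: capture the actual type of the first argument, then require the second to conform to it. The Python function below is a loose model only; it uses == where the Perl 6 example uses eqv, and it performs at call time what a Perl 6 signature would check during binding.

```python
def compare(x, y):
    """Model of sub compare (Any ::T $x, T $y)."""
    captured = type(x)                  # ::T takes the actual type of $x
    if not isinstance(y, captured):     # the second T must unify with it
        raise TypeError(
            f"expected {captured.__name__}, got {type(y).__name__}"
        )
    return x == y
```

Calling compare("a", "a") returns True, while compare(1, "1") fails the type check before any comparison happens.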
On a scoped subroutine, a return type can be specified before or after the name. We call all return types "return types", but distinguish two kinds of return types, the as type and the of type, because the of type is normally an "official" named type and declares the official interface to the routine, while the as type is merely a constraint on what may be returned by the routine from the routine's point of view.
our sub lay as Egg {...} # as type
our Egg sub lay {...} # of type
our sub lay of Egg {...} # of type
our sub lay (--> Egg) {...} # of type
my sub hat as Rabbit {...} # as type
my Rabbit sub hat {...} # of type
my sub hat of Rabbit {...} # of type
my sub hat (--> Rabbit) {...} # of type
If a subroutine is not explicitly scoped, it defaults to my scoping. Any return type must go after the name:
sub lay as Egg {...} # as type
sub lay of Egg {...} # of type
sub lay (--> Egg) {...} # of type
On an anonymous subroutine, any return type can only go after the sub keyword:
$lay = sub as Egg {...}; # as type
$lay = sub of Egg {...}; # of type
$lay = sub (--> Egg) {...}; # of type
but you can use the anon scope declarator to introduce an of prefix type:
$lay = anon Egg sub {...}; # of type
$hat = anon Rabbit sub {...}; # of type
The return type may also be specified after a --> token within the signature. This doesn't mean exactly the same thing as as. The of type is the "official" return type, and may therefore be used to do type inferencing outside the sub. The as type only makes the return type available to the internals of the sub so that the return statement can know its context, but outside the sub we don't know anything about the return value, as if no return type had been declared. The prefix form specifies the of type rather than the as type, so the return type of
my Fish sub wanda ($x) { ... }
is known to return an object of type Fish, as if you'd said:
my sub wanda ($x --> Fish) { ... }
not as if you'd said
my sub wanda ($x) as Fish { ... }
It is possible for the of type to disagree with the as type:
my Squid sub wanda ($x) as Fish { ... }
or equivalently,
my sub wanda ($x --> Squid) as Fish { ... }
This is not lying to yourself--it's lying to the world. Having a different inner type is useful if you wish to hold your routine to a stricter standard than you let on to the outside world, for instance.
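The split between the advertised of type and the internal as type can be modeled with a decorator. The Python sketch below is an analogy under stated assumptions: the returns decorator, its of and as_ parameters, the of_type attribute, and the Animal, Dog, and Cat classes are all hypothetical, and the as type is treated purely as a constraint rather than a coercion.

```python
import functools


def returns(of=None, as_=None):
    """Advertise `of` to callers; enforce `as_` (or `of`) on the return value."""
    inner = as_ if as_ is not None else of   # no as type: fall back to of

    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            if inner is not None and not isinstance(result, inner):
                raise TypeError(
                    f"{fn.__name__} returned {type(result).__name__}, "
                    f"expected {inner.__name__}"
                )
            return result

        wrapper.of_type = of                 # the only type the outside sees
        return wrapper

    return decorate


class Animal:
    pass


class Dog(Animal):
    pass


class Cat(Animal):
    pass


@returns(of=Animal, as_=Dog)
def get_pet(kind):
    return Dog() if kind == "dog" else Cat()
```

Callers can inspect get_pet.of_type and learn only that an Animal comes back, while the routine holds itself to the stricter Dog constraint internally.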
The Cool type is derived from Any, and contains all the methods that are "cool" (as in, "I'm cool with an argument of that type.").
More specifically, these are the methods that are culturally universal, insofar as the typical user will expect the name of the method to imply conversion to a particular built-in type that understands the method in question. For instance, $x.abs implies conversion to an appropriate numeric type if $x is "cool" but doesn't already support a method of that name. Conversely, $x.substr implies conversion to a string or buffer type.
The Cool namespace also contains all multimethods of last resort; these are automatically searched if normal multiple dispatch does not find a viable candidate. Note that the Cool namespace is mutable, and both single and multiple dispatch must take into account changes there for the purposes of run-time monkey patching. However, since the multiple dispatcher uses the Cool package only as a failover, compile-time analysis of such dispatches is largely unaffected for any arguments with an exact or close match. Likewise any single dispatch to a method that is more specific than the Cool class is not affected by the mutability of Cool. User-defined classes don't derive from Cool by default, so such classes are also unaffected by changes to Cool.
The $Package'var syntax is gone. Use $Package::var instead.
$    scalar (object)
@    ordered array
%    unordered hash (associative array)
&    code/rule/token/regex
::   package/module/class/role/subset/enum/type/grammar
Within a declaration, the & sigil also declares the visibility of the subroutine name without the sigil within the scope of the declaration:
my &func := sub { say "Hi" };
func; # calls &func
Within a signature or other declaration, the :: sigil followed by an identifier marks a type variable that also declares the visibility of a package/type name without the sigil within the scope of the declaration. The first such declaration within a scope is assumed to be an unbound type, and takes the actual type of its associated argument. With subsequent declarations in the same scope the use of the sigil is optional, since the bare type name is also declared.
A declaration nested within must not use the sigil if it wishes to refer to the same type, since the inner declaration would rebind the type. (Note that the signature of a pointy block counts as part of the inner block, not the outer block.)
$x may be bound to any object, including any object that can be bound to any other sigil. Such a scalar variable is always treated as a singular item in any kind of list context, regardless of whether the object is essentially composite or unitary. It will not automatically dereference to its contents unless placed explicitly in some kind of dereferencing context. In particular, when interpolating into list context, $x never expands its object to anything other than the object itself as a single item, even if the object is a container object containing multiple items.
@x may be bound to an object of the Array class, but it may also be bound to any object that does the Positional role, such as a Seq, Range, Buf, Parcel, or Capture. The Positional role implies the ability to support postcircumfix:<[ ]>.
Likewise, %x may be bound to any object that does the Associative role, such as Pair, PairSeq, Set, Bag, KeyHash, or Capture. The Associative role implies the ability to support postcircumfix:<{ }>.
&x may be bound to any object that does the Callable role, such as any Block or Routine. The Callable role implies the ability to support postcircumfix:<( )>.
::x may be bound to any object that does the Abstraction role, such as a package, module, class, role, grammar, or any other type object, or any immutable value object that can be used as a type. This Abstraction role implies the ability to do various symbol table and/or typological manipulations which may or may not be supported by any given abstraction. Mostly though it just means that you want to give some abstraction an official name that you can then use later in the compilation without any sigil.
In any case, the minimal container role implied by the sigil is checked at binding time at the latest, and may fail earlier (such as at compile time) if a semantic error can be detected sooner. If you wish to bind an object that doesn't yet do the appropriate role, you must either stick with the generic $ sigil, or mix in the appropriate role before binding to a more specific sigil.
An object is allowed to support both Positional and Associative. An object that does not support Positional may not be bound directly to @x. However, any construct such as %x that can interpolate the contents of such an object into list context can automatically construct a list value that may then be bound to an array variable. Subscripting such a list does not imply subscripting back into the original object.
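The role check performed at binding time can be sketched with abstract base classes. In the Python model below, Sequence, Mapping, and Callable from collections.abc stand in for the Positional, Associative, and Callable roles; the bind function and the sigil table are hypothetical names used only for illustration.

```python
from collections.abc import Callable, Mapping, Sequence

# Minimal role implied by each sigil; "$" accepts any object at all.
ROLE_FOR_SIGIL = {
    "$": object,
    "@": Sequence,      # stands in for Positional
    "%": Mapping,       # stands in for Associative
    "&": Callable,      # stands in for Callable
}


def bind(sigil, value):
    """Check the role implied by the sigil, then hand the object back."""
    role = ROLE_FOR_SIGIL[sigil]
    if not isinstance(value, role):
        raise TypeError(
            f"{type(value).__name__} does not do the role required by {sigil}"
        )
    return value


unordered = {2, 1}
as_scalar = bind("$", unordered)          # the generic sigil always works
as_list = bind("@", sorted(unordered))    # build a list first, then bind it
```

Binding the set directly with "@" fails, and the list constructed from it is a separate value: subscripting it does not reach back into the original object.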
$foo      ordinary scoping
$.foo     object attribute public accessor
$^foo     self-declared formal positional parameter
$:foo     self-declared formal named parameter
$*foo     dynamically overridable global variable
$?foo     compiler hint variable
$=foo     Pod variable
$<foo>    match variable, short for $/{'foo'}
$!foo     object attribute private storage
$~foo     the foo sublanguage seen by the parser at this lexical spot
Most variables with twigils are implicitly declared or assumed to be declared in some other scope, and don't need a "my" or "our". Attribute variables are declared with has, though.
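A few of the twigils in the table above can be seen together in a short sketch. The class, routine, and variable names here are hypothetical, chosen only for illustration.

```raku
# Hypothetical class: attributes are the twigil variables declared with has.
class Counter {
    has $.count = 0;            # $.count: public accessor
    has $!step  = 1;            # $!step: private storage
    method bump { $!count += $!step; self }
}

my &add   = { $^a + $^b };            # $^a, $^b: self-declared positionals
my &greet = { 'Hello, ' ~ $:name };   # $:name: self-declared named parameter

my $*level = 'outer';                 # dynamic variable, declared here with my
sub level() { $*level }               # found through the caller's dynamic scope

say add(2, 3);
say greet(name => 'World');
say Counter.new.bump.count;
say level();
```

Note that the placeholder parameters need no declaration at all, while the attributes are introduced by has rather than by my or our.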
$ always means a scalar variable, @ an array variable, and % a hash variable, even when subscripting. In item context, variables such as @array and %hash simply return themselves as Array and Hash objects. (Item context was formerly known as scalar context, but we now reserve the "scalar" notion for talking about variables rather than contexts, much as arrays are disassociated from list context.)
.perl method. Like the Data::Dumper module in Perl 5, the .perl method will put quotes around strings, square brackets around list values, curlies around hash values, constructors around objects, etc., so that Perl can evaluate the result back to the same object. The .perl method will return a representation of the object on the assumption that, if the code is reparsed at some point, it will be used to regenerate the object as a scalar in item context. If you wish to interpolate the regenerated object in a list context, it may be necessary to use prefix:<|> to force interpolation.
.fmt('%03d') method to do an implicit sprintf on the value.
To format an array value separated by commas, supply a second argument: .fmt('%03d', ', '). To format a hash value or list of pairs, include formats for both key and value in the first string: .fmt('%s: %s', "\n").
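The three uses of .fmt just described can be sketched like this. The values are arbitrary, and the exact behavior on a given implementation is an assumption.

```raku
# A sketch of .fmt on a scalar, a list, and a pair (values are arbitrary).
my $single = 7.fmt('%03d');                  # implicit sprintf on one value
my $joined = (1, 2, 3).fmt('%03d', ', ');    # each element formatted, then joined
my $pair   = (answer => 42).fmt('%s: %s');   # formats for both key and value

say $single;    # 007
say $joined;    # 001, 002, 003
say $pair;      # answer: 42
```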
@foo.[1] and %bar.{'a'}) that makes the dereference a little more explicit. Constant string subscripts may be placed in angles, so %bar.{'a'} may also be written as %bar<a> or %bar.<a>. Additionally, you may insert extra whitespace using the unspace.
If you need to force inner context to item (scalar), we now have convenient single-character context specifiers such as + for numbers and ~ for strings:
$x = g(); # item context for g()
@x[f()] = g(); # list context for f() and g()
@x[f()] = +g(); # list context for f(), numeric item context for g()
@x[+f()] = g(); # numeric item context for f(), list context for g()
@x[f()] = @y[g()]; # list context for f() and g()
@x[f()] = +@y[g()]; # list context for f() and g()
@x[+f()] = @y[g()]; # numeric item context for f(), list context for g()
@x[f()] = @y[+g()]; # list context for f(), numeric item context for g()
%x{~f()} = %y{g()}; # string item context for f(), list context for g()
%x{f()} = %y{~g()}; # list context for f(), string item context for g()
Sigils used either as functions or as list prefix operators also force context, so these also work:
@x[$(g())] # item context for g()
%x{$(g())} # item context for g()
But note that these don't do the same thing:
@x[$g()] # call function in $g
%x{$g()} # call function in $g
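To make the effect of the context specifiers concrete, here is a small sketch with a hypothetical function g() that returns three values. How the subscript forms above behave may vary by implementation, so only the prefix operators themselves are shown.

```raku
sub g() { 10, 20, 30 }      # hypothetical function returning a list

my @all = g();              # list context: all three values
my $n   = +g();             # numeric item context: the number of elements
my $s   = ~g();             # string item context: elements joined by spaces
my @one = $(g());           # $( ) forces item context: one element, the list

say @all.elems;             # 3
say $n;                     # 3
say $s;                     # 10 20 30
say @one.elems;             # 1
```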
:= binding operator that lets you bind names to Array and Hash objects without copying, in the same way as subroutine arguments are bound to formal parameters. See S06 for more about binding.
Parcel. This kind of list should not be confused with the flattening list context. Instead, this is a raw syntactic list; no interpretation is made of the list without knowing what context it will be evaluated in. For example, when you say:
(1,2,3,:mice<blind>)
the result is a Parcel object containing three Int objects and a Pair object, that is, four positional objects. When, however, you say something like:
rhyme(1,2,3,:mice<blind>)
the Parcel is translated (at compile time, in this case) into a Capture with 3 positionals and one named argument in preparation for binding.
A parcel may be captured into an object with backslashed parens:
$args = \(1,2,3,:mice<blind>)
Values in the Parcel object are parsed as ordinary expressions, and any functions mentioned are called, with their results placed as a single subparcel within the outer parcel. Whether they are subsequently flattened will depend on the eventual binding.
When a Parcel is used as a list of arguments, it will be transformed into a Capture object, which is much like a Parcel but has its arguments divvied up into positional and named subsets for faster binding. (Usually this transformation happens at compile time.) If the first positional is followed by a colon instead of a comma, it is marked as the invocant in case it finds itself in a context that cares. It's illegal to use the colon in place of the comma anywhere except after the first argument.
Capture objects are immutable in the abstract, but evaluate their arguments lazily. Before everything inside a Capture is fully evaluated (which happens at compile time when all the arguments are constants), the eventual value may well be unknown. All we know is that we have the promise to make the bits of it immutable as they become known.
Capture objects may contain multiple unresolved iterators such as feeds or slices. How these are resolved depends on what they are eventually bound to. Some bindings are sensitive to multiple dimensions while others are not.
You may retrieve parts from a Capture object with a prefix sigil operator:
$args = \3; # same as "$args = \(3)"
@$args; # same as "Array($args)"
%$args; # same as "Hash($args)"
When cast into an array, you can access all the positional arguments; into a hash, all named arguments.
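As a sketch of this, the following captures an argument list, pulls out its positional and named parts, and then uses the same object as the arguments of a call. The routine rhyme and its slurpy signature are hypothetical.

```raku
my $args = \(1, 2, 3, :mice<blind>);    # captured argument list

my @pos   = @$args;                     # the positional arguments
my %named = %$args;                     # the named arguments

# Hypothetical routine that accepts any positionals and any nameds.
sub rhyme(*@p, *%n) { @p.elems ~ ' positionals, mice are ' ~ %n<mice> }

say @pos.elems;
say %named<mice>;
say rhyme(|$args);                      # interpolate the capture into the call
```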
All prefix sigil operators accept one positional argument, evaluated in item context as an rvalue. They can interpolate in strings if called with parentheses. The special syntax form $() translates into $( $/.ast // Str($/) ) to operate on the current match object; similarly @() and %() can extract positional and named submatches.
Parcel and Capture objects fill the ecological niche of references in Perl 6. You can think of them as "fat" references, that is, references that can capture not only the current identity of a single object, but also the relative identities of several related objects. Conversely, you can think of Perl 5 references as a degenerate form of Capture when you want to refer only to a single item.
The empty Parcel is a value with a special name: Nil. It is the named equivalent of the empty () list. The Nil value is officially undefined as an item but interpolates as a null list into list context, and an empty Seq into slice context. An iterator can never return Nil as an ordinary value, so it is the sentinel value that marks the end of iteration.
Assigning or binding Nil to any scalar container causes the container to throw out any contents and restore itself to an uninitialized state (after which it will contain a type object appropriate to the declared type of the container, or Mu for untyped containers).
Assigning or binding Nil to any composite container (such as an Array or Hash) empties the container, resetting it back to an uninitialized state. The container object itself remains defined.
The sink statement prefix will eagerly evaluate any block or statement, throw away the results, and instead return the Nil value. This can be useful to peg some behavior to an empty list while still returning an empty list:
# Check that incoming argument list isn't null
@inclist = map { $_ + 1 }, @list || sink warn 'Nil input!';
@inclist = do for @list || sink { warn 'Nil input!'; $warnings++; } {
$_ + 1;
}
# Check that outgoing result list isn't null
@inclist = do map { $_ + 1 }, @list or sink warn 'Nil result!';
@inclist = do for @list {
$_ + 1;
} or sink { warn 'Nil result'; $warnings++; }
Given sink, there's no need for an "else" clause on Perl 6's loops, and the sink construct works in any list, not just for loops.
Signature) may be created with colon-prefixed parens:
my ::MySig ::= :(Int, Num, Complex, Status)
Expressions inside the signature are parsed as parameter declarations rather than ordinary expressions. See S06 for more details on the syntax for parameters.
Signature objects bound to type variables (as in the example above) may be used within other signatures to apply additional type constraints. When applied to a Capture argument, the signature allows you to take the types of the capture's arguments from MySig, but declare the (untyped) variable names yourself via an additional signature in parentheses:
sub foo (Num Dog|Squirrel $numdog, MySig $a ($i,$j,$k,$mousestatus)) {...}
foo($mynumdog, \(1, 2.7182818, 1.0i, statmouse()));
&foo merely stands for the foo function as a Routine object without calling it. You may call any Code object by dereferencing it with parens (which may, of course, contain arguments):
&foo($arg1, $arg2);
Whitespace is not allowed before the parens because it is parsed as a postfix. As with any postfix, there is also a corresponding .() operator, and you may use the "unspace" form to insert optional whitespace and comments between the backslash and either of the postfix forms:
&foo\ ($arg1, $arg2);
&foo\ .($arg1, $arg2);
&foo\#`[
embedded comment
].($arg1, $arg2);
Note however that the parentheses around arguments in the "normal" named forms of function and method calls are not postfix operators, so do not allow the .() form, because the dot is indicative of an actual dereferencing operation, which the named forms aren't doing. You may, however, use "unspace" to install extra space before the parens in the forms:
foo() # okay
foo\ () # okay
foo.() # means foo().()
.foo() # okay
.foo\ () # okay
.foo.() # means .foo().()
$.foo() # okay
$.foo\ () # okay
$.foo.() # means $.foo().()
If you do use the dotty form on these special forms, it will assume you wanted to call the named form without arguments, and then dereference the result of that.
&foo may actually be the name of a set of candidate functions (which you can use as if it were an ordinary function). However, in that case &foo by itself is not sufficient to uniquely name a specific function. To do that, the type may be refined by using a signature literal as a postfix operator:
&foo:(Int,Num)
It still just returns the Routine object. A call may also be partially applied by using the .assuming method:
&foo.assuming(1,2,3,:mice<blind>)
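The distinction between naming a routine, calling it through its object, and partially applying it can be sketched as follows. The routine add and the other names are hypothetical.

```raku
sub add($a, $b) { $a + $b }         # hypothetical routine

my $code = &add;                    # the Routine object itself; nothing is called
say &add(1, 2);                     # call by dereferencing with parens
say &add.(1, 2);                    # the equivalent dotted postfix form
say $code(1, 2);                    # same call through a scalar

my &increment = &add.assuming(1);   # partial application: $a is fixed to 1
say increment(41);
```

The object returned by .assuming is itself callable, so it may be bound to an & variable and used like any other routine.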
@array = <A B>;
@array[0,1,2]; # returns 'A', 'B', Nil
@array[0,1,2] :p; # returns 0 => 'A', 1 => 'B'
@array[0,1,2] :kv; # returns 0, 'A', 1, 'B'
@array[0,1,2] :k; # returns 0, 1
@array[0,1,2] :v; # returns 'A', 'B'
%hash = (:a<A>, :b<B>);
%hash<a b c>; # returns 'A', 'B', Nil
%hash<a b c> :p; # returns a => 'A', b => 'B'
%hash<a b c> :kv; # returns 'a', 'A', 'b', 'B'
%hash<a b c> :k; # returns 'a', 'b'
%hash<a b c> :v; # returns 'A', 'B'
These adverbial forms all weed out non-existing entries. You may also perform an existence test, which will return true if all the elements of the slice exist:
if %hash<a b c> :exists {...}
likewise,
my ($a,$b,$c) = %hash<a b c> :delete;
deletes the entries "en passant" while returning them. (Of course, any of these forms also work in the degenerate case of a slice containing a single index.) Note that these forms work by virtue of the fact that the subscript is the topmost previous operator. You may have to parenthesize or force list context if some other operator that is tighter than comma would appear to be topmost:
1 + (%hash{$x} :delete);
$x = (%hash{$x} :delete);
($x) = %hash{$x} :delete;
(The situation does not often arise for the slice modifiers above because they are usually used in list context, which operates at comma precedence.)
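A sketch of deletion "en passant", using a hypothetical hash, shows both the list-assignment form and the parenthesized form discussed above.

```raku
my %hash = (:a<A>, :b<B>, :c<C>);       # hypothetical data

# List assignment operates at comma precedence, so no parens are needed.
my ($first, $second) = %hash<a b> :delete;

# With item assignment, parenthesize so the adverb attaches to the subscript.
my $key   = 'c';
my $third = (%hash{$key} :delete);

say $first, $second, $third;
say %hash.elems;                        # nothing is left in the hash
```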
Int or Num), a Hash object becomes the number of pairs contained in the hash. In a boolean context, a Hash object is true if there are any pairs in the hash. In either case, any intrinsic iterator would be reset. (If hashes do carry an intrinsic iterator (as they do in Perl 5), there will be a .reset method on the hash object to reset the iterator explicitly.)
sort see S29.
$*PID or @*ARGS.
$_ and @_, as well as the new $/, which is the return value of the last regex match. $0, $1, $2, etc., are aliases into the $/ object.
The $#foo notation is dead. Use @foo.end (the index of the last element) or @foo[*-1] (the last element itself) instead. (Or @foo.shape[$dimension] for multidimensional arrays.)
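The two replacements are not interchangeable: one yields an index, the other an element. A minimal sketch with a hypothetical array:

```raku
my @foo = <a b c d>;        # hypothetical array of four elements

my $last-index = @foo.end;  # what Perl 5 spelled $#foo
my $last-item  = @foo[*-1]; # the element stored at that index

say $last-index;            # 3
say $last-item;             # d
```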
A name is anything that is a legal part of a variable name (not counting the sigil). This includes
$foo # simple identifiers
$Foo::Bar::baz # compound identifiers separated by ::
$Foo::($bar)::baz # compound identifiers that perform interpolations
$42 # numeric names
$! # certain punctuational variables
When not used as a sigil, the semantic function of :: within a name is to force the preceding portion of the name to be considered a package through which the subsequent portion of the name is to be located. If the preceding portion is null, it means the package is unspecified and must be searched for according to the nature of what follows. Generally this means that an initial :: following the main sigil is a no-op on names that are known at compile time, though ::() can also be used to introduce an interpolation (see below). Also, in the absence of another sigil, :: can serve as its own sigil indicating intentional use of a not-yet-declared package name.
Unlike in Perl 5, if a sigil is followed by comma, semicolon, a colon not followed by an identifier, or any kind of bracket or whitespace (including Unicode brackets and whitespace), it will be taken to be a sigil without a name rather than a punctuational variable. This allows you to use sigils as coercion operators:
print $( foo() ) # foo called in item context
print %( foo() ) # foo called in hash context
In declarative constructs bare sigils may be used as placeholders for anonymous variables:
my ($a, $, $c) = 1..3;
print unless (state $)++;
Outside of declarative constructs you may use * for a placeholder:
($a, *, $c) = 1..3;
Attempts to say something like:
($a, $, $c) = 1..3;
will result in the message, "Anonymous variable requires declarator".
$Foo::Bar::baz # the $baz variable in package Foo::Bar
Sometimes it's clearer to keep the sigil with the variable name, so an alternate way to write this is:
Foo::Bar::<$baz>
This is resolved at compile time because the variable name is a constant.
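Both spellings can be compared in a short sketch. The package Foo::Bar and its variable are hypothetical and declared here only so that the example is self-contained.

```raku
package Foo::Bar { our $baz = 'found' }     # hypothetical package variable

my $by-name      = $Foo::Bar::baz;          # sigil first, then the package path
my $by-subscript = Foo::Bar::<$baz>;        # package first, sigil kept with the name

say $by-name;
say $by-subscript;
```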
MY # Symbols in the current lexical scope (aka $?SCOPE)
OUR # Symbols in the current package (aka $?PACKAGE)
CORE # Outermost lexical scope, definition of standard Perl
GLOBAL # Interpreter-wide package symbols, really UNIT::GLOBAL
PROCESS # Process-related globals (superglobals)
COMPILING # Lexical symbols in the scope being compiled
CALLER # Contextual symbols in the immediate caller's lexical scope
DYNAMIC # Contextual symbols in my or any caller's lexical scope
The following relative names are also reserved but may be used anywhere in a name:
OUTER # Symbols in the next outer lexical scope
UNIT # Symbols in the outermost lexical scope of compilation unit
SETTING # Lexical symbols in the unit's DSL (usually CORE)
PARENT # Symbols in this package's parent package (or lexical scope)
The following is reserved at the beginning of method names in method calls:
SUPER # Package symbols declared in inherited classes
Other all-caps names are semi-reserved. We may add more of them in the future, so you can protect yourself from future collisions by using mixed case on your top-level packages. (We promise not to break any existing top-level CPAN package, of course. Except maybe ACME, and then only for coyotes.)
The file's scope is known as UNIT, but there are one or more lexical scopes outside of that corresponding to the linguistic setting (often known as the prelude in other cultures). Hence, the SETTING scope is equivalent to UNIT::OUTER. For a standard Perl program SETTING is the same as CORE, but various startup options (such as -n or -p) can put you into a domain specific language, in which case CORE remains the scope of the standard language, while SETTING represents the scope defining the DSL that functions as the setting of the current file. See also the -L/--language switch described in S19-commandline. If a setting wishes to gain control of the main execution, it merely needs to declare a MAIN routine as documented in S06. In this case the ordinary execution of the user's code is suppressed; instead, execution of the user's code is entirely delegated to the setting's MAIN routine, which calls back to the user's lexically embedded code with YOU_ARE_HERE.
Note that, since the UNIT of an eval is the eval string itself, the SETTING of an eval is the language in effect at the point of the eval, not the language in effect at the top of the file. (You may, however, use OUTER::SETTING to get the setting of the code that is executing the eval.) In more traditional terms, the normal program is functioning as the "prelude" of the eval.
So the outermost lexical scopes nest like this, traversed via OUTER:
CORE <= SETTING < UNIT < (your_block_here)
The outermost package scopes nest like this, traversed via PARENT:
GLOBAL < (your_package_here)
Your main program starts up in the GLOBAL package and the UNIT lexical scope. Whenever anything is declared with "our" semantics, it inserts a name into both the current package and the current lexical scope. (And "my" semantics only insert into the current lexical scope.) Note that the standard setting, CORE, is a lexical scope, not a package; the various items that are defined within (or imported into) CORE are *not* in GLOBAL, which is pretty much empty when your program starts compiling, and mostly only contains things you either put there yourself, or some other module put there because you used that module. In general things defined within (or imported into) CORE should only be declared or imported with "my" semantics. All Perl code can see CORE anyway as the outermost lexical scope, so there's no need to also put such things into GLOBAL.
The GLOBAL package itself is accessible via UNIT::GLOBAL. The PROCESS package is accessible via UNIT::PROCESS. The PROCESS package is not the parent of GLOBAL. However, searching up the dynamic stack for dynamic variables will look in all nested dynamic scopes (mapped automatically to each call's lexical scope, not package scope) out to the main dynamic scope; once all the dynamic scopes are exhausted, it also looks in the GLOBAL package and then in the PROCESS package, so $*OUT typically finds the process's standard output handle. Hence, PROCESS and GLOBAL serve as extra outer dynamic scopes, much like CORE and SETTING function as extra outer lexical scopes.
Extra SETTING scopes keep their identity and their nesting within CORE, so you may have to go to OUTER several times from UNIT before you get to CORE. Normally, however, there is only the core setting, in which case UNIT::OUTER ends up meaning the same as SETTING which is the same as CORE.
Extra GLOBAL scopes are treated differently. Every compilation unit has its own associated UNIT::GLOBAL package. As the currently compiling compilation unit expresses the need for various other compilation units, the global names known to those other units must be merged into the new unit's UNIT::GLOBAL. (This includes the names in all the packages within the global package.) If two different units use the same global name, they must generally be taken to refer to the same item, but only if the type signatures can be meshed (and augmentation rules followed, in the case of package names). If two units provide package names with incompatible type signatures, the compilation of the unit fails. In other words, you may not use incompatible global types to provide a union type. However, if one or the other unit underspecifies the type in a compatible way, the underspecified type just takes on the extra type information as it learns it. (Presumably some combination of Liskov substitution, duck-typing, and run-time checking will prevent tragedy in the unit that was compiled with the underspecified type. Alternately, the compiler is allowed to recompile or re-examine the unit with the new type constraints to see if any issues are certain to arise at run time, in which case the compiler is free to complain.)
Any dynamic variable declared with our in the user's main program (specifically, the part compiled with GLOBAL as the current package) is accessible (by virtue of being in GLOBAL) as a dynamic variable even if not directly in the dynamic call chain. Note that dynamic vars do *not* look in CORE for anything. (They might look in SETTING if you're running under a setting distinct from CORE, if that setting defines a dynamic scope outside your main program, such as for the -n or -p switch.) Context variables declared with our in the GLOBAL or PROCESS packages do not need to use the * twigil, since the twigil is stripped before searching those packages. Hence, your environment variables are effectively declared without the twigil:
augment package GLOBAL { our %ENV; }
::($expr) where you'd ordinarily put a package or variable name. The string is allowed to contain additional instances of ::, which will be interpreted as package nesting. You may only interpolate entire names, since the construct starts with ::, and either ends immediately or is continued with another :: outside the parens. Most symbolic references are done with this notation:
$foo = "Bar";
$foobar = "Foo::Bar";
$::($foo) # lexically-scoped $Bar
$::("MY::$foo") # lexically-scoped $Bar
$::("OUR::$foo") # package-scoped $Bar
$::("GLOBAL::$foo") # global $Bar
$::("PROCESS::$foo")# process $Bar
$::("PARENT::$foo") # current package's parent's $Bar
$::($foobar) # $Foo::Bar
$::($foobar)::baz # $Foo::Bar::baz
$::($foo)::Bar::baz # $Bar::Bar::baz
$::($foobar)baz # ILLEGAL at compile time (no operator baz)
Note that unlike in Perl 5, initial :: doesn't imply global. Here as part of the interpolation syntax it doesn't even imply package. After the interpolation of the ::() component, the indirect name is looked up exactly as if it had been there in the original source code, with priority given first to leading pseudo-package names, then to names in the lexical scope (searching scopes outwards, ending at CORE). The current package is searched last.
Use the MY pseudopackage to limit the lookup to the current lexical scope, and OUR to limit the scopes to the current package scope.
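A sketch of the two most common cases, interpolating a simple name and interpolating a package path, might look like this. The variables and the package are hypothetical, and how closely an implementation follows the lookup order described above is an assumption.

```raku
package Foo::Bar { our $baz = 'package variable' }   # hypothetical package
my $Bar = 'lexical variable';                        # hypothetical lexical

my $foo    = 'Bar';
my $foobar = 'Foo::Bar';

my $simple   = $::($foo);           # looked up as if $Bar had been written
my $compound = $::($foobar)::baz;   # looked up as if $Foo::Bar::baz had been written

say $simple;
say $compound;
```

Because the name is only known at run time, neither lookup can be resolved at compile time, unlike the constant subscript forms.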
$x and @y) are only looked up from lexical scopes, but never from package scopes.
To bind package variables into a lexical scope, simply say our ($x, @y). To bind global variables into a lexical scope, predeclare them with use:
use PROCESS <$IN $OUT>;
Or just refer to them as $*IN and $*OUT.
Foo::Bar::{'&baz'} # same as &Foo::Bar::baz
PROCESS::<$IN> # Same as $*IN
Foo::<::Bar><::Baz> # same as Foo::Bar::Baz
The :: before the subscript is required here, because the Foo::Bar{...} syntax is reserved for attaching a "WHENCE" initialization closure to an autovivifiable type object. (see S12).
Unlike ::() symbolic references, this does not parse the argument for ::, nor does it initiate a namespace scan from that initial point. In addition, for constant subscripts, it is guaranteed to resolve the symbol at compile time.
The null pseudo-package is reserved to mean the same search list as an ordinary name search. That is, the following are all identical in meaning:
$foo
$::{'foo'}
::{'$foo'}
$::<foo>
::<$foo>
That is, each of them scans lexical scopes outward, and then the current package scope (though the package scope is then disallowed when "strict" is in effect).
As a result of these rules, you can write any arbitrary variable name as either of:
$::{'!@#$#@'}
::{'$!@#$#@'}
You can also use the ::<> form as long as there are no spaces in the name.
MY. The current package symbol table is visible as pseudo-package OUR. The OUTER name refers to the MY symbol table immediately surrounding the current MY, and OUTER::OUTER is the one surrounding that one.
our $foo = 41;
say $::foo; # prints 41, :: is no-op
{
my $foo = 42;
say MY::<$foo>; # prints "42"
say $MY::foo; # same thing
say $::foo; # same thing, :: is no-op here
say OUR::<$foo>; # prints "41"
say $OUR::foo; # same thing
say OUTER::<$foo>; # prints "41" (our $foo is also lexical)
say $OUTER::foo; # same thing
}
You may not use any lexically scoped symbol table, either by name or by reference, to add symbols to a lexical scope that is done compiling. (We reserve the right to relax this if it turns out to be useful though.)
The CALLER package refers to the lexical scope of the (dynamically scoped) caller. The caller's lexical scope is allowed to hide any user-defined variable from you. In fact, that's the default, and a lexical variable must have the trait "is dynamic" to be visible via CALLER. ($_, $! and $/ are always dynamic, as are any variables whose declared names contain a * twigil.) If the variable is not visible in the caller, it returns failure. Variables whose names are visible at the point of the call but that come from outside that lexical scope are controlled by the scope in which they were originally declared as dynamic. Hence the visibility of CALLER::<$*foo> is determined where $*foo is actually declared, not by the caller's scope (unless that's where it happens to be declared). Likewise CALLER::CALLER::<$x> depends only on the declaration of $x visible in your caller's caller.
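The opt-in nature of CALLER visibility can be sketched with two hypothetical callers, only one of which marks its variable as dynamic. Whether the hidden case produces a failure or an exception is treated as an implementation detail here, so the sketch merely checks for an undefined result.

```raku
sub peek() { CALLER::<$secret> }            # hypothetical helper

sub open-caller() {
    my $secret is dynamic = 'shared';       # opted in to CALLER visibility
    peek();
}

sub closed-caller() {
    my $secret = 'hidden';                  # ordinary lexical: hidden by default
    (try peek()) // 'not visible';
}

say open-caller();
say closed-caller();
```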
User-defined dynamic variables should generally be initialized with ::= unless it is necessary for the variable to be modified. (Marking dynamic variables as readonly is very helpful in terms of sharing the same value among competing threads, since a readonly variable need not be locked.)
The DYNAMIC pseudo-package is just like CALLER except that it starts in the current dynamic scope and from there scans outward through all dynamic scopes (frames) until it finds a dynamic variable of that name in that dynamic frame's associated lexical pad. (This search is implied for variables with the * twigil; hence $*FOO is equivalent to DYNAMIC::<$*FOO>.) If, after scanning outward through all those dynamic scopes, there is no variable of that name in any immediately associated lexical pad, it strips the * twigil out of the name and looks in the GLOBAL package followed by the PROCESS package. If the value is not found, it returns failure.
Unlike CALLER, DYNAMIC will see a dynamic variable that is declared in the current scope, since it starts its search 0 scopes up the stack rather than 1. You may, however, use CALLER::<$*foo> to bypass a dynamic definition of $*foo in your current scope, such as to initialize it with the outer dynamic value:
my $*foo ::= CALLER::<$*foo>;
The temp declarator may be used (without an initializer) on a dynamic variable to perform a similar operation:
temp $*foo;
The main difference is that by default it initializes the new $*foo with its current value, rather than the caller's value. Also, it is allowed only on read/write dynamic variables, since the only reason to make a copy of the outer value would be because you'd want to override it later and then forget the changes at the end of the current dynamic scope.
You may also use OUTER::<$*foo> to mean you want to start the search in your outer lexical scope, but this will succeed only if that outer lexical scope also happens to be one of your current dynamic scopes. That is, the same search is done as with the bare $*foo, but any "hits" are ignored until we've got to the OUTER scope in our traversal.
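The outward scan through dynamic scopes can be sketched with a hypothetical variable $*depth. The routine show never declares the variable itself; it sees whichever definition is nearest on the call chain.

```raku
my $*depth = 0;                 # hypothetical dynamic variable in the main program

sub show() { $*depth }          # resolved by scanning outward through callers

sub nested() {
    my $*depth = 1;             # overrides the value for everything called from here
    show();
}

say show();                     # 0
say nested();                   # 1
say show();                     # 0 again: the override ended with nested()
```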
%Foo::. Just subscript the package object itself as a hash object, the key of which is the variable name, including any sigil. The package object can be derived from a type name by use of the :: postfix:
MyType::<$foo>
(Directly subscripting the type with either square brackets or curlies is reserved for various generic type-theoretic operations. In most other matters type names and package names are interchangeable.)
Typeglobs are gone. Use binding (:= or ::=) to do aliasing. Individual variable objects are still accessible through the hash representing each symbol table, but you have to include the sigil in the variable name now: MyPackage::{'$foo'} or the equivalent MyPackage::<$foo>.
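Aliasing by binding, together with the two symbol-table spellings, can be sketched as follows. MyPackage and $foo are hypothetical names.

```raku
package MyPackage { our $foo = 1 }      # hypothetical package variable

my $alias := $MyPackage::foo;           # alias by binding, not by typeglob
$alias = 2;                             # writes through to the package variable

say $MyPackage::foo;                    # 2
say MyPackage::<$foo>;                  # same variable; the key includes the sigil
say MyPackage::{'$foo'};                # same again, with a quoted key
```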
GLOBAL package. The user's program starts in the GLOBAL package, so "our" declarations in the mainline code go into that package by default. Process-wide variables live in the PROCESS package. Most predefined globals such as $*UID and $*PID are actually process globals.
PROCESS package. For an ordinary Perl program running by itself, there is only one GLOBAL package as well. However, in certain situations (such as shared hosting under a webserver), the actual process may contain multiple virtual processes or interpreters, each running its own "main" code. In this case, the GLOBAL namespace holds variables that properly belong to the individual virtual process, while the PROCESS namespace holds variables that properly belong to the actual process as a whole. From the viewpoint of the program there is little difference as long as all global variables are accessed as if they were dynamic variables (by using the * twigil). The process as a whole may place restrictions on the mutability of process variables as seen by the individual subprocesses. Also, individual subprocesses may not create new process variables. If the process wishes to grant subprocesses the ability to communicate via the PROCESS namespace, it must supply a writeable dynamic variable to all the subprocesses granted that privilege.
$*ARGFILES. The arguments themselves come in @*ARGS. See also "Declaring a MAIN subroutine" in S06.
= secondary sigil. $=DATA is the name of your DATA filehandle, for instance. All Pod structures are available through %=POD (or some such). As with *, the = may also be used as a package name: $=::DATA.
? secondary sigil. These are all values that are known to the compiler, and may in fact be dynamically scoped within the compiler itself, and only appear to be lexically scoped because dynamic scopes of the compiler resolve to lexical scopes of the program. All $? variables are considered constants, and may not be modified after being compiled in. The user is also allowed to define (or redefine) such constants:
constant $?TABSTOP = 4; # assume heredoc tabs mean 4 spaces
(Note that the constant declarator always evaluates its initialization expression at compile time.)
$?FILE and $?LINE are your current file and line number, for instance. Instead of $?OUTER::FOO you probably want to write OUTER::<$?FOO>. Within code that is being run during the compile, such as BEGIN blocks, or macro bodies, or constant initializers, the compiler variables must be referred to as (for instance) COMPILING::<$?LINE> if the bare $?LINE would be taken to be the value during the compilation of the currently running code rather than the eventual code of the user's compilation unit. For instance, within a macro body $?LINE is the line within the macro body, but COMPILING::<$?LINE> is the line where the macro was invoked. See below for more about the COMPILING pseudo package.
Here are some possibilities:
$?FILE Which file am I in?
$?LINE Which line am I at?
&?ROUTINE Which routine am I in?
&?BLOCK Which block am I in?
%?LANG What is the current set of interwoven languages?
The following return objects that contain all pertinent info:
$?KERNEL Which kernel am I compiled for?
$?DISTRO Which OS distribution am I compiling under?
$?VM Which virtual machine am I compiling under?
$?XVM Which virtual machine am I cross-compiling for?
$?PERL Which Perl am I compiled for?
$?SCOPE Which lexical scope am I in?
$?PACKAGE Which package am I in?
$?MODULE Which module am I in?
$?CLASS Which class am I in? (as variable)
$?ROLE Which role am I in? (as variable)
$?GRAMMAR Which grammar am I in?
It is relatively easy to smartmatch these constant objects against pairs to check various attributes such as name, version, or authority:
given $?VM {
when :name<Parrot> :ver(v2) { ... }
when :name<CLOS> { ... }
when :name<SpiderMonkey> { ... }
when :name<JVM> :ver(v6.*) { ... }
}
Matches of constant pairs on constant objects may all be resolved at compile time, so dead code can be eliminated by the optimizer.
Note that some of these things have parallels in the * space at run time:
$*KERNEL Which kernel I'm running under
$*DISTRO Which OS distribution I'm running under
$*VM Which VM I'm running under
$*PERL Which Perl I'm running under
You should not assume that these will have the same value as their compile-time cousins.
While $? variables are constant to the run time, the compiler has to have a way of changing these values at compile time without getting confused about its own $? variables (which were frozen in when the compile-time code was itself compiled). The compiler can talk about these compiler-dynamic values using the COMPILING pseudopackage.
References to COMPILING variables are automatically hoisted into the lexical scope currently being compiled. Setting or temporizing a COMPILING variable sets or temporizes the incipient $? variable in the surrounding lexical scope that is being compiled. If nothing in the context is being compiled, an exception is thrown.
$?FOO // say "undefined"; # probably says undefined
BEGIN { COMPILING::<$?FOO> = 42 }
say $?FOO; # prints 42
{
say $?FOO; # prints 42
BEGIN { temp COMPILING::<$?FOO> = 43 } # temporizes to *compiling* block
say $?FOO; # prints 43
BEGIN { COMPILING::<$?FOO> = 44 }
say $?FOO; # prints 44
BEGIN { say COMPILING::<$?FOO> } # prints 44, but $?FOO probably undefined
}
say $?FOO; # prints 42 (left scope of temp above)
$?FOO = 45; # always an error
COMPILING::<$?FOO> = 45; # an error unless we are compiling something
Note that CALLER::<$?FOO> might discover the same variable as COMPILING::<$?FOO>, but only if the compiling scope is the immediate caller. Likewise OUTER::<$?FOO> might or might not get you to the right place. In the abstract, COMPILING::<$?FOO> goes outwards dynamically until it finds a compiling scope, and so is guaranteed to find the "right" $?FOO. (In practice, the compiler hopefully keeps track of its current compiling scope anyway, so no scan is needed.)
Perceptive readers will note that this subsumes various "compiler hints" proposals. Crazy readers will wonder whether this means you could set an initial value for other lexicals in the compiling scope. The answer is yes. In fact, this mechanism is probably used by the exporter to bind names into the importer's namespace.
COMPILING::<%?LANG>. Lexically scoped parser changes should temporize the modification. Changes from here to end-of-compilation unit can just assign or bind it. In general, most parser changes involve deriving a new grammar and then pointing one of the COMPILING::<%?LANG> entries at that new grammar. Alternately, the tables driving the current parser can be modified without derivation, but at least one level of anonymous derivation must intervene from the preceding Perl grammar, or you might be messing up someone else's grammar. Basically, the current set of grammars in %?LANG has to belong only to the current compiling scope. It may not be shared, at least not without explicit consent of all parties. No magical syntax at a distance. Consent of the governed, and all that.
~ twigil. The following are useful:
$~MAIN the current main language (e.g. Perl statements)
$~Q the current root of quoting language
$~Quasi the current root of quasiquoting language
$~Regex the current root of regex language
$~Trans the current root of transliteration language
$~P5Regex the current root of the Perl 5 regex language
Hence, when you are defining a normal Perl macro, you're replacing $~MAIN with a derived language, but when you define a new regex backslash sequence, you're replacing $~Regex with a derived language. (There may or may not be a syntax in the main language to do this.) Note that such changes are automatically scoped to the lexical scope; as with real slang, the definitions are temporary and embedded in a larger language inherited from the surrounding culture.
Instead of defining macros directly you may also mix in one or more grammar rules by lexically scoped declaration of a new sublanguage:
augment slang Regex { # derive from $~Regex and then modify $~Regex
token backslash:std<\Y> { YY };
}
This tends to be more efficient since it only has to do one mixin at the end of the block. Note that the slang declaration has nothing to do with package Regex, but only with $~Regex. Sublanguages are in their own namespace (inside the current value of %?LANG, in fact). Hence augment is modifying one of the local strands of a braided language, not a package somewhere else.
You may also supersede a sublang entirely if, for example, you just want to disable that sublanguage in the current lexical scope:
supersede slang P5Regex {}
m:P5/./; # kaboom
If you supersede MAIN then you're replacing the Perl parser entirely. This might be done by, say, the "use COBOL" declaration. :-)
ThatModule, but the complete long name of a module includes its version, naming authority, and perhaps even its source language. Similarly, sets of operators work together in various syntactic categories with names like prefix, infix, postfix, etc. The long names of these operators, however, often contain characters that are excluded from ordinary identifiers.
For all such uses, an identifier followed by a subscript-like adverbial form (see below) is considered an extended identifier:
infix:<+> # the official name of the operator in $a + $b
infix:<*> # the official name of the operator in $a * $b
infix:«<=» # the official name of the operator in $a <= $b
prefix:<+> # the official name of the operator in +$a
postfix:<--> # the official name of the operator in $a--
This name is to be thought of semantically, not syntactically. That is, the bracketing characters used do not count as part of the name; only the quoted data matters. These are all the same name:
infix:<+>
infix:<<+>>
infix:«+»
infix:['+']
Despite the appearance as a subscripting form, these names are resolved not at run time but at compile time. The pseudo-subscripts need not be simple scalars. These are extended with the same two-element list:
infix:<?? !!>
infix:['??','!!']
An identifier may be extended with multiple named identifier extensions, in which case the names matter but their order does not. These name the same module:
use ThatModule:auth<Somebody>:ver<2.7.18.28.18>
use ThatModule:ver<2.7.18.28.18>:auth<Somebody>
Adverbial syntax will be described more fully later.
A leading 0 no longer indicates octal numbers by itself. You must use an explicit radix marker for that. Pre-defined radix prefixes include:
0b base 2, digits 0..1
0o base 8, digits 0..7
0d base 10, digits 0..9
0x base 16, digits 0..9,a..f (case insensitive)
:10<42> same as 0d42 or 42
:16<DEAD_BEEF> same as 0xDEADBEEF
:8<177777> same as 0o177777 (65535)
:2<1.1> same as 0b1.1 (0d1.5)
Extra digits are assumed to be represented by a..z and A..Z, so you can go up to base 36. (Use A and B for base twelve, not T and E.) Alternately you can use a list of digits in decimal:
:60[12,34,56] # 12 * 3600 + 34 * 60 + 56
:100[3,'.',14,16] # pi
All numbers representing digits must be less than the radix, or an error will result (at compile time if constant-folding can catch it, or at run time otherwise).
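The digit-list form is ordinary positional arithmetic, and the following Python sketch models that reading. The helper name `from_digit_list` is hypothetical, and the code only illustrates the arithmetic and the "digit must be less than the radix" rule; it says nothing about how a Perl 6 implementation is structured.

```python
from fractions import Fraction

def from_digit_list(radix, digits):
    """Evaluate a list of decimal 'digits' in the given radix.

    A '.' entry separates the integer part from the fraction part.
    Every digit must be less than the radix, mirroring the rule above.
    """
    value = Fraction(0)
    scale = None  # stays None while we are still in the integer part
    for d in digits:
        if d == '.':
            if scale is not None:
                raise ValueError("more than one radix point")
            scale = Fraction(1)
            continue
        if not 0 <= d < radix:
            raise ValueError(f"digit {d} is not less than radix {radix}")
        if scale is None:
            value = value * radix + d      # shift left one place, add digit
        else:
            scale /= radix                 # move one place to the right
            value += d * scale
    return value

print(from_digit_list(60, [12, 34, 56]))       # 12*3600 + 34*60 + 56
print(from_digit_list(100, [3, '.', 14, 16]))  # 3 + 14/100 + 16/10000
```

Exact fractions are used so that the fractional digits are not rounded; a digit such as 60 in radix 60 raises an error instead of being silently accepted.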
Any radix may include a fractional part. A dot is never ambiguous because you have to tell it where the number ends:
:16<dead_beef.face> # fraction
:16<dead_beef>.face # method call
:16<dead_beef> * 16**8
:16<dead_beef*16**8>
It's true that only radixes that define e as a digit are ambiguous that way, but with any radix it's not clear whether the exponentiator should be 10 or the radix, and this makes it explicit:
0b1.1e10 ILLEGAL, could be read as any of:
:2<1.1> * 2 ** 10 1536
:2<1.1> * 10 ** 10 15,000,000,000
:2<1.1> * :2<10> ** :2<10> 6
So we write those as
:2<1.1*2**10> 1536
:2<1.1*10**10> 15,000,000,000
:2«1.1*:2<10>**:2<10>» 6
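The three readings can be checked with a few lines of arithmetic. This Python sketch is only a calculator for the values quoted above; the variable names are illustrative.

```python
from fractions import Fraction

mantissa = Fraction(3, 2)                # binary 1.1 is 1 + 1/2

radix_base = mantissa * 2 ** 10          # base is the radix, exponent is decimal 10
decimal_base = mantissa * 10 ** 10       # base and exponent both decimal
all_binary = mantissa * 0b10 ** 0b10     # base and exponent both read as binary 10

print(radix_base, decimal_base, all_binary)
```

The three results differ by many orders of magnitude, which is why the explicit forms spell out both the base and the exponent.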
The generic string-to-number converter will recognize all of these forms (including the * form, since constant folding is not available to the run time). Also allowed in strings are leading plus or minus, and maybe a trailing Units type for an implied scaling. Leading and trailing whitespace is ignored. Note also that leading 0 by itself never implies octal in Perl 6.
Any of the adverbial forms may be used as a function:
:2($x) # "bin2num"
:8($x) # "oct2num"
:10($x) # "dec2num"
:16($x) # "hex2num"
Think of these as setting the default radix, not forcing it. Like Perl 5's old oct() function, any of these will recognize a number starting with a different radix marker and switch to the other radix. However, note that the :16() converter function will interpret leading 0b or 0d as hex digits, not radix switchers.
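A rough model of the "default radix" behaviour, written in Python under stated assumptions: the function name `to_num` and the marker table are hypothetical, only integers are handled, and a marker switches radix only when its letter is not itself a digit in the default radix. That last rule is one way to reproduce the hex exception described above; it is not taken from any implementation.

```python
import string

# Radix markers, taken from the list of predefined prefixes.
MARKERS = {'0b': 2, '0o': 8, '0d': 10, '0x': 16}

def to_num(default_radix, text):
    """Convert text to an integer, treating default_radix as a default only."""
    s = text.strip().lower().replace('_', '')
    digits = (string.digits + string.ascii_lowercase)[:default_radix]
    marker = s[:2]
    if marker in MARKERS and len(s) > 2 and marker[1] not in digits:
        # The marker letter is not a digit in the default radix, so it
        # switches radix (0x under radix 8, 0o under radix 16, and so on).
        return int(s[2:], MARKERS[marker])
    # Otherwise every character is read as a digit in the default radix,
    # which is how 0b... and 0d... come out as hex digits under radix 16.
    return int(s, default_radix)

print(to_num(8, "0x1F"))     # switches to hex
print(to_num(16, "0b101"))   # stays hex: digits 0, b, 1, 0, 1
```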
1/2 # one half literal Rat
1 / 2 # 1 divided by 2 (also produces a Rat by constant folding)
Note that this essentially overrides precedence to produce a term, so:
1/2 * 3/4
means
(1 / 2) * (3 / 4)
rather than
((1 / 2) * 3) / 4
Decimal fractions not using "e" notation are also stored as Rat values:
6.02e23.WHAT # Num
1.23456.WHAT # Rat
0.11 == 11/100 # True
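As an analogy only, Python's `fractions.Fraction` shows what exact rational storage buys compared with binary floating point. Here `Fraction` stands in for Rat and `float` for Num; that mapping is an assumption made for illustration.

```python
from fractions import Fraction

# Exact rational arithmetic: the analogue of storing 0.11 as 11/100.
print(Fraction("0.11") == Fraction(11, 100))
print(Fraction("0.1") + Fraction("0.2") == Fraction("0.3"))

# Binary floating point only approximates these decimal fractions.
print(0.1 + 0.2 == 0.3)
```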
5.2+1e42i
3-1i
As with rational literals, constant folding would produce the same complex number, but this form parses as a single term, ignoring surrounding precedence.
"\x", followed by either a bare hex number ("\x263a") or a hex number in square brackets ("\x[263a]"). Similarly, "\o12" and "\o[12]" interpolate octals--but generally you should be using hex in the world of Unicode. Multiple characters may be specified within any of the bracketed forms by separating the numbers with comma: "\x[41,42,43]". You must use the bracketed form to disambiguate if the unbracketed form would "eat" too many characters, because all of the unbracketed forms eat as many characters as they think look like digits in the radix specified. None of these notations work in normal Perl code. They work only in interpolations and regexes and the like.
The old \123 form is now illegal, as is the \0123 form. Only \0 remains, and then only if the next character is not in the range '0'..'7'. Octal characters must use \o notation. Note also that backreferences are no longer represented by \1 and the like--see S05.
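The bare and bracketed \x and \o forms can be modelled with a small Python expander. The function name `expand` is hypothetical, and the sketch covers only these two escapes; it shows why the bracketed form is needed when digit-like characters follow the escape.

```python
import re

# Bare forms take every following character that is a digit in the radix;
# bracketed forms take a comma-separated list of numbers.
_ESCAPE = re.compile(
    r'\\x(?:\[([0-9A-Fa-f,]+)\]|([0-9A-Fa-f]+))'
    r'|\\o(?:\[([0-7,]+)\]|([0-7]+))'
)

def expand(text):
    """Replace hex and octal character escapes with the characters they name."""
    def repl(m):
        hex_list, hex_bare, oct_list, oct_bare = m.groups()
        if hex_list is not None or hex_bare is not None:
            base = 16
            body = hex_list if hex_list is not None else hex_bare
        else:
            base = 8
            body = oct_list if oct_list is not None else oct_bare
        return ''.join(chr(int(n, base)) for n in body.split(','))
    return _ESCAPE.sub(repl, text)

print(expand(r'\x[41,42,43]'))   # three characters from one bracketed escape
print(expand(r'\x[41]bc'))       # brackets stop the escape before "bc"
print(expand(r'\x41bc') == chr(0x41BC))  # bare form swallowed "bc" as hex digits
```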
The qw/foo bar/ quote operator now has a bracketed form: <foo bar>. When used as a subscript it performs a slice equivalent to {'foo','bar'}. Elsewhere it is equivalent to a parenthesized list of strings: ('foo','bar'). Since parentheses are generally reserved just for precedence grouping, they merely autointerpolate in list context. Therefore
@a = 1, < x y >, 2;
is equivalent to:
@a = 1, ('x', 'y'), 2;
which is the same as:
@a = 1, 'x', 'y', 2;
In item context, though, the implied parentheses are not removed, so
$a = < a b >;
is equivalent to:
$a = ('a', 'b');
which, because the list is assigned to a scalar, is autopromoted into a Capture object:
$a = \('a', 'b');
Likewise, if bound to a scalar parameter, <a b> will be treated as a single Capture object, but if bound to a slurpy parameter, it will auto-flatten.
But note that under the parenthesis-rewrite rule, a single value will still act like a scalar value. These are all the same:
$a = < a >;
$a = ('a');
$a = 'a';
And if bound to a scalar parameter, no list is constructed. To force a single value to become a list object in item context, you should use ['a'] for clarity as well as correctness.
For any item in the list that appears to be numeric, the literal is stored as an object with both a string and a numeric nature, where the string nature always returns the original string. It is as if the item is converted to an appropriate numeric type, then a Str conversion is mixed in that reproduces the original string (if normal stringification would produce something else). Hence:
< 1 1/2 6.02e23 1+2i > # Int/Str Rat/Str Num/Str Complex/Str
The purpose of this would be to facilitate compile-time analysis of multi-method dispatch, when the user prefers angle notation as the most readable way to represent a list of numbers, which it often is. It also gives us a reasonable way of visually isolating any known literal format as a single syntactic unit:
<-1+2i>.polar
(-1+2i).polar # same, but less clearly a literal
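The "number with a string nature mixed in" idea can be sketched in Python as a subclass of `int` that remembers its source text. The class name `IntStr` is a hypothetical stand-in for the Int/Str case; this is a model of the behaviour described, not of any real implementation.

```python
class IntStr(int):
    """An int that remembers the text it was parsed from (hypothetical model)."""

    def __new__(cls, text):
        obj = super().__new__(cls, text)   # numeric nature comes from int
        obj.text = text                    # string nature keeps the original
        return obj

    def __str__(self):
        return self.text

n = IntStr("007")
print(n + 1)     # arithmetic uses the numeric value
print(str(n))    # stringification reproduces the original text
```

Because the object is a genuine `int`, anything that dispatches on the numeric type still sees a number, which matches the stated purpose of helping dispatch analysis.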
The degenerate case <> is disallowed as a probable attempt to do IO in the style of Perl 5; that is now written lines(). (<STDIN> is also disallowed.) Empty lists are better written with () or Nil in any case because <> will often be misread as meaning (''). (Likewise the subscript form %foo<> should be written %foo{} to avoid misreading as @foo{''}.) If you really want the angle form for stylistic reasons, you can suppress the error by putting a space inside: < >.
Much like the relationship between single quotes and double quotes, single angles do not interpolate while double angles do. The double angles may be written either with French quotes, «$foo @bar[]», or with "Texas" quotes, <<$foo @bar[]>>, as the ASCII workaround. The implicit split is done after interpolation, but respects quotes in a shell-like fashion, so that «'$foo' "@bar[]"» is guaranteed to produce a list of two "words" equivalent to ('$foo', "@bar[]"). Pair notation is also recognized inside «...» and such "words" are returned as Pair objects.
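The difference between a plain whitespace split and a split that respects quotes can be seen with Python's standard `shlex` module, used here purely as a stand-in for "shell-like" splitting. It performs no variable interpolation, so it models only the word-splitting half of the behaviour.

```python
import shlex

text = "a 'b c' d"

plain_words = text.split()         # whitespace only: the quoted phrase is broken up
quoted_words = shlex.split(text)   # quote-aware: 'b c' survives as one word

print(plain_words)
print(quoted_words)
```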
Colon pairs (but not arrow pairs) are recognized within double angles. In addition, the double angles allow for comments beginning with #. These comments work exactly like ordinary comments in Perl code. Unlike in the shells, any literal # must be quoted, even ones without whitespace in front of them, but note that this comes more or less for free with a colon pair like :char<#x263a>, since comments only work in double angles, not single.
Fat arrow           Adverbial pair   Paren form
=========           ==============   ==========
a => True           :a
a => False          :!a
a => 0              :a(0)
a => $x             :a($x)
a => 'foo'          :a<foo>          :a(<foo>)
a => <foo bar>      :a<foo bar>      :a(<foo bar>)
a => «$foo @bar»    :a«$foo @bar»    :a(«$foo @bar»)
a => {...}          :a{...}          :a({...})
a => [...]          :a[...]          :a([...])
a => $a             :$a
a => @a             :@a
a => %a             :%a
a => &a             :&a
a => $$a            :$$a
a => @$$a           :@$$a            (etc.)
a => %foo<a>        %foo<a>:p
The fatarrow construct may be used only where a term is expected because it's considered an expression in its own right, since the fatarrow itself is parsed as a normal infix operator (even when autoquoting an identifier on its left). Because the left side is a general expression, the fatarrow form may be used to create a Pair with any value as the key. On the other hand, when used as above to generate Pair objects, the adverbial forms are restricted to the use of identifiers as keys. You must use the fatarrow form to generate a Pair where the key is not an identifier.
Despite that restriction, it's possible for other things to come between a colon and its brackets; however, all of the possible non-identifier adverbial keys are reserved for special syntactical forms. Perl 6 currently recognizes decimal numbers and the null key. In the following table the first and second columns do not mean the same thing:
Simple pair         DIFFERS from       which means
===========         ============       ===========
2 => <101010>       :2<101010>         radix literal 0b101010
8 => <123>          :8<123>            radix literal 0o123
16 => <deadbeef>    :16<deadbeef>      radix literal 0xdeadbeef
16 => $somevalue    :16($somevalue)    radix conversion function
'' => $x            :($x)              arglist or signature literal
'' => ($x,$y)       :($x,$y)           arglist or signature literal
'' => <x>           :<x>               identifier extension
'' => «x»           :«x»               identifier extension
'' => [$x,$y]       :[$x,$y]           identifier extension
'' => { .say }      :{ .say }          adverbial block
All of the adverbial forms (including the normal ones with identifier keys) are considered special tokens and are recognized in various positions in addition to term position. In particular, when used where an infix would be expected they modify the previous topmost operator that is tighter in precedence than "loose unary" (see S03):
1 == 100 :fuzz(3) # calls: infix:<==>(1, 100, fuzz => 3)
Within declarations the adverbial form is used to rename parameter declarations:
sub foo ( :externalname($myname) ) {...}
Adverbs modify the meaning of various quoting forms:
q:x 'cat /etc/passwd'
When appended to an identifier (that is, in postfix position), the adverbial syntax is used to generate unique variants of that identifier; this syntax is used for naming operators such as infix:<+> and multiply-dispatched grammatical rules such as statement_control:if. When so used, the adverb is considered an integral part of the name, so infix:<+> and infix:<-> are two different operators. Likewise prefix:<+> is different from infix:<+>. (The notation also has the benefit of grouping distinct identifiers into easily accessible sets; this is how the standard Perl 6 grammar knows the current set of infix operators, for instance.)
Either fatarrow or adverbial pair notation may be used to pass named arguments as terms to a function or method. After a call with parenthesized arguments, only the adverbial syntax may be used to pass additional arguments. This is typically used to pass an extra block:
find($directory) :{ when not /^\./ }
This just naturally falls out from the preceding rules because the adverbial block is in operator position, so it modifies the "find operator". (Parens aren't considered an operator.)
Note that (as usual) the {...} form (either identifier-based or special) can indicate either a closure or a hash depending on the contents. It does not always indicate a subscript despite being parsed as one. (The function to which it is passed can use the value as a subscript if it chooses, however.)
Note also that the <a b> form is not a subscript and is therefore equivalent not to .{'a','b'} but rather to ('a','b'). Bare <a> turns into ('a') rather than ('a',). (However, as with the other bracketed forms, the value may end up being used as a subscript depending on context.)
Two or more adverbs can always be strung together without intervening punctuation anywhere a single adverb is acceptable. When used as named arguments in an argument list, you may put comma between, because they're just ordinary named arguments to the function, and a fatarrow pair would work the same. However, this comma is allowed only when the first pair occurs where a term is expected. Where an infix operator is expected, the adverb is always taken as modifying the nearest preceding operator that is not hidden within parentheses, and if you string together multiple such pairs, you may not put commas between, since that would cause subsequent pairs to look like terms. (The fatarrow form is not allowed at all in operator position.) See S06 for the use of adverbs as named arguments.
The negated form (:!a) and the sigiled forms (:$a, :@a, :%a) never take an argument and don't care what the next character is. They are considered complete. These forms require an identifier to serve as the key.
For identifiers that take a numeric argument, it is allowed to abbreviate, for example, :sweet(16) to :16sweet. (This is distinguishable from the :16<deadbeef> form, which never has an alphabetic character following the number.) Only literal decimal numbers may be swapped this way.
The other forms of adverb (including the bare :a form) always look for an immediate bracketed argument, and will slurp it up. If that's not intended, you must use whitespace between the adverb and the opening bracket. The syntax of individual adverbs is the same everywhere in Perl 6. There are no exceptions based on whether an argument is wanted or not. (There is a minor exception for quote and regex adverbs, which accept only parentheses as their bracketing operator, and ignore other brackets, which must be placed in parens if desired. See "Paren form" in the table above.)
Except as noted above, the parser always looks for the brackets. Despite not indicating a true subscript, the brackets are similarly parsed as postfix operators. As postfixes the brackets may be separated from their initial :foo with either unspace or dot (or both), but nothing else.
Regardless of syntax, adverbs used as named arguments (in either term or infix position) generally show up as optional named parameters to the function in question--even if the function is an operator or macro. The function in question neither knows nor cares how weird the original syntax was.
In addition to q and qq, there is now the base form Q, which does no interpolation unless explicitly modified to do so. So q is really short for Q:q and qq is short for Q:qq. In fact, all quote-like forms derive from Q with adverbs:
q// Q :q //
qq// Q :qq //
rx// Q :regex //
s/// Q :subst ///
tr/// Q :trans ///
Adverbs such as :regex change the language to be parsed by switching to a different parser. This can completely change the interpretation of any subsequent adverbs as well as the quoted material itself.
q:s// Q :q :scalar //
rx:s// Q :regex :sigspace //
Short    Long           Meaning
=====    ====           =======
:x       :exec          Execute as command and return results
:w       :words         Split result on words (no quote protection)
:ww      :quotewords    Split result on words (with quote protection)
:q       :single        Interpolate \\, \q and \' (or whatever)
:qq      :double        Interpolate with :s, :a, :h, :f, :c, :b
:s       :scalar        Interpolate $ vars
:a       :array         Interpolate @ vars
:h       :hash          Interpolate % vars
:f       :function      Interpolate & calls
:c       :closure       Interpolate {...} expressions
:b       :backslash     Interpolate \n, \t, etc. (implies :q at least)
:to      :heredoc       Parse result as heredoc terminator
         :regex         Parse as regex
         :subst         Parse as substitution
         :trans         Parse as transliteration
         :code          Quasiquoting
:p       :path          Return a Path object (see S16 for more options)
You may omit the first colon by joining an initial Q, q, or qq with a single short form adverb, which produces forms like:
qw /a b c/; # P5-esque qw// meaning q:w
Qc '...{$x}...'; # Q:c//, interpolate only closures
qqx/$cmd @args[]/ # equivalent to P5's qx//
(Note that qx// doesn't interpolate.)
If you want to abbreviate further, just define a macro:
macro qx { 'qq:x ' } # equivalent to P5's qx//
macro qTO { 'qq:x:w:to ' } # qq:x:w:to//
macro quote:<❰ ❱> ($text) { quasi { $text.quoteharder } }
All the uppercase adverbs are reserved for user-defined quotes. All Unicode delimiters above Latin-1 are reserved for user-defined quotes.
%hash = qw:c/a b c d {@array} {%hash}/;
or
%hash = qq:w/a b c d {@array} {%hash}/;
to interpolate items into a qw. Conveniently, arrays and hashes interpolate with only whitespace separators by default, so the subsequent split on whitespace still works out. (But the built-in «...» quoter automatically does interpolation equivalent to qq:ww/.../. The built-in <...> is equivalent to q:w/.../.)
q :w /.../.
'', "", <>, «», ``, (), [], and {} have no special significance when used in place of // as delimiters. There may be whitespace before the opening delimiter. (This is mandatory for parens, because q() is a subroutine call and q:w(0) is an adverb with arguments.) Other brackets may also require whitespace when they would be understood as an argument to an adverb in something like q:z<foo>//. A colon may never be used as the delimiter since it will always be taken to mean another adverb regardless of what's in front of it. Nor may a # character be used as the delimiter since it is always taken as whitespace (specifically, as a comment). You may not use whitespace or alphanumerics for delimiters.
macro quote:<qX> (*%adverbs) {...}
Note: macro adverbs are automatically evaluated at macro call time if the adverbs are included in the parse. If an adverb needs to affect the parsing of the quoted text of the macro, then an explicit named parameter may be passed on as a parameter to the is parsed subrule, or used to select which subrule to invoke.
\qq[...] construct. Other "q" forms also work, including user-defined ones, as long as they start with "q". Otherwise you'll just have to embed your construct inside a \qq[...].
In other words, this is legal:
"Val = $a.ord.fmt('%x')\n"
and is equivalent to
"Val = { $a.ord.fmt('%x') }\n"
print "The answers are @foo[]\n"
Note that this fixes the spurious "@" problem in double-quoted email addresses.
As with Perl 5 array interpolation, the elements are separated by a space. (Except that a space is not added if the element already ends in some kind of whitespace. In particular, a list of pairs will interpolate with a tab between the key and value, and a newline after the pair.)
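The separator rule can be written out as a short Python function. The name `interpolate_list` is hypothetical, and the sketch assumes each element has already been converted to a string; it models only the "space unless the element already ends in whitespace" rule.

```python
def interpolate_list(elements):
    """Join elements with a space, except after one that ends in whitespace."""
    out = []
    for i, elem in enumerate(elements):
        s = str(elem)
        out.append(s)
        is_last = i == len(elements) - 1
        ends_in_space = bool(s) and s[-1].isspace()
        if not is_last and not ends_in_space:
            out.append(' ')
    return ''.join(out)

print(interpolate_list(['a', 'b', 'c']))
# Pair-like elements that already end in a newline get no extra space.
print(interpolate_list(['one\t1\n', 'two\t2\n']), end='')
```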
print "The associations are:\n%bar{}"
print "The associations are:\n%bar<>"
Note that this avoids the spurious "%" problem in double-quoted printf formats.
By default, keys and values are separated by tab characters, and pairs are terminated by newlines. (This is almost never what you want, but if you want something polished, you can be more specific.)
print "The results are &baz().\n"
The function is called in item context. (If it returns a list anyway, that list is interpolated as if it were an array in string context.)
print "The attribute is $obj.attr().\n"
print "The attribute is $obj.attr<Jan>.\n"
The method is called in item context. (If it returns a list, that list is interpolated as if it were an array.)
It is allowed to have a cascade of argumentless methods as long as the last one ends with parens:
print "The attribute is %obj.keys.sort.reverse().\n"
(The cascade is basically counted as a single method call for the end-bracket rule.)
print "The attribute is @baz[3](1,2,3){$xyz}<blurfl>.attr().\n"
Note that the final period above is not taken as part of the expression since it doesn't introduce a bracketed dereferencer.
list operator if necessary. A closure in a string establishes its own lexical scope.
The following means the same as the previous example.
print "The attribute is { @baz[3](1,2,3){$xyz}<blurfl>.attr }.\n"
The final parens are unnecessary since we're providing "real" code in the curlies. If you need to have double quotes that don't interpolate curlies, you can explicitly remove the capability:
qq:c(0) "Here are { $two uninterpolated } curlies";
or equivalently:
qq:!c "Here are { $two uninterpolated } curlies";
Alternately, you can build up capabilities from single quote to tell it exactly what you do want to interpolate:
q:s 'Here are { $two uninterpolated } curlies';
$a interpolates, so do $^a, $*a, $=a, $?a, $.a, etc. It only depends on the $.
print "The dog bark is {Dog.bark}.\n"
${foo[$bar]}
${foo}[$bar]
is dead. Use closure curlies instead:
{$foo[$bar]}
{$foo}[$bar]
(You may be detecting a trend here...)
"{.bark}".
"{abs $var}".
\v to mean vertical tab, whatever that is... (\v now matches vertical whitespace in a regex.) Literal character representations are:
\a BELL
\b BACKSPACE
\t TAB
\n LINE FEED
\f FORM FEED
\r CARRIAGE RETURN
\e ESCAPE
\L, \U, \l, \u, or \Q. Use curlies with the appropriate function instead: "{ucfirst $word}".
\c and square brackets:
"\c[NEGATED DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE]"
Multiple codepoints constituting a single character may be interpolated with a single \c by separating the names with commas:
"\c[LATIN CAPITAL LETTER A, COMBINING RING ABOVE]"
Whether that is regarded as one character or two depends on the Unicode support level of the current lexical scope. It is also possible to interpolate multiple codepoints that do not resolve to a single character:
"\c[LATIN CAPITAL LETTER A, LATIN CAPITAL LETTER B]"
[Note: none of the official Unicode character names contains a comma.]
You may also put one or more decimal numbers inside the square brackets:
"\c[13,10]" # CRLF
Any single decimal number may omit the brackets:
"\c8" # backspace
(Within a regex you may also use \C to match a character that is not the specified character.)
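The bracketed forms can be modelled in a few lines. This is a Python sketch rather than Perl 6, and decode_bracketed is a hypothetical name. It assumes that each comma-separated item is either a decimal codepoint or a character name known to Python's unicodedata database.

```python
import unicodedata


def decode_bracketed(body):
    """Decode the text between the square brackets of a \\c[...] escape."""
    chars = []
    for item in body.split(","):
        item = item.strip()
        if item.isdecimal():
            chars.append(chr(int(item)))            # decimal codepoint, e.g. 13
        else:
            chars.append(unicodedata.lookup(item))  # raises KeyError if unknown
    return "".join(chars)
```

With this model, "13,10" decodes to CR followed by LF, and "LATIN CAPITAL LETTER A, COMBINING RING ABOVE" decodes to two codepoints. Whether those count as one character or two is left to the caller, as described above.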
If the character following \c or \C is neither a left square bracket nor a decimal digit, the single following character is turned into a control character by the usual trick of XORing the 64 bit. This allows \c@ for NULL and \c? for DELETE, but note that the ESCAPE character may not be represented that way; it must be represented something like:
\e
\c[ESCAPE]
\c27
\x1B
\o33
Obviously \e is preferred when brevity is needed.
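The XOR trick is easy to check by hand. The following Python sketch (control_char is a hypothetical name, and this is an illustration, not Perl 6) flips the 64 bit of a single character's codepoint:

```python
def control_char(ch):
    """Map the character written after \\c to its control character."""
    return chr(ord(ch) ^ 64)   # flip the 64 bit of the codepoint
```

Here "@" (64) maps to 0 and "?" (63) maps to 127. The character that would map to 27 is "[", which is why ESCAPE cannot be written this way: a left square bracket after \c starts the bracketed form instead.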
The treatment of backslashed characters that would not have introduced an interpolation varies depending on the type of quote:
qq or :qq in its semantic derivation (including the normal double quote form) assumes that all backslashes are to be considered meaningful. The meaning depends on whether the following character is alphanumeric; if it is, the non-interpolating sequence produces a compile-time error. If the character is non-alphanumeric, the backslash is silently removed, on the assumption that the string was backslashed using quotemeta() or some such.
must be backslashed the same, since they would otherwise be counted wrong in the bracket count.)
As a consequence, these all produce the same literal string:
" \{ this is not a closure } "
" \{ this is not a closure \} "
q:c / \{ this is not a closure } /
q:c / \{ this is not a closure \} /
q:c { \{ this is not a closure \} }
q { { this is not a closure } }
q { \{ this is not a closure \} }
(Of course, matching backslashes is likely to make your syntax highlighter a bit happier, along with any other naïve bracket counting algorithms...)
:: type sigil when you're declaring a new one.) A consequence of this is that there's no longer any "use strict 'subs'". Since the syntax for method calls is distinguished from sub calls, it is only unrecognized sub calls that must be treated specially.
You still must declare your subroutines, but a bareword with an unrecognized name is provisionally compiled as a subroutine call, on the assumption that such a declaration will occur by the end of the current compilation unit:
foo; # provisional call if neither &foo nor ::foo is defined so far
foo(); # provisional call if &foo is not defined so far
foo($x); # provisional call if &foo is not defined so far
foo($x, $y); # provisional call if &foo is not defined so far
$x.foo; # not a provisional call; it's a method call on $x
foo $x:; # not a provisional call; it's a method call on $x
foo $x: $y; # not a provisional call; it's a method call on $x
If a postdeclaration is not seen, the compile fails at CHECK time. (You are still free to predeclare subroutines explicitly, of course.) The postdeclaration may be in any lexical or package scope that could have made the declaration visible to the provisional call had the declaration occurred before rather than after the provisional call.
This fixup is done only for provisional calls. If there is any real predeclaration visible, it always takes precedence. In case of multiple ambiguous postdeclarations, either they must all be multis, or a compile-time error is declared and you must predeclare, even if one postdeclaration is obviously "closer". A single proto predeclaration may make all postdeclared multis work fine, since that's a run-time dispatch, and all multis are effectively visible at the point of the controlling proto declaration.
Parsing of a bareword function as a provisional call is always done the same way list operators are treated. If a postdeclaration bends the syntax to be inconsistent with that, it is an error of the inconsistent signature variety.
If the unrecognized subroutine name is followed by postcircumfix:<( )>, it is compiled as a provisional function call of the parenthesized form. If it is not, it is compiled as a provisional function call of the list operator form, which may or may not have an argument list. When in doubt, the attempt is made to parse an argument list. As with any list operator, an immediate postfix operator is illegal unless it is a form of parentheses, whereas anything following whitespace will be interpreted as an argument list if possible.
Based on the signature of the subroutine declaration, there are only four ways that an argument list can be parsed:
Signature           # of expected args
()                  0
($x)                1
($x?)               0..1
(anything else)     0..Inf
That is, a standard subroutine call may be parsed only as a 0-arg term (or function call), a 1-mandatory-arg prefix operator (or function call), a 1-optional-arg term or prefix operator (or function call), or an "infinite-arg" list operator (or function call). A given signature might only accept 2 arguments, but the only number distinctions the parser is allowed to make are between void, singular and plural; checking that the number of arguments supplied matches some number larger than one must be done as a separate semantic constraint, not as a syntactic constraint. Perl functions never take N arguments off of a list and leave the rest for someone else, except for small values of N, where small is defined as not more than 1. You can get fancier using macros, but macros always require predeclaration. Since the non-infinite-list forms are essentially behaving as macros, those forms also require predeclaration. Only the infinite-list form may be postdeclared (and hence used provisionally).
It is illegal for a provisional subroutine call to be followed by a colon postfix, since such a colon is allowed only on an indirect object, or a method call in dot form. (It is also allowed on a label when a statement is expected.) So for any undeclared identifier "foo":
foo.bar             # ILLEGAL           -- postfix must use foo().bar
foo .bar            # foo($_.bar)       -- no postfix starts with whitespace
foo\ .bar           # ILLEGAL           -- must use foo()\ .bar
foo++               # ILLEGAL           -- postfix must use foo()++
foo 1,2,3           # foo(1,2,3)        -- args always expected after listop
foo + 1             # foo(+1)           -- term always expected after listop
foo;                # foo();            -- no postfix, but no args either
foo:                # label             -- must be label at statement boundary.
                                        -- ILLEGAL otherwise
foo: bar:           # two labels in a row, okay
.foo: 1             # $_.foo: 1         -- must be "dot" method with : args
.foo(1)             # $_.foo(1)         -- must be "dot" method with () args
.foo                # $_.foo()          -- must be "dot" method with no args
.$foo: 1            # $_.$foo: 1        -- indirect "dot" method with : args
foo bar: 1          # bar.foo(1)        -- bar must be predecl as class
                                        -- sub bar allowed here only if 0-ary
                                        -- otherwise you must say (bar):
foo bar 1           # foo(bar(1))       -- both subject to postdeclaration
                                        -- never taken as indirect object
foo $bar: 1         # $bar.foo(1)       -- indirect object even if declared sub
                                        -- $bar considered one token
foo (bar()):        # bar().foo(1)      -- even if foo declared sub
foo bar():          # ILLEGAL           -- bar() is two tokens.
foo .bar:           # foo(.bar:)        -- colon chooses .bar to listopify
foo bar baz: 1      # foo(baz.bar(1))   -- colon controls "bar", not foo.
foo (bar baz): 1    # bar(baz()).foo(1) -- colon controls "foo"
$foo $bar           # ILLEGAL           -- two terms in a row
$foo $bar:          # ILLEGAL           -- use $bar.$foo for indirection
(foo bar) baz: 1    # ILLEGAL           -- use $baz.$(foo bar) for indirection
The indirect object colon only ever dominates a simple term, where "simple" includes classes and variables and parenthesized expressions, but explicitly not method calls, because the colon will bind to a trailing method call in preference. An indirect object that parses as more than one token must be placed in parentheses, followed by the colon.
In short, only an identifier followed by a simple term followed by a postfix colon is ever parsed as an indirect object, but that form will always be parsed as an indirect object regardless of whether the identifier is otherwise declared.
use strict 'refs'" because symbolic dereferences are now syntactically distinguished from hard dereferences. @($arrayref) must now provide an actual array object, while @::($string) is explicitly a symbolic reference. (Yes, this may give fits to the P5-to-P6 translator, but I think it's worth it to separate the concepts. Perhaps the symbolic ref form will admit real objects in a pinch.)
%x<foo> for constant hash subscripts, or the old standby %x{'foo'}. (It also works to say %x«foo» as long as you realize it's subject to interpolation.)
But => still autoquotes any bare identifier to its immediate left (horizontal whitespace allowed but not comments). The identifier is not subject to keyword or even macro interpretation. If you say
$x = do {
call_something();
if => 1;
}
then $x ends up containing the pair ("if" => 1). Always. (Unlike in Perl 5, where version numbers didn't autoquote.)
You can also use the :key($value) form to quote the keys of option pairs. To align values of option pairs, you may use the "unspace" postfix forms:
:longkey\ ($value)
:shortkey\ <string>
:fookey\ { $^a <=> $^b }
These will be interpreted as
:longkey($value)
:shortkey<string>
:fookey{ $^a <=> $^b }
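The equivalence can be modelled as a plain text rewrite. The Python sketch below is a simplification, not the Perl 6 parser: it assumes an unspace is just a backslash followed by one or more whitespace characters, and it ignores quoting context and anything else that might appear inside an unspace. The name remove_unspace is hypothetical.

```python
import re


def remove_unspace(text):
    """Drop each backslash together with the whitespace that follows it."""
    return re.sub(r"\\\s+", "", text)
```

Applied to the aligned forms above, this yields the unaligned forms, however many spaces were used for alignment.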
Old             New
---             ---
__LINE__        $?LINE
__FILE__        $?FILE
__PACKAGE__     $?PACKAGE
__END__         =begin END
__DATA__        =begin DATA
[Note: this paragraph is speculative and subject to drastic change as S26 evolves.] The =begin END Pod stream is special in that it assumes there's no corresponding =end END before end of file. The DATA stream is no longer special--any Pod stream in the current file can be accessed via a filehandle, named as %=POD{'DATA'} and such. Alternately, you can treat a Pod stream as a scalar via $=DATA or as an array via @=DATA. Presumably a module could read all its COMMENT blocks from @=COMMENT, for instance. Each chunk of Pod comes as a separate array element. You have to split it into lines yourself. Each chunk has a .range property that indicates its line number range within the source file.
The lexical routine itself is &?ROUTINE; you can get its name with &?ROUTINE.name. The current block is &?BLOCK. If the block has any labels, those show up in &?BLOCK.labels. Within the lexical scope of a statement with a label, the label is a pseudo-object representing the dynamically visible instance of that statement. (If inside multiple dynamic instances of that statement, the label represents the innermost one.) This is known as lexotic semantics.
When you say:
next LINE;
it is really a method on this pseudo-object, and
LINE.next;
would work just as well. You can exit any labeled block early by saying
MyLabel.leave(@results);
<<, but with an adverb on any other quote construct:
print qq:to/END/;
Give $amount to the man behind curtain number $curtain.
END
Other adverbs are also allowed, as are multiple heredocs within the same expression:
print q:c:to/END/, q:to/END/;
Give $100 to the man behind curtain number {$curtain}.
END
Here is a $non-interpolated string
END
($?TABSTOP // 8) spaces, but as long as tabs and spaces are used consistently that doesn't matter.) A null terminating delimiter terminates on the next line consisting only of whitespace, but such a terminator will be assumed to have no indentation. (That is, it's assumed to match at the beginning of any whitespace.)
Instead, Perl 6 takes the one-pass approach, and just lazily queues up the heredocs it finds in a line, and waits until it sees a "real" newline to look for the text and attach it to the appropriate heredoc. The downside of this approach is a slight restriction--you may not use the actual text of the heredoc in code that must run before the line finishes parsing. Mostly that just means you can't write:
BEGIN { say q:to/END/ }
Say me!
END
You must instead put the entire heredoc into the BEGIN:
BEGIN {
say q:to/END/;
Say me!
END
}
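The queueing behavior can be sketched with a toy line scanner. This is a Python model, not the Perl 6 parser. It assumes heredocs are always spelled :to/TERMINATOR/ and that a terminator line matches once surrounding whitespace is stripped. The name collect_heredocs is hypothetical, and indentation handling and interpolation are left out.

```python
import re

# Hypothetical marker syntax: only the :to/TERMINATOR/ spelling is recognized.
_HEREDOC = re.compile(r":to/([^/]*)/")


def collect_heredocs(lines):
    """Return (terminator, body_lines) for each heredoc, in source order."""
    found = []
    source = iter(lines)
    for line in source:
        # Queue every heredoc introduced on this line; the bodies are read
        # only after the line itself has ended.
        for terminator in _HEREDOC.findall(line):
            body = []
            for text in source:          # keeps consuming the same iterator
                if text.strip() == terminator:
                    break
                body.append(text)
            found.append((terminator, body))
    return found
```

Because a body is attached only after its introducing line has been read in full, nothing on that line, such as a BEGIN block, can see the text. That is the restriction described above.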
Version object, not a string. Only integers and certain wildcards are allowed; for anything fancier you must coerce a string to a Version:
v1.2.3 # okay
v1.2.* # okay, wildcard version
v1.2.3+ # okay, wildcard version
v1.2.3beta # illegal
Version('1.2.3beta') # okay
Note though that most places that take a version number in Perl accept it as a named argument, in which case saying :ver<1.2.3beta> is fine. See S11 for more on using versioned modules.
Version objects have a predefined sort order that follows most people's intuition about versioning: each sorting position sorts numerically between numbers, alphabetically between alphas, and alphabetics in a position before numerics. Missing final positions are assumed to be '.0'. Except for '0' itself, numbers ignore leading zeros. For splitting into sort positions, if any alphabetics (including underscore) are immediately adjacent to a number, a dot is assumed between them. Likewise any non-alphanumeric character is assumed to be equivalent to a dot. So these are all equivalent:
1.2.1alpha1.0
1.2.1alpha1
1.2.1.alpha1
1.2.1alpha.1
1.2.1.alpha.1
1.2-1+alpha/1
And these are also equivalent:
1.2.1_01
1.2.1_1
1.2.1._1
1.2.1_1
1.2.1._.1
001.0002.0000000001._.00000000001
1.2.1._.1.0.0.0.0.0
So these are in sorted version order:
1.2.0.999
1.2.1_01
1.2.1_2
1.2.1_003
1.2.1a1
1.2.1.alpha1
1.2.1b1
1.2.1.beta1
1.2.1.gamma
1.2.1α1
1.2.1β1
1.2.1γ
1.2.1
Note how the last pair assumes that an implicit .0 sorts after anything alphabetic, and that alphabetic is defined according to Unicode, not just according to ASCII. The intent of all this is to make sure that prereleases sort before releases. Note also that this is still a subset of the versioning schemes seen in the real world. Modules with such strange versions can still be used by Perl since by default Perl imports external modules by exact version number. (See S11.) Only range operations will be compromised by an unknown foreign collation order, such as a system that sorts "delta" after "gamma".
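The splitting and ordering rules can be checked against the lists above with a small model. This is a Python sketch, not the Version class itself. The names positions, compare_versions and sort_versions are hypothetical, and Python's codepoint-based string comparison is assumed as a stand-in for "alphabetically".

```python
import re
from functools import cmp_to_key
from itertools import zip_longest

# A run of digits is a numeric position; a run of letters or underscores is
# an alphabetic one. Any other character only separates positions, like a dot.
_POSITION = re.compile(r"(\d+)|([^\W\d]+)")


def positions(version):
    """Split a version string into (kind, value) sort positions."""
    parts = [
        (1, int(number)) if number else (0, alpha)   # kind 0 sorts before kind 1
        for number, alpha in _POSITION.findall(version)
    ]
    while parts and parts[-1] == (1, 0):             # a trailing .0 is implicit
        parts.pop()
    return parts


def compare_versions(a, b):
    """Return -1, 0 or 1, padding the shorter version with numeric zeros."""
    for x, y in zip_longest(positions(a), positions(b), fillvalue=(1, 0)):
        if x != y:
            return -1 if x < y else 1
    return 0


def sort_versions(versions):
    """Sort version strings using compare_versions."""
    return sorted(versions, key=cmp_to_key(compare_versions))
```

Tagging alphabetic positions with 0 and numeric ones with 1 is what makes "1.2.1_01" sort before plain "1.2.1": the underscore position is compared against an implicit numeric zero and loses.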
Context     Type    OOtype      Operator
-------     ----    ------      --------
boolean     bit     Bit         ?
integer     int     Integral    int
numeric     num     Num         +
string      buf     Str         ~
There are also various container contexts that require particular kinds of containers (such as slice and hash context; see S03 for details).
.Bool property. Classes get to decide which of their values are true and which are false. Individual objects can override the class definition:
return 0 but True;
This overrides the .Bool method of the 0 without changing its official type (by mixing the method into an anonymous derived type).
.Bool for the most ancestral type (that is, the Mu type) is equivalent to .defined. Since type objects are considered undefined, all type objects (including Mu itself) are false unless the type overrides the definition of .Bool to include undefined values. Instantiated objects default to true unless the class overrides the definition. Note that if you could instantiate a Mu it would be considered defined, and thus true. (It is not clear that this is allowed, however.)
Just as with the standard types, user-defined types should feel free to partition their defined values into true and false values if such a partition makes sense in control flow using boolean contexts, since the separate .defined method is always there if you need it.