Synopsis 2: Bits and Pieces
Larry Wall <larry@wall.org>
Maintainer: Larry Wall <larry@wall.org>
Date: 10 Aug 2004
Last Modified: 16 Jun 2006
Number: 2
Version: 45
This document summarizes Apocalypse 2, which covers small-scale lexical items and typological issues. (These Synopses also contain updates to reflect the evolving design of Perl 6 over time, unlike the Apocalypses, which are frozen in time as "historical documents". These updates are not marked--if a Synopsis disagrees with its Apocalypse, assume the Synopsis is correct.)
In practice, though, you're safest using matching characters with Ps/Pe properties. ASCII angle brackets are a notable exception, since they're bidirectional but not in the Ps/Pe set.
Characters with no corresponding closing character do not qualify as opening brackets. This includes the second section of the BidiMirroring data table, as well as U+201A and U+201E.
If a character is already used in Ps/Pe mappings, then its entry in BidiMirroring is ignored. Therefore U+298D maps to U+298E, not U+2990, and U+298E itself is not a valid bracket opener.
U+301D has two closing alternatives, U+301E and U+301F; Perl 6 recognizes only the one with the lower code point, U+301E, as the closing bracket. This policy also applies to any new one-to-many mappings introduced in the future.
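The category check and the lowest-code-point rule can be illustrated outside Perl 6. The following is a rough Python sketch, not part of the design: it uses Python's standard unicodedata module, the helper names is_opener_candidate and pick_closer are invented for illustration, and it does not consult the BidiMirroring table itself.

```python
import unicodedata

def is_opener_candidate(ch):
    # Ps is the Unicode general category for opening punctuation.
    return unicodedata.category(ch) == "Ps"

def pick_closer(candidates):
    # Rule from the text: when several closers are possible,
    # keep the one with the lowest code point.
    return min(candidates, key=ord)

# U+301D is an opener with two candidate closers, U+301E and U+301F.
closer = pick_closer(["\u301F", "\u301E"])
print(f"U+{ord(closer):04X}")

# ASCII '<' is mirrored in bidirectional text but is not in category Ps.
print(is_opener_candidate("<"), bool(unicodedata.mirrored("<")))
```

The angle-bracket check shows why ASCII < and > need special treatment: a test based only on Ps/Pe would not find them.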
# character and ending at the subsequent newline. They count as whitespace equivalent to newline for purposes of separation. Unlike in Perl 5, # may not be used as the delimiter in quoting constructs.
=begin comment/=end comment correctly without the need for =cut. (Doesn't have to be "comment"--any unrecognized POD stream will do to make it a comment. Bare =begin and =end probably aren't good enough though, unless you want all your comments to end up in the manpage...) We have single paragraph comments with =for comment as well. That lets =for keep its meaning as the equivalent of a =begin and =end combined. As with =begin and =end, a comment started in code reverts to code afterwards.
Since there is a newline before the first =, the POD form of comment counts as whitespace equivalent to a newline.
# plus any user-selected bracket characters (see definition above):
say #( embedded comment ) "hello, world!";
$object\#{ embedded comments }.say;
$object\ #「
embedded comments
」.say;
There must be no space between the # and the opening bracket character. (There may be the appearance of space for some double-wide characters, however, such as the corner quotes above.) Brackets may be nested following the same policy as ordinary quote brackets.
As a special case to facilitate commenting out sections of code with s/^/#/, # on the left margin is always considered a line-end comment rather than an embedded comment, even if followed by a bracketing character.
say #{{
Comment contains unmatched } and { { { { plus a counted {{ ... }} pair.
}} q<< <<woot>> >> # says "<<woot>>"
Note however that bare circumfix or postcircumfix <<...>> is not a user-selected bracket, but the ASCII variant of the «...» interpolating word list. Only # and the q-style quoters (including m, s, tr, and rx) enable subsequent user-selected brackets.
%hash\ .{$key}
@array\ .[$ix]
$subref\.($arg)
This is useful for lining up postfixes. This is known as the "long dot", partly because it substitutes for a dot without the need for an extra dot:
$object\ .say();
The whitespace in the middle may include any of the comment forms above. Because comments always count as whitespace, the \. in
$object\#{ foo }.say
reduces to a "long dot". Valid ways to insert a line break into a sequence of method calls include:
$object\ # comment
.say
$object\#[ comment
].say
$object\
.say
This is an unchanging deep rule, but the surface ramifications of it change as various operators and macros are added to or removed from the language, which we expect to happen because Perl 6 is designed to be a mutable language. In particular, there is a natural conflict between postfix operators and infix operators, either of which may occur after a term. If a given token may be interpreted as either a postfix operator or an infix operator, the infix operator requires space before it. Postfix operators may never have intervening space, though they may have an intervening dot. If further separation is desired, an embedded comment may be used as described above, as long as no whitespace occurs outside the embedded comment.
For instance, if you were to add your own infix:<++> operator, then it must have space before it. The normal autoincrementing postfix:<++> operator may never have space before it, but may be written in any of these forms:
$x++
$x.++
$x\ .++
$x\#( comment ).++
$x\#((( comment ))).++
$x\
.++
$x\ # comment
# more comment
.++
$x\#『 comment
more comment
』.++
$x\#[ comment 1
comment 2
=begin podstuff
whatever (pod comments ignore current parser state)
=end podstuff
comment 3
].++
A consequence of the postfix rule is that (except when delimiting a quote or terminating a "long dot") a dot with whitespace in front of it is always considered a method call on $_ where a term is expected. If a term is not expected at this point, it is a syntax error. (Unless, of course, there is an infix operator of that name beginning with dot. You could, for instance, define a Fortranly infix:<.EQ.> if the fit took you. But you'll have to be sure to always put whitespace in front of it, or it would be interpreted as a postfix method call instead.)
For example,
foo .method
and
foo
.method
will always be interpreted as
foo $_.method
but never as
foo.method
Use some variant of
foo\
.method
if you mean the postfix method call.
One consequence of all this is that you may no longer write a Num as 42. with just a trailing dot. You must instead say either 42 or 42.0. In other words, a dot following a number can only be a decimal point if the following character is a digit. Otherwise the postfix dot will be taken to be the start of some kind of method call syntax, whether long-dotty or not. (The .123 form with a leading dot is still allowed however when a term is expected, and is equivalent to 0.123.)
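The decimal-point rule can be pictured as a small tokenizer pattern. The sketch below is a Python model under stated assumptions, not the Perl 6 grammar: the pattern NUMBER and the helper split_number are hypothetical, and the sketch ignores radix prefixes, underscores, and exponents.

```python
import re

# Hypothetical literal pattern: digits with an optional ".digits" fraction,
# or a leading-dot form such as ".123". A dot not followed by a digit is
# never consumed as part of the number.
NUMBER = re.compile(r"\d+(?:\.\d+)?|\.\d+")

def split_number(src):
    """Return (number_text, rest) for a source string starting with a number."""
    m = NUMBER.match(src)
    if m is None:
        return None, src
    return m.group(0), src[m.end():]

for src in ["42.0", "42.say", "42.", ".123"]:
    print(src, "->", split_number(src))
```

In this model the trailing dot of "42." is left in the remainder, where a parser would treat it as the start of a postfix method call.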
According to S12, properties are actually implemented by a kind of mixin mechanism, and such mixins are accomplished by the generation of an individual anonymous class for the object (unless an identical anonymous class already exists and can safely be shared).
length does not specify units. So .elems is the number of array elements. You can also ask for the length of an array in bytes or codepoints or graphemes. The same methods apply to strings as well: there is no .length on strings either.
my Dog $spot by itself does not automatically call a Dog constructor. It merely installs an undefined Dog prototype as the object. The actual constructor syntax turns out to be my Dog $spot .= new;, making use of the .= mutator method-call syntax.
my int @array is MyArray;
you are declaring that the elements of @array are native integers, but that the array itself is implemented by the MyArray class. Untyped arrays and hashes are still perfectly acceptable, but have the same performance issues they have in Perl 5.
Int, Num, Complex, Rational, Str, Bit, Regex, Set, Junction, Code, Block, List, Seq), as well as mutable (container) types, such as Scalar, Array, Hash, Buf, Routine, Module, etc. Non-object (native) types are lowercase: int, num, complex, rational, buf, bit. Native types are primarily intended for declaring compact array storage. However, Perl will try to make those look like their corresponding uppercase types if you treat them that way. (In other words, it does autoboxing. Note, however, that sometimes repeated autoboxing can slow your program more than the native type can speed it up.)
undefined role, and may contain an alternate set of attributes when undefined, such as the unthrown exception explaining why the value is undefined. Non-object types are not required to support undefinedness, but it is an error to assign an undefined value to such a location.
.meta method that returns the class instance managing the current kind of object. Any object (whether defined, undefined, or somewhere between) can be used as a "kind" when the context requires it.
Int automatically supports promotion to arbitrary precision, as well as holding Inf and NaN values. (Num may support arbitrary-precision floating-point arithmetic, but is not required to unless we can do so portably and efficiently. Num must support the largest native floating point format that runs at full speed.)
Rational supports arbitrary precision rational arithmetic. However, dividing two Int objects produces fractionals as Num objects by default, not Rational objects. You can override this behavior with a pragma.
Lower-case types like int and num imply the native machine representation for integers and floating-point numbers, respectively, and do not promote to arbitrary precision, though larger representations are always allowed for temporary values. Unless qualified with a number of bits, int and num types default to the largest native types that run at full speed. Untyped numeric scalars use Int and Num semantics rather than int and num.
Inf (infinity) and NaN (not a number). Within a lexical scope, pragmas may specify the nature of temporary values, and how floating point is to behave under various circumstances. All IEEE modes must be lexically available via pragma except in cases where that would entail heroic efforts to bypass a braindead platform. The default floating-point modes do not throw exceptions but rather propagate Inf and NaN. The boxed object types may carry more detailed information on where overflow or underflow occurred. Numerics in Perl are not designed to give the identical answer everywhere. They are designed to give the typical programmer the tools to achieve a good enough answer most of the time. (Really good programmers may occasionally do even better.) Mostly this just involves using enough bits that the stupidities of the algorithm don't matter much.
Str is a Unicode string object. There is no corresponding native str type. However, since a Str object may fill multiple roles, we say that a Str keeps track of its minimum and maximum Unicode abstraction levels, and plays along nicely with the current lexical scope's idea of the ideal character, whether that is bytes, codepoints, graphemes, or characters in some language. For all builtin operations, all Str positions are reported as position objects, not integers. These StrPos objects point into a particular string at a particular location independent of abstraction level. The subtraction of two StrPos objects gives a StrLen object, which is still not an integer, because the string between two positions also has multiple integer interpretations depending on the units. A given StrLen may know that it represents 18 bytes, 7 codepoints, and 3 graphemes, but it knows this lazily because it actually just hangs onto the two StrPos objects. (It's much like a Range object in that respect.) If you use integers as arguments where position objects are expected, it will be assumed that you mean the units of the current lexically scoped Unicode abstraction level. (Which defaults to graphemes.) Otherwise you'll need to coerce to the proper units:
substr($string, 42.as(Bytes), 1.as(ArabicChars))
Of course, such a dimensional number will fail if used on a string that doesn't provide the appropriate abstraction level.
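The idea that one length has several integer readings can be modelled in a few lines. This Python sketch is only an illustration: the class StrLenSketch and its methods are invented names, positions are stored as code point offsets (an assumption of the sketch, not of the design), and the grapheme count is a crude approximation that counts code points which are not combining marks rather than applying full Unicode segmentation rules.

```python
import unicodedata

class StrLenSketch:
    """Toy length object: keeps the span and computes each unit on demand."""

    def __init__(self, text, start, end):
        # start and end are code point offsets in this sketch
        self.text, self.start, self.end = text, start, end

    def _span(self):
        return self.text[self.start:self.end]

    def in_codepoints(self):
        return len(self._span())

    def in_bytes(self):
        return len(self._span().encode("utf-8"))

    def in_graphemes(self):
        # Approximation: a combining mark does not start a new grapheme.
        return sum(1 for ch in self._span() if unicodedata.combining(ch) == 0)

s = "cafe\u0301 au lait"   # "e" followed by COMBINING ACUTE ACCENT
n = StrLenSketch(s, 0, 5)
print(n.in_bytes(), n.in_codepoints(), n.in_graphemes())
```

Nothing is counted until a unit is requested, which mirrors the lazy behaviour described for StrLen.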
Buf is a stringish view of an array of integers, and has no Unicode or character properties without explicit conversion to some kind of Str. (A buf is the native counterpart.) Typically it's an array of bytes serving as a buffer. Bitwise operations on a Buf treat the entire buffer as a single large integer. Bitwise operations on a Str generally fail unless the Str in question can provide an abstract Buf interface somehow. Coercion to Buf should generally invalidate the Str interface. As a generic type Buf may be instantiated as (or bound to) any of buf8, buf16, or buf32 (or to any type that provides the appropriate Buf interface), but when used to create a buffer Buf defaults to buf8. Unlike Str types, Buf types prefer to deal with integer string positions, and map these directly to the underlying compact array as indices. That is, these are not necessarily byte positions--an integer position just counts over the number of underlying positions, where one position means one cell of the underlying integer type. Builtin string operations on Buf types return integers and expect integers when dealing with positions. As a limiting case, buf8 is just an old-school byte string, and the positions are byte positions. Note, though, that if you remap a section of buf32 memory to be buf8, you'll have to multiply all your positions by 4.
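The factor of 4 in the last sentence is plain cell arithmetic. The following Python sketch shows it with the standard struct module; the helper cell32_at is a hypothetical name, and little-endian layout is an assumption made only so the example is deterministic.

```python
import struct

cells32 = [10, 20, 30, 40]
raw = struct.pack("<4I", *cells32)   # four 32-bit cells, 16 bytes in total

def cell32_at(buf, pos32):
    # A 32-bit cell at position p starts at byte position p * 4.
    return struct.unpack_from("<I", buf, pos32 * 4)[0]

# Position 2 in the 32-bit view corresponds to byte position 8.
print(len(raw), cell32_at(raw, 2), raw[2 * 4])
```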
* indicates a global function or type name, but by itself, the * term captures the notion of "Whatever", which is applied lazily by whatever operator it is an argument to. Generally it can just be thought of as a "glob" that gives you everything it can in that argument position. For instance:
if $x ~~ 1..* {...} # if 1 <= $x <= +Inf
my ($a,$b,$c) = "foo" xx *; # an arbitrarily long list of "foo"
if /foo/ ff * {...} # a latching flipflop
@slice = @x[*;0;*]; # any Int
@slice = %x{*;'foo'}; # any keys in domain of 1st dimension
@array[*] # flattens, unlike @array[]
(*, *, $x) = (1, 2, 3); # skip first two elements
# (same as lvalue "undef" in Perl 5)
Whatever is an undefined prototype object derived from Any. As a type it is abstract, and may not be instantiated as a defined object. If for a particular MMD dispatch, nothing in the MMD system claims it, it dispatches as an Any with an undefined value, and usually blows up constructively. If you say
say 1 + *;
you should probably not expect it to yield a reasonable answer, unless you think an exception is reasonable. Since the Whatever object is effectively immutable, the optimizer is free to recognize * and optimize in the context of what operator it is being passed to.
A variant of * is the ** term. It is generally understood to be a multidimension form of * when that makes sense.
The *** variant serves as the insertion point of a list of pipes. That insertion point may be targeted by piping to *. See S06.
Other uses for * will doubtless suggest themselves over time. These can be given meaning via the MMD system, if not the compiler. In general a Whatever should be interpreted as maximizing the degrees of freedom in a dwimmey way, not as a nihilistic "don't care anymore--just shoot me".
$pkg'var syntax is dead. Use $pkg::var instead.
$   scalar
@   ordered array
%   unordered hash (associative array)
&   code/rule/token/regex
::  package/module/class/role/subset/enum/type/grammar
@@  multislice view of @
Within a declaration, the & sigil also declares the visibility of the subroutine name without the sigil within the scope of the declaration.
Within a signature or other declaration, the :: sigil followed by an identifier marks a parametric type that also declares the visibility of a package/type name without the sigil within the scope of the declaration. The first such declaration within a scope is assumed to be an unbound type, and takes the actual type of its associated argument. With subsequent declarations in the same scope the use of the sigil is optional, since the bare type name is also declared. A declaration nested within must not use the sigil if it wishes to refer to the same type, since the inner declaration would rebind the type. (Note that the signature of a pointy block counts as part of the inner block, not the outer block.)
$foo     ordinary scoping
$.foo    object attribute accessor
$^foo    self-declared formal parameter
$*foo    global variable
$+foo    environmental variable
$?foo    compiler hint variable
$=foo    pod variable
$<foo>   match variable, short for $/{'foo'}
$!foo    explicitly private attribute (mapped to $foo though)
Most variables with twigils are implicitly declared or assumed to be declared in some other scope, and don't need a "my" or "our". Attribute variables are declared with has, though, and environment variables are declared somewhere in the dynamic scope with the env declarator.
$ always means a scalar variable, @ an array variable, and % a hash variable, even when subscripting. Variables such as @array and %hash in scalar context simply return themselves as Array and Hash objects.
.perl method. This will put quotes around strings, square brackets around list values, curlies around hash values, constructors around objects, etc., such that standard Perl could reparse the result.
.as('%03d') method to do an implicit sprintf on the value. To format an array value separated by commas, supply a second argument: .as('%03d', ', '). To format a hash value or list of pairs, include formats for both key and value in the first string: .as('%s: %s', "\n").
@foo.[1] and %bar.{'a'}). Constant string subscripts may be placed in angles, so %bar.{'a'} may also be written as %bar<a> or %bar.<a>.
If you need to force inner context to scalar, we now have convenient single-character context specifiers such as + for numbers and ~ for strings:
@x[f()] = g(); # list context for f() and g()
@x[f()] = +g(); # list context for f(), scalar context for g()
@x[+f()] = g(); # scalar context for f() and g()
# -- see S03 for "SIMPLE" lvalues
@x[f()] = @y[g()]; # list context for f() and g()
@x[f()] = +@y[g()]; # list context for f() and g()
@x[+f()] = @y[g()]; # scalar context for f(), list context for g()
@x[f()] = @y[+g()]; # list context for f(), scalar context for g()
:= binding operator that lets you bind names to Array and Hash objects without copying, in the same way as subroutine arguments are bound to formal parameters. See S06 for more about parameter binding.
Capture) may be created with backslashed parens:
$args = \(1,2,3,:mice<blind>)
Values in Capture are parsed as ordinary expressions, marked as invocant, positional, named, and so on.
Like List objects, Capture objects are immutable in the abstract, but evaluate their arguments lazily. Before everything inside a Capture is fully evaluated (which happens at compile time when all the arguments are constants), the eventual value may well be unknown. All we know is that we have the promise to make the bits of it immutable as they become known.
Capture objects may contain multiple unresolved iterators such as pipes or slices. How these are resolved depends on what they are eventually bound to. Some bindings are sensitive to multiple dimensions while others are not.
You may retrieve parts from a Capture object with a prefix sigil operator:
$args = \3; # same as "$args = \(3)"
$$args; # same as "$args as Scalar" or "Scalar($args)"
@$args; # same as "$args as Array" or "Array($args)"
%$args; # same as "$args as Hash" or "Hash($args)"
When cast into an array, you can access all the positional arguments; into a hash, all named arguments; into a scalar, its invocant.
All prefix sigil operators accept one positional argument, evaluated in scalar context as an rvalue. They can interpolate in strings if called with parentheses. The special syntax form $() translates into $( $/ ) to operate on the current match object; the same applies to @(), %() and *() forms.
Capture objects fill the ecological niche of references in Perl 6. You can think of them as "fat" references, that is, references that can capture not only the current identity of a single object, but also the relative identities of several related objects. Conversely, you can think of Perl 5 references as a degenerate form of Capture when you want to refer only to a single item.
Signature) may be created with coloned parens:
my ::MySig = :(Int,Num,Complex, Status :mice)
A signature's values are parsed as declarations rather than ordinary expressions. You may not put arbitrary expressions in a signature, but you may, for instance, stack multiple types that all must match:
:(Any Num Dog|Cat $numdog)
Such a signature may be used within another signature to apply additional type constraints. When applied to a Capture argument, the signature allows you to specify the types of parameters that would otherwise be untyped:
:(Any Num Dog|Cat $numdog, MySig \$a ($i,$j,$k,$mousestatus))
&foo merely stands for the foo function as a Code object without calling it. You may call any Code object with parens after it (which may, of course, contain arguments):
&foo($arg1, $arg2);
Whitespace is not allowed before the parens, but there is a corresponding .() operator, plus the "long dot" forms that allow you to insert optional whitespace and comments between dots:
&foo\ .($arg1, $arg2);
&foo\#[
embedded comment
].($arg1, $arg2);
&foo may not be sufficient to uniquely name a specific function. In that case, the type may be refined by using a signature literal as a postfix operator:
&foo:(Int,Num)
It still just returns a Code object. A call may also be partially applied by using an argument list literal as a postfix operator:
&foo\(1,2,3,:mice<blind>)
This is really just a shorthand for
&foo.assuming(1,2,3,:mice<blind>)
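Partial application of this kind exists in many languages. As a loose analogy only, the Python sketch below uses functools.partial to fix some arguments in advance; the function foo and its parameters are hypothetical stand-ins, and Python's partial is not claimed to match every detail of .assuming.

```python
from functools import partial

def foo(a, b, c, d=None, mice=None):
    # Hypothetical stand-in for the &foo routine in the text.
    return (a, b, c, d, mice)

# Fix three positional arguments and one named argument ahead of time.
foo_assumed = partial(foo, 1, 2, 3, mice="blind")

# The remaining argument is supplied at call time.
print(foo_assumed(4))
```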
@array = <A B>;
@array[0,1,2]; # returns 'A', 'B', undef
@array[0,1,2]:p; # returns 0 => 'A', 1 => 'B'
@array[0,1,2]:kv; # returns 0, 'A', 1, 'B'
@array[0,1,2]:k; # returns 0, 1
@array[0,1,2]:v; # returns 'A', 'B'
%hash = (:a<A>, :b<B>);
%hash<a b c>; # returns 'A', 'B', undef
%hash<a b c>:p; # returns a => 'A', b => 'B'
%hash<a b c>:kv; # returns 'a', 'A', 'b', 'B'
%hash<a b c>:k; # returns 'a', 'b'
%hash<a b c>:v; # returns 'A', 'B'
The adverbial forms all weed out non-existing entries.
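The behaviour in the table above can be modelled compactly. The Python sketch below is an illustration under assumptions: the function subscript and its adverb parameter are invented names, pairs are represented as tuples, and undef is represented as None.

```python
def subscript(container, keys, adverb=None):
    """Model of a slice with an optional :p, :kv, :k or :v adverb."""
    if isinstance(container, dict):
        exists = lambda k: k in container
    else:
        exists = lambda k: 0 <= k < len(container)

    if adverb is None:
        # Plain slice: missing entries come back as None (undef in the text).
        return [container[k] if exists(k) else None for k in keys]

    # Every adverbial form first weeds out non-existing entries.
    present = [k for k in keys if exists(k)]
    if adverb == "p":
        return [(k, container[k]) for k in present]
    if adverb == "kv":
        return [x for k in present for x in (k, container[k])]
    if adverb == "k":
        return present
    if adverb == "v":
        return [container[k] for k in present]
    raise ValueError(f"unknown adverb: {adverb}")

array = ["A", "B"]
print(subscript(array, [0, 1, 2]))
print(subscript(array, [0, 1, 2], "kv"))
```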
Int or Num), a Hash object becomes the number of pairs contained in the hash. In a boolean context, a Hash object is true if there are any pairs in the hash. In either case, any intrinsic iterator would be reset. (If hashes do carry an intrinsic iterator (as they do in Perl 5), there will be a .reset method on the hash object to reset the iterator explicitly.)
sort see S29.
$*PID or @*ARGS.
$_ and @_, as well as the new $/, which is the return value of the last regex match. $0, $1, $2, etc., are aliases into the $/ object.
$#foo notation is dead. Use @foo.end or [-1] instead. (Or @foo.shape[$dimension] for multidimensional arrays.)
$Foo::Bar::baz # the $baz variable in package Foo::Bar
Sometimes it's clearer to keep the sigil with the variable name, so an alternate way to write this is:
Foo::Bar::<$baz>
This is resolved at compile time because the variable name is a constant.
MY
OUR
GLOBAL
OUTER
CALLER
ENV
SUPER
COMPILING
Other all-caps names are semi-reserved. We may add more of them in the future, so you can protect yourself from future collisions by using mixed case on your top-level packages. (We promise not to break any existing top-level CPAN package, of course. Except maybe ACME, and then only for coyotes.)
::($expr) where you'd ordinarily put a package or variable name. The string is allowed to contain additional instances of ::, which will be interpreted as package nesting. You may only interpolate entire names, since the construct starts with ::, and either ends immediately or is continued with another :: outside the curlies. Most symbolic references are done with this notation:
$foo = "Foo";
$foobar = "Foo::Bar";
$::($foo) # package-scoped $Foo
$::("MY::$foo") # lexically-scoped $Foo
$::("*::$foo") # global $Foo
$::($foobar) # $Foo::Bar
$::($foobar)::baz # $Foo::Bar::baz
$::($foo)::Bar::baz # $Foo::Bar::baz
$::($foobar)baz # ILLEGAL at compile time (no operator baz)
Note that unlike in Perl 5, initial :: doesn't imply global. Package names are searched for from inner lexical scopes to outer, then from inner packages to outer. Variable names are searched for from inner lexical scopes to outer, but unlike package names are looked for in only the current package and the global package.
The global namespace is the last place it looks in either case. You must use the * (or GLOBAL) package on the front of the string argument to force the search to start in the global namespace.
Use the MY pseudopackage to limit the lookup to the current lexical scope, and OUR to limit the scopes to the current package scope.
$x and @y) are only looked up from lexical scopes, but never from package scopes. To bind package variables into a lexical scope, simply say our ($x, @y). To bind global variables into a lexical scope, predeclare them with use:
use GLOBAL <$IN $OUT>;
Or just refer to them as $*IN and $*OUT.
Foo::Bar::{'&baz'} # same as &Foo::Bar::baz
GLOBAL::<$IN> # Same as $*IN
Foo::<::Bar><::Baz> # same as Foo::Bar::Baz
Unlike ::() symbolic references, this does not parse the argument for ::, nor does it initiate a namespace scan from that initial point. In addition, for constant subscripts, it is guaranteed to resolve the symbol at compile time.
The null pseudo-package is reserved to mean the same search list as an ordinary name search. That is, the following are all identical in meaning:
$foo
$::{'foo'}
::{'$foo'}
$::<foo>
::<$foo>
That is, each of them scans lexical scopes outward, and then the current package scope (though the package scope is then disallowed when "strict" is in effect).
As a result of these rules, you can write any arbitrary variable name as either of:
$::{'!@#$#@'}
::{'$!@#$#@'}
You can also use the ::<> form as long as there are no spaces in the name.
MY. The current package symbol table is visible as pseudo-package OUR. The OUTER name refers to the MY symbol table immediately surrounding the current MY, and OUTER::OUTER is the one surrounding that one.
our $foo = 41;
say $::foo; # prints 41, :: is no-op
{
    my $foo = 42;
    say MY::<$foo>; # prints "42"
    say $MY::foo; # same thing
    say $::foo; # same thing, :: is no-op here
    say OUR::<$foo>; # prints "41"
    say $OUR::foo; # same thing
    say OUTER::<$foo>; # prints "41" (our $foo is also lexical)
    say $OUTER::foo; # same thing
}
You may not use any lexically scoped symbol table, either by name or by reference, to add symbols to a lexical scope that is done compiling. (We reserve the right to relax this if it turns out to be useful though.)
CALLER package refers to the lexical scope of the (dynamically scoped) caller. The caller's lexical scope is allowed to hide any variable except $_ from you. In fact, that's the default, and a lexical variable must be declared using "env" rather than my to be visible via CALLER. ($_, $! and $/ are always environmental.) If the variable is not visible in the caller, it returns failure. An explicit env declaration is implicitly readonly. You may add is rw to allow subroutines to modify your value. $_ is rw by default. In any event, your lexical scope can access the variable as if it were an ordinary my; the restriction on writing applies only to subroutines.
ENV pseudo-package is just like CALLER except that it scans outward through all dynamic scopes until it finds an environmental variable of that name in that caller's lexical scope. (Use of $+FOO is equivalent to ENV::<$FOO> or $ENV::FOO.) If after scanning all the lexical scopes of each dynamic scope, there is no variable of that name, it looks in the * package. If there is no variable in the * package, it looks in %*ENV for the name, that is, in the environment variables passed to the program. If the value is not found there, it returns failure. Note that $+_ is always the same as CALLER::<$_> since all callers have a $_ that is automatically considered environmental. Note also that ENV and $+ always skip the current scope, since you can always name the variable directly without the ENV or + if it's been declared env in the current lexical scope. Subprocesses are passed only the global %*ENV values. They do not see any lexical variables or their values. The ENV package is only for internal overriding of environmental parameters. Change %*ENV to change what subprocesses see. [Conjecture: This might be suboptimal in the abstract, but it would be difficult to track the current set of environment variable names unless we actually passed around a list. The alternative seems to be to walk the entire dynamic scope and reconstruct %*ENV for each subprogram call, and then we only slow down subprogram calls.]
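The search order for an environmental variable can be written out as a short fallback chain. This Python sketch is a model only: the function env_lookup and its parameters are hypothetical, each caller scope is a plain dictionary holding just the variables that caller declared with env, the current scope is excluded by the caller of the function, and failure is represented as None.

```python
def env_lookup(name, caller_scopes, global_package, process_env):
    """Resolve a name such as '$FOO' in the order described above."""
    # 1. Dynamic callers, nearest first (the current scope is not included).
    for scope in caller_scopes:
        if name in scope:
            return scope[name]
    # 2. The global (*) package.
    if name in global_package:
        return global_package[name]
    # 3. The process environment, keyed by the name without its sigil.
    bare = name[1:]
    if bare in process_env:
        return process_env[bare]
    # 4. Nothing found: report failure.
    return None

callers = [{"$DEPTH": 2}, {"$DEPTH": 1, "$USER": "inner"}]
global_pkg = {"$USER": "global", "$HOME": "/global"}
os_env = {"HOME": "/env", "SHELL": "/bin/sh"}

for name in ["$DEPTH", "$USER", "$HOME", "$SHELL", "$NOPE"]:
    print(name, "->", env_lookup(name, callers, global_pkg, os_env))
```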
%Foo::. Just subscript the package object itself as a hash object, the key of which is the variable name, including any sigil. The package object can be derived from a type name by use of the :: postfix operator:
MyType::<$foo>
MyType.::.{'$foo'} # same thing with dots
MyType\ .::\ .{'$foo'} # same thing with long dots
(Directly subscripting the type with either square brackets or curlies is reserved for various generic type-theoretic operations. In most other matters type names and package names are interchangeable.)
Typeglobs are gone. Use binding (:= or ::=) to do aliasing. Individual variable objects are still accessible through the hash representing each symbol table, but you have to include the sigil in the variable name now: MyPackage::{'$foo'} or the equivalent MyPackage::<$foo>.
* package: $*UID, %*ENV. (The * may be omitted if you import the name from the GLOBAL package.) $*foo is short for $*::foo, suggesting that the variable is "wild carded" into every package.
$*IN, standard output is $*OUT, and standard error is $*ERR. The magic command-line input handle is $*ARGS.
= secondary sigil. $=DATA is the name of your DATA filehandle, for instance. All pod structures are available through %=POD (or some such). As with *, the = may also be used as a package name: $=::DATA.
? secondary sigil. These are all values that are known to the compiler, and may in fact be dynamically scoped within the compiler itself, and only appear to be lexically scoped because dynamic scopes of the compiler resolve to lexical scopes of the program. All $? variables are considered constants, and may not be modified after being compiled in, except insofar as the compiler arranges in advance for such variables to be rebound (as is the case with $?SELF). $?FILE and $?LINE are your current file and line number, for instance. ? is not a shortcut for a package name like * is. Instead of $?OUTER::SUB you probably want to write OUTER::<$?SUB>.
Here are some possibilities:
&?ROUTINE Which routine am I in?
Note that some of these things have parallels in the * space at run time:
$*OS Which OS I'm running under
$*OSVER Which OS version I'm running under
$*PERLVER Which Perl version I'm running under
You should not assume that these will have the same value as their compile-time cousins.
$? variables are constant to the run time, the compiler has to have a way of changing these values at compile time without getting confused about its own $? variables (which were frozen in when the compile-time code was itself compiled). The compiler can talk about these compiler-dynamic values using the COMPILING pseudopackage. References to COMPILING variables are automatically hoisted into the context currently being compiled. Setting or temporizing a COMPILING variable sets or temporizes the incipient $? variable in the surrounding lexical context that is being compiled. If nothing in the context is being compiled, an exception is thrown.
$?FOO // say "undefined"; # probably says undefined
BEGIN { COMPILING::<$?FOO> = 42 }
say $?FOO; # prints 42
{
    say $?FOO; # prints 42
    BEGIN { temp COMPILING::<$?FOO> = 43 } # temporizes to *compiling* block
    say $?FOO; # prints 43
    BEGIN { COMPILING::<$?FOO> = 44 }
    say $?FOO; # prints 44
    BEGIN { say COMPILING::<$?FOO> } # prints 44, but $?FOO probably undefined
}
say $?FOO; # prints 42 (left scope of temp above)
$?FOO = 45; # always an error
COMPILING::<$?FOO> = 45; # an error unless we are compiling something
Note that CALLER::<$?FOO> might discover the same variable as COMPILING::<$?FOO>, but only if the compiling context is the immediate caller. Likewise OUTER::<$?FOO> might or might not get you to the right place. In the abstract, COMPILING::<$?FOO> goes outwards dynamically until it finds a compiling scope, and so is guaranteed to find the "right" $?FOO. (In practice, the compiler hopefully keeps track of its current compiling scope anyway, so no scan is needed.)
Perceptive readers will note that this subsumes various "compiler hints" proposals. Crazy readers will wonder whether this means you could set an initial value for other lexicals in the compiling scope. The answer is yes. In fact, this mechanism is probably used by the exporter to bind names into the importer's namespace.
COMPILING::<$?PARSER>. Lexically scoped parser changes should temporize the modification. Changes from here to end-of-compilation unit can just assign or bind it. In general, most parser changes involve deriving a new grammar and then pointing COMPILING::<$?PARSER> at that new grammar. Alternately, the tables driving the current parser can be modified without derivation, but at least one level of anonymous derivation must intervene from the standard Perl grammar, or you might be messing up someone else's grammar. Basically, the current grammar has to belong only to the current compiling scope. It may not be shared, at least not without explicit consent of all parties. No magical syntax at a distance. Consent of the governed, and all that.
A leading 0 no longer indicates octal numbers by itself. You must use an explicit radix marker for that. Pre-defined radix prefixes include:
0b base 2, digits 0..1
0o base 8, digits 0..7
0d base 10, digits 0..9
0x base 16, digits 0..9,a..f (case insensitive)
:2<1.1> same as 0b1.1 (0d1.5)
Extra digits are assumed to be represented by 'a'..'z', so you can go up to base 36. (Use 'a' and 'b' for base twelve, not 't' and 'e'.) Alternately you can use a list of digits in decimal:
:60[12,34,56] # 12 * 3600 + 34 * 60 + 56
:100[3,'.',14,16] # pi
Any radix may include a fractional part. A dot is never ambiguous because you have to tell it where the number ends:
:16<dead_beef.face> # fraction
:16<dead_beef>.face # method call
:16<dead_beef> * 16**8
:16<dead_beef*16**8>
It's true that only radixes that define e as a digit are ambiguous that way, but with any radix it's not clear whether the exponentiator should be 10 or the radix, and this makes it explicit:
0b1.1e10 illegal, could be read as any of:
:2<1.1> * 2 ** 10 1536
:2<1.1> * 10 ** 10 15,000,000,000
:2<1.1> * :2<10> ** :2<10> 6
So we write those as
:2<1.1*2**10> 1536
:2<1.1*10**10> 15,000,000,000
:2«1.1*:2<10>**:2<10>» 6
The generic string-to-number converter will recognize all of these forms (including the * form, since constant folding is not available to the run time). Also allowed in strings are leading plus or minus, and maybe a trailing Units type for an implied scaling. Leading and trailing whitespace is ignored. Note also that leading 0 by itself never implies octal in Perl 6.
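The radix rules above can be modelled outside Perl 6. The following is a minimal Python sketch, not the Perl 6 implementation: the function names radix_value, digit_list_value and str_to_num are hypothetical, and the sketch deliberately leaves out the * exponent form and trailing Units that the generic converter is said to accept.

```python
import string

# Digits '0'..'9' then 'a'..'z', which is why the literal form tops out at base 36.
DIGITS = string.digits + string.ascii_lowercase
PREFIXES = {"0b": 2, "0o": 8, "0d": 10, "0x": 16}


def digit_of(ch, radix):
    """Return the numeric value of one digit character, or raise ValueError."""
    d = DIGITS.find(ch)
    if d < 0 or d >= radix:
        raise ValueError(f"{ch!r} is not a digit in base {radix}")
    return d


def radix_value(radix, text):
    """Model the text inside :RADIX<...>, including an optional fraction."""
    whole, _, frac = text.replace("_", "").lower().partition(".")
    value = 0
    for ch in whole:
        value = value * radix + digit_of(ch, radix)
    scale = 1
    for ch in frac:
        scale *= radix
        value += digit_of(ch, radix) / scale
    return value


def digit_list_value(radix, digits):
    """Model :RADIX[d1,d2,...], where '.' switches to the fractional part."""
    value, scale, in_fraction = 0, 1, False
    for d in digits:
        if d == ".":
            in_fraction = True
        elif in_fraction:
            scale *= radix
            value += d / scale
        else:
            value = value * radix + d
    return value


def str_to_num(text):
    """Simplified string-to-number conversion: whitespace, sign, radix prefix."""
    s = text.strip()
    sign = 1
    if s[:1] in ("+", "-"):
        sign = -1 if s[0] == "-" else 1
        s = s[1:]
    prefix = s[:2].lower()
    if prefix in PREFIXES:
        radix, s = PREFIXES[prefix], s[2:]
    else:
        radix = 10  # a bare leading 0 never means octal
    return sign * radix_value(radix, s)
```

In this model radix_value(2, "1.1") gives 1.5, matching the "same as 0b1.1 (0d1.5)" note, and digit_list_value(60, [12, 34, 56]) follows the 12 * 3600 + 34 * 60 + 56 expansion shown earlier.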
Any of the adverbial forms may be used as a function:
:2($x) # "bin2num"
:8($x) # "oct2num"
:10($x) # "dec2num"
:16($x) # "hex2num"
Think of these as setting the default radix, not forcing it. Like Perl 5's old oct() function, any of these will recognize a number starting with a different radix marker and switch to the other radix. However, note that the :16() converter function will interpret leading 0b or 0d as hex digits, not radix switchers.
"\x123" (with \o and \d behaving respectively) or using square brackets: "\x[123]". Multiple characters may be specified within any of the bracketed forms by separating the numbers with commas: "\x[41,42,43]".
The qw/foo bar/ quote operator now has a bracketed form: <foo bar>. When used as a subscript it performs a slice equivalent to {'foo','bar'}. Much like the relationship between single quotes and double quotes, single angles do not interpolate while double angles do. The double angles may be written either with French quotes, «$foo @bar[]», or with "Texas" quotes, <<$foo @bar[]>>, as the ASCII workaround. The implicit split is done after interpolation, but respects quotes in a shell-like fashion, so that «'$foo' "@bar[]"» is guaranteed to produce a list of two "words" equivalent to ('$foo', "@bar[]"). Pair notation is also recognized inside «...» and such "words" are returned as Pair objects.
Fat arrow           Adverbial pair
=========           ==============
a => 1              :a
a => 0              :!a
a => 0              :a(0)
a => $x             :a($x)
a => 'foo'          :a<foo>
a => <foo bar>      :a<foo bar>
a => «$foo @bar»    :a«$foo @bar»
a => {...}          :a{...}
a => [...]          :a[...]
a => $a             :$a
a => @a             :@a
a => %a             :%a
a => %foo<a>        %foo:<a>
Note that as usual the {...} form can indicate either a closure or a hash depending on the contents.
Note also that the <a b> form is not a subscript and is therefore equivalent not to .{'a','b'} but rather to ('a','b'). Bare <a> turns into ('a') rather than ('a',).
Two or more adverbs can always be strung together without intervening punctuation anywhere a single adverb is acceptable. When used as named arguments in an argument list, you may put commas between them, because they're just ordinary named arguments to the function, and a fatarrow pair would work the same. When modifying an operator (that is, when one occurs where an operator is expected), you may not put commas between them, and the fatarrow form is not allowed. See S06.
The negated form (:!a) and the sigiled forms (:$a, :@a, :%a) never take an argument and don't care what the next character is. They are considered complete.
The other forms of adverb (including the bare :a form) always look for an immediate bracketed argument, and will slurp it up. If that's not intended, you must use whitespace between the adverb and the opening bracket. The syntax of individual adverbs is the same everywhere in Perl 6. There are no exceptions based on whether an argument is wanted or not. Except as noted above, the parser always looks for the brackets. Despite not indicating a true subscript, the brackets are similarly parsed as postfix operators. As postfixes the brackets may be separated from their initial :foo with either dot or "long dot", but nothing else.
Regardless of syntax, adverbs used as named arguments generally show up as optional named parameters to the function in question--even if the function is an operator or macro. The function in question neither knows nor cares how weird the original syntax was.
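To make the fat arrow correspondence concrete, here is a small Python sketch that maps a few of the adverbial forms onto (key, value) pairs. It is only an illustration under stated assumptions: parse_adverb is a hypothetical name, only the :a, :!a, :a(INTEGER) and :a<words> forms are handled, and the real parser treats the bracket as a postfix operator rather than matching a pattern.

```python
import re

# Matches :key, :!key, :key(ARG) and :key<WORDS>; other forms are out of scope here.
ADVERB = re.compile(
    r":(?P<neg>!)?(?P<key>[A-Za-z_]\w*)"
    r"(?:\((?P<paren>[^)]*)\)|<(?P<angle>[^>]*)>)?$"
)


def parse_adverb(text):
    """Return the (key, value) pair that one adverb stands for."""
    m = ADVERB.match(text)
    if m is None:
        raise ValueError(f"unsupported adverb form: {text!r}")
    key = m["key"]
    if m["neg"]:
        # The negated form never takes an argument.
        if m["paren"] is not None or m["angle"] is not None:
            raise ValueError("negated adverbs take no argument")
        return key, 0
    if m["paren"] is not None:
        # Assumption: only integer literals are accepted as arguments here.
        return key, int(m["paren"])
    if m["angle"] is not None:
        words = m["angle"].split()
        # A single word stays a scalar; several words become a list.
        return key, words[0] if len(words) == 1 else words
    return key, 1  # the bare :a form means a => 1
```

For example, parse_adverb(":a<foo bar>") yields the key "a" with the two-word list, while parse_adverb(":!a") yields the key "a" with the value 0, as in the table.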
:n :none No escapes at all (unless otherwise adverbed)
[Conjectural: Ordinarily the colon is required on adverbs, but the "quote" declarator allows you to combine any of the existing adverbial forms above without an intervening colon:
quote qw; # declare a P5-esque qw//
quote qqx; # equivalent to P5's qx//
quote qn; # completely raw quote qn//
quote qnc; # interpolate only closures
quote qqxwto; # qq:x:w:to//]
If this is all too much of a hardship, you can define your own quote adverbs and operators. All the uppercase adverbs are reserved for user-defined quotes. All of Unicode above Latin-1 is reserved for user-defined quotes.
%hash = qw:c/a b c d {@array} {%hash}/;
or
%hash = qq:w/a b c d {@array} {%hash}/;
to interpolate items into a qw. Conveniently, arrays and hashes interpolate with only whitespace separators by default, so the subsequent split on whitespace still works out. (But the built-in «...» quoter automatically does interpolation equivalent to qq:ww/.../. The built-in <...> is equivalent to q:w/.../.)
q :w /.../.
'', "", <>, «», ``, (), [], and {} have no special significance when used in place of // as delimiters. There may be whitespace before the opening delimiter. (Which is mandatory for parens because q() is a subroutine call and q:w(0) is an adverb with arguments). Other brackets may also require whitespace when they would be understood as an argument to an adverb in something like q:z<foo>//. A colon may never be used as the delimiter since it will always be taken to mean another adverb regardless of what's in front of it. Nor may a # character be used as the delimiter since it is always taken as whitespace (specifically, as a comment).
macro quote:<qX> (*%adverbs) {...}
Note: macro adverbs are automatically evaluated at macro call time if the adverbs are included in the parse. If an adverb needs to affect the parsing of the quoted text of the macro, then an explicit named parameter may be passed on as a parameter to the is parsed subrule, or used to select which subrule to invoke.
\qq[...] construct. Other "q" forms also work, including user-defined ones, as long as they start with "q". Otherwise you'll just have to embed your construct inside a \qq[...].
In other words, this is legal:
"Val = $a.ord.as('%x')\n"
and is equivalent to
"Val = { $a.ord.as('%x') }\n"
print "The answers are @foo[]\n"
Note that this fixes the spurious "@" problem in double-quoted email addresses.
As with Perl 5 array interpolation, the elements are separated by a space. (Except that a space is not added if the element already ends in some kind of whitespace. In particular, a list of pairs will interpolate with a tab between the key and value, and a newline after the pair.)
print "The associations are:\n%bar{}"
print "The associations are:\n%bar<>"
Note that this avoids the spurious "%" problem in double-quoted printf formats.
By default, keys and values are separated by tab characters, and pairs are terminated by newlines. (This is almost never what you want, but if you want something polished, you can be more specific.)
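The default separators for array and hash interpolation are easy to get wrong, so here is a minimal Python sketch of the two rules as stated above. The function names are hypothetical, and a Python dict stands in for a Perl 6 hash.

```python
def interpolate_array(items):
    """Join elements with a space, unless an element already ends in whitespace."""
    out = ""
    for i, item in enumerate(items):
        s = str(item)
        out += s
        is_last = i == len(items) - 1
        if not is_last and not (s and s[-1].isspace()):
            out += " "
    return out


def interpolate_hash(mapping):
    """Separate key and value with a tab and terminate each pair with a newline."""
    return "".join(f"{key}\t{value}\n" for key, value in mapping.items())
```

Because each interpolated pair already ends in a newline, feeding the pairs through interpolate_array adds no extra spaces, which is the "list of pairs" case mentioned in the parenthetical above.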
print "The results are &baz().\n"
The function is called in scalar context. (If it returns a list anyway, that list is interpolated as if it were an array in string context.)
print "The attribute is $obj.attr().\n"
print "The attribute is $obj.attr<Jan>.\n"
The method is called in scalar context. (If it returns a list, that list is interpolated as if it were an array.)
It is allowed to have a cascade of argumentless methods as long as the last one ends with parens:
print "The attribute is %obj.keys.sort.reverse().\n"
(The cascade is basically counted as a single method call for the end-bracket rule.)
print "The attribute is @baz[3](1,2,3){$xyz}<blurfl>.attr().\n"
Note that the final period above is not taken as part of the expression since it doesn't introduce a bracketed dereferencer. Spaces are not allowed between the dereferencers even when you use the dotted forms.
list operator if necessary. The following means the same as the previous example.
print "The attribute is { @baz[3](1,2,3){$xyz}<blurfl>.attr }.\n"
The final parens are unnecessary since we're providing "real" code in the curlies. If you need to have double quotes that don't interpolate curlies, you can explicitly remove the capability:
qq:c(0) "Here are { $two uninterpolated } curlies";
or equivalently:
qq:!c "Here are { $two uninterpolated } curlies";
Alternately, you can build up capabilities from single quote to tell it exactly what you do want to interpolate:
q:s 'Here are { $two uninterpolated } curlies';
$a interpolates, so do $^a, $*a, $=a, $?a, $.a, etc. It only depends on the $.
print "The dog bark is {Dog.bark}.\n"
${foo[$bar]}
${foo}[$bar]
is dead. Use closure curlies instead:
{$foo[$bar]}
{$foo}[$bar]
(You may be detecting a trend here...)
"{.bark}".
"{abs $var}".
\v to mean vertical tab, whatever that is... (\v now matches vertical whitespace in a regex.)
\L, \U, \l, \u, or \Q. Use curlies with the appropriate function instead: "{ucfirst $word}".
\c and square brackets:
"\c[NEGATED DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE]"
Multiple codepoints constituting a single character may be interpolated with a single \c by separating the names with commas:
"\c[LATIN CAPITAL LETTER A, COMBINING RING ABOVE]"
Whether that is regarded as one character or two depends on the Unicode support level of the current lexical scope. It is also possible to interpolate multiple codepoints that do not resolve to a single character:
"\c[LATIN CAPITAL LETTER A, LATIN CAPITAL LETTER B]"
[Note: none of the official Unicode character names contains a comma.]
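The comma-separated \c[...] form can be imitated with Python's unicodedata module, which looks characters up by their official names. This is a sketch only: expand_c is a hypothetical helper, and NFC normalization is used here as a stand-in for a "Unicode support level" at which the letter plus combining mark counts as one character.

```python
import unicodedata


def expand_c(names):
    """Turn 'NAME, NAME, ...' into the string of the named codepoints."""
    return "".join(unicodedata.lookup(name.strip()) for name in names.split(","))


# Two codepoints that compose into a single precomposed character under NFC.
ring_a = expand_c("LATIN CAPITAL LETTER A, COMBINING RING ABOVE")
composed = unicodedata.normalize("NFC", ring_a)

# Two codepoints that never resolve to a single character.
ab = expand_c("LATIN CAPITAL LETTER A, LATIN CAPITAL LETTER B")
```

Splitting on commas is safe for the reason given in the note above: no official character name contains one.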
:: type sigil when you're declaring a new one.) A consequence of this is that there's no longer any "use strict 'subs'". Since the syntax for method calls is distinguished from sub calls, it is only unrecognized sub calls that must be treated specially. You still must declare your subroutines, but a bareword with an unrecognized name is provisionally compiled as a subroutine call, on the assumption that such a declaration will occur by the end of the current compilation unit:
foo; # provisional call if neither &foo nor ::foo is defined so far
foo(); # provisional call if &foo is not defined so far
foo($x, $y); # provisional call if &foo is not defined so far
$x.foo; # not a provisional call; it's a method call on $x
foo $x:; # not a provisional call; it's a method call on $x
If a postdeclaration is not seen, the compile fails at CHECK time. (You are still free to predeclare subroutines explicitly, of course.) The postdeclaration may be in any lexical or package scope that could have made the declaration visible to the provisional call had the declaration occurred before rather than after the provisional call.
This fixup is done only for provisional calls. If there is any real predeclaration visible, it always takes precedence. In case of multiple ambiguous postdeclarations, either they must all be multis, or a compile-time error is declared and you must predeclare, even if one postdeclaration is obviously "closer". A single proto predeclaration may make all postdeclared multis work fine, since that's a run-time dispatch, and all multis are effectively visible at the point of the controlling proto declaration.
Parsing of a bareword function as a provisional call is always done the same way list operators are treated. If a postdeclaration bends the syntax to be inconsistent with that, it is an error of the inconsistent signature variety.
If the unrecognized subroutine name is followed by postcircumfix:<( )>, it is compiled as a provisional function call of the parenthesized form. If it is not, it is compiled as a provisional function call of the list operator form, which may or may not have an argument list. When in doubt, the attempt is made to parse an argument list. As with any list operator, an immediate postfix operator means there are no arguments, whereas anything following whitespace will be interpreted as an argument list if possible.
Based on the signature of the subroutine declaration, there are only four ways that an argument list can be parsed:
Signature           # of expected args
()                  0
($x)                1
($x?)               0..1
(anything else)     0..Inf
That is, a standard subroutine call may be parsed only as a 0-arg term (or function call), a 1-mandatory-arg prefix operator (or function call), a 1-optional-arg term or prefix operator (or function call), or an "infinite-arg" list operator (or function call). A given signature might only accept 2 arguments, but the only number distinctions the parser is allowed to make are between void, singular and plural; checking that the number of arguments supplied matches some number larger than one must be done as a separate semantic constraint, not as a syntactic constraint. Perl functions never take N arguments off of a list and leave the rest for someone else, except for small values of N, where small is defined as not more than 1. You can get fancier using macros, but macros always require predeclaration. Since the non-infinite-list forms are essentially behaving as macros, those forms also require predeclaration. Only the infinite-list form may be postdeclared (and hence used provisionally).
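The four-way classification can be written down as a tiny function. This Python sketch assumes a signature is given as a list of parameter strings such as "$x" or "$x?", and that only a lone plain scalar parameter counts as the singular case; both the representation and the name expected_args are illustrative assumptions, not part of the design.

```python
import math


def expected_args(params):
    """Return (min, max) argument counts the parser may assume for a signature."""
    if len(params) == 0:
        return (0, 0)                      # ()     parses as a 0-arg term
    if len(params) == 1 and params[0].startswith("$"):
        if params[0].endswith("?"):
            return (0, 1)                  # ($x?)  optional single argument
        return (1, 1)                      # ($x)   mandatory single argument
    return (0, math.inf)                   # anything else is a list operator
```

A signature such as ($x, $y) therefore classifies as 0..Inf: checking that exactly two arguments arrive is left to a later semantic check, as the paragraph above describes.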
It is illegal for a provisional subroutine call to be followed by a colon postfix, since such a colon is allowed only on an indirect object, or a method call in dot form. (It is also allowed on a label when a statement is expected.) So for any undeclared identifier "foo":
foo.bar # foo().bar -- postfix prevents args
foo .bar # foo($_.bar) -- no postfix starts with whitespace
foo\ .bar # foo().bar -- long dot, so postfix
foo++ # foo()++ -- postfix
foo 1,2,3 # foo(1,2,3) -- args always expected after listop
foo + 1 # foo(+1) -- term always expected after listop
foo; # foo(); -- no postfix, but no args either
foo: # label -- must be label at statement boundary.
-- illegal otherwise
foo: bar: # two labels in a row
.foo: # $_.foo: 1 -- must be "dot" method with : args
.foo(1) # $_.foo(1) -- must be "dot" method with () args
.foo # $_.foo() -- must be "dot" method with no args
.$foo: # $_.$foo: 1 -- indirect "dot" method with : args
foo bar: 1 # bar.foo(1) -- bar must be predecl as class or sub
-- foo method call even if declared sub
foo bar 1 # foo(bar(1)) -- both subject to postdeclaration
-- never taken as indirect object
foo $bar: 1 # $bar.foo(1) -- indirect object even if declared sub
-- $bar considered one token
foo (bar()): # bar().foo(1) -- even if foo declared sub
foo bar(): # illegal -- bar() is two tokens.
foo .bar: # foo(.bar:) -- colon chooses .bar to listopify
foo bar baz: 1 # foo(baz.bar(1)) -- colon controls "bar", not foo.
foo (bar baz): 1 # bar(baz()).foo(1) -- colon controls "foo"
$foo $bar # illegal -- two terms in a row
$foo $bar: # illegal -- use $bar.$foo for indirection
(foo bar) baz: 1 # illegal -- use $baz.$(foo bar) for indirection
The indirect object colon only ever dominates a simple term, where "simple" includes classes and variables and parenthesized expressions, but explicitly not method calls, because the colon will bind to a trailing method call in preference. An indirect object that parses as more than one token must be placed in parentheses, followed by the colon.
In short, only an identifier followed by a simple term followed by a postfix colon is ever parsed as an indirect object, but that form will always be parsed as an indirect object regardless of whether the identifier is otherwise declared.
There's also no "use strict 'refs'" because symbolic dereferences are now syntactically distinguished from hard dereferences. @($arrayref) must now provide an actual array object, while @::($string) is explicitly a symbolic reference. (Yes, this may give fits to the P5-to-P6 translator, but I think it's worth it to separate the concepts. Perhaps the symbolic ref form will admit real objects in a pinch.)
%x<foo> for constant hash subscripts, or the old standby %x{'foo'}. (It also works to say %x«foo» as long as you realize it's subject to interpolation.) But => still autoquotes any bare identifier to its immediate left (horizontal whitespace allowed but not comments). The identifier is not subject to keyword or even macro interpretation. If you say
$x = do {
call_something();
if => 1;
}
then $x ends up containing the pair ("if" => 1). Always. (Unlike in Perl 5, where version numbers didn't autoquote.)
You can also use the :key($value) form to quote the keys of option pairs. To align values of option pairs, you may use the "long dot" postfix forms:
:longkey\ .($value)
:shortkey\ .<string>
:fookey\ .{ $^a <=> $^b }
These will be interpreted as
:longkey($value)
:shortkey<string>
:fookey{ $^a <=> $^b }
Old            New
---            ---
__LINE__       $?LINE
__FILE__       $?FILE
__PACKAGE__    $?PACKAGE
__END__        =begin END
__DATA__       =begin DATA
The =begin END pod stream is special in that it assumes there's no corresponding =end END before end of file. The DATA stream is no longer special--any POD stream in the current file can be accessed via a filehandle, named as %=POD{'DATA'} and such. Alternately, you can treat a pod stream as a scalar via $=DATA or as an array via @=DATA. Presumably a module could read all its COMMENT blocks from @=COMMENT, for instance. Each chunk of pod comes as a separate array element. You have to split it into lines yourself. Each chunk has a .linenum property that indicates its starting line within the source file.
The lexical routine itself is &?ROUTINE; you can get its name with &?ROUTINE.name. The current block is &?BLOCK. If the block has a label, that shows up in &?BLOCK.label.
<<, but with an adverb on any other quote construct:
print qq:to/END/
Give $amount to the man behind curtain number $curtain.
END
Other adverbs are also allowed:
print q:c:to/END/
Give $100 to the man behind curtain number {$curtain}.
END
Context    Type    OOtype    Operator
-------    ----    ------    --------
boolean    bit     Bit       ?
integer    int     Int       int
numeric    num     Num       +
string     buf     Str       ~
There are also various container contexts that require particular kinds of containers.
.bit property. Classes get to decide which of their values are true and which are false. Individual objects can override the class definition:
return 0 but True;
list" operator which imposes a list context on its arguments even if list itself occurs in a scalar context. In list context, it flattens lazily. In a scalar context, it returns the resulting list as a single List object. (So the list operator really does exactly the same thing as putting a list in parentheses with at least one comma. But it's more readable in some situations.)
[,] list operator may be used to force list context on its argument and also defeat any scalar argument checking imposed by subroutine signature declarations. This list flattens lazily.
eager list operator. Don't use it on an infinite generator unless you have a machine with infinite memory, and are willing to wait a long time. It may also be applied to a scalar iterator to force immediate iteration to completion.
This is not a problem for arguments that are arrays or hashes, since they don't have to care about their context, but just return themselves in any event, which may or may not be lazily flattened.
However, function calls in the argument list can't know their eventual context because the method hasn't been dispatched yet, so we don't know which signature to check against. As in Perl 5, list context is assumed unless you explicitly qualify the argument with a scalar context operator.
The => operator now constructs Pair objects rather than merely functioning as a comma. Both sides are in scalar context.
The .. operator now constructs Range objects rather than merely functioning as an operator. Both sides are in scalar context.
Pair objects, in which case each pair provides a key and a value. You may, in fact, mix the two forms, as long as the pairs come when a key is expected. If you wish to supply a Pair as a key, you must compose an outer Pair in which the key is the inner Pair:
%hash = (($keykey => $keyval) => $value);
The enum function takes a list of keys or pairs, and adds values to any keys that are not already part of a pair. The value added is one more than the previous key or pair's value. This works nicely with the new qq:ww form:
%hash = enum <<:Mon(1) Tue Wed Thu Fri Sat Sun>>;
%hash = enum « :Mon(1) Tue Wed Thu Fri Sat Sun »;
are the same as:
%hash = ();
%hash<Mon Tue Wed Thu Fri Sat Sun> = 1..7;
Hash (or Pair) object. Binding to a "splat" hash requires a list of pairs or hashes, and stops processing the argument list when it runs out of pairs or hashes. See S06 for much more about parameter binding.
glob function.
while (<HANDLE>) {...}
you now write
for =$handle {...}
As a unary prefix operator, you may also apply adverbs to =:
for =$handle :prompt('$ ') { say $_ + 1 }
or
for =($handle):prompt('$ ') { say $_ + 1 }
or you may even write it in its functional form, passing the adverbs as ordinary named arguments.
for prefix:<=>($handle, :prompt('$ ')) { say $_ + 1 }
is keyword, but are now called "traits". On the other hand, run-time properties are attached to individual objects using the but keyword instead, but are still called "properties".
rw" attributes behave in all respects as variables, properties may therefore also be temporized with temp, or hypotheticalized with let.
Lexing in Perl 6 is controlled by a system of grammatical categories. At each point in the parse, the lexer knows which subset of the grammatical categories are possible at that point, and follows the longest-token rule across all the active grammatical categories. (Ordering of grammatical categories matters only in case of a "tie", in which case the grammatical category that is notionally "first" in the grammar wins. For instance, a statement_control is always going to win out over a prefix operator of the same name. More specifically, you can't call a function named "if" directly because it would be hidden either by the statement_control category or the statement_modifier category.)
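The longest-token rule with its tie-break can be sketched in a few lines of Python. This is a toy model under stated assumptions: the function name next_token is hypothetical, each category is reduced to a list of literal token strings, and the caller is expected to pass only the categories that are active at the current parse position, in grammar order.

```python
def next_token(source, pos, categories):
    """Pick the longest matching token; on a tie the earlier category wins.

    categories is an ordered list of (category_name, [token, ...]) pairs.
    Returns (category_name, token), or None if nothing matches at pos.
    """
    best = None
    for rank, (name, tokens) in enumerate(categories):
        for token in tokens:
            if source.startswith(token, pos):
                # Longer tokens sort higher; for equal length, lower rank wins.
                score = (len(token), -rank)
                if best is None or score > best[0]:
                    best = (score, name, token)
    return None if best is None else (best[1], best[2])


# Illustrative active categories, listed in their notional grammar order.
ACTIVE = [
    ("statement_control", ["if"]),
    ("prefix", ["if", "+", "++"]),
]
```

With these categories, next_token("if $x", 0, ACTIVE) selects the statement_control reading of "if" even though a prefix operator of the same name exists, which mirrors the example in the paragraph above.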
Here are the current grammatical categories:
term:<...>                                  $x = {...}
quote:<qX>                                  qX/foo/
prefix:<!>                                  !$x (and $x.! if no postfix:<!>)
infix:<+>                                   $x + $y
postfix:<++>                                $x++
circumfix:<[ ]>                             [ @x ]
postcircumfix:<[ ]>                         $x[$y] or $x .[$y]
regex_metachar:<,>                          /,/
regex_backslash:<w>                         /\w/ and /\W/
regex_assertion:<*>                         /<*stuff>/
regex_mod_internal:<perl5>                  m:/ ... :perl5 ... /
regex_mod_external:<nth>                    m:nth(3)/ ... /
trait_verb:<handles>                        has $.tail handles <wag>
trait_auxiliary:<shall>                     my $x shall conform<TR123>
scope_declarator:<has>                      has $.x;
statement_control:<if>                      if $condition {...} else {...}
statement_modifier:<if>                     ... if $condition
infix_postfix_meta_operator:<=>             $x += 2;
postfix_prefix_meta_operator:{'»'}          @array »++
prefix_postfix_meta_operator:{'«'}          -« @magnitudes
infix_circumfix_meta_operator:{'»','«'}     @a »+« @b
prefix_circumfix_meta_operator:{'[',']'}    [*]
Any category containing "circumfix" requires two token arguments, supplied in slice notation.