This page was generated by Text::SmartLinks v0.01 at 2014-10-31 18:01:11 GMT.
(syn eeb0e4e)

TITLE

DRAFT: Synopsis 32: Setting Library - Str

AUTHORS

    Rod Adams <rod@rodadams.net>
    Larry Wall <larry@wall.org>
    Aaron Sherman <ajs@ajs.com>
    Mark Stosberg <mark@summersault.com>
    Carl Mäsak <cmasak@gmail.com>
    Moritz Lenz <moritz@faui2k3.org>
    Tim Nelson <wayland@wayland.id.au>
    Brent Laabs <bslaabs@gmail.com>

VERSION

    Created: 19 Mar 2009 extracted from S29-functions.pod
    Last Modified: 2014-08-24
    Version: 11

The document is a draft.

If you read the HTML version, it is generated from the Pod in the specs repository under https://github.com/perl6/specs/blob/master/S32-setting-library/Str.pod so edit it there in the git repository if you would like to make changes.

Str

General notes about strings:

A Str can exist at several Unicode levels at once. Which level you interact with typically depends on what your current lexical context has declared the "working Unicode level" to be. Default is Grapheme. [Default can't be CharLingua because we don't go into "language" mode unless there's a specific language declaration saying either exactly what language we're going into or, in the absence of that, how to find the exact language somewhere in the environment.]

Attempting to use a string at a level higher than it can support is handled without warning. The current highest supported level of the string is simply mapped Char for Char to the new higher level. However, attempting to stuff something of a higher level into a lower-level string is an error (for example, attempting to store Kanji in a Byte string). An explicit conversion function must be used to tell it how you want it encoded.

Attempting to use a string at a level lower than what it supports is not allowed.

If a function takes a Str and returns a Str, the returned Str will support the same levels as the input, unless specified otherwise.

The following are all provided by the Str role:

chop
 multi method chop ( Str $string: --> Str ) is export

Returns string with one Char removed from the end.

chomp
 multi method chomp ( Str $string: --> Str ) is export

Returns string with one newline removed from the end. An arbitrary terminator can be removed if the input filehandle has marked the string for where the "newline" begins. (Presumably this is stored as a property of the string.) Otherwise a standard newline is removed.

Note: Most users should just let their I/O handles autochomp instead. (Autochomping is the default.)
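
As a minimal sketch, assuming a recent Rakudo (which may differ from this draft), chop and chomp can be compared as follows. The variable names are illustrative, and terminators marked by a filehandle are not shown.

```perl6
# Assumption: run on a recent Rakudo; variable names are illustrative only.
my $chopped   = "hello".chop;      # last character removed
my $chomped   = "hello\n".chomp;   # one trailing newline removed
my $unchanged = "hello".chomp;     # no newline present, nothing removed

say $chopped;     # hell
say $chomped;     # hello
say $unchanged;   # hello
```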

lc
 multi method lc ( Str $string: --> Str ) is export

Returns the input string after forcing each character to its lowercase form. Note that one-to-one mapping is not in general guaranteed; different forms may be chosen according to context.

uc
 multi method uc ( Str $string: --> Str ) is export

Returns the input string after forcing each character to its uppercase (not titlecase) form. Note that one-to-one mapping is not in general guaranteed; different forms may be chosen according to context.

fc
 multi method fc ( Str $string: --> Str ) is export

Does a Unicode "fold case" operation suitable for doing caseless string comparisons. (In general, the returned string is unlikely to be useful for any purpose other than comparison.)

tc
 multi method tc ( Str $string: --> Str ) is export

Converts the first character of a string to titlecase form, leaving the rest of the characters unchanged, then returns the modified string. If there is no titlecase mapping for the first character, the entire string is returned unchanged. In any case, this function never changes any character after the first. (It is like the old Perl 5 ucfirst function in that respect.)

tclc
 multi method tclc ( Str $string: --> Str ) is export

Forces the first character of a string to titlecase and the rest of the characters to lowercase, then returns the modified string.
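
The case-mapping methods described so far can be compared side by side. This is a sketch assuming a recent Rakudo; the sample strings are made up for illustration.

```perl6
# Assumption: run on a recent Rakudo; sample strings are illustrative.
my $s = "hELLO wORLD";

say $s.lc;     # hello world
say $s.uc;     # HELLO WORLD
say $s.tc;     # HELLO wORLD  (only the first character changes)
say $s.tclc;   # Hello world

# fc is intended for caseless comparison: both sides fold to "strasse".
my $same = "STRASSE".fc eq "Stra\x[DF]e".fc;
say $same;     # True
```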

wordcase
 multi method wordcase ( Str $string: :&filter = &tclc, :$where = True --> Str ) is export

Performs a substitutional mapping of each word in the string, defaulting to the tclc mapping. Words are defined as Perl 6 identifiers, hence admit hyphens and apostrophes when followed by a letter. (Note that trailing apostrophes don't matter when casemapping.) The following should have the same result:

    .wordcase;
    .subst(:g, / <ident>+ % <[ \- ' ]> /, *.Str.tclc)

The filter function is applied to the first and last word always, and to any intermediate word that the where parameter smartmatches. Assuming suitable definitions of word lists, standard English capitalization might be handled with something like this:

    my $where = none map *.fc, @conjunctions, @prepositions;
    .wordcase(:$where);

(Note that the "standard" authorities disagree on the prepositions!)

The smartmatching is done case insensitively, so you should store your exceptions in fc form. If the where smartmatch does not match, then the word will be forced to lowercase unless the filter is &tcuc, in which case the exception will be forced to uppercase.

There is no provision for an alternate regex; if you need a custom word recognizer, you can write your own .subst as above.
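
A short sketch of the default mapping and of a custom filter, assuming a recent Rakudo. The :where behavior described above may be implemented differently from this draft, so it is deliberately left out.

```perl6
# Assumption: run on a recent Rakudo; only the default mapping and :filter are shown.
my $title = "the quick brown fox".wordcase;
say $title;   # The Quick Brown Fox

# A different filter is applied to every word instead of tclc.
my $shout = "ab cd".wordcase(filter => &uc);
say $shout;   # AB CD
```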

samecase
 multi method samecase ( Str $string: Str $pattern --> Str ) is export

Has the effect of making the case of the string match the case pattern in $pattern. (Used by s:ii/// internally, see S05.)
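
For illustration, and assuming the behavior of a recent Rakudo in which the case of the last pattern character carries over to the rest of the string:

```perl6
# Assumption: recent Rakudo; the last pattern character's case extends to the remainder.
my $a = "rAKU".samecase("Ab");
my $b = "hello".samecase("Ab");

say $a;   # Raku
say $b;   # Hello
```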

samemark
 multi method samemark ( Str $string: Str $pattern --> Str ) is export

Has the effect of making the marks of the string match the marking pattern in $pattern. (Used by s:mm/// internally, see S05.)

length

This word is banned in Perl 6. You must specify units.

chars
 multi method chars ( Str $string: --> Int ) is export

Returns the number of characters in the string in the current (lexically scoped) idea of what a normal character is, usually graphemes.

graphs
 multi method graphs ( Str $string: --> Int ) is export

Returns the number of graphemes in the string in a language-independent way.

codes
 multi method codes ( Str $string: $nf = $?NF --> Int ) is export

Returns the number of codepoints in the string as if it were canonicalized the specified way. Do not confuse codepoints with UTF-16 encoding. Characters above U+FFFF count as a single codepoint.
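
The difference between the units can be sketched as follows, assuming a recent Rakudo where chars counts graphemes. The optional $nf argument of codes and the graphs method are not used here, since an implementation may not provide them.

```perl6
# Assumption: recent Rakudo with grapheme-level strings.
my $g = "q\x[308]";        # "q" + COMBINING DIAERESIS; no precomposed form exists
say $g.chars;              # 1  (one grapheme)
say $g.codes;              # 2  (two codepoints)

my $astral = "\x[1F600]";  # a character above U+FFFF
say $astral.codes;         # 1  (not two UTF-16 code units)
```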

bytes

Gone. Use $str.encode($encoding).bytes instead.

encode
    multi method encode($encoding = $?ENC, $nf = $?NF --> Blob )

Returns a Blob which represents the original string in the given encoding and normal form. The actual return type is as specific as possible, so $str.encode('UTF-8') returns a utf8 object, $str.encode('ISO-8859-1') a blob8.
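
A sketch of counting bytes through encode, assuming a recent Rakudo and that both encoding names are accepted as written:

```perl6
# Assumption: recent Rakudo; encoding names as used in the text above.
my $str  = "h\x[E9]llo";                 # "héllo"
my $utf8 = $str.encode("UTF-8");

say $utf8.bytes;                         # 6  (é takes two bytes in UTF-8)
say $str.encode("ISO-8859-1").bytes;     # 5  (é takes one byte in Latin-1)
say $utf8 ~~ Blob;                       # True
```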

index
 multi method index( Str $string: Str $substring, StrPos $pos = StrPos(0) --> StrPos ) is export
 multi method index( Str $string: Str $substring, Int $pos --> StrPos ) is export

index searches for the first occurrence of $substring in $string, starting at $pos. If $pos is an Int, it is taken to be in the units of the calling scope, which defaults to "graphemes".

The value returned is always a StrPos object. If the substring is found, then the StrPos represents the position of the first character of the substring. If the substring is not found, a bare StrPos containing no position is returned. This prototype StrPos evaluates to false because it's really a kind of undefined value. Do not evaluate as a number, because instead of returning -1 it will return 0 and issue a warning.
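
The following sketch assumes a recent Rakudo, which is assumed here to return a plain Int or an undefined value rather than the StrPos object specified above. Testing definedness rather than numeric value works under either model.

```perl6
# Assumption: recent Rakudo; positions come back as Int, "not found" as undefined.
my $s     = "hello world";
my $first = $s.index("o");      # 4
my $next  = $s.index("o", 5);   # 7
my $none  = $s.index("z");      # undefined: substring not present

say $first;
say $next;
say $none.defined ?? "found" !! "not found";   # not found
```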

pack
 multi pack( *@items where { all(@items) ~~ Pair } --> buf8 )
 multi pack( Str $template, *@items --> buf8 )

pack takes a list of pairs and formats the values according to the specification of the keys. Alternately, it takes a string $template and formats the rest of its arguments according to the specifications in the template string. The result is a sequence of bytes.

Templates are strings of the form:

  grammar Str::PackTemplate {
    regex TOP       { ^ <template> $ }
    regex template  { [ <group> | <specifier> <count>? ]* }
    token group     { \( <template> \) }
    token specifier { <[aAZbBhHcCsSiIlLnNvVqQjJfdFDpPuUwxX\@]> \!? }
    token count     { \*
                    | \[ [ \d+ | <specifier> ] \]
                    | \d+ }
  }

In the pairwise mode, each key must contain a single <group> or <specifier>, and the values must be either scalar arguments or arrays.

[ Note: Need more documentation and need to figure out what Perl 5 things no longer make sense. Does Perl 6 need any extra formatting features? -ajs ]

[I think pack formats should be human readable but compiled to an internal form for efficiency. I also think that compact classes should be able to express their serialization in pack form if asked for it with .packformat or some such. -law]

quotemeta
 multi method quotemeta ( Str $string: --> Str ) is export

Returns the input string with all non-"word" characters back-slashed. That is, all characters not matching "/<[A..Za..z_0..9]>/" will be preceded by a backslash in the returned string, regardless of any locale settings.
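
Since an implementation may not provide this method, the described behavior can be approximated with subst. The helper name quotemeta-sketch is hypothetical and not part of this specification.

```perl6
# Hypothetical helper, named for illustration only; not a built-in.
sub quotemeta-sketch(Str $s --> Str) {
    # Prefix every character outside A..Z, a..z, _ and 0..9 with a backslash.
    $s.subst(/ <-[A..Za..z_0..9]> /, { "\\" ~ $_ }, :g)
}

my $quoted = quotemeta-sketch("a.b*c");
say $quoted;   # a\.b\*c
```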

[Note from Pm: Should that be "/\w/" instead? Or, if the intent is to duplicate p5 functionality, perhaps it should be "p5quotemeta"? Do we even want this method at all?]

rindex
 multi method rindex( Str $string: Str $substring, StrPos $pos? --> StrPos ) is export
 multi method rindex( Str $string: Str $substring, Int $pos --> StrPos ) is export

Returns the position of the last $substring in $string. If $pos is specified, then the search starts at that location in $string, and works backwards. See index for more detail.
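
A sketch of searching backwards, under the same assumption as for index that a recent Rakudo returns plain Int positions:

```perl6
# Assumption: recent Rakudo; positions are returned as Int.
my $s    = "hello world";
my $last = $s.rindex("o");      # 7
my $prev = $s.rindex("o", 6);   # 4: search starts at position 6 and works backwards

say $last;
say $prev;
```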

split
 multi split ( Str $delimiter, Str $input, Int $limit = Inf --> List )
 multi split ( Regex $delimiter, Str $input, Int $limit = Inf --> List )
 multi method split ( Str $input: Str $delimiter, Int $limit = Inf --> List )
 multi method split ( Str $input: Regex $delimiter, Int $limit = Inf, Bool :$all = False --> List )

Splits a string up into pieces based on delimiters found in the string.

String delimiters must not be treated as rules but as constants. The default is no longer ' ' since that would be interpreted as a constant. P5's split(' ') will translate to comb. Null trailing fields are no longer trimmed by default.

The split function no longer has a default delimiter nor a default invocant. In general you should use words to split on whitespace now, or comb to break into individual characters. See below.

If the :all adverb is supplied to the Regex form, then the delimiters are returned as Match objects in alternation with the split values. Unlike with Perl 5, if the delimiter contains multiple captures they are returned as submatches of a single Match object. (And since Match does Capture, whether these Match objects eventually flatten or not depends on whether the expression is bound into a list or slice context.)

You may also split lists and filehandles. $*ARGS.split(/\n[\h*\n]+/) splits on paragraphs, for instance. Lists and filehandles are automatically fed through cat in order to pretend to be a string. The resulting Cat is lazy. Accessing a filehandle as both a filehandle and as a Cat is undefined.
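
The string-level rules above (constant delimiters, untrimmed trailing fields, and the limit) can be sketched as follows, assuming a recent Rakudo. The :all adverb and the splitting of lists or filehandles are not shown.

```perl6
# Assumption: recent Rakudo; method forms only.
my @fields = "a,b,,c,,".split(",");
say @fields.elems;                     # 6: trailing empty fields are kept

my @two   = "a,b,c".split(",", 2);     # ("a", "b,c")
my @parts = "a1b22c".split(/\d+/);     # ("a", "b", "c")

say @two.join("|");                    # a|b,c
say @parts.join("|");                  # a|b|c
```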

comb
 multi comb ( Str $matcher, Str $input, Int $limit = Inf, Bool :$match --> List )
 multi comb ( Regex $matcher, Str $input, Int $limit = Inf, Bool :$match --> List )
 multi method comb ( Str $input: Str $matcher, Int $limit = Inf, Bool :$match --> List )
 multi method comb ( Str $input: Regex $matcher = /./, Int $limit = Inf, Bool :$match --> List )

The comb function looks through a string for the interesting bits, ignoring the parts that don't match. In other words, it's a version of split where you specify what you want, not what you don't want.

That means the same restrictions apply to the matcher rule as do to split's delimiter rule.

By default it pulls out all individual characters. Saying

    $string.comb(/pat/, $n)

is equivalent to

    map {.Str}, $string.match(rx:global:x(0..$n):c/pat/)

You may also comb lists and filehandles. +$*IN.comb counts the characters on standard input, for instance. comb(/./, $thing) returns a list of single Char strings from anything that can give you a Str. Lists and filehandles are automatically fed through cat in order to pretend to be a string. This Cat is also lazy.

If the :match adverb is applied, a list of Match objects (one per match) is returned instead of strings. This can be used to access capturing subrules in the matcher. The unmatched portions are never returned -- if you want that, use split :all. If the function is combing a lazy structure, the return values may also be lazy. (Strings are not lazy, however.)
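
A sketch of the main forms, assuming a recent Rakudo that supports the :match adverb as described; the sample strings are illustrative.

```perl6
# Assumption: recent Rakudo; :match returns Match objects instead of strings.
my @nums  = "a1b22c333".comb(/\d+/);       # ("1", "22", "333")
my @first = "a1b22c333".comb(/\d+/, 2);    # ("1", "22")
my @chars = "abc".comb;                    # ("a", "b", "c")

# With :match, captures inside the matcher stay accessible.
my @m = "k1=v1;k2=v2".comb(/ (\w+) '=' (\w+) /, :match);
say ~@m[1][0];   # k2
say ~@m[1][1];   # v2
```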

lines
 multi method lines ( Str $input: Int $limit = Inf --> List ) is export

Returns a list of lines, i.e. the same as a call to $input.comb( / ^^ \N* /, $limit ) would.

words
 multi method words ( Str $input: Int $limit = Inf --> List ) is export

Returns a list of non-whitespace bits, i.e. the same as a call to $input.comb( / \S+ /, $limit ) would.
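
For illustration, assuming a recent Rakudo:

```perl6
# Assumption: recent Rakudo; sample strings are illustrative.
my @lines = "a\nb\nc".lines;              # ("a", "b", "c")
my @words = "  the  quick fox ".words;    # ("the", "quick", "fox")
my @two   = "the quick fox".words(2);     # ("the", "quick")

say @lines.elems;       # 3
say @words.join("|");   # the|quick|fox
say @two.join("|");     # the|quick
```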

flip

The flip function reverses a string character by character.

     multi method flip ( $str: --> Str ) is export {
        $str.comb.reverse.join;
     }

This function will misplace accents if used at a Unicode level less than graphemes.
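
A sketch at the grapheme level, assuming a recent Rakudo: the accent stays attached to its base character after the flip.

```perl6
# Assumption: recent Rakudo operating on graphemes.
my $plain = "hello".flip;
say $plain;                          # olleh

my $accented = "ne\x[301]";          # "n", then "e" + COMBINING ACUTE ACCENT
my $flipped  = $accented.flip;       # the accented "e" moves as one unit
say $flipped eq "\x[E9]n";           # True
```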

sprintf
 multi method sprintf ( Str $format: *@args --> Str ) is export

This function is mostly identical to the C library sprintf function.

The $format is scanned for % characters. Any % introduces a format token. Format tokens have the following grammar:

 grammar Str::SprintfFormat {
  regex format_token { '%': <index>? <precision>? <directive> }
  token index { \d+ '$' }
  token precision { <flags>? <vector>? <precision_count> }
  token flags { <[ \x20 + 0 \# \- ]>+ }
  token precision_count { [ <[1..9]>\d* | '*' ]? [ '.' [ \d* | '*' ] ]? }
  token vector { '*'? v }
  token directive { < % c s d u o x e f g X E G b p n i D U O F > }
 }

Directives guide the use (if any) of the arguments. When a directive (other than %) is used, it indicates how the next argument passed is to be formatted into the string.

The directives are:

 %   a literal percent sign
 c   a character with the given codepoint
 s   a string
 d   an integer, in decimal
 b   an integer, in binary
 o   an integer, in octal
 x   an integer, in hexadecimal
 X   like x, but using uppercase letters
 e   a floating-point number, in scientific notation
 f   a floating-point number, in fixed decimal notation
 g   a floating-point number, in %e or %f notation
 E   like e, but using an uppercase "E"
 G   like g, but with an uppercase "E" (if applicable)

Compatibility:

 i   a synonym for %d
 u   a synonym for %d
 D   a synonym for %d
 U   a synonym for %u
 O   a synonym for %o
 F   a synonym for %f

Perl 5 (non-)compatibility:

 n   produces a runtime exception
 p   produces a runtime exception
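
A few of the directives, flags and the explicit index can be sketched as follows, assuming a recent Rakudo whose sprintf follows the grammar above. The argument values are made up for illustration.

```perl6
# Assumption: recent Rakudo; argument values are illustrative.
my $out = sprintf("%5.2f|%-4d|%04x|%s|%%|%b", 3.14159, 42, 255, "ok", 5);
say $out;       # " 3.14|42  |00ff|ok|%|101" (without the quotes)

# An explicit index selects the argument; single quotes keep '$' literal.
my $swapped = sprintf('%2$s %1$s', "a", "b");
say $swapped;   # b a
```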

fmt
  multi method fmt( Scalar $scalar: Str $format = '%s' --> Str )
  multi method fmt( List $list:
                    Str $format = '%s',
                    Str $separator = ' '
                    --> Str )
  multi method fmt( Hash $hash:
                    Str $format = "%s\t%s",
                    Str $separator = "\n"
                    --> Str )
  multi method fmt( Pair $pair: Str $format = "%s\t%s" --> Str )

A set of wrappers around sprintf. A call to the scalar version $o.fmt($format) returns the result of sprintf($format, $o). A call to the list version @a.fmt($format, $sep) returns the result of join $sep, map { sprintf($format, $_) }, @a. A call to the hash version %h.fmt($format, $sep) returns the result of join $sep, map { sprintf($format, $_.key, $_.value) }, %h.pairs. A call to the pair version $p.fmt($format) returns the result of sprintf($format, $p.key, $p.value).
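
The four wrappers can be sketched as follows, assuming a recent Rakudo. A single-pair hash is used so that the result does not depend on hash ordering.

```perl6
# Assumption: recent Rakudo; values are illustrative.
my $scalar = 42.fmt("%04d");                 # 0042
my $list   = (1, 2, 3).fmt("%02d", ",");     # 01,02,03
my $pair   = (a => 1).fmt("%s=%d");          # a=1
my %h      = b => 2;
my $hash   = %h.fmt("%s=%d", ";");           # b=2

say $scalar;
say $list;
say $pair;
say $hash;
```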

substr
 multi method substr (Str $string: StrPos $start, StrLen $length? --> Str ) is export
 multi method substr (Str $string: StrPos $start, StrPos $end --> Str ) is export
 multi method substr (Str $string: StrPos $start, Int $length --> Str ) is export
 multi method substr (Str $string: Int $start, StrLen $length? --> Str ) is export
 multi method substr (Str $string: Int $start, StrPos $end --> Str ) is export
 multi method substr (Str $string: Int $start, Int $length --> Str ) is export

substr returns part of an existing string. You control what part by passing a starting position and optionally either an end position or length. If you pass a number as either the position or length, then it will be used as the start or length with the assumption that you mean "chars" in the current Unicode abstraction level, which defaults to graphemes. A number in the 3rd argument is interpreted as a length rather than a position (just as in Perl 5).

Here is an example of its use:

 $initials = substr($first_name,0,1) ~ substr($last_name,0,1);

The function fails if either the start position or length is negative or undefined. (If the length argument is not given, it defaults to the rest of the string.) Either of start position or end position may be specified relative to the end of the string using a WhateverCode whose argument will be the position of the end of the string. While it is illegal for the start position to be outside of the string, it is allowed for the final position to be off the end of the string; in this case the entire rest of the string is returned, whatever is available.
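
The rules above can be sketched as follows, assuming a recent Rakudo with Int positions counted in graphemes:

```perl6
# Assumption: recent Rakudo; Int arguments are taken as grapheme counts.
my $s = "Hello, World";

say $s.substr(0, 5);     # Hello
say $s.substr(7);        # World  (length defaults to the rest of the string)
say $s.substr(*-5);      # World  (start given relative to the end)
say $s.substr(7, 100);   # World  (length may run off the end)
```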

substr-rw
 multi sub substr-rw($s is rw, $from = 0, $chars = $s.chars - $from) is rw

A version of substr that returns a writable reference to a part of a string variable:

 $string ~~ /(barney)/;
 substr-rw($string, $0.from, $0.chars) = "fred";

trim
  multi method trim() is export;
  multi method trim-leading() is export;
  multi method trim-trailing() is export;

Returns a copy of the string, with leading and trailing whitespace removed.

The variants trim-leading and trim-trailing also return a copy of the string, but with whitespace removed only at their respective end of the string.
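
For illustration, assuming a recent Rakudo; the brackets only make the remaining whitespace visible.

```perl6
# Assumption: recent Rakudo.
my $s = "  hi  ";

say "[" ~ $s.trim ~ "]";            # [hi]
say "[" ~ $s.trim-leading ~ "]";    # [hi  ]
say "[" ~ $s.trim-trailing ~ "]";   # [  hi]
```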

unpack

match
 method match(Str $self: Regex $search --> Match );

See "Substitution" in S05

subst
 method subst(Str $self: Regex $search, Str $replacement --> Str);

trans
 method trans(Str $self: Str $key, Str $val);
 multi trans(List of Pair %data --> Any );

indent
     multi method indent ($str: $steps --> Str ) is export;

Returns a re-indented string wherein $steps number of spaces have been added to each line. If a line already begins with horizontal whitespace, the new spaces are added to the end of those.

If the whitespace at the beginning of the line consists of only chr(32) spaces, chr(32) spaces are added as indentation as well. If the whitespace at the beginning of the line consists of some other kind of horizontal whitespace, that kind of whitespace is added as indentation. If the whitespace at the beginning of the line consists of two or more different kinds of horizontal whitespace, again chr(32) spaces are used.

If $steps is negative, removes that many spaces instead. Should any line contain too few leading spaces, only those are removed and a warning is issued. At most one such warning is issued per .indent call.

If $steps is *, removes just enough indentation to make some line have zero indentation.

Empty lines don't participate in re-indenting at all. That is, a line with 0 characters will still have 0 characters after the call. It also will not cause a warning to be issued.

The method will assume hard tabs to be equivalent to ($?TABSTOP // 8) spaces, and will treat any other horizontal whitespace character as equivalent to one chr(32) space. If the indenting doesn't "add up evenly", one hard tab needs to be exploded into the equivalent number of spaces before the unindenting of that line.

Decisions on how to indent each line are based solely on characters on that line. Thus, an .indent call on a multiline string amounts to .lines».indent.join("\n"), modulo exotic line endings in the original string, and the proviso about empty lines.
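
The space-only cases can be sketched as follows, assuming a recent Rakudo. Tabs and mixed whitespace are not covered here.

```perl6
# Assumption: recent Rakudo; only chr(32) indentation is exercised.
my $text = "  a\n\n    b";

my $in  = $text.indent(2);            # every non-empty line gains two spaces
my $out = $text.indent(-2);           # every non-empty line loses two spaces
my $min = "    a\n  b".indent(*);     # outdent until some line has zero indentation

say $in  eq "    a\n\n      b";       # True (the empty line is untouched)
say $out eq "a\n\n  b";               # True
say $min eq "  a\nb";                 # True
```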

IO

Returns an IO::Path, using the string as the file path.
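
A short sketch, assuming a recent Rakudo; the path is a made-up example and need not exist on disk.

```perl6
# Assumption: recent Rakudo; "foo/bar.txt" is an illustrative path, not a real file.
my $path = "foo/bar.txt".IO;

say $path ~~ IO::Path;   # True
say $path.basename;      # bar.txt
```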

path

A deprecated form of IO.

succ

Increments the string to the next numeric or alphabetic value, and returns the resulting string. The autoincrement operator ++ uses succ to determine the new value.

The last portion of the string before the first period (which may be the entire string) is incremented, using <rangechar> to determine which characters are eligible to be incremented. See "Autoincrement precedence" in S03 for details.
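
For illustration, assuming a recent Rakudo; the file name is a made-up example.

```perl6
# Assumption: recent Rakudo; sample strings are illustrative.
say "a9".succ;           # b0   (the carry moves into the letter)
say "zz".succ;           # aaa  (the string grows on overflow)
say "img001.png".succ;   # img002.png  (only the part before the first period changes)

my $v = "a9";
$v++;                    # ++ relies on succ
say $v;                  # b0
```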

pred

Decrements the string to the previous numeric or alphabetic value, and returns the resulting string. The autodecrement operator -- uses pred to determine the new value.

When attempting to decrement a string, such as "a0", where the result would remove the leftmost characters, pred returns failure instead.

The last portion of the string before the first period (which may be the entire string) is decremented, using <rangechar> to determine which characters are eligible to be decremented. See "Autoincrement precedence" in S03 for details.
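
A sketch of pred, including the failure case, assuming a recent Rakudo in which the failure is an undefined Failure object:

```perl6
# Assumption: recent Rakudo; the out-of-range case yields an undefined Failure.
say "b0".pred;           # a9
say "img002.png".pred;   # img001.png

my $fail = "a0".pred;    # would have to drop the leftmost character
say $fail.defined;       # False
```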

Additions

Please post errors and feedback to perl6-language. If you are making a general laundry list, please separate messages by topic.
