About this chapter
This chapter describes Mops' string-handling classes. Strings are objects that contain variable-length sequences of text, with methods for deletion, insertion etc. Mops' powerful string handling facility provides an excellent base on which you can build various text-based utilities.
|Mops: Using Strings|
Mops strings are implemented as relocatable blocks of heap that can expand and contract as their contents change. A string object itself contains a handle to the heap block that contains the string's data. It also contains three other ivars which we will describe below.
Strings can be useful for a wide variety of programming needs. They can serve as file buffers, staging areas for text to be printed on the screen, dictionaries, or vehicles for parsing user input. You should consider using strings for any run of bytes whose length and/or contents are likely to change in the course of your program's execution. Strings are not restricted to ASCII text, although that will probably be their most common use. Note, however, that text constants can more efficiently be implemented as SCONs or string literals (see II.5 for more information).
Using strings is somewhat like using files, in that you must open the string before you use it and close it when you're through. This is done by sending a New: message to each string before you use it, to allocate the string's heap storage, and then sending a Release: message when you no longer need the string. Release: is actually inherited from String's superclass, Handle, and calls the Toolbox routine DisposeHandle.
There are two classes of strings in Mops. String supports basic string operations, such as Get:, Put:, Insert: and Add:. Class String+, a subclass of String, adds more methods, such as searching. Both classes are in the precompiled Mops.dic. They are split into two classes only because String+ has some code methods, which require the Assembler for compilation, whereas some string operations are needed earlier in the building of the full system, before the Assembler is available. For all practical purposes you can treat the two classes as a single class. This is especially true in PowerPC Mops, where a number of the methods in String+ have been moved to String because they were needed earlier, and String+ has been rewritten in high-level Mops.
Many of the String methods are built around the Toolbox Utilities routine Munger, which is a general-purpose string-processing primitive. You might read the IM Toolbox Utilities section on Munger to gain a deeper understanding of what characteristics it contributes to Mops string handling.
Strings have a current size, which is the same as the length of the relocatable block of heap containing the string's data. Strings also have two offsets into the string data, called POS and LIM. POS marks the ‘current’ position, and LIM the ‘current’ end. Most string operations operate on the substring delimited by POS and LIM, which we call the active part of the string, rather than on the whole string. We also keep the real size of the string in an ivar, so that we can get it quickly without a system call.
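To make the Pos and Lim bookkeeping concrete, here is a small Python model of the active part. It is only an illustrative sketch, not Mops code: the class name ActiveString and its Python method names are hypothetical, chosen to mirror the reset:, step: and <step: methods listed in the String method table.

```python
class ActiveString:
    """Toy model (not Mops code) of a string with Pos and Lim offsets."""

    def __init__(self, data=b""):
        self.data = bytearray(data)   # stands in for the heap block
        self.pos = 0                  # the 'current' position
        self.lim = len(self.data)     # the 'current' end

    def size(self):
        """Size of the whole string, regardless of Pos and Lim."""
        return len(self.data)

    def get(self):
        """Return the active part: the bytes from Pos up to Lim."""
        return bytes(self.data[self.pos:self.lim])

    def reset(self):
        """Make the whole string active (models reset:)."""
        self.pos, self.lim = 0, self.size()

    def step(self):
        """Pos moves to Lim, Lim moves to the end (models step:)."""
        self.pos, self.lim = self.lim, self.size()

    def back_step(self):
        """Lim moves to Pos, Pos moves to the start (models <step:)."""
        self.pos, self.lim = 0, self.pos


s = ActiveString(b"hello world")
s.lim = 5             # active part is now the first five bytes
first = s.get()
s.step()              # active part becomes everything after the old Lim
rest = s.get()
s.back_step()         # and back again
again = s.get()
```

The point of the model is that none of these operations touches the data itself. Only the two offsets move, which is why they are cheap.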
Communicating with other objects
While most of the method descriptions below should be self-explanatory, several are worth additional comment. One group of String+'s methods takes the address of another String or String+ object as one of its parameters, and accesses the active part of this second string.
String+ also has several methods that simplify its use as a file buffer. ReadN:, ReadRest:, ReadAll: and ReadLine?: all accept a File object as one of the parameters, and will request that the File perform a read into the string, setting the size of the string to the number of bytes actually read. Doing things this way is very convenient, especially as the file data is left in a String+ object, and is therefore subject to all of the various manipulations that String+ can perform.
Finally, String+'s Draw: method accepts a Rect object and a justification parameter, and draws the contents of the string as justified text within the box specified by the rectangle.
Translate tables allow very fast searching of strings for specified sets of characters. In effect we are separating the specification of what we are searching for from the actual search operation itself. This allows an uncluttered and extremely fast search operation (the scan:, <scan:, scax: and <scax: methods of class String+), and it also allows a very flexible (and easily extensible) choice of what to search for. The setup time for translate tables can generally be factored out of inner loops, or done at compile time, and is quite fast, anyway.
We first define a class, (TrTbl), which is needed to define the table mapping lower case letters to upper case. This table is then used by some of the methods in the TrTbl class proper. However, this is just an implementation convenience: these classes really should be thought of as one class, so we put all the methods together here.
|Superclass||(TrTbl), whose superclass is Object|
|Source file||StrUtilities zString+|
|tbl:||( -- addr )||Returns the address of TheTbl|
|clear:||( -- )||Clears all bytes of the table to zero|
|put:||( addr len -- )||Copies the bytes given by (addr len) into the table. If len is greater than 256, only the first 256 bytes are copied|
|selchars:||( addr len -- )||Selects each of the bytes given by (addr len). The table byte corresponding to each byte in the list will be set nonzero. The actual value used will be n, where this is the nth byte which has been selected since the last clear:. If two or more bytes in the list are the same (which means they select the same table position), the first will be used in determining the value of the table byte. The counting of n will nevertheless still continue for all the bytes in the passed-in list. Note that this rule only applies within one selchars: operation — if a character is selected by selchars: (or selchar: below) which has already been selected in a previous selection operation, and it is the nth character selected since the last clear:, the corresponding table byte will still be set to n even though it was already nonzero|
|selchar:||( c -- )||Selects the single character c. The value of the table byte is determined as in selchars:|
|selcharNC:||( c -- )||“Select char, no case”. Selects a character, and if it is a letter, enters the same value in the lower case and upper case positions of the table, so that case will in effect be ignored when the table is used|
|selRange:||( lo hi -- )||Selects all characters with values from lo to hi inclusive. The selected table bytes will all be set to 1 — when a range is selected, there isn't usually a need to distinguish the individual characters. Does nothing if hi < lo|
|invert:||( -- )||Reverses the current selection. All nonzero table bytes are cleared, and all zero bytes are set to -1. (There is no special significance in this value; it was just the simplest to do quickly, thanks to the SEQ machine instruction)|
|>uc:||( -- )||Copies the 26 bytes corresponding to A-Z into the a-z positions. Subsequently any translate operation using this table object will give identical results for upper and lower case letters. Note the direction of the copy — you need to first set up the UPPER case letter positions, then use >uc:|
|transc:||( c -- c' )||Translates the single character c using the table, and returns the corresponding byte c' from the table|
All other translate table operations are methods of class String+
Error messages - None
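The numbering rule described under selchars: is easy to misread, so here is a hedged Python model of it, together with a scan that uses the table in the way described for scan: in class String+. This is a sketch of the documented behavior only, not the Mops implementation. The names TranslateTable and scan are hypothetical, and the sketch stores plain integers, so it ignores the fact that a real table byte cannot hold a value above 255.

```python
class TranslateTable:
    """Toy model (not Mops code) of a 256-entry translate table."""

    def __init__(self):
        self.clear()

    def clear(self):
        self.tbl = [0] * 256
        self.count = 0          # bytes selected since the last clear

    def selchars(self, data):
        seen = set()            # bytes already set during THIS call
        for b in data:
            self.count += 1     # counting continues even for repeats
            if b not in seen:   # within one call, the first occurrence wins
                seen.add(b)
                self.tbl[b] = self.count

    def selchar(self, c):
        # A separate selection operation, so it may overwrite an old value.
        self.selchars(bytes([c]))

    def invert(self):
        # Nonzero entries are cleared; zero entries become -1, i.e. 255 as a byte.
        self.tbl = [0 if v else 255 for v in self.tbl]


def scan(data, pos, lim, table):
    """Search data[pos:lim] for the first byte with a nonzero table entry.

    Returns (value, new_lim). On failure the value is 0 and Lim is unchanged.
    """
    for i in range(pos, lim):
        value = table.tbl[data[i]]
        if value:
            return value, i
    return 0, lim


t = TranslateTable()
t.selchars(b",;,")        # ',' gets 1, ';' gets 2, the repeated ',' is counted as 3
t.selchar(ord(","))       # a later operation: ',' is overwritten with 4

delims = TranslateTable()
delims.selchars(b",;")
value, lim = scan(b"ab;cd,ef", 0, 8, delims)
```

Because the returned value identifies which selected character was hit, a caller can branch on it directly instead of re-examining the string.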
String defines a variable-length string object with basic access methods whose data exists as a relocatable block of heap. Size is limited only by available memory.
|Source file||String pString|
|Inherits:||Handle, Var, Longword, Object|
|handle:||( -- handle )||Returns the handle to the string — replaces get: in the superclass Handle, since we will be redefining get: here with a different meaning|
|pos:||( -- n )||Returns the value of Pos|
|>pos:||( n -- )||Stores n in Pos|
|lim:||( -- n )||Returns the value of Lim|
|>lim:||( n -- )||Stores n in Lim|
|len:||( -- n )||Returns the value of Lim - Pos, i.e. the length of the active part|
|>len:||( n -- )||Adds n and Pos, and stores the result in Lim|
|skip:||( n -- )||Adds n to Pos|
|more:||( n -- )||Adds n to Lim|
|start:||( -- )||Clears Pos, so that the active part now starts at the ‘real’ start of the string.|
|begin:||( -- )||Clears both Pos and Lim. Useful for setting up for an iterative operation on the string|
|end:||( -- )||Sets both Pos and Lim to the size (i.e. the end) of the string. Useful for setting up for an iterative operation which has to go backwards through the string|
|nolim:||( -- )||Sets Lim to the end of the string|
|reset:||( -- )||Clears Pos, and sets Lim to the end of the string. The active part will now be the whole string|
|step:||( -- )||Steps forward in the string, setting Pos to Lim and then setting Lim to the end of the string|
|<step:||( -- )||Steps backward in the string, setting Lim to Pos and then clearing Pos|
|new:||( -- )||Creates a heap block for the string's data, and sets the handle. The initial size is zero. new: must be done before the string can be used|
|?new:||( -- )||Ensures a heap block is allocated, by calling new: if necessary (indicated by the handle being nilH). If a block is already allocated, does nothing|
|size:||( -- n )||Returns the size of the (whole) string|
|setSize:||( n -- )||Sets the size of the (whole) string to n, then does a reset:|
|clear:||( -- )||Ensures a heap block is allocated, calling new: if necessary, then sets its size to zero|
|get:||( -- addr len )||Returns the address and length of the active part of the string|
|all:||( -- addr len )||Returns the address and length of the entire string (not just the active part)|
|1st:||( -- c )||Returns the character at Pos|
|^1st:||( -- addr )||Returns the address of the character at Pos|
|uc:||( -- addr len )||Converts the active part to upper case and does a get:|
|put:||( addr len -- )||Ensures a heap block is allocated, calling new: if necessary, then replaces its contents with the passed-in string, and does a reset: as well|
|->:||( str -- )||Replaces the whole of this string (as in put:) with the active part of str, which may be a String or String+ (we use early binding, and assume the class)|
|insert:||( addr len -- )||Ensures a heap block is allocated, calling new: if necessary, then inserts the string given by (addr len) at Pos. Increments both Pos and Lim by len (thus the bytes at the Pos and Lim position will be the same as before, and the byte immediately preceding the Pos position will be the last of the inserted bytes)|
|$insert:||( str -- )||Inserts the active part of str, as for insert:|
|add:||( addr len -- )||Inserts (addr len) at the end of this string. Pos and Lim are then set to the (updated) end position|
|$add:||( str -- )||Inserts the active part of str at the end of this string|
|+:||( c -- )||Appends the character c to the end of the string, and sets Pos and Lim to the (updated) end position|
|fill:||( c -- )||Overwrites each character in the active part of the string with the character c|
|search:||( addr len -- b )||Searches the active part of this string, starting from the left (i.e. the Pos position), for the string (addr len). If a match is found, Lim is set to indicate the first of the matching characters and true is returned. If no match is found, Lim is unchanged and false is returned. Note 1: an improved version with case control is provided in String+. Note 2: We use Lim rather than Pos, since it often happens after a search that some operation needs to be done on the part of the string preceding the matching substring. If this isn't needed, step: is convenient for updating Pos to the matching substring position and preparing for another search|
|chsearch:||( c -- b )||Searches the active part of this string for the character c. If it is found, Lim is set there and true is returned. If it isn't found, Lim is unchanged and false is returned|
|copyto:||( ^string-obj -- )||Overrides copyto: in class Object. The only change is that we set a flag in this object, marking it as a copy. This will mean that any future operation which would change the size of this object will be blocked with an error message. You will be able to alter Pos and Lim freely, but not insert or delete. It is frequently useful to have several copies of the same string object, in order to manipulate several active parts at once. But I have found that it's important to keep one as the ‘original’ object, and only insert/delete on this one. Failure to do this led to crashes|
|mark_original:||( -- )||Overrides the above check, by clearing the flag, so that this string becomes ‘original’. Only use this method if you're quite sure what you're doing. The idea of the long name is that you won't type it accidentally!|
|print:||( -- )||Displays the active part of the string, assuming it to be ASCII characters|
|dump:||( -- )||Gives a dump of the string, displaying various useful quantities such as Pos and Lim, and displaying the contents of the string as ASCII characters and in hex|
|rd:||( -- )||“Reset and dump”. Does reset:, then dump:. Short to type when debugging!|
| The stream methods read: and write: are meant to look the same for both strings and files (and for anything else we might think of later). By late binding to an object that supports these, we don't have to know or care exactly what it is. The object gives us bytes or accepts bytes, and tells us whether it was successful, and that's all we have to worry about.
For read:, we only use the active part of the string. We update POS by the number of bytes transferred. If we transfer the number asked for, we return a ‘no error’ code of zero, otherwise -1. (We don't use true and false so as to behave the same way as files). write: is basically the same as add:. There's no way this can fail unless we run out of memory, so we always return zero
|read:||( addr len -- code )||Copies the active part of the string to the memory area given by ( addr len ). Updates Pos by the number of bytes transferred. Returns zero if all the active part is transferred, -1 if not (i.e. the length of the active part was greater than len)|
|write:||( addr len -- 0 )||Similar to the add: method (see above). Always returns zero, indicating success|
|send:||( ^obj -- )||Serializes the string, by first sending the ivars, then the string itself|
|bring:||( ^obj -- )||Reconstitutes the string as serialized by send:|
|“String pointer(s) out of bounds”|
|Pos was found to be greater than Lim, or either was negative or greater than the size of the string. Pos and Lim are also displayed when this message is given. We check for this error condition whenever we access the actual characters of the string. Operations such as >pos: don't perform the check — this is for speed, and also because when we are doing manipulations on Pos and Lim we don't want to put any restriction on intermediate values.|
|“Can't do that on a string copy”|
|You attempted to insert, delete, or change the size of a string object which was flagged as a ‘copy’. See above under copyto:.|
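The interplay between search: and step: described above (Lim moves to the match, then step: moves Pos there) lends itself to a simple loop that picks off the text before each match. The following Python sketch models that loop. It is an illustration under stated assumptions rather than Mops code: the class Str and the function split_on are hypothetical names, and advancing past the match with a skip of the separator length is our choice for this example.

```python
class Str:
    """Toy model (not Mops code) of the search:, step: and skip: behavior."""

    def __init__(self, data):
        self.data = bytes(data)
        self.pos = 0
        self.lim = len(self.data)

    def get(self):
        return self.data[self.pos:self.lim]

    def search(self, pattern):
        """Search the active part; on a match, set Lim to the match start."""
        i = self.data.find(pattern, self.pos, self.lim)
        if i < 0:
            return False          # Lim is left unchanged
        self.lim = i
        return True

    def step(self):
        self.pos, self.lim = self.lim, len(self.data)

    def skip(self, n):
        self.pos += n


def split_on(text, sep):
    """Collect the pieces of text that lie between occurrences of sep."""
    if not sep:
        raise ValueError("separator must not be empty")
    s = Str(text)
    parts = []
    while s.search(sep):
        parts.append(s.get())     # the text preceding the match
        s.step()                  # Pos now sits on the match
        s.skip(len(sep))          # move past it before searching again
    parts.append(s.get())         # whatever follows the last match
    return parts


pieces = split_on(b"a, b, c", b", ")
```

Without the skip, the next search would find the same match again at Pos, so some way of moving past the match is needed before repeating the search.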
String+ adds many useful methods to String. Note that in PowerMops, some of the methods listed here are actually defined in class String, since we needed them at that stage for the PowerPC code generator, but this shouldn't affect your source code at all.
|Source file||String+ zString+|
|Instance variables||None (see String)|
|Inherits:||String, Handle, Var, Longword, Object|
|swapPos:||( n -- n' )||Swaps Pos with the top of the stack|
|save:||( -- handle pos lim )||Saves the current string parameters|
|restore:||( handle pos lim -- )||Restores the string parameters. Must match a save:|
|2nd:||( -- c )||Returns the second char in the active part, or 0 if the active part's length is 1. Gives an error if the active part is empty|
|last:||( -- c )||Returns the last char in the active part. Gives an error if the active part is empty|
|compare:||( addr len -- n )||Compares the string ( addr len ) with the active part of this string. Comparison is by CMPSTR, with the ( addr len ) string as the first operand. Case is significant if CASE? is set to true. Returns: -1 if the first string is low, 0 if strings are equal, 1 if the first string is high. We assume the lengths are both less than 64K|
|?:||( addr len -- n )||As for compare:, except that if the ( addr len ) string is shorter than the active part of this string, only the first len chars in the active part are used. Note that this only makes a difference if an ‘equal’ result is obtained|
|=?:||( addr len -- b )||Compares as for ?:, but only tests for equal/not equal. Returns true on equal|
|ch=?:||( c -- b )||Compares the given single character against the character at Pos. Returns true on equal. If the active part of the string is empty, always returns false|
|search:||( addr len -- b )||Similar to search: in String, but has full case control, according to the setting of the value Case?. This also applies to all the following searching operations|
|<search:||( addr len -- b )||Backwards search. Searches the active part of this string, starting from the right (i.e. the Lim position), for the string (addr len). If a match is found, Pos is set to indicate the first (leftmost) of the matching characters and true is returned. If no match is found, Pos is unchanged and false is returned|
|sch&skip:||( addr len -- b )||Searches for the string ( addr len ) and if found, sets Pos to the character following the found substring. Leaves Lim unchanged|
|chsearch:||( c -- b )||Searches for the single character c. If found, returns true and leaves Lim pointing there. If not found returns false and leaves Lim unchanged|
|<chsearch:||( c -- b )||Backward search for the character c. If found, sets Pos|
|chsch&skip:||( c -- b )||What you'd expect. Searches as for chsearch:, and if the char is found, Pos is set pointing to the next character. Lim is unchanged|
|chskip?:||( c -- b )||Searches for the first character NOT equal to c. This method has a couple of differences to the other searching methods, dictated by what we normally need it for. If it succeeds, Pos (not Lim) is set to that position, and it is always case sensitive, regardless of CASE?|
|chskip:||( c -- )||As for chskip?:, but returns no boolean result|
|scan:||( trtbl -- n )||Searches for a single character, using a translate table. ‘Success’ is defined as a character which yields a non-zero value from the table. The return result is this non-zero value, or zero if none was found. On success, as usual, Lim is set to point to the found character|
|<scan:||( trtbl -- n )||Backward scan. If successful, Pos points to the character matched|
|scax:||( trtbl -- n )||“Scan excluding”. Same as scan:, but ‘success’ is defined as a character which yields a zero value from the table. The return result is the last byte fetched from the table, which will be zero on success, or otherwise it will be whatever table byte corresponds to the last char in the active part of the string — something nonzero, in any case|
|<scax:||( trtbl -- n )||Backward scax. If successful, Pos points to the character matched|
|translate:||( trtbl -- )||Translates the whole active part of the string, using the table. Replaces each byte in the string with the looked-up value from the table|
|trans1st:||( trtbl -- n )||Translates the first char in the active part of the string, and returns the looked-up value. The char in the string isn't changed. Returns zero if the active part is empty|
|>uc:||( -- )||Converts any letters in the active part to upper case. This is done by UCtbl translate: self. This is faster than UPPER, and not limited to 64K|
|ch>uc:||( -- )||Converts the first char of the active part to upper case|
insertion, deletion, replacement
|chinsert:||( c -- )||Inserts the char c at Pos. Pos and Lim are incremented by 1|
|ovwr:||( addr len -- )||Overwrites the active part of this string with the string ( addr len ). Copying stops at the end of the active part, or when len characters have been transferred. Pos is incremented by the number of chars transferred. This operation is faster than normal replacement, as the length of this string cannot change, so we don't need to call Munger|
|chovwr:||( c -- )||Overwrites the char at Pos with c|
|$ovwr:||( str -- )||Overwrites the active part of this string with the active part of str|
|repl:||( addr len -- )||Replaces the active part of this string with the string (addr len). Pos and Lim are both set pointing just past the newly inserted characters|
|$repl:||( str -- )||Replaces the active part of this string with the active part of str|
|sch&repl:||( addr1 len1 addr2 len2 -- b )||Searches for the string (addr1 len1) in the active part of this string, using search:. If a match is found, the matching substring is replaced by the string (addr2 len2), Pos and Lim are both set pointing just past the newly inserted characters, and true is returned. If no match is found, Pos and Lim are unchanged and false is returned|
|replAll:||( addr1 len1 addr2 len2 -- )||Replaces all occurrences of (addr1 len1) by (addr2 len2) in the WHOLE of this string (i.e. ignoring Pos and Lim). After the operation, a reset: is done|
|delete:||( -- )||Deletes the active part. Lim is then set equal to Pos|
|deleteN:||( n -- )||From Pos, deletes n characters or up to Lim, whichever comes first. Lim is reduced by the number of characters deleted|
|line>:||( -- )||Sets Lim to the end of the current line (i.e. starting from Pos, the next Return character or the end of the string). Pos is unchanged|
|nextline?:||( -- b )||Sets Pos and Lim to delimit the next line. This means Pos will point to the char after the Return character (or to the first char of the string), and Lim will point to the next Return, or to one past the end of the string. If Lim initially does not point to a Return character, the ‘next’ line will actually be the rest of the current one, starting from where Lim pointed. This behavior means that if Pos and Lim are initially zero, calling nextline?: will actually yield the first line. This can be useful. The returned boolean is true if we actually get another line, and false if we don't, that is, if Lim was initially at the end of the string. Note that if the string ends with a Return character, and Lim points to this character when nextline?: is called, this is not the same as Lim pointing to one past the end of the string, which is its real “end of string” value. Thus nextline?: will return true with an empty line. The next call will return false. (This behavior is correct. If a string ends with a Return, it ends with an empty line.)|
|<nextline?:||( -- b )||The backwards equivalent. Sets Pos to the previous return character and Lim to the previous Pos|
|addline:||( addr len -- )||Adds the (addr len) string to this string as for add:. Also adds a Return at the end, if (addr len) doesn't already end with a Return|
|$addline:||( str -- )||Adds the active part of str to this string, as for addline:|
|readN:||( file n -- )||Reads n bytes using the passed-in file object. The file must already be open. The bytes read completely replace the WHOLE string (that is, Pos and Lim are ignored). A reset: is done at the end|
|readLine?:||( file n -- b )||Reads the next line up to a max of n chars into this string (as for readN:). Returns false if end of file. Reads a final Return character (if any) from the file, but doesn't include it in the bytes transferred to the string|
|readRest:||( file -- )||Reads all the rest of the file from its current position into the string|
|readAll:||( file -- )||Reads all the file into the string|
|readTop:||( -- )||Reads all of Topfile into the string, then closes and drops Topfile (see class FileList). Topfile must already be open|
|$write:||( file -- )||Writes the active part to the file|
|send:||( file -- )||Writes the whole string object to the file. See under class File for a full description of the standard methods send: and bring:, which can be implemented by any classes which need them|
|bring:||( file -- )||Reads back the string object from the file, assuming that it was written by send:|
|draw:||( theRect justification -- )||Draws the active part in rect theRect, using the Toolbox TextBox routine|
|printAll:||( -- )||Displays the whole string via TYPE. Handles any embedded Return characters by starting a new line for each one|
Error messages - None
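The rules given for nextline?: can be checked against a short Python model. This is a sketch of the behavior as documented above, not the Mops implementation. The function names nextline and lines are hypothetical, and the Return character is assumed to be byte 13 (carriage return).

```python
CR = 13  # assumed value of the Return character


def nextline(data, pos, lim):
    """Model of nextline?: returning (found, new_pos, new_lim)."""
    size = len(data)
    if lim >= size:
        return False, pos, lim    # Lim was already at the end: no more lines
    if data[lim] == CR:
        pos = lim + 1             # start just after the Return at Lim
    else:
        pos = lim                 # rest of the current line
    nxt = data.find(bytes([CR]), pos)
    lim = size if nxt < 0 else nxt
    return True, pos, lim


def lines(data):
    """Walk the whole string, starting with Pos and Lim both zero."""
    out = []
    pos = lim = 0
    while True:
        found, pos, lim = nextline(data, pos, lim)
        if not found:
            return out
        out.append(data[pos:lim])


with_final_return = lines(b"ab\rcd\r")
without_final_return = lines(b"ab\rcd")
```

In this model a string that ends with a Return yields one extra, empty line before the false result, which matches the note in the nextline?: description.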