Data-types that comprise smaller pieces are called compound data-types. Depending on what we are doing, we may want to treat a compound data type as a single thing, or we may want to access its parts. This ambiguity is useful.
The sequence is the ultimate compound data-type.
A line of text is commonly called a string since it is like a string of characters. While conventional languages have a special data type for strings, Euphoria can use the sequence to represent strings. This means that techniques that apply to strings can also be used with any sequence; what you learn about sequences in turn applies to strings.
The bracket operator selects a single character from a string.
    sequence fruit = "banana"
    atom letter = fruit[1]
    puts(1, letter) -- b

The expression fruit[1] selects character number 1 from fruit. The variable letter refers to the result. When we output letter, we get: b.
The expression in brackets is called an index. An index specifies a member of an ordered set, in this case the set of characters in the string. The index indicates which one you want, hence the name.
An index is normally an integer value. If you use a fractional value, it is truncated to an integer and that integer is used as the index:

    sequence fruit = "banana"
    puts(1, fruit[1.5] ) -- b

The value 1.5 truncates to the integer 1, so the first letter is selected. If you try 0.5 as an index value, you get an error message: 0.5 truncates to 0, which is an illegal index value.
The length() function returns the number of characters in a string:
    sequence fruit = "banana"
    ? length(fruit) -- 6

To get the last letter of a string, you could use the length() function like this:

    sequence fruit = "banana"
    integer Length = length(fruit)
    atom last = fruit[Length]
    puts(1, last ) -- a

Written more compactly:

    sequence fruit = "banana"
    puts(1, fruit[ length(fruit) ] ) -- a

You may also use ( $ ) to index the last item in a sequence.

    sequence fruit = "banana"
    puts(1, fruit[$] ) -- a

Index values must not be 0, a negative value, or a value greater than the length of the sequence. Such values all produce an error message, for strings and for any other sequence.
Many computations involve processing a string one character at a time. Often they start at the beginning, select each character in turn, do something to it, and continue until the end. This pattern of processing is called a traversal. One way to encode a traversal is with a while loop:
    integer index = 1
    while index <= length(fruit) do
        letter = fruit[index]
        puts(1, letter )
        puts(1, '\n' )
        index = index + 1
    end while

This loop traverses the string and displays each letter on a line by itself. The loop condition is index <= length(fruit), so when index exceeds the length of the string, the condition is false, and the body of the loop is not executed. The last character accessed is the one with index length(fruit), which is the last character in the string.
A for loop expresses the same traversal more compactly:

    for index = 1 to length( fruit ) do
        puts(1, fruit[index] )
    end for

Each time through the loop, the next index value is assigned to the variable index, and fruit[index] selects the corresponding character. The loop continues until no characters are left.
The following example uses concatenation ( & ) and a for loop to generate a series of names:

    sequence prefixes = "JKLMNOPQ"
    sequence suffix = "ack"
    for letter = 1 to length( prefixes ) do
        puts(1, prefixes[letter] & suffix )
        puts(1, '\n' )
    end for

The output of this program is:

    Jack
    Kack
    Lack
    Mack
    Nack
    Oack
    Pack
    Qack

Of course, that's not quite right because "Ouack" and "Quack" are misspelled.
A segment of a string is called a slice. Selecting a slice is similar to selecting a character:
    sequence s
    s = "Peter, Paul, and Mary"
    puts(1, s[1 .. 5] )
    puts(1, '\n' )
    puts(1, s[8 .. 11] )
    puts(1, '\n' )
    puts(1, s[18 .. 21] )
    -- Peter
    -- Paul
    -- Mary

The operator [n .. m] returns the part of the string from the "n-th" character to the "m-th" character inclusively. You always need a start and end index.
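The end index may be ( $ ), which stands for the last position in the sequence:

    sequence fruit = "banana"
    puts(1, fruit[ 3 .. $ ] ) -- nana

In this example:

    sequence fruit = "banana"
    puts(1, fruit[3 .. 2] )

the end index is one less than the start index, so you have asked for a sequence of length zero, and "nothing" is output.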
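As a small sketch of how slices fit together, the snippet below splits a string at one position and joins the two pieces again. The names cut, front, and rest are illustrative only and do not come from the text above.

```euphoria
sequence fruit = "banana"
integer cut = 3                       -- hypothetical split position

sequence front = fruit[1 .. cut]      -- characters 1 through 3
sequence rest = fruit[cut + 1 .. $]   -- characters 4 through the end

puts(1, front & '\n' )                -- ban
puts(1, rest & '\n' )                 -- ana
? length( fruit[3 .. 2] )             -- 0, an empty slice
? equal( front & rest, fruit )        -- 1, the pieces rebuild the original
```

Because both indices are inclusive, the second slice starts at cut + 1 so that no character is repeated.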
To compare strings (sequences) you must use either the equal() or compare() functions. Thus, to see if two strings are equal:
    if equal( word, "banana" ) then
        puts(1, "Yes, we have no bananas!" )
    end if

The compare() function is useful for putting words into alphabetical order:

    if compare( word, "banana" ) < 0 then
        puts(1, "Your word, " & word & ", comes before banana." )
    elsif compare( word, "banana" ) > 0 then
        puts(1, "Your word, " & word & ", comes after banana." )
    else
        puts(1, "Yes, we have no bananas!" )
    end if

You should be aware, though, that computer alphabets are not ordered the way you would expect.
(Footnote: The ASCII chart gives the character order that Euphoria, like most programming languages, uses for these comparisons.)
All the uppercase letters come before all the lowercase letters. As a result:
Your word, Zebra, comes before banana.
A common way to address this problem is to convert strings to a standard format, such as all lowercase, before performing the comparison. Use the lower() function to do this (or upper(), if you standardize on uppercase). A more difficult problem is making the program realize that zebras are not fruit.
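A minimal sketch of that idea follows, assuming the lower() routine from std/text.e in the Euphoria 4 standard library. The comparison is made on a lowercase copy, while the original word is still used for the message.

```euphoria
include std/text.e   -- provides lower() and upper()

sequence word = "Zebra"

-- Compare a lowercase copy so that 'Z' and 'z' are treated alike.
if compare( lower(word), "banana" ) < 0 then
    puts(1, "Your word, " & word & ", comes before banana.\n" )
elsif compare( lower(word), "banana" ) > 0 then
    puts(1, "Your word, " & word & ", comes after banana.\n" )
else
    puts(1, "Yes, we have no bananas!\n" )
end if
-- Your word, Zebra, comes after banana.
```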
Comparisons with ( = ) are made element by element. That is why the ( = ) operator may not do the comparison you expect.

    sequence w1 = "zebras"
    sequence w2 = "banana"
    ? w1 = w2
    -- { 0, 0, 0, 0, 0, 0 }

Yes, a comparison was made, but it is not a simple false or true result. If the lengths of the two sequences are not the same, you get an error message:

    sequence w1 = "zebra"    -- no 's'
    sequence w2 = "bananas"  -- extra 's'
    ? w1 = w2
    -- error
    -- sequence lengths are not the same (5 != 7)

In general you should use only atom values in simple comparisons:

    sequence w1 = "zebras"
    sequence w2 = "banana"
    if w1 = w2 then
        puts(1, "they are the same" )
    end if
    -- error
    -- true/false condition must be an ATOM

This error message is a reminder that the ( = ) operator does not produce a simple false or true result when used to compare sequences. (This is a common mistake for beginning Euphoria users.)
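For contrast, here is a short sketch of the same two words handled with equal() and compare(). Both routines return a single atom, so they work in an if condition and do not object to sequences of different lengths.

```euphoria
sequence w1 = "zebra"
sequence w2 = "bananas"

? equal( w1, w2 )     -- 0, the strings differ
? compare( w1, w2 )   -- 1, "zebra" sorts after "bananas"
? compare( w2, w1 )   -- -1, "bananas" sorts before "zebra"

if not equal( w1, w2 ) then
    puts(1, "they are different\n" )
end if
```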
Mutable means that you can change the value of any part of your string variable.
Euphoria lets you use the ( [ ] ) operator on the left side of an assignment, with the intention of changing a character in a string. For example:
    sequence greeting = "Hello, world!"
    greeting[1] = 'J'
    puts(1, greeting ) -- Jello, world!
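Assignment to an element combines naturally with a traversal. The sketch below, which is only an illustration, walks through a string and overwrites every 'a' in place.

```euphoria
sequence fruit = "banana"

-- Visit every position and replace each 'a' with 'o'.
for i = 1 to length( fruit ) do
    if fruit[i] = 'a' then
        fruit[i] = 'o'
    end if
end for

puts(1, fruit & '\n' ) -- bonono
```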
What does the following function do?
    function find( integer ch, sequence str )
        integer index = 1
        while index <= length( str ) do
            if str[index] = ch then
                return index
            end if
            index = index + 1
        end while
        return 0
    end function

    ? find( 'n', "banana" ) -- 3

In a sense, find() is the opposite of the ( [ ] ) operator. Instead of taking an index and extracting the corresponding character, it takes a character and finds the index where that character appears. If the character is not found, the function returns 0.
This is the first example we have seen of a return statement inside a loop. If str[index] = ch, the function returns immediately, breaking out of the loop prematurely.
If the character doesn't appear in the string, then the program exits the loop normally and returns 0.
This pattern of computation is sometimes called a "eureka" traversal because as soon as we find what we are looking for, we can cry "Eureka!" and stop looking.
We commonly call this traversal a search.
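When every match is wanted rather than just the first, the traversal has to run to the end instead of returning early. The following sketch shows one way to do that; find_all_positions is a hypothetical name chosen for this example, not a routine from the text.

```euphoria
-- Collect the index of every element of s that equals item.
function find_all_positions( object item, sequence s )
    sequence positions = {}
    for i = 1 to length( s ) do
        if equal( s[i], item ) then
            positions = append( positions, i )
        end if
    end for
    return positions
end function

? find_all_positions( 'a', "banana" ) -- {2,4,6}
? find_all_positions( 'x', "banana" ) -- {}
```

Using equal() for the test keeps the function usable on sequences whose elements are themselves sequences.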
The following program counts the number of times the letter 'a' appears in a string:
    sequence fruit
    fruit = "banana"
    integer count
    count = 0
    for char = 1 to length( fruit ) do
        if fruit[char] = 'a' then
            count = count + 1
        end if
    end for
    ? count -- 3

This program demonstrates another pattern of computation called a counter. The variable count is initialized to 0 and then incremented each time an 'a' is found.
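The counter pattern can be wrapped in a function so that it works for any character and any string. This is a sketch only; count_char is a hypothetical name introduced for the example.

```euphoria
-- Count how many elements of str are equal to ch.
function count_char( integer ch, sequence str )
    integer count = 0
    for i = 1 to length( str ) do
        if equal( str[i], ch ) then
            count += 1
        end if
    end for
    return count
end function

? count_char( 'a', "banana" ) -- 3
? count_char( 'n', "banana" ) -- 2
? count_char( 'z', "banana" ) -- 0
```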
Euphoria has several related functions, including a find() routine, as built-in routines. See the library reference under "searching" and "matching." If a routine seems like it should be universally useful, the odds are that someone has already written that routine for you. It may be in the Euphoria Library (look in the documentation), or it may be found in the Euphoria Archives (search the RDS website).
You have access to many useful routines for the manipulation of sequences. These are well suited to string manipulations. Some commonly used routines are built-in; they are part of the Euphoria interpreter itself. A few must be brought in with an include statement before you may use them. An include statement makes the contents of a file, its code and routines, available in your main program. Check the Euphoria documentation before using one of these routines.
Two related functions are built-in: find() and match(). Even though this chapter is on strings, the good news is that these routines work on any sequences--the string being just one case of the sequence-type.
It helps to remember that a string is in reality a sequence of individual characters:
    sequence string
    string = "banana"
    ? string
    -- { 98, 97, 110, 97, 110, 97 }
The find() function is designed to find an object in a sequence (a needle in a haystack).

    ? find( 'a', "banana" )    -- 2
    -- same as: ? find( 97, {98,97,110,97,110,97} )

    ? find( "a", "banana" )    -- 0
    -- same as: ? find( {97}, {98,97,110,97,110,97} )

    ? find( "na", "banana" )   -- 0
    -- same as: ? find( {110,97}, {98,97,110,97,110,97} )

The match() function is different; it is used to find a sequence as a slice of another sequence.

    ? match( 'a', "banana" )   -- error (97 is an atom, not a sequence)
    -- same as: ? match( 97, {98,97,110,97,110,97} )

    ? match( "a", "banana" )   -- 2
    -- same as: ? match( {97}, {98,97,110,97,110,97} )

    ? match( "na", "banana" )  -- 3
    -- same as: ? match( {110,97}, {98,97,110,97,110,97} )
The find() routine, which looks for an object in a sequence, also works on nested sequences:

    ? find( "pear", { "banana", "apple", "pear" } ) -- 3

The match() routine, which looks for a slice of a sequence, also works on nested sequences:

    ? match( { "apple", "pear" }, { "banana", "apple", "pear" } ) -- 2

When learning Euphoria, concentrate on how things work the same, instead of looking for exceptions.
So you have written a program with your personal find() routine and then decide to use the built-in
The built-in find() does the same thing as the function we wrote. Note the order of its arguments: the item to look for comes first, and the sequence to search comes second.

    sequence fruit = "banana"
    integer index
    index = find( 'a', fruit )
    ? index -- 2

If you keep a routine of your own alongside the built-in, give it a different name so that the two cannot be confused.

Actually, the built-in routines are more general than our version. First, match() can find substrings, not just characters:

    ? match( "na", "banana" ) -- 3

Also, in Euphoria 4 both find() and match() take an optional third argument that specifies the index the search should start at:

    ? match( "na", "banana", 4 ) -- 5

To restrict a search to a range of indices, search a slice instead:

    sequence name = "bob"
    ? find( 'b', name[2 .. 2] ) -- 0

In this example, the search fails because the letter b does not appear in the slice from index 2 to index 2.
It is often helpful to examine a character and distinguish between upper and lowercase, or distinguish between characters and digits.
One way to do this is to first define some string constants:
    constant lowercase = "abcdefghijklmnopqrstuvwxyz"
    constant uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    constant digits = "1234567890"

We can use these constants and find() to classify characters. For example, if find( ch, lowercase ) returns a value other than 0, then ch must be lowercase:
    function isLower( integer ch )
        return find( ch, lowercase )
    end function

    ? isLower( 'A' ) -- 0 (false)
    ? isLower( 'r' ) -- 18 (true: any non-zero value counts as true)

Not surprisingly, something as useful as isLower() is already available in the Euphoria library. Euphoria has predefined utility data-types for many commonly used sets of characters.

    include std/types.e

    ? t_lower( 'A' ) -- 0 (false)
    ? t_lower( 'r' ) -- 1 (true)

As yet another alternative, we can use the comparison operators:
    function isLower2( atom ch )
        -- return ch itself (non-zero, so true) when it lies in 'a' .. 'z'
        if 'a' <= ch and ch <= 'z' then
            return ch
        else
            return 0
        end if
    end function

    ? isLower2( 'A' ) -- 0
    ? isLower2( 'r' ) -- 114

If ch is between 'a' and 'z', it must be a lowercase letter.
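Putting the three constants together, a character can be sorted into one of several classes with a chain of find() calls. The sketch below is one possible arrangement; classify is a hypothetical name, and the constants are repeated so the example stands on its own.

```euphoria
constant lowercase = "abcdefghijklmnopqrstuvwxyz"
constant uppercase = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
constant digits = "1234567890"

-- Return a short description of the class that ch belongs to.
function classify( integer ch )
    if find( ch, lowercase ) then
        return "lowercase"
    elsif find( ch, uppercase ) then
        return "uppercase"
    elsif find( ch, digits ) then
        return "digit"
    else
        return "other"
    end if
end function

puts(1, classify( 'r' ) & '\n' ) -- lowercase
puts(1, classify( 'A' ) & '\n' ) -- uppercase
puts(1, classify( '7' ) & '\n' ) -- digit
puts(1, classify( '!' ) & '\n' ) -- other
```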
Another constant may be useful:
constant whitespace = { ' ', '\t' , '\n' }
Whitespace characters move the cursor without printing anything. They create the "white space" between visible characters (at least on white paper). The sequence whitespace contains the most common whitespace characters: space, tab ( '\t' ), and newline ( '\n' ).
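One use for such a constant is stripping the whitespace at the front of a string. The following sketch combines a traversal, find(), and a slice; skip_leading_whitespace is a hypothetical name introduced for this example.

```euphoria
constant whitespace = { ' ', '\t', '\n' }

-- Return str without any whitespace characters at its start.
function skip_leading_whitespace( sequence str )
    integer first = 1
    -- Advance past every leading character found in whitespace.
    while first <= length( str ) and find( str[first], whitespace ) do
        first += 1
    end while
    return str[first .. $]
end function

puts(1, skip_leading_whitespace( " \t banana" ) & '\n' ) -- banana
? length( skip_leading_whitespace( "   " ) )             -- 0
```

The length test comes first in the loop condition, so str[first] is never evaluated once first has moved past the end of the string.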
Examine the library routines for a wealth of sequence routines that can be applied to strings. In addition, there are text utilities available in the Euphoria archives.