A value is one of the fundamental things--like a number or some text--that you want your program to manipulate. The values we have seen so far are 2 (the result when we added 1 + 1), and the sentence: "Hello, World!".
In Euphoria terms, the object is the fundamental means by which we achieve the goal of manipulating values.
Each object has a specific value. We can make some casual observations about these values. They are distinctive: it is clear that "numbers" and "text" look different. You (and sometimes the computer) can distinguish between them using clues such as: numbers are composed of the digits between 0 and 9, while text is enclosed between quotation ( " ) marks. We also use these values differently: numbers are used in calculations, while text is arranged into literary compositions. The conventional view is that these values represent two different data-types. A data-type represents a unique set of values, and a corresponding set of operations on those values.
Conventional languages are designed with many data-types. Since numbers and text "look" different, each gets a data-type. To make things worse: small numbers and big numbers get separate data-types, numbers with and without decimal points get separate data-types, characters and strings get separate data-types... This gets complicated very quickly.
The Euphoria language is based on a critical observation. First, computers only work with numbers. Therefore all values (numbers, text, music, movies, ...) must exist as numbers. Next, comes the realization that you either have one number, or a collection of numbers. This now gives us only two data-types to consider: atom and sequence.
The atom is the data-type that represents a single numeric value. Atomic numbers (like physical atoms) can not be decomposed to smaller components without a loss of their "atomic" identity.
Examples of atoms:
2 4.1414 0.000001 119 'w'
The sequence is the data-type that represents a collection of numeric values. There are no restrictions on what a sequence may contain. A sequence may be composed of any mixture of atoms, sequences, and sequences containing sequences. If you have a particular arrangement of data that interests you, then the sequence can be used to represent it.
Examples of sequences:
{ 'a' } { 2, 'b' , { 3, 'c' } } { "HELLO" }
Finally, when a value may be either an atom or a sequence, you use the object. The object is the universal data-type: it may hold either an atom or a sequence. This means that you could write a program, of any size and complexity, using only one data-type.
You may have noticed some items that do not "look" numeric included in the examples. A character represents a letter, digit, or other symbol used in printing text. Each character has a numeric value determined by a standard ASCII code; for example 'w' is equivalent to 119. Therefore each character is indeed a number--hence an atom. A string is a sequence of characters. Ultimately, a character string is just a sequence of numbers. Each character string is indeed numeric--hence a sequence.
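The claim that characters and strings are really numbers can be checked directly. The following is a small sketch rather than a definitive demonstration; the word "cat" is just an illustrative choice, and equal() is the built-in routine for comparing two objects.

```euphoria
? 'w'                              -- 119
? 'w' = 119                        -- 1 (true): the character is the number
? equal( "cat", { 99, 97, 116 } )  -- 1 (true): the string is a sequence of numbers
puts(1, { 99, 97, 116 } )          -- cat
puts(1, '\n' )
```

The last two lines show that a sequence of character codes is displayed as text when it is handed to puts().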
The Euphoria data-type system is a tremendous simplification over the approaches used conventionally. With this simplification comes a power and flexibility that makes programming in Euphoria much easier than with other languages.
One of the most powerful ways to manipulate objects is to create variables. A variable is an identifier that refers to an object. Variables are the means by which we most often use objects. In practical terms, you think in terms of the value (of the object) and just use the identifier whenever you want to work with that value.
Creating a variable is a two-step process. First, the declaration command creates a variable by associating an identifier (i.e. name) for the variable with its data-type (object, atom, sequence). Next, the assignment command associates a value with that variable. After that, you may use the variable for any programming task:
declaration command:
object message
assignment command:
message = "What's up, Doc?"
In this example we use the object-type, by using the object keyword, in the declaration. This is the easy choice and it will always work.
The ( = ) assignment operator is the traditional way to assign a value to a variable. This is not the same as the "equals" symbol used in mathematics.
Some languages replace ( = ) with a different symbol. This turns out to be annoying due to the extra typing needed. In Euphoria there are no situations where the equals symbol can be misinterpreted, so we stick with tradition.
As a convenience, it is possible to declare and assign a value to a variable as one command:
atom x = 4.1
sequence word = "Euphoria"
object pair = { "one", 1 }
It is "good programming practice" to think about how the variable will be used and then choose between the atom and sequence types when you can.
In a program these choices may look like this:
sequence message = "What's up, Doc?"
atom n = 17
atom pi = 3.14159
object fred = "Hello"
object hour = 11
The first three lines of this example make three assignments. The first assigns the string "What's up, Doc?" to a new variable named message. The second gives the integer 17 to n, and the third gives the decimal number 3.14159 to pi. The last two lines declare fred and hour with the object type, which accepts either kind of value.
A common way to represent variables on paper is to write the name with an arrow pointing to the variable's value. This kind of figure is called a state diagram because it shows what state each of the variables is in (think of it as the variable's state of mind). This diagram shows the result of the assignment commands for message, n and pi:
message --> "What's up, Doc?"
n --> 17
pi --> 3.14159
State diagrams are a way of doodling on paper when you are thinking about how a program works.
The object-type can be used for everything:
object fred, hour, minute
fred = "Hello"
hour = 11
minute = 59.33
It is a fundamental requirement that a variable be declared and assigned a value before use. Until then it does not yet "exist". Trying to use a "non-existent" variable results in an immediate error and the program stops.
An output command is used to display the value of a variable. There are several output procedures available, depending on how you want to display the variable.
The quickest way to get output is to use ( ? ):
? pi -- 3.14159
The ( ? ) works with any object. It will show you the numeric values and structure of a nested sequence:
object foo = { 4, '4', "dog", { "nested cat", 4545 } }
? foo
--{
--  4,
--  52,
--  {100,111,103},
--  {
--    {110,101,115,116,101,100,32,99,97,116},
--    4545
--  }
--}
You may think of ( ? ) as a quick way to get diagnostic output as you write your program.
The numbers only output may be inconvenient. That is when you use the pretty_print() procedure:
include std/pretty.e
object foo = { 4, '4', "dog", { "nested cat", 4545 } }
pretty_print(1, foo )
--
--{
--  4,
--  52'4',
--  {100'd',111'o',103'g'},
--  {
--    {110'n',101'e',115's',116't',101'e',100'd',32' ',99'c',97'a',116't'},
--    4545
--  }
--}
This is a variation on the numbers only output. Values that could be either numbers or characters are now displayed twice--showing both the numeric and character value for each element.
If this is odd to you, recall that a sequence is defined to be a collection of numeric values. Euphoria only works with numbers! Euphoria has no way to know what your intended use for a sequence is. As a result, the default is to only show numbers.
But, you can comfortably work with text using Euphoria. In fact you may create variables containing text and display them without ever having to think about their numeric underpinnings:
sequence text = "This is a dead parrot."
puts(1, text )
-- This is a dead parrot.
A line of text is often called a string. Think of the 's' on the end of puts as a reminder that you are displaying a string. No need to think about numbers at all.
Hint: puts() works with a string, but not a nested string:
puts(1, { "dog", "elephant" } )
-- error
-- sequence found inside character string
This is a common Euphoria error. It is easy to forget that puts() does not work with a nested sequence.
In general, use puts() to view text and use print() to view numbers. Here are the output commands used with a few variables:
sequence message = "What's up, Doc?"
puts(1, message ) -- What's up, Doc?
integer n = 17
print(1, n ) -- 17
atom pi = 3.14159
print(1, pi ) -- 3.14159
Output will be displayed on one line, each item one after another:
puts(1, message )
print(1, n )
print(1, pi )
-- What's up, Doc?173.14159
To control the line spacing use puts(1, '\n' ) where ( '\n' ) is an "invisible" code to cause the output to go to the next line.
puts(1, message )
puts(1, '\n' )
print(1, n )
puts(1, '\n' )
print(1, pi )
-- What's up, Doc?
-- 17
-- 3.14159
Functions are available to identify the data-type of a value. They return a result of ( 0 ) for false and a value of ( 1 ) for true.
object message = "test this"
? sequence(message) -- 1
? atom(message) -- 0
? integer(message) -- 0
? object(message) -- 1
Testing for data-type is important because it may change during the run of a program:
object x = "first a string"
? sequence( x ) -- 1
x = 45
? atom( x ) -- 1
You may use integer() to determine if a value has a decimal component:
atom p = 5.55
? integer(p) -- 0
? atom(p) -- 1
p = 10
? integer(p) -- 1
? atom(p) -- 1
The four Euphoria data-types are sufficient for any programming task. You just pick the type that is most restrictive for your variable.
When you are thinking in terms of "numbers", just use the numbers you are interested in. To display these numbers you use the print() procedure:
print(1, 114.225 ) -- 114.225
If you are interested in integer values then enter the number without a fractional or decimal portion.
Characters are also numbers. The ASCII chart assigns a number value to each letter, digit and symbol we use in programming.
If you are thinking in terms of "characters", just use the characters you are interested in. You enter the character values by enclosing the character within start ( ' ) and end ( ' ) single quotes. Equally valid is to use the integer value for that character as found in the ASCII chart. To display these characters you use the puts() procedure:
puts(1, 'b' ) -- b
To emphasize that characters are indeed numbers, use print() to display their numeric value. Recall that print() displays the numeric value of any Euphoria object:
print(1, 'b' ) -- 98
Finally, use puts() to display the character value for any number.
puts(1, 114.225 ) -- r
When displaying a character value Euphoria first "ignores" the decimal--that is to say it truncates the value to an integer giving 114. Then it displays the character found in the 114 position of the ASCII chart giving 'r'. If there is no character equivalent for the number then you just get "noise."
The procedures used to display a value in no way alter the value of that object (atom or sequence). That is why you can switch between print() and puts()--nothing is altered or damaged.
The Euphoria atom is quite remarkable. It can be used to store and display values for three data-types: integer numbers, floating-point numbers, and characters. The atom eliminates the need for the three disparate data-types used in conventional languages.
Because a sequence is a collection of numeric values, it can be used to represent any kind of computer data!
The sequence does it all. Most computer languages require you to study various data-types and then to study various data organizations. With Euphoria you only have to learn one thing--how atoms and sequences work.
The string, such as "Hello world!", is just one example of a sequence in action. The word "string" is traditionally used by programmers to suggest that words are strings of individual characters.
"Hello world!"
{ 'H','e','l','l','o',' ','w','o','r','l','d','!' }
The start ( " ) and end ( " ) double quotes are used as delimiters to show the beginning and end of a string of characters. The use of double quotes is a programming convenience when you want to represent a string value. The second line shows the same string, emphasizing its composition. Euphoria uses ( { ) and ( } ) to delimit any sequence. From the nature of atoms you can see that 'H' represents the first character of your string. Again from the nature of atoms, you will realize that this string value can also be represented by the following:
{ 72,101,108,108,111,32,119,111,114,108,100,33 }
This is the same sequence as before.
This can be demonstrated by switching between print() and puts():
puts(1, "Hello world!" )
-- Hello world!
print(1, "Hello world!" )
-- { 72,101,108,108,111,32,119,111,114,108,100,33 }
To complete the demonstration you can show both aspects at once:
include std/pretty.e
pretty_print(1, "Hello world!" )
-- {72'H',101'e',108'l',108'l',111'o',32' ',119'w',111'o',
-- 114'r',108'l',100'd',33'!'}
A string is therefore just one particular collection of numbers thanks to the organizing power of the sequence.
The pretty_print() routine has many extras beyond the default display. For example if you include the option {2} you see the string in its original form:
include std/pretty.e
pretty_print(1, "Hello world!", {2} )
-- "Hello world!"
A sequence could contain only one atom, such as: { 15 }. Beware that this is not the same as the atom 15, without braces. A collection of numbers with only one element is not the same as an individual atom. It is possible to test for this difference and use it to advantage in programs.
A special case of a sequence is the empty sequence, represented by {}.
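The difference between an atom, a one-element sequence, and the empty sequence can be tested with the built-in atom(), length() and equal() routines. This is a minimal sketch using 15 as an arbitrary value:

```euphoria
? atom( 15 )            -- 1: a plain number is an atom
? atom( { 15 } )        -- 0: braces make it a sequence
? length( { 15 } )      -- 1: a sequence holding one element
? length( {} )          -- 0: the empty sequence holds nothing
? equal( 15, { 15 } )   -- 0: the atom and the sequence are not the same object
? equal( "", {} )       -- 1: an empty string is the empty sequence
```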
You could use the object-type for any and all the variables in your program. This is perfectly legal and a great convenience for small programming tasks. It also means you could write huge programs using only one data-type!
It is a "good thing" to carefully select data-types to represent how your variables will be used in your program. This increases the readability of your program and helps minimize errors. Therefore it is best to choose either atom or sequence for variables rather than making everything an object.
The input of data to a variable is a good application of the object data-type. In this example it is hard to know in advance the nature of the object being entered, thus the object data-type allows either an atom or a sequence to be entered--a very flexible situation. Variables declared this way can prevent a program crash. For example you could record a person's age as either 25 or "twenty-five" using the same variable.
object age
-- one person uses
age = 25
-- another person uses
age = "twenty-five"
In this example the user is not forced to know the exact format needed for "age". Later, the program can detect the type and change the format needed for your program.
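Detecting the type later can be done with the atom() and sequence() test functions. The sketch below only reports which form was entered; the messages are illustrative, and converting the text into a number is left out.

```euphoria
object age

age = "twenty-five"     -- try age = 25 as well

if atom( age ) then
    puts(1, "age was entered as a number\n" )
else
    puts(1, "age was entered as text\n" )
end if
-- with the value above this displays: age was entered as text
```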
The object-type is universal. There are many programming situations where it is impractical to know in advance if a value will be an atom or a sequence. In these cases you must use the object type.
When you want to view the numeric value of any object you can use ( ? ) to display its contents:
? "Hello"
-- { 72, 101, 108, 108, 111 }
? 454.000001
-- 454.000001
You will see the numeric value of the object and its structure if it has any.
Since we are accustomed to seeing text as text and numbers as numbers, using ? x is a bit simplistic. Alternatively, you may use pretty_print() to get a more user-friendly display:
include std/pretty.e
pretty_print(1, "Hello", {} )
-- { 72'H', 101'e', 108'l', 108'l', 111'o' }
pretty_print(1, 454.000001, {} )
-- 454.000001
Unlike the ( ? ) procedure, the pretty_print() procedure is not part of the core language. It is actually written in Euphoria and saved in a file called std/pretty.e . To use it without having to re-type it, you just "include" the file as part of your program and the code becomes immediately available for your programming. Included files are used often in Euphoria to add extra features, but only when you wish to use them.
include std/pretty.e
pretty_print    -- used for pretty display of Euphoria objects
(               -- start argument list
1,              -- select screen for display
"Hello",        -- the value to display
{}              -- display options, blank for now
)               -- end argument list
Even though an atom may represent an integer, Euphoria does provide for the integer data-type.
Computer hardware operates using binary (base 2) numbers. The familiar decimal (base 10) values must be converted to binary before a computer may use them. The conversion to binary is so automatic you may not realize it is happening. Integer values are the simplest to convert to binary. Computers work best when calculating with integers; working with integers is faster and more efficient. Some fractional values may only be approximated in binary. Values with a fractional part, called floating-point values, require more computer resources than integers. For this reason programmers often specify the integer data-type in the interest of efficiency.
Integers are used for counting, iteration, enumeration and similar tasks. The Euphoria integer is limited to values from -1073741824 to +1073741823 inclusive.
If you need to work with larger integers, then you will have to use the atom data-type. This allows you to use integer values up to about 15 digits in length. Beyond that size the atom automatically converts values to the floating-point format for larger numbers.
You must use the atom-type when interfacing with C-code, even when an integer-value is expected.
Attempting to assign a floating-point value to an integer variable will result in an error message.
integer count
count = 33.9
-- type-check error
You may even invent your own data-types. For example, when counting votes in an election you can be very sure that negative values should not be included. You can invent your own data-type, let's call it votes, so that only positive values are permitted. A type is written like a function that returns true (non-zero) when a value is acceptable and false ( 0 ) otherwise:
type votes( integer x )
    return x > 0
end type
You then use your new data-type to declare variables as before:
votes major
major = 450
If by accident you assign a negative value to major then Euphoria will trigger an error message.
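A user-defined type can also be called like a function to test a value without risking an error. The sketch below uses a hypothetical type named vote_count that accepts only positive integers; the name and the values are illustrative assumptions.

```euphoria
-- a user-defined type returns true (1) for acceptable values, false (0) otherwise
type vote_count( integer x )
    return x > 0
end type

? vote_count( 450 )     -- 1: acceptable
? vote_count( -3 )      -- 0: rejected

vote_count tally = 450  -- accepted
-- tally = -3           -- would stop the program with a type-check error
```

Testing a value this way before assigning it lets a program reject bad input gracefully instead of halting.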
Creating user data-types can serve as a form of self documentation in a program. They can help prevent errors when you work on large projects, or when several people work on the same project.
Programmers generally choose names for their variables that are meaningful--they document what the variable is used for. Variable names can be arbitrarily long. They can contain both letters and digits, but they have to begin with a letter. Although it is legal to use uppercase letters, often we don't--probably because programmers are too lazy to push the shift key. If you do, remember that case matters. Bruce and bruce are different variables. The underscore character ( _ ) can appear in a name. It is often used in names with multiple words, such as my_name or price_of_tea_in_china. If you give a variable an illegal name, you get a syntax error:
object 76trombones
76trombones = "big parade"
-- a name is expected here
-- object 76 trombones

object more$
more$ = 1000
-- syntax error

object while
while = "2 for one sale is on"
-- syntax error
76trombones is illegal because it does not begin with a letter. more$ is illegal because it contains an illegal character, the dollar sign. But what's wrong with while? It turns out that while is one of the Euphoria keywords. Keywords are for the exclusive use of the Euphoria interpreter. Keywords are used to define the language's rules and structure, and they cannot be used for any other purpose.
Euphoria has the following keywords:
if , end , then , procedure , else , for , return , do , elsif , while , type , constant , to , and , or , exit , function , global , by , not , include , with , without , xor
In addition there are words reserved for built-in routines. It is "best" not to use them in your programming.
length , puts , integer , sequence , position , object , append , prepend , print , printf , clear_screen , floor , getc , gets , get_key , rand , repeat , atom , compare , find , match , time , command_line , open , close , trace , getenv , sqrt , sin , cos , tan , log , system , date , remainder , power , machine_func , machine_proc , abort , peek , poke , call , sprintf , arctan , and_bits , or_bits , xor_bits , not_bits , pixel , get_pixel , mem_copy , mem_set , c_proc , c_func , routine_id , call_proc , call_func , poke4 , peek4s , peek4u , profile , equal , system_exec , platform
If you use a reserved identifier for your own special use, you have then done an override on that identifier. Special syntax is then needed to use that identifier for its original intent.
You might want to keep this list handy. If the interpreter complains about one of your variable names and you don't know why, see if it is on this list.
In time you will end up memorizing the list, but until then I would suggest that you take advantage of a feature provided in many development environments: syntax-highlighting. As you type, different parts of your program should appear in different colors. For example, keywords might be blue, strings red, and other code black. If you type a variable name and it turns blue, watch out! You might get some strange behavior from the interpreter.
A command is an instruction that the Euphoria interpreter can execute. We have seen two kinds of commands: output and assignment.
When you type a command into your program, Euphoria executes it and displays the result, if there is one. The result of an output command is the display of a value. Assignment and declaration commands don't produce a visible result.
A program usually contains a sequence of commands. If there is more than one command, they are processed one at a time as the commands execute.
For example, the program:
print(1, 1 )
object x = 2
puts(1, '\n' )
print(1, x )
produces the output
1
2
Again, the assignment command produces no visible output.
Anything following ( -- ) is a user comment: the two dashes and everything after them on that line are ignored by the Euphoria interpreter. Comments do not slow down your program. Comment lines let you document what your program is doing.
-- this is a comment
-- comments are ignored by the interpreter
The ( -- ) comment lines are also used to create Euphoria documentation. The eudoc and creole programs are used to extract documentation from Euphoria source-code and produce documentation formatted much like what you are reading now.
An expression is a combination of values, variables, and operators.
Operators are special symbols that represent computations like addition and multiplication. The values the operator uses are called operands. The following are all legal Euphoria expressions whose meaning is more or less clear:
20+32
hour-1
hour*60+minute
minute/60
(5+9)*(15-7)
The symbols ( + ), ( - ), and ( / ), and the use of parenthesis for grouping, mean in Euphoria what they mean in mathematics. The asterisk ( * ) is the symbol traditionally used for multiplication in Euphoria and other computer languages.
For exponents you use the power() function; for example, 4 squared (4 to the power 2) is written:
? power( 4, 2 ) -- 16
The power() function also works on sequences.
? power( { 2, 4, 6 }, 2 ) -- { 4, 16, 36 }
There is no "power" operator. This is in the interest of making the interpreter more efficient--power operations are the least used and so are supported by a function only.
When a variable name appears in the place of an operand, it is replaced with its value before the operation is performed.
In Euphoria addition, subtraction, multiplication, division and exponentiation all behave like you would expect.
When more than one operator appears in an expression, the order of evaluation depends on the rules of precedence.
The precedence order is best determined by a chart:
-- highest precedence
parentheses
procedure function type
unary - unary + not
* /
+ -
&
< > <= >= = !=
and or xor
{ , , , }
-- lowest precedence
The chart is similar to what you would "expect" from mathematical precedence rules. The chart is similar to what you "expect" in other languages. There lies the potential trap! Beware of subtle differences!
The only way to be sure is to follow the precedence order in the chart. In the event of ambiguity or uncertainty, clarify your intent with the generous use of parentheses (i.e. round ( ) brackets ). Parentheses have the highest precedence and can be used to force an expression to evaluate in the order you want.
Since expressions in parentheses are evaluated first:
2*(3-1) -- equals 4
power( (1+1), (5-2) ) -- equals 8
Parentheses may be used just to make an expression easier to read:
(minute * 100)/60 -- parentheses do not alter result
When you have operators at the same level of precedence, such as * / , the expression is evaluated from left to right:
6/3*5
-- equals 2*5
-- does not equal 6/15
The higher level of precedence always comes before lower precedence operators:
2*3-1
-- equals 5
-- does not equal 4
2/3-1
-- equals -0.3333
-- does not equal 1
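The precedence rules are easy to confirm by letting the interpreter evaluate each expression with and without parentheses. This is a small sketch using the same numbers as the examples above:

```euphoria
? 2*3-1        -- 5: multiplication happens before subtraction
? 2*(3-1)      -- 4: parentheses force the subtraction first
? 6/3*5        -- 10: equal precedence, so evaluated left to right
? 6/(3*5)      -- 0.4: parentheses change the grouping
? 2/3-1        -- about -0.3333
? 2/(3-1)      -- 1
```

When in doubt about how an expression will be grouped, printing both versions side by side like this is a quick check.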
The ( = ) used in an assignment command is not the same as ( = ) used as a comparison operator. You will see that the context in which they are used is enough to prevent ambiguity.
You can perform mathematical operations on strings. After all, strings are sequences, which are just arrangements of numbers.
Performing math on a string is generally not very meaningful unless you are doing cryptography or compression on the string. One interesting application is shown by:
? 'a' - 'A' -- 32
This is the way to inter-convert upper and lower case characters.
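Because arithmetic applies to every element of a sequence, the fixed distance between the upper and lower case letters can convert a whole word at once. The sketch below assumes the word contains only letters of a single case; the variable names word and offset are illustrative.

```euphoria
sequence word = "parrot"
atom offset = 'A' - 'a'         -- -32, the distance from lower to upper case

puts(1, word + offset )         -- PARROT
puts(1, '\n' )
puts(1, "PARROT" - offset )     -- parrot
puts(1, '\n' )
```

Any character that is not a letter of the expected case (a space, a digit, punctuation) would be shifted too, so this simple arithmetic is not a general-purpose case converter.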
While arithmetic operations apply to strings as with any sequence, they are not generally useful.
There is a special operator ( & ) which performs concatenation, joining sequences (or strings) together:
sequence first = "throat"
sequence second = "warbler"
puts(1, first & second )
-- throatwarbler
sequence one = { 2, 4, 8 }
sequence two = { 8908.9, 234.234 }
print(1, one & two )
-- { 2, 4, 8, 8908.9, 234.234 }
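There is a function, repeat(), that performs repetition. It builds a new sequence containing the requested number of copies of an object. Repeating a character gives a string, while repeating a string (or any sequence) gives a nested sequence, which puts() cannot display directly:
sequence afew = repeat( 'z', 3 )
puts(1, afew )
-- zzz
sequence hellos = repeat( "hello", 3 )
-- hellos is { "hello", "hello", "hello" }
sequence copies = repeat( 5, 4 )
? copies
-- { 5, 5, 5, 5 }
sequence many = repeat( { -3, 'a', "dog" }, 2 )
-- many is { {-3, 'a', "dog"}, {-3, 'a', "dog"} }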
Do not confuse concatenation with addition, nor confuse repetition with multiplication.
Concatenation is not the same as addition or subtraction. You may add (or subtract) two sequences if they both have the same ( one to one ) number of elements:
sequence a = { 2, 4, 1 } sequence b = { 10, 20, 60 } ? a + b -- { 12, 24, 61 }
You may multiply (or divide) a sequence by a number:
sequence more = { 3, 6, -0.1 , { 44, 1 } }
? more * 2
-- { 6, 12, -0.2, { 88, 2 } }
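The difference between concatenation, addition, multiplication and repetition shows up clearly when the results are compared side by side. This sketch reuses the sequences a and b from the addition example and checks each result with the built-in equal() and length() routines:

```euphoria
sequence a = { 2, 4, 1 }
sequence b = { 10, 20, 60 }

? length( a & b )                          -- 6: concatenation joins the two
? length( a + b )                          -- 3: addition pairs the elements up
? equal( a & b, { 2, 4, 1, 10, 20, 60 } )  -- 1
? equal( a + b, { 12, 24, 61 } )           -- 1
? equal( a * 2, { 4, 8, 2 } )              -- 1: every element is doubled
? equal( repeat( a, 2 ), { a, a } )        -- 1: repetition nests two copies
```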
The operations that are valid on a string are the same ones that apply to sequences in general. Euphoria was designed not to have "special cases." This is another reason Euphoria is easy to learn--everything works the same way all the time.