C++ (pronounced see plus plus) is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as a "middle-level" language, as it comprises a combination of both high-level and low-level language features.
Basics of C++ Programming
These pages attempt to give an overview of some basic concepts, including the structure of the C++ programming language. You can read through them below, or you can refer back to them from pages in the main tutorial when you feel you don't understand a particular topic or piece of terminology.
C++ Language Structures
Most people find learning by example easier than working from descriptions alone, so these pages will probably be most useful as a reference while you are looking at examples elsewhere.
1. Basic Syntax: Spaces, words and punctuation in the C++ language.
2. Variables, Constants and Built-in Types: How to store data.
3. Operators and Basic Expressions: How to do math (and other things).
4. Flow Control and Statement Blocks: Repeating actions and responding to conditions.
5. Functions and Arguments: Organizing the program.
6. Modules and Scope: Dividing large programs into manageable pieces.
7. Arrays: Groups of the same type of variable.
8. Character Strings: How to deal with text.
9. Pointers: Variables which point at other variables.
10. Defining New Types: Variables beyond the basic types.
11. Classes and Objects: The basis of object-oriented programming.
12. Casts: Converting between types.
13.The Preprocessor: The preprocessing step applied to all C++ source code.
1. Basic Syntax: Spaces, words and punctuation in the C++ language.
Basic Syntax
Like English, C++ has words, spaces and punctuation. Also like English, neither the number of spaces between words nor the position of line breaks (as long as they don't break a word in half) affects the meaning of a sentence. The compiler does not care how easy it is to read a piece of source code. Using spaces, line breaks, and indentation is for the convenience of human readers (such as yourself). However, computers are sticklers for 'proper' spelling, punctuation and, in the case of C++, capitalization. If you leave out a comma or a semicolon, or use a lower-case letter where an upper-case one should be used, it will change the meaning of the code, and the compiler will probably complain. (Actually it's worse when the compiler doesn't complain, because you end up with a program that doesn't do what you wanted!)
Of course, C++ is not English. The punctuation, the use of capitalization, the words and their meanings are all different.
Tokens
In C++ a word or a single piece of punctuation is called a token. Tokens are separated by white space, which can mean any number of ordinary spaces, tab characters, or line breaks. Punctuation, like parentheses '(' ')', braces '{' '}', or commas and periods, don't need to be separated from other tokens by spaces. For example the tokens in this piece of code:
int main(int argc, char* argv[])
{
return 0;
}
Are the following, in order:
int
main
(
int
argc
,
char
*
argv
[
]
)
{
return
0
;
}
Written either way the code has the same meaning, but obviously the first one is easier to read.
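As a small illustration of the point about white space, here is a sketch with two functions (add_spaced and add_dense are names I made up for this example). They are built from exactly the same tokens and differ only in spacing and line breaks, so the compiler should treat them the same way.

```cpp
// Both functions consist of the same tokens; only the white space differs.
int add_spaced(int a, int b)
{
    return a + b;
}

int add_dense(int a,int b){return a+b;}
```

Calling either one with the same arguments gives the same result; only the human reader notices a difference.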
Keywords
In C++ there is a relatively small set of words that have fixed meanings. These words are called keywords and they include:
The names of the basic types (int, char, float, double, bool, wchar_t) and their modifiers (short, long, signed, unsigned)
The boolean constants (true, false)
Keywords for variable and member declarations (extern, static, const, virtual, mutable, inline, auto, volatile, register, export)
Flow control constructs like loops (do, while, for), conditionals (if, else), switches (switch, case, default) and special flow control keywords (continue, break, goto)
Keywords for declaring and using new types (class, struct, union, enum, typedef, template, public, private, protected, friend, this)
Type casting keywords (static_cast, dynamic_cast, reinterpret_cast, const_cast)
Memory allocation keywords (new, delete)
Namespace and scope keywords (namespace, using)
Exception processing keywords (try, catch, throw)
Plus some others. Learning C++ is not a matter of memorizing a bunch of keywords, but it is good to be aware of reserved words like these, so that you don't try to use them as variable or function names.
Symbols
Words that are not keywords are available for use as the names of types (including classes, structures and enums), functions, variables, namespaces and other programmer-defined objects. Symbols, as such words are called, can generally consist of any combination of letters (upper-case and/or lower-case), digits and the underscore character '_'. One exception is that you can't begin a symbol name with a digit. The following are all valid symbols:
x
name
nCount
caBuffer
A_long_variable_name
ReadFile
m4
_danger
The last example, which begins with an underscore, is a valid symbol. However, you should generally avoid such symbols, since names with leading underscores are reserved in many contexts for the compiler and the standard library, and are often used by other libraries to avoid name conflicts (a better method is to use namespaces, which are part of standard C++).
Comments
In order to make your source code easier to read and understand it is useful (absolutely necessary in fact) to use comments in plain English (or whatever your native language might be) in the code to explain and clarify. Consider the difference comments make between:
WNDCLASSEX wc;
memset(&wc, 0, sizeof(WNDCLASSEX));
wc.cbSize = sizeof(WNDCLASSEX);
and
WNDCLASSEX wc; /* Window class structure */
/* Prepare the window class structure to be filled in */
memset(&wc, 0, sizeof(WNDCLASSEX)); /* Zero clear */
wc.cbSize = sizeof(WNDCLASSEX); /* cbSize is the size of the structure in bytes */
There are two ways to include comments in your code. A comment can be enclosed in /* */, like the above. This type of comment can cover several lines or have ordinary C++ code on either side on the same line:
/* This is a
multi-line comment */
int main (int argc /* Argument count */, char* argv[] /* Array of arguments */)
The second way to include comments is to put two slashes (//) before a comment. After the two slashes everything up to the end of the line is ignored by the compiler.
// This is a single line comment.
int x; // A variable declaration.
2. Variables, Constants and Built-in Types: How to store data.
Variables
In math a variable is an abstract label that can represent any number. In programming a variable performs almost the same task, but the differences are important. In programming a variable is a label for a specific piece of memory where some data will be stored. A variable can be used to hold a piece of data which was input by the user, or the result of a calculation, or in fact anything that it is possible to represent as digital data.
Variable Declarations
In C and C++ a variable must be declared before it can be used. Declaring a variable tells the compiler the type of the variable and its name, along with any other special properties it has.
int x; /* x is an integer */
double dFirst, b; /* dFirst and b are double-precision floating-point numbers */
char c; /* c is a character */
The above are all examples of simple declarations. The structure of a declaration is like this:
type name-list;
The basic types available are described below. The name list is a list of one or more names separated by commas. Variable names are case sensitive (FOO, Foo and foo are all different variables) and can consist of any combination of letters or numbers and underscore characters '_' except for the first character, which cannot be a number. It is also not recommended to use the underscore as the first character, since that is used for special symbols supplied by the compiler or standard library authors.
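To make the name-list and case-sensitivity rules concrete, here is a minimal sketch. The function name sum_of_three_names is hypothetical and exists only so the declaration has somewhere to live.

```cpp
// Hypothetical helper, named for this sketch only.
int sum_of_three_names()
{
    // One type followed by a name list of three names.
    // Because names are case sensitive, these are three distinct variables.
    int foo = 1, Foo = 2, FOO = 3;
    return foo + Foo + FOO;
}
```

If the three names referred to the same variable this would not even compile, since a variable cannot be declared twice in the same block.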
Basic Types
Integers (int, short, long)
In math an integer is a positive or negative number with no fractional part, like -1, 0, 1, 2, 758 or -23476. In C or C++ an integer is a type that represents positive or negative whole numbers with a natural-sized word of memory. You can declare variables of the integer type using the type name int, like this:
int x; /* x is an integer variable */
When I say an integer (or "int", as C programmers often call them) is "natural-sized" I mean that it is the largest size that the computer can easily (i.e. quickly) manipulate with one simple instruction. (For example, it is generally possible to multiply two integers with a single instruction.) When programming for Win32, you are generally dealing with what is called a 32-bit machine, and that means the "natural size" for a word is 32 bits. This, in turn, means that an int can represent any number between about -2 billion and +2 billion.
By putting the keyword short in front of the int you can declare a short integer variable. You can also just use the word short by itself. Short integers are smaller than ints (but there is really not much call to use them these days). Generally a short will be 16 bits, capable of representing integers in the range -32768 to +32767.
On the other hand the keyword long declares a long integer type variable. Back in the days of 16-bit computers a long was 32 bits and an int was 16 bits (the same size as a short). Today on Windows a long is still 32 bits, but ints have caught up. (On many 64-bit Unix-like systems a long is 64 bits, so don't rely on its exact size.)
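Since the sizes of short, int and long depend on the compiler and the machine, it can be handy to ask the compiler directly. The sketch below does that with sizeof and CHAR_BIT (the number of bits in a char, from the standard header climits). The three function names are my own invention for this example.

```cpp
#include <climits>

// Number of bits in each integer type on the compiler being used.
int bits_in_short() { return static_cast<int>(sizeof(short) * CHAR_BIT); }
int bits_in_int()   { return static_cast<int>(sizeof(int) * CHAR_BIT); }
int bits_in_long()  { return static_cast<int>(sizeof(long) * CHAR_BIT); }
```

On a typical Win32 compiler you would expect to see 16, 32 and 32. The only things the language promises are the minimums (16 bits for short and int, 32 for long) and that each type is at least as large as the one before it.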
Unsigned
In math an integer can be either positive or negative, and normal int variables in C or C++ are the same. However, you can also declare a variable as an unsigned int, which means that it can only represent zero and positive numbers (in the range of 0 to 4 billion or so), which doesn't make much sense in math, but that's C for you. You can also declare unsigned short int variables and unsigned long int variables.
There is also a keyword signed which can be used in the same way as unsigned to indicate that the variable can hold both positive and negative numbers. However, this is the default for ints, so it is not generally necessary.
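One consequence of having no negative values is worth seeing once: unsigned arithmetic wraps around. In this sketch (one_below_zero is a made-up name) subtracting 1 from an unsigned zero gives the largest value the type can hold, which the header climits calls UINT_MAX.

```cpp
#include <climits>

// Subtracting 1 from an unsigned zero wraps around to the largest value.
unsigned int one_below_zero()
{
    unsigned int u = 0;
    u = u - 1;      // there is no -1 for an unsigned int, so the value wraps
    return u;       // equal to UINT_MAX
}
```

With a 32-bit unsigned int the result is the "4 billion or so" mentioned above.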
Real Numbers (float, double)
A number with a decimal point, like 1.1, 3.141, or 6.03e24 (or, for that matter 1.000000), is called a real number in math, but in C this is called a float or possibly a double. The word "float" stands for floating point, which describes the general way these numbers are represented in memory. (What it comes down to is that they are basically represented in a kind of binary scientific notation, with a limited number of digits after the decimal point-- or maybe it should be called the binary point?) A double is just like a float except that it has more significant digits. Both types require considerably more memory to store and processing time to manipulate than ints and their relatives.
Nowadays floats are not used much, and normally you will see doubles where a real number is necessary.
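To see how many significant digits each type offers on your own compiler, you can ask the standard header limits, as in this small sketch. The function names are assumptions of mine; numeric_limits and digits10 are part of the standard library.

```cpp
#include <limits>

// Decimal digits each type can hold without loss, as reported by the compiler.
int float_digits()  { return std::numeric_limits<float>::digits10; }
int double_digits() { return std::numeric_limits<double>::digits10; }
```

With the floating point formats used on most current machines these typically come out as 6 and 15, which is a good part of the reason doubles are the usual choice.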
Characters (char)
A character (signified by the type name char) is a type of variable which represents a letter, number, or punctuation mark (plus a few other symbols). A char variable can also represent a small integer number.
The signed and unsigned keywords can be used with char like they can with int. Whether char is by default signed or unsigned is not standardized, so you need to use one of those keywords if it is important that you know whether the variable is signed or unsigned. Signed characters can generally represent numbers in the range -128 to 127, while unsigned characters can represent numbers from 0 to 255. Ordinary English letters, numbers and punctuation marks are always represented with positive numbers.
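The idea that a char is both a character and a small integer can be sketched like this. The function names are hypothetical, and the specific numbers assume an ASCII-compatible character set, which is what you will find on Win32 and most other platforms.

```cpp
// Assumes an ASCII-compatible character set.

// The small integer stored in a char.
int code_of(char c) { return static_cast<int>(c); }

// Arithmetic on a char gives another character code.
char next_letter(char c) { return static_cast<char>(c + 1); }
```

Under that assumption code_of('A') is 65 and next_letter('A') is 'B'.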
Wide Characters (wchar_t)
Some languages, like Chinese and Japanese, cannot fit their alphabets into the 256 values available with a char. For these languages there is an extended character type called wchar_t. Unfortunately I don't know much more about it.
Strings
A string is a sequence of characters, for example a file name or a line of text from a book. There is no built-in string type in C, although there is one in the C++ standard library (std::string). In C strings are represented as arrays of characters terminated with a 'null' character (with the value zero). For more about strings see the section on strings.
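Here is a short sketch of the two kinds of string side by side. The names greeting, c_length, c_storage and cpp_length are made up for the example; strlen and std::string come from the standard library.

```cpp
#include <cstring>
#include <string>

// A C-style string: an array of char ending in a null character.
const char greeting[] = "hello";

// Number of characters before the terminating null.
std::size_t c_length() { return std::strlen(greeting); }

// Size of the whole array, including the terminating null.
std::size_t c_storage() { return sizeof(greeting); }

// The C++ string type keeps track of its own length.
std::size_t cpp_length()
{
    std::string s = "hello";
    return s.size();
}
```

The word has five letters, but the array needs six chars because of the null character at the end.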
Void
Void is not a type used for actual variables directly. However it is used when declaring functions which return no value in place of a returned type. Void is also used to declare void pointers, which are variables that point at objects with any type.
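Both uses of void can be sketched in a few lines. The names store_answer and read_through_void_pointer are hypothetical, and the example leans on pointers, which are covered in their own section.

```cpp
// A function that returns no value: it hands back its result through a pointer.
void store_answer(int* destination) { *destination = 42; }

// A void pointer can hold the address of an object of any type,
// but it must be cast back to the real type before the object can be used.
int read_through_void_pointer()
{
    int value = 0;
    store_answer(&value);
    void* anything = &value;
    return *static_cast<int*>(anything);
}
```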
Enumerations
An enumeration is a set of named constant integers. You specify an enumeration type like this:
enum eColor { black, red, green, blue, yellow, purple, white };
After you have done that you can declare a variable of the enumeration type:
enum eColor colorBackground = white;
In C you need to include the enum keyword when declaring variables of the enumeration type. In C++ you only need to include enum when specifying the enumeration, so the above variable declaration could be:
eColor colorBackground = white;
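Because an enumeration is a set of named constant integers, each name has a number behind it. When you don't give any values yourself, numbering starts at zero and goes up by one. The sketch below repeats the eColor definition so it stands alone; color_number is a name I invented.

```cpp
enum eColor { black, red, green, blue, yellow, purple, white };

// An enumerator converts to the integer value behind it.
int color_number(eColor c) { return c; }
```

So black is 0, red is 1, and white, the seventh name, is 6.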
Boolean (C++ only)
In C++ there is a Boolean type, which has two possible values: true or false. The Boolean type is identified by the type name bool.
bool bFlag = false;
Boolean variables can be assigned the results of comparisons, like this:
bFlag = (x != 10);
Constants and Literals
Sometimes you don't want a variable that you can change; you just want to enter some raw data that the program can use. For example, in the ever-popular "Hello, world!" program shown below, the text "Hello, world!\n" is not a variable. Instead, it is an example of a literal. In particular, it is a piece of raw text, and since a chunk of text is called a string, a literal chunk of text is called a string literal.
#include <stdio.h>
int main (int argc, char* argv[])
{
printf("Hello, world!\n");
return 0;
}
There are several kinds of literals:
Character literals are single characters enclosed in single quotes, like 'x', 'A' or '\n'. (But the last one is not a single character you say? I'll get to that.)
Integer literals are just integers. They can be in regular decimal format or in hexadecimal if you put "0x" at the front. So 1999 and -28768 are integer literals in decimal, and 0xFFFF is an integer literal in hexadecimal (hexadecimal literals can't have a sign, i.e. they are in unsigned representation).
Unsigned integer literals are unsigned numbers with a 'U' at the end like 4095U.
Long integer literals are integers in the same format as regular integer literals, but followed by a letter 'L' as in 1939290L, or 0x10000000L.
Floating point literals are numbers with a decimal point like 6.4, -0.001 or 3.14123. They can also be written using scientific notation where "E" means "times ten to the power of" an integer following it. Thus 6.0E3 means 6.0 times 10 to the power of 3 (or 1000, so the final value is 6000), and 1.0e-10 means 0.0000000001.
String literals are pieces of text enclosed in quotes. "Hello, world!\n" or "Syntax error." are string literals.
Boolean literals are available only in C++. You can define a Boolean value simply using the word "true" or "false".
Here are some examples of variables being initialized using literals:
char c = 'A'; /* The character variable c contains the character "A". */
int x = 10; /* The integer variable x contains the value 10. */
long l = 2000000000L; /* The long integer variable l contains the value two billion. */
double d = 6.02e23; /* The double-precision floating point variable d contains
* the value 6.02 times ten to the power of 23. */
bool b = true; /* The boolean variable b contains the value true. */
const char* message = "Whoa!"; /* The character pointer variable message points at the
* beginning of the word "Whoa!". String literals cannot be
* modified, hence the const. */
That last one might be a bit tricky. Try looking at the discussion of strings in basic types above.
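A few of the literal formats described above are just different spellings of the same value. This sketch checks that; the function names are hypothetical.

```cpp
// Each function returns true if the two spellings denote the same value.
bool hex_matches_decimal() { return 0xFFFF == 65535; }
bool exponent_matches()    { return 6.0E3 == 6000.0; }
bool long_suffix_matches() { return 1939290L == 1939290; }
```

The suffix or the number base changes how the literal is written, and sometimes its type, but not the number it stands for.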
Escaped Characters
When declaring character or string literals you may want to include a character which cannot be included by directly typing it. For example, if you wanted to put a 'newline' character in a string you can't type something like this:
const char* szMessage = "This is not a
valid string literal."
You also couldn't include a quote character, because that would end the string literal.
const char* szInstructions = "Enclose the name in "quotes"."; /* Won't work! */
To include special characters like this you need to use an escape code, which is a backslash (\) followed by a special code from the following list:
\n : newline
\r : carriage return
\t : tab
\v : vertical tab
\b : backspace
\f : form feed
\a : bell
\\ : backslash
\? : question mark (not usually necessary)
\' : single quote or apostrophe
\" : double quote
In addition there are two ways to specify a character using a number:
A backslash followed by one, two or three digits specifies a character using an octal number. The most common use for this is to specify a 'null character' like this : '\0'.
A backslash followed by an x and then followed by a sequence of hexadecimal digits specifies the character represented by that hexadecimal number.
Thus the above examples could be done like this (with an appropriate change in the content of the first string literal):
const char* szMessage = "This is a\nvalid string literal.";
const char* szInstructions = "Enclose the name in \"quotes\".";
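It may help to see that an escape code, however many characters it takes to type, ends up as a single character in the program. The following sketch uses made-up function names, and the octal and hexadecimal checks assume an ASCII-compatible character set, where 'A' has the code 65.

```cpp
#include <cstring>

// "a\nb" holds three characters: 'a', a newline, and 'b'.
std::size_t escaped_length() { return std::strlen("a\nb"); }

// Octal 101 and hexadecimal 41 are both 65, the ASCII code for 'A'.
bool octal_is_A() { return '\101' == 'A'; }
bool hex_is_A()   { return '\x41' == 'A'; }

// The escaped quotes become part of the text instead of ending the literal.
std::size_t quoted_length() { return std::strlen("\"hi\""); }
```

The last literal contains four characters: a quote, h, i, and another quote.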
Constant Variables
Sometimes you need a number in your program, but it doesn't need to be changed. In this case you can use constant variables. You declare a constant by adding the keyword const to the type name, like this:
const double pi = 3.1415927;
Since the variable is a constant, you have to initialize it with a value and you cannot change it. This kind of thing will cause an error (or at least a warning):
pi = pi * 2; /* Error: modifying const variable. */
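Reading a constant is always allowed; only changing it is rejected. As a minimal sketch (the function circumference is my own example, and the constant is deliberately a short approximation):

```cpp
const double pi = 3.14159;   // a short approximation, enough for the sketch

// pi can be read freely; only assignment to it is rejected by the compiler.
double circumference(double radius)
{
    return 2.0 * pi * radius;
}
```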
External, Static and Automatic Variables
Generally, declaring a variable assigns the variable name to a piece of memory. However, including the keyword extern in front of the type says that "there is this variable of this type somewhere (perhaps in another module) and I want to use it." The variable name and type are known to the compiler while it reads the rest of your source file (so it won't tell you the variable does not exist), but it assumes that the actual variable will be defined, and perhaps initialized, elsewhere. When your program is linked the reference from the module that used the extern variable will be connected to the real variable in the module that defined it.
For example, consider two files, one called one.c and one called two.c. Here is the first:
/* In the file one.c */
extern int x; /* The variable x is actually defined elsewhere, but used below. */
int foo (int y)
{
return x + y; /* You can use x here, for example. This adds x and
* the supplied value, and returns the result. */
}
Here is the second file:
/* In the file two.c */
int x = 0; /* Notice how the variable is defined here. */
void bar (int z)
{
x += z; /* This adds a value to x and saves it in x. */
return;
}
In yet a third file main could do this:
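#include <stdio.h>
int foo (int y); /* Defined in one.c */
void bar (int z); /* Defined in two.c */
int main (int argc, char* argv[])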
{
int y = 1;
int z = 5;
/* foo returns x + y, and x is zero now. */
printf ("y is %d and x + y is %d\n", y, foo(y));
/* bar adds z to x, so x becomes 5 (the current value of z). */
bar (z);
/* Now x + y is 6. */
printf ("now x + y is %d\n", foo(y));
return 0;
}
The output of the program would be
y is 1 and x + y is 1
now x + y is 6
Notice how you don't need to add any special keyword where the variable is defined. Variables defined outside of the body of any function are automatically accessible as external variables. If you leave the extern off the declaration in the file one.c, it becomes an external definition, and the linker will complain, because there are two externally accessible variables with the same name.
The keyword static, on the other hand, declares a variable which cannot be used as an external variable from another file. If a variable is static then you can define a variable with the same name in two files, like this:
/* In the file one.c */
static int x = 0; /* This x is defined here, and used below. */
int foo (int y)
{
return x + y; /* You can use x here, for example. This adds x and
* the supplied value, and returns the result. */
}
Here is the second file:
/* In the file two.c */
static int x = 0; /* This x is defined here and is separate from the
* one above. */
void bar (int z)
{
x += z; /* This adds a value to the x defined in this file and
* saves it in that x. */
return;
}
This time main could do this:
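#include <stdio.h>
int foo (int y); /* Defined in one.c */
void bar (int z); /* Defined in two.c */
int main (int argc, char* argv[])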
{
int y = 1;
int z = 5;
/* foo returns x + y, and x is zero now. (meaning the x in one.c) */
printf ("y is %d and x + y is %d\n", y, foo(y));
/* bar adds z to x (in two.c), so x becomes 5 (the current value of z). */
bar (z);
/* But bar didn't change x in one.c, so x + y is still 1, not 6. */
printf ("now x + y is %d\n", foo(y));
return 0;
}
The output of the program would be
y is 1 and x + y is 1
now x + y is 1
In fact, extern and static don't work just for variables, they also work for functions.
There is a third class of variable, neither static nor extern, called automatic variables. Automatic variables are variables declared inside a function, unless the keyword extern or static was used to declare the variable (i.e. variables declared inside a function are automatic by default). Automatic variables only exist from the point where they are declared until the end of the function or statement block (a set of statements enclosed in '{' and '}') where they were declared. After the function (or statement block) ends, the variable is thrown away and its value is lost (and the memory it took up is free for use by other automatic variables in other functions).
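The "thrown away" part is easy to check with a small sketch. The function name count_calls_automatic is hypothetical. Its variable n is automatic, so it is created afresh on every call and nothing is remembered from the call before.

```cpp
// Hypothetical example: n is an automatic variable.
int count_calls_automatic()
{
    int n = 0;    // created fresh each time the function is called
    n = n + 1;
    return n;     // always 1, because the previous n no longer exists
}
```

However many times you call it, the result is 1.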
In older versions of C (before the C99 standard), all the variables you use in a statement block (including the body of a function) have to be declared at the start of the block. In C++ (and in C99 and later) you can declare variables anywhere in the function. The following fragment of code is fine in C++, but incorrect in older C.
int x = 1;
int y = 2;
printf ("x + y is %d\n", x + y);
int z; /* Create a new variable z. */
z = x + y; /* Assign the sum of x and y to the variable z. */
printf ("z is %d\n", z);
To be correct in older C you would have to move the definition of z to the front of the statement block like this.
{ /* Beginning of the block */
int x = 1;
int y = 2;
int z;
/* ... */
printf ("x + y is %d\n", x + y);
z = x + y; /* Assign the sum of x and y to the variable z. */
printf ("z is %d\n", z);
/* ... */
} /* End of the block. */
The arguments of a function are also automatic variables. You don't need to declare them inside the function body, because they are already declared in the argument list.
In C++ you can also declare variables inside the initializing statement of a for loop, like this:
for (int i = 0; i <= MAX_INDEX; i++)
{
// ... do some processing ...
}
Such variables are available until the end of the for loop (that is, inside the body of the loop), but not afterwards.
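Here is a complete version of such a loop as a sketch. The value given to MAX_INDEX and the function name sum_of_indices are assumptions for the example, since the fragment above does not define them.

```cpp
const int MAX_INDEX = 4;   // illustrative value only

int sum_of_indices()
{
    int total = 0;
    for (int i = 0; i <= MAX_INDEX; i++)
    {
        total += i;    // i is visible here, inside the loop
    }
    // i no longer exists here; using it would be a compile error.
    return total;      // 0 + 1 + 2 + 3 + 4
}
```

Note that total has to be declared before the loop, because it is still needed after the loop has finished.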