Dealing with Text Data in C++
Copyright C. S. Tritt, Ph.D.
October 11, 2000
Dealing with text data is problematic in nearly all computer languages. This is particularly true of C++, due in part to its power and flexibility. There are at least three ways to represent character data in modern C++: as single characters, as null-terminated character arrays (C strings), and as Standard Template Library (STL) strings.
Single character representation is built into C++ by way of the char type. A variable of this type can hold only a single character. Literal constants of the char type are denoted using single quotes. For example, 't' is the letter t. The following line declares and defines a character variable: char letter = 'A'; Character values can be compared with the normal relational operators (for example, if (letter == 't')… is valid C++) and used in switch constructs.
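As a minimal sketch of the two uses just described, the function below compares a char with == and then uses the same value in a switch. The name describeLetter and the text it returns are hypothetical and chosen only for illustration.

```cpp
#include <string>

// Hypothetical helper (not part of the original notes): describes one character.
std::string describeLetter(char letter)
{
    if (letter == 't') {      // relational operators work directly on char values
        return "the letter t";
    }
    switch (letter) {         // char values can also serve as case labels
        case 'A':
        case 'a':
            return "the letter a";
        case ' ':
            return "a space";
        default:
            return "something else";
    }
}
```

Under these assumptions, describeLetter('A') falls through the if test and is handled by the first pair of case labels.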
The traditional way to represent multi-character text in C++ is by using C strings. These are null-terminated arrays of characters. A variable of this type is declared as follows: char words[20]; which sets aside 20 character storage locations. Literal string constants are indicated using double quotes. For example, char text[] = "Some words"; initializes an array from a literal, and if (name[0] == 'A')… tests a single character of a C string. The null character required at the end of the valid contents of a C string is appended automatically to string literals and by most of the C string functions. The <cstring> (formerly <string.h>) header file must be included to use the C string functions.
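The effect of the automatically appended null character can be seen by comparing the storage an array occupies with the length strlen reports. The sketch below reuses the literal from the text; the three function names are hypothetical.

```cpp
#include <cstddef>   // std::size_t
#include <cstring>   // std::strlen

// "Some words" has 10 visible characters. The compiler appends a null
// character, so the array is sized to hold 11 chars.
const char text[] = "Some words";

// Storage locations in the array, including the terminating null.
std::size_t storageUsed() { return sizeof(text); }

// Characters that precede the terminating null.
std::size_t visibleLength() { return std::strlen(text); }

// First character, accessed like any other array element.
char firstLetter() { return text[0]; }
```

Here storageUsed() is one larger than visibleLength(), which is the extra location to allow for when sizing an array by hand.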
One disadvantage of character arrays is that if an attempt is made to store more characters in the array than there are storage locations, the extra characters may overwrite other variables. When calculating the size of C strings, always remember to include space for the null character. Individual characters in C strings can be accessed as array elements using the [] operator. For example, fi = fname[0]; would place the first letter of the C string fname into the char variable fi. Another problem with C strings is that the equality operator, ==, compares string locations, not string contents. One of the string compare functions described below must be used to compare string contents.
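Both pitfalls can be guarded against with short checks. The sketch below is one possible approach, not a prescribed one; the helper names sameText and fits are hypothetical.

```cpp
#include <cstddef>   // std::size_t
#include <cstring>   // std::strcmp, std::strlen

// Hypothetical helper: true when two C strings hold the same characters.
// Writing a == b here would compare the two addresses, not the text.
bool sameText(const char* a, const char* b)
{
    return std::strcmp(a, b) == 0;
}

// Hypothetical helper: true when source, plus its terminating null,
// fits in an array of `capacity` chars.
bool fits(const char* source, std::size_t capacity)
{
    return std::strlen(source) + 1 <= capacity;
}
```

Two separate arrays that both contain "Tritt" live at different locations, so == reports them as different while sameText reports them as the same.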
Common C string functions are listed in many C++ textbooks. Particularly useful C string functions are listed in the following table.
| Function | Description |
| --- | --- |
| char* strcpy(char* target, const char* source) | Copies contents of the source string into the target string. The value of target is returned. |
| char* strncpy(char* target, const char* source, size_t n) | Copies at most n characters from source to target. The value of target is returned. |
| char* strcat(char* target, const char* source) | Appends source to target. The value of target is returned. |
| char* strncat(char* target, const char* source, size_t n) | Appends at most the first n characters from source to target. The value of target is returned. |
| int strcmp(const char* s1, const char* s2) | Compares contents of s1 with the contents of s2. Returns 0 if they are the same, less than 0 if s1 is less than s2 or greater than 0 if s1 is greater than s2. |
| int strncmp(const char* s1, const char* s2, size_t n) | Compares up to n characters from s1 with the contents of s2. Returns 0 if they are the same, less than 0 if s1 is less than s2 or greater than 0 if s1 is greater than s2. |
| size_t strlen(const char* s) | Returns the length (number of characters preceding the terminating null character) of s. |
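The functions in the table are usually combined with an explicit size check, since none of them knows how large the target array is. The following is a sketch under that assumption; buildFullName and copyTruncated are hypothetical helpers, not library functions. Note that strncpy does not append a null character when source holds n or more characters, so the second helper adds one itself.

```cpp
#include <cassert>   // assert
#include <cstddef>   // std::size_t
#include <cstring>   // strcpy, strncpy, strcat, strlen

// Hypothetical helper: writes "first last" into target, an array of
// `capacity` chars. Returns false and leaves target empty if it will not fit.
bool buildFullName(char* target, std::size_t capacity,
                   const char* first, const char* last)
{
    if (capacity == 0) return false;
    target[0] = '\0';
    // characters of first + one space + characters of last + terminating null
    std::size_t needed = std::strlen(first) + 1 + std::strlen(last) + 1;
    if (needed > capacity) return false;
    std::strcpy(target, first);   // copy, including the null
    std::strcat(target, " ");     // append a separator
    std::strcat(target, last);    // append the last name
    return true;
}

// Hypothetical helper: copies as much of source as fits and always
// terminates the result.
void copyTruncated(char* target, std::size_t capacity, const char* source)
{
    assert(capacity > 0);
    std::strncpy(target, source, capacity - 1);
    target[capacity - 1] = '\0';  // strncpy may not have written a null
}
```

With a 20-character array, buildFullName stores John Doe for the arguments "John" and "Doe"; with a 5-character array it refuses, because 9 locations are needed.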
The modern way to manipulate textual data in C++ programs is to use the STL string class. The header file <string> must be included to make use of this class. For example: string fullname = "John Doe"; would create the string fullname and store the text John Doe in it.
Individual characters in STL strings can be accessed using the [] operator. For example, fi = fname[0]; would place the first letter of the STL string fname into the char variable fi. STL string operators include: [], =, >>, <<, +, ==, !=, <, <=, > and >=. STL string elements are numbered starting at 0.
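A few of these operators are sketched below. The notes write string unqualified, which assumes a using declaration is in effect; the sketch spells out std::string instead. The helper names makeFullName and firstInitial are hypothetical.

```cpp
#include <cassert>   // assert
#include <string>    // std::string

// Hypothetical helper: joins two strings with the + operator.
std::string makeFullName(const std::string& first, const std::string& last)
{
    std::string full = first + " " + last;   // + concatenates, = assigns
    return full;
}

// Hypothetical helper: returns element 0, the first character.
// Assumes the string is not empty.
char firstInitial(const std::string& name)
{
    assert(!name.empty());
    return name[0];
}
```

The relational operators compare contents, so makeFullName("John", "Doe") == "John Doe" is true without any call to a compare function.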
Particularly useful string class member functions and operators are listed in the following table.
| Function | Description |
| --- | --- |
| | Constructor. Argument size is optional but recommended. Note lower case s. |
| length() | Returns the length of the string. |
| substr(start, size) | Returns the substring starting at location start of length size. |
| insert(n, s) | Inserts a copy of s into string starting at position n. The rest of the original string is shifted right. |
| | Removes characters from position from through position to from the string. Moves the rest of the string to the left. Returns the modified string. |
| find(ss) | Returns the starting position of the first occurrence of substring ss. |
| getline(is, s) | Places next line from is into s. The string extractor (>>) only gets "words". Not actually a member function. |
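The sketch below exercises find, substr and length, and contrasts the extractor with getline. It assumes the standard behavior that find returns std::string::npos when the substring is absent, and it reads from a std::istringstream so that no keyboard input is needed. All three function names are hypothetical.

```cpp
#include <sstream>   // std::istringstream
#include <string>    // std::string, std::getline

// Hypothetical helper: returns the text after the first space,
// or an empty string when there is no space.
std::string lastName(const std::string& fullname)
{
    std::string::size_type space = fullname.find(" ");
    if (space == std::string::npos) return "";
    return fullname.substr(space + 1, fullname.length() - space - 1);
}

// Hypothetical helper: the extractor (>>) stops at whitespace,
// so it yields only the first "word".
std::string firstWord(const std::string& text)
{
    std::istringstream is(text);
    std::string word;
    is >> word;
    return word;
}

// Hypothetical helper: getline reads through the end of the line.
std::string firstLine(const std::string& text)
{
    std::istringstream is(text);
    std::string line;
    std::getline(is, line);
    return line;
}
```

Given the two-line text "John Doe\nJane Roe", firstWord returns John while firstLine returns John Doe.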
For example, fullname = fullname.insert(miloc, mi); would insert the contents of string mi (middle initial) into the string fullname at int location miloc. Because insert modifies fullname in place, the assignment back to fullname is optional.
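A sketch of this insertion follows, assuming the middle initial belongs immediately after the first space in the name. The function addMiddleInitial is hypothetical; mi and miloc are the names used in the text.

```cpp
#include <string>    // std::string

// Hypothetical helper: inserts mi after the first space in fullname.
// Returns fullname unchanged when it contains no space.
std::string addMiddleInitial(std::string fullname, const std::string& mi)
{
    std::string::size_type space = fullname.find(" ");
    if (space == std::string::npos) return fullname;
    std::string::size_type miloc = space + 1;   // position just past the space
    fullname.insert(miloc, mi);                 // the rest shifts right
    return fullname;
}
```

Calling addMiddleInitial("John Doe", "Q. ") yields John Q. Doe.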
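Note that C strings and STL strings are fundamentally different. However, in current versions of C++ the comparison operators (==, >=, etc.) can be used to compare the contents of an STL string with those of a C string or another STL string. When both operands are C strings, these operators still compare locations rather than contents.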
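The mixed comparison can be sketched as follows. The operators look at contents only when at least one operand is an STL string, so the second helper converts one C string first. Both function names are hypothetical.

```cpp
#include <string>    // std::string

// Hypothetical helper: STL string against C string; == compares contents.
bool sameAsCString(const std::string& s, const char* c)
{
    return s == c;
}

// Hypothetical helper: with two C strings, a == b would compare addresses,
// so one side is converted to std::string before comparing.
bool sameCStrings(const char* a, const char* b)
{
    return std::string(a) == b;
}
```

Two separate char arrays that both hold John Doe therefore compare equal through sameCStrings even though they occupy different locations.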