Dealing with Text Data in C++

Copyright C. S. Tritt, Ph.D.

October 11, 2000

 

Dealing with text data is tricky in nearly all programming languages. This is particularly true of C++, due in part to its power and flexibility. There are at least three ways to represent character data in modern C++: as single characters, as null-terminated character arrays (C strings), and as Standard Template Library (STL) strings.

 

Single characters are represented in C++ by the built-in char type. A char variable holds exactly one character. Literal constants of the char type are written in single quotes; for example, 't' is the letter t. The line char letter = 'A'; declares a character variable named letter and initializes it to A. Character values can be compared with the usual relational operators (for example, if (letter == 't')… is valid C++) and can be used as the controlling value of a switch statement.

 

The traditional way to represent multi-character text in C++ is with C strings, which are arrays of characters terminated by a null character ('\0'). A C string variable is declared like any other array: char words[20]; sets aside 20 character storage locations. Literal string constants are written in double quotes; for example, char text[] = "Some words"; creates an array just large enough to hold the literal and its terminating null character. Each element is an ordinary char, so if (name[0] == 'A')… tests the first character of name. The null character is appended automatically to string literals and by library functions such as strcpy and strcat, but not when an array is filled one character at a time. The <cstring> header (formerly <string.h>) must be included to use the C string functions.

 

One disadvantage of character arrays is that nothing prevents a program from storing more characters than the array has room for. The result is undefined behavior; typically the extra characters overwrite whatever memory follows the array, which may include other variables. When calculating the size of a C string, always remember to include space for the null character. Individual characters in a C string can be accessed as array elements using the [] operator; for example, fi = fname[0]; places the first letter of the C string fname into the char variable fi. Another problem with C strings is that the equality operator, ==, compares string locations (addresses), not string contents. One of the string comparison functions described below (strcmp or strncmp) must be used to compare contents.

 

Common C string functions are listed in many C++ textbooks. Particularly useful ones are listed in the following table. (size_t is the unsigned integer type the standard library uses for sizes and counts.)

 

char* strcpy(char* target, const char* source)

Copies the contents of source, including its terminating null character, into target. target must be large enough to hold the copy. Returns target.

char* strncpy(char* target, const char* source, size_t n)

Copies at most n characters from source to target. If source is shorter than n characters, the rest of target is filled with null characters; if it is n characters or longer, no null character is added and target must be terminated manually. Returns target.

char* strcat(char* target, const char* source)

Appends source to the end of target. target must have room for the combined string plus the null character. Returns target.

char* strncat(char* target, const char* source, size_t n)

Appends at most the first n characters of source to target, then adds a terminating null character. Returns target.

int strcmp(const char* s1, const char* s2)

Compares the contents of s1 and s2 character by character. Returns 0 if they are the same, a negative value if s1 sorts before s2, or a positive value if s1 sorts after s2.

int strncmp(const char* s1, const char* s2, size_t n)

Like strcmp, but compares at most the first n characters of s1 and s2. Returns 0, a negative value or a positive value with the same meaning as strcmp.

size_t strlen(const char* s)

Returns the length of s (the number of characters preceding the terminating null character).

 

The modern way to manipulate textual data in C++ programs is to use the STL string class. The header <string> must be included to use this class, whose full name is std::string; the examples here write plain string, which assumes a using namespace std; (or using std::string;) declaration. For example, string fullname = "John Doe"; creates the string fullname and stores the text John Doe in it.

 

Individual characters in STL strings can be accessed using the [] operator, and the elements are numbered starting at 0. For example, fi = fname[0]; places the first letter of the STL string fname into the char variable fi. STL string operators include [], =, +, +=, >>, <<, ==, !=, <, <=, > and >=; unlike the C string case, the comparison operators compare contents.

 

Particularly useful string class member functions, plus one closely related non-member function, are listed in the following table.

 

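string()   and   string(size_t n, char c)

Constructors (note the lower case s). The first creates an empty string; the second creates a string holding n copies of the character c. There is no constructor that takes only a size; to set aside storage in advance without changing the contents, call .reserve(n).

size_t .length()

Returns the number of characters in the string. .size() is equivalent.

string .substr(size_t start, size_t size)

Returns the substring of length size that begins at position start. If size is omitted, or extends past the end of the string, the substring runs to the end of the string.

string& .insert(size_t n, const string& s)

Inserts a copy of s into the string starting at position n. The rest of the original string is shifted right. Returns the modified string.

string& .erase(size_t start, size_t count)

Removes count characters beginning at position start and moves the rest of the string to the left. Note that the second argument is a count, not an ending position. Returns the modified string.

size_t .find(const string& ss)

Returns the starting position of the first occurrence of substring ss, or the special value string::npos if ss does not occur.

getline(istream& is, string& s)

Reads the next line from is into s; the newline is consumed but not stored. Unlike the string extractor (>>), which reads only a single whitespace-delimited "word", getline keeps embedded spaces. Not actually a member function.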

 

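For example, fullname.insert(miloc, mi); inserts the contents of the string mi (middle initial) into the string fullname at position miloc. Because insert modifies fullname in place, the assignment in fullname = fullname.insert(miloc, mi); is unnecessary, though harmless.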

 

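Note that C strings and STL strings are fundamentally different types. However, the comparison operators (==, !=, <, <=, > and >=) are overloaded so that an STL string can be compared with a C string, in either order, and the comparison is by contents. At least one operand must be an STL string; comparing two C strings with == still compares their locations.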