Character string manipulation library

July, 2008

Introduction

This is a string library that is intended to be compatible with the class string library in the C++ standard. My version is for strings of characters of type char only.

It is for people who do not have access to an official version of the string library or wish to use a version without templates.

It follows the standard class string as I understand it, except that a few functions that are relevant only to the template version are omitted, and all the functions involving iterators are omitted.

I use the name String rather than string to prevent conflicts with other string libraries (as in BC 5.0).

The initial version was taken from Tony Hansen's book The C++ answer book, but very little of Tony's code remains.

Permission is granted to use/modify/distribute this. If you distribute it or put it on your web site please include a link to my site. If you distribute a modified version please make it clear which bits are mine and which are yours. I take no responsibility for errors, omissions etc, but please tell me about them.

This library links into my exception package. If you are using a very old compiler, you may need to edit the file include.h to determine whether to use simulated exceptions or compiler supported exceptions or simply to disable exceptions. More information on the exception package is given in the documentation for my matrix library, newmat11.

The package uses a limited form of copy-on-write (see Tony Hansen's book for more details) and also attempts to avoid repeated reallocation of the string storage during a multiple sum. This results in some saving in space and time for some operations at the expense of an increase in the complexity of the program and an increase in the time used by a few operations. As with newmat, it is still an open question whether, or under what circumstances, the extra complexity is really warranted.

This package includes simple functions for manipulating strings and a class for extracting information from the command line.

It also includes class libraries to help format numerical output and to edit ASCII files. They are documented in separate files.

Files in this package

The following files are included in this package

str.h header file for the string library
str.cpp function bodies
str_fns.h header file for string functions
str_fns.cpp string functions bodies
commline.h command line class header
commline.cpp command line bodies
myexcept.h header for the exceptions simulator
myexcept.cpp bodies for the exceptions simulator
include.h options header file (see documentation in newmat11)
strtst.cpp test program
strtst.dat data file used by test program
strtst.txt output from the test program
test_exs.cpp test exceptions
test_exs.txt output from test_exs
readme.txt readme file
string.htm this file
rbd.css style sheet for use with htm files
st_gnu.mak make file for gnu c++
st_cc.mak make file for CC compiler
st_b55.mak make file for Borland C++ 5.5
st_b56.mak make file for Borland C++ 5.6
st_b58.mak make file for Borland C++ 5.8
st_m6.mak make file for Visual C++ version 6 or 7
st_m8.mak make file for Visual C++ version 8
st_i8.mak make file for Intel compiler for Windows, v8,9
st_i10.mak make file for Intel compiler for Windows, v10
st_il8.mak make file for Intel compiler for Linux, v8,9,10
st_ow.mak Make file for Open Watcom compiler
str.lfl library file list for make file generator
st_targ.txt target file for make file generator
format.h header file for format program
format.cpp bodies for format program
formtest.cpp test program for format program
formtest.txt output from test program
format.htm documentation for format program
gstring.h header file for gstring ascii file editor
gstring.cpp bodies for gstring program
liststr.cpp bodies for gstring program
lstst.cpp test program
fox.dat test data file
lstst.dat test data file
lstst.txt output from test program
gstring.htm documentation for gstring program

Testing and getting started

I have tested this program on recent versions of the Borland, Microsoft, Gnu, Intel, Sun and Open Watcom compilers.

You may need to edit include.h - but it will probably work for you as is. See the newmat documentation for more information about editing include.h.

Activate the _STANDARD_ option to use the form of include statements used in standard C++ (automatic for recent versions of Borland, Microsoft, Gnu and Intel compilers).

Activate the use_namespace option to put the string library in namespace RBD_STRING.

The GString library which is included in this package uses nested classes and will not compile under older compilers.

Some CC compilers generate 33 error messages when running the strtst test program. I suspect these are due to a slightly different convention in deleting temporaries and don't matter.

For the indexes, lengths etc I use unsigned int (typedefed to uint). This is instead of size_type in the official package.

You will need to #include files include.h and str.h in your programs that use this package. Don't forget to edit include.h to determine whether exceptions are to be used, simulated or disabled. If you use the simulated exceptions you should turn off the exception capability of a compiler that does support exceptions.

I have included make files for a variety of compilers for compiling the test programs. Make files for some other compilers can be generated using my genmake utility. The file st_targ.txt gives the list of targets for genmake and str.lfl has the list of names of the libraries. See the genmake documentation for more details about the make files.

The public member functions

Static variable

static uint npos String::npos is the largest possible value of uint and is used to indicate that a find function has failed to find its target. All Strings must have length strictly less than String::npos

Constructors, destruction, operator=

String() construct a String of zero length
String(const String&str) copy constructor (not explicitly in standard)
String(const String&str, uint pos, uint n = npos) construct a String from str starting at location pos (first location = 0) and continuing for the length of the String or for n characters, whichever occurs first
String(const char* s, uint n) construct a String from s taking a maximum of n characters or the length of the String
String(const char* s) construct a String from s
String(uint n, char c) construct a String consisting of n copies of the character c
~String() the destructor
String& operator=(const String& str) copy a String (except that it may be able to avoid copying)
String& operator=(const char* s) set a String equal to a c-style character string pointed to by s
String& operator=(const char c) set a String equal to a character
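The constructor forms above can be illustrated with a short sketch. It uses std::string as a stand-in, on the assumption that String mirrors the standard class for these constructors; with this package you would include include.h and str.h and write String instead. The names Str and constructor_examples are hypothetical.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string is assumed to behave like String for these constructors.
typedef std::string Str;

// Build one string with each constructor form and join the results with '|'.
Str constructor_examples()
{
   const Str source("0123456789");
   Str a;                    // zero length
   Str b(source, 3, 4);      // from position 3, at most 4 characters
   Str c(source, 7);         // from position 7 to the end (n defaults to npos)
   Str d("abcdef", 3);       // first 3 characters of a c-string
   Str e(3, 'x');            // 3 copies of 'x'
   return a + "|" + b + "|" + c + "|" + d + "|" + e;
}
```

The positions are counted from 0, so Str(source, 3, 4) picks up the characters 3456.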

Storage control

uint size() const the length of the String (does not include a trailing zero - in most cases there isn't one)
uint length() const same as size
uint max_size() const the maximum size of a String; I have set it to npos-1
void resize(uint n, char c = 0) change the size of a String, either by truncating or filling out with copies of character c (std does default separately)
uint capacity() const the total space allocated for a String (always >= size())
void reserve(uint res_arg = 0) change the capacity of a String to the maximum of res_arg and size(). This may be an increase or a decrease in the capacity.
void clear() erase the contents of the string
bool empty() const true if the String is empty; false otherwise
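As a rough illustration of resize and of the relations between size, length, capacity and empty, here is a sketch using std::string as a stand-in (an assumption; the package uses uint where the standard uses size_type). The helper names fit_to_width and storage_invariants_hold are hypothetical.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string is assumed to behave like String for these functions.
typedef std::string Str;

// Pad or truncate s to exactly width characters.
Str fit_to_width(Str s, unsigned int width, char fill)
{
   s.resize(width, fill);   // truncates if too long, pads with fill if too short
   return s;
}

// Check the relations stated in the list above.
bool storage_invariants_hold(const Str& s)
{
   return s.size() == s.length()
      && s.capacity() >= s.size()
      && s.empty() == (s.size() == 0);
}
```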

Character access

char operator[](uint pos) const return the pos-th character; return 0 if pos = size()
char& operator[](uint pos) return a reference to the pos-th character; undefined if pos>=size() - I throw an exception. This reference may become invalid after almost any manipulation of the String
char at(uint n) const same as operator[] const
char& at(uint n) same as operator[]. Throw an exception if n >= size()
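The following sketch shows the two cases where this package and the standard class agree: the const operator[] returns 0 at position size(), and the non-const at() throws when the position is out of range. It uses std::string as a stand-in, so the exception caught is std::out_of_range; the exception type thrown by this package comes from its own exception package and may differ. The helper names are hypothetical.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Stand-in: std::string is assumed to behave like String in these two cases.
typedef std::string Str;

// Read the character one past the end through the const operator[].
char one_past_end(const Str& s)
{
   return s[s.size()];      // defined for the const version: returns 0
}

// Try to overwrite position pos; report failure instead of propagating it.
bool try_set(Str& s, unsigned int pos, char c)
{
   try { s.at(pos) = c; return true; }
   catch (const std::out_of_range&) { return false; }   // std type; the package's type may differ
}
```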

The editing functions

For conditions under which references and pointers to data are invalidated by these functions see policy on reallocation.

String& operator+=(const String& rhs) append rhs to a String
String& operator+=(const char* s) append the c-string defined by s to a String
String& operator+=(char c) append the character c to a String
String& append(const String& str) append str to a String
String& append(const String& str, uint pos, uint n) append String(str,pos,n)
String& append(const char* s, uint n) append String(s,n)
String& append(const char* s) append String(s)
String& append(uint n, char c) append String(n,c), i.e. n copies of the character c
void push_back(char c) operator+=(c)
String& assign(const String& str) replace the String by str (this function is not explicitly in the standard)
String& assign(const String& str, uint pos, uint n) replace the String by String(str,pos,n)
String& assign(const char* s, uint n) replace the String by String(s, n)
String& assign(const char* s) replace the String by String(s)
String& assign(uint n, char c) replace the String by String(n,c)
String& insert(uint pos1, const String& str) insert str before character pos1
String& insert(uint pos1, const String& str, uint pos2, uint n) insert String(str,pos2,n) before character pos1
String& insert(uint pos, const char* s, uint n = npos) insert String(s,n) before character pos (std does default separately)
String& insert(uint pos, uint n, char c) insert String(n,c) before character pos
String& erase(uint pos = 0, uint n = npos) erase characters starting at pos and continuing for n characters or till the end of the String. This was originally called remove
String& replace(uint pos1, uint n1, const String& str) erase(pos1,n1); insert(pos1,str)
String& replace(uint pos1, uint n1, const String& str, uint pos2, uint n2) erase(pos1,n1); insert(pos1,str,pos2,n2)
String& replace(uint pos, uint n1, const char* s, uint n2 = npos) erase(pos,n1); insert(pos,s,n2); (std does default separately)
String& replace(uint pos, uint n1, uint n2, char c) erase(pos,n1); insert(pos,n2,c)
uint copy(char* s, uint n, uint pos = 0) const copy a maximum of n characters from a string starting at position pos to memory starting at location given by s. Return the number of characters copied. I assume that the program has already allocated space for the characters
void swap(String&) a.swap(b) swaps the contents of Strings a and b. The standard also provides for a function swap(a,b) - see binary operators
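The description of replace as an erase followed by an insert can be checked with a small sketch. It uses std::string as a stand-in (an assumption), and the helper names replace_by_hand, replace_direct and copy_out are hypothetical. The last helper shows that copy does not add a trailing zero, so the caller must supply a buffer with room for n+1 characters if one is wanted.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string is assumed to behave like String for these functions.
typedef std::string Str;

// replace(pos, n, str) written out as the erase/insert pair.
Str replace_by_hand(Str s, unsigned int pos, unsigned int n, const Str& str)
{
   s.erase(pos, n);
   s.insert(pos, str);
   return s;
}

// The same edit using replace directly.
Str replace_direct(Str s, unsigned int pos, unsigned int n, const Str& str)
{
   s.replace(pos, n, str);
   return s;
}

// Copy at most n characters starting at pos into buffer and terminate the result.
// The buffer must have room for n+1 characters.
unsigned int copy_out(const Str& s, char* buffer, unsigned int n, unsigned int pos)
{
   unsigned int copied = static_cast<unsigned int>(s.copy(buffer, n, pos));
   buffer[copied] = 0;      // copy itself does not append a trailing zero
   return copied;
}
```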

Pointer to data

const char* c_str() const return a pointer to the contents of a String after appending (char)0 to the String. This pointer will be invalidated by almost any operation on the String
const char* data() const return a pointer to the contents of a String. This pointer will be invalidated by almost any operation on the String

The find functions

uint find(const String& str, uint pos = 0) const find the first location of str in a String starting at position pos. The location is relative to the beginning of the parent String. Return String::npos if not found
uint find(const char* s, uint pos, uint n) const find(String(s,n),pos)
uint find(const char* s, uint pos = 0) const find(String(s),pos)
uint find(const char c, uint pos = 0) const find(String(1,c),pos)
uint rfind(const String& str, uint pos = npos) const find the last location of str in a String starting at position pos, i.e. begin the search with the first character of str at position pos of the target String. The location is relative to the beginning of the parent String. Return String::npos if not found
uint rfind(const char* s, uint pos, uint n) const rfind(String(s,n),pos)
uint rfind(const char* s, uint pos = npos) const rfind(String(s),pos)
uint rfind(const char c, uint pos = npos) const rfind(String(1,c),pos)
uint find_first_of(const String& str, uint pos = 0) const find the first character, starting at pos, that matches any character in str. Return String::npos if not found
uint find_first_of(const char* s, uint pos, uint n) const find_first_of(String(s,n),pos)
uint find_first_of(const char* s, uint pos = 0) const find_first_of(String(s),pos)
uint find_first_of(const char c, uint pos = 0) const find_first_of(String(1,c),pos)
uint find_last_of(const String& str, uint pos = npos) const find the last character that matches any character in str, starting the search at pos. Return String::npos if not found
uint find_last_of(const char* s, uint pos, uint n) const find_last_of(String(s,n),pos)
uint find_last_of(const char* s, uint pos = npos) const find_last_of(String(s),pos)
uint find_last_of(const char c, uint pos = npos) const find_last_of(String(1,c),pos)
uint find_first_not_of(const String& str, uint pos = 0) const find the first character, starting at pos, that is not in str. Return String::npos if not found
uint find_first_not_of(const char* s, uint pos, uint n) const find_first_not_of(String(s,n),pos)
uint find_first_not_of(const char* s, uint pos = 0) const find_first_not_of(String(s),pos)
uint find_first_not_of(const char c, uint pos = 0) const find_first_not_of(String(1,c),pos)
uint find_last_not_of(const String& str, uint pos = npos) const find the last character that is not in str, starting the search at pos. Return String::npos if not found
uint find_last_not_of(const char* s, uint pos, uint n) const find_last_not_of(String(s,n),pos)
uint find_last_not_of(const char* s, uint pos = npos) const find_last_not_of(String(s),pos)
uint find_last_not_of(const char c, uint pos = npos) const find_last_not_of(String(1,c),pos)
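Two typical uses of the find functions are sketched here: stepping through a string with find and its pos argument, and trimming with find_first_not_of and find_last_not_of. The sketch uses std::string as a stand-in (an assumption; this package returns uint rather than size_type), and the helper names count_occurrences and trim are hypothetical.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string is assumed to behave like String for the find functions.
typedef std::string Str;

// Count non-overlapping occurrences of target in s.
unsigned int count_occurrences(const Str& s, const Str& target)
{
   if (target.empty()) return 0;
   unsigned int count = 0;
   Str::size_type pos = s.find(target);
   while (pos != Str::npos)                       // npos signals "not found"
   {
      ++count;
      pos = s.find(target, pos + target.size());  // resume after the match
   }
   return count;
}

// Strip leading and trailing characters that belong to the set chars.
Str trim(const Str& s, const Str& chars)
{
   Str::size_type first = s.find_first_not_of(chars);
   if (first == Str::npos) return Str();          // nothing but chars
   Str::size_type last = s.find_last_not_of(chars);
   return s.substr(first, last - first + 1);
}
```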

The substring function

String substr(uint pos = 0, uint n = npos) const return String(*this, pos, n)

The compare functions

int compare(const String& str) const a.compare(b) compares a and b in normal sort order. Return -1, 0 or 1
int compare(uint pos, uint n, const String& str) const a.compare(pos,n,b) compares String(a,pos,n) and b in normal sort order. Return -1, 0 or 1
int compare(uint pos1, uint n1, const String& str, uint pos2, uint n2) const a.compare(pos1,n1,b,pos2,n2) compares String(a,pos1,n1) and String(b,pos2,n2) in normal sort order. Return -1, 0 or 1
int compare(const char* s) const return compare(String(s))
int compare(uint pos1, uint n1, const char* s, uint n2 = npos) const return compare(pos1, n1, String(s,n2))
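The standard class only promises a negative, zero or positive result from compare, whereas the functions above return -1, 0 or 1. The following sketch, using std::string as a stand-in (an assumption), reduces the standard result to the same three values so that code written against either library can be compared. The names sign_of, compare_whole and compare_part are hypothetical.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string is used in place of String.
typedef std::string Str;

// Reduce any integer to -1, 0 or 1.
int sign_of(int x) { return (x > 0) - (x < 0); }

// a.compare(b) in normal sort order, normalised to -1, 0 or 1.
int compare_whole(const Str& a, const Str& b)
{
   return sign_of(a.compare(b));
}

// Compare the substring of a given by (pos, n) with b, normalised to -1, 0 or 1.
int compare_part(const Str& a, unsigned int pos, unsigned int n, const Str& b)
{
   return sign_of(a.compare(pos, n, b));
}
```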

The binary String functions

+ means concatenate; otherwise the meanings are obvious.

String operator+(const String& lhs, const String& rhs)
String operator+(const char* lhs, const String& rhs)
String operator+(char lhs, const String& rhs)
String operator+(const String& lhs, const char* rhs)
String operator+(const String& lhs, char rhs)
bool operator==(const String& lhs, const String& rhs)
bool operator==(const char* lhs, const String& rhs)
bool operator==(const String& lhs, const char* rhs)
bool operator!=(const String& lhs, const String& rhs)
bool operator!=(const char* lhs, const String& rhs)
bool operator!=(const String& lhs, const char* rhs)
bool operator<(const String& lhs, const String& rhs)
bool operator<(const char* lhs, const String& rhs)
bool operator<(const String& lhs, const char* rhs)
bool operator>(const String& lhs, const String& rhs)
bool operator>(const char* lhs, const String& rhs)
bool operator>(const String& lhs, const char* rhs)
bool operator<=(const String& lhs, const String& rhs)
bool operator<=(const char* lhs, const String& rhs)
bool operator<=(const String& lhs, const char* rhs)
bool operator>=(const String& lhs, const String& rhs)
bool operator>=(const char* lhs, const String& rhs)
bool operator>=(const String& lhs, const char* rhs)
void swap(const String& A, const String& B)

The stream functions - slightly rough implementation as yet:

istream& operator>>(istream& is, String& str)

   ... read token from istream

ostream& operator<<(ostream& os, const String& str)

   ... output a String

istream& getline(istream& is, String& str, char delim = '\n')

   ... read a line
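A sketch of how the token and line reading functions are typically combined follows. It uses std::string and the standard stream functions as stand-ins; since the stream functions in this package are described as slightly rough, they may not match the standard ones in every detail. The helper name first_tokens is hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Stand-in: std::string and the standard stream functions are used in place of String.
typedef std::string Str;

// Split text into lines, then return the first whitespace-delimited token of each line.
std::vector<Str> first_tokens(const Str& text)
{
   std::vector<Str> result;
   std::istringstream lines(text);
   Str line;
   while (std::getline(lines, line))       // default delimiter is '\n'
   {
      std::istringstream words(line);
      Str token;
      if (words >> token)                  // fails on a blank line, which is skipped
         result.push_back(token);
   }
   return result;
}
```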

The policies

Reallocation policy

This section discusses under what circumstances the String data in a String object will be moved. It is unclear to me what the standard allows. Moving the String data invalidates the const char* returned by .data() and .c_str() and any reference returned by the non-const versions of .at() or operator[] (and any iterators referring to the string).

I describe here what my program does. Another standard String package may (and probably does) follow different rules.

The value returned by .c_str will most likely become invalid under almost any operation on the String which changes the value of the String. Also a call to .c_str will invalidate a const char* returned by .data() and any reference returned by .at() or operator[].

If A is a String that has been assigned a capacity with the reserve function then the following functions will not cause a reallocation (so the value returned by .data() etc. will remain valid)

   A += ...
   A.assign(...)
   A.append(...)
   A.insert(...)
   A.erase(...)
   A.replace(...)

where ... denotes a legitimate argument, providing the resulting String will fit in the assigned capacity (as set by a call to reserve).

If the resulting String will not fit into the assigned capacity the String data will be moved (so the value returned by .data() etc. will not remain valid). Also the String will no longer be regarded as having an assigned capacity.

The concept of having an assigned capacity is important in considering the behaviour of assign, erase and replace when the parameters are such that the length of the String is reduced. For example

   String A = "0123456789";
   A.reserve(1); // will set capacity to A.size() = 10
   const char* d = A.data();
   A.erase(1,9);

will leave a valid value in d whereas

   String A = "0123456789";
   const char* d = A.data();
   A.erase(1,9);

will not leave a valid value in d since the storage of the String data will have been moved.

The operator= does not conform to these rules. A = something will always remove any assigned capacity for A (and will not pick up any capacity from the something).

In this package A.reserve() or A.reserve(0) will remove any assigned capacity, i.e. it will be as though no capacity had ever been assigned. So an erase or a replace that changes a length will cause a reallocation.

But don't expect anyone else's package to follow these rules.
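One way to find out what a particular string package does is to compare the pointer returned by .data() before and after an operation. The sketch below does this with std::string as a stand-in. It assumes the behaviour of common implementations, where an append that fits within a reserved capacity leaves the data in place; the erase behaviour described above is specific to this package and is not reproduced by std::string. The helper name append_keeps_data_in_place is hypothetical.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string, which follows its own reallocation rules.
typedef std::string Str;

// Append piece to s and report whether the character data stayed in the same place.
bool append_keeps_data_in_place(Str& s, const Str& piece)
{
   const char* before = s.data();
   s += piece;
   return s.data() == before;   // compare addresses only; never dereference a stale pointer
}
```

Reserving enough space first, for example with A.reserve(64), and then appending 10 characters to a String of length 10 is the case where the data would be expected to stay put.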

Policy on operator+, operator+= and append

The evaluation of the concatenation expression A+B is delayed until the expression is used or until the value is referred to twice. This means that expressions such as A+B+C are evaluated in one sweep rather than having A+B formed as a temporary before evaluating A+B+C.

Unfortunately, this means that in expressions such as A + c_string the c-string c_string will be converted to a String object, before the overall String is formed. Since c-strings will usually be small I don't see this as a serious problem.

Likewise A+=X or A.append(X) will not be evaluated until the result is used (unless A has been assigned a capacity that is large enough to accommodate X). This means that sequences like

   A += X1;
   A += X2;
   ...

will not cause repeated reallocations of the space used by the String data.
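The idea of delaying a sum can be sketched independently of this package. The class below is a hypothetical illustration, not the mechanism used in str.cpp: it records the operands of a chain of additions and builds the result in one sweep, with a single allocation, only when the value is asked for. It is written with std::string, and the name DelayedSum and its members are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: remembers the operands and evaluates the sum on demand.
class DelayedSum
{
   std::vector<const std::string*> parts;   // operands must outlive this object
public:
   // Record an operand; nothing is copied yet.
   DelayedSum& add(const std::string& s) { parts.push_back(&s); return *this; }

   // Evaluate in one sweep: size the result once, then copy each operand once.
   std::string str() const
   {
      std::size_t total = 0;
      for (std::size_t i = 0; i < parts.size(); ++i) total += parts[i]->size();
      std::string result;
      result.reserve(total);
      for (std::size_t i = 0; i < parts.size(); ++i) result += *parts[i];
      return result;
   }
};
```

Because only pointers are stored, a c-string operand would first have to be converted to a string object that stays alive until the sum is evaluated, which is the same cost noted above for A + c_string.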

String functions

These are a set of simple functions for manipulating strings. You need the header file str_fns.h and body file str_fns.cpp.

String ToString(int i) Convert int to string
String ToString(long i) Convert long to string
String ToString(double f, int ndec = 4) Convert double to string; ndec determines the number of decimal places
void UpperCase(String& S) Convert string to upper case
void LowerCase(String& S) Convert string to lower case
bool IsInt(const String& S) Does a string represent an integer?
bool IsFloat(const String& S) Does a string represent a floating point number (includes integer, does allow for E format)?
inline bool Contains(const String& S, const String& str)
inline bool Contains(const String& S, const char* s)
inline bool Contains(const String& S, char c)
Does S contain str, s or c, respectively?
inline bool ContainsAnyOf(const String& S, const String& str)
inline bool ContainsAnyOf(const String& S, const char* s)
inline bool ContainsAnyOf(const String& S, char c)
Does S contain any of the characters of str, s or c, respectively?
inline bool ContainsOnly(const String& S, const String& str)
inline bool ContainsOnly(const String& S, const char* s)
inline bool ContainsOnly(const String& S, char c)
Does S contain only characters of str, s or c, respectively?
int sf(String& S, const String& s1, const String& s2);
int sl(String& S, const String& s1, const String& s2);
int sa(String& S, const String& s1, const String& s2);
Suppose S contains a copy of s1. The function sf replaces the first copy by s2, sl replaces the last copy and sa replaces all copies. Return the number of changes (0 or 1 for sf and sl).
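The behaviour described for sf, sl and sa can be sketched with find, rfind and replace. The functions below are hypothetical equivalents written with std::string, not the code in str_fns.cpp. In particular, the left-to-right scan that skips over inserted text and the treatment of an empty search string in replace_all are assumptions.

```cpp
#include <cassert>
#include <string>

// Stand-in: std::string is used in place of String.
typedef std::string Str;

// Like sf: replace the first copy of s1 in S by s2; return 0 or 1.
int replace_first(Str& S, const Str& s1, const Str& s2)
{
   Str::size_type pos = S.find(s1);
   if (pos == Str::npos) return 0;
   S.replace(pos, s1.size(), s2);
   return 1;
}

// Like sl: replace the last copy of s1 in S by s2; return 0 or 1.
int replace_last(Str& S, const Str& s1, const Str& s2)
{
   Str::size_type pos = S.rfind(s1);
   if (pos == Str::npos) return 0;
   S.replace(pos, s1.size(), s2);
   return 1;
}

// Like sa: replace every copy of s1 by s2; return the number of changes.
int replace_all(Str& S, const Str& s1, const Str& s2)
{
   if (s1.empty()) return 0;               // assumption: an empty target changes nothing
   int changes = 0;
   Str::size_type pos = S.find(s1);
   while (pos != Str::npos)
   {
      S.replace(pos, s1.size(), s2);
      ++changes;
      pos = S.find(s1, pos + s2.size());   // skip the text just inserted
   }
   return changes;
}
```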

Command line class

This is a simple class for extracting the information from the command line (when you call a program from a text window). See the genmake program as an example. I assume you call your program with a command like

   program -options A B C

where program is the name of the program, options is a sequence of single letter options with no spaces and A B C is a sequence of names separated by spaces.

Start your main program with

   #include "str.h"
   #include "commline.h"
   
   int main(int argc, char** argv)
   {
      CommandLine CL(argc, argv);
      ...

Here are the member functions for the CommandLine class.

CommandLine(int argc, char** argv) Constructor: argc, argv from main(int argc, char** argv)
int argc() Return argc
char** argv() Return argv
String GetArg(int i) Get the i-th name; i=1 for first name after options
String GetOptions() Get option sequence
int NumberOfArgs() Return number of arguments excluding options
bool Options() True if there are options
bool HasOption(const String& s) True if options has any character in s
bool HasOptionCI(const String& s) Case independent version of HasOption
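To make the command format concrete, here is a sketch of how a line of the form program -options A B C might be split up. It is a hypothetical stand-in written with std::string and std::vector, not the CommandLine class itself. It assumes that options are present only when the first argument begins with '-', and the names ParsedCommandLine, parse_command_line and has_option are invented for the example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result of parsing; not the package's CommandLine class.
struct ParsedCommandLine
{
   std::string options;              // option letters without the leading '-'
   std::vector<std::string> args;    // the names that follow the options
};

// Split argc/argv into the option letters and the remaining names.
ParsedCommandLine parse_command_line(int argc, const char* const* argv)
{
   ParsedCommandLine result;
   int first_arg = 1;                             // argv[0] is the program name
   if (argc > 1 && argv[1][0] == '-')
   {
      result.options = argv[1] + 1;               // skip the '-'
      first_arg = 2;
   }
   for (int i = first_arg; i < argc; ++i) result.args.push_back(argv[i]);
   return result;
}

// True if the options include any character in s (compare HasOption above).
bool has_option(const ParsedCommandLine& cl, const std::string& s)
{
   return cl.options.find_first_of(s) != std::string::npos;
}
```

With this layout the first name after the options is args[0], whereas GetArg counts from 1.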

To do list

 

History

August, 1998 changes

September, 1998 changes

July, 2001 changes

January, 2002 changes

April, 2004 changes

June, 2004 changes

May, 2005 changes

April, 2006 changes

July, 2008 changes