Strings

A string is a sequence of characters. In older versions of Python a character was represented by a 7-bit ASCII code. Seven bits allow 128 unique characters, which is enough for the upper and lower case letters of the English alphabet, the digits, and the punctuation marks, with codes left over for the space and for control characters such as tab and end of line. The 7 bits were padded on the left with a single 0 bit, so that each character occupied one byte. Since computer programming is an international activity, the letters of other alphabets had to be included as well. Newer versions of Python represent characters with Unicode, which assigns a numeric code point to each character of the world's writing systems. Python 2 offers this through a separate unicode type, and in Python 3 every string is a Unicode string. ASCII forms a subset of Unicode.

String Creation

A string literal is defined within single or double quotes.

  firstName = 'Alan'
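  lastName = "Turing"

To read string input from the console, use the function raw_input(), which returns the line typed by the user as a string. You can also use the function input(), which evaluates what is typed as a Python expression, so a string must be entered with its quotation marks. In Python 3, raw_input() was renamed input() and the evaluating version was removed.

  name = raw_input ("What is your name? ")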
  print name

  What is your name? Alan Turing
  Alan Turing

  OR

  name = input ("What is your name? ")
  print name

  What is your name? "Alan Turing"
  Alan Turing

String Indexing

You can think of a string as an ordered sequence of characters. The length of a string is given by the built-in function len(). The length is the number of characters in the string, including blank spaces. The index of a character gives its position in the string, where the first character has an index of 0 and the last character has an index of (length - 1). Python also allows negative indexing: the index -1 represents the last character, the index -2 represents the last but one character, and so on. You can use a for loop to iterate through the string one character at a time.

  s = "Hello World"
  print s[8]
  print s[-5]
  print len (s)

  for ch in s:
    print ch,

  r
  W
  11
  H e l l o   W o r l d
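
As a small sketch of how positive and negative indices relate, the snippet below checks that the index -k names the same character as the index len(s) - k. The variable names are chosen here for illustration, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
s = "Hello World"
n = len(s)              # 11 characters, including the space

first = s[0]            # first character
last = s[n - 1]         # last character, the same one as s[-1]

# A negative index -k names the same position as n - k.
pairs_match = all(s[-k] == s[n - k] for k in range(1, n + 1))

print(first)
print(last)
print(pairs_match)
```

An index outside the range -n to n - 1 raises an IndexError.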

Concatenation and Repetition

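The + symbol is the concatenation operator and the * symbol is the repetition operator. Concatenation joins two strings end to end. Repetition joins a string to itself the given number of times. Repetition binds more tightly than concatenation, so each repetition is carried out before the pieces are joined.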

  s = "spam" + "a" + "lot"
  print s

  s = 2 * "spam" + "a" + "lot" * 3
  print s

  spamalot
  spamspamalotlotlot

String Slicing

You can slice a string to extract a substring. Give the starting index and the ending index, separated by a colon, inside square brackets, like so:

  s = "Hello World"
  start = 2
  end = 9
  subStr = s[start:end]

The substring that Python returns contains all the characters from the start index up to, but not including, the character at the end index. If you omit the start index, the substring begins with the first character. If you omit the end index, the substring runs to the end of the original string.
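
The slicing rules above can be sketched as follows. The string and the variable names are illustrative, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
s = "Hello World"

middle = s[2:9]       # characters at indices 2 through 8
head = s[:5]          # start omitted: begins at index 0
tail = s[6:]          # end omitted: runs to the end of the string
last_three = s[-3:]   # negative indices work in slices too

print(middle)
print(head)
print(tail)
print(last_three)
```

Because the end index is excluded, s[:k] + s[k:] always reproduces the original string.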

String Library

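Python provides an extensive set of string functions. Most of them are methods of the string object itself and are called with the dot notation, as in name.upper(). These methods are always available and need no import. The string module holds some additional constants and helper functions. To use those, import the module explicitly at the very beginning of your program:

  import string

The Python Library Reference describes the string methods and the string module in full. The most commonly used methods are listed here.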

Function              Meaning
capitalize()          Returns a copy of the string with only its first character capitalized.
center(width)         Returns a copy of the string centered in a string of length width.
count(sub)            Returns the number of occurrences of substring sub.
endswith(suffix)      Returns True if the string ends with the specified suffix and False otherwise.
find(sub)             Returns the lowest index in the string where the substring sub is found and -1 if it is not found.
isalnum()             Returns True if all the characters are alphanumeric and there is at least one character, and False otherwise.
isalpha()             Returns True if all the characters in the string are alphabetic and there is at least one character, and False otherwise.
isdigit()             Returns True if all the characters in the string are digits and there is at least one character, and False otherwise.
islower()             Returns True if all alphabetic characters are in lower case and there is at least one alphabetic character, and False otherwise.
isspace()             Returns True if there are only white space characters and there is at least one character, and False otherwise.
isupper()             Returns True if all alphabetic characters are in upper case and there is at least one alphabetic character, and False otherwise.
join(seq)             Returns a string that is a concatenation of the elements of the sequence seq, separated by the string on which join is called.
ljust(width)          Returns a string of length width with the original string left justified in it.
lower()               Returns a copy of the string converted to lowercase.
lstrip()              Returns a copy of the string with leading white space characters removed.
replace(old, new)     Returns a copy of the string with all occurrences of the substring old replaced with new.
rfind(sub)            Returns the highest index in the string where substring sub is found and -1 if the substring is not found.
split([sep])          Returns a list of substrings of the string using sep as the delimiter. If sep is omitted, the string is split on runs of white space.
startswith(prefix)    Returns True if the string starts with the prefix and False otherwise.
strip()               Returns a copy of the string with the leading and trailing white space characters removed.
swapcase()            Returns a copy of the string with uppercase characters converted to lower case and vice versa.
upper()               Returns a copy of the string converted to uppercase.
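
A few of the methods from the table, sketched on a made-up string. The variable names are illustrative only, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
title = "  monty python  "

trimmed = title.strip()              # leading and trailing spaces removed
capitalized = trimmed.capitalize()   # only the first character in upper case
words = trimmed.split(" ")           # list of the pieces between spaces
rejoined = "-".join(words)           # pieces glued back together with "-"
position = trimmed.find("python")    # lowest index where "python" starts
missing = trimmed.find("spam")       # -1, because "spam" does not occur
count_y = trimmed.count("y")         # number of occurrences of "y"
centered = "spam".center(10)         # padded with spaces on both sides

print(trimmed)
print(capitalized)
print(words)
print(rejoined)
print(position)
print(missing)
print(count_y)
print("[" + centered + "]")
```

Each call returns a new value and leaves the string it was called on unchanged.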

String Related Functions

Strings are immutable, i.e. once created they cannot be changed. Functions like replace() give the appearance of changing characters in a string, but in reality the original string is untouched and a new copy with the replacements is returned. If the original variable is then assigned the new string, the memory occupied by the old string can be reclaimed by Python once nothing else refers to it.
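
A minimal sketch of what immutability means in practice. The variable names are illustrative, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
original = "spam"
changed = original.replace("s", "S")   # returns a new string

print(original)   # the original string is untouched
print(changed)

# Assigning to a single position of a string is not allowed.
try:
    original[0] = "S"
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(assignment_failed)
```

To keep the modified text under the same name, assign the result back, as in original = original.replace("s", "S").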

Internally, the characters in a string are represented in binary code. Python allows you to get the numerical value of that binary code using the function ord(). It also allows you to convert a valid numerical value to a character using the chr() function.

  print ord ('5')
  print chr (75)

  53
  K

You can also force Python to evaluate a string as if it were an expression by using the eval() function. For example, "2 + 3" is a string. However, eval ("2 + 3") returns the result of the expression 2 + 3, i.e. 5. In the other direction, you can convert a value into a string by using the str() function. To convert the literal floating point number 3.14 into a string you write str (3.14).
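
The conversions described above can be sketched together as follows. The variable names are illustrative, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
code = ord("K")          # numerical value of the character "K"
letter = chr(code)       # back from the number to the character

result = eval("2 + 3")   # the string is evaluated as an expression
text = str(3.14)         # the number is converted to a string

# eval() runs whatever code the string contains, so it should only
# be given strings that come from a trusted source.
print(code)
print(letter)
print(result)
print(text)
```

Note that result is an integer while text is a string, so result + 1 works but text + 1 raises a TypeError.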

String Formats

The % symbol in arithmetic operations represents the modulo or remainder operation. The % operator is also used to indicate the format in which an output is going to be printed out. The general syntax for formatted output is as follows:

  format-string % (val1, ..., valn)

The format-string contains not only the specifications for each of the variables but also any additional output that you would like to add. The placement and number of format specifications must match the order and number of variables that you wish to print.
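
As a sketch of how specifiers are matched with values, the snippet below formats three made-up values in order and then shows what happens when the counts do not agree. All names and values are illustrative, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
item = "spam"
quantity = 3
price = 2.5

# Three specifiers, three values, matched left to right.
line = "%d cans of %s cost %.2f each." % (quantity, item, price)
print(line)

# Two specifiers but only one value: Python raises a TypeError.
try:
    "%d cans of %s" % (quantity,)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True

print(mismatch_detected)
```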

The general form of a format specifier is:

  %[flags][width][.precision]type

Each format specifier must start with the % sign, followed by optional flags, an optional width, and an optional precision that begins with a period. The type is not optional.

The width is the minimum amount of space, measured in number of characters, that will be allocated to print out a value. If the width given is smaller than needed, Python expands it to use just the right amount of space, so when you do not know the size of the value to be printed you can omit the width or give it a value of 0. When the width is larger than the number of characters to be printed, the value is right justified. To fill the empty space on the left with zeroes, add a 0 flag before the width. To left justify, place a minus sign as a flag before the width.
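
The effect of the width and of the 0 and - flags can be sketched as follows. The square brackets are added here only to make the padding visible, the variable names are illustrative, and print is written with parentheses so that the snippet runs under both Python 2 and Python 3.

```python
x = 42

default = "[%d]" % x       # no width: exactly as wide as needed
right = "[%6d]" % x        # width 6: right justified, padded with spaces
zero = "[%06d]" % x        # 0 flag: padded with zeroes instead
left = "[%-6d]" % x        # - flag: left justified
too_narrow = "[%1d]" % x   # width too small: expanded to fit

print(default)
print(right)
print(zero)
print(left)
print(too_narrow)
```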

The flags are:

Flag   Meaning
#      Value conversion will use the alternate form (for example, a leading 0x for hexadecimal).
0      Zero padded
-      Left justified
+      The sign character (+ or -) will precede the value

The conversion types are:

Type     Meaning
d or i   Signed integer decimal
o        Signed octal
u        Obsolete, identical to d
x or X   Signed hexadecimal (lower / upper case)
e or E   Floating point exponential format (lower / upper case)
f or F   Floating point decimal format
g or G   Floating point format. Uses exponential format if the exponent is less than -4 or not less than the precision, and decimal format otherwise
c        Single character (accepts an integer code or a one-character string)
s        String

Formatting Examples

>>> x = 1234

>>> print "The value of x is %4d." % (x)
The value of x is 1234.

>>> print "The value of x is %+-6d." % (x)
The value of x is +1234 .

>>> print "The value of x is %4o in octal." % (x)
The value of x is 2322 in octal.

>>> print "The value of x is %0x in hex." % (x)
The value of x is 4d2 in hex.

>>> pi = 3.14159265358979323846

>>> print "pi is %0.4f" % (pi)
pi is 3.1416

>>> print "pi is %0.4e" % (pi)
pi is 3.1416e+00

>>> print "pi is %0.7g" % (pi)
pi is 3.141593

>>> sigma = 5.67051e-05   # assumed value, the original does not define sigma
>>> print "sigma is %0.4g" % (sigma)
sigma is 5.671e-05

>>> ch = 97
>>> print "The character is \"%c\"" % (ch)
The character is "a"

>>> lumberJack = "Michael Palin"
>>> print "%s gave one rendition of the Lumberjack Song." % (lumberJack)
Michael Palin gave one rendition of the Lumberjack Song.