Week 8: Characters and Strings (1)
In the C programming language, characters and strings are the basic building blocks for storing text. Here's a short summary of their basics:
Character (char): In C, a character is stored using the char data type, which occupies exactly one byte of memory (sizeof(char) is 1 by definition). It can represent a single character, like 'A' or 'z', written in single quotes. A char is also a small integer type: each character is stored as its integer code, which on most systems follows the ASCII table.
String: A string in C is a sequence of characters terminated by a null character (\0). This null character marks the end of the string. Strings are stored in contiguous memory locations, so in practice they are character arrays.
String literals
String literals are sequences of characters enclosed in double quotes (" "). They represent constant arrays of characters and include an implicit null terminator ('\0') at the end, which marks the end of the string. There are two common ways to define a string from a literal:
Character Array vs. String Literal: You can define a string as a character array initialized from a string literal. When using a character array, you need to provide space for the null terminator. For example, char str[6] = "hello"; defines a string with 5 characters plus a null terminator.
String and Pointer: Alternatively, you can point directly at a string literal, as in char *str = "hello";. The compiler automatically appends the null terminator to the literal. Strings can therefore be manipulated using pointers: a char pointer (char *) refers to the beginning of a string, and through pointer arithmetic and dereferencing you can iterate over and access individual characters in the string.
Important note on the string literals:
Immutable: String literals are typically stored in read-only sections of memory. Attempting to modify the content of a string literal results in undefined behavior. For instance, char *str = "hello"; str[0] = 'H'; compiles, but the assignment invokes undefined behavior and may crash the program.
Type: The type of a string literal is char [N], where N is the number of characters in the literal including the null terminator. When a string literal is used to initialize a pointer, the array is converted to a pointer to its first character, so the pointer has type char *. Declaring the pointer as const char * lets the compiler catch accidental writes.
Sharing: Compilers may optimize storage of string literals by making identical string literals share the same memory location. This is allowed because string literals must not be modified.
Usage: String literals are used for initializing arrays of characters and pointers to characters, and as arguments to functions that expect strings.
Escape Sequences: String literals can include escape sequences, such as \n (newline), \t (tab), \\ (backslash), \" (double quote), and others, to represent special characters.
Concatenation: Adjacent string literals are automatically concatenated by the C compiler. For example, "Hello, " "world!" is treated as the single string literal "Hello, world!".
Lifetime: A string literal has static storage duration, meaning it exists from program start to program termination.
Integer to represent character
In C, you can use integers to represent characters (functions such as getchar and toupper return a character as an int), because each character has an associated integer value according to the ASCII (American Standard Code for Information Interchange) table. For example, the character 'A' is represented by the integer 65, and 'Z' is represented by 90.
Character-handling library
The character-handling library in C provides a set of functions that are used to classify and transform individual characters. These functions are part of the standard library and are included via the <ctype.h> header file. Here's a brief overview of some of the key functions available in the character-handling library:
Character Classification Functions: These functions check whether a character belongs to a particular category, such as a digit, an alphabetic character, a space, etc. Here are some examples:
isalpha(int c): Checks if the character is an alphabetic character (a-z, A-Z).
isdigit(int c): Checks if the character is a digit (0-9).
isalnum(int c): Checks if the character is an alphanumeric character (either a digit or an alphabetic character).
isspace(int c): Checks if the character is a white-space character (space, tab, newline, etc.).
isupper(int c): Checks if the character is an uppercase letter.
islower(int c): Checks if the character is a lowercase letter.
Character Conversion Functions: These functions are used to convert characters from one form to another:
toupper(int c): Converts a character to its uppercase equivalent if it is a lowercase letter; otherwise, the character is returned unchanged.
tolower(int c): Converts a character to its lowercase equivalent if it is an uppercase letter; otherwise, the character is returned unchanged.
Others: There are additional functions in the <ctype.h> library that provide more specific character checks, such as isxdigit(int c), which checks if a character is a hexadecimal digit, and ispunct(int c), which checks if a character is a punctuation character.
These functions take an int as an argument and return an int as well. For the classification functions, the return value is non-zero (true) if the character meets the specified condition and zero (false) otherwise. It's important to note that these functions expect a value representable as unsigned char, or EOF, as input. Passing any other value, such as a negative char on a platform where char is signed, leads to undefined behavior; cast such a value to unsigned char first.
Playing with character-handling library
ASCII printable characters
The ASCII printable characters are those in the range of decimal values 32 to 126. These characters include the standard English letters, digits, punctuation marks, and a few miscellaneous symbols. Here's a list of the ASCII printable characters categorized by their types:
Space (32): ' '
Punctuation and Special Characters (33-47, 58-64, 91-96, 123-126):
! " # $ % & ' ( ) * + , - . /
: ; < = > ? @
[ \ ] ^ _ `
{ | } ~
Digits (48-57):
0 1 2 3 4 5 6 7 8 9
Uppercase Letters (65-90):
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
Lowercase Letters (97-122):
a b c d e f g h i j k l m n o p q r s t u v w x y z
String-Conversion Functions
String conversion functions convert the text of a number into a numerical value. They are declared in the <stdlib.h> header, and they report both where parsing stopped and whether the value was out of range. Here's a summary of three commonly used string conversion functions: strtod, strtol, and strtoul.
strtod (String to Double):
Prototype: double strtod(const char *str, char **endptr);
Purpose: Converts a string to a double-precision floating-point number (double).
The strtod function converts the initial portion of the string pointed to by str to a double representation.
If endptr is not NULL, a pointer to the character after the last character used in the conversion is stored in the location pointed to by endptr.
If no conversion is performed, zero is returned, and str is stored in the location pointed to by endptr.
Handles various formats, including regular decimal and scientific notation.
strtol (String to Long):
Prototype: long int strtol(const char *str, char **endptr, int base);
Purpose: Converts a string to a long integer.
Converts the initial portion of the string pointed to by str to a long int value according to the given base, which must be between 2 and 36 inclusive, or be the special value 0.
The base specifies the number base for the conversion, allowing for binary, octal, decimal, and hexadecimal conversions. With base 0, the base is inferred from the prefix: 0x for hexadecimal, 0 for octal, otherwise decimal.
If endptr is not NULL, strtol stores the address of the first invalid character in *endptr. If there were no digits at all, str is stored in *endptr.
Reports overflow by returning LONG_MAX or LONG_MIN and setting errno to ERANGE if the converted value is out of range.
strtoul (String to Unsigned Long):
Prototype: unsigned long int strtoul(const char *str, char **endptr, int base);
Purpose: Converts a string to an unsigned long integer.
Similar to strtol, but converts the string to an unsigned long int value.
The base parameter works the same way as in strtol, allowing various numerical bases for the conversion.
Error handling is similar to strtol, including the setting of errno on out-of-range values.
These functions are essential for converting string data to numerical values, particularly when dealing with user input or parsing text files. They provide robust error checking and support a wide range of numerical formats, making them versatile tools for various programming scenarios in C.
When endptr is not NULL in the strtod, strtol, or strtoul functions, it is used to store the address of the first character after the number in the string, allowing you to check where the number conversion ended. This is particularly useful for parsing strings where the number is followed by non-numeric characters.
Standard Input/Output Library Functions
The standard input/output library in C, defined in the <stdio.h> header, provides a variety of functions for character and string manipulation, enabling interactions with the console or files, as well as handling formatted input and output. Here's an overview of the functions covered this week:
getchar:
Prototype: int getchar(void);
getchar reads the next character from the standard input (usually the console). It returns the character read as an unsigned char cast to an int, or EOF on end of file or error.
fgets:
Prototype: char *fgets(char *str, int num, FILE *stream);
fgets reads at most one less than num characters from stream and stores them into the buffer pointed to by str. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.
putchar:
Prototype: int putchar(int c);
putchar writes the character c (converted to an unsigned char) to stdout. It returns the character written as an unsigned char cast to an int, or EOF on error.
puts:
Prototype: int puts(const char *str);
puts writes the string str and a trailing newline to stdout. It returns a non-negative number on success, or EOF on error.
sprintf:
Prototype: int sprintf(char *str, const char *format, ...);
sprintf sends formatted output to the character array pointed to by str. It writes the output under the control of a format string that specifies how subsequent arguments are converted for output, and returns the number of characters written, not counting the terminating null byte. sprintf does not check the size of the destination, so the array must be large enough for the result.
sscanf:
Prototype: int sscanf(const char *str, const char *format, ...);
sscanf reads formatted input from a string. It reads data from str and stores it, according to the format parameter, into the locations pointed to by the additional arguments. The number of successfully filled items is returned.
Example: using getchar
The getchar function in C reads the next available character from the standard input stream (stdin) and returns it as an int. This function is commonly used to read input character by character. A common pattern is to call getchar in a loop until the newline character is encountered, which typically signifies the end of a line of input.
Example: using fgets and putchar
Do you understand how the above program works? Why does the reversed text appear all at once at the end of the program, rather than one character at a time?
The putchar function in C doesn't keep things in memory in the way you might be thinking. Instead, it writes a single character to the standard output, which is typically the console or terminal. The character is displayed immediately or might be buffered by the standard output stream before being displayed, depending on the environment and buffering mode.
Here's a brief explanation:
Buffering: Standard output in C can be fully buffered, line buffered, or unbuffered. If it's line buffered or fully buffered, characters sent to the standard output are stored in a buffer and are not displayed until the buffer is flushed. This flushing can happen when the buffer is full, a newline character is encountered (in line-buffered mode), or when fflush(stdout) is called explicitly. However, in unbuffered mode, characters are displayed immediately.
Memory: When you use putchar, the character is sent to the output stream and not stored in any user-accessible memory location. Once putchar is called, the character is handled by the output stream, and your program doesn't maintain any reference to it.
Direct Output: When putchar is invoked, the character is handed directly to the standard output stream. There's no mechanism provided by putchar itself to recall or retrieve characters once they've been output.
Additional example to understand putchar. Observe carefully how the characters are not displayed immediately but are held until the buffer is flushed or a newline is encountered.
Example: using sprintf
Example: using sscanf
The sscanf function is used to read data from a string according to a specified format. It works similarly to scanf, but instead of reading from standard input, it reads from a given string. This function is very useful for parsing strings to extract data in a structured format.
Homework - Week 8
Adapt the getchar example to perform different actions based on the character read: (i) Convert lowercase letters to uppercase, (ii) Leave uppercase letters as they are, (iii) Replace digits with a '*', (iv) Ignore spaces.
Use random-number generation to create sentences. Your program should use four arrays of pointers to char called article, noun, verb and preposition. Create a sentence by selecting a word at random from each array in the following order: article, noun, verb, preposition, article and noun. The arrays should be filled as follows: The article array should contain the articles "the", "a", "one", "some" and "any"; the noun array should contain the nouns "boy", "girl", "dog", "town" and "car"; the verb array should contain the verbs "drove", "jumped", "ran", "walked" and "skipped"; the preposition array should contain the prepositions "to", "from", "over", "under" and "on". Remember that, following English writing conventions, words should be separated by spaces. The final sentence should start with a capital letter and end with a period. Generate 20 such sentences.