Week 8: Characters and Strings (1)

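In the C programming language, characters and strings are the basic building blocks for storing text. C has a built-in character type but no dedicated string type; a string is a convention built on top of character arrays. Here's a short summary of the basics: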

  1. Character (char): In C, a character is stored using the char data type, which always occupies exactly one byte (sizeof(char) is 1 by definition). It can represent a single character, like 'A' or 'z', written in single quotes. Because char is a small integer type, it actually stores the numeric code of the character, which on most systems follows the ASCII table.

  2. String: A string in C is essentially a sequence of characters terminated by a null character (\0). This null character indicates the end of the string. In memory, strings are stored in contiguous memory locations, effectively making them character arrays.

String literals

String literals are sequences of characters enclosed in double quotes (" "). They represent constant arrays of characters and include an implicit null terminator ('\0') that marks the end of the string. There are two common ways to define a string from a literal:

  • Character Array vs. String Literal: You can define a string using a character array or a string literal. When using a character array, you need to explicitly provide space for the null terminator. For example, char str[6] = "hello"; defines a string with 5 characters plus a null terminator.

  • String and Pointer: Alternatively, when you use a string literal like char *str = "hello";, the compiler automatically appends the null terminator. As you see, strings can be manipulated using pointers. You can use a char pointer (char *) to refer to the beginning of a string. Through pointer arithmetic and dereferencing, you can iterate over and access individual characters in the string.

Important notes on string literals:

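  1. Immutable: String literals are typically stored in read-only sections of memory. Attempting to modify the content of a string literal results in undefined behavior. For instance, char *str = "hello"; str[0] = 'H'; compiles, but it must not be done and may crash at run time. Declaring the pointer as const char * lets the compiler catch such mistakes.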

  2. Type: The type of a string literal is char [N], where N is the number of characters in the literal including the null terminator. However, when a string literal is used to initialize a pointer, the pointer is of type char *.

  3. Sharing: Compilers may optimize storage of string literals by making identical string literals share the same memory location. This is allowed because string literals are immutable.

  4. Usage: String literals are used for initializing arrays of characters and pointers to characters, and as arguments to functions that expect strings.

  5. Escape Sequences: String literals can include escape sequences, such as \n (newline), \t (tab), \\ (backslash), \" (double quote), and others, to represent special characters.

  6. Concatenation: Adjacent string literals are automatically concatenated by the C compiler. For example, "Hello, " "world!" is treated as a single string literal "Hello, world!".

  7. Lifetime: The lifetime of a string literal is the entire execution of the program, meaning they exist from program start to program termination.

#include <stdio.h>

int main() {
    const char *greeting = "Hello, World!\n";
    printf("%s", greeting);
    return 0;
}

Integer to represent character

In C, you can use integers to represent characters, because each character has an associated integer value according to the ASCII (American Standard Code for Information Interchange) table. For example, the character 'A' is represented by the integer 65, and 'Z' is represented by 90.

#include <stdio.h>

int main() {
    for (char ch = 'A'; ch <= 'Z'; ch++) {
        printf("The integer representation of %c is: %d\n", ch, (int)ch);
    }
    return 0;
}

Character-handling library

The character-handling library in C provides a set of functions that are used to classify and transform individual characters. These functions are part of the standard library and are included via the <ctype.h> header file. Here's a brief overview of some of the key functions available in the character-handling library:

  1. Character Classification Functions: These functions check whether a character belongs to a particular category, such as a digit, an alphabetic character, a space, etc. Here are some examples:

    • isalpha(int c): Checks if the character is an alphabetic character (a-z, A-Z).

    • isdigit(int c): Checks if the character is a digit (0-9).

    • isalnum(int c): Checks if the character is an alphanumeric character (either a digit or an alphabetic character).

    • isspace(int c): Checks if the character is a white-space character (space, tab, newline, etc.).

    • isupper(int c): Checks if the character is an uppercase letter.

    • islower(int c): Checks if the character is a lowercase letter.

  2. Character Conversion Functions: These functions are used to convert characters from one form to another:

    • toupper(int c): Converts a character to its uppercase equivalent if it is a lowercase letter; otherwise, the character is returned unchanged.

    • tolower(int c): Converts a character to its lowercase equivalent if it is an uppercase letter; otherwise, the character is returned unchanged.

  3. Others: There are additional functions in the <ctype.h> library that provide more specific character checks, such as isxdigit(int c), which checks if a character is a hexadecimal digit, and ispunct(int c), which checks if a character is a punctuation character.

These functions typically take an int as an argument and return an int as well. The return value is usually non-zero (true) if the character meets the specified condition and zero (false) otherwise. It's important to note that these functions expect an unsigned char value or EOF as an input. Passing a signed char value that is not representable as unsigned char or any value outside of unsigned char and EOF can lead to undefined behavior.

Playing with character-handling library

#include <stdio.h>
#include <ctype.h>

int main() {
    char ch1 = 'A';
    char ch2 = '3';
    char ch3 = 'b';
    char ch4 = ' ';
    char ch5 = 'G';
    char ch6 = 'h';

    // Using isalpha - checks if the character is alphabetic
    if (isalpha(ch1)) {
        printf("%c is an alphabetic character.\n", ch1);
    } else {
        printf("%c is not an alphabetic character.\n", ch1);
    }

    // Using isdigit - checks if the character is a digit
    if (isdigit(ch2)) {
        printf("%c is a digit.\n", ch2);
    } else {
        printf("%c is not a digit.\n", ch2);
    }

    // Using isalnum - checks if the character is alphanumeric (either alphabetic or digit)
    if (isalnum(ch3)) {
        printf("%c is an alphanumeric character.\n", ch3);
    } else {
        printf("%c is not an alphanumeric character.\n", ch3);
    }

    // Using isspace - checks if the character is a white-space character
    if (isspace(ch4)) {
        printf("The character is a white-space character.\n");
    } else {
        printf("The character is not a white-space character.\n");
    }

    // Using isupper - checks if the character is uppercase
    if (isupper(ch5)) {
        printf("%c is an uppercase character.\n", ch5);
    } else {
        printf("%c is not an uppercase character.\n", ch5);
    }

    // Using islower - checks if the character is lowercase
    if (islower(ch6)) {
        printf("%c is a lowercase character.\n", ch6);
    } else {
        printf("%c is not a lowercase character.\n", ch6);
    }

    return 0;
}

ASCII printable characters

The ASCII printable characters are those in the range of decimal values 32 to 126. These characters include the standard English letters, digits, punctuation marks, and a few miscellaneous symbols. Here's a list of the ASCII printable characters categorized by their types:

  1. Space (32):

    • Space: ' '

  2. Punctuation and Special Characters (33-47, 58-64, 91-96, 123-126):

    • ! " # $ % & ' ( ) * + , - . /

    • : ; < = > ? @

    • [ \ ] ^ _ `

    • { | } ~

  3. Digits (48-57):

    • 0 1 2 3 4 5 6 7 8 9

  4. Uppercase Letters (65-90):

    • A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

  5. Lowercase Letters (97-122):

    • a b c d e f g h i j k l m n o p q r s t u v w x y z

#include <stdio.h>
#include <ctype.h>

int main() {
    char ch = 122;

    // Cast char to unsigned char when passing it to the function
    if (isalpha((unsigned char)ch)) {
        printf("%c is an alphabetic character.\n", ch);
    } else {
        printf("%c is not an alphabetic character.\n", ch);
    }

    return 0;
}

String-Conversion Functions

String conversion functions are used to convert string data into numerical values. These functions are part of the C standard library, and they provide a robust way to parse numbers from strings, handling various formats and error conditions gracefully. Here's a summary of three commonly used string conversion functions: strtod, strtol, and strtoul.

  1. strtod (String to Double):

    • Prototype: double strtod(const char *str, char **endptr);

    • Purpose: Converts a string to a double-precision floating-point number (double).

    • The strtod function converts the initial portion of the string pointed to by str to a double representation.

    • If endptr is not NULL, a pointer to the character after the last character used in the conversion is stored in the location pointed to by endptr.

    • If no conversion is performed, zero is returned, and str is stored in the location pointed to by endptr.

    • Handles various formats, including regular decimal and scientific notation.

  2. strtol (String to Long):

    • Prototype: long int strtol(const char *str, char **endptr, int base);

    • Purpose: Converts a string to a long integer.

    • Converts the initial portion of the string pointed to by str to a long int value according to the given base, which must be between 2 and 36 inclusive, or be the special value 0.

    • The base specifies the number base for the conversion, allowing for binary, octal, decimal, and hexadecimal conversions.

    • If endptr is not NULL, strtol stores the address of the first invalid character in *endptr. If there were no digits at all, str is stored in *endptr.

    • Provides detailed error handling, including setting errno to ERANGE if the value converted is out of range.

  3. strtoul (String to Unsigned Long):

    • Prototype: unsigned long int strtoul(const char *str, char **endptr, int base);

    • Purpose: Converts a string to an unsigned long integer.

    • Similar to strtol, but converts the string to an unsigned long int value.

    • The base parameter works the same way as in strtol, allowing various numerical bases for the conversion.

    • Error handling is similar to strtol, including the setting of errno on out-of-range values.

These functions are essential for converting string data to numerical values, particularly when dealing with user input or parsing text files. They provide robust error checking and support a wide range of numerical formats, making them versatile tools for various programming scenarios in C.

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>

int main() {
    const char *doubleStr = "23.45e6";
    const char *longStr = "-1234567890";
    const char *ulongStr = "9876543210";
    char *endPtr;

    // Convert string to double
    double dValue = strtod(doubleStr, &endPtr);
    printf("String '%s' converted to double: %f\n", doubleStr, dValue);

    // Convert string to long
    long lValue = strtol(longStr, &endPtr, 10);
    printf("String '%s' converted to long: %ld\n", longStr, lValue);

    // Convert string to unsigned long
    unsigned long ulValue = strtoul(ulongStr, &endPtr, 10);
    printf("String '%s' converted to unsigned long: %lu\n", ulongStr, ulValue);

    return 0;
}

When endptr is not NULL in the strtod, strtol, or strtoul functions, it is used to store the address of the first character after the number in the string, allowing you to check where the number conversion ended. This is particularly useful for parsing strings where the number is followed by non-numeric characters.

#include <stdio.h>
#include <stdlib.h>

int main() {
    const char *str = "123.45abc";
    char *endPtr;

    // Convert string to double
    double dValue = strtod(str, &endPtr);
    printf("String '%s' converted to double: %f\n", str, dValue);
    printf("Non-numeric part: %s\n", endPtr);

    // Convert another string to long with base 16 (hexadecimal)
    const char *hexStr = "7f3xyz";
    long lValue = strtol(hexStr, &endPtr, 16);
    printf("String '%s' converted to long (base 16): %ld\n", hexStr, lValue);
    printf("Non-numeric part: %s\n", endPtr);

    // Convert string to unsigned long
    const char *uLongStr = "4294967295next";
    unsigned long ulValue = strtoul(uLongStr, &endPtr, 10);
    printf("String '%s' converted to unsigned long: %lu\n", uLongStr, ulValue);
    printf("Non-numeric part: %s\n", endPtr);

    return 0;
}

Standard Input/Output Library Functions

The standard input/output library in C, defined in the <stdio.h> header, provides a variety of functions for character and string manipulation, enabling interactions with the console or files, as well as handling formatted input and output. Here's an overview of some commonly used functions:

  1. getchar:

    • Prototype: int getchar(void);

    • getchar is used to read the next character from the standard input (usually the console). It returns the character read as an unsigned char cast to an int or EOF on end of file or error.

  2. fgets:

    • Prototype: char *fgets(char *str, int num, FILE *stream);

    • fgets reads in at most one less than num characters from stream and stores them into the buffer pointed to by str. Reading stops after an EOF or a newline. If a newline is read, it is stored into the buffer. A terminating null byte ('\0') is stored after the last character in the buffer.

  3. putchar:

    • Prototype: int putchar(int c);

    • putchar writes the character c (converted to an unsigned char) to stdout. It returns the character written as an unsigned char cast to an int or EOF on error.

  4. puts:

    • Prototype: int puts(const char *str);

    • puts writes the string str and a trailing newline to stdout. It returns a non-negative number on success, or EOF on error.

  5. sprintf:

    • Prototype: int sprintf(char *str, const char *format, ...);

    • sprintf sends formatted output to a string pointed to by str. It writes the output under the control of a format string that specifies how subsequent arguments are converted for output.

  6. sscanf:

    • Prototype: int sscanf(const char *str, const char *format, ...);

    • sscanf reads formatted input from a string. It reads data from str and stores them according to the parameter format into the locations pointed by the additional arguments. The number of successfully filled items is returned.

Example: using getchar

The getchar function in C reads the next available character from the standard input stream (stdin) and returns it as an int. This function is commonly used to read input character by character. Here's a simple example demonstrating how to use getchar to read characters until the newline character is encountered, which typically signifies the end of input:

#include <stdio.h>

int main() {
    int c;
    printf("Enter some text (press Enter to finish): ");

    while ((c = getchar()) != '\n' && c != EOF) {
        putchar(c);  // Echo the input back to the output
    }

    printf("\nYou entered the above text.\n");
    return 0;
}

Example: using fgets and putchar

// Using functions fgets and putchar
#include <stdio.h>
#define SIZE 80

void reverse(const char * const sPtr);

int main(void) {
    char sentence[SIZE] = "";
    puts("Enter a line of text:");
    fgets(sentence, SIZE, stdin); // read a line of text
    printf("\n%s", "The line printed backward is:\n"); 
    reverse(sentence);
    puts("");
}

// recursively outputs characters in string in reverse order
void reverse(const char * const sPtr) {
    // if end of the string
    if ('\0' == sPtr[0]) { // base case
        printf("N\n");
        return; 
    }   
    else {
        printf("pointer points to = %p; pointer address = %p\n", (void *)sPtr, (void *)&sPtr);
        reverse(&sPtr[1]);
        putchar(sPtr[0]);
    }
}

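Do you understand how the above program works? Why does the reversed text show up all together at the end of the program, rather than one character at a time?

The main reason is the recursion itself: each call to reverse first calls itself on the rest of the string and only calls putchar after that recursive call has returned. No character is printed until the base case is reached, and the characters then come out in reverse order as the calls return. In addition, putchar does not keep things in memory in the way you might be thinking. It writes a single character to the standard output, which is typically the console or terminal. The character is displayed immediately or might be buffered by the standard output stream before being displayed, depending on the environment and buffering mode.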

Here's a brief explanation:

  1. Buffering: Standard output in C can be fully buffered, line buffered, or unbuffered. If it's line buffered or fully buffered, characters sent to the standard output are stored in a buffer and are not displayed until the buffer is flushed. This flushing can happen when the buffer is full, a newline character is encountered (in line-buffered mode), or when fflush(stdout) is called explicitly. However, in unbuffered mode, characters are displayed immediately.

  2. Memory: When you use putchar, the character is sent to the output stream and not stored in any user-accessible memory location. Once putchar is called, the character is handled by the output stream, and your program doesn't maintain any reference to it.

  3. Direct Output: When putchar is invoked, the character is sent directly to the standard output. There's no mechanism provided by putchar itself to recall or retrieve characters once they've been output.

Additional example to understand putchar. Observe carefully how the characters are not displayed immediately but are held until the buffer is flushed or a newline is encountered.

#include <stdio.h>
#include <unistd.h> // For sleep function

int main() {
    printf("Printing without newline: ");
    fflush(stdout);  // Flush to ensure "Printing without newline: " is displayed before sleeping
    
    putchar('H');
    putchar('e');
    putchar('l');
    putchar('l');
    putchar('o');

    sleep(5); // Wait for 5 seconds
    fflush(stdout);  // Flush so that "Hello" is displayed now, before the next sleep
    
    sleep(5); // Wait for 5 seconds
    // Now let's end the line and see the output
    putchar('\n');

    printf("End of program.\n");
    sleep(5); // Wait for 5 seconds
    return 0;
}

Example: using sprintf

#include <stdio.h>

int main() {
    char buffer[100];
    int number = 25;
    float decimal = 93.5;
    const char *string = "formatted string";  // string literals should not be modified

    // Use sprintf to format data into the buffer
    sprintf(buffer, "Integer: %d, Float: %.2f, String: %s", number, decimal, string);

    // Print the formatted string
    printf("Formatted string: %s\n", buffer);

    return 0;
}

Example: using sscanf

The sscanf function is used to read data from a string according to a specified format. It works similarly to scanf, but instead of reading from standard input, it reads from a given string. This function is very useful for parsing strings to extract data in a structured format.

#include <stdio.h>

int main() {
    char info[] = "123 456.789 Hello";
    int integerValue;
    float floatValue;
    char stringValue[50];

    // Parse the string; %49s limits the word to what fits in stringValue
    sscanf(info, "%d %f %49s", &integerValue, &floatValue, stringValue);

    // Display the extracted values
    printf("Integer: %d\n", integerValue);
    printf("Float: %f\n", floatValue);
    printf("String: %s\n", stringValue);

    return 0;
}

Homework - Week 8

  1. Adapt the getchar example to perform different actions based on the character read: (i) Convert lowercase letters to uppercase, (ii) Leave uppercase letters as they are, (iii) Replace digits with a '*', (iv) Ignore spaces.

  2. Use random-number generation to create sentences. Your program should use four arrays of pointers to char called article, noun, verb and preposition. Create a sentence by selecting a word at random from each array in the following order: article, noun, verb, preposition, article and noun. The arrays should be filled as follows: The article array should contain the articles "the", "a", "one", "some" and "any"; the noun array should contain the nouns "boy", "girl", "dog", "town" and "car"; the verb array should contain the verbs "drove", "jumped", "ran", "walked" and "skipped"; the preposition array should contain the prepositions "to", "from", "over", "under" and "on". Remember that, to follow English writing rules, words should be separated by spaces. The final sentence should start with a capital letter and end with a period. Generate 20 such sentences.
