C and related programming languages

Hayo Thielecke www.cs.bham.ac.uk/~hxt

Who should take this course

There are a number of reasons for learning about C:

  1. C and C++ are widely used in industry
  2. Systems-level programming is done in C
  3. iPhone and iPad apps are written in Objective-C, which is essentially C with an OO layer on top
  4. C programming gives you a much better understanding of what really goes on when a program runs, and whether it is efficient
  5. C is important in computer security due to vulnerabilities such as buffer overflows and double free
  6. Programming C can be fun if you like solving puzzles. With pointers you can write very small programs that do non-trivial things.

What this course is not

This course is not one of those “C in 7 Days by Cutting and Pasting” books. Not all C constructs will be covered, let alone all of C++. Instead, C will be seen in a wider computer science context.

                                                   

What you need to know

The minimum prerequisite for this course is some basic imperative programming. For instance, you should know how to write the factorial function via recursion or a loop.

For simple programs of this kind, Java and C are almost identical, so we will not cover this material again.

Syllabus

  1. Strings, arrays and pointers; string operations and buffer overflow
  2. Pointers and linked lists; doubly-linked lists
  3. Trees in C: struct and union, enum and switch
  4. Memory management; stack vs heap allocation
  5. Writing a simple memory allocator
  6. Polymorphism, structures and function pointers
  7. Language design and differences between C/C++ and Java
  8. C++ classes and objects
  9. C++ templates
  10. Object orientation in C++ versus Objective-C (optional, time permitting)

Introduction

In this course, we first concentrate on C before moving on to some of C++. C gives us the ability to write code close to the hardware, as needed in systems programming. C++ adds features for structuring large programs. As C is (essentially) a subset of C++, what is said about C also applies to C++. We will concentrate on code that you cannot write in Java.

The themes of this course are:

  1. pointers
  2. polymorphism
  3. parsimonious memory usage
  4. paranoia about buffer overflows

C syntax has become part of the pop culture, so much so that almost everyone can write bad C code. Parts of the syntax of C are such that an obfuscated C contest has been running for many years. We will mostly avoid going into any dark corners of the syntax.

C was never intended for beginners and works best when you understand how the compiler works.

Our first C program is Hello World, which your IDE may generate for you automatically:

#include <stdio.h>      /* printf etc */

int main(int argc, char * argv[])
{
    printf("Hello again, World!\n");
    return 0;
}

At first sight, the code looks very much as it would in Java. We have a main function that takes the parameters from the command line as an array called argv. But if we look more closely at how this array is passed, we see differences from Java. In fact, this is our first example of pointers in C, and pointers will be one of the main topics of this course.

The type constructor for pointers is written as *.

Pointers are not integers. If p is a pointer, p + 10 makes sense (it points 10 elements further on), but p + p does not.
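As a small sketch of this point: adding an integer to a pointer moves it by whole elements, and subtracting two pointers into the same array gives an element count, whereas adding two pointers is rejected by the compiler. The function names nth and distance are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>   /* NULL, ptrdiff_t */

/* p + n advances p by n elements, not by n bytes. */
int nth(int *p, int n)
{
    assert(p != NULL);
    return *(p + n);          /* same as p[n] */
}

/* Subtracting two pointers into the same array yields an element count. */
ptrdiff_t distance(int *from, int *to)
{
    return to - from;
}

/* int *bad(int *p, int *q) { return p + q; }    does not compile */
```

For an array int a[5], nth(a, 3) reads a[3], and distance(&a[1], &a[4]) is 3 regardless of how many bytes an int occupies.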

Here is a function that uses pointers for copying strings.

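void stringcopy(char * to, char * from, int len)
{
    int i = 0;

    /* copy at most len characters, stopping early at the end of from */
    while ((i < len) && *from) {
        *to++ = *from++;
        i++;
    }
    *to = '\0';    /* to must have room for len + 1 characters */
}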

String copying without bounds checks is a classic cause of buffer overflows.
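The following sketch shows how the len parameter can protect a fixed-size buffer, assuming the loop counter is advanced for every character copied. The helper copy_into_small_buffer and the buffer size of 8 are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <string.h>   /* strlen, NULL */

/* Copies at most len characters and then a terminating '\0',
   so the destination must have room for len + 1 chars. */
void stringcopy(char *to, char *from, int len)
{
    int i = 0;
    assert(to != NULL && from != NULL);
    while ((i < len) && *from) {
        *to++ = *from++;
        i++;
    }
    *to = '\0';
}

/* Hypothetical helper: copies src into an 8-byte buffer, truncating
   if necessary, and returns the length of what was stored. */
int copy_into_small_buffer(char *src)
{
    char buf[8];
    stringcopy(buf, src, (int)sizeof buf - 1);   /* leave room for '\0' */
    return (int)strlen(buf);
}
```

Passing sizeof buf instead of sizeof buf - 1 would let the final '\0' be written one byte past the end of buf, which is exactly the kind of off-by-one error that leads to an overflow.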

Stack vs heap

We can create pointers into the stack using the address operator &. Specifically, this is used to achieve call-by-reference in functions such as scanf. To use pointers into the stack safely, we need to be aware of how the call stack works.
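A minimal sketch of call-by-reference with the address operator, in the same style as scanf("%d", &n). The functions swap and swap_demo are illustrative names, not part of any library.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

/* The callee receives addresses, so it can modify the caller's variables. */
void swap(int *a, int *b)
{
    int tmp;
    assert(a != NULL && b != NULL);
    tmp = *a;
    *a = *b;
    *b = tmp;
}

/* x and y live in this function's stack frame; &x and &y point into it.
   Those pointers are valid only until swap_demo returns. */
int swap_demo(void)
{
    int x = 1, y = 2;
    swap(&x, &y);
    return x * 10 + y;    /* 21 after the swap */
}

/* Unsafe: returning &x from swap_demo would hand out a pointer into a
   stack frame that no longer exists once the function has returned. */
```

The pointers passed to swap are safe because the caller's frame outlives the call. The danger arises only when a pointer into a frame escapes after that frame has been popped.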

Doubly linked lists

Doubly linked lists are widely used in systems programming (such as the Linux kernel and malloc). They have the advantage that the list can be manipulated very efficiently, for instance removing a node from the middle of the list.

The use of a void pointer gives us a certain amount of polymorphism.

struct doublylinked
{
    void * data;
    struct doublylinked * prev;
    struct doublylinked * next;
};

This is also an example of a recursive type. The structure tag sets up the recursion.
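To illustrate why removal from the middle is cheap, here is a sketch of inserting and unlinking nodes. It assumes a list whose ends are marked by NULL pointers (real implementations such as the Linux kernel's use a circular layout instead), and the function names insert_after and unlink_node are hypothetical.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

struct doublylinked
{
    void *data;
    struct doublylinked *prev;
    struct doublylinked *next;
};

/* Links node in directly after pos. */
void insert_after(struct doublylinked *pos, struct doublylinked *node)
{
    assert(pos != NULL && node != NULL);
    node->prev = pos;
    node->next = pos->next;
    if (pos->next != NULL)
        pos->next->prev = node;
    pos->next = node;
}

/* Unlinks node in constant time; no traversal from the head is needed.
   The node itself is not freed. */
void unlink_node(struct doublylinked *node)
{
    assert(node != NULL);
    if (node->prev != NULL)
        node->prev->next = node->next;
    if (node->next != NULL)
        node->next->prev = node->prev;
    node->prev = NULL;
    node->next = NULL;
}
```

Because data is a void pointer, the same two functions work whether the nodes carry integers, strings or anything else. The price is that the compiler no longer checks what data points to.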

Trees from structures and unions

The syntax of types is one of the less polished parts of C, perhaps because C evolved from the untyped language B and acquired types as an afterthought. In modern languages like CAML and Haskell, there is a clean way to build up expressions for complex types from simpler ones. By contrast, C uses a curiously inside-out syntax that takes some getting used to.

In C, postfix operators bind more tightly than prefix. This matters for type syntax:

char *argv[] for an array of pointers

char *f() for a function returning a pointer

char (*f)() for a pointer to function
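The three forms can be tried out in a small sketch. All names here (names, second, initial, fp) are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

/* Array of pointers: [] binds more tightly than *, so names is an
   array of 3 elements, each of which is a pointer to char. */
char *names[3] = { "ada", "bob", "cy" };

/* Function returning a pointer: () binds more tightly than *. */
char *second(void)
{
    return names[1];
}

/* An ordinary function returning a char. */
char initial(void)
{
    return names[0][0];
}

/* Pointer to function: the parentheses force * to apply to fp first. */
char (*fp)(void) = initial;

/* Calls whatever function fp currently points to. */
char call_fp(void)
{
    assert(fp != NULL);
    return fp();
}
```

Reading a declaration from the name outwards, postfix operators first, gives the type: fp is a pointer (because of the parentheses) to a function returning char.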

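Structures and unions are declared with the same syntax but mean different things. For members { string s; float f; }:

struct holds string AND float

union holds string OR float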

(6.7.2.1) struct-or-union-specifier:

    struct-or-union identifier_opt { struct-declaration-list }
    struct-or-union identifier

(6.7.2.1) struct-or-union:

    struct
    union

Here the identifier is the tag that names the structure or union. The whole phrase forms a type specifier, which can be used in the same places as a primitive type such as int.

Various forms of trees are ubiquitous in programming. A leading example is parse trees for grammars. In functional languages, we use data types to build trees. In Java, we have to tie ourselves in knots and use the Composite pattern. In C, there is an idiom of using structures and unions, together with enum, to construct trees.
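One possible shape of this idiom is sketched below for arithmetic expression trees. The enum acts as a tag saying which member of the union is in use, and a switch on the tag drives the recursion. The type and function names (kind, expr, init_const, init_binary, eval) are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

enum kind { CONSTANT, PLUS, TIMES };

struct expr
{
    enum kind tag;                 /* tells us which union member is valid */
    union
    {
        int value;                                    /* CONSTANT */
        struct { struct expr *left, *right; } args;   /* PLUS, TIMES */
    } u;
};

/* Fills in a caller-provided node as a leaf. */
void init_const(struct expr *e, int value)
{
    assert(e != NULL);
    e->tag = CONSTANT;
    e->u.value = value;
}

/* Fills in a caller-provided node as an inner node with two children. */
void init_binary(struct expr *e, enum kind tag,
                 struct expr *left, struct expr *right)
{
    assert(e != NULL && left != NULL && right != NULL);
    assert(tag == PLUS || tag == TIMES);
    e->tag = tag;
    e->u.args.left = left;
    e->u.args.right = right;
}

/* Evaluates the tree by case analysis on the tag. */
int eval(struct expr *e)
{
    assert(e != NULL);
    switch (e->tag) {
    case CONSTANT: return e->u.value;
    case PLUS:     return eval(e->u.args.left) + eval(e->u.args.right);
    case TIMES:    return eval(e->u.args.left) * eval(e->u.args.right);
    }
    return 0;   /* not reached for a well-formed tree */
}
```

The nodes are supplied by the caller here to keep the sketch short; a real parser would more likely allocate them on the heap. Nothing stops code from reading the wrong union member, so keeping the tag and the union in step is the programmer's job.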

Objects in C from structures and function pointers

C is flexible enough that we can write object-oriented code, even if it is laborious. We can write C code using structures and arrays of function pointers that closely follows the C++ object model.
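A rough sketch of the idea: each "class" has one array of function pointers, and every object carries a pointer to the array of its class, much as a C++ object with virtual functions carries a pointer to a table. The shape example and all names in it are hypothetical, and a real C++ compiler's layout may differ in detail.

```c
#include <assert.h>
#include <stddef.h>   /* NULL */

struct shape;                                /* forward declaration */
typedef int (*method)(struct shape *self);   /* type of every method slot */

enum { AREA, SIDES, NUM_METHODS };           /* indices into the table */

struct shape
{
    method *vptr;     /* points to the method table of the object's class */
    int width;
    int height;
};

int rect_area(struct shape *self)  { return self->width * self->height; }
int rect_sides(struct shape *self) { (void)self; return 4; }
int tri_area(struct shape *self)   { return self->width * self->height / 2; }
int tri_sides(struct shape *self)  { (void)self; return 3; }

/* One table per class, shared by all objects of that class. */
method rect_methods[NUM_METHODS] = { rect_area, rect_sides };
method tri_methods[NUM_METHODS]  = { tri_area,  tri_sides };

/* Plays the role of a constructor. */
void init_shape(struct shape *s, method *class_methods, int width, int height)
{
    assert(s != NULL && class_methods != NULL);
    s->vptr = class_methods;
    s->width = width;
    s->height = height;
}

/* Dynamic dispatch: the function that runs depends on the object's table. */
int call(struct shape *obj, int slot)
{
    assert(obj != NULL && slot >= 0 && slot < NUM_METHODS);
    return obj->vptr[slot](obj);
}
```

Code that calls call(obj, AREA) does not need to know whether obj is a rectangle or a triangle, which is the essence of a virtual function call.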

Objects and classes in C++

We will focus mainly on the use of virtual functions in C++ for polymorphism.

Java classes are very closely based on those in C++.

Inheritance has a bright future behind it. C++ was never as fundamentalist about objects as Java. It retained struct and used templates for containers.

C++ has multiple inheritance, whereas Java only allows a single base class. Multiple inheritance is approximated in Java by interfaces.

C++ makes a distinction between public and private inheritance.

Templates in C++

Templates give us the ability to write code that is parametric in a type. This is similar to the polymorphism in Haskell or ML. A leading application for templates is writing type-safe container classes.

Features that were intentionally not covered

  1. operator overloading in C++
  2. the C preprocessor, other than #include
  3. const specifier

Recommended links and books

Bjarne Stroustrup: Foundations of C++

http://www.stroustrup.com/ETAPS-corrected-draft.pdf

Draft ISO standard for C:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

Draft ISO standard for C++11:

http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2011/n3242.pdf

Brian W. Kernighan and Dennis M. Ritchie: The C Programming Language. Prentice Hall

James O. Coplien: Advanced C++ Programming Styles and Idioms. Addison Wesley

 

Robert C. Seacord: Secure Coding in C and C++. Addison Wesley