------------------------------------------------------------------------------ 1.0 INTRODUCTION 1.1 Identification This document is Copyright 1991, Expert Solutions Australia Pty. Ltd. Permission is hereby granted for copying of this document (or parts thereof) provided it is not for commercial gain and provided ESA is acknowldedged as its source. 1.2 Introduction This document is a guide to what is considered good style for C code. Style covers general rules to be applied when designing and writing code, and actually can be applied to any programming language. The document is structured into a number of general headings, and within each heading is a list of recommendations. 1.3 Referenced Documents C Coding Standard, ESA Internal Document, 1990. 1.4 Philosophy and Objectives The suggestions in this document cover how code should be designed and written, as opposed to how the syntactic elements should be laid out, which is the scope of the C Coding Standard. A company standard style has the advantage that the intent and workings of a piece of code are easier to grasp because the way things are done are familiar. The recommendations in this document are a distillation of the experience of ESA staff, summarized to provide a suggested style with reasons why it is considered 'a good thing'. 1.5 Definition of Terms The definitions used in the C Coding Standard are used in this document. 2.0 Comments 2.1 General Sooner or later, and more likely sooner, someone is going to have to maintain your code. They will need to understand what it is trying to do, how it is doing it, and why it is doing it the way it is - so document these decisions in the code as you make them. 2.2 Inline Comments vs. Summary Comments Inline comments are small comments in front of the piece of code they are describing. They make that piece of code easier to understand. Summary comments are large comments at the start of the function they are describing. They make the function as a whole easier to understand. These should be descriptive - an explanation in English of what the function is trying to do and how it is doing it. Consistently use one or the other (or even use both, though this may increase the work maintaining the code, although descriptive summary comments should not need changing unless the design of the function changes). 3.0 Automatic Variables 3.1 Initialization Avoid initializing an automatic variable where it is defined. Instead, initialize it closer to where the decision is made about what its value will be, or where it is used. This makes it easier to see that you have considered all paths through the code, or that the variable is initialized correctly, e.g. NO YES int max = 0; int max; : : : max = 0; for (i = 0; i < n; i++) for (i = 0; i < n; i++) { { if (vec [i] > max) if (vec [i] > max) { { max = vec [i]; max = vec [i]; } } } } In the second example, it is more obvious to someone looking at the for loop (which may be a long way from the definition of max) that max is set to a minimum value. 3.2 Error Code / Return Value Avoid initializing a return value variable to an obscure default value for the reasons given above, and also because it may happen that it is not set explicitly when it should be - it can be hard to determine that you have correctly covered all the possible cases, e.g. NO YES Status status = NO_ENT; Status status; if (find_name(label)) if (find_name(label)) { { : : status = OK; status = OK; } } else /* not found */ { status = NO_ENT; } The second example more obviously covers both cases correctly. 4.0 Control Flow 4.1 Nested if statements Use nested if statements if there are alternative actions (i.e. there is something in the else clause), or if something done by a successful evaluation of the condition has to be undone. Don't use nested if statements if there are actions only in the if clause, e.g. NO status = delta_create((Callback) NULL, &delta); if (status == NDB_OK) { status = delta_record_condition(delta, ...); if (status == NDB_OK) { status = delta_field_condition(delta, ...); if (status == NDB_OK) { status = delta_field_condition(...); if (status == NDB_OK) { status = delta_commit(delta, ...); } } } (VOID) ndb_destroy_delta(delta); } return status; YES status = delta_create((Callback) NULL, &delta); if (status == NDB_OK) { if ( (status = delta_record_condition(...)) == NDB_OK) && (status = delta_field_condition(...)) == NDB_OK) && (status = delta_field_condition(...)) == NDB_OK)) { status = delta_commit(delta, ...); } (VOID) ndb_destroy_delta(delta); } 4.2 Multiple Returns vs One Return Having a single return statement at the end of a function has the advantage that there is a single, known point which is passed through at the termination of execution of the function. It forces you to write structured code, using if-then-else statements 'properly'. It does mean that the code can become deeply indented, but if this becomes a problem, you probably should break the function into a number of sub-functions. Avoid using multiple returns if the code can be just as clearly written with one return. If you think that the multiple return version would be superior, it may be tolerated, e.g. NO for (i = 0; i < max; i++) { if (vec [i] == key) { return TRUE; } } return FALSE; YES found = FALSE; for (i = 0; i < max && !found; i++) { if (vec [i] == key) { found = TRUE; } } return found; The latter has the advantage that if we later on decide that there is more to do after the search, we just put it between the for loop and the return, whereas with the former we have to modify the for loop somehow. 4.3 Break statements For the same reasons as given above, avoid using break statements where boolean variables can be used, e.g. the second example above could be written to break out of the loop when a match is found, and the test i < max used instead of the variable found - this would be less clear and so harder to maintain. 5.0 Pointers 5.1 NULL NULL is the null, or invalid, pointer. You can assume it is #defined to the literal value 0, although some machines (none we use regularly) define it to be ((char*) 0) - they are brain-dead. 0 is the only literal pointer value, and its internal representation may not be the same as the integer literal 0 (although it is on the machines we use regularly). So, for the compiler to use the right representation, it must know that the 0 is to be treated as a pointer - this is usually achieved by assigning it to, or comparing it with, a pointer variable. Implicit conversion to a pointer does not, and cannot occur (in a pre-ANSI compiler) - this is why you must cast NULL to a pointer when it is passed as a function argument (unless you know the code is intended only for ANSI compliant compilers and you have correctly prototyped the function being called). o Do not cast NULL when it is being assigned to, or compared with, a pointer - the compiler will get it right, and you may not. o Do cast NULL when it is being passed as a function argument - the non-ANSI compiler won't get it right, so you'd better. o Do not use 0 as the literal null pointer - the compiler will get it right, but make it obvious to us mere mortals by using NULL. o Do not use NULL as a literal integer - it may be defined to something other than 0, and this is very confusing anyway. o Do not do an implicit pointer comparison with 0 - the compiler will get it right, but make it obvious by comparing it against NULL. o Do not do an implicit integer comparison with 0 - the compiler will get it right, but make it obvious by comparing it against 0. * Note that by "implicit comparison with 0" I mean "if (i) ..." and "if (!i) ...". If i is not intended to be a boolean, it can be difficult working out what the test is really trying to do (it is not glaringly obvious that "if (!i) ..." is true when i is zero), and it indicates that you are not handling all the cases properly (do you mean `i > 0' or `i < 0' or `i != 0'). 6.0 Data Structures 6.1 Using struct A data structure defines the information about some thing (aka an object). All information about an object should be grouped into a struct to obviate the fact that they relate to the same thing, e.g. NO YES GLOBAL Int width; GLOBAL struct GLOBAL Int height; { Int width; Int height; } rectangle; This can often be omitted when there is just one instance of the thing - for automatic variables, it is often best to not do it, but variables with global or file scope should be turned into structs. The utility of this approach is clearer when there is an array of such objects, e.g. NO YES GLOBAL Int width [100]; GLOBAL struct GLOBAL Int height [100]; { Int width; Int height; } rectangle [100]; With the former, we don't know for sure (without inspecting the code) that the arrays must have the same size - conceptually we have n widths and n heights, which as far as we know are unrelated entities. With the latter, conceptually we have n rectangles, each of which has a width and a height. Whenever such a struct is deemed necessary, you will usually find that the new type should also have its own set of methods to manipulate it. These are invariably better made explicit and separate from any other type's methods (in other words, each struct should have its own set of manipulation routines). 6.2 Global Data Program global data should be avoided at all costs since it cannot be protected against change. Every data item in a program should belong to some `thing', and it is that thing's responsibility to maintain it. Rather than making a datum global, it is far better to provide a function to access it, since the internal details of the datum can be hidden, it is protected against uncontrolled change, and if necessary, it can be abstracted such that the function provides an 'ideal' view of the data while internally it is different, e.g. NO EXTERN Colour colour_table []; : rectangle_draw(x, y, width, height, colour_table[c]); : YES GLOBAL Colour colour_table(int colour); : rectangle_draw(x, y, width, height, colour_table(c)); : With the former, a later change which involves calling some X function to determine the Colour for an int means that any code that uses colour_table has to be modified, whereas with the former, only the function colour_table has to be modified. This principle should also be applied to file global data - it is invariably better to provide a functional interface (possibly using macros) to file global data as it abstracts them better, making it easier to later change their representation. 7.0 Functions 7.1 Function vs. Procedure A function is a callable unit which returns a value. A procedure is a callable unit which doesn't return a value (except perhaps an error or status code). The C language only has functions since all units return a value (even though that value's type may be void or unused). In general, C functions should be differentiated into procedures and functions (I will use italics here to make it clear that I don't mean a C function [bad luck for you ASCII readers!]): o A procedure should do something to (i.e. affect the state of, or change) one or more of its arguments (usually its first argument, which it is considered a method of). Successive calls to a procedure will generally have different effects. A procedure should return a status code indicating that it has succeeded or failed in 'doing something'. o A function should not do anything to any of its arguments - it should simply return some indication of the state of its argument(s). Successive calls to a function should produce exactly the same results (i.e. it should be idempotent). A function should return whatever is required to indicate the state being interrogated. The rationale for this is to focus procedures and functions on doing one thing - this makes them more reusable, and easier to write, understand, and maintain. Complex functions which do multiple things are required (e.g. generating a report involves a lot of different activities), but it is better to have these as a third class of [usually] non-reusable functions, e.g. NO el1 = (El ement *) list_delete_current(list); : el2 = (Element *) list_advance_to_next(list); YES el1 = (Element *) list_current_element(list); (VOID) list_delete_current(list); /* assume will not fail */ : (VOID) list_advance_to_next(list); /* assume will not fail */ el2 = (Element *) list_current_element(list); 7.2 Size of functions Functions should not be too large. This is deliberately not quantified, because any set of figures would be somewhat arbitrary. Do not make functions so large that 'keeping it all in your head' becomes too difficult. 7.3 Function Description The purpose of a header description of a function is to provide enough information to fully describe its intent (i.e. what it does), so allowing someone to use it without having to read the body of the function. It should describe: o the arguments it accepts, including, o type o acceptable values or ranges of values, o whether they are input, input+output, or output, o what it is intended to do, o in English and without any of the details of how it does it, o under what pre-conditions it promises to do this, o including temporal conditions (e.g. must be called immediately after object_do_something()), and state conditions (e.g. both list_before_first() and list_after_last() must be false), o what post-conditions will apply once the function has finished, o generally, these will be state conditions, o what the input+output and output arguments will be set to, o covering all possible cases, including when they will not be set, o if it is a function, what it returns, o covering all possible cases. Note that a function must handle all inputs which are listed as acceptable, and should check all pre-conditions using ASSERT() which will do the checking in development code but should do nothing in production code. It is a 'good thing' if the post-conditions are also checked with ASSERT(). Neither the pre-conditions nor post-conditions must be checked with if statements - this is the caller's responsibility, not the function's. 7.4 Reentrant Functions Using structs for all related data, passing a pointer to a struct as a function argument, and using no static data makes the code reentrant. This should be done where possible because of the event driven nature of our software. 8.0 Miscellaneous 8.1 Setting Memory There are a number of ways of setting memory in C: o Direct assignment, including structure assignment, which should be used whenever possible as the compiler will work out how much to copy. o The str... functions, which should be used to copy strings (i.e. arrays of characters). Note that str_ncpy is preferred over strncpy as it always terminates the destination with a NUL. NUL terminated strings are preferred because sizes are not required. o The bcopy and bzero routines (memcpy and memset on System V) which should be used to copy any other blocks of memory (e.g. opaque structs). These are the least preferable ways of setting memory because an explicit size is required. Initializing blocks of memory biases the design toward using 0 as an initial value (which is not always for the best) and puts a magic number (i.e. 0) in the design. Use explicit initialization of fields instead - defensive programming says to initialize everything, so initialize data with no obvious initial value to an invalid value (e.g. NULL for pointers). 8.2 Allocating and Freeing Memory If a module allocates memory for some object, it should also be the one to free it. Similarly, if a module simply fills in a given block of memory, it should not be the one to free the block. A corollary to this is that chunks of allocated memory should not be handed over by one module to another (except for modules specifically built to do this - e.g. a heap allocation module). This is poor practice since: o it means that two different modules share the data, making them interdependent and introducing potential for confusion and bugs; o it means that both modules need to understand the structure of the block of memory. A better solution is for some unique identifier to be generated for a block of memory and that identifier be used by all modules (other than the owning module) to reference a particular object instance. Note that this recommendation should not be followed religiously since it is relatively inefficient, but it does have a few advantages: o The identifiers are not specific to a process so can be saved in databases, unless an address which is meaningless to any other process. o Any memory or data structure corruption problems can be almost guaranteed to be in the owning module since no other module uses the address of the affected object(s). NO YES addr = object_new(); id = object_new(); addr->field= value; object_set_field(id, value); object_action(addr); object_action(id); utility_function(... utility_function(... addr, id, ...); ...); free(addr); object_destroy(id); addr may be inadvertently resued now, probably causing some awful memory allocation bug that is a nightmare to track down. id can also be reused, but the object routines will complain about it no longer being valid. ------------------------------------------------------------------------------- Matt Atterbury [matt@bacchus.esa.oz.au] Expert Solutions Australia, Melbourne UUCP: ...!uunet!munnari!matt@bacchus.esa.oz.au "klaatu barada nikto" or: ...!uunet!murtoa!bacchus.esa.oz.au!matt "consider this a divorce" ARPA: matt%bacchus.esa.oz.AU@uunet.UU.NET "life? don't talk to me about life!"