RegExp Analysis

This part of the report is based on running a set of regular expressions against the code, which lets us make statements about several aspects of the code quality. Disclaimer: the numbers found through regular expressions should be considered rough estimates. Most of the time they will be fairly exact, but in some cases they may be off by quite a bit.
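As a minimal sketch of what such a regex-based count could look like (the function name countMatchingLines is hypothetical and not taken from the tool that produced this report), the following counts the lines that contain at least one match of a pattern:

```cpp
#include <cstddef>
#include <regex>
#include <sstream>
#include <string>

// Count the lines of `source` that contain at least one match of `pattern`.
// Counting per line (not per match) mirrors a grep-style line count.
std::size_t countMatchingLines(const std::string& source, const std::regex& pattern) {
    std::istringstream in(source);
    std::string line;
    std::size_t count = 0;
    while (std::getline(in, line)) {
        if (std::regex_search(line, pattern)) {
            ++count;
        }
    }
    return count;
}
```

A pattern such as `\bclass\b` also matches the word inside comments and string literals, which is one reason the resulting numbers are estimates rather than exact counts.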

General Considerations

The project has a total of 123 files and consists of 143846 gross lines of code. The total project size is 4 MByte.

The distribution of files according to their major file type is

Comparing header files to source files, we find that % of the total code (counting LoC) is found in header files. Since header files should only contain declarations, but no implementation, any number above 10% here is suspicious.
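The 10% rule of thumb can be written down as a small calculation. This is only a sketch: the function names are hypothetical, and the line counts are assumed to come from a separate counting step.

```cpp
#include <cstddef>

// Percentage of all lines of code that live in header files.
// Returns 0 for an empty project to avoid dividing by zero.
double headerSharePercent(std::size_t headerLoc, std::size_t sourceLoc) {
    const std::size_t total = headerLoc + sourceLoc;
    if (total == 0) {
        return 0.0;
    }
    return 100.0 * static_cast<double>(headerLoc) / static_cast<double>(total);
}

// Rule of thumb from the text: a header share above 10% is suspicious.
bool headerShareIsSuspicious(std::size_t headerLoc, std::size_t sourceLoc) {
    return headerSharePercent(headerLoc, sourceLoc) > 10.0;
}
```

For example, 200 header lines against 800 source lines give a share of 20%, which would be flagged.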

Next, we can estimate the ratio of C to C++ code, by looking at the code found in C files versus C++ files.

Method C C++ %
Number of files ending with (.h+.c) vs (.hh+.cc+.cpp) %
Lines of Code in files with (.h+.c) vs (.hh+.cc+.cpp) %
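The split used in the table can be sketched as a classification by file extension. The names Language and classifyByExtension are hypothetical. Note the built-in imprecision: a file ending in .h is counted as C here, although C++ projects commonly use .h for their headers as well.

```cpp
#include <cstddef>
#include <string>

enum class Language { C, Cpp, Other };

// Classify a file name by its extension, using the same grouping as the
// table: (.h, .c) counts as C and (.hh, .cc, .cpp) counts as C++.
Language classifyByExtension(const std::string& filename) {
    const std::size_t dot = filename.rfind('.');
    if (dot == std::string::npos) {
        return Language::Other;  // no extension at all
    }
    const std::string ext = filename.substr(dot);
    if (ext == ".h" || ext == ".c") {
        return Language::C;
    }
    if (ext == ".hh" || ext == ".cc" || ext == ".cpp") {
        return Language::Cpp;
    }
    return Language::Other;
}
```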

In general, a project should be either pure C or pure C++. Mixing the two will lead to problems sooner or later. A mix usually indicates that the project is in transition from one to the other (usually C to C++) and that the transition has not been completed. Also, when UML tools are used, the C part of the project is usually not modelled inside the UML tool.

Another interesting question is how many of the 143846 gross lines of code contain really useful information. For this, the following list may be helpful:

For details on file types please see the General Details. For details on statements please see the Statement Details.

Comments

Comments are very important. They help developers (old and new) to understand the code quickly. A rule of thumb says that about 10% to 20% of the code should be developer-written comments (the closer the language is to assembly, the higher this number should be).

That said, it is important to point out that quite a large portion of the comments is auto-generated by IDEs or by tools such as Rational Rose. Hence, to make a qualified statement about how well the code is commented, one needs to separate the auto-generated comments from the human-written ones. This is non-trivial and hard to automate.

The whole project has 143846 gross lines of code. Of those we found that:

Note that for multi-line comments we only counted occurrences of '/*'. Assuming that a multi-line comment spans 3 to 5 lines on average, multiply this number by 3 to 5 to estimate the number of comment lines. The Rose comments are already included in the count of single-line comments ('//').
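The estimate described above can be sketched as follows. The names countOccurrences, CommentEstimate and estimateCommentLines are hypothetical, and plain substring search stands in for whatever expressions the report actually used.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Count non-overlapping occurrences of `token` in `text`.
std::size_t countOccurrences(const std::string& text, const std::string& token) {
    assert(!token.empty());  // an empty token would never advance the search
    std::size_t count = 0;
    std::size_t pos = text.find(token);
    while (pos != std::string::npos) {
        ++count;
        pos = text.find(token, pos + token.size());
    }
    return count;
}

struct CommentEstimate {
    std::size_t singleLine;    // occurrences of "//"
    std::size_t blockOpeners;  // occurrences of "/*"
    std::size_t minLines;      // singleLine + 3 * blockOpeners
    std::size_t maxLines;      // singleLine + 5 * blockOpeners
};

// Apply the 3-to-5-lines assumption to every block comment opener.
CommentEstimate estimateCommentLines(const std::string& source) {
    const std::size_t singleLine = countOccurrences(source, "//");
    const std::size_t blockOpeners = countOccurrences(source, "/*");
    return {singleLine, blockOpeners,
            singleLine + 3 * blockOpeners,
            singleLine + 5 * blockOpeners};
}
```

This simple search also counts '//' inside string literals (for example in a URL), so the result is an upper-leaning estimate rather than an exact figure.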

If a UML-type tool such as Rose has been used, this information can be used to estimate what percentage of the code is really controlled by the tool. Naturally, this percentage should be close to 100%. For hybrid systems in particular, this is often not the case, which can cause problems.

In addition it is wise to check the comments and make sure that

Details on comments can be found in Comment Details.

Copyright and Authorship

Protecting intellectual property is very important for many reasons. For companies this is related to revenue; for the open source community it is important to protect a common goal. In any event, it should always be made clear who owns the copyright and what licensing policy is being used (like GPL). So even if code is open source, it should still be declared as such. Hence, every file should carry a note stating who owns the copyright and what the licensing terms for the file are.

In addition, every file should name all authors who participated in creating it. First, this is important for identifying the copyright holder. Second, in case of questions, it makes it possible to identify the person who may know the answer.

An author who is proud of his or her work will sign it. An author who does not sign it may have several reasons for this, but none of those reasons has a positive connotation. Hence, in my humble opinion, programmers who don't sign their work should not be allowed to program.

For details on copyright and authorship, please see the Author and Copyright Details.

Code Quality

There are many ways to help ensure code quality. Two of the best known are asserts and exception handling via try-catch blocks. Whereas the former may be replaced by extensive unit testing, the latter should never be missing.
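To illustrate the two mechanisms, here is a minimal sketch with hypothetical helper names (parsePercent, parsePercentOr). The exception reports bad input to the caller, while the assert documents an invariant that must hold if the code itself is correct.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Parse a percentage in [0, 100]; throws std::invalid_argument or
// std::out_of_range when the text is not a valid percentage.
int parsePercent(const std::string& text) {
    std::size_t used = 0;
    const int value = std::stoi(text, &used);  // throws on non-numeric input
    if (used != text.size() || value < 0 || value > 100) {
        throw std::invalid_argument("not a percentage: " + text);
    }
    // Invariant: reaching this point with an out-of-range value is a bug.
    assert(value >= 0 && value <= 100);
    return value;
}

// Caller side: the try-catch block turns a failure into a fallback value.
int parsePercentOr(const std::string& text, int fallback) {
    try {
        return parsePercent(text);
    } catch (const std::exception&) {
        return fallback;
    }
}
```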

For details please see the Code Quality Details.

Memory Leaks

Memory leaks can happen in many different ways. Quite often, however, they are related to incorrect use of the keywords 'new' and 'delete' and the functions 'malloc' and 'free'.

The rules of thumb are as follows: in C++ projects, 'malloc' and 'free' should not be used. Furthermore, every object that has been created with a 'new' statement has to be released with a 'delete' statement. So as long as there is no twisted logic (which there shouldn't be), there should be exactly one 'delete' for every 'new'.
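A rough check of this rule can be sketched by counting both keywords and comparing the totals. The names countKeyword and newDeleteImbalance are hypothetical, and the count is only an approximation: it also matches keywords in comments, deleted functions declared with '= delete', and placement new.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

// Count whole-word occurrences of `keyword` in `source`.
std::size_t countKeyword(const std::string& source, const std::string& keyword) {
    const std::regex pattern("\\b" + keyword + "\\b");
    const auto first = std::sregex_iterator(source.begin(), source.end(), pattern);
    const auto last = std::sregex_iterator();
    return static_cast<std::size_t>(std::distance(first, last));
}

// Positive result: more 'new' than 'delete' (a possible leak).
// Negative result: more 'delete' than 'new'. Zero: the counts balance.
long newDeleteImbalance(const std::string& source) {
    return static_cast<long>(countKeyword(source, "new")) -
           static_cast<long>(countKeyword(source, "delete"));
}
```

A balance of zero does not prove the absence of leaks, since a 'delete' may sit on a code path that is never reached. It only shows that the counts do not contradict the rule.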

For details please see the Memory Leak Details.

Preprocessor Directives

Preprocessor directives are fine when used in moderation. When used excessively, however, they are problematic, mostly because the preprocessor rewrites the text before the compiler sees it, so macros bypass the compiler's type and scope checks. For instance, no functionality should ever be implemented in #define directives. If the number of #defines is significantly larger than the number of files, some action usually needs to be taken.
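The following sketch shows a classic problem with functionality implemented in a #define. All names are hypothetical. The macro pastes its argument twice, so an argument with a side effect is evaluated twice, whereas the function evaluates it once.

```cpp
// Function-like macro: pure text substitution without type checking.
// The argument text appears twice in the expansion.
#define SQUARE_MACRO(x) ((x) * (x))

// Type-checked alternative: the argument is evaluated exactly once.
constexpr int square(int x) { return x * x; }

// Helper with a visible side effect, used to expose double evaluation.
int nextValue(int& counter) { return ++counter; }

// Returns how many times nextValue ran when passed through the macro.
int callsThroughMacro() {
    int counter = 0;
    (void)SQUARE_MACRO(nextValue(counter));
    return counter;
}

// Returns how many times nextValue ran when passed through the function.
int callsThroughFunction() {
    int counter = 0;
    (void)square(nextValue(counter));
    return counter;
}
```

Here callsThroughMacro() reports two calls and callsThroughFunction() reports one, even though both call sites look the same to a reader.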

For a detailed listing of each occurrence, please see the Preprocessor Directive Details.

Enum, Typedef, Structs

In this section we look at the use of templates, namespaces, structs, typedefs and enums. Whereas the use of templates and namespaces is encouraged, the use of structs and typedefs is strongly debatable. As for enums, they don't hurt, but having too many of them may make the code less readable.

For details please see the Enum, Typedef and Struct Details.

Classes and Inheritance

C++ is an object-oriented language and should be used as such. Very often one sees a mix of C and C++ code, or a half-hearted transition from C to C++. This leads to many problems and usually results in code that is less reliable.

For details please see the Classes and Inheritance Details.

File IO, System Calls, etc

Both file IO and system calls use valuable operating system resources. Those resources should be used with care. The following shows where, in what context and how often these resources were used. Usually one would want these resources to be managed centrally.

For details please see the File and System Calls Details.