RegExp Analysis

This part of the report is based on running different regular expressions against the code. This way we can make statements about the following aspects of the code quality:

General Information
Preprocessor Directives
File IO, System Calls, etc
Enum, Typedef, Structs
Author and Copyright
Comments
Code Quality
Classes and Inheritance

Disclaimer: Some of the numbers found through regular expressions should be considered rough estimates. Most of the time they will be fairly exact, but in some cases they may be off by quite a bit.

General Considerations

The project has a total of 123 files and consists of 143846 gross lines of code (all lines, including comments and empty lines). The total project size is 4 MByte.

The distribution of files according to their major file type is:

29949 lines in 36 .h files
64097 lines in 38 .cpp files

Comparing header files to source files, the counts above put roughly 21% of all lines in the project (29949 of 143846), or roughly 32% of the lines in .h and .cpp files, in header files. Since header files should only contain declarations, but no implementation, any number above 10% here is suspicious.
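The header share can be recomputed from the line counts listed above. The following is a minimal sketch of that arithmetic; the variable and function names are illustrative and not part of the analysis tool, and both possible denominators are shown because the report does not say which one it uses.

```python
# Line counts copied from the report above.
HEADER_LINES = 29949   # lines in .h files
SOURCE_LINES = 64097   # lines in .cpp files
TOTAL_LINES = 143846   # gross lines in the whole project


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


# Headers relative to every line in the project.
header_share_of_project = share(HEADER_LINES, TOTAL_LINES)
# Headers relative to the .h and .cpp files only.
header_share_of_cpp_code = share(HEADER_LINES, HEADER_LINES + SOURCE_LINES)

print(f"headers vs. whole project: {header_share_of_project:.1f}%")
print(f"headers vs. .h + .cpp:     {header_share_of_cpp_code:.1f}%")
```

Either way the value is well above the 10% threshold mentioned above.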

Next, we can estimate the ratio of C to C++ code, by looking at the code found in C files versus C++ files.

Method	C	C++	%
Number of files ending with (.h+.c) vs (.hh+.cc+.cpp)			%
Lines of Code in files with (.h+.c) vs (.hh+.cc+.cpp)			%

In general, a project should be either pure C or pure C++; any mixing will lead to problems sooner or later. A mix usually indicates that the project is in transition from one to the other (usually C to C++) and that the transition has not been completed. Also, when UML tools are used, the C part of the project is usually not modelled inside the UML tool.

Another interesting question is how many of the 143846 gross lines of code contain really useful information. The following list may be helpful:

39514 lines contain statements (';')
13782 lines contain single line comments ('//')
1379 lines contain multi line comments ('/*')
3692 lines contain preprocessor statements ('#')
15080 lines are empty
7570 lines contain only an opening block statement ({)
7567 lines contain only a closing block statement (})

For details on file types please see the General Details. For details on statements please see the Statement Details.
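To illustrate how such per-line counts can be obtained, here is a minimal sketch. The patterns are assumptions that approximate the categories in the list above; the expressions actually used for this report are not shown in it and may differ (for example, the preprocessor pattern here only matches '#' at the start of a line).

```python
import re
from collections import Counter

# Hypothetical patterns approximating the categories listed above.
PATTERNS = {
    "statement": re.compile(r";"),
    "single_line_comment": re.compile(r"//"),
    "multi_line_comment": re.compile(r"/\*"),
    "preprocessor": re.compile(r"^\s*#"),
    "empty": re.compile(r"^\s*$"),
    "open_block_only": re.compile(r"^\s*\{\s*$"),
    "close_block_only": re.compile(r"^\s*\}\s*$"),
}


def classify_lines(source: str) -> Counter:
    """Count how many lines match each pattern; one line may match several."""
    counts = Counter()
    for line in source.splitlines():
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


SAMPLE = """#include <cstdio>

// print a greeting
int main()
{
    std::puts("hi"); /* inline note */
    return 0;
}
"""

counts = classify_lines(SAMPLE)
for name in PATTERNS:
    print(f"{counts[name]} lines match {name}")
```

Because a line can fall into several categories (a statement followed by a comment, for instance), the category counts do not need to add up to the gross line count.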

Comments

Comments are very important. They help developers (old and new) to understand the code quickly. A rule of thumb says that about 10% to 20% of the code should be developer-written comments (the closer the language is to assembly, the higher this number should be).

That being said, it is important to point out that quite a large portion of the comments is often auto-generated by IDEs or by tools such as Rational Rose. Hence, to make a qualified statement about how well the code is commented, one needs to separate the auto-generated comments from the human-written ones. This is non-trivial and hard to automate.

The whole project has 143846 gross lines of code. Of those we found that:

Rose comments ('//##') occurred 1 time
regular comments ('//') occurred 13782 times
multi-line comments ('/*...*/') occurred 1379 times

Note that for multi-line comments we only counted the occurrences of '/*'. Assuming that multi-line comments span 3 to 5 lines on average, multiply this number by 3 to 5 to estimate the lines they cover. The Rose comments are already included in the single-line comment count ('//').
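The estimate described above can be worked through with the numbers from this report. This is a sketch only: the names are illustrative, the 3 to 5 line assumption is the one stated above, and auto-generated comments are not separated out.

```python
# Counts copied from the report above.
TOTAL_LINES = 143846
SINGLE_LINE_COMMENTS = 13782   # occurrences of '//'
MULTI_LINE_OPENERS = 1379      # occurrences of '/*'


def comment_share(lines_per_block: int) -> float:
    """Estimated comment lines as a percentage of all lines,
    assuming each multi-line comment spans lines_per_block lines."""
    estimated = SINGLE_LINE_COMMENTS + MULTI_LINE_OPENERS * lines_per_block
    return 100.0 * estimated / TOTAL_LINES


low = comment_share(3)    # multi-line comments average 3 lines
high = comment_share(5)   # multi-line comments average 5 lines
print(f"estimated comment share: {low:.1f}% to {high:.1f}%")
```

Under these assumptions the estimate falls inside the 10% to 20% rule of thumb, although a '//' that trails a statement on the same line is counted as a full comment line here.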

If a UML-type tool such as Rose has been used, this information can be used to estimate what percentage of the code is really controlled by the tool. Naturally, this percentage should be close to 100%. For hybrid systems in particular this is often not the case, which causes problems.

In addition, it is wise to check the comments themselves, because

some comments may be commented-out source code
some comments may be automatically generated and thus useless
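A first pass for spotting commented-out source code can be automated with a simple heuristic. The sketch below assumes that a '//' comment line ending in ';', '{' or '}' is probably code; the pattern and the function name are hypothetical, and the heuristic will produce false positives (prose ending in a semicolon) and false negatives.

```python
import re

# Hypothetical heuristic: a whole-line '//' comment whose text ends
# like a C or C++ statement or block delimiter.
CODE_LIKE = re.compile(r"^\s*//.*[;{}]\s*$")


def looks_like_commented_code(line: str) -> bool:
    """Return True if the line is a '//' comment that resembles code."""
    return bool(CODE_LIKE.match(line))


SAMPLE_LINES = [
    "// int retries = 0;",          # commented-out statement
    "//}",                          # commented-out closing brace
    "// explains the loop below",   # ordinary prose comment
    "int x = 0; // counter",        # live code with a trailing comment
]

flagged = [line for line in SAMPLE_LINES if looks_like_commented_code(line)]
print(flagged)
```

The flagged lines are candidates for manual review, not a verdict.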

Details on comments can be found in Comment Details.

Copyright and Authorship

Protecting intellectual property is very important for many reasons. For companies it is related to revenue; for the open source community it is important for protecting a common goal. In any event, it should always be made clear who owns the copyright and what licensing policy is being used (such as the GPL). Even if code is open source, it should still be declared as such. Hence, every file should carry a note stating who owns the copyright and what the licensing terms for the file are.

In addition, every file should name all authors who participated in creating it. First, this is important for identifying the copyright holder; second, in case of questions, it identifies the person who may know the answers relating to that file.

The keyword 'copyright' was found 138 times in 123 files. It appears that most files have the required copyright notice, but checking a couple of files, just to make sure, may still be a good idea.
The keyword 'author' was found 31 times in 123 files. It seems that in this project the authors were not very proud of their work.

An author who is proud of his or her work will sign it. An author who does not sign may have several reasons, but none of them has a positive connotation. Hence, in my humble opinion, programmers who don't sign their work should not be allowed to program.

For details on copyright and authorship, please see the Author and Copyright Details.

Code Quality

There are many ways to help ensure code quality. Two of the best known are asserts and exception handling via try-catch blocks. Whereas the former may be replaced by extensive unit testing, the latter should never be missing.

The keyword 'assert' was found 691 times.
The keyword 'try' was found 0 times.
The keyword 'catch' was found 4 times.

For details please see the Code Quality Details.
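In C++ a 'catch' always belongs to a 'try', so 0 occurrences of 'try' next to 4 of 'catch' suggests that at least one of the two patterns matched something other than the keyword, or missed it. The sketch below shows one way to reduce such mismatches by matching whole words only. It is an illustration with hypothetical names, not the expression used for this report.

```python
import re


def count_keyword(source: str, keyword: str) -> int:
    """Count occurrences of keyword as a whole word (not inside identifiers)."""
    return len(re.findall(rf"\b{re.escape(keyword)}\b", source))


SAMPLE = """
void load() {
    assert(ready);
    try { open(); } catch (const std::exception& e) { retry(); }
    static_assert(sizeof(int) >= 4, "int too small");
}
"""

for kw in ("assert", "try", "catch"):
    whole_word = count_keyword(SAMPLE, kw)
    substring = SAMPLE.count(kw)
    print(f"{kw}: {whole_word} as a whole word, {substring} as a substring")
```

Here a plain substring search counts 'retry' as a 'try' and 'static_assert' as an 'assert', while the whole-word pattern does not. Occurrences inside comments and string literals are still counted by both.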

Memory Leaks

Memory leaks can happen in many different ways. Quite often, however, they are related to incorrect use of 'new', 'delete', 'malloc' and 'free'.

The rules of thumb are as follows: in C++ projects, 'malloc' and 'free' should not be used. Furthermore, every object created with 'new' has to be destroyed with 'delete'. So as long as there is no twisted logic (and there should not be), for every 'new' there should be exactly one 'delete'.

The keyword 'new' was found 939 times.
The keyword 'delete' was found 497 times.
The keyword 'malloc' was found 5 times.
The keyword 'free' was found 44 times.

For details please see the Memory Leak Details.
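The one-'delete'-per-'new' rule can be checked roughly per file. The sketch below strips comments before counting, since words such as "new" are common in comment text. The names are hypothetical, the comment pattern does not handle '//' or '/*' inside string literals, and constructs such as smart pointers, placement new or deleted functions ('= delete') will distort the balance, so the result is an indicator only.

```python
import re

# Matches '//' comments to end of line and '/* ... */' comments across lines.
COMMENT = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)


def allocation_balance(source: str) -> dict:
    """Count allocation keywords outside comments and report the imbalance."""
    code = COMMENT.sub("", source)
    counts = {
        kw: len(re.findall(rf"\b{kw}\b", code))
        for kw in ("new", "delete", "malloc", "free")
    }
    counts["new_minus_delete"] = counts["new"] - counts["delete"]
    counts["malloc_minus_free"] = counts["malloc"] - counts["free"]
    return counts


SAMPLE = """
// create a new buffer
Buffer* b = new Buffer(64);
char* raw = new char[16];
delete b;
/* delete[] raw; was removed by mistake */
"""

balance = allocation_balance(SAMPLE)
print(balance)
```

For the project totals listed above, the same subtraction gives 939 - 497 = 442 more occurrences of 'new' than of 'delete', which is worth following up in the details even though the raw counts include comments.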

Preprocessor Directives

Preprocessor directives are acceptable when used moderately. When used excessively, however, they are problematic, mostly because they circumvent the compiler. For instance, no functionality should ever be implemented in #define directives. If the number of #defines is significantly larger than the number of files, some action usually needs to be taken.

#pragma: 0
#define: 149 (compared to a total of 123 files)
#undef: 123
#include: 259
#if: 49
#elif: 33
#else: 0
#ifdef: 569
#ifndef: 703
#endif: 0
#error: 0
#warning: 0

For a detailed listing of each occurrence, please see the Preprocessor Directive Details.
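Directive counts like these are sensitive to how the pattern is written: a search for '#if' also matches '#ifdef' and '#ifndef' unless the whole directive name is captured. The following is a minimal sketch of one way to do that; the pattern and names are assumptions, not the expressions used for this report.

```python
import re
from collections import Counter

# Capture the full directive name at the start of a line, allowing
# whitespace before and after the '#'.
DIRECTIVE = re.compile(r"^[ \t]*#[ \t]*([a-z_]+)", re.MULTILINE)


def count_directives(source: str) -> Counter:
    """Count preprocessor directives by their full name."""
    return Counter(DIRECTIVE.findall(source))


SAMPLE = """#ifndef CONFIG_H
#define CONFIG_H
#ifdef DEBUG
#  define LOG(x) puts(x)
#else
#define LOG(x)
#endif
#if VERSION > 2
#include "v2.h"
#endif
#endif
"""

directives = count_directives(SAMPLE)
openers = directives["if"] + directives["ifdef"] + directives["ifndef"]
print(directives)
print(f"conditional openers: {openers}, endif: {directives['endif']}")
```

In code that compiles, every #if, #ifdef and #ifndef is closed by an #endif, so the two totals should match. The counts listed above (49 + 569 + 703 = 1321 openers against 0 #endif) do not, which more likely points to a pattern problem than to a property of the code.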

Enum, Typedef, Structs

In this section we look at the use of templates, namespaces, structs, typedefs, enums and the inline keyword. Whereas the use of templates and namespaces is encouraged, the use of structs and typedefs is strongly debatable. Enums do no harm, but having too many of them may make the code less readable.

The keyword 'template' was found 19 times.
The keyword 'namespace' was found 10 times.
The keyword 'struct' was found 39 times.
The keyword 'typedef' was found 18 times.
The keyword 'enum' was found 84 times.
The keyword 'inline' was found 400 times.

For details please see the Enum, Typedef and Struct Details.

Classes and Inheritance

C++ is an object-oriented language and should be used as such. Very often one sees a mix of C and C++ code, or a half-hearted transition from C to C++. This leads to many problems and usually results in less reliable code.

The pattern 'class X : Y,Z' (multiple inheritance) was found 4 times.
The pattern 'class X : Y' (single inheritance) was found 173 times.
The keyword 'class' was found 537 times.
The pattern 'virtual x() =0' (pure virtual function) was found 24 times.
The keyword 'virtual' was found 521 times.

For details please see the Classes and Inheritance Details.
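The inheritance patterns above can be approximated with regular expressions along the following lines. This is a sketch under assumptions: the two patterns and all names are hypothetical, only class definitions with a body are counted (not forward declarations or 'template <class T>'), and nested template arguments in base class lists would confuse the comma split.

```python
import re

# 'class Name { ... }' optionally followed by ': base list' before the '{'.
CLASS_DECL = re.compile(r"\bclass\s+(\w+)\s*(?::\s*([^{;]+))?\s*\{")
# 'virtual ... ) [const] = 0;' within a single declaration.
PURE_VIRTUAL = re.compile(r"\bvirtual\b[^;{]*\)\s*(?:const\s*)?=\s*0\s*;")


def inheritance_summary(source: str) -> dict:
    """Count class definitions by number of base classes, plus pure virtuals."""
    summary = {"classes": 0, "single": 0, "multiple": 0, "pure_virtual": 0}
    for match in CLASS_DECL.finditer(source):
        summary["classes"] += 1
        bases = match.group(2)
        if bases:
            base_count = len([b for b in bases.split(",") if b.strip()])
            summary["single" if base_count == 1 else "multiple"] += 1
    summary["pure_virtual"] = len(PURE_VIRTUAL.findall(source))
    return summary


SAMPLE = """
class Shape {
public:
    virtual double area() const = 0;
    virtual ~Shape() {}
};
class Circle : public Shape {
public:
    double area() const override { return 3.14159 * r * r; }
    double r = 1.0;
};
class Logger {};
class LoggedCircle : public Circle, public Logger {};
"""

summary = inheritance_summary(SAMPLE)
print(summary)
```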

File IO, System Calls, etc

Both file IO and system calls use valuable operating system resources. Those resources should be used with care. The following shows where, in what context and how often these resources were used. Usually one would want these resources to be managed centrally.

The keyword 'fopen' was found 36 times.
The keyword 'dev' was found 216 times.
The pattern '/../' was found 883 times.
The keyword 'system' was found 0 times.
The keyword 'exec' was found 0 times.
The keyword 'mmap' was found 0 times.
The keyword 'memmap' was found 0 times.
The keyword 'pthread' was found 0 times.

For details please see the File and System Calls Details.
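To see where and in what context these resources are used, each hit can be reported with its line number and the surrounding line. The sketch below is illustrative: the call pattern and names are assumptions, and it looks for function calls rather than bare substrings, so that an identifier such as 'device_count' is not reported as a use of 'dev'.

```python
import re

# Hypothetical pattern: a call to one of the listed functions,
# i.e. the name as a whole word followed by '('.
CALL = re.compile(r"\b(fopen|system|exec\w*|mmap)\s*\(")


def find_calls(source: str) -> list:
    """Return (line number, function name, stripped line) for each call found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in CALL.finditer(line):
            hits.append((lineno, match.group(1), line.strip()))
    return hits


SAMPLE = """FILE* f = fopen("/dev/null", "w");
int device_count = 0;  // contains dev, but uses no system resource
system("ls");
"""

hits = find_calls(SAMPLE)
for lineno, name, context in hits:
    print(f"line {lineno}: {name} in: {context}")
```

A listing of this kind makes it easier to judge whether the resources are managed centrally or scattered across the code base.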