RegExp Analysis
This part of the report is based on running different regular
expressions against the code. This way we can make statements
about the following aspects of the code quality:
- General Information
- Preprocessor Directives
- File IO, System Calls, etc
- Enum, Typedef, Structs
- Author and Copyright
- Comments
- Code Quality
- Classes and Inheritance
Disclaimer: Some of the numbers found through regular expressions
should be considered rough estimates. Most of the time they will
be fairly exact, but in some cases they may be off by quite a bit.
General Considerations
The project has a total of 123 files
and consists of 143846 gross lines of code (all lines, including comments and empty lines).
The total project size is 4 MByte.
The distribution of files according to their major file type is:
- 29949 lines in 36 .h files
- 64097 lines in 38 .cpp files
Comparing header files to source files, we find that
%
of the total
code (counting LoC) is found in header files. Since header files
should only contain declarations, but no implementation, any number above
10% here is suspicious.
Next, we can estimate the ratio of C to C++ code, by looking at the
code found in C files versus C++ files.
Method | C | C++ | % |
Number of files ending with (.h+.c) vs (.hh+.cc+.cpp) | | | % |
Lines of Code in files with (.h+.c) vs (.hh+.cc+.cpp) | | | % |
In general, a project should either be a pure C or a pure C++
project. Any mixing will lead to problems sooner or later.
A mix usually indicates that the project is in a transition from one to
the other (usually C to C++) and that the transition has not been
completed. Also, when UML tools are used, the C part of the
project is usually not modelled inside the UML tool.
Another interesting question is how many of the
143846 gross lines of code
contain really useful information. For this the following list may be helpful:
- 39514 lines contain statements (';')
- 13782 lines contain single line comments ('//')
- 1379 lines contain multi line comments ('/*')
- 3692 lines contain preprocessor statements ('#')
- 15080 lines are empty
- 7570 lines contain only an opening block statement ({)
- 7567 lines contain only a closing block statement (})
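The following is a minimal sketch of how such per-line counts could be produced. It is not the tool used for this report; the names `LineCounts` and `classify_lines` are hypothetical, and the patterns are assumptions derived from the list above. As in the list, the categories overlap: a line with a statement and a trailing comment is counted in both.

```cpp
#include <regex>
#include <sstream>
#include <string>

// Hypothetical result type: one counter per category of the list above.
struct LineCounts {
    int total = 0;
    int statements = 0;            // lines containing ';'
    int line_comments = 0;         // lines containing '//'
    int block_comment_starts = 0;  // lines containing '/*'
    int preprocessor = 0;          // lines starting with '#'
    int empty = 0;                 // lines with only whitespace
    int open_brace_only = 0;       // lines with only '{'
    int close_brace_only = 0;      // lines with only '}'
};

// Classifies every line of the given source text. The checks are purely
// textual, so a ';' inside a string literal or comment is counted as well.
LineCounts classify_lines(const std::string& source) {
    static const std::regex empty_re(R"(\s*)");
    static const std::regex open_re(R"(\s*\{\s*)");
    static const std::regex close_re(R"(\s*\}\s*)");
    static const std::regex preproc_re(R"(^\s*#)");

    LineCounts c;
    std::istringstream in(source);
    std::string line;
    while (std::getline(in, line)) {
        ++c.total;
        if (line.find(';') != std::string::npos) ++c.statements;
        if (line.find("//") != std::string::npos) ++c.line_comments;
        if (line.find("/*") != std::string::npos) ++c.block_comment_starts;
        if (std::regex_search(line, preproc_re)) ++c.preprocessor;
        if (std::regex_match(line, empty_re)) ++c.empty;
        if (std::regex_match(line, open_re)) ++c.open_brace_only;
        if (std::regex_match(line, close_re)) ++c.close_brace_only;
    }
    return c;
}
```

Because the checks are textual, the counts are estimates in the sense of the disclaimer above.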
For details on file types please see the General Details.
For details on statements please see the Statement Details.
Comments
Comments are very important. They help developers (old and new) to
quickly understand the code. A rule of thumb says that about 10% to
20% of the code should be developer generated comments (the closer the
language is to assembly the higher this number should be).
That being said, it is important to point out that quite a large
portion of the comments are auto-generated by IDEs or by tools such as
Rational Rose. Hence, to make a qualified statement about how
well the code is commented, one needs to separate the auto-generated
from the human-generated comments. This is non-trivial
and hard to automate.
The whole project has 143846 gross lines of code. Of those we found
that:
- Rose comments ('//##') occurred 1 time
- regular comments ('//') occurred 13782 times
- multi-line comments ('/*...*/') occurred 1379 times
Note that for multi-line comments, we only counted the occurrences of
'/*'. Assuming that multi-line comments span 3 to 5 lines on average,
multiply this number by 3 to 5 to estimate the lines they cover. As for the Rose
comments, those are already counted in the single-line comments ('//').
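As a worked example, the comment share can be estimated from the counts above and compared with the 10% to 20% rule of thumb. This is only a sketch: the function name `comment_share` is hypothetical, and the 3 to 5 lines per multi-line comment is the assumption stated above, not a measured value.

```cpp
// Estimated share of comment lines in percent.
// lines_per_block is the assumed average length of a multi-line comment.
constexpr double comment_share(long single_line, long multi_line_starts,
                               int lines_per_block, long total_lines) {
    return 100.0 *
           static_cast<double>(single_line + multi_line_starts * lines_per_block) /
           static_cast<double>(total_lines);
}

// Counts taken from the list above.
constexpr double kLowEstimate  = comment_share(13782, 1379, 3, 143846);  // about 12.5 %
constexpr double kHighEstimate = comment_share(13782, 1379, 5, 143846);  // about 14.4 %
```

Both estimates still include auto-generated comments and commented-out code, so the share of human-written comments is likely lower.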
In case a UML type tool such as Rose has been used, it is possible to
use this information to estimate what percentage of the code
is really controlled by this tool. Naturally this percentage
should be close to 100%. For hybrid systems in particular, this is
often not the case, which can cause problems.
In addition, it is wise to check the comments, because
- some comments may be commented-out source code
- comments may be automatically generated and thus useless
Details on comments can be found in Comment Details.
Copyright and Authorship
Protecting intellectual property is very important for many
reasons. For companies this is related to revenue; for the
open source community it is important to protect a common goal.
In any event, it should always be made clear who owns the copyright and
what licensing policy is being used (like GPL). So even if code is
open source, it should still be declared as such. Hence, every file
should have a note on who owns the copyright and what the
licensing terms for this file are.
In addition, for every file, all authors should be named who
participated in creating that file. For one, this is important to
identify the copyright, and second, in case of questions, it makes it
possible to identify the person who may know the answer to questions
relating to that file.
- The keyword 'copyright' was found 138 times in 123 files.
It appears that most files have the required copyright notice, but checking a couple of files, just to make sure, may still be a good idea.
- The keyword 'author' was found 31 times in 123 files.
It seems that in this project the author was not very proud of his work.
An author who is proud of his or her work will sign it. An author who
does not sign it may have several reasons for this, but none of those
reasons has a positive connotation. Hence, in my humble opinion,
programmers who don't sign their work should not be allowed to program.
For details on copyright and authorship, please see the Author and Copyright Details.
Code Quality
There are many ways to guarantee code quality. Two of the best known are asserts
and exception handling via try-catch blocks. Whereas the former may be replaced by
extensive unit testing, the latter should never be missing.
- The keyword 'assert' was found 691 times.
This should be compared to a total of 143846 lines of code.
Assuming that a quarter to a third of the lines are actual statements, that every method
has somewhere between 10 and 50 lines of code, and that every method contains one assert,
there should be roughly between 719 (143846 / 4 / 50) and 4794 (143846 / 3 / 10)
asserts in this code.
- The keyword 'try' was found 0 times.
Here, similar numbers as above for the asserts can be used to get a rough idea of what decent numbers are.
This project, however, seems to avoid the use of exception handling at all cost. I would not want
to sit in a car or an airplane whose software was written this way.
- The keyword 'catch' was found 4 times.
In general, the number of catches should be at least as large as the number of trys.
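The difference between the two mechanisms can be sketched as follows. The functions `average`, `parse_port` and `port_or_default` are hypothetical examples and not taken from the analysed project. The sketch assumes the common convention that asserts guard against programming errors, while exceptions report conditions that can occur at run time.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// A violated precondition is a programming error, so it is checked with
// assert (the check is compiled out when NDEBUG is defined).
int average(const int* values, int count) {
    assert(values != nullptr && count > 0);
    long sum = 0;
    for (int i = 0; i < count; ++i) sum += values[i];
    return static_cast<int>(sum / count);
}

// Bad input is a run-time condition, so it is reported with an exception.
int parse_port(const std::string& text) {
    std::size_t used = 0;
    int port = std::stoi(text, &used);  // throws std::invalid_argument or std::out_of_range
    if (used != text.size() || port < 1 || port > 65535)
        throw std::out_of_range("port out of range: " + text);
    return port;
}

// The caller decides how to recover; here a fallback value is returned.
int port_or_default(const std::string& text, int fallback) {
    try {
        return parse_port(text);
    } catch (const std::exception&) {
        return fallback;
    }
}
```

A call such as `port_or_default("abc", 80)` returns the fallback instead of terminating the program.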
For details please see the Code Quality Details.
Memory Leaks
Memory leaks can happen in many different ways. Quite often, however,
they are related to incorrect use of the keywords 'new', 'delete', 'malloc' and
'free'.
The rules of thumb are as follows: in C++ projects, the keywords 'malloc' and
'free' should not be used. Furthermore, every object that has been created
with a 'new' statement has to be removed with a 'delete' statement. So as
long as there is no twisted logic (which there shouldn't be), for every 'new'
there should be exactly one 'delete'.
- The keyword 'new' was found 939 times.
- The keyword 'delete' was found 497 times.
The number of deletes should in general be at least as high as the number of news. It is best if they are the same.
- The keyword 'malloc' was found 5 times.
In C++ programs the use of malloc is strongly discouraged. It only causes trouble.
- The keyword 'free' was found 44 times.
For every malloc there should be a free. Also, in C++ projects there should be no frees.
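The one-'new'-one-'delete' rule can be illustrated with a small sketch. The type `Widget` and both functions are hypothetical and only serve to make the pairing visible. The second variant uses `std::unique_ptr` from the standard library, which is an addition to the rule stated above rather than something this report checks for.

```cpp
#include <memory>

// Hypothetical type that counts live objects, so that the pairing of
// creation and destruction can be observed.
struct Widget {
    static int live;
    Widget() { ++live; }
    ~Widget() { --live; }
};
int Widget::live = 0;

// Manual pairing: exactly one delete for the one new.
void manual_pairing() {
    Widget* w = new Widget;
    // ... use w ...
    delete w;
}

// Owner-based pairing: std::unique_ptr calls delete in its destructor,
// also when the scope is left early by a return or an exception.
void scoped_pairing() {
    std::unique_ptr<Widget> w(new Widget);
    // ... use w ...
}
```

Note that with the second style a purely textual count finds a 'new' without a matching 'delete', although nothing leaks. A large gap between the two counts is therefore a reason to look closer, not proof of a leak.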
For details please see the Memory Leak Details.
Preprocessor Directives
Preprocessor directives are o.k. when used moderately. When used excessively, however, they are problematic,
mostly because they circumvent the compiler. For instance, functionality should never be implemented
in #define directives. If the number of #defines is significantly larger than the number of files, some
action usually needs to be taken.
- #pragma: 0
- #define: 149 (compared to a total of 123 files)
- #undef: 123
- #include: 259
- #if: 49
- #elif: 33
- #else: 0
- #ifdef: 569
- #ifndef: 703
- #endif: 0
- #error: 0
- #warning: 0
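Why functionality in a #define is problematic can be shown with a small sketch. All names here (`SQUARE_MACRO`, `square`, `next_value`, `evaluation_count`) are hypothetical. The macro pastes its argument in textually, so an argument with side effects is evaluated twice, and the compiler performs no type checking on it.

```cpp
// Functionality implemented in a macro: plain text substitution.
#define SQUARE_MACRO(x) ((x) * (x))

// The same functionality as a typed function that the compiler can check.
constexpr int square(int x) { return x * x; }

// Hypothetical helper that makes the number of evaluations visible.
int evaluation_count = 0;
int next_value() {
    ++evaluation_count;
    return 3;
}
```

`SQUARE_MACRO(next_value())` expands to `((next_value()) * (next_value()))` and calls the helper twice, whereas `square(next_value())` calls it once.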
For a detailed listing of each occurrence, please see the Preprocessor Directive Details.
Enum, Typedef, Structs
In this section we look at the use of templates, namespaces, structs, typedefs and enums.
Whereas the use of templates and namespaces is encouraged, the use of structs and typedefs is
debatable. As for enums, they don't hurt, but having too many of them may make
the code less readable.
- The keyword 'template' was found 19 times.
The use of templates in general is encouraged, especially when used in place of defines.
Using templates can reduce duplicate code, and thus avoid many problems related to it.
- The keyword 'namespace' was found 10 times.
The use of namespaces is strongly encouraged, especially in larger systems. In general one would
expect the number of namespaces to equal the number of classes or .cpp files (assuming one
class per file). In this case, one would expect somewhere around 61
occurrences of the keyword namespace.
- The keyword 'struct' was found 39 times.
In C++ a struct is almost exactly the same as a class (this is different in C). Hence the use of
structs indicates a C past of the code, and it is encouraged to turn them into classes in C++.
- The keyword 'typedef' was found 18 times.
The benefit of typedefs is strongly debatable. Unless there is a very good reason for using them,
they should be replaced.
- The keyword 'enum' was found 84 times.
Enums usually are a shorthand, human-readable notation. In many cases they may be helpful.
- The keyword 'inline' was found 400 times.
This number should be compared with the total number of functions. Inline functions tend to
make code faster, but will result in a larger memory requirement. The developer has to
decide which is more important. (The number above may include duplicate mentions.)
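How a template can take the place of a define or of duplicated code can be sketched as follows. The namespace `report_example`, the function `larger` and the enum `Level` are hypothetical names chosen for illustration.

```cpp
#include <string>

namespace report_example {  // hypothetical namespace name

// One template instead of one copy of the function per type,
// or instead of a MAX-style #define.
template <typename T>
const T& larger(const T& a, const T& b) {
    return (a < b) ? b : a;
}

// A scoped enum gives human-readable names to a small set of values.
enum class Level { Low, Medium, High };

}  // namespace report_example
```

The same function then works for `int`, `double`, `std::string` and `Level` alike, with full type checking by the compiler.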
For details please see the Enum, Typedef and Struct Details.
Classes and Inheritance
C++ is an object-oriented language and should be used as such. Very often one sees the mix of
C and C++ code, or the half-hearted transition from C to C++. This leads to many problems and
usually results in code that is less reliable.
- The keyword 'class X : Y,Z' was found 4 times.
This pattern looks for multiple inheritance. The match is not perfect, but fast.
(The number above may include duplicate mentions.)
- The keyword 'class X : Y' was found 173 times.
This pattern looks for inheritance. It may not account for declarations in which 'public' precedes 'class'.
(The number above may include duplicate mentions.)
- The keyword 'class' was found 537 times.
This pattern finds classes. It may not account for declarations in which 'public' precedes 'class', and forward declarations lead to duplicates.
(The number above may include duplicate mentions.)
- The keyword 'virtual x() =0' was found 24 times.
Similar to virtual functions, pure virtual functions indicate that the project makes use of inheritance.
In addition, pure virtual functions are the C++ way to define interfaces. The use of interfaces
is strongly encouraged when it leads to decoupling of separate components.
(The number above may include duplicate mentions.)
- The keyword 'virtual' was found 521 times.
The use of virtual functions indicates that the project makes use of inheritance. If this keyword
is not used or rarely used, it is a strong indication that the project is not really object-oriented. On
the other hand, the other extreme of overusing virtual functions may cause other problems.
(The number above may include duplicate mentions.)
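How pure virtual functions define an interface and decouple components can be sketched as follows. The classes `Logger` and `MemoryLogger` and the function `report_start` are hypothetical examples, not part of the analysed project.

```cpp
#include <string>

// Interface: only pure virtual functions plus a virtual destructor.
class Logger {
public:
    virtual ~Logger() = default;
    virtual void write(const std::string& message) = 0;
};

// One possible implementation that keeps the messages in memory.
class MemoryLogger : public Logger {
public:
    void write(const std::string& message) override { text_ += message + "\n"; }
    const std::string& text() const { return text_; }

private:
    std::string text_;
};

// Decoupled component: it knows the interface, not the implementation.
void report_start(Logger& log) { log.write("started"); }
```

`report_start` can be reused unchanged with any other class derived from `Logger`, which is the decoupling referred to above.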
For details please see the Classes and Inheritance Details.
File IO, System Calls, etc
Both file IO and system calls use valuable operating system resources. Those resources should be
used with care. The following shows where, in what context and how often these resources were
used. Usually one would want these resources to be managed centrally.
- The keyword 'fopen' was found 36 times.
Files that are opened should also be closed. In addition, one should check which files are used and how many. Maybe
some files can be combined into one. Maybe a temp file is not really needed.
- The keyword 'dev' was found 216 times.
On Unix machines, 'dev' usually refers to devices. Here one should make sure that working
with those devices is done properly.
- The keyword '/../' was found 883 times.
On Unix machines this refers to files. Sometimes direct system or shell
commands are used instead of fopen. Looking for this pattern may reveal this type of file access.
Unfortunately, the pattern also occurs in web links, and hence this number may be quite off.
- The keyword 'system' was found 0 times.
System calls can have many side effects; they should be used sparingly.
- The keyword 'exec' was found 0 times.
System calls can have many side effects; they should be used sparingly.
- The keyword 'mmap' was found 0 times.
Memory maps can have many side effects; they should be used sparingly.
- The keyword 'memmap' was found 0 times.
Memory maps can have many side effects; they should be used sparingly.
- The keyword 'pthread' was found 0 times.
The use of threads increases the complexity of a program significantly. Everybody needs to
be aware of where they are used. If they are used, all the issues related to threads must be considered.
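One way to make sure that every opened file is also closed, and to manage file access centrally, is sketched below. The names `FileCloser`, `FilePtr`, `open_file` and `count_bytes` are hypothetical. The sketch assumes that all calls to fopen in a project are routed through one helper.

```cpp
#include <cstdio>
#include <memory>

// Deleter that closes the file. std::unique_ptr only invokes it for
// non-null handles, so a failed fopen needs no special treatment.
struct FileCloser {
    void operator()(std::FILE* f) const { std::fclose(f); }
};
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical central helper through which all files are opened.
FilePtr open_file(const char* path, const char* mode) {
    return FilePtr(std::fopen(path, mode));
}

// Counts the bytes in a file; returns -1 if the file cannot be opened.
long count_bytes(const char* path) {
    FilePtr file = open_file(path, "rb");
    if (!file) return -1;
    long n = 0;
    while (std::fgetc(file.get()) != EOF) ++n;
    return n;  // the file is closed here by FileCloser
}
```

With such a helper, a textual search for 'fopen' should find a single occurrence, which makes the file usage of a project much easier to review.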
For details please see the File and System Calls Details.