I know it has taken me a while to continue writing, but I have been really busy. At least I had time to read many interesting books, and to stop reading some boring ones.

In the last post I talked just a little bit about the book "Efficient C++ Performance Programming Techniques", and now I have decided to continue with a post about exactly this book, written by Dov Bulka and David Mayhew.
The book is perfect for those who work in C and are thinking about moving to C++. There is often a lot of prejudice against C++ among embedded developers and C purists, and this book is exactly for them. If you think C is always faster than C++, I recommend this book; not to "evangelize" you, but to give you a balanced view of each language. Even better, it helps you understand that much of the prejudice against C++ comes from not knowing how to work with classes in C++.

I admit I had this prejudice against C++ at the start of my career, and I had overcome it even before reading this book. After reading it, though, I not only became comfortable programming in C++ but also understood more deeply what happens internally when working with classes. That is extremely important when performance is a critical requirement.

The book explains many details about classes and object creation, and it points out some common mistakes that we tend to make innocently. Not just the big mistakes, but also some that would hardly be considered mistakes at all, yet can make a big difference when working on software where every clock cycle is precious.
It won't make you obsess over every clock cycle your software spends, but it will give you a good sense of how much of your time should go into optimizing code, and where to attack when you do.
It also gives a good perspective on what compilers do behind the scenes, especially with inline functions. A large part of the book is about inlining alone, with many tricks to consider!
It is full of performance comparisons: each time it presents a C++ performance hint, it shows the benchmark results the authors obtained with it.
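As an illustration of the kind of hidden object-creation cost discussed above (my own example, not one taken from the book; the `Message` type and function names are hypothetical), passing a class by value copy-constructs a temporary on every call, which for a `std::string` member may also mean a heap allocation, while a `const` reference creates nothing.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical message type that counts how many times it is copied.
struct Message {
    static int copies;   // incremented by every copy construction
    std::string payload;

    explicit Message(std::string p) : payload(std::move(p)) {}
    Message(const Message& other) : payload(other.payload) { ++copies; }
};
int Message::copies = 0;

// Pass by value: each call copy-constructs a temporary Message
// (and its std::string, which may allocate).
std::size_t payload_size_by_value(Message m) { return m.payload.size(); }

// Pass by const reference: no object is created at all.
std::size_t payload_size_by_ref(const Message& m) { return m.payload.size(); }

// Usage: only the by-value call adds a copy.
void check_copy_counts() {
    Message msg("hello");
    Message::copies = 0;
    assert(payload_size_by_value(msg) == 5 && Message::copies == 1);
    assert(payload_size_by_ref(msg) == 5 && Message::copies == 1);
}
```

Both functions return the same result; the difference is only in the objects created along the way, which is exactly the kind of cost that is easy to miss in a hot loop.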

And if you think that when working with the STL you don't need to care so much about this, since the STL is well optimized by definition, the book has a chapter on exactly that. It is a good chapter on how to use the STL efficiently and which container suits each kind of application. And of course, on when it would be faster not to use the STL at all, which is where many C++ developers slip up.
For those who still stand for pure C, the book also has many hints that are very useful for C software; I have even implemented some of them in one of my C projects.
Finally, the book also covers scalability and architecture, and it treats code optimizations and design optimizations separately.
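A simple illustration of using the STL efficiently (my own example, not from the book; the function name is hypothetical): when the final size of a `std::vector` is known, calling `reserve()` first avoids the repeated reallocate-and-copy cycles that `push_back()` otherwise triggers as the vector grows. The sketch counts how often the vector's storage changes address during the appends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Appends n values to v and counts how many times its storage was
// (re)allocated during the appends. Every reallocation after the first
// also copies or moves all existing elements to the new buffer.
std::size_t fill_counting_reallocations(std::vector<int>& v, std::size_t n,
                                        bool reserve_first) {
    if (reserve_first) {
        v.reserve(v.size() + n);  // one allocation, before the loop
    }
    std::size_t reallocations = 0;
    const int* storage = v.data();
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        if (v.data() != storage) {  // the buffer moved
            ++reallocations;
            storage = v.data();
        }
    }
    assert(v.size() >= n);
    return reallocations;
}
```

With `reserve_first` set, the loop never reallocates; without it, the vector reallocates several times on its way to 1000 elements.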

Well, I did not become one of those who want to convert every C project into C++ like crazy. I still work happily on both kinds of projects at work, but I admit there is one specific C application of mine that I would really like to convert entirely to C++, since it would make my life easier afterwards. For now, what I could do was apply many of the cache considerations I learned here.

"Software performance is important and always will be."

As I usually say, in my line of work a good book (or site) is one that inspires you to develop, one that fires you up with new ideas for your projects. Here is a book that does exactly that.

"Optimizing software in C++" is not what we could call a real book. It has no ISBN, the only reference I had to it was on the internet, and I couldn't find any hardcover version. Still, it is an electronic read that really deserves a hardcover edition and every honor given to a book.

While developing a project with rigorous performance requirements, I started searching the internet for ways to improve my software even further. I had already balanced the multithreading design for the best processing, network bottlenecks were no longer an issue, and the IO configuration of the Linux servers was working for me after some adjustments. That left only the CPU to work on in order to improve performance and reach my target number of channels.

Well, as I said in a previous post, I found lots of partial information about the subject, but nothing that explained all the reasons well, until I found the PDF "Optimizing software in C++" on its author's website.

This book goes deep into software optimization, considering every clock cycle that could be saved in your implementation. Since I was already used to firmware development, that was not a problem for me, but most people would say it is not worth caring about a few clock cycles. Well, depending on the project you are working on, a few cycles may make a big difference.

As an example, my project was designed to generate one network packet every 20 milliseconds per channel, and the same software had to handle one incoming network packet every 20 milliseconds per channel. That still does not seem like much, right? But imagine my case, where I was trying to handle 4000 channels on a 2 GHz, 8-core CPU.
With that many channels, at least 8000 packets (generated and received) had to be processed every 20 milliseconds. Doing the math for one core (2 GHz × 20 ms = 40 million cycles, divided by 8000 packets), that leaves 5000 clock cycles per packet. Does that seem like enough? Considering that I had to compress/decompress data, protect/unprotect it using OpenSSL, move buffers and package/unpackage them for other layers before sending or receiving, and apply digital signal processing algorithms to the received information... it really is not many clock cycles. I felt like I was developing embedded firmware again (apart from the huge amount of available memory, which is not common in embedded systems). I also could not forget that many threads were competing for processor time, generating task-switching overhead, and that other parts of the software were not directly related to packet handling. With all this in mind, 5000 cycles were certainly too few, and I had to improve CPU usage!
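The budget arithmetic above can be written out as compile-time constants. The figures come from the text; the constant names are mine, and the eight-core total is added only for comparison: it assumes perfect parallelism with no overhead, which the thread contention and task switching mentioned above make unrealistic.

```cpp
#include <cstdint>

// Figures from the text: 2 GHz clock, 8 cores, 4000 channels, and one packet
// generated plus one received per channel every 20 ms.
constexpr std::uint64_t kClockHz           = 2'000'000'000;
constexpr std::uint64_t kCores             = 8;
constexpr std::uint64_t kChannels          = 4000;
constexpr std::uint64_t kPeriodMs          = 20;
constexpr std::uint64_t kPacketsPerChannel = 2;  // one generated + one received

constexpr std::uint64_t kPackets        = kChannels * kPacketsPerChannel;        // 8000
constexpr std::uint64_t kCyclesPerCore  = kClockHz / 1000 * kPeriodMs;           // 40,000,000
constexpr std::uint64_t kBudgetOneCore  = kCyclesPerCore / kPackets;             // 5000
constexpr std::uint64_t kBudgetAllCores = kCyclesPerCore * kCores / kPackets;    // 40,000 (ideal)

static_assert(kPackets == 8000, "8000 packets per 20 ms period");
static_assert(kBudgetOneCore == 5000, "5000 cycles of one core per packet");
```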

I already had a good understanding of many aspects of performance improvement, but reading this book helped me organize my ideas and focus on what is really relevant, and it explained many "whys" much better. I can say that many performance concepts were already in my mind, but if you cannot put your thoughts into words, you do not really master them, so you cannot use them to your advantage.

Finally, how did I apply the lessons from this book to my project?

I started by implementing the memory considerations described in the book. Since memory was not a limitation for my project, I decided to create pools, allocating up front on the heap the memory used in critical areas of the code, and to avoid memory allocations in parts of the code that must have a real-time response. Although memory was not an issue, I also had to minimize structure sizes in order to take advantage of the cache. If I wasted memory on irrelevant things, I would lose cache efficiency. As the book explains:

"Reading or writing to a variable in memory takes only 2-3 clock cycles if it is cached, but several hundred clock cycles if it is not cached"

This cache problem is felt very clearly when programming GPUs, where there is an entire mechanism designed just for memory access and caching.
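Going back to the memory pools mentioned above, here is a minimal sketch of a fixed-size pool, assuming the number of buffers needed can be bounded when the application starts. `FixedPool` is a hypothetical name, not code from my project: all storage is allocated once in the constructor, so `acquire()` and `release()` only pop and push indices on a preallocated free list and never touch the heap on the real-time path. The sketch assumes `T` is default-constructible and does not detect a slot being released twice.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

template <typename T>
class FixedPool {
public:
    // All allocation happens here, at startup.
    explicit FixedPool(std::size_t capacity) : slots_(capacity) {
        free_.reserve(capacity);
        for (std::size_t i = capacity; i > 0; --i) {
            free_.push_back(i - 1);  // slot 0 is handed out first
        }
    }

    // Returns nullptr when the pool is exhausted instead of allocating.
    T* acquire() {
        if (free_.empty()) return nullptr;
        std::size_t index = free_.back();
        free_.pop_back();
        return &slots_[index];
    }

    // Gives a slot back; the free list never grows past its reserved size
    // as long as every acquired slot is released exactly once.
    void release(T* item) {
        assert(item >= slots_.data() && item < slots_.data() + slots_.size());
        free_.push_back(static_cast<std::size_t>(item - slots_.data()));
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<T> slots_;           // storage, sized once in the constructor
    std::vector<std::size_t> free_;  // indices of unused slots
};
```

Returning `nullptr` on exhaustion, rather than falling back to `new`, keeps the worst-case timing predictable and leaves the overload policy to the caller.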

And then comes the question: "Who would waste memory with irrelevant things?"

Well, if your structures are not well organized, some bytes are left unused as padding because of alignment requirements, so I started reordering my structures to make them more cache friendly. All of this is well explained in the book, and it makes a huge difference in how you organize your structures.
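A small illustration of this reordering (the structure and its fields are hypothetical): on a typical x86-64 ABI, the first layout takes 24 bytes because of padding, while the same fields sorted from largest to smallest alignment take 16 bytes, so four records fit in a 64-byte cache line instead of two. Exact sizes depend on the compiler and ABI.

```cpp
#include <cstdint>

// Fields in "natural" declaration order.
// Typical x86-64 layout: 1 + 7 (padding) + 8 + 1 + 3 (padding) + 4 = 24 bytes.
struct ChannelStatsPadded {
    std::uint8_t active;
    double       gain;
    std::uint8_t codec;
    std::int32_t packets;
};

// Same fields, ordered from largest to smallest alignment.
// Typical x86-64 layout: 8 + 4 + 1 + 1 + 2 (tail padding) = 16 bytes.
struct ChannelStatsReordered {
    double       gain;
    std::int32_t packets;
    std::uint8_t active;
    std::uint8_t codec;
};

static_assert(sizeof(ChannelStatsReordered) <= sizeof(ChannelStatsPadded),
              "reordering should never make the structure larger");
```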

After that, I checked whether I could remove decision points from critical areas of the code. I removed every "if" statement that could be evaluated during channel initialization instead of in the critical code (packet handling), in order to avoid branch misprediction penalties. Of course, I could not eliminate all of them, but where it was possible, I achieved some improvement.
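One way to move a configuration decision out of the packet-handling path is sketched below; all names are hypothetical, not from my project. The `if` is evaluated once in `init_channel()`, which stores a function pointer, and the hot path just calls it. This trades a per-packet conditional for an indirect call whose target never changes for a given channel, so it predicts well; whether it beats a well-predicted `if` on a given CPU is something to measure, not assume.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical per-channel settings, known when the channel is opened.
struct ChannelConfig {
    bool attenuate;  // e.g. halve the amplitude of incoming samples
};

using ProcessFn = void (*)(std::int16_t* samples, std::size_t count);

static void process_passthrough(std::int16_t*, std::size_t) {
    // Nothing to do for this configuration.
}

static void process_attenuate(std::int16_t* samples, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        samples[i] = static_cast<std::int16_t>(samples[i] / 2);
    }
}

struct Channel {
    ProcessFn process;  // chosen once; never changes while the channel runs
};

// Channel initialization: the configuration decision is made here, once.
Channel init_channel(const ChannelConfig& config) {
    Channel channel{};
    channel.process = config.attenuate ? process_attenuate : process_passthrough;
    return channel;
}

// Hot path, called for every packet: no per-packet test of the configuration.
void handle_packet(const Channel& channel, std::int16_t* samples, std::size_t count) {
    assert(channel.process != nullptr);  // debug-only check
    channel.process(samples, count);
}
```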

I implemented some other minor improvements too, and there were many others I didn't apply since I had already reached my goal. Some of them are really complex, mainly those regarding cache and memory page organization. The book also covers compiler characteristics, but that would be too much to describe in a post that is already too long.

Perhaps you would associate all of this with pure C rather than with "Cplusplus" programming. That was my feeling too, and my project was actually not using C++, but the concepts covered still apply to C++.

Well, design pattern and OOP fanatics might go crazy over some of what is written in the book; I prefer a balanced approach according to the requirements of the project. Take a look at the excerpt below:

"University courses in programming nowadays stress the importance of structured and object-oriented programming, modularity, reusability and systematization of the software development process. These requirements are often conflicting with the requirements of optimizing the software for speed or size.
Today, it is not uncommon for software teachers to recommend that no function or method should be longer than a few lines. A few decades ago, the recommendation was the opposite: Don't put something in a separate subroutine if it is only called once. The reasons for this shift in software writing style are that software projects have become bigger and more complex, that there is more focus on the costs of software development, and that computers have become more powerful."

That is really true, but we don't need to go too far to either side.

A book that deals with this very well, balancing OOP in C++, design patterns and performance, is "Efficient C++ Performance Programming Techniques", which I am reading right now and which seems wonderful. I will talk about it in a later post. As an appetizer: "Software perfection means you compute what you need, all of what you need, and nothing but what you need."

You can find lots of resources about optimization, including assembly optimization if you need it, on the author's site: www.agner.org/optimize

It's worth taking a look.


Some people may ask: "Why books, if you can find any technical information in blogs, forums or anywhere on the internet?" Well, I once asked myself this question, and after some time I realized it showed a lack of experience.

The main difference between "internet knowledge" and "book knowledge" is that information on the internet is scattered, and you have to "mine" pieces of it from many sites, whereas most books contain concrete and complete information all together, well organized. Of course, there are many very good sites on some topics that really aim to be as complete and concrete as any book, but even some of those were not as good as a good book for acquiring concise information.

For example, I realized this when I was looking for information about wavelets. I found many sites with articles, implementation sources, lots of formulas and so on, but what I really wanted to know were the Digital Signal Processing concepts behind them. I did learn a lot while searching the internet, and it was good, but I still felt that I had not understood the concept deeply; I didn't know the "why" yet.

Finally, I found the book Conceptual Wavelets, which I wrote about in my previous post. I will not say more about the book here, but it was an example of a well-written book meeting my expectations in an orderly way that the internet could hardly match.

I am not saying that books will always be better than any site, but I feel that a well-chosen book tends to be more helpful when you are looking for specific knowledge about a complex subject.

Another example comes from when I was learning about NVIDIA and CUDA GPU programming. The NVIDIA site is wonderful, full of explanations, examples and pictures, and the API documentation is well designed. And not to forget, there are lots of video tutorials about CUDA programming. I learned a lot from them and implemented some things too, but the feeling that I had real, concrete knowledge of CUDA programming came only after reading the book "CUDA by Example", which, by the way, was released by NVIDIA and is available for download in the NVIDIA Developer Zone.

I read CUDA by Example as an electronic PDF, and it is a wonderful book. So the format doesn't matter, hardcover or electronic; what I am talking about is organization: concrete knowledge presented in such a way that it fires you up to implement something with what you are learning, and so well constructed that you feel uncomfortable letting it be forgotten.

As you can see, I strongly recommend CUDA by Example to anyone starting with CUDA programming, but I will talk specifically about it in another post. For now, I am eager for my next NVIDIA read, the book "CUDA Application Design and Development", which seems to be very good. It's in my reading queue.

But then comes the question: are there sites that can be as helpful as books? Of course there are! But they have to be very focused.

One example is a site about Design Patterns.

In my own implementations, I was feeling bad about some disorganization in my code, some ugly things I had implemented, and the reusability problems I was creating. So I borrowed a very good book from a friend, Design Patterns: Elements of Reusable Object-Oriented Software, and started reading it. Well, I confess I got a little annoyed with it. So I decided to look on the internet and found the Design Patterns site from sourcemake.com. After looking through the site, I immediately started implementing some Design Patterns in my projects: everything was so clear, so well explained and so full of good examples in many programming languages that I had to start implementing as fast as I could.

So here is an example where an internet site was more helpful than a book, and one that fired me up to write code. I know that this case is about reference information on independent topics rather than building knowledge of a specific technology, but I will write about that later.

For now, let me point to one more helpful site on Design Patterns: http://www.tutorialspoint.com/design_pattern

I hope you enjoyed the reading.

The book Conceptual Wavelets, by D. Lee Fugal, is the first one that motivated me to write about books and to go beyond reading them by actually applying their knowledge.

Right after graduation, I already had very good knowledge of and real experience with Digital Signal Processing. I had no problems with spectrum analysis using Fourier transforms, filters, convolutions, modulations and so on, but I didn't yet know what wavelets really were. They weren't taught in my degree, just mentioned, so I had to learn about them by myself.

I started looking for information on the internet and found many sites with tutorials, short explanations, integral formulas and even source code. But none of them was really complete, and none covered the DSP principles and implementation details as I expected, until I found the book Conceptual Wavelets. As the title says, it is conceptual: the author is concerned not just with what wavelets are and how to implement them, but also with how the idea was conceived and with all the implications that one implementation or another can lead to.

The book assumes that the reader already has some knowledge of DSP and programming techniques, and it builds on that very well while developing the theory of wavelets. Of course, for readers who are not familiar with things like convolution properties and digital signals, it would be a very hard read; but for those who are, it is a delight to see how these concepts are used in wavelets.

It starts with an overview of what can be achieved with wavelet techniques, which awakens excitement in anyone with a passion for signal analysis. It is impossible not to be amazed by the power of wavelets in signal analysis.

I don't have problems with formulas, but some books go too deep into them. This book goes straight to the implementation, from the simplest case to more advanced scenarios. It not only shows the properties of each kind of implementation, but also proves each one programmatically.

A good book is one that leads you to apply the information inside it, and this is one of them. At work, I started a new project that applies wavelet algorithms; it is still being implemented because of other priorities. But I could not wait, so I decided to implement some wavelet algorithms on my own, and now I am working on some applications that use wavelets.
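For readers who want a starting point, here is the simplest wavelet algorithm, one level of the Haar transform, as a minimal sketch. It is my own illustration, not code from the book or from my project, and the names are hypothetical. It uses the orthonormal 1/√2 scaling, which is one common convention (others use plain pairwise averages); the approximation coefficients act as a low-pass output and the detail coefficients as a high-pass output.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One level of the orthonormal Haar transform.
struct HaarLevel {
    std::vector<double> approx;  // scaled pairwise sums (low-pass)
    std::vector<double> detail;  // scaled pairwise differences (high-pass)
};

// Forward transform; the input length must be even.
HaarLevel haar_forward(const std::vector<double>& x) {
    assert(x.size() % 2 == 0);
    const double s = 1.0 / std::sqrt(2.0);
    HaarLevel out;
    for (std::size_t i = 0; i + 1 < x.size(); i += 2) {
        out.approx.push_back((x[i] + x[i + 1]) * s);
        out.detail.push_back((x[i] - x[i + 1]) * s);
    }
    return out;
}

// Exact inverse of haar_forward: rebuilds the original samples.
std::vector<double> haar_inverse(const HaarLevel& level) {
    assert(level.approx.size() == level.detail.size());
    const double s = 1.0 / std::sqrt(2.0);
    std::vector<double> x;
    x.reserve(level.approx.size() * 2);
    for (std::size_t i = 0; i < level.approx.size(); ++i) {
        x.push_back((level.approx[i] + level.detail[i]) * s);
        x.push_back((level.approx[i] - level.detail[i]) * s);
    }
    return x;
}
```

Where the signal is smooth, neighbouring samples are close and the detail coefficients are small, which is what makes the transform useful for compression and denoising; applying `haar_forward` again to `approx` gives the next decomposition level.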


Finally, near the end of the book, Conceptual Wavelets presents the properties and applications of wavelet families, as well as some wavelet applications I have not seen anywhere else. I haven't found explanations of applications this good even in some books dedicated entirely to wavelet applications.

So I really recommend this book to anyone who wants to learn about wavelets, or to anyone who just likes Digital Signal Processing.

If you want to know more about it before buying, visit the book's site or take a look at some of its free chapters.



If you have any other references to recommend, please let me know.