How virtual functions and abstract classes work in C++
Virtual functions and abstract classes are key elements of the C++ language and of object-oriented programming (OOP). They let programmers set rules for how a class and its functions must be defined, and they are vital in designing reusable and extensible code. Most C++ programmers are familiar with the benefits of using virtual functions and abstract classes in their code, but it is also fascinating to understand the mechanism compilers use internally to handle virtual functions.
When a class contains one or more virtual functions, the compiler has to do some extra work to generate assembly instructions that invoke the appropriate implementation of a function called through a base class pointer or reference. When a function is called through a base class pointer, the compiler determines at compile time whether that function is virtual. If the function is non-virtual, it simply places a call to the address of the function; this process is called **early binding**. If the function is virtual, the compiler has to take further steps to ensure that the appropriate base or derived implementation is invoked at runtime, because at compile time it does not know which implementation will be called; this process is called **late binding**.
Early and late binding
Briefly, early binding happens at compile time: the compiler knows exactly which function will be called when the program runs, so it emits an instruction that calls that function by its address. Late binding, conversely, happens at runtime. Because the compiler does not know at compile time which implementation to bind, it does not emit a call to a specific function; instead it places a few extra instructions so that the binding can occur at runtime. In the following code block, derived1 is upcast to Base b. While compiling the main function, the compiler simply performs early binding for the call on line 40, because printNumber is a non-virtual (normal) function. However, it treats the calls to func1 and func2 on lines 41 and 42 differently.
The compiler, late binding and the virtual keyword
First, the virtual keyword tells the compiler that early binding must not be performed on a function declared virtual. Second, on learning that a class has virtual functions, the compiler assigns a lookup table to the base class; this table is also called the virtual function table, vtable, or dispatch table. The vtable stores information about all the virtual functions in a class. Every derived class gets a vtable as well, so the base class and each derived class have their very own vtable. Third, the compiler adds a hidden pointer member, *__vptr, called the virtual pointer or vpointer, which points to the vtable of the object's own class. These steps are what allow late binding to occur at runtime.
After adding the vptr and creating a vtable for each class, the compiler populates each vtable. The size of a vtable is determined by the number of virtual functions the class has, and the entries are placed in the same order in which the functions are declared in the class. Population starts at the base class, where it is simple: the vtable stores the address of each virtual function. For a derived class, the vtable stores the address of the derived implementation for every function the class overrides, and the address of the base class function for every function it does not override.
When an object of a class is created, compiler-generated code initialises its vptr to point to the vtable of that class; with most compilers this is the first thing to happen during object creation. Next the vtable for each class will be populated