Automatic doctoral advisor genealogy diagram using Wikipedia

An idea occurred to me some time ago when I was bored and browsing through Wikipedia looking up famous historical figures in science. At the time I was interested in each person's supervisor, and kept clicking the doctoral advisor link in their bio box to see where it would take me. Some lineages went as far back as the 17th century. I thought it would be a nifty idea to hack together a quick script that would grab data from Wikipedia and automatically output a tree diagram showing the PhD genealogy/lineage. So I did it.

Here are the results for two people I chose. The first is Alan Turing (aka the father of computer science) and the second is Robert Oppenheimer (aka the father of the atomic bomb). I have a thing for World War 2 scientists.

The arrow points to the doctoral advisor of the current node. Alan Turing's doctoral lineage has some very well known names that I've come across during my studies in engineering: Bernoulli, Poisson, Euler, Laplace and Lagrange. Wow, that's a lot! Gauss also appears on Robert Oppenheimer's side.

Robert Oppenheimer

Alan Turing

I’ll probably add more in the upcoming days if I find anything else interesting.

Download

Download doctoral_advisor_tree.zip

The script has been written for Linux and requires PHP and GraphViz, both of which are available through Ubuntu's Synaptic package manager. Also, set the file permission of the script to be executable. Its usage is as follows:

./doctoral_advisor_tree [wiki link] [graph.txt] [graph.png]

wiki link is the actual URL of the Wikipedia page of the person you are interested in, graph.txt is a temporary file created as input for GraphViz, and graph.png is the output image. Here's an example usage:
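For the curious, the GraphViz input in graph.txt would be a DOT file describing the advisor edges. I haven't reproduced the script's exact output here, but it would look something along these lines (node names illustrative):

```dot
digraph advisors {
    // each edge points from a person to their doctoral advisor
    rankdir=BT;
    "John Forbes Nash, Jr." -> "Albert W. Tucker";
    "Albert W. Tucker" -> "Solomon Lefschetz";
}
```

GraphViz's dot tool turns a file like this into the PNG, e.g. `dot -Tpng graph.txt -o graph.png`.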

./doctoral_advisor_tree http://en.wikipedia.org/wiki/John_Forbes_Nash,_Jr. \ 
graph.txt graph.png

You can actually try the above command. It’s for the mathematician John Nash, who gained mainstream popularity from the film based on him, ‘A Beautiful Mind’.

Important

This script is very simple and has not been tested thoroughly; it was written in about 2-3 hours. It relies on the Wikipedia page following a specific formatting style.

OpenCV vs. Armadillo vs. Eigen on Linux revisited

This is a quick revisit of my recent post comparing 3 different libraries with matrix support. As suggested in one of the comments on the last post, I've turned off any debugging options that each library may have. In practice you would leave them on most of the time for safety reasons, but for this test I thought it would be interesting to see them turned off.

Armadillo and Eigen use the defines ARMA_NO_DEBUG and NDEBUG respectively to turn off error checking. I could not find an immediate way to do the same thing in OpenCV without editing the source code, which I chose not to do, so keep that in mind. I also modified the number of iterations for each of the 5 operations performed to be slightly more accurate. Fast operations like add, multiply, transpose and invert have more iterations performed to get a better average, compared to SVD, which is quite slow.

On with the results …

Add

Performing C = A + B

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.00093 0.00008 0.00007
8×8 0.00039 0.00006 0.00015
16×16 0.00066 0.00030 0.00059
32×32 0.00139 0.00148 0.00194
64×64 0.00654 0.00619 0.00712
128×128 0.02454 0.02738 0.03225
256×256 0.09144 0.11315 0.10920
512×512 0.47997 0.57668 0.47382

Normalised

Speed up over slowest OpenCV Armadillo Eigen
4×4 1.00x 12.12x 14.35x
8×8 1.00x 6.53x 2.63x
16×16 1.00x 2.19x 1.13x
32×32 1.39x 1.31x 1.00x
64×64 1.09x 1.15x 1.00x
128×128 1.31x 1.18x 1.00x
256×256 1.24x 1.00x 1.04x
512×512 1.20x 1.00x 1.22x

Multiply

Performing C = A * B

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.00115 0.00017 0.00086
8×8 0.00195 0.00078 0.00261
16×16 0.00321 0.00261 0.00678
32×32 0.01865 0.01947 0.02130
64×64 0.15366 0.33080 0.07835
128×128 1.87008 1.72719 0.35859
256×256 15.76724 3.70212 2.70168
512×512 119.09382 24.08409 22.73524

Normalised

Speed up over slowest OpenCV Armadillo Eigen
4×4 1.00x 6.74x 1.34x
8×8 1.34x 3.34x 1.00x
16×16 2.11x 2.60x 1.00x
32×32 1.14x 1.09x 1.00x
64×64 2.15x 1.00x 4.22x
128×128 1.00x 1.08x 5.22x
256×256 1.00x 4.26x 5.84x
512×512 1.00x 4.94x 5.24x

Transpose

Performing C = A^T

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.00067 0.00004 0.00003
8×8 0.00029 0.00006 0.00008
16×16 0.00034 0.00028 0.00028
32×32 0.00071 0.00068 0.00110
64×64 0.00437 0.00592 0.00500
128×128 0.01552 0.06537 0.03486
256×256 0.08828 0.40813 0.20032
512×512 0.52455 1.51452 0.77584

Normalised

Speed up over slowest OpenCV Armadillo Eigen
4×4 1.00x 17.61x 26.76x
8×8 1.00x 4.85x 3.49x
16×16 1.00x 1.20x 1.21x
32×32 1.56x 1.61x 1.00x
64×64 1.35x 1.00x 1.18x
128×128 4.21x 1.00x 1.88x
256×256 4.62x 1.00x 2.04x
512×512 2.89x 1.00x 1.95x

Inversion

Performing C = A^-1

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.00205 0.00046 0.00271
8×8 0.00220 0.00417 0.00274
16×16 0.00989 0.01255 0.01094
32×32 0.06101 0.05146 0.05023
64×64 0.41286 0.25769 0.27921
128×128 3.60347 3.76052 1.88089
256×256 33.72502 23.10218 11.62692
512×512 285.03784 126.70175 162.74253

Normalised

Speed up over slowest OpenCV Armadillo Eigen
4×4 1.32x 5.85x 1.00x
8×8 1.90x 1.00x 1.52x
16×16 1.27x 1.00x 1.15x
32×32 1.00x 1.19x 1.21x
64×64 1.00x 1.60x 1.48x
128×128 1.04x 1.00x 2.00x
256×256 1.00x 1.46x 2.90x
512×512 1.00x 2.25x 1.75x

SVD

Performing full SVD, [U,S,V] = SVD(A)

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.01220 0.22080 0.01620
8×8 0.01760 0.05760 0.03340
16×16 0.10700 0.16560 0.25540
32×32 0.51480 0.70230 1.13900
64×64 3.63780 3.43520 6.63350
128×128 27.04300 23.01600 64.27500
256×256 240.11000 210.70600 675.84100
512×512 1727.44000 1586.66400 6934.32300

Normalised


Speed up over slowest OpenCV Armadillo Eigen
4×4 18.10x 1.00x 13.63x
8×8 3.27x 1.00x 1.72x
16×16 2.39x 1.54x 1.00x
32×32 2.21x 1.62x 1.00x
64×64 1.82x 1.93x 1.00x
128×128 2.38x 2.79x 1.00x
256×256 2.81x 3.21x 1.00x
512×512 4.01x 4.37x 1.00x

Discussion

Overall, the average running time has decreased for all the operations, which is a good sign. Even OpenCV has lower running times, maybe because NDEBUG has an effect, since it's a standardised define.

The results from the addition test show all 3 libraries giving more or less the same result. This is probably not a surprise, since adding matrices is a very straightforward element-wise task, linear in the number of elements.

The multiply test is a bit more interesting. For matrices 64×64 or larger there is a noticeable gap between the libraries. Eigen is very fast, with Armadillo coming in second for matrices 256×256 or greater. I'm guessing Eigen and Armadillo leverage the extra CPU cores for larger matrices, because I did see all the CPU cores briefly utilised during benchmarking.

The transpose test involves shuffling memory around, so it is affected by the CPU's caching mechanism. OpenCV does a good job as the matrix size increases.

The inversion test is a bit of a mixed bag. OpenCV seems to be the slowest of the three overall.

The SVD test is interesting. There seems to be a clear range where OpenCV and Armadillo are faster, and Eigen lags behind by quite a bit as the matrix size increases.

Conclusion

In practice, if you just want a matrix library and nothing more then Armadillo or Eigen is probably the way to go. If you want something that is very portable with minimal effort then choose Eigen, because the entire library is header-only, so no library linking is required. If you want the fastest matrix code possible then you can be adventurous and try combining the best of each library.

Download

test_matrix_lib.cpp

Code compiled with:

g++ test_matrix_lib.cpp -o test_matrix_lib -lopencv_core -larmadillo -lgomp -fopenmp \
-march=native -O3 -DARMA_NO_DEBUG -DNDEBUG

General updates

Re-did the tutorial on finding the transformation between 3D points in the notes section. Fixed some mistakes in the maths and explanation. Amazing how often I get that simple algorithm wrong.

Got around to playing with VisualSFM. I need to tweak it to use less memory, but it appears to give good results and is very fast. If this program works out for my needs it will probably make RunSFM obsolete!

Next thing on my TODO list is a follow-up post on the matrix library comparison benchmark with any debugging/error checking turned off, to see how fast they can go.

OpenCV vs. Armadillo vs. Eigen on Linux

In this post I’ll be comparing 3 popular C++ matrix libraries found on Linux.

OpenCV is a large computer vision library with matrix support. Armadillo wraps around LAPACK. Eigen is an interesting library: all of the implementation is in C++ headers, much like Boost, so it is simple to link against but takes more time to compile.

The 5 matrix operations I'll be focusing on are: add, multiply, transpose, inversion and SVD. These are the functions I use most often. All the libraries are open source and run on a variety of platforms, but I'll just be comparing them on Ubuntu Linux.

Each of the 5 operations was tested on randomly generated NxN matrices of different sizes, with the average running time recorded.

I was tossing up whether to use a bar chart to display the results, but they span a very large interval. A log graph would show all the data easily but make numerical comparisons harder. So in the end I opted to show the raw data plus a normalised version to compare relative speed-ups. Values highlighted in red indicate the best results.

Add

Performing C = A + B

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.00098 0.00003 0.00002
8×8 0.00034 0.00006 0.00017
16×16 0.00048 0.00029 0.00077
32×32 0.00142 0.00208 0.00185
64×64 0.00667 0.00647 0.00688
128×128 0.02190 0.02776 0.03318
256×256 0.23900 0.27900 0.30400
512×512 1.04700 1.17600 1.33900

Normalised

Speed up over slowest OpenCV Armadillo Eigen
4×4 1.00x 30.53x 44.41x
8×8 1.00x 5.56x 2.02x
16×16 1.62x 2.66x 1.00x
32×32 1.46x 1.00x 1.12x
64×64 1.03x 1.06x 1.00x
128×128 1.52x 1.20x 1.00x
256×256 1.27x 1.09x 1.00x
512×512 1.28x 1.14x 1.00x

The average running times for all 3 libraries are very similar, so I would say there is no clear winner here. In the 4×4 case, where OpenCV is much slower, it might be due to overhead from error checking.


Multiply

Performing C = A * B

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.00104 0.00007 0.00030
8×8 0.00070 0.00080 0.00268
16×16 0.00402 0.00271 0.00772
32×32 0.02059 0.02104 0.02527
64×64 0.14835 0.18493 0.06987
128×128 1.83967 1.10590 0.60047
256×256 15.54500 9.18000 2.65200
512×512 133.32800 35.43100 21.53300

Normalised

Speed up over slowest OpenCV Armadillo Eigen
4×4 1.00x 16.03x 3.52x
8×8 3.84x 3.35x 1.00x
16×16 1.92x 2.84x 1.00x
32×32 1.23x 1.20x 1.00x
64×64 1.25x 1.00x 2.65x
128×128 1.00x 1.66x 3.06x
256×256 1.00x 1.69x 5.86x
512×512 1.00x 3.76x 6.19x

Average running times for all 3 are similar up to 64×64, after which Eigen comes out as the clear winner.


Transpose

Performing C = A^T

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.00029 0.00002 0.00002
8×8 0.00024 0.00007 0.00009
16×16 0.00034 0.00019 0.00028
32×32 0.00071 0.00088 0.00111
64×64 0.00458 0.00591 0.00573
128×128 0.01636 0.13390 0.04576
256×256 0.12200 0.77400 0.32400
512×512 0.68700 3.44700 1.17600

Normalised

Speed up over slowest OpenCV Armadillo Eigen
4×4 1.00x 17.00x 12.57x
8×8 1.00x 3.45x 2.82x
16×16 1.00x 1.81x 1.20x
32×32 1.56x 1.26x 1.00x
64×64 1.29x 1.00x 1.03x
128×128 8.18x 1.00x 2.93x
256×256 6.34x 1.00x 2.39x
512×512 5.02x 1.00x 2.93x

Comparable running times up to 64×64, after which OpenCV is the winner by quite a bit. Some clever memory manipulation?


Inversion

Performing C = A^-1

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.00189 0.00018 0.00090
8×8 0.00198 0.00414 0.00271
16×16 0.01118 0.01315 0.01149
32×32 0.06602 0.05445 0.05464
64×64 0.42008 0.32378 0.30324
128×128 3.67776 4.52664 2.35105
256×256 35.45200 16.41900 17.12700
512×512 302.33500 122.48600 97.62200

Normalised

Speed up over slowest OpenCV Armadillo Eigen
4×4 1.00x 10.22x 2.09x
8×8 2.09x 1.00x 1.53x
16×16 1.18x 1.00x 1.15x
32×32 1.00x 1.21x 1.21x
64×64 1.00x 1.30x 1.39x
128×128 1.23x 1.00x 1.93x
256×256 1.00x 2.16x 2.07x
512×512 1.00x 2.47x 3.10x

Some mixed results up until 128×128, after which Eigen appears to be the better choice.


SVD

Performing [U,S,V] = SVD(A)

Raw data

Results in ms OpenCV Armadillo Eigen
4×4 0.00815 0.01752 0.00544
8×8 0.01498 0.05514 0.03522
16×16 0.08335 0.17098 0.21254
32×32 0.53363 0.73960 1.21068
64×64 3.51651 3.37326 6.89069
128×128 25.86869 24.34282 71.48941
256×256 293.54300 226.95800 722.12400
512×512 1823.72100 1595.14500 7747.46800

Normalised

Speed up over slowest OpenCV Armadillo Eigen
4×4 2.15x 1.00x 3.22x
8×8 3.68x 1.00x 1.57x
16×16 2.55x 1.24x 1.00x
32×32 2.27x 1.64x 1.00x
64×64 1.96x 2.04x 1.00x
128×128 2.76x 2.94x 1.00x
256×256 2.46x 3.18x 1.00x
512×512 4.25x 4.86x 1.00x

Looks like OpenCV and Armadillo are the winners, depending on the size of the matrix.

Discussion

With mixed results left, right and centre it is hard to come to any definite conclusion. The benchmark itself is very simple: I only looked at square matrices with power-of-two sizes, comparing execution speed, not accuracy, which is important for SVD.

What’s interesting from the benchmark is the clear difference in speed for some of the operations depending on the matrix size. Since the margins can be large, this can have a noticeable impact on your application’s running time. It would be pretty cool if there was a matrix library that could switch between different algorithms depending on the size/operation requested, fine-tuned to the machine it is running on. Sort of like what ATLAS/BLAS does.

So which library is faster? I have no idea, try them all for your application and see 🙂

Download

Here is the code used to generate the benchmark:  test_matrix_lib.cpp

Compiled with:

g++ test_matrix_lib.cpp -o test_matrix_lib -lopencv_core -larmadillo -lgomp -fopenmp -march=native -O3