Today's focus is the data pipeline: how are we going to bring in, clean and analyse data? Languages such as R, Python and Julia are well suited to bringing in data for analysis. R also has many libraries made specifically for finance, such as quantmod (data cleaning, financial forecasting and plotting tools) and QuantLib (option pricing functions, available in R through the RQuantLib package).
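To make the "bring in, clean, analyse" steps concrete, here is a minimal sketch in Python using only the standard library. The sample prices, the column names and the helper functions (`load_rows`, `clean_prices`, `simple_returns`) are all made up for illustration; a real pipeline would read from a file or a data provider and would likely use a dedicated library.

```python
import csv
import io
import statistics

# Hypothetical sample data; a real pipeline would read a file or call an API.
RAW_CSV = """date,close
2016-01-04,100.0
2016-01-05,
2016-01-06,102.0
2016-01-07,abc
2016-01-08,103.02
"""


def load_rows(text):
    """Bring in: parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def clean_prices(rows):
    """Clean: keep only rows whose close parses as a positive number."""
    prices = []
    for row in rows:
        try:
            price = float(row["close"])
        except (TypeError, ValueError):
            continue  # missing or malformed price, drop the row
        if price > 0:
            prices.append(price)
    return prices


def simple_returns(prices):
    """Analyse: simple returns between consecutive surviving prices.

    Note that dropped rows leave gaps, so a return may span several days.
    """
    return [later / earlier - 1.0 for earlier, later in zip(prices, prices[1:])]


prices = clean_prices(load_rows(RAW_CSV))
returns = simple_returns(prices)
print("clean prices:", prices)
print("mean return:", statistics.mean(returns))
```

Dropping bad rows is only one possible cleaning policy; forward-filling or interpolating missing prices may be more appropriate depending on the analysis.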
The issue with many of these languages (R included) is scalability. In commercial trading (think billion-dollar quantitative hedge funds and designated market makers), algorithms written in them often cannot meet the required latency, which is a common problem with higher-level languages. One reason is garbage collection, a feature of many programming languages that automatically reclaims memory a program no longer uses, and which can pause the program at unpredictable moments. The usual solution to the latency problem is to use C++ instead, but it is by no means an easy language to learn and is even harder to use for rapid prototyping (important when developing financial systems).
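As a rough illustration of garbage collection costing time, the sketch below uses Python's standard `gc` module to measure how long a single collection takes after creating many unreachable objects. It is specific to CPython (which combines reference counting with a cycle collector) and says nothing directly about R or about any production trading system; the `Node` class and `timed_collect` helper are hypothetical names for this example.

```python
import gc
import time


class Node:
    """A tiny object that points at itself, forming a reference cycle."""

    def __init__(self):
        self.me = self


def make_cycles(n):
    """Create n self-referencing objects and immediately drop them."""
    for _ in range(n):
        Node()  # unreachable right away, but the cycle keeps it in memory


def timed_collect(n):
    """Return (objects found, seconds spent) for one manual collection."""
    gc.collect()   # start from a clean slate
    gc.disable()   # stop automatic collections while garbage builds up
    try:
        make_cycles(n)
        start = time.perf_counter()
        found = gc.collect()  # number of unreachable objects found
        pause = time.perf_counter() - start
    finally:
        gc.enable()
    return found, pause


found, pause = timed_collect(100_000)
print(f"collected {found} objects in {pause * 1000:.2f} ms")
```

The exact pause depends on the machine and interpreter version, but the point is that this work happens at some moment the program does not fully control, which is what makes latency hard to guarantee.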
My personal favourite right now, Julia, is also gaining traction as a data analysis language. It is relatively new (first appearing in 2012) but claims to combine the speed of C++ with the data analysis capabilities of popular languages such as R and Python. One reason this is possible is that Julia's core is implemented in C/C++ and its code is compiled just-in-time through LLVM. The table below shows its performance relative to other languages: times are normalised so that C's speed is 1, and smaller is better.
| Benchmark | Fortran | Julia | Python | R | Matlab | Octave | Mathematica | JavaScript | Go | LuaJIT | Java |
|---|---|---|---|---|---|---|---|---|---|---|---|
| (version) | gcc 5.1.1 | 0.4.0 | 3.4.3 | 3.2.2 | R2015b | 4.0.0 | 10.2.0 | V8 3.28.71.19 | go1.5 | gsl-shell 2.3.1 | 1.8.0_45 |
| fib | 0.70 | 2.11 | 77.76 | 533.52 | 26.89 | 9324.35 | 118.53 | 3.36 | 1.86 | 1.71 | 1.21 |
| parse_int | 5.05 | 1.45 | 17.02 | 45.73 | 802.52 | 9581.44 | 15.02 | 6.06 | 1.20 | 5.77 | 3.35 |
| quicksort | 1.31 | 1.15 | 32.89 | 264.54 | 4.92 | 1866.01 | 43.23 | 2.70 | 1.29 | 2.03 | 2.60 |
| mandel | 0.81 | 0.79 | 15.32 | 53.16 | 7.58 | 451.81 | 5.13 | 0.66 | 1.11 | 0.67 | 1.35 |
| pi_sum | 1.00 | 1.00 | 21.99 | 9.56 | 1.00 | 299.31 | 1.69 | 1.01 | 1.00 | 1.00 | 1.00 |
| rand_mat_stat | 1.45 | 1.66 | 17.93 | 14.56 | 14.52 | 30.93 | 5.95 | 2.30 | 2.96 | 3.27 | 3.92 |
| rand_mat_mul | 3.48 | 1.02 | 1.14 | 1.57 | 1.12 | 1.12 | 1.30 | 15.07 | 1.42 | 1.16 | 2.36 |
More details on the specifics of these performance tests can be found at the source listed under Image References.
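To give a feel for what micro-benchmarks such as fib and pi_sum measure, here is a rough Python sketch of two comparable workloads with a simple timer. The problem sizes (`fib(20)`, 500 repetitions of a 10,000-term sum), the `best_time` helper and the best-of-three timing are assumptions for illustration and are not necessarily what produced the figures in the table.

```python
import time


def fib(n):
    """Naive recursive Fibonacci; stresses function-call overhead."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def pi_sum(terms=10_000, repeats=500):
    """Partial sum of 1/k^2, recomputed `repeats` times; stresses tight loops."""
    total = 0.0
    for _ in range(repeats):
        total = 0.0
        for k in range(1, terms + 1):
            total += 1.0 / (k * k)
    return total


def best_time(func, *args, runs=3):
    """Return the fastest wall-clock time (in seconds) over several runs."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


print(f"fib(20): {best_time(fib, 20) * 1000:.3f} ms")
print(f"pi_sum:  {best_time(pi_sum) * 1000:.3f} ms")
```

Dividing each timing by the time of an equivalent C program would give a ratio in the same spirit as the table, where 1 means "as fast as C".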
Writing References:
http://www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html
Image References:
http://julialang.org