Friday, November 25, 2016

Autonomous Trading: Language Wars

In my 7th blog post, I talked about a man named William Mok who created an algorithmic trading system unlike any other (or so he claimed). Today, I want to talk about how a beginner would approach picking a platform for actual implementation. With a plethora of languages to choose from, including the classics (Python, R, Matlab, SAS) and newer languages with a lot of potential (Julia, Scala, OCaml), it can be difficult to pick a platform for both analysis and implementation. It is important to first think about which part of the trading development process we are concerned with: the data pipeline, or the implementation against a brokerage API.

Today the focus will be on the data pipeline: how are we going to bring in, clean and analyse data? Languages such as R, Python and Julia are well suited to bringing in data for analysis. R also has many libraries made specifically for finance, such as quantmod (data cleaning, financial forecasting, and plotting tools) and RQuantLib (an R interface to QuantLib, with option pricing functions).
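To make the three pipeline steps (bring in, clean, analyse) concrete, here is a minimal sketch in Python using only the standard library. The CSV contents are made-up sample prices, and the function names (`load_prices`, `clean_prices`, `daily_returns`) are hypothetical names chosen for illustration; a real pipeline would more likely read from a data vendor and use a dedicated data analysis library.

```python
import csv
import io
from statistics import mean

# Hypothetical sample data: made-up closing prices, not real market quotes.
RAW_CSV = """date,close
2016-11-21,100.0
2016-11-22,
2016-11-23,102.0
2016-11-24,101.0
2016-11-25,103.02
"""

def load_prices(text):
    """Bring in: parse CSV text into a list of (date, close-as-string) rows."""
    return [(row["date"], row["close"]) for row in csv.DictReader(io.StringIO(text))]

def clean_prices(rows):
    """Clean: drop rows with a missing close and convert the rest to float."""
    return [(date, float(close)) for date, close in rows if close.strip()]

def daily_returns(prices):
    """Analyse: simple return between consecutive remaining observations."""
    closes = [close for _, close in prices]
    return [(b - a) / a for a, b in zip(closes, closes[1:])]

prices = clean_prices(load_prices(RAW_CSV))
returns = daily_returns(prices)
print(len(prices), "clean rows; mean return:", round(mean(returns), 4))
```

Dropping the row with the missing close is only one possible cleaning choice; forward-filling the previous price is another, and which one is appropriate depends on the strategy being tested.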

The issue with many of these languages (R included) is scalability. In commercial trading (think billion-dollar quantitative hedge funds and designated market makers), algorithms written in them often cannot meet the required latency, a common problem with higher-level languages. One reason is garbage collection, a feature of many programming languages that automatically reclaims memory a program no longer uses; a collection can pause the program at moments the programmer does not control. The usual solution to the low-latency problem is to use C++ instead, but it is by no means an easy language to learn, and it is even harder to use for rapid prototyping (important when developing financial systems).
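The garbage collection pause described above can be observed directly. The sketch below uses Python's standard `gc` module to build up garbage that only the cyclic collector can find, then times a single full collection. The `Node` class, the helper names, and the object count are assumptions made for illustration, and the measured pause will vary by machine and interpreter; other languages use different collector designs.

```python
import gc
import time

class Node:
    """Tiny object that can point at another node, so two nodes can form a cycle."""
    def __init__(self):
        self.other = None

def make_cyclic_garbage(n):
    """Create n pairs of objects that reference each other, then drop them."""
    for _ in range(n):
        a, b = Node(), Node()
        a.other, b.other = b, a  # reference cycle: refcounts never reach zero

def timed_collection(n):
    """Return (unreachable objects found, seconds spent) for one full collection."""
    gc.collect()      # start from a clean state
    gc.disable()      # keep the automatic collector from running mid-loop
    try:
        make_cyclic_garbage(n)
        start = time.perf_counter()
        found = gc.collect()
        pause = time.perf_counter() - start
    finally:
        gc.enable()
    return found, pause

found, pause = timed_collection(100_000)
print(f"collector found {found} unreachable objects in {pause * 1000:.2f} ms")
```

While that collection runs, the program is doing no other work, which is the kind of delay a latency-sensitive trading system tries to avoid.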

My personal favourite right now is also gaining traction as a language for data analysis. Julia is a relatively new language (first appearing only 4 years ago) that claims to combine the speed of C++ with the data analysis capabilities of popular languages such as R and Python. One reason this is possible is that Julia compiles code just in time to native machine code (via LLVM), on top of a core runtime implemented in C/C++. Below are its benchmark results against other languages, with times relative to C (C's speed is fixed at 1), where smaller is better.


Benchmark      Fortran    Julia  Python  R       Matlab  Octave   Mathematica  JavaScript     Go     LuaJIT           Java
Version        gcc 5.1.1  0.4.0  3.4.3   3.2.2   R2015b  4.0.0    10.2.0       V8 3.28.71.19  go1.5  gsl-shell 2.3.1  1.8.0_45
fib            0.70       2.11   77.76   533.52  26.89   9324.35  118.53       3.36           1.86   1.71             1.21
parse_int      5.05       1.45   17.02   45.73   802.52  9581.44  15.02        6.06           1.20   5.77             3.35
quicksort      1.31       1.15   32.89   264.54  4.92    1866.01  43.23        2.70           1.29   2.03             2.60
mandel         0.81       0.79   15.32   53.16   7.58    451.81   5.13         0.66           1.11   0.67             1.35
pi_sum         1.00       1.00   21.99   9.56    1.00    299.31   1.69         1.01           1.00   1.00             1.00
rand_mat_stat  1.45       1.66   17.93   14.56   14.52   30.93    5.95         2.30           2.96   3.27             3.92
rand_mat_mul   3.48       1.02   1.14    1.57    1.12    1.12     1.30         15.07          1.42   1.16             2.36
Figure: benchmark times relative to C (smaller is better, C performance = 1.0).

More details regarding the specifics of these performance tests can be found in my image reference.

Writing References:
http://www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html

Image References:
http://julialang.org




