Sections in this Chapter:

- Write & run scripts
- Debugging
- Benchmarking
- Vectorization
- Embarrassingly parallelism in R/Python
- Evaluation strategy
- Speed up with C/C++ in R/Python
- A first impression of functional programming
- Miscellaneous

In the first Chapter, we are coding within the interactive mode of R/Python. When we are working on the real world projects, using an Integrated development environment (IDE) is a more pragmatic choice. There are not many choices for R IDE, and among them RStudio^{1} is the best one I have used so far. As for Python, I would recommend either Visual Studio Code^{2} or PyCharm^{3}. But of course, you could use any text editor to write R/Python scripts.

Let’s write a simple script to print `Hello World!`

in R/Python. I have made a directory `chapter2`

on my disk, the R script is saved as `hello_world.R`

and the Python script is saved as `hello_world.py`

, inside the directory.

chapter2/hello_world.R

chapter2/hello_world.py

There are a few ways to run the R script. For example, we can run the script from the console with the `r -f filename`

command. Also, we can open the R interactive session and use the `source()`

function. I would recommend the second approach with `source()`

function. As for the Python script, we can run it from the console.

Debugging is one of the most important aspects of programming. What is debugging in programming? The programs we write might include errors/bugs and debugging is a step-by-step process to find and remove the errors/bugs in order to get the desired results.

If you are smart enough or the bugs are evident enough then you can debug the program on your mind without using a computer at all. But in general we need some tools/techniques to help us with debugging.

Most of programming languages provide the functionality of printing, which is a natural way of debugging. By trying to place `print`

statements at different positions we may finally catch the bugs. When I use `print`

to debug, it’s feeling like playing the game of minesweeper. In Python, there is a module called `logging`

^{4} which could be used for debugging like the `print`

function, but in a more elegant fashion.

In R, there is a function `browser()`

which interrupts the execution and allows the inspection of the current environment. Similarly, there is a module `pdb`

in Python that provides more debugging features. We would only focus on the basic usages of `browser()`

and the `set_trace()`

function in `pdb`

module.

The essential difference between debugging using `print()`

and `browser()`

and `set_trace()`

is that the latter functions allows us to debug in an interactive mode.

Let’s write a function which takes a sorted vector/list `v`

and a target value `x`

as input and returns the leftmost index `pos`

of the sorted vector/list so that `v[pos]>=x`

. Since `v`

is already sorted, we may simply loop through it from left to right to find `pos`

.

chapter2/find_pos.R

chapter2/find_pos.py

Now let’s run these two scripts.

When `x=11`

, the function returns `NULL`

in R and `None`

in Python because there is no such element in `v`

larger than `x`

.

The implementation above is trivial, but not efficient. If you have some background in data structures and algorithms, you probably know this question can be solved by binary search. The essential idea of binary search comes from divide-and-conquer^{5}. Since `v`

is already sorted, we may divide it into two partitions by cutting it from the middle, and then we get the left partition and the right partition. `v`

is sorted implies that both the left partition and the right partition are also sorted. If the target value `x`

is larger than the rightmost element in the left partition, we can just discard the left partition and search `x`

within the right partition. Otherwise, we can discard the right partition and search `x`

within the left partition. Once we have determined which partition to search, we may apply the idea recursively so that in each step we reduce the size of `v`

by half. If the length of `v`

is denoted as `n`

, in terms of big O notation^{6}, the run time complexity of binary search is $\mathcal{O}(\log{}n)$, compared with $\mathcal{O}(n)$ of the for-loop implementation.

The code below implements the binary search solution to our question (It is more intuitive to do it with recursion but here I write it with iteration since tail recursion optimization^{7} in R/Python is not supported).

chapter2/find_binary_search_buggy.R

chapter2/find_binary_search_buggy.py

Now let’s run these two binary search scripts.

The binary search solutions don’t work as expected when `x=11`

. We write two new scripts.

chapter2/find_binary_search_buggy_debug.R

chapter2/find_binary_search_buggy_debug.py

Let’s try to debug the programs with the help of `browser()`

and `set_trace()`

.

In the R code snippet above, we placed the `browser()`

function on the top of the function `binary_search_buggy`

. Then when we call the function we enter into the debugging environment. By calling `ls()`

we see all variables in the current debugging scope, i.e., `v,x`

. Typing `n`

will evaluate the next statement. After typing `n`

a few times, we finally exit from the while loop because `start = 4`

such that `start < end`

is FALSE. As a result, the function just returns the value of start, i.e., 4. To exit from the debugging environment, we can type `Q`

; to continue the execution we can type `c`

.

The root cause is that we didn’t deal with the corner case when the target value `x`

is larger than the last/largest element in `v`

correctly.

Let’s debug the Python function using `pdb`

module.

Similar to R, command `n`

would evaluate the next statement in `pdb`

. Typing command `l`

would show the current line of current execution. Command `b line_number`

would set the corresponding line as a break point; and `c`

would continue the execution until the next breakpoint (if exists).

In R, besides the `browser()`

function there are a pair of functions `debug() and undebug()`

which are also very handy when we try to debug a function; especially when the function is wrapped in a package. More specifically, the `debug`

function would invoke the debugging environment whenever we call the function to debug. See the example below how we invoke the debugging environment for the `sd`

function (standard deviation calculation).

The binary_search solutions are fixed below.

chapter2/find_binary_search.R

chapter2/find_binary_search.py

By benchmarking, I mean measuring the entire operation time of a piece of program. There is another term called profiling which is related to benchmarking. But profiling is more complex since it commonly aims at understanding the behavior of the program and optimizing the program in terms of time elapsed during the operation.

In R, I like using the `microbenchmark`

package. And in Python, `timeit`

module is a good tool to use when we want to benchmark a small bits of Python code.

As mentioned before, the run time complexity of binary search is better than that of a for-loop search. We can do benchmarking to compare the two algorithms.

chapter2/benchmark.R

In the R code above, `times=1000`

means we want to call the function `1000`

times in the benchmarking process. The `sample()`

function is used to draw samples from a set of elements. Specifically, we pass the argument `1`

to `sample()`

to draw a single element. It’s the first time we use `set.seed()`

function in this book. In R/Python, we draw random numbers based on the pseudorandom number generator (PRNG) algorithm^{8}. The sequence of numbers generated by PRNG is completed determined by an initial value, i.e., the seed. Whenever a program involves the usage of PRNG, it is better to set the seed in order to get replicable results (see the example below).

Now let’s run the R script to see the benchmarking result.

The binary_search solution is much more efficient based on the benchmarking result.

Doing the same benchmarking in Python is a bit of complicated.

chapter2/benchmark.py

The most interesting part of the Python code above is `from __main__ import`

. Let’s ignore it for now, and we would revisit it later.

Below is the benchmarking result in Python (the unit is second).

In parallel computing, automatic vectorization^{9} means a program in a scalar implementation is converted to a vector implementation which process multiple pairs of operands simultaneously by compilers that feature auto-vectorization. For example, let’s calculate the element-wise sum of two arrays `x`

and `y`

of the same length in C programming language.

The C code above might be vectorized by the compiler so that the actual number of iterations performed could be less than 4. If 4 pairs of operands are processed at once, there would be only 1 iteration. Automatic vectorization may make the program runs much faster in some languages like C. However, when we talk about vectorization in R/Python, it is different from automatic vectorization. Vectorization in R/Python usually refers to the human effort paid to avoid for-loops. First, let’s see some examples of how for-loops may slow your programs in R/Python.

chapter2/vectorization_1.R

Running the R code results in the following result on my local machine.

Please note that in this Python example we are using the `random`

submodule of `numpy`

module instead of the built-in `random`

module since `random`

module doesn’t provide the vectorized version of random number generation function. Running the Python code results in the following result on my local machine.

Also, the `timeit.timeit`

measures the total time to run the main statements `number`

times, but not the average time per run.

In either R or Python, the vectorized version of random normal random variable (r.v.) is significantly faster than the scalar version. It is worth noting the usage of the `print(f'')`

statement in the Python code, which is different from the way how we print the object of `Complex`

class in \autoref{ch:chp1. In the code above, we use the `f-string`

^{10} which is a literal string prefixed with `'f'`

containing expressions inside `\{\`

which would be replaced with their values. `f-string`

was a feature introduced since Python 3.6. If you are familiar with Scala, you may find that this feature is quite similar with the string interpolation mechanism introduced since Scala 2.10.

It’s also worth noting that lots of built-in functions in R are already vectorized, such as the basic arithmetic operators, comparison operators, `ifelse()`

, element-wise logical operators `&,|`

. But the logical operators `&&, ||`

are not vectorized.

In addition to vectorization, there are also some built-in functions which may help to avoid the usages of for-loops. For example, in R we might be able use the `apply`

family of functions to replace for-loops; and in Python the `map()`

function can also be useful. In the Python `pandas`

module, there are also many usages of `map/apply`

methods. But in general the usage of `apply/map`

functions has little or nothing to do with performance improvement. However, appropriate usages of such functions may help with the readability of the program. Compared with the `apply`

family of functions in R, I think the `do.call()`

function is more useful in practice. We would spend some time in `do.call()`

later.

Considering the importance of vectorization in scientific programming, let’s try to get more familiar with vectorization thorough the Biham–Middleton–Levine (BML) traffic model

^{11}. The BML model is very important in modern studies of traffic flow since it exhibits a sharp phase transition from free flowing status to a fully jammed status. A simplified BML model could be characterized as follows:

- Initialized on a 2-D lattice, each site of which is either empty or occupied by a colored particle (blue or red);
- Particles are distributed randomly through the initialization according to a uniform distribution; the two colors of particles are equally distributed.
- On even time steps, all blue particles attempt to move one site up and an attempt fails if the site to occupy is not empty;
- On Odd time steps, all red particles attempt to move one site right and an attempt fails if the site to occupy is not empty;
- The lattice is assumed periodic which means when a particle moves out of the lattice, it would move into the lattice from the opposite side.

The BML model specified above is implemented in both R/Python as follows to illustrate the usage of vectorization.

chapter2/BML.R

Now we can create a simple BML system on a $5\times5$ lattice using the R code above.

In the initialization step, we used the inverse transform sampling approach^{12} to generate the status of each site. Inverse transform sampling method is basic but powerful approach to generate r.v. from any probability distribution given its cumulative distribution function (CDF). Reading the wiki page is enough to master this sampling method.

The Python implementation is also given.

Please note that although we have imported `numpy`

in BML.py, we import it again in the code above in order to set the random seed. If we change the line to `from BML import *`

, we don’t need to import `numpy`

again. But it is not recommended to `import *`

from a module.

According to the explanation of wikipedia^{13}, single-threading is the processing of one command at a time, and its opposite is multithreading. A process is the instance of a computer program executed by one or many threads^{14}. Multithreading is not the same as parallelism. In a single processor environment, the processor can switch execution between threads, which results in concurrent execution. However, it is possible a process with multithreads runs on on a multiprocessor environment and each of the threads on a separate processor, which results in parallel execution.

Both R and Python are single-threaded. In Python, there is a `threading`

package^{15}, which supports multithreading on a single core. It may suit some specific tasks. For example, in web scraping multithreading on a single core may be able to speed up the program if the download time is in excess of the CPU processing time.

Now let’s talk about embarrassingly parallelism by multi-processing. Embarrassingly parallel problem is one where little or no effort is needed to separate the problem into a number of parallel tasks^{16}. In R there are various packages supporting multi-processing on multiple CPU cores, for example, the `parallel`

package, which is my favorite one. In Python, there are also some available modules, such as `multiprocessing`

, `joblib`

and `concurrent.futures`

. Let’s see an application of embarrassingly parallelism to calculate $\pi$ using Monte Carlo simulation^{17}.

Monte Carlo simulation provides a simple and straightforward way to estimate $\pi$. We know the area of a circle with radius 1 is just $\pi$. Thus, we can convert the original problem of $\pi$ calculation to a new problem, i.e., how to calculate the area of a circle with radius 1. We also know the area of a square with side length 2 is equal to 4. Thus, $\pi$ can be calculated as 4$r_{c/s}$ where $r_{c/s}$ denotes the ratio of areas of a circle with radius 1 and a square with side length 2. Now the problem is how to calculate the ratio $r_{c/s}$? When we randomly throw $n$ points into the square and $m$ of these pionts fall into the inscribed circle, then we can estimate the ratio as $m/n$. As a result, a natural estimate of $\pi$ is $4m/n$. This problem is an embarrassingly parallel problem by its nature. Let’s see how we implement the idea in R/Python.

chapter2/pi.R

In the above R code snippet, we use the function `mclapply`

which is not currently available on some operation systems^{18}. When it is not available, we may consider to use `parLapply`

instead.

chapter2/pi.py

In the above Python code snippet, we defined a function `generate_points_parallel`

which returns a list of number of inside points. But the problem of this function is that each process in the pool shares the same random state. As a result, we will obtain the a list of duplicated numbers by calling this function. To overcome the issue, we defined another function `generate_points_parallel_set_seed`

. Let’s see the results of our estimation for $\pi$ running on a laptop.

We see the winner in this example is vectorization, and the parallel solution is better than the naive solution. However, when the problem cannot be vectorized we may use parallelization to achieve better performance.

R is a lazy programming language since it uses lazy evaluation by default^{19}. Lazy evaluation strategy delays the evaluation of an expression until its value is needed. When an expression in R is evaluated, it follows an outermost reduction order in general. But Python is not a lazy programming language and the expression in Python follows an innermost reduction order.

The code snippets below illustrate the innermost reduction vs. outermost reduction.

Because of the outermost reduction order, the R code snippet evaluate the function `power`

first and since if second argument is 0 the first argument is not required to evaluate. Thus, the function call returns 1. But the Python code snippet first evaluates the function call of `divide`

and an exception is raised because of division by zero.

Although Python is an eager language, it is possible to simulate the lazy evaluation behavior. For example, the code inside a generator^{20} is evaluated when the generator is consumed but not evaluated when it is created. We can create a sequence of Fibonacci numbers with the help of `generator`

.

In the code snippet above, we create a generator which generates the sequence of Fibonacci numbers less than 10. When the generator is created, the sequence is not generated immediately. When we consume the generator in a loop, each element is then generated as needed. It is also common to consume the generator with the `next`

function.

The purpose of this section is not to teach C/C++. If you know how to write C/C++ then it may help to improve the performance of R/Python program.

It is recommended to use vectorization technique to speed up your program whenever possible. But what if it’s infeasible to vectorize the algorithm? One potential option is to use C/C++ in the R/Python code. According to my limited experience, it is much easier to use C/C++ in R with the help of `Rcpp`

package than in Python. In Python, more options are available but not as straightforward as the solution of `Rcpp`

. In this section, let’s use `Cython`

^{21}. Actually, `Cython`

itself is a programming language written in Python and C. In `Rcpp`

we can write C++ code directly and use it in native R code. With the help of `Cython`

, we can write python-like code which is able to be compiled to C or C++ and automatically wrapped into python-importable modules. `Cython`

could also wrap independent C or C++ code into python-importable modules.

Let’s see how to calculate Fibonacci numbers with these tools. The first step is to install `Rcpp`

package in R and the `Cython`

module (use `pip`

) in Python. Once they are installed, we can write the code in C++ directly for R. As for `Cython`

, we write the python-like code which would be compiled to C/C++ later.

chapter2/Fibonacci.cpp

chapter2/Fibonacci.pyx

It’s worth noting the extension of the `Cython`

file above is `pyx`

, not `py`

. And in `Fibonacci.pyx`

, the function we defined is quite following the native Python syntax. But in `Cython`

we can add static typing declarations which are often helpful to improve the performance, although not mandatory. The keyword `cdef`

makes the function `Fibonacci_c`

invisible to Python and thus we have to define the `Fibonacci_static`

function as a wrapper which can be imported into Python. The function `Fibonacci`

is not static typed for benchmarking.

Let’s also implement the same functions in native R/Python for benchmarking purpose.

chapter2/Fibonacci.R

chapter2/Fibonacci.py

Now let’s compare the performance of different implementations.

The native implementation in R is much slower the C++ implementation (see the difference of units).

The results show the static typed implementation is the fastest, and the native implementation in pure Python is the slowest. Again the time measured by `timeit.timeit`

is the total time to run the main statement 1000 times. The average time per run of `Fibonacci_static`

function is close to the average time per run of `Fibonacci`

in R.

Functional (programming) languages define programs as mathematical functions and treat them as first-class2]. In pure functional programming, a function takes input and returns output without making any side effect. A side effect means a state variable outside the function is modified. For example, printing the value of a variable inside a function is a side effect.

Both R and Python are not purely functional languages per se, although R behaves more functional than Python^{23}. If you wonder why the `return`

function could be ignored to return a value in an R function, now you get the explanation – a function should return the output automatically from a FP perspective.

I don’t think it makes any sense to debate on if we should choose OOP or FP using R/Python. And both languages support multiple programming paradigms (FP, OOP etc.). If you want to do purist FP, neither R nor Python would be the best choice. For example, purist FP should avoid loop because loop always involves an iteration counter whose value changes over iterations, which is obviously a side effect. Thus, recursion is very important in FP. But tail recursion optimization is not supported in R and Python although it can be implemented via trampoline^{24}.

Let’s introduce a few tools that could help to get a first impression of FP in R and Python.

We have seen anonymous functions before. Anonymous functions are commonly used in FP. To recap, let’s see how to define a useless anonymous function.

Usually, we define anonymous functions because we want to pass them to other functions. And thus, the functions defined above are useless as they are not passed to any other function and after the definition they are deleted automatically from the memory.

Of course, we can assign names to the anonymous functions – but what is the point to name anonymous functions?

Map could be used to avoid loops. The first argument to map is a function.

In R, we use `Map`

which is shipped with base R; and in Python we use `map`

which is also a built-in function.

There are a few things to note from the code snippets above. First, the returned value from `Map`

in R is a list rather than a vector. That is because `Map`

is just a wrapper of `mapply`

function in R, and the `SIMPLIFY`

argument is set to `FALSE`

. If we want to get a vector value returned, just use `mapply`

instead.

The returned value from `map`

in Python is a `map`

object which is an iterator. The `generator`

function we have seen before is also an iterator. To use an iterator, we could convert it to a list, or get the next value using the `next`

function, or just use a for loop.

Similar to `map`

, the first argument to `Filter`

in R and `filter`

in Python is also a function. This function is used to get the elements which satisfy a certain condition specified in the first argument. For example, we can use it to get the even numbers.

`Reduce`

behaves a bit different from `Map`

and `Filter`

. The first argument is a function with exactly two arguments. The second argument is a sequence, on each element of which the first function argument will be applied from left to right. And there is one optional argument called initializer. When the initializer is provided, it would be used as the first argument of the function argument of `Reduce`

. The examples below depict how it works.

Please note that in order to use `reduce`

in Python, we have to `import`

it from the `functools`

module.

Utilizing these functions flexibly may help to make the code more concise, at the cost of readability.

We have introduced the basics of R/Python programming so far. There are much more to learn to become an advanced user of R/Python. For example, the appropriate usages of `iterator, generator, decorator`

could improve both the conciseness and readability of your Python code. The `generator`

^{25} is commonly seen in machine learning programs to prepare training/testing samples. `decorator`

is a kind of syntactic sugar to allow the modification of a function’s behavior in a simple way. In R there are no built-in `iterator, generator, decorator`

, but you may find some third-party libraries to mimic these features; or you may try to implement your own.

One advantage of Python over R is that there are some built-in modules containing high-performance data structures or commonly-used algorithms implemented efficiently. For example, I enjoy using the `deque`

structure in the Python `collections`

module^{26}, but there is no built-in counterpart in R. We have written our own binary search algorithm earlier in this Chapter, which can also be replaced by the functions in the built-in module `bisect`

^{27} in Python.

Another important aspect of programming is testing. Unit testing is a typical software testing method that is commonly adopted in practice. In R there are two third-party packages `testthat`

and `RUnit`

. In Python, the built-in `unittest`

is quite powerful. Also, the third-party module `pytest`

^{28} is very popular.

^{1} https://www.rstudio.com/products/rstudio

^{2} https://code.visualstudio.com/

^{3} https://www.jetbrains.com/pycharm

^{4} https://docs.python.org/3/library/logging.html

^{5} https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm

^{6} https://en.wikipedia.org/wiki/Big_O_notation

^{7} https://en.wikipedia.org/wiki/Tail_call

^{8} https://en.wikipedia.org/wiki/Pseudorandom_number_generator

^{9} https://en.wikipedia.org/wiki/Automatic_vectorization

^{10} https://www.python.org/dev/peps/pep-0498

^{11} https://en.wikipedia.org/wiki/Biham-Middleton-Levine_traffic_model

^{12} https://en.wikipedia.org/wiki/Inverse_transform_sampling

^{13} https://en.wikipedia.org/wiki/Thread_(computing)

^{14} https://en.wikipedia.org/wiki/Process_(computing)

^{15} https://docs.python.org/3.7/library/threading.html

^{16} https://en.wikipedia.org/wiki/Embarrassingly_parallel

^{17} https://en.wikipedia.org/wiki/Monte_Carlo_method

^{18} https://stat.ethz.ch/R-manual/R-devel/library/parallel/doc/parallel.pdf

^{19} Floréal Morandat, Brandon Hill, Leo Osvald, and Jan Vitek. Evaluating the design of the r language. In European Conference on Object-Oriented Programming, pages 104–131. Springer, 2012.

^{20} https://www.python.org/dev/peps/pep-0255

^{21} https://en.wikipedia.org/wiki/Cython

^{22} https://en.wikipedia.org/wiki/List_of_programming_languages_by_type#Functional_languages

^{23} John M Chambers et al. Object-oriented programming, functional programming and r. Statistical Science, 29(2):167–180, 2014.

^{24} https://en.wikipedia.org/wiki/Tail_call#Through_trampolining

^{25} https://docs.python.org/3/howto/functional.html

^{26} https://docs.python.org/3/library/collections.html

^{27} https://docs.python.org/3.7/library/bisect.html

^{28} https://docs.pytest.org/en/latest