Use Cython to speed up Python execution
Python is easy to write, but it is not the most efficient language to run on a CPU.
Cython can drastically improve the execution speed. In a project of mine, the code runs at least four times as fast as before.
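Python is slow mainly because it is interpreted. The standard interpreter (named CPython, not to be confused with Cython) compiles your code into bytecode and then executes that bytecode one instruction at a time, checking types at run time along the way, instead of running optimized machine code directly.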
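To see what the interpreter actually executes, the standard-library dis module can list the bytecode of a function. This is only a small illustration; the function add() is a made-up example, and the exact instruction names differ between Python versions.

```python
import dis


def add(a, b):
    """A tiny function whose bytecode is easy to read."""
    return a + b


# Collect the names of the bytecode instructions the interpreter will execute.
opnames = [instruction.opname for instruction in dis.get_instructions(add)]
print(opnames)
```

Every one of those instructions goes through the interpreter loop, which is the overhead that compiling to C tries to reduce.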
Cython changes the situation. It converts the Python code you write into C code, compiles that into an extension module, and lets you import it back into Python just like a normal Python package.
In the example below, I show how to use Cython. It is adapted from the source code shared with a Journal of Accounting Research paper by Brown, Crowley, and Elliott (2020). I slightly changed their approach to fit my needs (or my laziness).
Brown, Nerissa C., Richard M. Crowley, and W. Brooke Elliott. “What are you saying? Using topic to detect financial misreporting.” Journal of Accounting Research 58, no. 1 (2020): 237-291.
Using Cython Step by Step
Suppose you have a function parse() to run, which is written in Parse.py.
Do the following:
- Create a new file called Parse_Setup.py with the following content:
# Build script: compiles a copy of your Python file with Cython.
# distutils was removed in Python 3.12, so setup comes from setuptools.
from setuptools import setup
from Cython.Build import cythonize
import os, shutil
#Provide the file name of your original Python code:
python_file = "Parse.py"
#Automatically copy it to make a .pyx file, and set up
pyx_file = python_file + "x"
if os.path.isfile(pyx_file):
    os.remove(pyx_file)
shutil.copy(python_file, pyx_file)
setup(
    # language_level=3 tells Cython to treat the source as Python 3. DO NOT REMOVE.
    ext_modules = cythonize(pyx_file, compiler_directives={"language_level": "3"})
)
- Run it in Terminal:
python Parse_Setup.py build_ext --inplace
- Create a new file called Parse_Run.py with the following content:
from Parse import parse
parse()
- Run it. It now flies like an aircraft.
python Parse_Run.py
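After the build, Parse.py and the compiled extension sit side by side in the same folder, so it is worth checking which one Python actually imports. CPython's import system normally prefers the compiled extension, but the sketch below gives a quick way to confirm it. The helper name is_compiled() is my own, not part of Cython, and json is used only as a stand-in so that the snippet runs anywhere.

```python
import importlib.machinery
import json


def is_compiled(module):
    """Return True if the module was loaded from a compiled extension file."""
    path = getattr(module, "__file__", None) or ""
    # EXTENSION_SUFFIXES lists the file endings of compiled modules on this
    # platform (for example ".so" on Linux or ".pyd" on Windows).
    return path.endswith(tuple(importlib.machinery.EXTENSION_SUFFIXES))


# json is a plain Python package, so this prints False.
print(is_compiled(json))
```

In Parse_Run.py, you would import Parse and pass it to is_compiled() instead of json. If it reports False, Python is still running the original Parse.py and the build step probably did not succeed.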
Further thoughts…
Here (URL) someone compares the speed of pure Python with NumPy and with Cython. It seems that NumPy is 31x as fast and native Cython is 24x as fast as pure Python. Since NumPy also runs C in the backend, this is reasonable.
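I tried a minimal example that uses the Monte Carlo method to estimate π with numpy.random, and the result is... disappointing. Cython only makes it slightly slower, presumably because the heavy lifting already happens inside NumPy's compiled code, which leaves little for Cython to speed up.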
Setup | real time | 2 core user time |
Python + Numpy | 2m37s | 5m10s |
Python + Numpy + Cython | 2m41s | 5m13s |
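For reference, a Monte Carlo estimate of π with NumPy can look roughly like the sketch below. It is not the exact script I timed. The function name estimate_pi(), the sample size, and the use of np.random.default_rng are assumptions made for illustration.

```python
import numpy as np


def estimate_pi(n_samples, seed=0):
    """Estimate pi from the share of random points inside the unit quarter circle."""
    rng = np.random.default_rng(seed)
    # Draw points uniformly in the unit square; both calls run in compiled code.
    x = rng.random(n_samples)
    y = rng.random(n_samples)
    # Count the points whose distance from the origin is at most 1.
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    # The quarter circle covers pi/4 of the square.
    return 4.0 * inside / n_samples


print(estimate_pi(1_000_000))
```

Almost all the work happens inside a handful of NumPy calls, so compiling the few surrounding Python lines with Cython changes very little.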
I may need to build a minimal example to compare the performance with and without Cython for RegEx, the only thing whose speed bothers me.
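A starting point for such a comparison could be the sketch below. The workload (counting alphabetic tokens in a synthetic text) and the names count_words() and benchmark() are placeholders of mine; swap in the regular expressions that are actually slow for you.

```python
import re
import timeit

# Hypothetical workload: a precompiled pattern that matches alphabetic tokens.
WORD = re.compile(r"[A-Za-z]+")


def count_words(text):
    """Count alphabetic tokens with the precompiled regular expression."""
    return len(WORD.findall(text))


def benchmark(func, text, repeat=5, number=20):
    """Return the best per-call time in seconds over several timeit runs."""
    timings = timeit.repeat(lambda: func(text), repeat=repeat, number=number)
    return min(timings) / number


sample = "The quick brown fox jumps over the lazy dog. " * 2000
print(f"{benchmark(count_words, sample):.6f} s per call")
```

To compare, put count_words() in its own file, compile it with the steps above, and run benchmark() against both the plain and the compiled version. Since the matching in the re module already runs in C, the gain may turn out to be modest, but only a measurement will tell.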