Nowadays CPUs come with many cores, but we cannot make good use of them unless we write our programs to run on multiple cores.
Say we have thousands of SEC filings to process every year, and we write a function to collect data from them. For most filings it works just fine. But for a small group of filings (8 out of 8,000+ in 2005, in my data set, as an example), it runs without end: ten hours later, the process for such a filing is still stuck somewhere. As Obama once famously (and maybe unfairly) said, “Eight is enough.” (It was unfair, according to McCain, because Obama was using Bush as a pet name for him.) We can add a timeout exception to end these runaway processes.
Let’s first make a troublesome function, one that sometimes runs overtime.
from time import sleep

def f(x):
    """x can be any integer.
    When x is an even number, it takes x seconds to finish;
    when x is an odd number, it takes one second.
    When x = 2, it causes an error/exception.
    """
    sl = x if (-1)**x > 0 else 1
    print("Start running with x={} and sleep={}".format(x, sl))
    sleep(sl)
    try:
        print("  finished with x={} and sleep={}, result={}".format(x, sl, 1 / (x - 2)))
        return x
    except Exception as e:
        print('\n\nCaught exception {} in worker thread (x = {:d}):'.format(e, x))
        return None
Now let’s make use of a 2-core CPU with Python’s multiprocessing module. We run f(x) in parallel, collect the results as a list, and give each child process at most 5 seconds before giving up on its result. In the end, the collected results are ready for some analysis (that’s not the job of this post, though).
import multiprocessing

if __name__ == '__main__':
    with multiprocessing.Pool(2) as pool:
        async_results = [pool.apply_async(f, (i,)) for i in range(20)]
        results_collection = []
        for async_res in async_results:
            try:
                this_res = async_res.get(timeout=5)
                results_collection.append(this_res)
            except Exception as e:
                print("Exception: {}".format(e))
                results_collection.append(None)
        print(results_collection)
        # Remove unsuccessful ones
        results_collection = [r for r in results_collection if r is not None]
        print(results_collection)
The first line of output has 20 elements: the return values of f(x) for x in range(20). Some are None (x = 2, 8, 10, 12, 14, 16, 18) because the call either raised an error (x = 2) or timed out.
And the final result is ready for the next step of the analysis: [0, 1, 3, 4, 5, 6, 7, 9, 11, 13, 15, 17, 19]
How it runs live:
The code in GitHub:
The code looks too simple. Why would I bother writing a blog post about it? Well, to me, yesterday it was not this simple. I tried a few other ways, and they didn’t work. In the worst case, my workstation even froze and I could not run any command, even in bash; the OS kept reporting: “-bash: fork: retry: Resource temporarily unavailable.” I googled a way out… “exec killall python3”
Machine Learning sounds like a cool thing. I have wanted to learn it for a while, but with little progress. Perhaps a lack of motivation is the problem.
Today I have a task at hand that may need to use it. I guess learning by doing could solve my lack of motivation issue.
Here are a few things I would like to check:
A paper about using supervised learning to identify paragraph titles:
This paper is really cool. It compares several classifiers on both speed and precision for this classification task, and finds that the decision tree is among the best in precision.
Another cool thing about this paper is that its raw input is PDF: it converts the PDF to formatted HTML and then uses the HTML formatting tags as input features for the classification!
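To make the idea concrete, here is a toy sketch in Python (the features, labels, and numbers are all invented by me, not taken from the paper): describe each line of the converted HTML by a few formatting features, such as font size and a bold flag, and let a decision tree decide whether the line is a title.

# Toy sketch: classify lines as "title" vs "body" from HTML formatting features.
# The feature set and training data are made up for illustration only.
from sklearn.tree import DecisionTreeClassifier

# Each row: [font_size, is_bold (0/1), text_length]
X_train = [
    [16, 1, 30],   # large, bold, short  -> likely a title
    [11, 0, 400],  # normal size, long   -> body text
    [14, 1, 25],
    [11, 0, 350],
    [12, 0, 500],
    [15, 1, 40],
]
y_train = ["title", "body", "title", "body", "body", "title"]

clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)

print(clf.predict([[15, 1, 20], [11, 0, 600]]))  # expect: ['title' 'body']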
Python is easy to code, but it may not be the most efficient when it runs on a CPU. Cython can drastically improve execution speed; in a project of mine, it made the code run at least four times as fast as before.
Python is slow because it runs on an interpreter: the Python interpreter translates your code into bytecode and executes it at run time, rather than compiling it into machine code ahead of time.
Cython changes the situation. It converts the Python code you write into C code, compiles it into an extension module, and then you can import that module back into Python just like a normal Python package.
In the example below, let me share how to use Cython. The example is based on the source code shared with a Journal of Accounting Research (JAR) paper by Brown, Crowley, and Elliott (2020). I slightly adapted their approach to fit my needs (or my laziness).
Brown, Nerissa C., Richard M. Crowley, and W. Brooke Elliott. “What are you saying? Using topic to detect financial misreporting.” Journal of Accounting Research 58, no. 1 (2020): 237-291.
Using Cython Step by Step
Suppose you have a function parse() to run, which is written in Parse.py.
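For concreteness, a made-up minimal Parse.py could look like the sketch below; only the existence of parse() matters for the steps that follow, the body is just a placeholder of mine (the real function in my project does regex-heavy work on SEC filings).

# Parse.py -- placeholder example for this walkthrough.
import re

def parse():
    text = "Item 7. Management's Discussion and Analysis"
    # Some regex work standing in for the real parsing logic:
    pattern = re.compile(r"Item\s+\d+[A-Z]?\.", re.IGNORECASE)
    matches = pattern.findall(text)
    print("Found {} item heading(s)".format(len(matches)))
    return matches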
Do the following:
Create a new file called Parse_Setup.py with the following content:
# cython: language_level=3
# The above comment is for Cython to read. DO NOT REMOVE.
from distutils.core import setup
from Cython.Build import cythonize
import os, shutil

# Provide the file name of your original Python code:
python_file = "Parse.py"

# Automatically copy it to make a .pyx file, and set up:
if os.path.isfile(python_file + "x"):
    os.remove(python_file + "x")
shutil.copy(python_file, python_file + "x")
setup(ext_modules=cythonize(python_file + "x"))
Run it in Terminal:
python Parse_Setup.py build_ext --inplace
Create a new file called Parse_Run.py with the following content:
from Parse import parse
parse()
Run it. It now flies like an aircraft.
python Parse_Run.py
Further thoughts…
Here (URL) someone compares the speed of pure Python with Numpy and with Cython. It seems that Numpy is about 31x as fast and native Cython about 24x as fast… Noting that Numpy also runs C in the backend, this is reasonable.
I tried a minimal example of using the Monte Carlo method to estimate π, which uses numpy.random, and the result is… disappointing. Cython only makes it slower:
                          real time    2-core user time
Python + Numpy            2m37s        5m10s
Python + Numpy + Cython   2m41s        5m13s
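For reference, the estimator was essentially of the following kind (a minimal sketch, not my exact script; the sample size and seed are arbitrary). Since almost all the time is spent inside Numpy’s compiled routines, there is little left for Cython to speed up, which probably explains the numbers above.

# Minimal Monte Carlo estimate of pi with numpy.random:
# draw points in the unit square and count those inside the quarter circle.
import numpy as np

def estimate_pi(n=10_000_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    inside = np.count_nonzero(x * x + y * y <= 1.0)
    return 4.0 * inside / n

if __name__ == "__main__":
    print(estimate_pi())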
I may need to build a minimal example to compare the performance with and without Cython for regular expressions, the only thing whose speed actually bothers me.
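Such a benchmark might look roughly like the sketch below (the pattern, the text, and the repetition count are placeholders of mine): time it once as plain Python, then cythonize the file with the setup steps above and time it again. Since the matching inside the re module already runs in C, I would not be surprised if the gain turns out small, as in the Numpy case.

# Regex_Bench.py -- tiny timing sketch for a regex-heavy workload.
import re
import time

PATTERN = re.compile(r"Item\s+\d+[A-Z]?\.")
TEXT = "Item 1. Business. Item 1A. Risk Factors. Item 7. MD&A. " * 200

def bench(repetitions=5_000):
    start = time.perf_counter()
    total = 0
    for _ in range(repetitions):
        total += len(PATTERN.findall(TEXT))
    return total, time.perf_counter() - start

if __name__ == "__main__":
    matches, elapsed = bench()
    print("{} matches in {:.2f}s".format(matches, elapsed))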
I’m a student from your class TAX 101, and I’m preparing for the second exam. I would like to ask a few questions about the case “Apple’s Tax Expenses”. From the income statement in Apple’s 2017 annual report, we see that Apple has income before provision for income taxes of 64,089 and a provision for income taxes of 15,738, which gives a tax rate of 24.6%. Is this the “Book ETR” you mentioned in the lecture? And from the notes we see that Apple defers 5,966 of the 15,738 provision for income taxes, leaving a current component of only 9,772. Is this the actual tax payable? And is the corresponding tax rate of 15.2% the so-called “Cash ETR”?
I was also trying to figure out what information we can extract from this reconciliation table. And I came across a paper, Hanlon (2003), which criticizes this kind of calculation heavily and implies that this is wrong and that is also wrong… She was our ZEW guest keynote speaker… I’m quite confused and worried about the exam. Could you please help me, Professor?
Hanlon, Michelle. “What can we infer about a firm’s taxable income from its financial statements?” National Tax Journal (2003): 831-863.
Apple, 2017
Income before provision for income taxes                    64,089
Computed expected tax                                        22,431      35%
State tax                                                       185       0%
Indefinitely invested earnings of foreign subsidiaries       -6,135     -10%
Domestic production activities deduction                       -209       0%
Research and Development credit                                -678      -1%
Other                                                           144       0%
Provision for income taxes                                   15,738    24.6%   (1)
  of which current                                            9,772    15.2%   (2)
  of which deferred                                           5,966     9.3%
Best wishes,
Reeyarn
Reeyarn Zhiyang Li
University of Mannheim | Business School Schloss | 68131 Mannheim | Germany | Phone +49 (0) 621 ▮▮▮ ▮▮▮▮ Email: ▮▮▮▮@uni-mannheim.de
Dear Reeyarn,
Look at the Statement of Cash Flows and you find a line which reads
Yes. This is a classic Hello world! page generated by WordPress. And I am a big fan of Hello world! Anyone who has taken a course in any programming language must have learned to write a Hello World! program.
Yes. I think of myself as an amateur programmer, and I am also trying to become a data scientist. The world may be heading toward a future where human beings are no longer needed and Artificial Intelligence could one day become the dominant intelligent species. We are in a serious competition, fighting for the survival of human beings. Learn maths, learn programming, learn machine learning, and learn to survive.