Putting procrastination to good use…

Trying to learn some non-native languages. (Yes, none of these three is my mother tongue.)

  • Die Bilanz bildet den Vermögens- sowie Schuldenstatus eines Unternehmens zu einem bestimmten Stichtag ab.
  • The balance sheet shows the asset and liability status of a company on a specific reporting date.
  • 资产负债表显示公司在特定报告日期的资产和负债状况。

Knocking at the door of Machine Learning

Machine Learning sounds like a cool thing. I have wanted to learn it for a while, but have made little progress. Perhaps lack of motivation is the problem.

Today I have a task at hand that may need it. I guess learning by doing could fix my motivation problem.

Here are a few things I would like to check:

  • A paper about using supervised learning to detect headings (paragraph titles):

https://www.groundai.com/project/a-supervised-learning-approach-for-heading-detection/1

This paper is so cool. It compares several classifiers on their running time and their precision on this classification task, and finds that the Decision Tree is among the most precise.

Another cool thing about this paper: its raw input is PDF, which it converts to HTML that keeps the formatting, and it then uses the HTML formatting tags as input for the classification!
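To get a feel for that idea (formatting tags from HTML as classifier input, then comparing classifiers on time and precision), here is a toy-scale sketch. It is my own minimal sketch, not the paper's code: the tag list, the word-count and trailing-period cues, and the six example lines are all made up for illustration, and GaussianNB is just one arbitrary second classifier to compare against. It assumes Python's built-in html.parser and scikit-learn.

```python
import time
from html.parser import HTMLParser

from sklearn.metrics import precision_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Hypothetical formatting tags used as signals (not the paper's feature set).
FORMAT_TAGS = ("b", "strong", "i", "em", "h1", "h2", "h3")


class FormatFeatures(HTMLParser):
    """Record which formatting tags appear in one HTML line, plus its text."""

    def __init__(self):
        super().__init__()
        self.seen = set()
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag in FORMAT_TAGS:
            self.seen.add(tag)

    def handle_data(self, data):
        self.text.append(data)


def line_features(html_line):
    """Turn one HTML line into a numeric feature vector."""
    parser = FormatFeatures()
    parser.feed(html_line)
    parser.close()
    text = "".join(parser.text).strip()
    flags = [1 if tag in parser.seen else 0 for tag in FORMAT_TAGS]
    # Two more hypothetical cues: word count, and whether the text ends in a period.
    return flags + [len(text.split()), 1 if text.endswith(".") else 0]


# Tiny made-up training set: 1 = heading, 0 = body text.
lines = [
    ("<p><b>Introduction</b></p>", 1),
    ("<h2>Related Work</h2>", 1),
    ("<p><strong>3 Method</strong></p>", 1),
    ("<p>We describe the data set used in this study.</p>", 0),
    ("<p>Results are shown in the table <i>below</i>.</p>", 0),
    ("<p>The classifier was trained on all pages.</p>", 0),
]
X = [line_features(html) for html, _ in lines]
y = [label for _, label in lines]

results = {}
for clf in (DecisionTreeClassifier(random_state=0), GaussianNB()):
    start = time.perf_counter()
    clf.fit(X, y)
    pred = clf.predict(X)
    elapsed = time.perf_counter() - start
    # Precision on the training lines themselves: a sanity check, not an evaluation.
    results[type(clf).__name__] = precision_score(y, pred, zero_division=0)
    print(type(clf).__name__, results[type(clf).__name__], f"{elapsed:.4f}s")
```

With real data the precision would have to be measured on held-out lines, not on the lines the classifier was trained on.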

  • What is a Decision Tree?

https://www.datacamp.com/community/tutorials/decision-tree-classification-python

https://scikit-learn.org/stable/modules/tree.html
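The core idea of a decision tree can be sketched by hand: at each node, pick the feature and threshold whose split leaves the two sides as "pure" as possible, then repeat on each side. Below is a rough sketch of a single split search using Gini impurity, which is scikit-learn's default criterion. The helpers gini and best_split are my own names, and this is a simplification, not what the library actually runs: it finds only one split, with no recursion or stopping rules.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(X, y):
    """Try every midpoint threshold on every feature and return the split
    (feature, threshold, weighted_gini) with the lowest weighted impurity."""
    best = None
    n = len(y)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[f] <= t]
            right = [label for row, label in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[2]:
                best = (f, t, score)
    return best


X = [[0, 0], [1, 1]]
y = [0, 1]
print(gini(y))           # 0.5: two classes, evenly mixed
print(best_split(X, y))  # (0, 0.5, 0.0): one cut separates both points
```

A full tree would call best_split again on the left and right groups until each group is pure or some depth limit is hit.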

A minimal example

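from sklearn import tree

# A decision tree classifier with default settings
clf = tree.DecisionTreeClassifier()

# Two training samples with two features each, and their class labels
X = [[0, 0], [1, 1]]
Y = [0, 1]

clf = clf.fit(X, Y)

# Predict the class of three new points
clf.predict([[2., 2.], [-0.1, -0.1], [1, 2]])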

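It runs, and the result makes sense too: array([1, 0, 1]). The points [2, 2] and [1, 2] land on the [1, 1] side of the learned split, and [-0.1, -0.1] lands on the [0, 0] side.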
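To peek inside the tree behind that answer, scikit-learn can print the learned rules with tree.export_text and give class probabilities with predict_proba. A small sketch refitting the same toy data; I fixed random_state=0 only to make runs repeatable, and the feature names x0 and x1 are made up.

```python
from sklearn import tree

clf = tree.DecisionTreeClassifier(random_state=0)
clf.fit([[0, 0], [1, 1]], [0, 1])

# The learned rules as text; which feature appears can depend on random_state,
# because both features separate the two training points equally well.
rules = tree.export_text(clf, feature_names=["x0", "x1"])
print(rules)

# Class probabilities at the leaf each new point falls into.
proba = clf.predict_proba([[2., 2.], [-0.1, -0.1], [1, 2]])
print(proba)

print(clf.get_depth(), clf.get_n_leaves())  # a single split: depth 1, two leaves
```

Whichever feature the tree picks, the threshold sits halfway between the two training points (0.5), which is why all three predictions come out as expected.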