Knocking at the door of Machine Learning

Machine Learning sounds like a cool thing. I have wanted to learn it for a while, but have made little progress. Perhaps a lack of motivation is the problem.

Today I have a task at hand that may call for it. I guess learning by doing could fix my motivation problem.

Here are a few things I would like to check:

  • A paper about using supervised learning to identify paragraph titles:

This paper is so cool. It compares several classifiers on both running time and classification precision, and finds that Decision Tree is among the best in precision.
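To get a feel for that kind of comparison, here is a rough sketch of timing a few scikit-learn classifiers and measuring their precision. The data is synthetic (from make_classification), and the three classifiers are my own pick for illustration, not necessarily the ones the paper tests, so the numbers say nothing about the paper's results.

```python
import time

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption); the paper's real features
# come from the formatting of the converted documents.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Classifiers chosen for illustration only.
candidates = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "naive Bayes": GaussianNB(),
    "logistic regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in candidates.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    elapsed = time.perf_counter() - start  # training plus prediction time
    results[name] = (precision_score(y_test, predictions), elapsed)
    print(f"{name}: precision={results[name][0]:.3f}, time={elapsed:.4f}s")
```

Timing fit and predict together keeps the sketch short; a more careful comparison would time them separately and average over several runs.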

Another cool thing about this paper is that its raw input is PDF. It converts the PDF to formatted HTML, and then uses the HTML formatting tags as input for the classifier!
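Here is how I imagine turning HTML formatting into classifier input. This is only a sketch using Python's standard html.parser; the tag list, the FormatFeatureParser class, and the one-flag-per-tag encoding are my own assumptions, not the paper's actual feature set.

```python
from html.parser import HTMLParser

# Formatting tags to track (a hypothetical choice for this sketch).
FORMAT_TAGS = ("b", "i", "u", "h1", "h2", "h3")


class FormatFeatureParser(HTMLParser):
    """Collects each text chunk with 0/1 flags for the open formatting tags."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # formatting tags currently open
        self.rows = []       # (text, feature vector) pairs

    def handle_starttag(self, tag, attrs):
        if tag in FORMAT_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_tags:
            self.open_tags.remove(tag)

    def handle_data(self, data):
        text = data.strip()
        if text:
            features = [int(tag in self.open_tags) for tag in FORMAT_TAGS]
            self.rows.append((text, features))


parser = FormatFeatureParser()
parser.feed("<p><b>1. Introduction</b></p><p>Body text goes <i>here</i>.</p>")
parser.close()

for text, features in parser.rows:
    print(features, text)
```

Each feature vector could then be one row of X for a classifier, with a hand-assigned label (title or not) as the matching entry of Y.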

  • What is a Decision Tree?

A minimal example

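from sklearn import tree
clf = tree.DecisionTreeClassifier()

# Two training samples with two features each, and their class labels
X = [[0, 0], [1, 1]]
Y = [0, 1]

# Fit the tree on the training data
clf = clf.fit(X, Y)

# Predict the class of three unseen samples
clf.predict([[2., 2.], [-0.1, -0.1], [1, 2]])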

It runs. The result makes sense too: array([1, 0, 1])
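To see why the result makes sense, it helps to print the rules the tree learned. The sketch below uses tree.export_text for that; the feature names x0 and x1 are made up for readability, and random_state is set only to keep runs repeatable. With two training points, the tree needs a single split at 0.5 on one of the features, and all three test samples land on the same side whichever feature it picks.

```python
from sklearn import tree

X = [[0, 0], [1, 1]]
Y = [0, 1]

# random_state only fixes which of the two equally good features is chosen
clf = tree.DecisionTreeClassifier(random_state=0).fit(X, Y)

# Dump the learned rules as text; "x0" and "x1" are hypothetical names
rules = tree.export_text(clf, feature_names=["x0", "x1"])
print(rules)

predictions = clf.predict([[2., 2.], [-0.1, -0.1], [1, 2]]).tolist()
print(predictions)
```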