
Getting started with Conditional Random Fields

This is the first of a series of posts that I am going to write about Conditional Random Fields. I have recently been following the excellent Coursera Specialization on Probabilistic Graphical Models (the videos for each course are freely accessible), and I found the topic really interesting. However, I felt that the time dedicated to Conditional Random Fields (CRFs from now on) was decidedly short, considering that this model (and its evolutions) is used in several applications nowadays: apart from classic part-of-speech tagging, it is used in phone recognition and gesture recognition (note that the latter work uses Hidden Conditional Random Fields, a model which we will talk about in a future post).

Of course, this is not the only introduction to CRFs that you can find on the internet: Sutton and McCallum wrote an excellent article (here), and another introduction can be found on Edwin Chen's blog (here). However, Sutton and McCallum's article seemed to me too hard for a novice who has just started in the field, while Edwin Chen's post assumes some knowledge of graphical models and does not provide any code. Conversely, this gentle introduction will not assume any knowledge of probabilistic graphical models (though it does assume some basics of probability), and will provide some code for solving a simple problem that we all face: how can we tell if our cat is happy? By using Conditional Random Fields, of course!

I will use Python as the language of choice, with the nice pgmpy library. Installing pgmpy was completely painless (just follow the instructions!). This guide is not going to be a formal introduction (for that you have many other sources you can dig into), but an informal one, aimed more at conveying the insight behind the model than at being mathematically precise.

So, what are Conditional Random Fields? 

Let's start with some classification. Conditional Random Fields are a type of Markov Network. Markov Networks are models in which the connections between events are defined by a graphical structure, as shown in the next figure.

Each node represents a random variable, and the edges between nodes represent dependencies. With Markov Networks it is very convenient to describe these dependencies using factors ($\phi$). Factors are positive values describing the strength of the association between variables. I know what you are thinking: are these probabilities? Or conditional probabilities? Nope, nope, they are not*, and in fact they are not bounded between 0 and 1, but can take any positive value (in some cases they may be equivalent to conditional probabilities, but this is in general not true).
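To make this concrete, here is a tiny sketch, using the pgmpy DiscreteFactor class that we will also use later in this post, of a factor over two hypothetical binary variables A and B. Note how the values are arbitrary positive strengths rather than probabilities:

from pgmpy.factors.discrete import DiscreteFactor

# A factor over two (hypothetical) binary variables A and B.
# The values are arbitrary positive "strengths": they do not sum to 1,
# so this is not a probability distribution.
phi = DiscreteFactor(['A', 'B'], cardinality=[2, 2], values=[30, 5, 1, 10])
print(phi)  # prints the factor as a table over all assignments of A and B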

CRFs are discriminative models. These are models used when we have a set of unobserved random variables Y and a set of observations X, and we want to know the probability of Y given X. Quite often we want to ask a slightly different question: what is the assignment to the Y variables that most likely generated the observations X? In particular, CRFs are very useful when we want to do classification while taking into account neighbouring samples. For example, let's say that we want to classify the sections of a musical piece. You know that different sections are related, as there is a very strong chance that the exposition will come after the introduction. Well, CRFs are perfect for that, as you can incorporate this knowledge into them. Not only that! You can also incorporate arbitrary features describing information that you know exists in the data. For example, you know that after a certain chord progression, you are quite likely to go into the development section. Well, you can introduce that info into your model as well.
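In formulas (this is the standard way the distribution is usually written, not something specific to this post), a CRF defines the conditional probability of the states given the observations as a normalized product of factors:

$$p(Y \mid X) = \frac{1}{Z(X)} \prod_k \phi_k(Y_k, X), \qquad Z(X) = \sum_{Y} \prod_k \phi_k(Y_k, X)$$

where each $\phi_k$ is one of the factors described above (defined over a subset $Y_k$ of the state variables), and $Z(X)$, the partition function, is just the normalization constant that turns the product of strengths into a proper probability distribution.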

But for now, let's drop this music nonsense, and let's talk about something unquestionably more pressing:

Is your cat happy?

Everybody loves cats. They are cute, fluffy, and adorable. Not everybody knows how to tell if a cat is sad or happy. Well, you will soon be able to, with a little help from your friendly Conditional Random Fields!

Let's say that you have a very simple cat, Felix. He won't experience complex emotions such as anger, joy, or disgust. He will just be "happy" or "sad". You read in your favourite cat magazine that cats express contentedness with purrs. However, sometimes purrs may be an expression of tension. Felix, being a simple cat, doesn't know what those words mean, and he uses purrs to express either happiness or sadness.

We are going to observe Felix several times a day, and every time we are going to note whether he is purring or not. This will be our observation vector $X$. We know that our observations can tell us something about Felix's internal state at each time (let's call this set of states $Y$). So, if we observe that Felix is purring at this moment, we can more or less confidently say that Felix is happy. Pretty simple, right? Now is where it gets interesting.

The important concept behind Conditional Random Fields is that we can also specify dependencies between unobserved states. For example, we may know that our Felix is a pretty relaxed cat: his emotional states are quite stable, and if at some point in time he was happy (or sad), he will most likely be happy (or sad) at the next point in time. On the other hand, we may know that our Felix is a histrionic diva, changing his mind in the blink of an eye. We can model this relationship as well.

With this information, we can build our graph:

[Figure: the CRF graph for Felix: observations $o_1, o_2, o_3$, each connected to the corresponding internal state $f_1, f_2, f_3$, with consecutive internal states connected in a chain]

As explained, the $o_i$ are the observations, and each $o_i$ is connected with Felix's internal state $f_i$ at time $i$. The internal states are also connected in pairs. In this and the following examples we are going to observe Felix 3 times ($i = 1, 2, 3$). Notice that each state is only affected by the previous one (our Felix doesn't seem to care about his internal state a long time ago; he is only affected by what he was feeling recently). This simple network is called a linear-chain CRF.

Now we also have to come up with some idea of the relationship between Felix's purring and his level of happiness. Let's say that we come up with this table:

             happy    sad
purr           80      20
no purr        15      25

The values are on an arbitrary scale, and they indicate the "strength" of the relationship. Note that these are not probabilities. To emphasise that, I didn't scale them between 0 and 1 (but you are free to do so). From the table we can see that Felix expresses his emotions more strongly when he is purring: if he is purring, we can say quite surely that he is happy. If he is not purring, we can say that he is sad, but the strength of this belief is not as strong: we are convinced '25' [whatever scale that is] that he is sad, and '15' that he is happy. We will call this factor $\phi_{obs}$ (factorObs in the code below).
Now we have to specify the transition factor: if Felix is happy now, how strongly do we believe he is going to be happy later on? For the time being, we want to ignore this dependency and just get everything working, so we will build this uninformative transition table:

              happy (t+1)   sad (t+1)
happy (t)         10            10
sad (t)           10            10

(Notice that the value can be anything, as long as it's the same for each entry in the table.) The row variable indicates the emotional state at time $t$; the column variable the emotional state at time $t+1$. We will call this factor $\phi_{trans}$ (factorH in the code below). This is what the network looks like with the factors:

[Figure: the linear-chain network with a $\phi_{obs}$ factor attached to each observation-state pair and a $\phi_{trans}$ factor attached to each pair of consecutive states]

Let's see how we can implement this in Python. We are going to take 3 measurements for now, so our network will only have 3 unobserved states and 3 observed states:

[to run this code you need Python 2.7, pgmpy, and numpy]

from pgmpy.models import MarkovModel
from pgmpy.factors.discrete import DiscreteFactor
import numpy as np 
from pgmpy.inference import BeliefPropagation

 
 
def toVal(string):
    # map an observation label to the corresponding factor index
    if string == 'purr':
        return 0
    else:
        return 1

catState = ['happy', 'sad']

MM = MarkovModel()
# non-existing nodes will be created automatically
MM.add_edges_from([('f1', 'f2'), ('f2', 'f3'),
                   ('o1', 'f1'), ('o2', 'f2'), ('o3', 'f3')])

# NO DEPENDENCY
transition = np.array([10, 10, 10, 10])
# RELAXED CAT
#transition = np.array([90, 10, 10, 90])
# DIVA CAT
#transition = np.array([10, 90, 90, 10])

purr_happy = 80
purr_sad = 20
noPurr_happy = 15
noPurr_sad = 25

# observation factors, one per (observation, state) pair
factorObs1 = DiscreteFactor(['o1', 'f1'], cardinality=[2, 2],
                            values=np.array([purr_happy, purr_sad,
                                             noPurr_happy, noPurr_sad]))
factorObs2 = DiscreteFactor(['o2', 'f2'], cardinality=[2, 2],
                            values=np.array([purr_happy, purr_sad,
                                             noPurr_happy, noPurr_sad]))
factorObs3 = DiscreteFactor(['o3', 'f3'], cardinality=[2, 2],
                            values=np.array([purr_happy, purr_sad,
                                             noPurr_happy, noPurr_sad]))

# transition factors, one per pair of consecutive states
factorH1 = DiscreteFactor(['f1', 'f2'], cardinality=[2, 2], values=transition)
factorH2 = DiscreteFactor(['f2', 'f3'], cardinality=[2, 2], values=transition)

MM.add_factors(factorH1)
MM.add_factors(factorH2)
MM.add_factors(factorObs1)
MM.add_factors(factorObs2)
MM.add_factors(factorObs3)
 

Note the peculiar way of inserting the entries for each factor: we take the factor table starting from the first row, and proceed from left to right before moving to the next row. We can get a friendlier representation by printing factorObs1 (or by inspecting its factorObs1.values attribute, a 2x2 numpy array). If you are wondering what cardinality means, it's just the number of values that each variable in the factor can take. In our case, we are working with binary variables, so the cardinality will always be 2.
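For example (a quick check you can run after the snippet above; the exact printed formatting depends on your numpy and pgmpy versions):

print(factorObs1.values)
# should print something like:
# [[ 80.  20.]
#  [ 15.  25.]]
# rows are indexed by o1 (0 = purr, 1 = no purr),
# columns by f1 (0 = happy, 1 = sad)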

Now we are going to do some magic. By using an inference algorithm called Belief Propagation we could estimate the probability of each internal state. However, that is not exactly what we are interested in: we want to know the most likely internal states given our purring observations. This procedure is called MAP (maximum a posteriori) inference, and there are different ways of implementing it. We are going to use the MAP query included in the Belief Propagation implementation. Notice how in the code we specify the values of our observed variables (evidence=...). The function toVal maps "purr" to the entry 0 and "no_purr" to the entry 1.
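In symbols (again, standard notation rather than anything specific to this post), the MAP query looks for

$$y^{*} = \arg\max_{y} \, p(y \mid x) = \arg\max_{y} \prod_k \phi_k(y_k, x),$$

where the partition function $Z(x)$ can be dropped because it does not depend on $y$.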

belief_propagation = BeliefPropagation(MM)

# SEQUENCE: purr, no_purr, purr
ymax = belief_propagation.map_query(variables=['f1', 'f2', 'f3'],
    evidence={'o1': toVal('purr'), 'o2': toVal('no_purr'), 'o3': toVal('purr')})
# SEQUENCE: purr, no_purr, no_purr
#ymax = belief_propagation.map_query(variables=['f1', 'f2', 'f3'],
#    evidence={'o1': toVal('purr'), 'o2': toVal('no_purr'), 'o3': toVal('no_purr')})
# SEQUENCE: purr, purr, purr
#ymax = belief_propagation.map_query(variables=['f1', 'f2', 'f3'],
#    evidence={'o1': toVal('purr'), 'o2': toVal('purr'), 'o3': toVal('purr')})

print('f1: ' + str(ymax['f1']) + ' = ' + catState[ymax['f1']])
print('f2: ' + str(ymax['f2']) + ' = ' + catState[ymax['f2']])
print('f3: ' + str(ymax['f3']) + ' = ' + catState[ymax['f3']])

 

In our case, we observed the sequence purr, no_purr, purr. As we have not specified any dependency between states, the network is going to tell us that our cat was happy, sad, and happy again. Is this really the case, though? Let's take a more realistic approach to this fluffy problem.
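If you want to convince yourself of what map_query is doing, here is a small sanity check (not part of the model code above; it assumes you have already run the two snippets, so np, toVal, transition, the purr_* values, and catState are all defined): with only 3 binary states we can enumerate all 8 assignments and score each one by multiplying the factor values.

import itertools

# score every assignment (f1, f2, f3) by the product of the factor values
obs = [toVal('purr'), toVal('no_purr'), toVal('purr')]
obs_table = np.array([[purr_happy, purr_sad],
                      [noPurr_happy, noPurr_sad]])   # rows: o, columns: f
trans_table = transition.reshape(2, 2)               # rows: f(t), columns: f(t+1)

best_score, best_states = -1, None
for states in itertools.product([0, 1], repeat=3):
    score = 1.0
    for o, f in zip(obs, states):
        score *= obs_table[o, f]                     # observation factors
    for f_prev, f_next in zip(states, states[1:]):
        score *= trans_table[f_prev, f_next]         # transition factors
    if score > best_score:
        best_score, best_states = score, states

print([catState[f] for f in best_states])            # the MAP assignment

With the uninformative transition table this prints ['happy', 'sad', 'happy'], matching map_query; you can change obs and transition to check the scenarios below as well.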

Felix as a cool, relaxed cat

Let's consider the first case: Felix is a cool dude, and when he is in a state, he is probably going to stay there for quite a long time. We are going to use this transition factor table:

              happy (t+1)   sad (t+1)
happy (t)         90            10
sad (t)           10            90

This clearly tells us that there is a strong connection between being happy (or sad) at time $t$ and being happy (or sad) at time $t+1$. To generate this network, uncomment the RELAXED CAT transition line in the first snippet (and comment out the NO DEPENDENCY one), then run the script again. What is going to happen now? All three states are now classified as "happy"! The lack of purring in the middle state, being less strongly connected with his level of happiness, has been overridden by the neighbouring states, which more surely indicate that Felix was happy.

Try instead to set the sequence purr, no_purr, no_purr (uncomment the second map_query call in the second snippet, and comment out the first). What's happening now? Think about what the network is computing, and check whether it makes sense.

Felix as a diva 

Let's consider the situation where Felix is a diva, and he is more likely to change his mood between observations. We can represent this by using the following transition table:

              happy (t+1)   sad (t+1)
happy (t)         10            90
sad (t)           90            10

This favours transitions from sad to happy and from happy to sad. Now let's say that we observe our Felix purring all the time. Being the crazy diva that he is, can we infer that he is indeed happy all the time? No way! In fact, if you uncomment the DIVA CAT transition line in the first snippet and the third map_query call in the second snippet, you will see that even though the observed sequence is purr, purr, purr, the most likely state sequence appears to be happy, sad, happy. Oh Felix, you are driving me crazy!

This simple example shows only the basics of the potential of CRFs, and it is only the beginning. We may want to automatically learn the transition table from the observations. Or we may want to include more complicated observations, or generate a non-linear graph. We can do all of this and much more with CRFs.

In the next post I hope to use more complicated models to show other cool features of this approach. Hope you enjoyed it!

Head Tilting Detection - Automatic Page Turner for Pianists

Head Tilting Detection (or Automatic Page Turner for Pianists, still undecided about the name 😛 ) is a simple piece of software that emits a Page Down/Up keypress when the user tilts their head.

Since I started learning the piano, more than 10 years ago, I have had the problem of turning pages. Turning pages is one of the most annoying things for a pianist: it forces you to waste seconds, it interrupts the flow of the music, and it affects the way we learn pieces (e.g. by making the connection between pages really poor, since we usually pause when moving from one page to the next). Several alternatives exist for turning pages automatically, but they are clumsy and inefficient. Recently I thought about applying my programming knowledge to this purpose.

As more and more pianists are switching from paper to digital scores, it is possible to use a machine learning approach. I designed a simple piece of software that detects when you tilt your head right/left and "scrolls" the page accordingly by simulating a page down/up keypress, which in most software will scroll the window down/up. In Adobe Reader, if you select "Fit one full page to window", you will turn a whole page (to the next/previous one).
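The released software does this through platform APIs (xdotool on Linux, see below), but just as an illustration of the idea, here is a minimal sketch in Python using the third-party pyautogui library (a hypothetical substitute, not what the actual software uses):

import pyautogui

# simulate the keypress the head tilt is mapped to; the focused
# application (e.g. a PDF reader) receives it as a normal key event
pyautogui.press('pagedown')   # tilt right -> next page
pyautogui.press('pageup')     # tilt left  -> previous page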

Update 28/03/2016: New version for Linux released. You can now rotate/flip your webcam view. The sound now works on Windows machines.

WINDOWS 32/64bit (tested on Windows 7, Windows 8, Windows 8.1)

Download Head Tilting Detection (no setup required) - for Windows

LINUX (tested on Ubuntu 15.10)

Download Head Tilting Detection (no setup required) - for Linux

Open the folder in a terminal and type sudo ./HeadTiltingDetection.sh. This bash script will download and install xdotool (if it is not installed already), and then the software will be executed.

Instructions

Open the software, wait for your face to be detected (a green rectangle around your face should appear), then wait for your eyes to be detected (red squares).
At this point, select the application you want to send the keypresses to (for example, a PDF reader).
When you tilt your head to one side, the corresponding arrow key signal will be emitted. A green circle should appear on the corresponding side of the camera view, and you should hear a specific sound for the direction you are tilting your head (you can disable the sound).

Adjust the threshold with the slider, disable the sound, or pause the detection from the user interface.

Key points

  • The software is designed for pianists, so it assumes a more or less constant level of light, takes slow/random movements into account, and considers that pages are usually not turned very often. Plus, instead of just "turning" your page, it will scroll down the page, which in most cases is what you want.
  • For now, the software does not work if you wear glasses. I may work on this in the future.
  • The software uses Haar feature-based cascade classifiers, with template matching for tracking the eyes, and several heuristics to increase accuracy. A cubic function of the two eyes' angle is used to decide when to emit the keypress. The threshold for the keypress can be adjusted. (A minimal sketch of the detection step is shown after this list.)
  • The software has been tested on Windows 7, 8, and 8.1. For now there is only a Windows version, but I may develop it for other platforms in the future. (28/03/16) Linux: Ubuntu version released
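Below is a minimal sketch of the face-detection step mentioned in the list above, using the Haar cascade bundled with OpenCV (assumptions: the opencv-python package and a working webcam; the eye tracking, template matching, and tilt heuristics of the actual software are not shown):

import cv2

# load the frontal-face Haar cascade shipped with OpenCV
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

cap = cv2.VideoCapture(0)                  # open the default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3,
                                          minNeighbors=5)
    for (x, y, w, h) in faces:             # draw a green rectangle per face
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('detection', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()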

Future work

The software is not perfect. Unfortunately, I do not have many hours to dedicate to it. I plan to keep working on it, but not consistently. I'll probably add any minor updates to this post, and create new posts only if there is a major improvement. If you want to collaborate with me on this project, feel free to contact me!

The software uses OpenCV (for the computer vision part) and Qt. It was very nice working with Qt again after so long, and discovering OpenCV was also very interesting. I have a very good opinion of both in terms of usability and capability. Managing to merge the two was an excellent experience for me.