I finally found time to look through Infer.NET. For those not in the know and too lazy to click on links, Infer.NET is a new project coming out of Microsoft Research. According to them, it’s a framework for machine learning. They claim that it provides a lot of state-of-the-art stuff, including “message passing.” I’m not sure why “message passing” is an important feature in a package that claims to be for machine learning, but that’s what the website says, so…
Despite what the website says, I don’t really see this as being a framework for machine learning. I see it as an API for probability, which is indeed a very important part of machine learning, but it certainly is not a framework for ML in the same way that Weka is. Don’t download this package expecting to find a lot of variety in what you can do. All you really get here is a single type of classification technique: Bayesian models. I did not see anything that resembled clustering, and none of their examples dealt with regression/prediction of real-valued functions, so I’m guessing that’s not supported either.
Ignoring the fact that this is definitely not an ML framework, it is a neat idea. Bayesian methods are very powerful, and this could be a good framework for using them to solve large problems in .NET. That said, I see a lot of flaws with it. It’s a beta release, so I’m going to be gentle, but I want to get these out in the open on the off chance that someone at Microsoft is listening.
First, the API is really not very clean. Let’s look at an example:
Variable<double> x = Variable.GaussianFromMeanAndVariance(0, 1).Named("x");
Variable.ConstrainTrue(x > 0.5);
Why in the world should I have to assign a magic string name to my variable? I’m going to assume that this will be fixed before the next release, because that’s just ugly, and I can see no real reason for it.
Next, we have this beauty:
Variable<double> probIfTreated, probIfControl;
using (Variable.If(isEffective))
{
    // Model if treatment is effective
    probIfControl = Variable.Beta(1, 1);
    controlGroup[i] = Variable.Bernoulli(probIfControl).ForEach(i);
    probIfTreated = Variable.Beta(1, 1);
    treatedGroup[j] = Variable.Bernoulli(probIfTreated).ForEach(j);
}
Maybe I’m missing something, but that *looks* dangerously like there is something static that’s tracking that I have created an If-block, and the If-block is in effect until it is disposed. Intuitive? No. Thread-safe? Almost certainly not (not that you should be doing much threading with an API like this today, but in 5 years when we’re all running 72-core machines, maybe).
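To make that worry concrete, here is a minimal sketch of how a using-based block *could* be tracked behind the scenes. To be clear, this is my guess at the general pattern, not Infer.NET's actual implementation: AmbientScope, Popper, and Current are hypothetical names I made up for illustration. The stack of open blocks lives in a static field, and the only thing keeping two threads from trampling each other's blocks is the [ThreadStatic] attribute.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical illustration only: NOT how Infer.NET is actually implemented.
public static class AmbientScope
{
    // One stack per thread. Remove [ThreadStatic] and every thread shares a
    // single stack, so blocks opened concurrently would interleave.
    [ThreadStatic]
    private static Stack<string> _open;

    private static Stack<string> Open => _open ?? (_open = new Stack<string>());

    // Conditions in effect on the calling thread, outermost first.
    public static string Current
    {
        get
        {
            string[] items = Open.ToArray(); // Stack<T>.ToArray yields top first
            Array.Reverse(items);
            return string.Join(" && ", items);
        }
    }

    // Opens a block that stays in effect until the returned object is disposed.
    public static IDisposable If(string condition)
    {
        Open.Push(condition);
        return new Popper();
    }

    private sealed class Popper : IDisposable
    {
        private bool _disposed;

        // Must run on the thread that opened the block, since it pops that
        // thread's stack.
        public void Dispose()
        {
            if (_disposed) return;
            _disposed = true;
            Open.Pop();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        using (AmbientScope.If("isEffective"))
        {
            // Nothing is passed in explicitly, yet the block is "known" here.
            Console.WriteLine("[" + AmbientScope.Current + "]"); // prints "[isEffective]"

            // A second thread has its own (empty) stack.
            var worker = new Thread(() =>
                Console.WriteLine("[" + AmbientScope.Current + "]")); // prints "[]"
            worker.Start();
            worker.Join();
        }

        Console.WriteLine("[" + AmbientScope.Current + "]"); // prints "[]"
    }
}
```

Even in the thread-safe variant, whatever you create inside the block depends on state that appears nowhere in the call itself, which is exactly what makes the pattern hard to reason about.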
Overall, the API just doesn’t feel very clean or C#-like to me. I can’t help but think that the exact same logic could be expressed in a cleaner, simpler way. Someone who is not very comfortable with probabilities and statistics is not going to be able to use this API in its current form. If this is going to be portrayed as a toolkit for machine learning, I strongly recommend that the authors look at Weka and try to come up with a similarly intuitive and general API.
In its current state, this is not a toolkit for machine learning, but it is an interesting project. I’m very glad to see that the trend of cool things coming out of Microsoft Research is continuing. Just don’t download it looking for the C# version of Weka.