Try-Catch-FAIL

Failure is inevitable.

Bridging the Java-.NET Gap: foreach-ing an Enumeration

November 3, 2008 10:08 by Matt

My fun with IKVM.NET continues this week as I utilize Weka from .NET (just a fun note, but the .NET-compiled version is insanely faster than the Java version doing the exact same thing; suck on that, Java fans!).  For the most part, everything has been swell.  The hardest part is trying to decipher research methodologies to replicate the systems they describe.  Still, I've run into a few snags when working with the .NET versions of Weka, the first of which is that you can't foreach over a Java Enumeration.  For example, consider the following:

    // THE FOLLOWING DOESN'T COMPILE!
    // foreach (weka.core.Instance instance in instances.enumerateInstances())
    // {
    //     // TODO: Operate on instance here.
    // }

    Enumeration enumerator = instances.enumerateInstances();
    while (enumerator.hasMoreElements())
    {
        weka.core.Instance instance = (weka.core.Instance)enumerator.nextElement();

        // TODO: Operate on instance here.
    }

While it isn't a huge difference, being able to use foreach would obviously be simpler than having to create an instance of Enumeration, then use it to step through the items.  Fortunately, it's quite easy to generically bridge the gap so that any Enumeration can be enumerated by simply calling an extension method, like so:

    foreach (weka.core.Instance instance in instances.enumerateInstances().ToEnumerable())
    {
        // TODO: Operate on instance here.
    }

How does this work?  First, we need to apply the adapter pattern to convert a Java Enumeration into a .NET IEnumerator.  Here's our adapter:

    using System;
    using System.Collections;
    using java.util;

    /// <summary>
    /// Provides an adapter that exposes a Java
    /// <see cref="Enumeration"/> through the .NET
    /// <see cref="IEnumerator"/> interface.
    /// </summary>
    public class EnumeratorEnumerationAdapter : IEnumerator
    {
        #region Private Fields

        /// <summary>
        /// The Java enumeration being adapted.
        /// </summary>
        private readonly Enumeration mEnumeration;

        /// <summary>
        /// The element most recently returned by <see cref="MoveNext"/>.
        /// </summary>
        private object mCurrent;

        #endregion

        #region Implementation of IEnumerator

        /// <summary>
        /// Advances the enumerator to the next element of the enumeration.
        /// </summary>
        /// <returns>
        /// true if the enumerator advanced to the next element;
        /// false if the enumeration has no more elements.
        /// </returns>
        public bool MoveNext()
        {
            if (!mEnumeration.hasMoreElements())
            {
                return false;
            }

            mCurrent = mEnumeration.nextElement();
            return true;
        }

        /// <summary>
        /// Not supported: a Java Enumeration cannot be rewound.
        /// </summary>
        /// <exception cref="NotSupportedException">Always thrown.</exception>
        public void Reset()
        {
            throw new NotSupportedException();
        }

        /// <summary>
        /// Gets the element most recently returned by
        /// <see cref="MoveNext"/>, or null if it has not been called yet.
        /// </summary>
        public object Current
        {
            get
            {
                return mCurrent;
            }
        }

        #endregion

        #region Public Constructors

        /// <summary>
        /// Creates an adapter for the specified enumeration.
        /// </summary>
        /// <param name="enumeration">The Java enumeration to wrap.</param>
        public EnumeratorEnumerationAdapter(Enumeration enumeration)
        {
            mEnumeration = enumeration;
            mCurrent = null;
        }

        #endregion
    }

Next, we just need to write the extension method that utilizes our adapter to create an IEnumerable:

    using System.Collections;
    using java.util;

    /// <summary>
    /// Contains extension methods to simplify working with
    /// <see cref="Enumeration"/> objects.
    /// </summary>
    public static class EnumerationExtensions
    {
        /// <summary>
        /// Creates an <see cref="IEnumerable"/> wrapper
        /// around an <see cref="Enumeration"/>.
        /// </summary>
        /// <param name="enumeration">The Java enumeration to wrap.</param>
        /// <returns>
        /// A lazy sequence over the enumeration's elements. The underlying
        /// enumeration is consumed as you iterate, so it can only be walked once.
        /// </returns>
        public static IEnumerable ToEnumerable(this Enumeration enumeration)
        {
            EnumeratorEnumerationAdapter adapter = new EnumeratorEnumerationAdapter(enumeration);

            while (adapter.MoveNext())
            {
                yield return adapter.Current;
            }
        }
    }

And like magic, you can now foreach over any Java Enumeration as if it were a .NET IEnumerable implementor.  It'd be nice if this were baked into IKVM.NET, but for now, this simple "hack" will do the trick.
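If you also want LINQ-friendly, strongly typed results, a generic variant is one possible extension of the same idea.  What follows is only a sketch, not something from IKVM.NET: IJavaEnumeration and ArrayEnumeration are hypothetical stand-ins I made up so the snippet compiles without IKVM.  In a real project you would write the extension method against java.util.Enumeration instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for java.util.Enumeration so this sketch compiles
// without IKVM. In a real project, use the real java.util.Enumeration type.
public interface IJavaEnumeration
{
    bool hasMoreElements();
    object nextElement();
}

// Hypothetical array-backed enumeration, here only to have something to iterate.
public sealed class ArrayEnumeration : IJavaEnumeration
{
    private readonly object[] mItems;
    private int mNext;

    public ArrayEnumeration(params object[] items) { mItems = items; }

    public bool hasMoreElements() { return mNext < mItems.Length; }

    public object nextElement() { return mItems[mNext++]; }
}

public static class TypedEnumerationExtensions
{
    // Validates eagerly, then hands off to the lazy iterator below so that a
    // null argument fails at the call site rather than on first iteration.
    public static IEnumerable<T> ToEnumerable<T>(this IJavaEnumeration enumeration)
    {
        if (enumeration == null) throw new ArgumentNullException(nameof(enumeration));
        return Iterate<T>(enumeration);
    }

    // Lazily yields each element cast to T. The underlying enumeration is
    // consumed as you iterate, so the result can only be walked once.
    private static IEnumerable<T> Iterate<T>(IJavaEnumeration enumeration)
    {
        while (enumeration.hasMoreElements())
        {
            yield return (T)enumeration.nextElement();
        }
    }
}
```

With that in place, something like new ArrayEnumeration("a", "b").ToEnumerable<string>() gives you an IEnumerable<string> that you can foreach over or hand to LINQ.  An element of the wrong type still fails with an InvalidCastException, just as the explicit cast in the while loop would.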


Machine Learning: Why you should care

September 16, 2008 10:11 by Matt

In the last post, I introduced the topic of machine learning.  In this post, I'll describe an example problem, discuss how you might go about writing code to address the problem, then discuss how you can apply machine learning to the same problem and get the computer to do the heavy lifting for you.  In the end, hopefully you'll at least have a vague idea of how machine learning might be useful.

The problem

You work for a company that makes data entry software for hospital emergency rooms.  One day, your boss comes in and says that you're going to add a new feature to the software: based on patient vitals and stats, it will prioritize patients automatically into one of two categories: critical or not critical.  What do you do?  On his way out the door, your boss leaves you with a database containing about 100,000 records from the local ER.  Each record contains stats on some patient and whether or not the supervising doctor decided that the patient was critically injured.  That's all you have to work with.  Stop and think about how you would try to solve this problem. 

Approach 1: Write code

You decide that you'll write a series of if/else-if statements to solve this problem.  Surely that will work, right?  Well, it turns out that there are about 40 stats (we'll call these attributes) for each patient (and we'll call patients instances), so which attributes do you use in your if/else-if statements?  This is going to get pretty hairy in a hurry.  After checking a few instances, you end up with something that looks like:

    if (patient.BP < minBP && patient.Pulse < minPulse)
    {
        return true;
    }
    if (patient.O2 < minO2 || patient.Pulse > maxPulse)
    {
        return true;
    }
    if (patient.Pulse == 0)
    {
        return true;
    }

So far so good!  Except now you look at the next instances, and suddenly you have a patient that meets the requirements for the first if statement, but they're labeled as not critically injured.  So, you carefully examine the instance, and try to tack another check on an attribute into the if statement, but that causes the statement not to match an instance that it should be matching.  So, now you have to add yet another if statement, and it has to go before the original if statement.

Now, multiply that scenario by about 100,000, and you're finished.  Wasn't that fun and easy?

Approach 2: Encode human knowledge

So approach 1 didn't work out so well.  Instead, you try asking the doctors what criteria they use to decide whether or not someone is critically injured.  You get a short explanation and encode it as a couple of rules, easy enough.  Except that when you test it on your 100,000 records, it gets the vast majority of them wrong.  When you bring a few examples to one of the doctors, he says "Oh, yeah, the patient isn't critically injured *UNLESS* this attribute has this value, too, then they're critically injured."  You take the knowledge back, rework your rules to encode this new knowledge, and find that you're still not doing a very good job at re-classifying your 100,000 instances.  After multiple iterations with the doctor, you have a set of rules that seem to work.  You roll it out into production, and immediately the phone starts ringing off the hook.  "Your software says this guy is critically injured, but he's fine!"  Even though your rules work very, very well on your 100,000 records, they seem to do very poorly in the real world.

Approach 3: The Machine Learning way

You throw away all your rules, and instead decide to try out these fancy machine learning tools.  You don't really know much about the tool, other than that you give it data, tell it what attribute to predict, and it builds a model that you can apply to new instances.  For this problem, you feed in your 100,000 records and tell it to build a model that can predict the critical/not critical status.  After a few short seconds, it spits out a model that works quite well on the 100,000 instances you trained it with (note that you actually don't want it to be 100% accurate most of the time, more on that in a future post).  You hook the model into your code; all you have to do is pass it an instance, and it passes back a true/false, much cleaner than 100 if/else-if statements strewn about.  You then roll the code out to the world and wait... the phone rings, and people do complain that it isn't 100% accurate, but the calls are infrequent.  Most of the time when it is making mistakes, it is doing so on patients that are borderline anyway.  All in all, not bad considering you had to write almost no code.
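To make the "give it data, tell it what to predict, get a model back" workflow concrete, here is a toy learner.  This is a sketch under loud assumptions: it is a one-rule "decision stump", not the tool from the story, and the DecisionStump class and its members are names I invented for illustration.  A real toolkit would build something far richer, but the shape of the API is about the same.

```csharp
using System;
using System.Collections.Generic;

// Toy model: a single learned rule of the form
// "critical when attribute[i] is below (or not below) some threshold".
public sealed class DecisionStump
{
    public int Attribute { get; private set; }
    public double Threshold { get; private set; }
    public bool CriticalWhenBelow { get; private set; }

    // Applies the learned rule to one instance.
    public bool Predict(double[] instance)
    {
        bool below = instance[Attribute] < Threshold;
        return below == CriticalWhenBelow;
    }

    // Tries every attribute, every observed value as a threshold, and both
    // directions, and keeps the rule that labels the most training instances
    // correctly. Brute force (quadratic in the number of instances), which is
    // fine for a demo but not for 100,000 records.
    public static DecisionStump Train(IList<double[]> instances, IList<bool> labels)
    {
        if (instances.Count == 0 || instances.Count != labels.Count)
        {
            throw new ArgumentException("Need at least one instance and one label per instance.");
        }

        DecisionStump best = null;
        int bestCorrect = -1;
        int attributeCount = instances[0].Length;

        for (int a = 0; a < attributeCount; a++)
        {
            foreach (double[] candidate in instances)
            {
                foreach (bool whenBelow in new[] { true, false })
                {
                    var stump = new DecisionStump
                    {
                        Attribute = a,
                        Threshold = candidate[a],
                        CriticalWhenBelow = whenBelow
                    };

                    int correct = 0;
                    for (int i = 0; i < instances.Count; i++)
                    {
                        if (stump.Predict(instances[i]) == labels[i]) correct++;
                    }

                    if (correct > bestCorrect)
                    {
                        bestCorrect = correct;
                        best = stump;
                    }
                }
            }
        }

        return best;
    }
}
```

You would call DecisionStump.Train once with the labeled records, then call Predict on each new patient.  Nobody typed in which attribute matters or where the cutoff is.  Both came out of the data, which is the whole point of approach 3.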

So what happened?

The machine learning approach worked because there were patterns in the data that the computer was able to learn to recognize.  The patterns were too subtle for you to pick up on given the sheer size of the data set and the number of attributes on each case.  Our brains have a hard time working across many dimensions at once. The machine learner is really good at that sort of thing though.  It's able to consider all 100,000 records and all 40 attributes quite easily.  It was able to identify patterns in the training data that you fed it, and it generalized those patterns so that they would be useful in classifying new instances.  That's the magic of machine learning: being able to generalize from observed instances to things that haven't been seen before.

Next time, I'll give some examples of how machine learning is used by tools that you're probably already using.  I may even get into specific types of machine learning techniques and what they can be used for.


What is Machine Learning?

September 8, 2008 09:21 by Matt

I'm going to be doing some posts on machine learning, artificial intelligence, and data mining over the coming weeks and months as I try to crank out a thesis.  Since machine learning isn't a topic that a lot of developers are familiar with, I decided it would be best to write up a brief summary of what machine learning encompasses and why you should care.  If you have any questions after reading this, please let me know in the comments.  I need to be very solid at explaining machine learning to an uninformed reader if I'm going to crank out a decent thesis...

Machine learning, at its most basic level, is a field that is concerned with techniques that allow computers to "learn".  It is often confused with artificial intelligence, data mining, and other related fields.  That's because many applications of machine learning (or AI, or data mining) tend to straddle the fence between fields as opposed to being purely in one domain or another.  So, machine learning covers any computer system that has the capability to learn.  The formal definition of "learn" is typically something like "a system can learn if it can improve its performance at some task based on experience at performing that task".  Basically, if it can get better at something automagically, it is learning.

Machine learning is a *huge* field with many diverse applications.  My area of research is focused mainly on two applications of machine learning: classification (a form of supervised learning, which we'll talk about in the future) and clustering (a form of unsupervised learning).  There are many other applications that will probably be discussed in future posts (if I ever get the itch). 

Classification deals with trying to classify some individual with respect to a target class or set of target classes.  For example, I'm working on a medical system that takes in patient data (such as age, blood pressure, pulse, and respiratory rate) and tries to determine whether or not that person is critically injured.  The system does that by using machine learning.  We feed the system a training set, which contains patient data that has been labeled a priori with the correct class, and the system learns how to distinguish critically injured patients from non-critically injured ones.  The magic is that the machine is able to apply what it learns to patient data that it hasn't seen before.  If it weren't able to make this inductive leap, the system would be nothing more than a look-up table.
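Here is about the smallest classifier I can think of that still makes that leap: a 1-nearest-neighbor classifier.  It's a sketch, not the system I'm working on, and NearestNeighborClassifier is a name I made up for illustration.  It stores the labeled training set like a look-up table would, but when asked about a patient it has never seen, it answers with the label of the most similar stored patient instead of giving up.

```csharp
using System;
using System.Collections.Generic;

public sealed class NearestNeighborClassifier
{
    private readonly List<double[]> mInstances = new List<double[]>();
    private readonly List<string> mLabels = new List<string>();

    // "Training" here just means remembering a labeled example.
    public void Train(double[] instance, string label)
    {
        mInstances.Add(instance);
        mLabels.Add(label);
    }

    // Returns the label of the closest stored example.
    public string Classify(double[] instance)
    {
        if (mInstances.Count == 0)
        {
            throw new InvalidOperationException("Train the classifier first.");
        }

        int best = 0;
        double bestDistance = double.MaxValue;
        for (int i = 0; i < mInstances.Count; i++)
        {
            double distance = SquaredDistance(mInstances[i], instance);
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = i;
            }
        }

        return mLabels[best];
    }

    // Squared Euclidean distance. Assumes both arrays have the same length
    // and that the attributes are on comparable scales, which real patient
    // data usually is not without normalization.
    private static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

Train it on a handful of (pulse, respiratory rate) pairs, then ask it about a pair that isn't in the training set.  A true look-up table would have no answer for that query.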

Clustering deals with trying to place individuals together into groups (called clusters).  There are many techniques for doing this, but most are still accomplishing the same task: creating clusters wherein individuals in the same cluster are similar to one another and dissimilar to individuals in other clusters.  Clustering algorithms typically need to know a few things: the number of clusters that exist, and how to calculate the 'distance' between individuals (there are several clustering systems that can determine the number of clusters automatically, but most basic clustering algorithms are not capable of making that determination).  Unlike in classification where the goal is to predict some target classification, clustering doesn't care about class labels.  It looks at all the attributes of the data in the training set and groups them by similarity.  New items can then be assigned to the previously built clusters as needed.
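As a rough illustration of those two inputs (the number of clusters and a distance measure), here is a bare-bones k-means sketch.  k-means is just one basic clustering technique that I picked for the example, and the KMeans class is hypothetical.  I'm also assuming squared Euclidean distance and seeding the centroids with the first k points so the result is repeatable.  Real implementations usually pick their seeds more carefully.

```csharp
using System;
using System.Collections.Generic;

public static class KMeans
{
    // Returns the cluster index (0..k-1) assigned to each point.
    public static int[] Cluster(IList<double[]> points, int k, int maxIterations)
    {
        if (k < 1 || k > points.Count)
        {
            throw new ArgumentException("k must be between 1 and the number of points.");
        }

        int dims = points[0].Length;

        // Assumption: seed the centroids with the first k points.
        var centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = (double[])points[c].Clone();

        var assignment = new int[points.Count];
        for (int p = 0; p < assignment.Length; p++) assignment[p] = -1;

        for (int iteration = 0; iteration < maxIterations; iteration++)
        {
            // Assignment step: attach each point to its nearest centroid.
            bool changed = false;
            for (int p = 0; p < points.Count; p++)
            {
                int nearest = 0;
                double nearestDistance = double.MaxValue;
                for (int c = 0; c < k; c++)
                {
                    double distance = SquaredDistance(points[p], centroids[c]);
                    if (distance < nearestDistance)
                    {
                        nearestDistance = distance;
                        nearest = c;
                    }
                }
                if (assignment[p] != nearest)
                {
                    assignment[p] = nearest;
                    changed = true;
                }
            }

            // Nothing moved, so the clusters are stable.
            if (!changed) break;

            // Update step: move each centroid to the mean of its members.
            var sums = new double[k][];
            var counts = new int[k];
            for (int c = 0; c < k; c++) sums[c] = new double[dims];
            for (int p = 0; p < points.Count; p++)
            {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) sums[assignment[p]][d] += points[p][d];
            }
            for (int c = 0; c < k; c++)
            {
                if (counts[c] == 0) continue; // leave an empty cluster's centroid where it is
                for (int d = 0; d < dims; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }

        return assignment;
    }

    private static double SquaredDistance(double[] a, double[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }
}
```

Notice that no labels go in anywhere.  The only things the caller supplies are the points, k, and (implicitly) the distance function, which matches the list of inputs above.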

Summary

So, machine learning deals with anything in which a computer system "learns"; that is, it gets better at some task based on experience.  Machine learning is not a synonym for artificial intelligence, though the two are related.  It is a very broad field that provides useful tools capable of solving complex problems in elegant ways.

In the next post on this topic, we'll delve into why you should care about machine learning (here's a hint: you are probably already leveraging it and don't even know it, but if you're not, you're working way too hard!).


About Matt

I am an overworked (and apparently overpaid) software developer who moonlights as a graduate student in computer science. I started off coding in C over a decade ago.  Since then, I've migrated from C to C++ and branched out to C#, PHP, VB.NET, JavaScript, and worked with a wide assortment of other languages that I hope to never deal with again (I'm looking at you, COBOL). Oh, and yes, I've written some Java.  Does that make me a bad person?

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
