Tools of the Effective Developer: Regular Expressions

May 12th, 2010 8 comments

Whenever I suggest using regular expressions to solve a string parsing problem, more often than not I’m met with skepticism and frowning faces. Regular expressions have a bad reputation among many of my fellow developers.

(Yes, they are mostly Windows developers, Xnix users don’t seem to have this problem.)

But can you blame them? I mean, have a look at this regular expression for validating email addresses.

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

That’s enough to send any normally functioning individual out the door screaming. Fortunately, regular expressions aren’t always this messy. In fact, the simpler the problem, the more effective and beautiful they get.

Let’s look at an example that shows the power and density of regular expressions.
Suppose we have a chunk of text. We know that somewhere in it there’s a social security number. Our job is to extract it.

This is one of those tasks that involve a decent amount of code unless your language of choice has support for regular expressions. In Ruby, on the other hand, it’s a single line of code.


text = “Test data that 123-45-6789 contains a  social security number.”

if text =~ /\d\d\d-\d\d-\d\d\d\d/
  puts $~
else
  puts “No match”
end

If  the match operator (=~) and the magical match result variable ($1) puts you off, here’s how it’s done in C# that doesn’t have a special notation for regular expressions, but support them through the .Net framework.


String text = “Test data that 123-45-6789 contains a  social security number.”;

Regex ssnReg = new Regex(@"\d\d\d-\d\d-\d\d\d\d");
Match match = ssnReg.Match(text);

if ( match.Success ) {
  Console.WriteLine(match);
} else {
  Console.WriteLine("No match");
}

A beautiful thing with regular expressions is that it’s really simple to extract the parts of a match. For instance, if we need to extract the area code all we need to do is to put parenthesis around the part we’re interested in. Then we can easily extract that information, in C# by using the match.Groups property.


String text = “Test data that 123-45-6789 contains a  social security number.”;

Regex ssnReg = new Regex(@"(\d\d\d)-\d\d-\d\d\d\d");
Match match = ssnReg.Match(text);

if ( match.Success ) {
  Console.WriteLine(match);
  Console.WriteLine(“Area Number: “ + match.Groups[1]);
} else {
  Console.WriteLine("No match");
}

As intimidating as they may seem, the payoff using regular expressions is huge. And, since efficiency is what we strive for, they have a natural place in our bag of tools. So, learn the basics of regular expressions. You’ll be happy you did.

Finally some advice sprung from my own experience with regular expressions.

•   Get your regular expressions right first. Use Rubular, or an equivalent tool, for pain free experimentation before you implement them in your code.
•    Document your regular expressions. They are notoriously difficult to read so a short description with example match data is almost always a good idea.

Cheers!

Previous posts in the Tools of The Effective Developer series:

  1. Tools of The Effective Developer: Personal Logs
  2. Tools of The Effective Developer: Personal Planning
  3. Tools of The Effective Developer: Programming By Intention
  4. Tools of The Effective Developer: Customer View
  5. Tools of The Effective Developer: Fail Fast!
  6. Tools of The Effective Developer: Make It Work – First!
  7. Tools of The Effective Developer: Whetstones
  8. Tools of The Effective Developer: Rule of Three
  9. Tools of The Effective Developer: Touch Typing
  10. Tools of The Effective Developer: Error Handling Infrastructure
Categories: C#, ruby, software development Tags:

The King is Challenged – By the Uncle

April 9th, 2010 Comments off

According to Tiobe Programming Community Index as of April 2010, Java is no longer the most popular programming language. C is; Not due to a sudden increase in popularity of C, but rather a cause of Java’s ongoing decline.

After more than 4 years C is back at position number 1 in the TIOBE index. The scores for C have been pretty constant through the years, varying between the 15% and 20% market share for almost 10 years. So the main reason for C’s number 1 position is not C’s uprise, but the decline of its competitor Java. Java has a long-term downward trend. It is losing ground to other languages running on the JVM. An example of such a language is JavaFX script that is now approaching the top 20.

It’s important to point out, thought, that only the language part of Java is losing ground. As Tiobe Software indicates in their comment, the JVM is as hot as ever with all kinds of cool language evolution occurring.

Another interesting note to make about the april index is that Delphi is once again among the top ten. I’m really happy about that, although it means my old prophecy of a hasty decline kind of puts me in the corner (although I’ve already taken it back).

A third point is the giant leap of Objective-C, from 42:nd to 11:th most popular language. This is a combination of iPhone’s popularity and the fact that Objective-C is the only OO alternative allowed on that platform, besides C++.

The Tiobe Programming Language Community Index chart of april 2010

Cheers!

Categories: Delphi, java, opinion Tags:

Analog or Digital Task Board? That’s the Question.

April 8th, 2010 Comments off

As a project manager, the best tool I’ve come across to help me monitor and control an iteration is The Sprint Task Board. It strikes a perfect balance between expressiveness and visual feedback. One quick look is all you need to make a diagnose of the ongoing iteration. Yet, it contains most of the information needed to make informed decisions.

A drawn example of a task board, containing for instance a burndown-chart, sprint goal, and tasks.

The task board is absolutely invaluable and I’ll probably never do team related work without it again. The question, though, is whether to use a computerized or a real world one.

Henrik Kniberg recommends the latter. The following quote is from Henrik’s excellent book Scrum and XP from the Trenches.

We’ve experimented with different formats for the sprint backlog,
including Jira, Excel, and a physical taskboard on the wall. In the
beginning we used Excel mostly, there are many publicly available Excel
templates for sprint backlogs, including auto generated burn-down charts
and stuff like that. […]

Instead […] what we have found to be the most
effective format for the sprint backlog [is] a wall-based taskboard!

And there are good reasons for picking an analog task board. Maybe the most significant being:

  • Visibility. A wall-based dashboard is always up, always visible. If positioned near the team it’ll work as a constant reminder of the shape of the iteration.
  • Simplicity. Planning and re-planning is simply a matter of moving physical cards around, tearing them up or writing new ones.

In my current project we’re using an application (Scrum Dashboard on top of TFS). The reasons are:

  • Availability. Actually, this is the only reason for us to use a digitalized task board. We’re having dislocated team members from time to time (did I say we’re ScrumButs?) but since our dashboard is a web application they are able to access it remotely. It would be a lot more painful to relay the information otherwise.
  • Automatic burndown update. No need for me (or any one else) to do a work remaining summary and update the burndown chart each and every day.

What are your reasons? I’d be most interested in hearing your point of view on this subject.

Cheers!

Categories: software, software development, tools Tags:

I am a ScrumBut

March 26th, 2010 Comments off

Many organizations that are going agile are experiencing difficulties in applying the whole concept. Most of them yield and make some kind of adaption to better suit their current situation. Ken Schwaber, the father of Scrum, thinks this is just a way to avoid dealing with the true problems in the organization. He calls them “ScrumButs”.

In my organization we only fulfill two thirds of the Scrum Checklist, so I guess that makes us ScrumButs. I’m not going to feel bad about it though; Change takes time and we’re pretty happy we’ve gotten as far as we have. But Ken’s right, we really should do something about our most difficult problems and go the full ten yards. Maybe later. 🙂

Cheers!

Don’t Use Brian’s Threaded Comments With iNove Theme

March 23rd, 2010 Comments off

A quick WordPress note in case anyone else gets this problem.

Ever since I changed to the iNove theme I have – unknowingly – had a problem with my site in Internet Explorer. The first comment was always blank. Well, it was there in the web page, it just didn’t show in any IE browser. It turned out that I had a deprecated plugin activated. I removed it and now your comments should show again.

Categories: software Tags:

Software Development Principles Worth Knowing

March 19th, 2010 Comments off

I’ve just finished writing an article on common software development principles for a swedish computer magazine (Datormagazin, the article is scheduled for the may issue). The research work for the article was great fun and gave me the opportunity to freshen up on stuff that sunk into some kind of sub-conscious knowledge state years ago.

Anyway, I compiled a list of the principles I think are worth knowing. You can find it in the Principles section of this website.

Cheers!

Categories: software development Tags:

APM Delivers Nothing

January 22nd, 2010 Comments off

I’m reading Jim Highsmith’s book Agile Project Management. It’s a good book, as the frequent use of my marker pen shows, except for maybe this one sentence:

“APM reliably delivers innovative results to customers within cost and schedule constraints.”

Let me take that sentence out of its context and make a point I believe needs to be stressed. Software development methodologies (Agile Project Management in this case) delivers nothing! People, on the other hand, do.

Some years ago I developed a special interest in software development methodologies. Since then I’ve spent much time reading literature on Scrum, eXtreme Programming, etc, aiming to optimize the processes we use in our projects. During that time I’ve learned a lot, but also come to realize that I’ve been focusing too much on the wrong things.

In the past I thought the problems we were experiencing were problems with the methodology, an easy conclusion to draw from the propaganda-like information out there. However, I’ve found that the methodology matters little in comparison to the quality of my project team. Today I focus more on the team, the product, and the customer, and less on the latest within agile.

Thus, my formula for a successful project is this:

  • Spend less energy on the methodology; i.e. pick a simple one and adapt it when (and only when) needed, or stick to the one you’re currently using
  • Do more to get the right people
  • Make sure they are motivated and connected
  • If you think you’ll fail, do it fast

Summary

If we focus too much on the methodology, and give it too much importance, we risk loosing sight on the real goal, namely producing the right system. And for that you need the right people.

Cheers!

Communicating Through Waves

November 27th, 2009 Comments off

Have you seen the presentation in which Google introduces their coming product Google Wave? If not, do so. I believe that the presentation holds a piece of the future. The so called Wave combines the e-mail, chat and newsgroup functionality in a way that makes the whole greater than the parts.

One of the things that caught my attention was the possibility to collaboratively produce documents or Wiki-kind of information in real-time. I had a similar idea several years ago, and seeing it implemented, as a web application –which was unthinkable at the time, makes me happy. There’s a big questionmark though whether this will work in practice. The example in the presentation shows how the information added by the remote users makes the text move around, which is probably going to be annoying in a real situation.

There’s more work to be done there I guess. But the basic functionalities of the Wave, for instance the ability to add comments and start sub threads anywhere in a message is awesome. This new way of communicating will become infrastructure in the future. That’s my quess at least.

One can’t help becoming impressed with Google. They keep pulling these innovative, cool services out of their big bag of tricks, leaving the rest of the business in a state of awe. Google Waves is yet another proof that their free thinking company policy is paying off – big time.

Cheers!

P.S. Special thanks to Paul who sent me the invitation that triggered my interest.

Categories: software, tools Tags:

The Bad Practices of Exception Handling

October 31st, 2009 12 comments

Exception handling has truly been a blessing to us software developers. Without it, dealing with special conditions and writing robust programs was a lot more painful. But, like any powerful tool, badly used it could cause more harm than good. This article name the top three on my Exception handling bad practices list, all of which I’ve been practicing in the past but now stay away from.

Swallowing Exceptions

Have you ever come across code like this?

try
{
  DoSomeNonCriticalStuff();
}
catch (Exception e)
{
  // Ignore errors
}

DoStuffThatMustBeDoneDispiteAnyErrorsAbove();

Of all the bad exception handling practices, this is the worst since its effect is the complete opposite of the programmer’s intention. The reasoning goes something like this: Catching exceptions where they don’t hurt makes my program more robust since it’ll continue working even when conditions aren’t perfect.

The reasoning could have been valid if it wasn’t for Fatal exceptions; Here described by Eric Lippert.

Fatal exceptions are not your fault, you cannot prevent them, and you cannot sensibly clean up from them. They almost always happen because the process is deeply diseased and is about to be put out of its misery. Out of memory, thread aborted, and so on. There is absolutely no point in catching these because nothing your puny user code can do will fix the problem. Just let your “finally” blocks run and hope for the best.

Catching and ignoring these fatal exceptions makes your program less robust since it will try to carry on as if nothing happened in the worst of conditions, most likely making things worse. Not very Fail fastish.

So, am I saying that ignoring exceptions is bad and should always be avoided? No, the bad practice is catching and ignoring general exceptions. Specific exceptions on the other hand is quite OK.

try
{
  DoSomeNonCriticalStuff();
}
catch (FileNotFoundException e)
{
  // So we couldn't find the settings file, big deal!
}

Bad example, I know, but you get the point.

Throwing Exception

Here’s another bad practice I come across every now and then.

throw new Exception("No connection!");

The problem is that in order to handle Exception we have to catch Exception, and if we catch Exception we have to be prepared to handle every other type of Exception, including the Fatal exceptions that we discussed in the previous section.

So, if you feel the need to throw an exception, make sure it’s specific.

throw new NoConnectionException();

If the idea of defining lots of specific exceptions puts you off, then the very least thing you should do is to define your own application exception to use instead of the basic Exception. This is nothing I recommend though, since general exceptions, like ApplicationException, violate the Be specific rule of the Three rules for Effective Exception Handling.  It’ll make you depend heavily on the message property to separate different errors, so don’t go there.

Overusing exceptions

Exceptions is a language construct to handle exceptional circumstances, situations where a piece of code discovers an error but don’t have the context necessary to handle it. It should not be used to handle valid program states. The reasons are:

  1. Performance. Throwing an exception with all that’s involved, like building a stack trace, will cost you a second or so.
  2. Annoyance. Debugging code where exceptions are a part of the normal execution flow can be frustrating.

Eric Lippert calls those exceptions Vexing exceptions, which I concider a great name given the second argument. Make sure you’ll check out his article (link at the beginning of this article).

Those were the three misuses of exception handling I concider worst. What’s on your list?

Cheers

Tools of the Effective Developer: Error Handling Infrastructure

September 3rd, 2009 2 comments

I often come across code with little or no infrastructure for error handling. This is a big mistake, one that’ll make you pay increasingly as the code base grows. Why? Because your code’ll end up with loads of this:

try
{
  ParseText(SomeText);
}
catch (Exception e)
{
  MessageBox.Show("Error while parsing.", "My Application",
         MessageBoxButtons.OK, MessageBoxIcon.Error);

  // Skipping the rest due to the error
  return;
}

It may look OK but there are several problems with the error handling code above, some more severe than others.

  • Code duplication
    A message box for displaying error messages to the user should almost always have an error icon, the application name as the title, and an OK button. Repeating this all over the code base is not very DRY and should be avoided.
  • Inconsistency
    A related problem is that when you leave programmers with only the general error handling routines of the platform chances are they’ll end up doing things differently, resulting in an inconsistent user experience; For example showing some error dialogs with an error icon and some without.
  • Information loss
    A more serious problem is the fact that the code in the example, by ignoring the exception object, drops information that could be important to track down a bug. We could of course mititgate that problem somewhat by showing the error message of the exception to the user (by concatinating e.Message), but we’d still miss an important piece of information: the stack trace.
    Since dumping a pile of function calls in the face of the users is a no no, what we’d really need is a way to put the details in a non-intrusive place, for example a log file. If the Error Handling Infrastructure doesn’t make this easy, we’re likely to leave that kind of information out completely. For shame.
  • Not automation-friendly
    But the most severe problem with a spread out usage of UI displaying methods like MessageBox is that it makes your code impossible to automate. If someone has to monitor a scheduled run of for example unit tests, and every now and then klick an OK button, it kind of beats the purpose of automation.

So here’s my advice: Implement a strategy for handling errors at the earliest possible time. That is, right after setting up continuous integration for the Hello world application.

Properties of an Error Handling Infrastructure

So how should a error handling infrastructure be like? There’s only one rule. It has to be easy!
If it isn’t simple to use, it won’t be used, and programmers end up using the general purpose message showing methods again, or worse, not doing any error handling at all.

I usually build my error handling interfaces around two use cases.

  1. Displaying error messages
  2. Logging error messages

These are often used in combination, for example

try
{
  ParseText(SomeText);
}
catch (Exception e)
{
  ApplicationEnvironment.ShowErrorMessage("Error while parsing: " + e.Message);
  ApplicationEnvironment.LogError(e);
  return;
}

Or, to avoid code duplication, combined into a single convenient method.

try
{
  ParseText(SomeText);
}
catch (Exception e)
{
  ApplicationEnvironment.HandleException("Error while parsing", e);
  return;
}

When you design your error handling interfaces don’t be afraid to add lots of convenient methods. Remember, it has to be easy to use and that’s what convenient methods are all about, as opposed to utility methods who generally need to be given more arguments. In this case I prefer the Humane interface design style instead of a minimal interface approach.

Needs to be configurable

Furthermore, the Error Handling Infrastructure needs to be configurable or support some other kind of dependency breaking technique like dependency injection. For instance, you should be able to mute all error messages intended for the user, and instead write them to the error log. This way your code will be able to run in a scripted environment, like during unit testing.

There are many ways to design an Error Handling Infrastructure. You could create your own message dialogs allowing the user to send a report right away, you could use the built in application log of the operating system or just plain text files, etc etc.

Whatever you choose to do, don’t forget to do it early and make it easy.

Cheers!

Previous posts in the Tools of The Effective Developer series:

  1. Tools of The Effective Developer: Personal Logs
  2. Tools of The Effective Developer: Personal Planning
  3. Tools of The Effective Developer: Programming By Intention
  4. Tools of The Effective Developer: Customer View
  5. Tools of The Effective Developer: Fail Fast!
  6. Tools of The Effective Developer: Make It Work – First!
  7. Tools of The Effective Developer: Whetstones
  8. Tools of The Effective Developer: Rule of Three
  9. Tools of The Effective Developer: Touch Typing