Archive for the ‘ruby’ Category

Tools of the Effective Developer: Regular Expressions

May 12th, 2010

Whenever I suggest using regular expressions to solve a string parsing problem, more often than not I’m met with skepticism and frowning faces. Regular expressions have a bad reputation among many of my fellow developers.

(Yes, they are mostly Windows developers; *nix users don’t seem to have this problem.)

But can you blame them? I mean, have a look at this regular expression for validating email addresses.

[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?

That’s enough to send any normally functioning individual out the door screaming. Fortunately, regular expressions aren’t always this messy. In fact, the simpler the problem, the more effective and beautiful they get.

Let’s look at an example that shows the power and density of regular expressions.
Suppose we have a chunk of text. We know that somewhere in it there’s a social security number. Our job is to extract it.

This is one of those tasks that involve a decent amount of code unless your language of choice has support for regular expressions. In Ruby, on the other hand, it’s a single line of code.


text = "Test data that 123-45-6789 contains a social security number."

if text =~ /\d\d\d-\d\d-\d\d\d\d/
  puts $~
else
  puts "No match"
end
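
The check can be condensed further. The following is a minimal sketch, not part of the original example: it assumes quantifiers ({3}, {2}, {4}) in place of the repeated \d, and uses String#[] and String#scan from Ruby’s core library. The constant name SSN_PATTERN and the second sample sentence are made up for illustration.

```ruby
# Pattern for the NNN-NN-NNNN shape, written with quantifiers instead of repeated \d.
# SSN_PATTERN is an illustrative name, not something from the original post.
SSN_PATTERN = /\d{3}-\d{2}-\d{4}/

text = "Test data that 123-45-6789 contains a social security number."

# String#[] returns the first matching substring, or nil when nothing matches.
ssn = text[SSN_PATTERN]
puts ssn || "No match"

# String#scan returns every match, which helps when the text may hold several numbers.
all_ssns = "First 123-45-6789, then 987-65-4321.".scan(SSN_PATTERN)
puts all_ssns.inspect
```

Returning nil on a failed match keeps the "No match" branch down to a single || expression.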

If the match operator (=~) and the magical match result variable ($~) put you off, here’s how it’s done in C#, which has no special notation for regular expressions but supports them through the .NET Framework.


// Assumes: using System; using System.Text.RegularExpressions;
String text = "Test data that 123-45-6789 contains a social security number.";

Regex ssnReg = new Regex(@"\d\d\d-\d\d-\d\d\d\d");
Match match = ssnReg.Match(text);

if ( match.Success ) {
  Console.WriteLine(match);
} else {
  Console.WriteLine("No match");
}

A beautiful thing about regular expressions is how simple it is to extract parts of a match. For instance, if we need the area number (the first three digits), all we need to do is put parentheses around the part we’re interested in. In C#, we can then read that part through the match.Groups property.


String text = "Test data that 123-45-6789 contains a social security number.";

Regex ssnReg = new Regex(@"(\d\d\d)-\d\d-\d\d\d\d");
Match match = ssnReg.Match(text);

if ( match.Success ) {
  Console.WriteLine(match);
  Console.WriteLine("Area Number: " + match.Groups[1]);
} else {
  Console.WriteLine("No match");
}
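
For comparison, here is a rough Ruby sketch of the same group extraction. It assumes named groups, which are a swapped-in alternative to the numbered group used in the C# version; the group names area, group, and serial are chosen here for illustration and do not come from the original post.

```ruby
text = "Test data that 123-45-6789 contains a social security number."

# Named groups (?<name>...) make each part of the match addressable by name.
# The names area, group and serial are illustrative choices.
ssn_pattern = /(?<area>\d{3})-(?<group>\d{2})-(?<serial>\d{4})/

match = ssn_pattern.match(text)   # MatchData, or nil when there is no match

if match
  puts match[0]                          # the whole match
  puts "Area Number: #{match[:area]}"    # the same value as match[1]
else
  puts "No match"
end
```

Named groups cost a few extra characters, but the code that reads them no longer depends on counting parentheses.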

As intimidating as they may seem, the payoff from using regular expressions is huge. And, since efficiency is what we strive for, they have a natural place in our bag of tools. So, learn the basics of regular expressions. You’ll be happy you did.

Finally, some advice drawn from my own experience with regular expressions.

•   Get your regular expressions right first. Use Rubular, or an equivalent tool, for pain-free experimentation before you implement them in your code.
•   Document your regular expressions. They are notoriously difficult to read, so a short description with example match data is almost always a good idea.
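
One way to follow the documentation advice in Ruby is the extended (/x) regex mode, which ignores whitespace in the pattern and allows # comments inside it. The sketch below is only an illustration; the constant name SSN_DOCUMENTED is an assumption.

```ruby
# Matches a US social security number written as NNN-NN-NNNN.
# Example match: "123-45-6789"
# SSN_DOCUMENTED is an illustrative name.
SSN_DOCUMENTED = /
  (\d{3})   # area number, e.g. 123
  -
  (\d{2})   # group number, e.g. 45
  -
  (\d{4})   # serial number, e.g. 6789
/x

match = SSN_DOCUMENTED.match("Test data that 123-45-6789 contains a social security number.")
puts match.captures.inspect if match
```

The description and the example match data sit right next to the pattern, so they are more likely to be updated when the pattern changes.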

Cheers!

Previous posts in the Tools of The Effective Developer series:

  1. Tools of The Effective Developer: Personal Logs
  2. Tools of The Effective Developer: Personal Planning
  3. Tools of The Effective Developer: Programming By Intention
  4. Tools of The Effective Developer: Customer View
  5. Tools of The Effective Developer: Fail Fast!
  6. Tools of The Effective Developer: Make It Work – First!
  7. Tools of The Effective Developer: Whetstones
  8. Tools of The Effective Developer: Rule of Three
  9. Tools of The Effective Developer: Touch Typing
  10. Tools of The Effective Developer: Error Handling Infrastructure

Categories: C#, ruby, software development

Loop Abstractions in D

January 17th, 2008

One of the great things about Ruby is the natural way in which you can hide looping constructs behind descriptive names, like the retryable example that Cheah Chu Yeow gives on his blog.

retryable(:tries => 5, :on => OpenURI::HTTPError) do
  open('http://example.com/flaky_api')
end

Notice how elegantly the loop logic is abstracted; there’s no need to look at the implementation of retryable to figure out what it does. The question is, can we do something similar in D? It turns out that with features like delegates and function literals, we can get pretty close.

bool retryable(int tries, void delegate() dg)
{
  for(int i = tries; i > 0; i--)
  {
    try
    {
      dg();
      return true;
    }
    catch (Exception)
    {
      // Swallow the failure and retry
    }
  }
  return false;
}

Which can be used like this:

retryable(5, {
  open("http://example.com/flaky_api");
});

Not as nice as with Ruby, but almost.

The configurable exception type of the Ruby version is trickier to implement in D. Templates come to our rescue.

bool retryable(E)(int tries, void delegate() dg)
{
  for(int i = tries; i > 0; i--)
  {
    try
    {
      dg();
      return true;
    }
    catch (E)
    {
      // Retry
    }
  }
  return false;
}

With the (slightly odd) template instantiation syntax, we can then make retryable retry only when, for example, a StdioException is thrown.

retryable!(StdioException)(5, {
  open("http://example.com/flaky_api");
});

To clean it up a bit, we can add some defaults (which requires us to swap the order of the parameters, since parameters with default values must come last).

bool retryable(E = Exception)(void delegate() dg, int tries = 5)
{
  for(int i = tries; i > 0; i--)
  {
    try
    {
      dg();
      return true;
    }
    catch (E)
    {
      // Retry
    }
  }
  return false;
}

That gives us a little more freedom when utilizing retryable.

retryable({
  // Retry up to 5 times
});

retryable({
  // Retry up to 10 times
}, 10);

retryable!(StdioException)({
  // Retry up to three times
  // on StdioException failures
}, 3);
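
For comparison, here is a minimal sketch of how a retryable with the same defaults might look on the Ruby side. It is not Cheah Chu Yeow’s implementation: the keyword arguments tries: and on: are modelled on the call shown earlier, and the true/false return value is an assumption copied from the D version. The flaky block that fails twice is made up for the demonstration.

```ruby
# Hypothetical sketch: runs the block, retrying up to `tries` times when `on` is raised.
# Returns true on success and false once every attempt has failed,
# mirroring the D version rather than any published Ruby implementation.
def retryable(tries: 5, on: StandardError)
  tries.times do
    begin
      yield
      return true
    rescue on
      # Swallow the failure and try again
    end
  end
  false
end

attempts = 0
succeeded = retryable(tries: 3, on: IOError) do
  attempts += 1
  raise IOError, "flaky" if attempts < 3   # fail twice, then succeed
end

puts "succeeded=#{succeeded} after #{attempts} attempts"
```

Exceptions of any other class are not rescued and propagate to the caller, which matches the behaviour of the templated D version.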

I totally agree with Cheah Chu Yeow that Ruby is nice, but I think D is pretty cool too.

Cheers!