Shugo on Blogger: March 2012

Monday, March 12, 2012

Enumerable#lazy and its benefits

Enumerable#lazy has been introduced into Ruby trunk. With Enumerable#lazy, the evaluation of Enumerable methods is delayed until needed, so Enumerable methods can be applied on infinite sequences. The following program shows first ten pythagorean triples:

def pythagorean_triples
  (1..Float::INFINITY).lazy.flat_map {|z|
    (1..z).flat_map {|x|
      (x..z).select {|y|
        x**2 + y**2 == z**2
      }.map {|y|
        [x, y, z]
      }
    }
  }
end
p pythagorean_triples.first(10)

One of the benefits of Enumerable#lazy is that you can separate the way to calculate each value and the way to determine when the iteration should be terminated. Thus the following program can show all pythagorean triples less than 100, without modification of the method pythagorean_triples:

p pythagorean_triples.take_while { |x, y, z| z < 100 }.force

Another example is an implementation of the UNIX wc command:

(ARGV.length == 0 ?
 [["", STDIN]] :
 ARGV.lazy.map { |filename|
  [filename, File.open(filename)]
}).map { |filename, file|
  "%4d  %4d  %4d %s\n" % [*file.lines.lazy.map { |line|
    [1, line.split.length, line.length]
  }.inject([0, 0, 0]) { |(lc, wc, cc), (l, w, c)|
    [wc + w, lc + l, cc + c]
  }, filename]
}.each(&:display)

Without Enumerable#lazy, you need more memory, and you cannot see the result until all files are processed.

Tuesday, March 6, 2012

Pattern matching and dynamic dispatch

In functional programming, pattern matching is preferred to if expressions. The following example is from The Fun of Programming:

data Colour = Blue | Red
data (Ord a) => Tree a = Null | Fork Colour a (Tree a) (Tree a)

isEmpty Null = True
isEmpty (Fork col x a b) = False

minElem (Fork col x a b) = x

deleteMin (Fork col x a b) = merge a b

insert x a = merge (Fork Blue x Null Null) a

merge a Null = a
merge Null b = b
merge a b
    | minElem a <= minElem b = join a b
    | otherwise = join b a

join (Fork Blue x a b) c = Fork Red x (merge a c) b
join (Fork Red x a b) c = Fork Blue x a (merge b c)

How about object-oriented programming? Dynamic dispatch is preferred to if expressions. The above example can be written in Ruby as follows:

class Tree
  def insert(value)
    BlueFork.new(value, Null, Null).merge(self)
  end

  def merge(other)
    other._merge(self)
  end
end

Null = Tree.new
def Null.empty?
  true
end
def Null.merge(other)
  other
end
def Null._merge(other)
  other
end

class Fork < Tree
  attr_reader :value, :left, :right

  def initialize(value, left, right)
    @value, @left, @right = value, left, right
  end

  def empty?
    false
  end

  def min_elem
    value
  end

  def delete_min
    merge(left, right)
  end

  def _merge(other)
    if min_elem <= other.min_elem 
      self.join(other)
    else
      other.join(self)
    end
  end
end

class BlueFork < Fork
  def join(other)
    RedFork.new(value, left.merge(other), right)
  end
end

class RedFork < Fork
  def join(other)
    BlueFork.new(value, left, right.merge(other))
  end
end

This Ruby code is slightly more verbose than Haskell code, but join is well defined using dynamic dispatch. However, dynamic dispatch has some limitations:

Ruby doesn't support multiple dispatch, so which method is invoked depends on only the receiver (in the above code, merge is implemented using double-dispatch techniques).
Dynamic dispatch doesn't work on recursive data structure.

2 is more critical. For example. it's hard to implement balancing of red-black trees using dynamic dispatch.

Kazuki Tsujimoto has implemented pattern-match to support pattern matching in Ruby. Balancing of red-black trees can be implemented using pattern-match as follows:

Node = Struct.new(:left, :key, :right)
class R < Node; end
class B < Node; end

def balance(left, key, right)
  match([left, key, right]) {
    with(_[R.(a, x, b), y, R.(c, z, d)]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_[R.(R.(a, x, b), y, c), z, d]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_[R.(a, x, R.(b, y, c)), z, d]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_[a, x, R.(b, y, R.(c, z, d))]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_[a, x, R.(R.(b, y, c), z, d)]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_) { B[left, key, right] }
  }
end

With Refinements, you don't need to use ugly .():

def balance(left, key, right)
  match([left, key, right]) {
    with(_[R[a, x, b], y, R[c, z, d]]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_[R[R[a, x, b], y, c], z, d]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_[R[a, x, R[b, y, c]], z, d]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_[a, x, R[b, y, R[c, z, d]]]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_[a, x, R[R[b, y, c], z, d]]) { R[B[a, x, b], y, B[c, z, d]] }
    with(_) { B[left, key, right] }
  }
end

But balance is defined as a function, not as a method, and it doesn't look so Rubyish. Is there any object-oriented way to define balance?

Thursday, March 1, 2012

To return, or not to return

Ruby guys sometimes discuss whether we should use return or not. A few years ago at RubyConf, there was a panel discussion about coding, where questions were asked from the audience by Twitter. I asked the question, and most people said they don't use return. A panelist who use return was attacked as a Java guy. If I had been a good English speaker, I could have defended him.

Why do I like return? It's because in the following code, we can't very well know whether the code expects that baz returns a value, or just counts on the side effect of baz.

def foo
  bar
  baz
end

Those who don't like return sometimes says that there's no return in functional languages (Haskell has return, but it's completely different from Ruby's). But, it's obvious that a function returns a value, so it's reasonable for functional languages not to use return. However, Ruby is very imperative. Very.

It's OK not to use return in functional style code. For example:

def sum_of_squares(ary)
  ary.map { |i| i ** 2 }.inject(&:+)
end

But the following code looks a bit awkward:

def sum_of_squares(ary)
  result = 0
  for i in ary
    result += i ** 2
  end
  result
end

The same awkwardness can be observed in code using inject. The use of inject in the above code looks fine, but the following code looks awkward:

def to_hash(ary)
  ary.inject({}) do |h, (k, v)|
    h[k] = v
    h
  end
end

If you need side effects, you should use each_with_object instead of inject:

def to_hash(ary)
  ary.each_with_object({}) do |(k, v), h|
    h[k] = v
  end
end

Anyway, it's OK if your code is readable. That's all.