Sunday, October 25, 2015

Trie in Haskell

Lately I've been playing around with Haskell. As an exercise, I decided to partially implement a Trie. I ignored the various implementations out there, including Data.Trie and the one linked on Wikipedia. Instead I focused on iterating on a Trie from scratch, and I found the results to be pretty interesting, so I'm sharing them here.

My real goal with this exercise was to implement "member", "insert" and a sorted "toList" and to do so somewhat generically. This interface turns out to be more similar to a Set than say a Map, because there aren't values associated with members. I used [a] to represent a member of the Trie a which, in retrospect, looks like a Monoid. Well, anyways, my first structure came out looking like

data Trie a =
  RootNode [TrieNode a]
  deriving Show

data TrieNode a =
  Leaf a
  | SpineEnd a [TrieNode a]
  | Spine a [TrieNode a]
  deriving Show

In this modeling, Leaf a indicated the end of a member at a leaf, SpineEnd a [TrieNode a] indicated an end on a spine, Spine a [TrieNode a] was a non-ending spine, and well RootNode [TrieNode a] indicated the root. At first glance, I really liked this model. After implementing the Trie, however, I found the structure less than desirable for a few reasons
  • Since "edges" are a list, a fair amount of boilerplate traversal code was necessary to implement the operations
  • Although at first I liked keeping RootNode separate, but later I realized it was inhibiting me from inserting the empty list
  • Turns out that Leaf a is totally redundant with SpineEnd a []
I'm much more pleased with the second structure I came up with
import qualified Data.Map as M

type TrieEdges a = M.Map a (Trie a)

data Trie a =
  Node { edges :: TrieEdges a }
  | EndNode { edges :: TrieEdges a }
  deriving (Show, Eq)

empty :: Trie a
empty = Node M.empty
Not only does this design have fewer parts than the other, but it yielded a simpler implementation. Here's the code for insert
insert :: Ord a => [a] -> Trie a -> Trie a
insert [] n = EndNode $ edges n
insert vs (Node m) = Node $ insertBelow vs m
insert vs (EndNode m) = EndNode $ insertBelow vs m

insertBelow (v:vs) m = M.insert v (insert vs child) m
  where child = M.findWithDefault empty v m
I believe it's pretty straightforward. The only real tricks were safely creating edges where none existed (which is the main purpose of insertBelow) and ensuring that the node at the end of a member is an EndNode. The member function turned out even nicer
member :: Ord a => [a] -> Trie a -> Bool
member [] (EndNode _) = True
member [] (Node _) = False
member (v:vs) n = maybe False (member vs) . M.lookup v $ edges n
Getting an in-order toList was a little bit tricker but, thanks to the useful functions in Data.Map it was quite doable
toList :: Trie a -> [[a]]
toList = go []
  where
  go acc (Node m) = continue acc m
  go acc (EndNode m) = reverse acc:continue acc m

  continue acc = concatMap (\(v, n) -> go (v:acc) n) . M.toList
In conclusion, if you have any comments, improvements or tips to make this better let me know!