Word Segmentation by Component Tracing and Association (CTA) Technique
Word-level segmentation is an essential step in many document analysis systems because the word is the most important unit in any language system. Segmenting handwritten documents into words is challenging because of the cursive nature of handwriting; overlapping, touching, and crossing of adjacent words; non-straight baselines; and clutter, among other factors. Of these, crossing is the most difficult to handle. This paper proposes a novel offline word-level segmentation technique for handwritten documents that addresses touching and crossing words. The main contribution of the paper is the junction branch association (JBA) method, which specifically handles touching and crossing words, cases where many other proposed methods fail. The proposed method has been evaluated on the ICDAR2009 and ICDAR2013 benchmark datasets of handwritten scripts. In addition, crossing words extracted from the FireMaker dataset of handwritten documents have been used to evaluate the performance of the JBA method specifically on crossing words.