This study develops a continuous sign language (SL) recognition system based on deep neural networks that directly transcribes videos of SL phrases into ordered sequences of gloss labels. Previous approaches to continuous SL recognition typically rely on hidden Markov models, which have limited capacity to capture temporal information. In contrast, the proposed architecture uses deep convolutional neural networks with stacked temporal fusion layers as the feature extraction module and bi-directional recurrent neural networks as the sequence learning module. For this architecture, we propose an iterative optimization procedure that lets us fully exploit the representational capacity of deep neural networks even with a small amount of data. We first train the end-to-end recognition model to produce an alignment proposal, and then fine-tune the feature extraction module directly, using the alignment proposal as strong supervisory information. Repeating this training process further improves recognition performance. We extend our contribution by investigating the multimodal fusion of RGB images and optical flow in sign language. Evaluated on two challenging SL recognition benchmarks, our approach outperforms the state of the art by a relative improvement of more than 15% on both databases.
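The abstract does not say how the alignment proposal is computed. As an illustration only, the sketch below shows one common way to turn the per-frame scores of a sequence model into frame-level gloss labels: a monotonic forced alignment found with the Viterbi algorithm. The function name `align_frames`, the input layout, and the absence of a blank or silence label are assumptions made for this example and are not taken from the paper.

```python
import math


def align_frames(log_probs, gloss_seq):
    """Assign one gloss label to every frame (hypothetical helper).

    log_probs[t][g] is the log-score of gloss id g at frame t, as produced
    by some end-to-end recognition model. gloss_seq is the ordered list of
    gloss ids for the sentence. The result has one gloss id per frame and
    follows gloss_seq in order, with every gloss covering at least one frame.
    Assumption: no blank/silence label is modelled.
    """
    num_frames, num_glosses = len(log_probs), len(gloss_seq)
    if num_glosses == 0 or num_frames < num_glosses:
        raise ValueError("need at least one frame per gloss")

    neg_inf = -math.inf
    # score[t][n]: best log-score of any alignment that ends at frame t
    # in position n of gloss_seq.
    score = [[neg_inf] * num_glosses for _ in range(num_frames)]
    # back[t][n]: position occupied at frame t-1 on that best alignment.
    back = [[0] * num_glosses for _ in range(num_frames)]
    score[0][0] = log_probs[0][gloss_seq[0]]

    for t in range(1, num_frames):
        for n in range(num_glosses):
            stay = score[t - 1][n]                           # same gloss continues
            move = score[t - 1][n - 1] if n > 0 else neg_inf  # next gloss starts
            if move > stay:
                best, back[t][n] = move, n - 1
            else:
                best, back[t][n] = stay, n
            if best > neg_inf:
                score[t][n] = best + log_probs[t][gloss_seq[n]]

    # Backtrack from the last gloss at the last frame.
    labels = [0] * num_frames
    n = num_glosses - 1
    for t in range(num_frames - 1, -1, -1):
        labels[t] = gloss_seq[n]
        n = back[t][n]
    return labels


# Toy example: 5 frames, 3 gloss ids, sentence = gloss 2 followed by gloss 0.
frame_probs = [
    [0.1, 0.1, 0.8],
    [0.2, 0.1, 0.7],
    [0.3, 0.1, 0.6],
    [0.7, 0.1, 0.2],
    [0.8, 0.1, 0.1],
]
frame_log_probs = [[math.log(p) for p in row] for row in frame_probs]
print(align_frames(frame_log_probs, [2, 0]))  # prints [2, 2, 2, 0, 0]
```

In an iterative scheme like the one the abstract describes, frame-level labels of this kind could serve as classification targets when fine-tuning the feature extractor, after which the end-to-end model is retrained and the alignment recomputed.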