Abstract: Image captioning is an important field of study that combines language processing and computer vision to produce descriptive text from images. It uses deep learning and natural language ...