Automatically generating a description of an image's content is an Artificial Intelligence (AI) task that combines computer vision and natural language processing (NLP). A generative neural model is constructed that draws on techniques from computer vision and machine translation. This approach is used to produce natural-sounding sentences that describe the image. The model uses convolutional neural networks (CNN) and recurrent neural networks (RNN): the CNN extracts features from the image, while the RNN generates the sentences. The model is trained so that, given an input image, it produces a caption that closely describes that image. The model's accuracy, fluency, and command of language learned from image descriptions are investigated on various datasets.
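The data flow described above, with a CNN producing image features and an RNN emitting one word at a time, can be illustrated with a minimal pure-Python sketch. It is not the model used in this work: the toy vocabulary, the single convolution layer, the hidden size, and every function name are assumptions made for illustration, and the weights are random and untrained, so the generated words carry no meaning. It only shows how image features might seed the recurrent state and how greedy decoding might proceed.

```python
import math
import random

# Hypothetical toy vocabulary; a real model learns one from training captions.
VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]
HIDDEN = 4  # assumed size of both the image feature vector and the RNN state


def conv2d_valid(image, kernel):
    """Slide one kernel over a 2-D image (stride 1, no padding), then apply ReLU."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            s = sum(image[i + di][j + dj] * kernel[di][dj]
                    for di in range(kh) for dj in range(kw))
            row.append(max(0.0, s))
        out.append(row)
    return out


def encode(image, kernels):
    """CNN stand-in: one conv layer plus global average pooling per kernel."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        total = sum(sum(r) for r in fmap)
        feats.append(total / (len(fmap) * len(fmap[0])))
    return feats


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


def generate_caption(image, seed=0, max_len=8):
    """Greedy decoding with an Elman-style RNN cell and untrained random weights."""
    rng = random.Random(seed)
    kernels = [rand_matrix(rng, 3, 3) for _ in range(HIDDEN)]
    w_xh = rand_matrix(rng, HIDDEN, len(VOCAB))   # input word -> hidden
    w_hh = rand_matrix(rng, HIDDEN, HIDDEN)       # hidden -> hidden
    w_hy = rand_matrix(rng, len(VOCAB), HIDDEN)   # hidden -> word scores

    # The image features initialise the recurrent hidden state.
    h = [math.tanh(f) for f in encode(image, kernels)]
    token = VOCAB.index("<start>")
    words = []
    for _ in range(max_len):
        x = [1.0 if i == token else 0.0 for i in range(len(VOCAB))]  # one-hot input
        h = [math.tanh(a + b) for a, b in zip(matvec(w_xh, x), matvec(w_hh, h))]
        scores = matvec(w_hy, h)
        # Pick the highest-scoring word, never re-emitting "<start>".
        token = max(range(1, len(VOCAB)), key=scores.__getitem__)
        if VOCAB[token] == "<end>":
            break
        words.append(VOCAB[token])
    return words


if __name__ == "__main__":
    toy_image = [[(r * c) % 5 / 4 for c in range(6)] for r in range(6)]
    print(generate_caption(toy_image))
```

In a trained system the convolution kernels and the three weight matrices would be learned jointly from image and caption pairs, and the recurrent cell would typically be a gated variant rather than the plain tanh cell assumed here.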