This repository contains the implementation of a CLIP-Large-based multimodal framework for cover-based book genre prediction. The model uses book cover images, book titles, and OCR text as inputs.
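
The description does not say how the three inputs are combined, so the following is only a minimal sketch of one common option (late fusion by concatenation), not the repository's actual method. It assumes each input has already been encoded into an embedding vector, for example by CLIP's image and text encoders. The names `fuse` and `predict_genre`, the toy weights, and the genre labels are hypothetical and used for illustration only.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so no single modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse(image_emb: Sequence[float],
         title_emb: Sequence[float],
         ocr_emb: Sequence[float]) -> List[float]:
    """Late fusion (assumed, not confirmed by the source): normalize each
    modality's embedding and concatenate them into one feature vector."""
    return l2_normalize(image_emb) + l2_normalize(title_emb) + l2_normalize(ocr_emb)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_genre(fused: Sequence[float],
                  weights: Dict[str, Sequence[float]],
                  biases: Dict[str, float]) -> Dict[str, float]:
    """Apply a linear classification head (hypothetical weights) to the fused
    vector and return a probability per genre."""
    genres = list(weights)
    logits = [
        sum(w * x for w, x in zip(weights[g], fused)) + biases[g]
        for g in genres
    ]
    return dict(zip(genres, softmax(logits)))


if __name__ == "__main__":
    # Toy 2-dimensional stand-ins for real cover, title, and OCR embeddings.
    fused = fuse([3.0, 4.0], [1.0, 0.0], [0.0, 2.0])
    toy_weights = {
        "fantasy": [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],
        "romance": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    }
    toy_biases = {"fantasy": 0.0, "romance": 0.0}
    print(predict_genre(fused, toy_weights, toy_biases))
```

In a real model the classification head would be learned, and the fusion step could just as well be attention-based or a weighted sum; concatenation is shown here only because it is the simplest to follow.
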
An AI-powered OCR application built with Python and Streamlit. It extracts text from images using EasyOCR and lets users ask questions about the extracted content, which are answered by the Gemini AI model. - ...
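
The extract-then-ask flow described above can be sketched roughly as follows. This is a hedged illustration, not the application's code. It assumes OCR results arrive in the shape returned by EasyOCR's `Reader.readtext`, a list of `(bounding_box, text, confidence)` entries. The helper names (`extract_text`, `build_prompt`, `answer_question`), the confidence threshold, and the `ask_model` callable standing in for the Gemini call are all hypothetical.

```python
from typing import Callable, Iterable, Sequence, Tuple

# One OCR detection in the shape EasyOCR's readtext() returns:
# (bounding_box, text, confidence).
OcrResult = Tuple[Sequence, str, float]


def extract_text(ocr_results: Iterable[OcrResult],
                 min_confidence: float = 0.5) -> str:
    """Join recognized text fragments, dropping empty or low-confidence ones.
    The 0.5 threshold is an illustrative assumption."""
    kept = [
        text.strip()
        for _bbox, text, confidence in ocr_results
        if confidence >= min_confidence and text.strip()
    ]
    return "\n".join(kept)


def build_prompt(extracted_text: str, question: str) -> str:
    """Build a question-answering prompt grounded in the OCR text."""
    return (
        "Answer the question using only the text extracted from the image.\n\n"
        f"Extracted text:\n{extracted_text}\n\n"
        f"Question: {question}"
    )


def answer_question(ocr_results: Iterable[OcrResult],
                    question: str,
                    ask_model: Callable[[str], str],
                    min_confidence: float = 0.5) -> str:
    """Run the extract-then-ask flow. `ask_model` is a hypothetical callable
    that sends a prompt to the language model and returns its reply."""
    text = extract_text(ocr_results, min_confidence)
    if not text:
        return "No readable text was found in the image."
    return ask_model(build_prompt(text, question))


if __name__ == "__main__":
    # With EasyOCR installed, real results could be obtained like this:
    #   import easyocr
    #   results = easyocr.Reader(["en"]).readtext("receipt.png")
    # Here hand-written detections stand in for that call.
    results = [
        ([[0, 0], [10, 0], [10, 5], [0, 5]], "Total: 42 USD", 0.93),
        ([[0, 6], [10, 6], [10, 9], [0, 9]], "smudge", 0.12),
    ]
    # A stub model that echoes the prompt, in place of the Gemini call.
    print(answer_question(results, "What is the total?", lambda prompt: prompt))
```

Passing the model call in as a function keeps the OCR filtering and prompt construction testable without an API key; in the Streamlit app this callable would wrap whatever Gemini client the project uses.
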