Marko Vidrih
Explore pre-trained vision-language models (VLMs) like Qwen-VL (https://github.com/QwenLM/Qwen-VL) and DeepSeek-VL (https://github.com/deepseek-ai/DeepSeek-VL), or document-focused models such as LayoutLM or VLoc (https://arxiv.org/pdf/2304.06447), which are specifically designed for document understanding tasks. Also consider transfer learning: fine-tuning a pre-trained VLM on your FIR dataset should give noticeably better performance than using it zero-shot.
Try layout-aware transformers or attention models that can handle both textual and visual features. Then train the VLM on the prepared dataset: feed the FIR (First Information Report) images and the corresponding IPC (Indian Penal Code) section labels into the model, so it learns the visual cues that indicate where the IPC section appears within the FIRs.
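As a minimal sketch of the data-preparation step, the helper below pairs each FIR image with its IPC-section annotation. The annotations file format (`image`, `ipc_bbox`, `ipc_text` keys) and the file name `fir_001.png` are hypothetical examples, not a standard; adapt them to however your FIR dataset is actually labeled.

```python
import json
from pathlib import Path

def load_fir_dataset(annotations_path):
    """Pair each FIR image with its IPC-section annotation.

    Assumes a hypothetical JSON annotations file of the form:
      [{"image": "fir_001.png",
        "ipc_bbox": [x0, y0, x1, y1],
        "ipc_text": "u/s 302 IPC"}, ...]
    where ipc_bbox is the pixel box of the IPC section in the scan.
    """
    records = json.loads(Path(annotations_path).read_text(encoding="utf-8"))
    dataset = []
    for rec in records:
        dataset.append({
            "image_path": rec["image"],      # input to the VLM's image encoder
            "bbox": tuple(rec["ipc_bbox"]),  # localization target
            "label_text": rec["ipc_text"],   # text target for parsing IPC codes
        })
    return dataset
```

Each record can then be wrapped into whatever batch format your chosen VLM's processor expects.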
Once trained, you can use the VLM to process new FIRs. The model will analyze the layout and text within the FIR image and predict the location of the IPC section. You can then extract the predicted region of the FIR and parse the text to obtain the IPC codes.
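For the final parsing step, once the text of the predicted region has been OCR'd, the IPC codes can be pulled out with a simple regex pass. This is a sketch assuming common FIR phrasings such as "u/s 302, 34 IPC" or "under Section 420"; real FIRs vary, so treat the pattern as a starting point.

```python
import re

# Matches "u/s 302, 34", "under Section 420", "Sec. 376 & 511", etc.
# (assumed phrasings -- extend for the wording in your own FIR corpus)
IPC_SECTION_RE = re.compile(
    r"(?:u/s|under\s+sections?|sec(?:tion)?s?\.?)"
    r"\s*((?:\d+[A-Z]?)(?:\s*[,/&]\s*\d+[A-Z]?)*)",
    re.IGNORECASE,
)

def extract_ipc_sections(ocr_text: str) -> list[str]:
    """Return IPC section numbers found in OCR'd FIR text, in order, deduplicated."""
    sections = []
    for match in IPC_SECTION_RE.finditer(ocr_text):
        for code in re.split(r"\s*[,/&]\s*", match.group(1)):
            if code not in sections:
                sections.append(code)
    return sections
```

For example, `extract_ipc_sections("FIR registered u/s 302, 34 of IPC")` returns `["302", "34"]`.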