VLM Finance NLP Japanese LLaVA SFT LoRA

Compass: Developing a Japanese Financial Vision-Language Model through Integrated Reasoning Enhancement and Document Comprehension

We present Compass, a Japanese Vision-Language Model specialized for financial document understanding. Compass is built on the LLaVA-OneVision architecture with llm-jp-4-8b. Its three-phase training pipeline combines vision-language alignment, mathematical reasoning enhancement via knowledge distillation, and financial domain specialization through direct visual document reading.

Sections: Introduction, Architecture, Training Pipeline, Datasets, Implementation Details, Evaluation Setup, Experimental Results, Conclusion and Future Work, Acknowledgments, References