Compass: Developing a Japanese Financial Vision-Language Model through Integrated Reasoning Enhancement and Document Comprehension
We present Compass, a Japanese Vision-Language Model specialized for financial document understanding. Compass is built on the LLaVA-OneVision architecture with llm-jp-4-8b and is trained with a three-phase pipeline that integrates vision-language alignment, mathematical reasoning enhancement via knowledge distillation, and financial domain specialization through direct visual document reading.