The Institute of Science Tokyo successfully trained Llama 3.3 Swallow, a 70-billion-parameter LLM with enhanced Japanese capabilities, using Amazon SageMaker HyperPod.
Llama 3.3 Swallow outperformed GPT-4o-mini and other models on Japanese language tasks, as detailed in a report by Kazuki Fujii.
The model is available in different variants on Hugging Face for research and development purposes.
The training methodology involved continual pre-training and supervised fine-tuning for Japanese dialogue and code tasks.
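As a rough illustration of that two-stage recipe, the sketch below runs a tiny continual pre-training pass with Hugging Face transformers; the stand-in checkpoint, toy corpus, and hyperparameters are placeholders chosen so the snippet runs anywhere, not the Swallow team's actual setup.

```python
"""
Minimal sketch of the two-stage recipe: continual pre-training on raw Japanese
text with the causal-LM objective, then supervised fine-tuning on dialogue
data. The project itself continually pre-trained a Llama 3.3 70B checkpoint;
a small stand-in model is used here purely for illustration.
"""
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

MODEL_NAME = "gpt2"  # small stand-in checkpoint, not the actual 70B base

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Stage 1: continual pre-training on (here, a toy) Japanese corpus.
corpus = Dataset.from_dict({"text": ["日本語のコーパスの例です。", "もう一つの例文です。"]})
tokenized = corpus.map(lambda ex: tokenizer(ex["text"]), remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="cpt-out",
                           per_device_train_batch_size=1,
                           num_train_epochs=1),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Stage 2 (not shown): supervised fine-tuning repeats the same loop on
# instruction/dialogue examples rendered with the model's chat template.
```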
The base model outperformed industry models such as GPT-4o-mini and Qwen2.5-72B.
The license permits use of the model in both research and commercial applications.
The training infrastructure was built on Amazon SageMaker HyperPod for high performance and scalability.
Key elements included a comprehensive storage hierarchy, compute and network configuration, and a robust observability stack.
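To give a sense of what provisioning such a cluster involves, here is a minimal boto3 sketch using the SageMaker `create_cluster` API; the cluster name, instance type and count, lifecycle-script S3 location, and IAM role ARN are illustrative placeholders, not the project's actual configuration.

```python
"""
Illustrative sketch of provisioning a SageMaker HyperPod cluster with boto3.
All names, counts, paths, and ARNs below are placeholders.
"""
import boto3

sagemaker = boto3.client("sagemaker", region_name="us-west-2")

response = sagemaker.create_cluster(
    ClusterName="llm-training-cluster",
    InstanceGroups=[
        {
            # GPU worker nodes; instance type and count are hypothetical.
            "InstanceGroupName": "worker-group",
            "InstanceType": "ml.p5.48xlarge",
            "InstanceCount": 16,
            "LifeCycleConfig": {
                # Lifecycle scripts (e.g., scheduler setup, shared-storage
                # mounts) are staged in S3 and run as each node comes up.
                "SourceS3Uri": "s3://my-bucket/hyperpod-lifecycle/",
                "OnCreate": "on_create.sh",
            },
            "ExecutionRole": "arn:aws:iam::123456789012:role/HyperPodExecutionRole",
            "ThreadsPerCore": 1,
        },
    ],
)
print(response["ClusterArn"])
```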
The project emphasized advanced parallelism strategies and optimized distributed training with Megatron-LM.
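To make the parallelism arithmetic concrete, the short sketch below shows how tensor-, pipeline-, and data-parallel degrees relate to the total GPU count in a Megatron-LM style setup; the degrees and GPU counts used here are assumptions for illustration, not the project's published configuration.

```python
"""
3D-parallelism arithmetic used in Megatron-LM style training:
world_size = tensor_parallel * pipeline_parallel * data_parallel.
The degrees below are illustrative only.
"""

def data_parallel_size(world_size: int, tensor_parallel: int, pipeline_parallel: int) -> int:
    """Return the data-parallel degree implied by the other two degrees."""
    model_parallel = tensor_parallel * pipeline_parallel
    if world_size % model_parallel != 0:
        raise ValueError("world size must be divisible by TP * PP")
    return world_size // model_parallel

# Example: 16 nodes x 8 GPUs = 128 GPUs, with hypothetical TP=8 and PP=4.
world_size = 16 * 8
tp, pp = 8, 4
dp = data_parallel_size(world_size, tp, pp)
print(f"TP={tp}, PP={pp}, DP={dp}")  # -> TP=8, PP=4, DP=4

# These degrees map onto Megatron-LM launch flags such as
#   --tensor-model-parallel-size 8 --pipeline-model-parallel-size 4
# (the flag names are Megatron-LM's; the values here are assumptions).
```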
Memory prediction tooling and checkpointing strategies further improved training efficiency.
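On the memory-prediction side, a back-of-the-envelope estimator like the one below captures the core idea: model states for mixed-precision Adam cost roughly 16 bytes per parameter before sharding. The parameter count and sharding factor are illustrative, activation and checkpoint memory are ignored, and this covers only the memory-prediction half of the point above.

```python
"""
Rough GPU-memory estimate for mixed-precision Adam training, in the spirit of
a memory-prediction tool. The 16 bytes/parameter figure covers bf16 weights,
bf16 gradients, fp32 master weights, and the two fp32 Adam moments; the
sharding factor is a simplification of how TP/PP/distributed-optimizer
sharding spreads those states. Figures are illustrative only.
"""

BYTES_PER_PARAM = 2 + 2 + 4 + 4 + 4  # weights, grads, master weights, Adam m, Adam v

def model_state_gib_per_gpu(num_params: float, shard_factor: int) -> float:
    """Estimate model-state memory per GPU in GiB for a given sharding factor."""
    total_bytes = num_params * BYTES_PER_PARAM
    return total_bytes / shard_factor / 2**30

# Example: 70B parameters sharded across a hypothetical factor of 32.
print(f"{model_state_gib_per_gpu(70e9, 32):.1f} GiB per GPU (model states only)")
```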