Scaling laws are used to compare foundation models and to predict their properties and transfer performance at larger scales.
Full scaling laws are derived for the language-vision learning procedures CLIP and MaMMUT, showing that MaMMUT improves more strongly with scale and is more sample-efficient than standard CLIP.
The comparison covers downstream tasks such as classification, retrieval, and segmentation across different open datasets and shows consistent trends.
Deriving scaling laws with a constant learning rate schedule reduces compute cost and enables accurate comparison across wide scale spans (see the fitting sketch below), guiding improvements to open foundation models and datasets.
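To illustrate how such a comparison can work in practice, the minimal sketch below fits a saturating power law of the form error(C) = E + a·C^b to hypothetical (compute, downstream error) measurements of one model family and extrapolates it to a larger scale. The functional form, the data points, and all names in the code are illustrative assumptions, not values or code from the paper.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, a, b, e):
    """Saturating power law: downstream error as a function of pretraining compute."""
    return e + a * np.power(compute, b)

# Hypothetical measurements: pretraining compute (normalized to the smallest run)
# and downstream zero-shot error for one model family (e.g. CLIP or MaMMUT).
compute = np.array([1.0, 1e1, 1e2, 1e3, 1e4])   # relative compute
error = np.array([0.62, 0.48, 0.39, 0.33, 0.29])

# Fit error(C) = e + a * C^b; the bounds keep the exponent negative (error falls
# with scale) and the irreducible-error term e non-negative.
params, _ = curve_fit(
    scaling_law, compute, error,
    p0=[0.5, -0.1, 0.2],
    bounds=([0.0, -1.0, 0.0], [np.inf, 0.0, 1.0]),
)
a, b, e = params
print(f"fitted law: error(C) = {e:.3f} + {a:.3f} * C^{b:.3f}")

# Extrapolating the fitted laws of two model families to larger compute allows
# ranking them at scales that were never trained directly.
print("predicted error at 100x the largest measured compute:",
      scaling_law(1e6, *params))
```

Repeating the fit per model family and dataset, and comparing the fitted exponents and extrapolated errors, is one simple way to express the kind of scale-dependent comparison described above.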