MMFactory is a universal solution search engine for vision-language tasks. It aims to provide a diverse pool of programmatic solutions by instantiating and combining visio-lingual tools from its model repository. MMFactory takes user-specified constraints into account and proposes solutions tailored to each user's design constraints. Experimental results show that MMFactory outperforms existing methods in delivering tailored solutions.
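
To make the idea of searching over tool combinations under user constraints concrete, here is a minimal sketch. It is not MMFactory's implementation: the `Tool` class, the five tools in `REPOSITORY`, their input/output types, and the `cost` values are all hypothetical, chosen only to illustrate enumerating candidate pipelines from a repository and then filtering them against a user budget.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Tool:
    """Hypothetical tool descriptor (not MMFactory's actual schema)."""
    name: str
    consumes: str   # input type the tool accepts
    produces: str   # output type the tool emits
    cost: float     # made-up relative compute cost


# Hypothetical repository; names and costs are illustrative assumptions.
REPOSITORY = [
    Tool("detector", "image", "boxes", 2.0),
    Tool("captioner", "image", "text", 3.0),
    Tool("box_describer", "boxes", "text", 1.0),
    Tool("vlm", "image", "answer", 8.0),
    Tool("text_reasoner", "text", "answer", 1.5),
]


def total_cost(chain):
    """Sum the hypothetical costs of every tool in a pipeline."""
    return sum(tool.cost for tool in chain)


def enumerate_solutions(repo, source, target, max_len=3):
    """Brute-force every tool chain of up to max_len tools that maps source to target."""
    solutions = []
    for n in range(1, max_len + 1):
        for chain in permutations(repo, n):
            if chain[0].consumes != source or chain[-1].produces != target:
                continue
            # Each tool's output type must match the next tool's input type.
            if all(a.produces == b.consumes for a, b in zip(chain, chain[1:])):
                solutions.append(chain)
    return solutions


def filter_by_budget(solutions, max_cost):
    """Keep only chains within the user's cost budget, cheapest first."""
    feasible = [s for s in solutions if total_cost(s) <= max_cost]
    return sorted(feasible, key=total_cost)


if __name__ == "__main__":
    pool = enumerate_solutions(REPOSITORY, "image", "answer")
    for chain in filter_by_budget(pool, max_cost=5.0):
        print(" -> ".join(t.name for t in chain), total_cost(chain))
```

In this toy setup the pool holds several distinct pipelines for the same task, and the budget decides which of them are offered to the user. A real system would replace the brute-force enumeration and the single scalar cost with whatever search procedure and resource measurements it actually uses.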