Current structural pruning methods are limited in two respects: they struggle with aggressive parameter reduction, and they optimize poorly for latency. Multi-Dimensional Pruning (MDP) addresses both limitations by optimizing across multiple pruning granularities and by using a more accurate latency model. MDP formulates pruning as a Mixed-Integer Nonlinear Program (MINLP), which lets it find an optimal balance between latency and accuracy. Experimental results show that MDP outperforms previous methods, delivering both faster inference and higher accuracy.
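
To make the MINLP framing concrete, the sketch below sets up a toy latency-constrained pruning problem and solves it by exhaustive search. It is an illustration under stated assumptions, not MDP's actual formulation or solver: the importance scores, the `INPUT_CHANNELS` constant, the `latency` model (cost proportional to input channels times output channels), and the function names are all hypothetical. The decision variables are integer channel counts per layer, and the product term in the latency model is what makes the constraint nonlinear.

```python
from itertools import product

# Hypothetical per-channel importance scores for three layers (made-up numbers).
importance = [
    [0.9, 0.7, 0.4, 0.1],
    [0.8, 0.6, 0.5, 0.3],
    [0.95, 0.2, 0.15, 0.05],
]
INPUT_CHANNELS = 3  # assumed number of channels feeding the first layer


def kept_importance(layer_scores, k):
    """Total importance retained when keeping the k highest-scoring channels."""
    return sum(sorted(layer_scores, reverse=True)[:k])


def latency(channel_counts):
    """Toy latency model: each layer costs in_channels * out_channels.

    The product couples neighbouring layers, so the constraint is
    nonlinear in the integer decision variables.
    """
    total = 0
    prev = INPUT_CHANNELS
    for c in channel_counts:
        total += prev * c
        prev = c
    return total


def solve(budget):
    """Exhaustively search integer channel counts under a latency budget.

    Returns (best_counts, best_score), or (None, -inf) if nothing fits.
    """
    best_score, best_counts = float("-inf"), None
    ranges = [range(1, len(scores) + 1) for scores in importance]
    for counts in product(*ranges):
        if latency(counts) > budget:
            continue  # violates the latency constraint
        score = sum(kept_importance(s, k) for s, k in zip(importance, counts))
        if score > best_score:
            best_score, best_counts = score, counts
    return best_counts, best_score


if __name__ == "__main__":
    counts, score = solve(budget=20)
    print("channels kept per layer:", counts, "retained importance:", score)
```

Exhaustive search is only feasible here because the toy problem has 64 candidate configurations; a realistic network would need a dedicated MINLP solver or a structured relaxation.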