Cascade Reward Sampling (CARDS) is introduced to address efficiency bottlenecks in decoding-time alignment of large language models (LLMs). CARDS uses a segment-level rejection sampling algorithm to minimize redundant computation in both the LLM and the reward model (RM). An uncertainty-based segmentation mechanism ensures that the RM evaluates incomplete segments accurately. Experimental results demonstrate that CARDS significantly improves decoding efficiency, alignment quality, and general utility.
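To make the segment-level rejection sampling idea concrete, the following is a minimal, self-contained sketch. The stand-in `sample_token` and `reward` functions, the uncertainty threshold, and the reward-acceptance rule are all illustrative assumptions, not the paper's actual models or hyperparameters: a segment ends when per-token uncertainty spikes, and the partial text is kept only if the reward model scores it above a threshold, otherwise the segment is resampled.

```python
import random

random.seed(0)


def sample_token(prefix):
    # Toy stand-in for one LLM decoding step (hypothetical):
    # returns a token and a per-token uncertainty score.
    token = random.choice(["good", "bad", "ok"])
    uncertainty = random.random()
    return token, uncertainty


def reward(text):
    # Toy stand-in for a reward model (hypothetical):
    # here, simply the fraction of "good" tokens.
    toks = text.split()
    return toks.count("good") / max(len(toks), 1)


def cards_decode(prompt, n_segments=3, unc_threshold=0.7,
                 reward_threshold=0.3, max_tries=20, max_seg_len=8):
    """Simplified segment-level rejection sampling sketch.

    A segment ends when token uncertainty exceeds unc_threshold
    (or a length cap is hit); the segment is accepted only if the
    reward of the partial text clears reward_threshold, otherwise
    it is discarded and resampled.
    """
    text = prompt
    for _ in range(n_segments):
        for _ in range(max_tries):
            segment = []
            while True:
                token, unc = sample_token(text + " " + " ".join(segment))
                segment.append(token)
                if unc > unc_threshold or len(segment) >= max_seg_len:
                    break  # uncertainty spike (or cap) ends the segment
            candidate = (text + " " + " ".join(segment)).strip()
            if reward(candidate) >= reward_threshold:
                text = candidate  # accept this segment and continue
                break
            # otherwise reject the segment and resample it
    return text


print(cards_decode("prompt:"))
```

Because rejection happens per segment rather than per full response, a low-reward continuation is discarded after only a few tokens, which is the source of the efficiency gain the abstract describes.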