Abstract: Mountainous terrain shadows in remote sensing satellite imagery typically exhibit irregular morphologies and complex boundaries, which make accurate segmentation with conventional methods difficult. To address this issue, an improved VGG16-UNet semantic segmentation model that integrated deformable convolution and a coordinate attention mechanism was proposed, aiming to improve the recognition and localization of shadow regions. The model employed VGG16 as the backbone encoder, where deformable convolution was introduced to dynamically adjust sampling locations, thereby capturing features within irregular shadow neighborhoods more effectively. In addition, a coordinate attention mechanism was embedded to strengthen the joint representation of spatial positional information and channel-wise features, improving detail recovery and structural consistency. Validation experiments conducted on a self-built dataset and domestic GaoFen-7 satellite imagery showed that the proposed method achieved a mean intersection over union (mIoU), mean recall (mRecall), and overall accuracy (OA) of 94.77%, 97.28%, and 97.52%, respectively. These results represented improvements of 0.62, 0.41, and 0.30 percentage points, respectively, over the baseline VGG16-UNet model. Furthermore, tests across diverse mountainous scenarios indicated that the method detected shadows stably and reliably and generalized well. This work can provide a reliable technical pathway for the automated, high-precision extraction of terrain shadows.
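The three reported metrics can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions of mIoU, mRecall, and OA and a two-class labeling (0 = non-shadow, 1 = shadow); the abstract does not state the exact formulas used, and the function names `confusion_matrix` and `segmentation_metrics` are hypothetical, chosen only for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are reference classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def _mean(values):
    """Arithmetic mean; returns 0.0 for an empty list to avoid division by zero."""
    return sum(values) / len(values) if values else 0.0


def segmentation_metrics(cm):
    """Return mIoU, mRecall, and OA from a square confusion matrix.

    Assumed (standard) definitions, per class i:
      IoU_i    = TP / (TP + FP + FN)
      Recall_i = TP / (TP + FN)
    OA is the fraction of all pixels that were classified correctly.
    Classes with an empty denominator are skipped when averaging.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))

    ious, recalls = [], []
    for i in range(n):
        tp = cm[i][i]
        fn = sum(cm[i]) - tp                        # reference i, predicted other
        fp = sum(cm[r][i] for r in range(n)) - tp   # predicted i, reference other
        if tp + fp + fn > 0:
            ious.append(tp / (tp + fp + fn))
        if tp + fn > 0:
            recalls.append(tp / (tp + fn))

    return {
        "mIoU": _mean(ious),
        "mRecall": _mean(recalls),
        "OA": correct / total if total else 0.0,
    }


# Toy example: flattened label maps, 0 = non-shadow, 1 = shadow (assumed labels).
reference = [0, 0, 0, 0, 1, 1, 1, 1]
predicted = [0, 0, 0, 1, 1, 1, 1, 0]
metrics = segmentation_metrics(confusion_matrix(reference, predicted, 2))
print(metrics)
```

In this toy case each class has three correctly labeled pixels, one false positive, and one false negative, so the per-class IoU is 3/5 and the per-class recall is 3/4. Because mIoU penalizes both false positives and false negatives, it is typically lower than OA and mRecall on the same predictions, which matches the ordering of the scores reported above.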