Crowdsourced social media data has become popular for assessing cultural ecosystem services (CES). Nevertheless, analyses of social media data in the context of CES can be time-consuming and costly, particularly when they rely on the manual classification of images or texts shared by people. The potential of deep learning for automating the analysis of crowdsourced social media content is still being explored in CES research. Here, we use freely available deep learning models, namely convolutional neural networks (CNNs), to automate the classification of natural and human elements relevant to CES (e.g., species and human-made structures) in images from Flickr and Wikiloc. Our approach is developed for Peneda-Gerês (Portugal) and then applied to Sierra Nevada (Spain). For Peneda-Gerês, image classification showed promising results (F1-score ca. 80%), highlighting a preference for aesthetic appreciation among social media users. In Sierra Nevada, model performance decreased but remained satisfactory (F1-score ca. 60%), indicating that users predominantly pursued cultural heritage and spiritual enrichment. Our study shows the great potential of deep learning to assist in the automated classification of human-nature interactions and elements from social media content and, by extension, to support researchers and stakeholders in decoding CES distributions, benefits, and values.