This dataset contains semantic segmentation delineations of street-level images, focusing on agricultural and natural landscapes. It defines 35 distinct classes, including labels such as "field margin," "crop," "cropfield," and "ditch," and draws on the Land Use/Cover Area Frame Survey (LUCAS) geospatial dataset. LUCAS images are collected using a consistent sampling framework, offering a representative view of different regions and environments of Europe.
Comprising a total of 1784 north-looking images from 2018, this dataset contributes to land cover analysis by providing fine-grained annotations for a variety of landscape elements, and serves as a resource for training and evaluating semantic segmentation models.
The dataset's potential applications span a range of domains, from land use mapping and environmental monitoring to urban planning and agricultural management. By fostering the advancement of machine learning models in accurately segmenting landscapes, this dataset contributes to sustainable land management practices and supports informed decision-making processes.
We provide two data products derived from the same raw data, spread across a total of three folders in this repository.
The raw data is the original data. It is not easily usable in a machine learning context, but it is kept as a reference. It is organised into batches, one per segmentation campaign, with each batch containing three main folders:
- images: Contains the LUCAS north-looking images captured for each theoretical point.
- full_masks: Contains pixel-level annotated masks corresponding to each image, where each pixel is labelled with a class.
- partial_masks (only for the first batch): Contains partial masks where some areas of the images are not delineated.

The root of the raw_data folder contains a classes_dataset.csv file with the code and label correspondence.
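The batch layout described above can be traversed with a few lines of standard-library Python. The following is a minimal sketch, not the authors' loader: it assumes that a mask shares its file stem with the image it annotates, and that classes_dataset.csv has columns named `code` and `label`. Both are assumptions to check against the actual files, and the function names are hypothetical.

```python
import csv
from pathlib import Path


def load_class_map(csv_path, code_col="code", label_col="label"):
    """Return {class code: label} from the class correspondence CSV.

    The column names are assumptions; adjust them to the real header.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {int(row[code_col]): row[label_col] for row in csv.DictReader(handle)}


def pair_images_and_masks(batch_dir, mask_folder="full_masks"):
    """Pair each image with the mask sharing its file stem (assumed convention)."""
    batch_dir = Path(batch_dir)
    masks = {p.stem: p for p in (batch_dir / mask_folder).iterdir() if p.is_file()}
    pairs = []
    for image in sorted((batch_dir / "images").iterdir()):
        mask = masks.get(image.stem)
        if image.is_file() and mask is not None:
            # Images without a matching mask in this folder are skipped.
            pairs.append((image, mask))
    return pairs
```

Passing `mask_folder="partial_masks"` would list the partially delineated pairs of the first batch under the same naming assumption.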
The batch data is consolidated, and the original labelled data is enhanced with geolocation information and ancillary data derived from the Harmonised LUCAS in-situ land cover and land use database. This metadata can provide the necessary context for machine learning exercises or exploratory analysis.
The data is structured in two folders:
The root of the folder contains a metadata file called lucas_ml_data.csv with the ancillary data, as well as the classes_dataset.csv file containing the code and label correspondence.
Should the dataset grow, the data can be used dynamically, without downloading all of it, through a SpatioTemporal Asset Catalog (STAC) implementation. The STAC format allows for easy spatio-temporal subsetting, and the data can be browsed visually using the STAC browser.
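To illustrate what spatio-temporal subsetting means in practice, here is a minimal sketch that filters STAC items already loaded as plain dictionaries. It relies only on two fields defined by the STAC item specification, `bbox` (west, south, east, north) and `properties.datetime`, and assumes every item has a non-null `datetime`. The function names are hypothetical, and a real workflow would more likely use a STAC client library against the published catalogue.

```python
from datetime import datetime, timezone


def parse_datetime(value):
    """Parse an RFC 3339 timestamp such as '2018-06-01T00:00:00Z'."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    # Treat timestamps without an offset as UTC so comparisons stay valid.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def bboxes_intersect(a, b):
    """True if two [west, south, east, north] boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def subset_items(items, bbox, start, end):
    """Keep items whose bbox overlaps `bbox` and whose datetime is in [start, end]."""
    selected = []
    for item in items:
        when = parse_datetime(item["properties"]["datetime"])
        if bboxes_intersect(item["bbox"], bbox) and start <= when <= end:
            selected.append(item)
    return selected
```

Only the items that pass both filters would then need their image and mask assets downloaded.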
This dataset can be used for various semantic segmentation tasks, including land cover analysis, environmental monitoring, and urban planning. The unique identifiers in the image names enable geospatial analysis through correspondence with the harmonised LUCAS database. The provided ML dataset already includes this link.
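For the raw data, the correspondence between image names and database records could be rebuilt along the following lines. This is a sketch under two loudly stated assumptions: that each image name begins with the numeric LUCAS point identifier, and that the metadata CSV exposes that identifier in a column named `point_id`. Neither is confirmed here, and the helper names are hypothetical.

```python
import csv
import re
from pathlib import Path


def point_id_from_filename(filename):
    """Return the leading run of digits in the file stem, or None.

    Assumes the image name starts with the LUCAS point identifier.
    """
    match = re.match(r"\d+", Path(filename).stem)
    return match.group(0) if match else None


def load_metadata(csv_path, id_col="point_id"):
    """Index metadata rows by identifier (the column name is an assumption)."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        return {row[id_col]: row for row in csv.DictReader(handle)}


def attach_metadata(image_names, metadata):
    """Map each image name to its metadata row, or None when no row matches."""
    return {name: metadata.get(point_id_from_filename(name)) for name in image_names}
```

Images that map to `None` are worth inspecting first, since they indicate either a different naming pattern or a point missing from the metadata file.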
If you use this dataset in your work, please consider citing the following paper:
Andrimont, Raphaël d’, Momchil Yordanov, Laura Martinez-Sanchez, Beatrice Eiselt, Alessandra Palmieri, Paolo Dominici, Javier Gallego, et al. “Harmonised LUCAS In-Situ Land Cover and Use Database for Field Surveys from 2006 to 2018 in the European Union.” Scientific Data 7, no. 1 (December 2020): 352. [https://doi.org/10.1038/s41597-020-00675-z](https://doi.org/10.1038/s41597-020-00675-z)
The LUCAS Semantic Segmentation Dataset is provided under the CDLA-Permissive-1.0 license.