Article CC BY 4.0
refereed
published

A deep-learning based pipeline for estimating the abundance and size of aquatic organisms in an unconstrained underwater environment from continuously captured stereo video

Böer, Gordon (ORCID: 0000-0002-6196-9558); Institute of Applied Computer Science, Kiel University of Applied Sciences, Kiel, Germany
Gröger, Joachim Paul (GND: 143445871); GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
Badri-Höher, Sabah; Institute of Applied Computer Science, Kiel University of Applied Sciences, Kiel, Germany
Cisewski, Boris (GND: 1018955925, ORCID: 0000-0002-1130-6107); Thünen Institute of Sea Fisheries, Bremerhaven, Germany
Renkewitz, Helge; Fraunhofer IOSB, IOSB-AST Ilmenau, Fraunhofer Institute of Optronics, System Technologies and Image Exploitation, Ilmenau, Germany
Mittermayer, Felix (GND: 1175145491); GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
Strickmann, Tobias; GEOMAR Helmholtz Centre for Ocean Research Kiel, Kiel, Germany
Schramm, Hauke (GND: 13163576X); Institute of Applied Computer Science, Kiel University of Applied Sciences, Kiel, Germany

Stationary underwater cameras offer a continuous, cost-effective, long-term way to monitor underwater habitats of particular interest. A common goal of such monitoring systems is to gain better insight into the dynamics and condition of populations of various marine organisms, such as migratory or commercially relevant fish taxa. This paper describes a complete processing pipeline that automatically determines the abundance and type of biological taxa, and estimates their size, from stereoscopic video data captured by the stereo camera of a stationary Underwater Fish Observatory (UFO). The recording system was calibrated in situ, and the calibration was afterward validated using synchronously recorded sonar data. The video data were recorded continuously for nearly one year in the Kiel Fjord, an inlet of the Baltic Sea in northern Germany. They show underwater organisms exhibiting natural behavior, because passive low-light cameras were used instead of active lighting to dampen attraction effects and keep the recording as non-invasive as possible. The recorded raw data are pre-filtered by an adaptive background estimation to extract sequences with activity, which are then processed by a deep detection network, namely YOLOv5. The network provides the location and type of the organisms detected in each video frame of both cameras, and these detections are used to calculate stereo correspondences following a basic matching scheme. In a subsequent step, the size and distance of the depicted organisms are approximated using the corner coordinates of the matched bounding boxes. The YOLOv5 model employed in this study was trained on a novel dataset comprising 73,144 images and 92,899 bounding box annotations for 10 categories of marine animals. The model achieved a mean detection accuracy of 92.4%, a mean average precision (mAP) of 94.8% and an F1 score of 93%.
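
The abstract does not spell out the matching scheme or the size computation, so the following is only a minimal sketch of how matched bounding boxes could yield distance and size. It assumes a rectified stereo pair and a simple pinhole model; the names `Box`, `match_boxes`, `estimate_distance_and_length`, `focal_px`, `baseline_m` and `max_dy` are hypothetical and not taken from the paper. Underwater refraction and the orientation of the animal relative to the image plane are ignored here.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned detection box in pixel coordinates of a rectified image."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str


def match_boxes(left, right, max_dy=10.0):
    """Greedily pair left/right boxes with equal label and similar vertical centre.

    Illustrative stand-in for a basic matching scheme (an assumption,
    not the authors' published method).
    """
    pairs, used = [], set()
    for lb in left:
        l_cy = 0.5 * (lb.y_min + lb.y_max)
        best, best_dy = None, max_dy
        for i, rb in enumerate(right):
            if i in used or rb.label != lb.label:
                continue
            dy = abs(l_cy - 0.5 * (rb.y_min + rb.y_max))
            if dy <= best_dy:
                best, best_dy = i, dy
        if best is not None:
            used.add(best)
            pairs.append((lb, right[best]))
    return pairs


def estimate_distance_and_length(lb, rb, focal_px, baseline_m):
    """Return (distance_m, length_m) for a matched pair, or None if disparity <= 0."""
    # Horizontal disparity, averaged over the left and right box edges.
    disparity = 0.5 * ((lb.x_min - rb.x_min) + (lb.x_max - rb.x_max))
    if disparity <= 0:
        return None
    # Pinhole stereo relation: depth = focal length * baseline / disparity.
    distance = focal_px * baseline_m / disparity
    # Back-project the mean box width to metric units at that depth.
    width_px = 0.5 * ((lb.x_max - lb.x_min) + (rb.x_max - rb.x_min))
    length = distance * width_px / focal_px
    return distance, length


# Usage with made-up values: 1000 px focal length, 0.2 m baseline.
left_boxes = [Box(500, 300, 700, 380, "fish")]
right_boxes = [Box(400, 302, 600, 382, "fish")]
for lb, rb in match_boxes(left_boxes, right_boxes):
    print(estimate_distance_and_length(lb, rb, focal_px=1000.0, baseline_m=0.2))
```

The length estimate treats the box width as the body length, which only holds when the animal swims roughly parallel to the image plane; a production system would need calibration parameters from the in situ calibration rather than the placeholder values used here.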
