New perspectives for the Annelida collection (National Museum/UFRJ) database: using data visualization to analyze and manage biological collections
DOI:
https://doi.org/10.1590/
Keywords:
Polychaetes, Biological collection, Management, Interactive visual representations
Abstract
Collection management faces many challenges in keeping stored items preserved and the information associated
with them accurate and organized. For a biodiversity repository to expand and be used, it is essential that the
database is unambiguous and that errors are quickly identified and corrected. This work aims to show the use of
interactive visual representations (IVRs) of the collection’s metadata as tools to inspect the data and help solve
these challenges. To do this, we used the Annelida collection database from the National Museum (MN) of the
Federal University of Rio de Janeiro (UFRJ). Interactive graphs of the metadata within this database (catalog
date, taxonomic identification and determiners, sampling, depth, geographic localization, and collector data) were
created with the Altair library in the Python 3 language. Data analyses using these graphs made it possible to
identify anomalous patterns in the data and fill in missing records. They also revealed the
spatial and bathymetric distribution of the specimens deposited over time and the growth rate of the collection in
each family, which supports projecting future growth and planning the physical organization of vials. Graphs are an ally
in the management of collections with digital entry forms and help make the metadata associated
with cataloged specimens more readily available. IVRs can also be used to give credit to the researchers involved in building
biological collections. Thus, visualization tools are efficient at recognizing global patterns in databases and
supporting biological collection management tasks.
Messias et al. Ocean and Coastal Research 2024, v72(suppl 1):e24016