Selenium (Se) is an essential trace element for the growth, development, and normal metabolic activities of a wide range of organisms. It is incorporated into selenoproteins primarily in the form of the 21st amino acid, selenocysteine (Sec). The biosynthesis of Sec and its insertion into selenoproteins involve a specialized molecular mechanism that recodes in-frame UGA codons (normally stop signals) as Sec codons. To date, more than 100 selenoprotein families have been identified in bacteria, nearly three times the number found in eukaryotes. However, whereas several selenoprotein databases have been established for eukaryotes, there is currently no systematic database dedicated to bacterial selenoproteins. Such a resource is urgently needed to advance our understanding of the diversity, functions, and evolutionary trajectories of bacterial selenoproteins.
To fill this gap, we have constructed BSepDB (Bacterial Selenoprotein DataBase), a specialized database dedicated to systematically cataloging bacterial selenoprotein genes and proteins. BSepDB has a user-friendly web interface, making it accessible to all users. This database not only offers comprehensive information for researchers in the field of Se but also serves as a reliable foundation for the annotation of bacterial selenoproteomes in genomic sequencing projects.
BSepDB is an ongoing research project with long-term objectives. Here, we present an initial yet comprehensive version of this resource. In the future, we plan to regularly update our database. This will involve incorporating newly discovered selenoprotein families and newly sequenced bacterial species, as well as integrating additional tools to enhance the functionality and practical utility of the database.
At present, BSepDB contains 57,922 selenoprotein entries. These entries span 112 previously documented selenoprotein families/subfamilies and cover 16,649 bacterial organisms across 88 taxonomic clades (phylum, class, and/or order levels).
The "Browse by Species" page presents the evolutionary relationships among species stored in BSepDB through a phylogenetic tree, illustrating taxonomic branches and the number of species within each branch. Clicking any branch navigates to a detailed table showing species-specific selenoprotein entry information. Each entry is organized in a dynamic table format that includes the following key fields:
- Taxa: taxonomic classification;
- Species name: name of species;
- Strain: strain identifier;
- Accession number: GenBank accession number;
- Selenoprotein name: name of selenoprotein family;
- Domain: protein domain;
- Protein sequence: amino acid sequence;
- Nucleotide sequence: nucleic acid sequence;
- SECIS element: predicted SECIS element.
Most of the columns in the table support sorting, allowing for easy data organization. Users can click "Show sequence" in the "Protein sequence" and "Nucleotide sequence" columns to view the relevant sequences. The Sec residue (symbolized by U) and Sec-UGA codon are highlighted in red for easy identification. Also, when users click "Show SECIS" in the "SECIS element" column, they can view the SECIS structure predicted by the bSECISearch program, displayed in dot-bracket notation.
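The two display conventions described above (Sec shown as U and encoded by an in-frame UGA, and SECIS structures written in dot-bracket notation) can be illustrated with a minimal Python sketch. It is not BSepDB's own code: the function names are hypothetical, the sequences are toy examples rather than database entries, and the sketch assumes that the nucleotide sequence is the in-frame coding sequence and that the structure string uses only `(`, `)` and `.`.

```python
def sec_codon_positions(protein: str, cds: str) -> list[tuple[int, int]]:
    """Return (residue_index, codon_start) for each Sec (U) residue.

    Assumption: `cds` is the in-frame coding sequence, so residue i is
    encoded by cds[3*i : 3*i + 3]. Both indices are 0-based.
    """
    cds = cds.upper().replace("U", "T")  # accept RNA or DNA alphabets
    hits = []
    for i, residue in enumerate(protein.upper()):
        if residue != "U":
            continue
        codon = cds[3 * i : 3 * i + 3]
        if codon != "TGA":
            raise ValueError(f"residue {i} is Sec but its codon is {codon!r}")
        hits.append((i, 3 * i))
    return hits


def dot_bracket_pairs(structure: str) -> list[tuple[int, int]]:
    """Return sorted 0-based (open, close) base-pair indices."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# Toy inputs, not real BSepDB records.
print(sec_codon_positions("MUK", "ATGTGAAAA"))
print(dot_bracket_pairs("((..))"))
```

The first function returns the positions a viewer would need in order to highlight U in the protein and the matching UGA in the nucleotide sequence. The second recovers which positions of a predicted SECIS element pair with each other.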
The "Browse by Selenoprotein Family" page provides a list of all selenoprotein families available in BSepDB. It displays family names alongside their entry counts in an interactive, sortable table. A quick search box helps users locate specific selenoprotein families of interest. Clicking any family opens a detailed entry table with the same fields as those shown on the "Browse by Species" page. Because this view is restricted to a single selenoprotein family, the "Selenoprotein name" and "Domain" fields are identical across all entries in the table and correspond to the family chosen by the user.
The "Search" page offers users two distinct approaches to query the database.
Option 1: Species-Selenoprotein family search
This option follows a step-by-step process.
Step 1. Select species:
Select one or more species by navigating the taxonomic hierarchy in the order of phylum, class, order, and species (selecting a given rank includes all species under that rank);
Step 2. Select protein family:
Select one or more selenoprotein families from a dropdown menu (if no family is selected, the search defaults to all selenoprotein families).
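The selection logic of the two steps above can be sketched as follows. This is an illustration only, not the server's implementation: the `Entry` record, the `select_entries` function and all taxon and family names are hypothetical placeholders. It encodes the two rules stated in the steps: selecting a rank includes every species under it, and an empty family selection means all families.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified record; real entries carry more fields."""
    phylum: str
    cls: str  # taxonomic class ("class" is a Python keyword)
    order: str
    species: str
    family: str


RANKS = ("phylum", "cls", "order", "species")


def select_entries(entries, taxa, families=None):
    """Filter entries by taxonomic selection and selenoprotein family.

    taxa: iterable of (rank, name) pairs; an entry matches if ANY pair
          matches, so choosing a phylum pulls in all species under it.
    families: iterable of family names; None or empty means all families.
    """
    taxa = set(taxa)
    families = set(families or ())
    selected = []
    for entry in entries:
        in_taxa = any((rank, getattr(entry, rank)) in taxa for rank in RANKS)
        in_family = not families or entry.family in families
        if in_taxa and in_family:
            selected.append(entry)
    return selected


# Placeholder data, not taken from BSepDB.
entries = [
    Entry("Phylum A", "Class A1", "Order A1a", "Species one", "FamilyX"),
    Entry("Phylum A", "Class A1", "Order A1b", "Species two", "FamilyY"),
    Entry("Phylum B", "Class B1", "Order B1a", "Species three", "FamilyX"),
]
hits = select_entries(entries, [("phylum", "Phylum A")], ["FamilyX"])
print([e.species for e in hits])
```

In this sketch an empty taxonomic selection returns nothing. Whether the web form behaves the same way when no species is chosen is not specified here.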
Option 2: Keyword search
In this option, users can search the database by entering keywords, with the option to restrict the search to specific fields (e.g., species name or selenoprotein name).
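A field-restricted keyword search of this kind can be sketched in a few lines of Python. The matching rule used here (case-insensitive substring matching) is an assumption, since the exact matching behaviour of the web form is not described. The function name, field names and rows are hypothetical placeholders.

```python
def keyword_search(rows, keyword, fields=None):
    """Return rows in which `keyword` occurs in any of the chosen fields.

    rows:   list of dicts, one per entry
    fields: field names to search, or None to search every field
    Matching is case-insensitive substring matching (an assumption).
    """
    needle = keyword.casefold()
    matches = []
    for row in rows:
        names = fields if fields is not None else row.keys()
        if any(needle in str(row.get(name, "")).casefold() for name in names):
            matches.append(row)
    return matches


# Placeholder rows, not taken from BSepDB.
rows = [
    {"species": "Species one", "selenoprotein": "FamilyX"},
    {"species": "Species two", "selenoprotein": "FamilyY"},
]
print(keyword_search(rows, "familyx", fields=["selenoprotein"]))
print(keyword_search(rows, "two"))
```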
Results from both search methods are presented in the same tabular format as the detailed view on the "Browse by Species" page.
The "BLAST" page provides users with NCBI BLAST tools to search against the database.
To run a BLAST search, the user enters a query protein or nucleic acid sequence and selects a BLAST program (BLASTN, BLASTP, BLASTX, TBLASTN, or TBLASTX). The system automatically chooses the corresponding protein or nucleic acid database based on the selected program. Users can also modify other default settings, such as the E-value, the scoring matrix, and the number of target sequences displayed in the output.
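The relationship between the BLAST program and the database type follows from the standard definitions of the five programs, and can be written as a small lookup table. The sketch below is illustrative only: the names `BLAST_PROGRAMS`, `query_type` and `database_type` are hypothetical, and how the BSepDB server implements this choice internally is not described here.

```python
# program -> (expected query type, database type searched)
# BLASTX translates the nucleotide query; TBLASTN translates the nucleotide
# database; TBLASTX translates both.
BLAST_PROGRAMS = {
    "blastn": ("nucleotide", "nucleotide"),
    "blastp": ("protein", "protein"),
    "blastx": ("nucleotide", "protein"),
    "tblastn": ("protein", "nucleotide"),
    "tblastx": ("nucleotide", "nucleotide"),
}


def _lookup(program: str) -> tuple[str, str]:
    try:
        return BLAST_PROGRAMS[program.lower()]
    except KeyError:
        raise ValueError(f"unknown BLAST program: {program!r}") from None


def query_type(program: str) -> str:
    """Sequence type the user is expected to submit for this program."""
    return _lookup(program)[0]


def database_type(program: str) -> str:
    """Database type (protein or nucleotide) searched by this program."""
    return _lookup(program)[1]


for name in BLAST_PROGRAMS:
    print(f"{name}: {query_type(name)} query vs {database_type(name)} database")
```

For example, a protein query run with TBLASTN is compared against the nucleotide database, whereas the same query run with BLASTP is compared against the protein database.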
On the result page, each homolog of the query sequence is presented with comprehensive information, including description, species name, max score, query coverage, E-value, percent identity, accession length, and accession number. Detailed information about the alignment can be viewed by clicking on a specific homolog.
The "Statistics" page gives users an up-to-date overview of the database contents. It shows the distribution of species and selenoproteins within the current database, such as the top 20 species with the highest number of selenoprotein entries, the top 10 selenoprotein families, and the top 10 bacterial phyla. The information is presented graphically (for example, as pie charts and horizontal bar charts). Whenever new organisms and selenoproteins are incorporated into BSepDB, the page automatically updates the relevant information.
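Rankings of this kind reduce to counting entries per category and keeping the largest counts. The sketch below shows the idea with Python's `collections.Counter`. The `top_n` helper and the rows are hypothetical placeholders, and the sketch says nothing about how the page itself computes or caches its figures.

```python
from collections import Counter


def top_n(rows, field, n):
    """Return the n most frequent values of `field` as (value, count) pairs."""
    return Counter(row[field] for row in rows).most_common(n)


# Placeholder rows, not taken from BSepDB.
rows = [
    {"species": "Species one", "family": "FamilyX", "phylum": "Phylum A"},
    {"species": "Species one", "family": "FamilyY", "phylum": "Phylum A"},
    {"species": "Species one", "family": "FamilyX", "phylum": "Phylum A"},
    {"species": "Species two", "family": "FamilyX", "phylum": "Phylum B"},
]
print(top_n(rows, "species", 20))  # species ranked by entry count
print(top_n(rows, "family", 10))   # families ranked by entry count
```

Recomputing these counts whenever entries are added is one simple way to keep such a summary current.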
Users have the option to download the entire selenoprotein gene or protein database from this page.
Download Options:
- Protein Sequence (FASTA): Download the complete protein database in FASTA format (available as a zip archive or as an uncompressed file)
- Nucleotide Sequence (FASTA): Download the complete nucleotide database in FASTA format (available as a zip archive or as an uncompressed file)
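Once downloaded, either file type can be read with the Python standard library alone. The following is a minimal sketch under stated assumptions: the file name in the usage comment is a hypothetical placeholder (the actual download names are not given here), the zip archive is assumed to contain only FASTA text, and the header line is returned as-is because its internal layout is not specified.

```python
import io
import zipfile


def parse_fasta(handle):
    """Yield (header, sequence) pairs from an open text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fasta(path):
    """Read FASTA records from a plain file or from every file in a zip."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            for member in archive.namelist():
                if member.endswith("/"):  # skip directory entries
                    continue
                with archive.open(member) as raw:
                    text = io.TextIOWrapper(raw, encoding="utf-8")
                    yield from parse_fasta(text)
    else:
        with open(path, encoding="utf-8") as handle:
            yield from parse_fasta(handle)


# Usage (hypothetical file name):
# for header, sequence in read_fasta("bsepdb_protein.zip"):
#     print(header, len(sequence), sequence.count("U"))
```

In the protein file, counting `U` characters in each sequence gives the number of Sec residues per entry, since Sec is represented by U.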