Comparative genomics of the Microviridae. Part 4
The ΦCA82 genome and ORFs were aligned with selected microvirus sequences using ClustalW [26]. Putative ORFs within the ΦCA82 genome were predicted using the FGENESV Trained Pattern/Markov chain-based viral gene prediction method from the Softberry website. Searches for conserved domains within the ΦCA82 genome were performed with the Conserved Domain Database (CDD) Search Service v2.17 at the National Center for Biotechnology Information (NCBI) website.
Comparative genomics of the Microviridae
The sequence of phage ΦCA82 was compared to 14 other members of the Microviridae (Table 1) obtained from the Integrated Microbial Genomes (IMG) system [29]. To first determine nucleotide-level similarities, tetranucleotide comparisons between genomes were performed with JSpecies [30]. Pairwise genome comparisons were based on regressions of normalized tetranucleotide frequency counts, and the distributions of the R² values from these comparisons were visualized in R. To compare genomes based on the similarity of predicted gene sequences, the program CD-HIT [32] was used.
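As an illustration of the tetranucleotide comparison described above, the following is a minimal sketch and not the JSpecies implementation. It assumes that "normalized" means each tetranucleotide count is divided by the total number of counted windows, and that the regression statistic is the squared Pearson correlation between two 256-element frequency vectors. The function names `tetra_freqs` and `r_squared` are hypothetical.

```python
from itertools import product

# All 256 possible tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freqs(seq):
    """Return normalized tetranucleotide frequencies for one sequence.

    Assumption: overlapping windows on the given strand only; windows
    containing ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    total = 0
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        raise ValueError("sequence has no valid tetranucleotide windows")
    return [counts[k] / total for k in KMERS]


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("R^2 is undefined for a constant vector")
    return (sxy * sxy) / (sxx * syy)
```

Calling `r_squared(tetra_freqs(genome_a), tetra_freqs(genome_b))` for every genome pair would give the set of R² values whose distribution is then plotted.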
Genomic functional comparisons were based on the Pfam categories assigned to each predicted gene by the IMG annotation pipeline. A data table of Pfam categories and gene counts for each genome was used to construct a similarity matrix and dendrogram in R. To determine which predicted genes were unique to ΦCA82 and which were shared with other Microviridae members, the Microviridae pangenome was constructed as the union of all predicted genes from the 14 Microviridae genomes. This pangenome was compared to the predicted genes of ΦCA82 using both CD-HIT and our data analysis pipeline, as described above, and blastp, which was run with default parameters except for an E-value cutoff of 0.01.
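The step from a table of Pfam counts to a similarity matrix and dendrogram can be sketched as follows. The source does not state which similarity measure or linkage method was used in R, so cosine similarity and average linkage are assumptions made here for illustration only. The function names are hypothetical, and the tree is returned as nested tuples rather than drawn.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two count vectors (0.0 if either is empty)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_matrix(table):
    """Build a genome-by-genome similarity matrix.

    table maps genome name -> {pfam_category: gene_count}.
    Returns (sorted genome names, square matrix as a list of lists).
    """
    genomes = sorted(table)
    pfams = sorted({p for counts in table.values() for p in counts})
    vectors = {g: [table[g].get(p, 0) for p in pfams] for g in genomes}
    matrix = [[cosine_similarity(vectors[a], vectors[b]) for b in genomes]
              for a in genomes]
    return genomes, matrix


def average_linkage(genomes, sim):
    """Agglomerative clustering; returns the tree as nested tuples.

    At each step the two clusters with the highest mean pairwise
    similarity are merged (average linkage, an assumption).
    """
    clusters = [(name, [i]) for i, name in enumerate(genomes)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                mi, mj = clusters[i][1], clusters[j][1]
                avg = sum(sim[a][b] for a in mi for b in mj) / (len(mi) * len(mj))
                if best is None or avg > best[0]:
                    best = (avg, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

With a toy table in which two genomes share the same categories and a third shares none, the first two are joined before the third is attached.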
Results and Discussion
The complete circular, single-stranded genome of the uncultured microvirus ΦCA82 was determined to be 5,514 nucleotides long. The genome had a nucleotide composition of A (38.6%), C (19.6%), G (20.1%), and T (21.6%), with an overall G + C content of 39.7%, which is similar to that of the chlamydial phages (37-40%). The ΦCA82 genome was organized in a modular arrangement similar to that of other microviruses and encoded predicted proteins homologous to those of the chlamydial bacteriophages and of the Bdellovibrio bacteriovorus phage ΦMH2K. The coding capacity of the genome is 91%, as it encodes ten ORFs greater than 99 nucleotides in length, similar to other chlamydial microvirus genomes [35]. The genome size, number of ORFs, and total percentage of coding nucleotides, as depicted in Figure 1, are larger than those of most chlamydial phages, and the genome is closer in size to the ΦX174 genome.
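The summary statistics reported above (base composition, G + C content, and coding capacity) can be computed along the following lines. This is a minimal sketch under stated assumptions: the source does not describe how the coding percentage was derived, so here it is taken as the share of genome positions covered by at least one ORF, with ORFs allowed to wrap around the origin of the circular genome. The function names and the coordinate convention (1-based, inclusive) are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_content(seq):
    """G + C content as a percentage."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


def coding_percent(genome_length, orfs):
    """Percentage of positions covered by at least one ORF.

    orfs is a list of (start, end) pairs, 1-based and inclusive.
    A pair with start > end is treated as wrapping around the origin
    of a circular genome. Overlapping ORFs are counted once.
    """
    covered = set()
    for start, end in orfs:
        if start <= end:
            covered.update(range(start, end + 1))
        else:
            covered.update(range(start, genome_length + 1))
            covered.update(range(1, end + 1))
    return 100.0 * len(covered) / genome_length
```

As a consistency check on the reported values, the C and G percentages (19.6% and 20.1%) sum to the stated G + C content of 39.7%.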