Prediction of secretion systems

The “EffectiveS346” software has been implemented for the Effective database to predict the secretion systems encoded in bacterial genomes and to provide a clear “yes/no” prediction of whether each system is sufficiently complete. The method covers Type III, IV and VI secretion systems, all of which can inject proteins directly into the cytosol of eukaryotic host cells.

The input data for EffectiveS346 are genotype files that list the COGs and NOGs (according to EggNOG 4.0) present in a particular genome. For any user-submitted FASTA file, an optional check of genome completeness by CheckM is provided. Based on the prediction accuracy of our models for incomplete genomes, we recommend that a major fraction (default = 85%) of the marker genes be present in order to obtain reliable secretion system predictions. The COGnitor program then predicts the orthologous groups that serve as input for EffectiveS346. Alternatively, users may submit their own lists of the EggNOG 4.0 COGs and NOGs present in a genome. In this case, the time-consuming homology search by COGnitor is omitted.
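
The input handling described above can be illustrated with a minimal Python sketch. It is not the EffectiveS346 implementation: the file layout (one orthologous-group identifier per line, with "#" marking comments) and the function names parse_genotype and is_complete_enough are assumptions made for illustration. Only the 85% default is taken from the text.

```python
from typing import Iterable, Set

# Default fraction of marker genes recommended in the text.
DEFAULT_MIN_COMPLETENESS = 0.85


def parse_genotype(lines: Iterable[str]) -> Set[str]:
    """Collect COG/NOG identifiers from a genotype list.

    Assumed layout (hypothetical): one identifier per line;
    blank lines and lines starting with '#' are skipped.
    """
    groups: Set[str] = set()
    for line in lines:
        token = line.strip()
        if token and not token.startswith("#"):
            groups.add(token)
    return groups


def is_complete_enough(markers_found: int,
                       markers_total: int,
                       threshold: float = DEFAULT_MIN_COMPLETENESS) -> bool:
    """Return True if the fraction of marker genes found meets the threshold."""
    if markers_total <= 0:
        raise ValueError("markers_total must be positive")
    return markers_found / markers_total >= threshold
```

A caller could, for instance, skip or flag the secretion system prediction whenever is_complete_enough returns False for the completeness estimate of a submitted genome.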

With the genotype list as input, three models, one per secretion system, each compute a binary classification of whether the input species contains a functional Type III, IV or VI secretion system. These predictions achieve a mean balanced accuracy of over 90% with a standard deviation below 4.5%. EffectiveS346 additionally lists, for each classification, the 100 most significant COGs, as well as those COGs that are contained in the KEGG maps of the three secretion systems.
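
For readers unfamiliar with the metric, balanced accuracy is commonly defined as the mean of sensitivity and specificity. The Python sketch below uses that standard definition; whether the evaluation behind the figures quoted above computes it in exactly this way is an assumption, and the confusion-matrix counts in the usage note are invented purely for illustration.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity (recall on positives) and specificity (recall on negatives)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def summarize(fold_scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over per-fold scores (needs at least two folds)."""
    return mean(fold_scores), stdev(fold_scores)
```

For example, a hypothetical fold with 9 true positives, 1 false negative, 8 true negatives and 2 false positives has a sensitivity of 0.9 and a specificity of 0.8; balanced_accuracy(9, 1, 8, 2) averages the two values. Unlike plain accuracy, this score is not inflated when genomes without the secretion system greatly outnumber those with it.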

Performance of EffectiveS346

Results of cross validation for PICA models predicting Type III, IV and VI in EffectiveS346

Reliability of PICA models predicting Type III, IV and VI in EffectiveS346 in incomplete genomes