DIAMOND
File helpers
- magna.diamond.read_diamond_output(path)
Return a pandas DataFrame from a DIAMOND output file.
- Columns:
query- the accession of the sequence that was searched against the database, as specified in the input FASTA file after the > character until the first blank.reference- the accession of the target database sequence that the query was aligned againstidentity- the percentage of identical amino acid residues that were aligned against each other in the local alignmentlength- the total length of the local alignment, which including matching and mismatching positions of query and subject, as well as gap positions in the query and subject.mismatches- the number of non-identical amino acid residues aligned against each other.gap_openings- the number of gap openings.query_start- the starting coordinate of the local alignment in the query (1-based).query_end- the ending coordinate of the local alignment in the query (1-based).target_start- the starting coordinate of the local alignment in the subject (1-based).target_end- the ending coordinate of the local alignment in the subject (1-based).e_value- the expected value of the hit quantifies the number of alignments of similar or better quality that you expect to find searching this query against a database of random sequences the same size as the actual target database. This number is most useful for measuring the significance of a hit. By default, DIAMOND will report all alignments with e-value < 0.001, meaning that a hit of this quality will be found by chance on average once per 1,000 queries.bit_score- the bit score is a scoring matrix independent measure of the (local) similarity of the two aligned sequences, with higher numbers meaning more similar. It is always >= 0 for local Smith Waterman alignments.
- Parameters
path (
str) – The path to the DIAMOND output file.- Return type
DataFrame