DIAMOND

File helpers

magna.diamond.read_diamond_output(path)

Return a pandas DataFrame from a DIAMOND output file.

Columns:

query - the accession of the sequence that was searched against the database, as specified in the input FASTA file after the > character until the first blank.
reference - the accession of the target database sequence that the query was aligned against.
identity - the percentage of identical amino acid residues that were aligned against each other in the local alignment.
length - the total length of the local alignment, including matching and mismatching positions of query and subject, as well as gap positions in the query and subject.
mismatches - the number of non-identical amino acid residues aligned against each other.
gap_openings - the number of gap openings.
query_start - the starting coordinate of the local alignment in the query (1-based).
query_end - the ending coordinate of the local alignment in the query (1-based).
target_start - the starting coordinate of the local alignment in the target (subject) sequence (1-based).
target_end - the ending coordinate of the local alignment in the target (subject) sequence (1-based).
e_value - the expected value (E-value) of the hit: the number of alignments of similar or better quality that you would expect to find when searching this query against a database of random sequences the same size as the actual target database. This number is most useful for measuring the significance of a hit. By default, DIAMOND reports all alignments with e-value < 0.001, meaning that a hit of this quality will be found by chance on average once per 1,000 queries.
bit_score - a scoring-matrix-independent measure of the (local) similarity of the two aligned sequences, with higher numbers meaning more similar. It is always >= 0 for local Smith-Waterman alignments.

Parameters: path (str) – The path to the DIAMOND output file.
Return type: DataFrame
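
Example (illustrative sketch): the snippet below shows one way the returned DataFrame might be used, filtering by e_value and keeping the highest bit_score per query. It does not call magna itself. read_tabular_sketch is a hypothetical stand-in that assumes the file is in DIAMOND's default 12-column, tab-separated tabular format with no header line, and that the columns appear in the order listed above. The actual implementation of read_diamond_output may differ. The three alignment rows are made up for illustration.

```python
import io

import pandas as pd

# Column names as documented above, assumed to follow DIAMOND's
# default tabular column order.
COLUMNS = [
    "query", "reference", "identity", "length", "mismatches", "gap_openings",
    "query_start", "query_end", "target_start", "target_end",
    "e_value", "bit_score",
]


def read_tabular_sketch(source):
    """Hypothetical stand-in for read_diamond_output (not the magna code)."""
    return pd.read_csv(source, sep="\t", header=None, names=COLUMNS)


# Made-up rows standing in for a DIAMOND output file.
example = io.StringIO(
    "q1\tref_A\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t400.0\n"
    "q1\tref_B\t45.0\t120\t60\t2\t10\t129\t1\t118\t5e-4\t52.0\n"
    "q2\tref_C\t80.0\t150\t30\t0\t1\t150\t20\t169\t1e-30\t150.0\n"
)
hits = read_tabular_sketch(example)

# Keep only hits below a stricter e-value cutoff than DIAMOND's default.
significant = hits[hits["e_value"] < 1e-5]

# For each query, keep the single hit with the highest bit score.
best = (
    significant.sort_values("bit_score", ascending=False)
    .drop_duplicates(subset="query")
)
```

With the real helper, the hits frame would instead come from magna.diamond.read_diamond_output(path); the filtering steps rely only on the documented column names.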