DIAMOND
File helpers
- magna.diamond.read_diamond_output(path)
Return a pandas DataFrame from a DIAMOND output file.
- Columns:
query
- the accession of the sequence that was searched against the database, as specified in the input FASTA file after the > character until the first blank.
reference
- the accession of the target database sequence that the query was aligned against.
identity
- the percentage of identical amino acid residues that were aligned against each other in the local alignment.
length
- the total length of the local alignment, including matching and mismatching positions of query and subject, as well as gap positions in the query and subject.
mismatches
- the number of non-identical amino acid residues aligned against each other.
gap_openings
- the number of gap openings.
query_start
- the starting coordinate of the local alignment in the query (1-based).
query_end
- the ending coordinate of the local alignment in the query (1-based).
target_start
- the starting coordinate of the local alignment in the subject, i.e. the target sequence (1-based).
target_end
- the ending coordinate of the local alignment in the subject, i.e. the target sequence (1-based).
e_value
- the expected value (e-value) of the hit: the number of alignments of similar or better quality that you expect to find when searching this query against a database of random sequences the same size as the actual target database. This number is most useful for measuring the significance of a hit. By default, DIAMOND reports all alignments with e-value < 0.001, meaning that a hit of this quality will be found by chance on average once per 1,000 queries.
bit_score
- a scoring-matrix-independent measure of the (local) similarity of the two aligned sequences, with higher numbers meaning more similar. It is always >= 0 for local Smith-Waterman alignments.
- Parameters
path (str) – The path to the DIAMOND output file.
- Return type
DataFrame
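
The column layout above can be illustrated with plain pandas. The following is a minimal sketch, not the implementation of magna.diamond.read_diamond_output: it assumes the DIAMOND output is in the default tab-separated tabular format with twelve fields in the order listed above. The helper name read_tabular_hits and the three example rows are hypothetical and exist only for illustration.

```python
import io

import pandas as pd

# Column names as documented for read_diamond_output, in the documented order.
COLUMNS = [
    "query", "reference", "identity", "length", "mismatches", "gap_openings",
    "query_start", "query_end", "target_start", "target_end",
    "e_value", "bit_score",
]


def read_tabular_hits(path_or_buffer):
    """Hypothetical stand-in: parse tab-separated hits into a DataFrame.

    Assumes a headerless, tab-separated file with exactly the twelve
    fields named in COLUMNS.
    """
    return pd.read_csv(path_or_buffer, sep="\t", header=None, names=COLUMNS)


# Made-up rows used only to demonstrate the layout; not real search results.
example = io.StringIO(
    "q1\tref_A\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-50\t350.0\n"
    "q1\tref_B\t45.0\t120\t60\t2\t10\t129\t1\t118\t0.0005\t40.2\n"
    "q2\tref_A\t77.3\t150\t30\t1\t1\t150\t20\t168\t2e-20\t180.5\n"
)
hits = read_tabular_hits(example)

# Keep only hits below a stricter e-value cutoff than the 0.001 default.
significant = hits[hits["e_value"] < 1e-10]

# Keep the highest-scoring hit for each query.
best = hits.sort_values("bit_score", ascending=False).drop_duplicates("query")
```

With a DataFrame in this shape, filtering on e_value or ranking by bit_score are ordinary pandas operations, so the same pattern should apply to the frame returned by read_diamond_output as long as its columns match the list above.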