ARB
ed4_protein_2nd_structure.hxx File Reference

Adds support for protein structure prediction, comparison of two protein secondary structures and of amino acid sequences with protein secondary structures as well as visualization of the match quality in EDIT4.

#include "aw_window.hxx"

Classes

struct  name_value_pair
 Defines a name-value pair (e.g. for awars, menu entries, etc.).

Macros

#define PFOLD_AWAR_ENABLE   "Pfold/enable"
 Enable structure match.

#define PFOLD_AWAR_SELECTED_SAI   "Pfold/selected_SAI"
 Selected reference protein secondary structure SAI (i.e. the SAI that is used for structure comparison).

#define PFOLD_AWAR_PAIR_TEMPLATE   "Pfold/pairs/%s"
 Structure pairs that define the match quality (see pfold_pairs) as used for match methods SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT.

#define PFOLD_AWAR_SYMBOL_TEMPLATE   "Pfold/symbols/%s"
 Symbols for the match quality as used for match methods SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT.

#define PFOLD_AWAR_SYMBOL_TEMPLATE_2   "Pfold/symbols2"
 Symbols for the match quality as used for match method SECSTRUCT_SEQUENCE.

#define PFOLD_AWAR_MATCH_METHOD   "Pfold/match_method"
 Selected method for computing the match quality (see PFOLD_MATCH_METHOD).

#define PFOLD_AWAR_SAI_FILTER   "Pfold/SAI_filter"
 Filter SAIs for given criteria (string); used in option menu for SAI selection.

#define PFOLD_PAIRS   6

#define PFOLD_PAIR_CHARS_2   "##++~~-- "
 Symbols for the match quality as used for match method SECSTRUCT_SEQUENCE in ED4_pfold_calculate_secstruct_match().

#define cf_former(aa, strct)   ((strct!=2) ? cf_parameters[aa][strct] : cf_parameters_norm[aa][strct])
 Returns the former value of an amino acid depending on the given structure type.

#define cf_breaker(aa, strct)   ((strct!=2) ? cf_parameters[aa][strct+2] : 0)
 Returns the breaker value of an amino acid depending on the given structure type.

Enumerations

enum  PFOLD_STRUCTURE { ALPHA_HELIX = 0, BETA_SHEET = 1, BETA_TURN = 2, STRUCTURE_SUMMARY = 3 }
 Protein secondary structure types.

enum  PFOLD_MATCH_TYPE {
  STRUCT_PERFECT_MATCH, STRUCT_GOOD_MATCH, STRUCT_MEDIUM_MATCH, STRUCT_BAD_MATCH,
  STRUCT_NO_MATCH, STRUCT_UNKNOWN, PFOLD_MATCH_TYPE_COUNT
}
 Match quality for secondary structure match.

enum  PFOLD_MATCH_METHOD { SECSTRUCT_SECSTRUCT, SECSTRUCT_SEQUENCE, SECSTRUCT_SEQUENCE_PREDICT, PFOLD_MATCH_METHOD_COUNT }
 Defines the methods for match computation. For details refer to ED4_pfold_calculate_secstruct_match().

Functions

GB_ERROR ED4_pfold_calculate_secstruct_match (const unsigned char *structure_sai, const unsigned char *structure_cmp, int start, int end, char *result_buffer, PFOLD_MATCH_METHOD match_method=SECSTRUCT_SEQUENCE)
 Compares a protein secondary structure with a primary structure or another secondary structure.

GB_ERROR ED4_pfold_set_SAI (char **protstruct, GBDATA *gb_main, const char *alignment_name, long *protstruct_len=NULp)
 Sets the reference protein secondary structure SAI.

AW_window * ED4_pfold_create_props_window (AW_root *awr, const WindowCallback *refreshCallback)
 Creates the "Protein Match Settings" window.

Variables

name_value_pair pfold_match_type_awars []
 Awars for the match type; binds the PFOLD_MATCH_TYPE to the corresponding awar name.

char * pfold_pairs [PFOLD_PAIRS]
 Match pair definition (see PFOLD_MATCH_TYPE) as used for match methods SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT in ED4_pfold_calculate_secstruct_match().

char * pfold_pair_chars [PFOLD_PAIRS]
 Symbols for the match quality (defined by PFOLD_MATCH_TYPE) as used for match methods SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT in ED4_pfold_calculate_secstruct_match().

Detailed Description

Adds support for protein structure prediction, comparison of two protein secondary structures and of amino acid sequences with protein secondary structures as well as visualization of the match quality in EDIT4.

Author
Markus Urban
Date
2008-02-08

This file contains functions for predicting a protein secondary structure from its primary structure (i.e. the amino acid sequence) and for visualizing how well a sequence matches a given secondary structure. Two secondary structures can be compared, too. The initial values for the match symbols and other settings are defined here, as well as functions that create a "Protein Match Settings" window allowing the user to change the default properties for match computation.

See also
The functions for protein structure prediction are based on a statistical method known as the Chou-Fasman algorithm. For details refer to Chou, P. and Fasman, G. (1978). Prediction of the secondary structure of proteins from their amino acid sequence. Advances in Enzymology, 47, 45-148.
Attention
The method used for secondary structure prediction was chosen mainly because it is fast; performance matters when a large number of sequences is loaded in the editor. However, it is not very accurate and should only be used as a rough estimate. For our purpose, the algorithm and our own adaptations of it are used to give an approximate indication of whether a given amino acid sequence fails to match a certain secondary structure.

Definition in file ed4_protein_2nd_structure.hxx.

Macro Definition Documentation

#define PFOLD_AWAR_ENABLE   "Pfold/enable"

Enable structure match.

#define PFOLD_AWAR_SELECTED_SAI   "Pfold/selected_SAI"

Selected reference protein secondary structure SAI (i.e. the SAI that is used for structure comparison).

Definition at line 40 of file ed4_protein_2nd_structure.hxx.

Referenced by ed4_create_all_awars(), ED4_pfold_create_props_window(), ED4_pfold_select_SAI_and_update_option_menu(), ED4_pfold_set_SAI(), and setup_pfold_config().

#define PFOLD_AWAR_PAIR_TEMPLATE   "Pfold/pairs/%s"

Structure pairs that define the match quality (see pfold_pairs) as used for match methods SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT.

Definition at line 41 of file ed4_protein_2nd_structure.hxx.

Referenced by ed4_create_all_awars(), ED4_pfold_calculate_secstruct_match(), ED4_pfold_create_props_window(), and setup_pfold_config().
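
The "%s" in PFOLD_AWAR_PAIR_TEMPLATE is a placeholder for a match type name, presumably taken from pfold_match_type_awars. The following is a minimal sketch of that expansion, not ARB's implementation: expand_awar_template() is a hypothetical helper, and the name passed to it in the usage note is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Template string copied from this header.
#define PFOLD_AWAR_PAIR_TEMPLATE "Pfold/pairs/%s"

// Hypothetical helper (not part of ARB): replaces the first "%s" in an
// awar name template with the given name and returns the full awar name.
inline std::string expand_awar_template(const std::string& awar_template,
                                        const std::string& name) {
    const std::string::size_type pos = awar_template.find("%s");
    assert(pos != std::string::npos);  // the template must contain a placeholder
    std::string result = awar_template;
    result.replace(pos, 2, name);      // "%s" is two characters long
    return result;
}
```

Calling expand_awar_template(PFOLD_AWAR_PAIR_TEMPLATE, "example_type") yields "Pfold/pairs/example_type". The same approach applies to PFOLD_AWAR_SYMBOL_TEMPLATE.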

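#define PFOLD_AWAR_SYMBOL_TEMPLATE   "Pfold/symbols/%s"

Symbols for the match quality as used for match methods SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT.

#define PFOLD_AWAR_SYMBOL_TEMPLATE_2   "Pfold/symbols2"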

Symbols for the match quality as used for match method SECSTRUCT_SEQUENCE.

Definition at line 43 of file ed4_protein_2nd_structure.hxx.

Referenced by ed4_create_all_awars(), ED4_pfold_calculate_secstruct_match(), ED4_pfold_create_props_window(), and setup_pfold_config().

#define PFOLD_AWAR_MATCH_METHOD   "Pfold/match_method"

Selected method for computing the match quality (see PFOLD_MATCH_METHOD).

Definition at line 44 of file ed4_protein_2nd_structure.hxx.

Referenced by ed4_create_all_awars(), ED4_pfold_create_props_window(), ED4_show_protein_match_on_device(), and setup_pfold_config().

#define PFOLD_AWAR_SAI_FILTER   "Pfold/SAI_filter"

Filter SAIs for given criteria (string); used in option menu for SAI selection.

Definition at line 45 of file ed4_protein_2nd_structure.hxx.

Referenced by ed4_create_all_awars(), ED4_pfold_create_props_window(), ED4_pfold_select_SAI_and_update_option_menu(), and setup_pfold_config().

#define PFOLD_PAIRS   6

#define PFOLD_PAIR_CHARS_2   "##++~~-- "

Symbols for the match quality as used for match method SECSTRUCT_SEQUENCE in ED4_pfold_calculate_secstruct_match().

The ten symbols represent the match quality ranging from 0 - 100% in steps of 10%.

Definition at line 100 of file ed4_protein_2nd_structure.hxx.

Referenced by ed4_create_all_awars().
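
The mapping from a match quality in percent to one of the symbols can be sketched as follows. This is an illustration under stated assumptions, not ARB's ED4_pfold_round_sym(): symbol_for_quality() is a hypothetical helper, the first symbol is assumed to stand for 0% and the last for 100%, and the rounding rule (integer division, with 100% clamped to the last symbol) is a guess.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical helper (not part of ARB): picks the symbol for a match
// quality given in percent. Assumes symbols[0] represents 0% and the last
// symbol represents 100%; the rounding rule is an assumption as well.
inline char symbol_for_quality(int percent, const std::string& symbols) {
    assert(!symbols.empty());
    assert(percent >= 0 && percent <= 100);
    const int count = static_cast<int>(symbols.size());
    // Split 0..100 into `count` equal steps; 100% falls into the last step.
    const int index = std::min(percent * count / 100, count - 1);
    return symbols[index];
}
```

With a ten-character symbol string each step covers 10%, so a quality of 55% selects the symbol at index 5.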

#define cf_former(aa, strct)   ((strct!=2) ? cf_parameters[aa][strct] : cf_parameters_norm[aa][strct])

Returns the former value of an amino acid depending on the given structure type.

The definition is used for method SECSTRUCT_SEQUENCE in ED4_pfold_calculate_secstruct_match() to get the former value of an amino acid depending on the found structure type at its position. It addresses cf_parameters for ALPHA_HELIX and BETA_SHEET and cf_parameters_norm for BETA_TURN.

Definition at line 117 of file ed4_protein_2nd_structure.hxx.

Referenced by ED4_pfold_calculate_secstruct_match().

#define cf_breaker(aa, strct)   ((strct!=2) ? cf_parameters[aa][strct+2] : 0)

Returns the breaker value of an amino acid depending on the given structure type.

The definition is used for method SECSTRUCT_SEQUENCE in ED4_pfold_calculate_secstruct_match() to get the breaker value of an amino acid depending on the found structure type at its position. It addresses cf_parameters for ALPHA_HELIX and BETA_SHEET and returns 0 for BETA_TURN, because beta-turns have no breaker values.

Definition at line 127 of file ed4_protein_2nd_structure.hxx.

Referenced by ED4_pfold_calculate_secstruct_match().
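
How cf_former and cf_breaker index the parameter tables can be seen in a small self-contained sketch. The two macros and the enum are copied from this header; the tables are toy stand-ins with made-up numbers for two hypothetical residues, their column layout is inferred from the macros only, and former_minus_breaker() is a hypothetical helper.

```cpp
#include <cassert>

enum PFOLD_STRUCTURE { ALPHA_HELIX = 0, BETA_SHEET = 1, BETA_TURN = 2, STRUCTURE_SUMMARY = 3 };

// Toy table (made-up values). Assumed column layout, inferred from the macros:
// [0] helix former, [1] sheet former, [2] helix breaker, [3] sheet breaker.
static const int cf_parameters[2][4] = { {10, 20, 3, 4}, {11, 21, 5, 6} };
// Toy table (made-up values); cf_former only reads column BETA_TURN (= 2).
static const int cf_parameters_norm[2][3] = { {0, 0, 7}, {0, 0, 8} };

// Macros as defined in this header.
#define cf_former(aa, strct)  ((strct!=2) ? cf_parameters[aa][strct] : cf_parameters_norm[aa][strct])
#define cf_breaker(aa, strct) ((strct!=2) ? cf_parameters[aa][strct+2] : 0)

// Hypothetical helper: raw (not normalized) difference of former and
// breaker value for residue index aa and structure type strct.
inline int former_minus_breaker(int aa, PFOLD_STRUCTURE strct) {
    assert(aa >= 0 && aa < 2);           // only two toy residues exist
    assert(strct != STRUCTURE_SUMMARY);  // no parameters for the summary
    return cf_former(aa, strct) - cf_breaker(aa, strct);
}
```

For BETA_TURN the former value comes from cf_parameters_norm and the breaker value is always 0, so former_minus_breaker(0, BETA_TURN) is simply the toy former value 7.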

Enumeration Type Documentation

enum PFOLD_STRUCTURE

Protein secondary structure types.

Defines the various types of protein secondary structure. The order (or at least the individual values) is important, because the values are used as indices into various arrays.

Enumerator
ALPHA_HELIX 

Alpha-helix.

BETA_SHEET 

Beta-sheet.

BETA_TURN 

Beta-turn.

STRUCTURE_SUMMARY 

Structure summary.

Definition at line 55 of file ed4_protein_2nd_structure.hxx.

enum PFOLD_MATCH_TYPE

Match quality for secondary structure match.

Enumerator
STRUCT_PERFECT_MATCH 

Perfect match.

STRUCT_GOOD_MATCH 

Good match.

STRUCT_MEDIUM_MATCH 

Medium match.

STRUCT_BAD_MATCH 

Bad match.

STRUCT_NO_MATCH 

No match.

STRUCT_UNKNOWN 

Unknown structure.

PFOLD_MATCH_TYPE_COUNT 

Number of match types.

Definition at line 72 of file ed4_protein_2nd_structure.hxx.

enum PFOLD_MATCH_METHOD

Defines the methods for match computation. For details refer to ED4_pfold_calculate_secstruct_match().

Enumerator
SECSTRUCT_SECSTRUCT 

Compare two protein secondary structures.

SECSTRUCT_SEQUENCE 

Compare an amino acid sequence with a reference protein secondary structure.

SECSTRUCT_SEQUENCE_PREDICT 

Compare a full prediction of the protein secondary structure from its amino acid sequence with a reference protein secondary structure.

PFOLD_MATCH_METHOD_COUNT 

Number of match methods.

Definition at line 103 of file ed4_protein_2nd_structure.hxx.

Function Documentation

GB_ERROR ED4_pfold_calculate_secstruct_match ( const unsigned char * structure_sai,
const unsigned char * structure_cmp,
int  start,
int  end,
char * result_buffer,
PFOLD_MATCH_METHOD  match_method = SECSTRUCT_SEQUENCE 
)

Compares a protein secondary structure with a primary structure or another secondary structure.

Parameters
[in]   structure_sai   Reference protein structure SAI (secondary structure)
[in]   structure_cmp   Protein structure to compare (primary or secondary structure)
[in]   start           The start of the match computation (visible area in editor)
[in]   end             The end of the match computation (visible area in editor)
[out]  result_buffer   Result buffer for match symbols
[in]   match_method    Method for structure match computation
Returns
Error description, if an error occurred; 0 otherwise

This function compares a protein secondary structure with a primary structure (= amino acid sequence) or another secondary structure depending on match_method.

Match method SECSTRUCT_SECSTRUCT:
Two secondary structures are compared one by one using the criteria defined by pfold_pairs. The match symbols are taken from pfold_pair_chars.
Match method SECSTRUCT_SEQUENCE:
An amino acid sequence is compared with a secondary structure by taking cohesive parts of the structure - gaps in the alignment are skipped - and computing the normalized difference of former and breaker values for this region in the given sequence such that a value from 0 - 100% for the match quality is generated. By dividing this value into steps of 10% it is mapped to the match symbols defined by PFOLD_PAIR_CHARS_2. Note that bends ('S') are assumed to fit everywhere (=> best match symbol), and if a structure is encountered but no corresponding amino acid the worst match symbol is chosen.
Match method SECSTRUCT_SEQUENCE_PREDICT:
An amino acid sequence is compared with a secondary structure using a full prediction of the secondary structure from its sequence via ED4_pfold_predict_structure() and comparing it one by one with the reference structure. Note that not the structure summary is used for comparison, but the individual predicted structure types as returned in structures[4]. The match criteria are defined in pfold_pairs which is searched in ascending order, i.e. good matches first, then the worse ones. If a match is found the corresponding match symbol (as defined by pfold_pair_chars) is chosen. Note that if a structure is encountered but no corresponding amino acid the worst match symbol is chosen.

The match criteria (for SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT) as well as the match symbols (for all methods) can be adjusted by the user in the "Protein Match Settings" dialog. The result of the match computation (i.e. the match symbols) is written to the result buffer.
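
The pair-based comparison used by SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT can be sketched as below. This is a simplified illustration, not the ARB code: the pair definitions, the symbols, and the structure letters in example_pairs and example_symbols are made up, gap handling and the start/end range are omitted, and falling back to the STRUCT_UNKNOWN symbol for unlisted pairs is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum PFOLD_MATCH_TYPE {
    STRUCT_PERFECT_MATCH, STRUCT_GOOD_MATCH, STRUCT_MEDIUM_MATCH, STRUCT_BAD_MATCH,
    STRUCT_NO_MATCH, STRUCT_UNKNOWN, PFOLD_MATCH_TYPE_COUNT
};

// Hypothetical pair definitions: space-separated two-letter pairs per match type.
static const char *const example_pairs[PFOLD_MATCH_TYPE_COUNT] = {
    "HH EE TT", "HG GH", "HT TH", "ET TE", "HE EH", ""
};
// Hypothetical match symbols, one per match type.
static const char example_symbols[PFOLD_MATCH_TYPE_COUNT] = { ' ', '-', '~', '+', '#', '?' };

// Returns true if the pair (a, b) occurs in the space-separated list.
static bool pair_listed(const char *pairs, char a, char b) {
    if (a == ' ' || b == ' ') return false;  // separators never form a pair
    for (const char *p = pairs; p[0] && p[1]; ++p) {
        if (p[0] == a && p[1] == b) return true;
    }
    return false;
}

// Compares two structures position by position. Match types are searched in
// ascending order (best first); the first type listing the pair wins.
inline std::string match_structures(const std::string& reference, const std::string& other) {
    assert(reference.size() == other.size());
    std::string result(reference.size(), example_symbols[STRUCT_UNKNOWN]);
    for (std::size_t i = 0; i < reference.size(); ++i) {
        for (int type = STRUCT_PERFECT_MATCH; type < PFOLD_MATCH_TYPE_COUNT; ++type) {
            if (pair_listed(example_pairs[type], reference[i], other[i])) {
                result[i] = example_symbols[type];
                break;
            }
        }
    }
    return result;
}
```

With these toy definitions, match_structures("HHET", "HGTX") returns " -+?": a perfect match, a good match, a bad match, and an unlisted pair.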

Definition at line 690 of file ED4_protein_2nd_structure.cxx.

References ALPHA_HELIX, ARB_strdup(), ED4_root::aw_root, AW_root::awar(), BETA_SHEET, BETA_TURN, cf_breaker, cf_former, e4_assert, ED4_AWAR_GAP_CHARS, ED4_pfold_init_statics(), ED4_pfold_predict_structure(), ED4_pfold_round_sym(), ED4_ROOT, error(), GB_export_error(), length, max_breaker_value, max_former_value, min, min_former_value, name_value_pair::name, NULp, PFOLD_AWAR_PAIR_TEMPLATE, PFOLD_AWAR_SYMBOL_TEMPLATE, PFOLD_AWAR_SYMBOL_TEMPLATE_2, PFOLD_MATCH_METHOD_COUNT, PFOLD_MATCH_TYPE_COUNT, AW_awar::read_string(), SECSTRUCT_SECSTRUCT, SECSTRUCT_SEQUENCE, SECSTRUCT_SEQUENCE_PREDICT, start, STRUCT_NO_MATCH, STRUCT_PERFECT_MATCH, and STRUCT_UNKNOWN.

Referenced by ED4_show_protein_match_on_device().

GB_ERROR ED4_pfold_set_SAI ( char **  protstruct,
GBDATA * gb_main,
const char * alignment_name,
long * protstruct_len = NULp 
)

Sets the reference protein secondary structure SAI.

Parameters
[out]  protstruct       Pointer to reference protein secondary structure SAI
[in]   gb_main          Main database
[in]   alignment_name   Name of the alignment to search for
[out]  protstruct_len   Length of reference protein secondary structure SAI
Returns
Error description, if an error occurred; 0 otherwise

The function searches the database gb_main for the currently selected SAI as defined by PFOLD_AWAR_SELECTED_SAI and assigns the data of the alignment alignment_name to protstruct. If protstruct_len is specified the length of the new reference SAI is stored. The function is used in the editor to initialize the reference protein secondary structure SAI and to update it if the selected SAI is changed in the "Protein Match Settings" dialog. For this purpose it should be called with &ED4_ROOT->protstruct and &ED4_ROOT->protstruct_len.

Definition at line 967 of file ED4_protein_2nd_structure.cxx.

References ED4_root::aw_root, AW_root::awar(), ED4_ROOT, error(), GB_read_string(), GBS_global_string(), GBT_find_SAI(), GBT_find_sequence(), long, NULp, PFOLD_AWAR_ENABLE, PFOLD_AWAR_SELECTED_SAI, AW_awar::read_int(), AW_awar::read_string(), ta, and AW_awar::write_int().

Referenced by ARB_main(), ED4_alignment_length_changed(), ED4_pfold_select_SAI_and_update_option_menu(), and ED4_manager::update_bases_and_rebuild_consensi().

AW_window* ED4_pfold_create_props_window ( AW_root * awr,
const WindowCallback *  refreshCallback 
)

Creates the "Protein Match Settings" window.

Parameters
[in]   awr               Root window
[in]   refreshCallback   Callback struct
Returns
Window

The "Protein Match Settings" window allows the user to configure the properties for protein match computation. These settings include turning the match computation on and off (bound to awar PFOLD_AWAR_ENABLE), selecting the reference protein secondary structure SAI (bound to awar PFOLD_AWAR_SELECTED_SAI), choosing the match method (bound to awar PFOLD_AWAR_MATCH_METHOD, see PFOLD_MATCH_METHOD) and the definition of the match pairs (bound to awar PFOLD_AWAR_PAIR_TEMPLATE and pfold_match_type_awars, see PFOLD_MATCH_TYPE and pfold_pairs) as well as the match symbols (bound to awar PFOLD_AWAR_SYMBOL_TEMPLATE and pfold_match_type_awars or PFOLD_AWAR_SYMBOL_TEMPLATE_2, see PFOLD_MATCH_TYPE and pfold_pair_chars or PFOLD_PAIR_CHARS_2). Via a filter (bound to PFOLD_AWAR_SAI_FILTER) the SAIs shown in the option menu can be narrowed down to a selection of SAIs whose names contain the specified string. The callback function ED4_pfold_select_SAI_and_update_option_menu() is bound to the SAI option menu and the SAI filter to update the selected SAI in the editor or the selection in the SAI option menu.
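
The effect of the SAI filter can be illustrated with a plain substring filter. This is only a sketch of the described behavior: filter_sai_names() is a hypothetical helper, the SAI names in the usage note are made up, and case-sensitive matching is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not part of ARB): keeps the names that contain the
// filter string. An empty filter keeps every name. Matching is assumed to
// be case-sensitive.
inline std::vector<std::string> filter_sai_names(const std::vector<std::string>& names,
                                                 const std::string& filter) {
    std::vector<std::string> kept;
    for (const std::string& name : names) {
        if (name.find(filter) != std::string::npos) kept.push_back(name);
    }
    return kept;
}
```

For the made-up names "example_struct_1", "example_other" and "struct_2", the filter "struct" keeps the first and the last entry.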

Definition at line 1062 of file ED4_protein_2nd_structure.cxx.

References AW_window::at(), AW_POPDOWN(), ED4_root::aw_root, AW_ROOT_DEFAULT, AW_root::awar(), AWT_insert_config_manager(), ED4_pfold_select_SAI_and_update_option_menu(), ED4_ROOT, makeHelpCallback(), name_value_pair::name, PFOLD_AWAR_ENABLE, PFOLD_AWAR_MATCH_METHOD, PFOLD_AWAR_PAIR_TEMPLATE, PFOLD_AWAR_SAI_FILTER, PFOLD_AWAR_SELECTED_SAI, PFOLD_AWAR_SYMBOL_TEMPLATE, PFOLD_AWAR_SYMBOL_TEMPLATE_2, AW_awar::read_int(), and setup_pfold_config().

Referenced by ED4_root::generate_window().

Variable Documentation

name_value_pair pfold_match_type_awars[]

Awars for the match type; binds the PFOLD_MATCH_TYPE to the corresponding awar name.

Definition at line 39 of file ED4_protein_2nd_structure.cxx.

Referenced by ed4_create_all_awars().

char* pfold_pairs[PFOLD_PAIRS]

Match pair definition (see PFOLD_MATCH_TYPE) as used for match methods SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT in ED4_pfold_calculate_secstruct_match().

Definition at line 60 of file ED4_protein_2nd_structure.cxx.

Referenced by ed4_create_all_awars().

char* pfold_pair_chars[PFOLD_PAIRS]

Symbols for the match quality (defined by PFOLD_MATCH_TYPE) as used for match methods SECSTRUCT_SECSTRUCT and SECSTRUCT_SEQUENCE_PREDICT in ED4_pfold_calculate_secstruct_match().

Definition at line 50 of file ED4_protein_2nd_structure.cxx.

Referenced by ed4_create_all_awars().