OpenMS 2.6.0
Implements a mixture model of the inverse Gumbel and the Gauss distribution or a Gaussian mixture.
#include <OpenMS/MATH/STATISTICS/PosteriorErrorProbabilityModel.h>
| Public Member Functions | |
| PosteriorErrorProbabilityModel () | |
| default constructor | |
| ~PosteriorErrorProbabilityModel () override | |
| Destructor. | |
| bool | fit (std::vector< double > &search_engine_scores, const String &outlier_handling) | 
| fits the distributions to the data points (search_engine_scores). Estimated parameters for the distributions are saved in member variables; computeProbability can be used afterwards. Uses two Gaussians for fitting, and Gauss+Gauss or Gumbel+Gauss for plotting and computing the final probabilities. | |
| bool | fitGumbelGauss (std::vector< double > &search_engine_scores, const String &outlier_handling) | 
| fits the distributions to the data points (search_engine_scores). Estimated parameters for the distributions are saved in member variables; computeProbability can be used afterwards. Uses Gumbel+Gauss for everything and fits the Gumbel distribution by maximizing the log-likelihood. | |
| bool | fit (std::vector< double > &search_engine_scores, std::vector< double > &probabilities, const String &outlier_handling) | 
| fits the distributions to the data points (search_engine_scores) and writes the computed probabilities into the given vector (the second one). | |
| void | fillDensities (const std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density) | 
| Writes the distribution densities for a set of scores into the two vectors; incorrect_density holds the densities of the incorrectly assigned sequences. | |
| void | fillLogDensities (const std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density) | 
| Writes the log densities of the distributions for a set of scores into the two vectors; incorrect_density holds the log densities of the incorrectly assigned sequences. | |
| void | fillLogDensitiesGumbel (const std::vector< double > &x_scores, std::vector< double > &incorrect_density, std::vector< double > &correct_density) | 
| Writes the log densities of the Gumbel and Gauss distributions for a set of scores into the two vectors; incorrect_density holds the log densities of the incorrectly assigned sequences. | |
| double | computeLogLikelihood (const std::vector< double > &incorrect_density, const std::vector< double > &correct_density) | 
| computes the likelihood using a log-likelihood function. | |
| double | computeLLAndIncorrectPosteriorsFromLogDensities (const std::vector< double > &incorrect_log_density, const std::vector< double > &correct_log_density, std::vector< double > &incorrect_posterior) | 
| std::pair< double, double > | pos_neg_mean_weighted_posteriors (const std::vector< double > &x_scores, const std::vector< double > &incorrect_posteriors) | 
| std::pair< double, double > | pos_neg_sigma_weighted_posteriors (const std::vector< double > &x_scores, const std::vector< double > &incorrect_posteriors, const std::pair< double, double > &means) | 
| GaussFitter::GaussFitResult | getCorrectlyAssignedFitResult () const | 
| returns the estimated parameters for correctly assigned sequences. A fit must be performed first. | |
| GaussFitter::GaussFitResult | getIncorrectlyAssignedFitResult () const | 
| returns the estimated parameters for incorrectly assigned sequences. A fit must be performed first. | |
| GumbelMaxLikelihoodFitter::GumbelDistributionFitResult | getIncorrectlyAssignedGumbelFitResult () const | 
| returns the estimated Gumbel parameters for incorrectly assigned sequences. A fit must be performed first. | |
| double | getNegativePrior () const | 
| returns the estimated negative prior probability. | |
| double | computeProbability (double score) const | 
| TextFile | initPlots (std::vector< double > &x_scores) | 
| initializes the plots | |
| const String | getGumbelGnuplotFormula (const GaussFitter::GaussFitResult &params) const | 
| returns the gnuplot formula of the fitted Gumbel distribution. Only x0 and sigma are used, as the location parameter alpha and the scale parameter beta, respectively. | |
| const String | getGaussGnuplotFormula (const GaussFitter::GaussFitResult &params) const | 
| returns the gnuplot formula of the fitted Gauss distribution. | |
| const String | getBothGnuplotFormula (const GaussFitter::GaussFitResult &incorrect, const GaussFitter::GaussFitResult &correct) const | 
| returns the gnuplot formula of the fitted mixture distribution. | |
| void | plotTargetDecoyEstimation (std::vector< double > &target, std::vector< double > &decoy) | 
| plots the estimated distribution against target and decoy hits | |
| double | getSmallestScore () | 
| returns the smallest score used in the last fit | |
| void | tryGnuplot (const String &gp_file) | 
| tries to invoke gnuplot on the file to create a PDF automatically | |
 Public Member Functions inherited from DefaultParamHandler | |
| DefaultParamHandler (const String &name) | |
| Constructor with name that is displayed in error messages. | |
| DefaultParamHandler (const DefaultParamHandler &rhs) | |
| Copy constructor. | |
| virtual | ~DefaultParamHandler () | 
| Destructor. | |
| virtual DefaultParamHandler & | operator= (const DefaultParamHandler &rhs) | 
| Assignment operator. | |
| virtual bool | operator== (const DefaultParamHandler &rhs) const | 
| Equality operator. | |
| void | setParameters (const Param &param) | 
| Sets the parameters. | |
| const Param & | getParameters () const | 
| Non-mutable access to the parameters. | |
| const Param & | getDefaults () const | 
| Non-mutable access to the default parameters. | |
| const String & | getName () const | 
| Non-mutable access to the name. | |
| void | setName (const String &name) | 
| Mutable access to the name. | |
| const std::vector< String > & | getSubsections () const | 
| Non-mutable access to the registered subsections. | |
Static Public Member Functions | |
| static std::map< String, std::vector< std::vector< double > > > | extractAndTransformScores (const std::vector< ProteinIdentification > &protein_ids, const std::vector< PeptideIdentification > &peptide_ids, const bool split_charge, const bool top_hits_only, const bool target_decoy_available, const double fdr_for_targets_smaller) | 
| extracts and transforms score types to a range and score orientation that the PEP model can handle | |
| static void | updateScores (const PosteriorErrorProbabilityModel &PEP_model, const String &search_engine, const Int charge, const bool prob_correct, const bool split_charge, std::vector< ProteinIdentification > &protein_ids, std::vector< PeptideIdentification > &peptide_ids, bool &unable_to_fit_data, bool &data_might_not_be_well_fit) | 
| updates score entries with PEP (or 1-PEP) estimates | |
| static double | getGumbel_ (double x, const GaussFitter::GaussFitResult &params) | 
| computes the Gumbel density at position x with parameters params. | |
 Static Public Member Functions inherited from DefaultParamHandler | |
| static void | writeParametersToMetaValues (const Param &write_this, MetaInfoInterface &write_here, const String &prefix="") | 
| Writes all parameters to meta values. | |
Private Member Functions | |
| void | processOutliers_ (std::vector< double > &x_scores, const String &outlier_handling) const | 
| handles outliers in x_scores according to outlier_handling (see the outlier_handling parameter) | |
| PosteriorErrorProbabilityModel & | operator= (const PosteriorErrorProbabilityModel &rhs) | 
| assignment operator (not implemented) | |
| PosteriorErrorProbabilityModel (const PosteriorErrorProbabilityModel &rhs) | |
| Copy constructor (not implemented) | |
Static Private Member Functions | |
| static double | transformScore_ (const String &engine, const PeptideHit &hit, const String &current_score_type) | 
| static double | getScore_ (const StringList &requested_score_types, const PeptideHit &hit, const String &actual_score_type) | 
Private Attributes | |
| GaussFitter::GaussFitResult | incorrectly_assigned_fit_param_ | 
| stores parameters for incorrectly assigned sequences. If the Gumbel fit was used, A can be ignored; in this case, x0 and sigma are the location parameter alpha and the scale parameter beta, respectively. | |
| GumbelMaxLikelihoodFitter::GumbelDistributionFitResult | incorrectly_assigned_fit_gumbel_param_ | 
| GaussFitter::GaussFitResult | correctly_assigned_fit_param_ | 
| stores Gauss parameters | |
| double | negative_prior_ | 
| stores the final prior probability for negative peptides | |
| double | max_incorrectly_ | 
| peak of the incorrectly assigned sequences distribution | |
| double | max_correctly_ | 
| peak of the Gauss distribution (correctly assigned sequences) | |
| double | smallest_score_ | 
| smallest score that was used for fitting the model | |
| const String(PosteriorErrorProbabilityModel::* | getNegativeGnuplotFormula_ )(const GaussFitter::GaussFitResult &params) const | 
| points either to getGumbelGnuplotFormula or getGaussGnuplotFormula, depending on whether the Gumbel or the Gaussian distribution is used for incorrectly assigned sequences. | |
| const String(PosteriorErrorProbabilityModel::* | getPositiveGnuplotFormula_ )(const GaussFitter::GaussFitResult &params) const | 
| points to getGumbelGnuplotFormula | |
Additional Inherited Members | |
 Protected Member Functions inherited from DefaultParamHandler | |
| virtual void | updateMembers_ () | 
| This method is used to update extra member variables at the end of the setParameters() method. | |
| void | defaultsToParam_ () | 
| Updates the parameters after the defaults have been set in the constructor. | |
 Protected Attributes inherited from DefaultParamHandler | |
| Param | param_ | 
| Container for current parameters. | |
| Param | defaults_ | 
| Container for default parameters. This member should be filled in the constructor of derived classes! | |
| std::vector< String > | subsections_ | 
| Container for registered subsections. This member should be filled in the constructor of derived classes! | |
| String | error_name_ | 
| Name that is displayed in error messages during the parameter checking. | |
| bool | check_defaults_ | 
| If this member is set to false, no checking of parameters is done. | |
| bool | warn_empty_defaults_ | 
| If this member is set to false, no warning is emitted when defaults are empty. | |
This class fits either a Gumbel and a Gauss distribution, or two Gaussian distributions, to a set of data points using the EM algorithm. After fitting, the fit can be output as a gnuplot formula using getGumbelGnuplotFormula() and getGaussGnuplotFormula().
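For orientation, the EM loop for the two-Gaussian case can be sketched as below. This is a generic textbook EM for a 1-D two-component Gaussian mixture, not the OpenMS implementation; the struct and function names are hypothetical, and `max_iter`/`tol` merely mirror the defaults of the `max_nr_iterations` and `neg_log_delta` parameters (1000 and 10^-6).

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Parameters of a two-component 1-D Gaussian mixture (hypothetical type).
// "neg" = incorrectly assigned sequences, "pos" = correctly assigned ones.
struct TwoGaussFit {
  double prior_neg;          // mixing weight of the incorrect component
  double mu_neg, sigma_neg;  // incorrect component
  double mu_pos, sigma_pos;  // correct component
  int iterations;            // EM iterations actually run
};

double gaussPdf(double x, double mu, double sigma) {
  const double pi = std::acos(-1.0);
  const double z = (x - mu) / sigma;
  return std::exp(-0.5 * z * z) / (sigma * std::sqrt(2.0 * pi));
}

// Textbook EM, starting from the initial guess p.
TwoGaussFit fitTwoGaussiansEM(const std::vector<double>& x, TwoGaussFit p,
                              int max_iter = 1000, double tol = 1e-6) {
  const double n = static_cast<double>(x.size());
  std::vector<double> r(x.size());  // responsibility of the incorrect component
  double prev_ll = -std::numeric_limits<double>::infinity();
  p.iterations = 0;
  for (int it = 0; it < max_iter; ++it) {
    // E-step: responsibilities and log-likelihood under the current parameters.
    double ll = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
      const double a = p.prior_neg * gaussPdf(x[i], p.mu_neg, p.sigma_neg);
      const double b = (1.0 - p.prior_neg) * gaussPdf(x[i], p.mu_pos, p.sigma_pos);
      r[i] = a / (a + b);
      ll += std::log(a + b);
    }
    p.iterations = it + 1;
    if (ll - prev_ll < tol) break;  // likelihood increase below threshold
    prev_ll = ll;
    // M-step: responsibility-weighted prior, means and standard deviations.
    double w_neg = 0.0, s_neg = 0.0, s_pos = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
      w_neg += r[i];
      s_neg += r[i] * x[i];
      s_pos += (1.0 - r[i]) * x[i];
    }
    const double w_pos = n - w_neg;
    p.mu_neg = s_neg / w_neg;
    p.mu_pos = s_pos / w_pos;
    double v_neg = 0.0, v_pos = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
      v_neg += r[i] * (x[i] - p.mu_neg) * (x[i] - p.mu_neg);
      v_pos += (1.0 - r[i]) * (x[i] - p.mu_pos) * (x[i] - p.mu_pos);
    }
    p.sigma_neg = std::sqrt(v_neg / w_neg);
    p.sigma_pos = std::sqrt(v_pos / w_pos);
    p.prior_neg = w_neg / n;
  }
  return p;
}
```

The Gumbel+Gauss variant would replace the incorrect component's density and its M-step update; that part is not shown here.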
Todo:
- test performance and make fitGumbelGauss available via parameters
- allow charge-state-based fitting
- allow semi-supervised fitting by using decoy annotations
- allow non-parametric fitting via kernel density estimation
| Name | Type | Default | Restrictions | Description | 
|---|---|---|---|---|
| out_plot | string | | | If given, some output files will be saved in the following manner: | 
| number_of_bins | int | 100 | | Number of bins used for visualization. Only needed if each iteration step of the EM algorithm is visualized. | 
| incorrectly_assigned | string | Gumbel | Gumbel, Gauss | For 'Gumbel', the Gumbel distribution is used to plot incorrectly assigned sequences. For 'Gauss', the Gauss distribution is used. | 
| max_nr_iterations | int | 1000 | | Bounds the number of iterations for the EM algorithm when convergence is slow. | 
| neg_log_delta | int | 6 | | The negative logarithm of the convergence threshold for the likelihood increase. | 
| outlier_handling | string | ignore_iqr_outliers | ignore_iqr_outliers, set_iqr_to_closest_valid, ignore_extreme_percentiles, none | What to do with outliers: - ignore_iqr_outliers: ignore outliers outside of 3*IQR from Q1/Q3 for fitting - set_iqr_to_closest_valid: set IQR-based outliers to the last valid value for fitting - ignore_extreme_percentiles: ignore everything outside the 1st and 99th percentiles (also removes equal values like potential censored max values in XTandem) - none: do nothing | 
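The two IQR-based `outlier_handling` modes can be sketched as follows. The quartile definition (linear interpolation between order statistics) and the reading of "last valid value" as "nearest score inside the fences" are assumptions; all function names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Quantile with linear interpolation between order statistics (one of several
// common definitions; assumed here). Precondition: v is non-empty.
double quantile(std::vector<double> v, double q) {
  std::sort(v.begin(), v.end());
  const double pos = q * static_cast<double>(v.size() - 1);
  const std::size_t lo = static_cast<std::size_t>(std::floor(pos));
  const std::size_t hi = std::min(lo + 1, v.size() - 1);
  return v[lo] + (pos - static_cast<double>(lo)) * (v[hi] - v[lo]);
}

// Fences at 3*IQR below Q1 and above Q3, as described for outlier_handling.
std::pair<double, double> iqrFences(const std::vector<double>& x) {
  const double q1 = quantile(x, 0.25), q3 = quantile(x, 0.75);
  const double iqr = q3 - q1;
  return {q1 - 3.0 * iqr, q3 + 3.0 * iqr};
}

// ignore_iqr_outliers: drop scores outside the fences before fitting.
std::vector<double> ignoreIqrOutliers(const std::vector<double>& x) {
  const std::pair<double, double> f = iqrFences(x);
  std::vector<double> kept;
  for (double s : x)
    if (s >= f.first && s <= f.second) kept.push_back(s);
  return kept;
}

// set_iqr_to_closest_valid: replace each outlier by the closest score that
// lies inside the fences.
std::vector<double> setIqrToClosestValid(std::vector<double> x) {
  const std::pair<double, double> f = iqrFences(x);
  double min_valid = f.second, max_valid = f.first;
  for (double s : x) {
    if (s >= f.first && s <= f.second) {
      min_valid = std::min(min_valid, s);
      max_valid = std::max(max_valid, s);
    }
  }
  for (double& s : x) {
    if (s < f.first) s = min_valid;
    else if (s > f.second) s = max_valid;
  }
  return x;
}
```

For the scores 1 to 9 plus 100, Q1 = 3.25 and Q3 = 7.75 under this quartile definition, so 100 lies above the upper fence (21.25) and is either dropped or clamped to 9.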
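| PosteriorErrorProbabilityModel () | |
default constructor
| ~PosteriorErrorProbabilityModel () | override |
Destructor.
| PosteriorErrorProbabilityModel (const PosteriorErrorProbabilityModel &rhs) | private |
Copy constructor (not implemented)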
| double computeLLAndIncorrectPosteriorsFromLogDensities | ( | const std::vector< double > & | incorrect_log_density, | 
| const std::vector< double > & | correct_log_density, | ||
| std::vector< double > & | incorrect_posterior | ||
| ) | 
computes the posterior probabilities of the data points belonging to the incorrect distribution
| incorrect_posterior | resulting posteriors | 
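Working from log densities suggests the log-sum-exp trick. The sketch below computes the per-point posterior of the incorrect component and the total log-likelihood; the name is hypothetical, and the prior is passed explicitly here, whereas the class presumably uses its stored negative prior.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Posterior of the incorrect component per data point, plus the total
// log-likelihood, computed in log space so tiny densities do not underflow.
double logLikelihoodAndIncorrectPosteriors(const std::vector<double>& incorrect_log_density,
                                           const std::vector<double>& correct_log_density,
                                           double prior_incorrect,
                                           std::vector<double>& incorrect_posterior) {
  incorrect_posterior.resize(incorrect_log_density.size());
  const double log_prior_inc = std::log(prior_incorrect);
  const double log_prior_cor = std::log(1.0 - prior_incorrect);
  double ll = 0.0;
  for (std::size_t i = 0; i < incorrect_log_density.size(); ++i) {
    const double a = log_prior_inc + incorrect_log_density[i];  // log(pi * f_inc)
    const double b = log_prior_cor + correct_log_density[i];    // log((1 - pi) * f_cor)
    const double m = std::max(a, b);
    const double log_mix = m + std::log(std::exp(a - m) + std::exp(b - m));
    incorrect_posterior[i] = std::exp(a - log_mix);
    ll += log_mix;
  }
  return ll;
}
```

Even with log densities around -1000, where exponentiating directly would give 0/0, the posteriors stay finite.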
| double computeLogLikelihood | ( | const std::vector< double > & | incorrect_density, | 
| const std::vector< double > & | correct_density | ||
| ) | 
computes the likelihood using a log-likelihood function.
| double computeProbability | ( | double | score | ) | const |
Returns the computed posterior error probability for a given score.
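By Bayes' rule, the posterior error probability of a score s is the prior-weighted incorrect density divided by the mixture density: PEP(s) = π·f_inc(s) / (π·f_inc(s) + (1 − π)·f_cor(s)), with π the negative prior. A minimal sketch for the Gauss+Gauss case follows; `GaussParams` is a hypothetical stand-in for the x0/sigma fields of GaussFitter::GaussFitResult, and whether computeProbability applies further adjustments is not stated here.

```cpp
#include <cmath>

// Hypothetical stand-in for the x0/sigma fields of GaussFitter::GaussFitResult.
struct GaussParams {
  double x0;
  double sigma;
};

double gaussDensity(double x, const GaussParams& g) {
  const double pi = std::acos(-1.0);
  const double z = (x - g.x0) / g.sigma;
  return std::exp(-0.5 * z * z) / (g.sigma * std::sqrt(2.0 * pi));
}

// PEP = P(incorrect | score) via Bayes' rule with negative prior pi.
double posteriorErrorProbability(double score, double negative_prior,
                                 const GaussParams& incorrect, const GaussParams& correct) {
  const double inc = negative_prior * gaussDensity(score, incorrect);
  const double cor = (1.0 - negative_prior) * gaussDensity(score, correct);
  return inc / (inc + cor);
}
```

With equal priors and equal widths, a score exactly halfway between the two means yields a PEP of 0.5.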
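| static std::map< String, std::vector< std::vector< double > > > extractAndTransformScores (const std::vector< ProteinIdentification > &protein_ids, const std::vector< PeptideIdentification > &peptide_ids, const bool split_charge, const bool top_hits_only, const bool target_decoy_available, const double fdr_for_targets_smaller) | static |
extracts and transforms score types to a range and score orientation that the PEP model can handle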
| protein_ids | the protein identifications | 
| peptide_ids | the peptide identifications | 
| split_charge | whether different charge states should be treated separately | 
| top_hits_only | only consider rank 1 | 
| target_decoy_available | whether target decoy information is stored as meta value | 
| fdr_for_targets_smaller | fdr threshold for targets | 
| void fillDensities | ( | const std::vector< double > & | x_scores, | 
| std::vector< double > & | incorrect_density, | ||
| std::vector< double > & | correct_density | ||
| ) | 
Writes the distribution densities for a set of scores into the two vectors; incorrect_density holds the densities of the incorrectly assigned sequences.
| void fillLogDensities | ( | const std::vector< double > & | x_scores, | 
| std::vector< double > & | incorrect_density, | ||
| std::vector< double > & | correct_density | ||
| ) | 
Writes the log densities of the distributions for a set of scores into the two vectors; incorrect_density holds the log densities of the incorrectly assigned sequences.
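For the Gauss+Gauss case, filling the log-density vectors amounts to evaluating the Gaussian log density, log N(x; μ, σ) = −½·log(2π) − log σ − ½·((x − μ)/σ)², for each score and component. In the sketch below, the component parameters are passed explicitly; this signature is hypothetical and differs from the member function, which reads its own fitted parameters.

```cpp
#include <cmath>
#include <vector>

// log N(x; mu, sigma) = -0.5*log(2*pi) - log(sigma) - 0.5*((x - mu)/sigma)^2
double logGaussDensity(double x, double mu, double sigma) {
  const double half_log_2pi = 0.5 * std::log(2.0 * std::acos(-1.0));
  const double z = (x - mu) / sigma;
  return -half_log_2pi - std::log(sigma) - 0.5 * z * z;
}

// Hypothetical signature: fills both log-density vectors for a set of scores.
void fillLogDensitiesSketch(const std::vector<double>& x_scores,
                            double mu_inc, double sigma_inc,
                            double mu_cor, double sigma_cor,
                            std::vector<double>& incorrect_density,
                            std::vector<double>& correct_density) {
  incorrect_density.clear();
  correct_density.clear();
  for (double x : x_scores) {
    incorrect_density.push_back(logGaussDensity(x, mu_inc, sigma_inc));
    correct_density.push_back(logGaussDensity(x, mu_cor, sigma_cor));
  }
}
```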
| void fillLogDensitiesGumbel | ( | const std::vector< double > & | x_scores, | 
| std::vector< double > & | incorrect_density, | ||
| std::vector< double > & | correct_density | ||
| ) | 
Writes the log densities of the Gumbel and Gauss distributions for a set of scores into the two vectors; incorrect_density holds the log densities of the incorrectly assigned sequences.
| bool fit | ( | std::vector< double > & | search_engine_scores, | 
| const String & | outlier_handling | ||
| ) | 
fits the distributions to the data points (search_engine_scores). Estimated parameters for the distributions are saved in member variables; computeProbability can be used afterwards. Uses two Gaussians for fitting, and Gauss+Gauss or Gumbel+Gauss for plotting and computing the final probabilities.
| search_engine_scores | a vector which holds the data points | 
| bool fit | ( | std::vector< double > & | search_engine_scores, | 
| std::vector< double > & | probabilities, | ||
| const String & | outlier_handling | ||
| ) | 
fits the distributions to the data points (search_engine_scores) and writes the computed probabilities into the given vector (the second one).
| search_engine_scores | a vector which holds the data points | 
| probabilities | a vector which holds the probability for each data point after running this function. Any existing content is overwritten. | 
| bool fitGumbelGauss | ( | std::vector< double > & | search_engine_scores, | 
| const String & | outlier_handling | ||
| ) | 
fits the distributions to the data points (search_engine_scores). Estimated parameters for the distributions are saved in member variables; computeProbability can be used afterwards. Uses Gumbel+Gauss for everything and fits the Gumbel distribution by maximizing the log-likelihood.
| search_engine_scores | a vector which holds the data points | 
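Maximizing the Gumbel log-likelihood for a single, unweighted sample reduces to one equation in the scale β: β − mean(x) + Σ xᵢe^(−xᵢ/β) / Σ e^(−xᵢ/β) = 0. The location then follows as α = −β·log(mean of e^(−xᵢ/β)). The sketch below solves the β equation by bisection, which is valid because its left side increases monotonically in β. This is a swapped-in technique: GumbelMaxLikelihoodFitter may use a different optimizer, and inside EM the sums would be weighted by the responsibilities. The names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <vector>

struct GumbelParams {
  double alpha;  // location
  double beta;   // scale
};

// g(beta) = beta - mean + sum(x_i w_i) / sum(w_i), w_i = exp(-(x_i - xmin)/beta).
// Shifting by xmin cancels in the ratio and keeps exp() from overflowing.
double gumbelScoreEquation(const std::vector<double>& x, double mean, double xmin, double beta) {
  double sw = 0.0, swx = 0.0;
  for (double xi : x) {
    const double w = std::exp(-(xi - xmin) / beta);
    sw += w;
    swx += w * xi;
  }
  return beta - mean + swx / sw;
}

// Precondition: x is non-empty and not all equal.
GumbelParams fitGumbelMLE(const std::vector<double>& x) {
  const double n = static_cast<double>(x.size());
  const double mean = std::accumulate(x.begin(), x.end(), 0.0) / n;
  const double xmin = *std::min_element(x.begin(), x.end());
  double lo = 1e-9, hi = 1.0;  // g(lo) < 0 for non-constant data
  while (gumbelScoreEquation(x, mean, xmin, hi) < 0.0) hi *= 2.0;  // bracket the root
  for (int it = 0; it < 200; ++it) {
    const double mid = 0.5 * (lo + hi);
    if (gumbelScoreEquation(x, mean, xmin, mid) < 0.0) lo = mid; else hi = mid;
  }
  const double beta = 0.5 * (lo + hi);
  double sw = 0.0;
  for (double xi : x) sw += std::exp(-(xi - xmin) / beta);
  // alpha = -beta * log(mean(exp(-x_i / beta))), rewritten with the xmin shift.
  return {xmin - beta * std::log(sw / n), beta};
}
```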
| const String getBothGnuplotFormula | ( | const GaussFitter::GaussFitResult & | incorrect, | 
| const GaussFitter::GaussFitResult & | correct | ||
| ) | const | 
returns the gnuplot formula of the fitted mixture distribution.
| GaussFitter::GaussFitResult getCorrectlyAssignedFitResult () const | inline |
returns the estimated parameters for correctly assigned sequences. A fit must be performed first.
| const String getGaussGnuplotFormula | ( | const GaussFitter::GaussFitResult & | params | ) | const | 
returns the gnuplot formula of the fitted gauss distribution.
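As an illustration of what a gnuplot formula for a fitted Gaussian can look like, the sketch below builds a scaled Gaussian expression A * exp(-0.5 * ((x - x0) / sigma) ** 2) from the A, x0 and sigma of a fit. The exact string that getGaussGnuplotFormula returns is not documented here, so treat the format and the function name as hypothetical.

```cpp
#include <sstream>
#include <string>

// Hypothetical: gnuplot expression for a scaled Gaussian with amplitude A,
// mean x0 and standard deviation sigma ("**" is gnuplot's power operator).
std::string gaussGnuplotFormula(double A, double x0, double sigma) {
  std::ostringstream os;  // default formatting: up to 6 significant digits
  os << A << " * exp(-0.5 * ((x - " << x0 << ") / " << sigma << ") ** 2)";
  return os.str();
}
```

A string built this way can be used directly in a gnuplot `plot` command.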
| static double getGumbel_ (double x, const GaussFitter::GaussFitResult &params) | inline, static |
computes the Gumbel density at position x with parameters params.
References GaussFitter::GaussFitResult::sigma, and GaussFitter::GaussFitResult::x0.
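Since x0 and sigma serve as the location α and scale β, a Gumbel density can be sketched as follows. This sketch assumes the standard Gumbel (maximum) form, f(x) = z·e^(−z)/β with z = e^(−(x−α)/β); the source does not spell out which orientation the "inverse Gumbel" refers to, and the function name is hypothetical.

```cpp
#include <cmath>

// Standard Gumbel (maximum) density with location alpha (x0) and scale beta (sigma):
// f(x) = z * exp(-z) / beta, where z = exp(-(x - alpha) / beta).
double gumbelDensity(double x, double alpha, double beta) {
  const double z = std::exp(-(x - alpha) / beta);
  return z * std::exp(-z) / beta;
}
```

At x = α, z = 1 and the density equals 1/(e·β), the mode of the distribution.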
| const String getGumbelGnuplotFormula | ( | const GaussFitter::GaussFitResult & | params | ) | const | 
returns the gnuplot formula of the fitted Gumbel distribution. Only x0 and sigma are used, as the location parameter alpha and the scale parameter beta, respectively.
| GaussFitter::GaussFitResult getIncorrectlyAssignedFitResult () const | inline |
returns the estimated parameters for incorrectly assigned sequences. A fit must be performed first.
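| GumbelMaxLikelihoodFitter::GumbelDistributionFitResult getIncorrectlyAssignedGumbelFitResult () const | inline |
returns the estimated Gumbel parameters for incorrectly assigned sequences. A fit must be performed first.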
| double getNegativePrior () const | inline |
returns the estimated negative prior probability.
| static double getScore_ (const StringList &requested_score_types, const PeptideHit &hit, const String &actual_score_type) | static, private |
gets a specific score (either the main score [preferred] or a meta value)
| requested_score_types | the requested score types in order of preference (each is also tested with a "_score" suffix) | 
| hit | the PeptideHit to extract from | 
| actual_score_type | the current score type, which takes preference if matching | 
| double getSmallestScore () | inline |
returns the smallest score used in the last fit
| PosteriorErrorProbabilityModel & operator= (const PosteriorErrorProbabilityModel &rhs) | private |
assignment operator (not implemented)
| void plotTargetDecoyEstimation (std::vector< double > &target, std::vector< double > &decoy) | |
plots the estimated distribution against target and decoy hits
| std::pair<double, double> pos_neg_mean_weighted_posteriors | ( | const std::vector< double > & | x_scores, | 
| const std::vector< double > & | incorrect_posteriors | ||
| ) | 
| x_scores | Scores observed "on the x-axis" | 
| incorrect_posteriors | Posteriors/responsibilities of belonging to the incorrect component | 
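Given the per-point posteriors of the incorrect component, the M-step means are responsibility-weighted averages. The sketch below assumes, from the function name only, that the returned pair is ordered (positive/correct, negative/incorrect); the name is hypothetical.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Responsibility-weighted means of both components.
// Assumed pair order: (positive/correct, negative/incorrect).
std::pair<double, double> posNegWeightedMeans(const std::vector<double>& x_scores,
                                              const std::vector<double>& incorrect_posteriors) {
  double w_pos = 0.0, w_neg = 0.0, s_pos = 0.0, s_neg = 0.0;
  for (std::size_t i = 0; i < x_scores.size(); ++i) {
    const double r = incorrect_posteriors[i];  // P(incorrect | x_i)
    w_neg += r;
    s_neg += r * x_scores[i];
    w_pos += 1.0 - r;
    s_pos += (1.0 - r) * x_scores[i];
  }
  return {s_pos / w_pos, s_neg / w_neg};
}
```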
| std::pair<double, double> pos_neg_sigma_weighted_posteriors | ( | const std::vector< double > & | x_scores, | 
| const std::vector< double > & | incorrect_posteriors, | ||
| const std::pair< double, double > & | means | ||
| ) | 
| x_scores | Scores observed "on the x-axis" | 
| incorrect_posteriors | Posteriors/responsibilities of belonging to the incorrect component | 
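The matching standard deviations weight each squared deviation from the component mean by the same responsibilities. This sketch uses the maximum-likelihood form (dividing by the total weight) and assumes the same (positive, negative) pair order for both `means` and the result; the name is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Responsibility-weighted standard deviations around the given means.
// Assumed pair order: (positive/correct, negative/incorrect).
std::pair<double, double> posNegWeightedSigmas(const std::vector<double>& x_scores,
                                               const std::vector<double>& incorrect_posteriors,
                                               const std::pair<double, double>& means) {
  double w_pos = 0.0, w_neg = 0.0, v_pos = 0.0, v_neg = 0.0;
  for (std::size_t i = 0; i < x_scores.size(); ++i) {
    const double r = incorrect_posteriors[i];
    const double d_pos = x_scores[i] - means.first;
    const double d_neg = x_scores[i] - means.second;
    w_neg += r;
    v_neg += r * d_neg * d_neg;
    w_pos += 1.0 - r;
    v_pos += (1.0 - r) * d_pos * d_pos;
  }
  return {std::sqrt(v_pos / w_pos), std::sqrt(v_neg / w_neg)};
}
```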
| void processOutliers_ (std::vector< double > &x_scores, const String &outlier_handling) const | private |
handles outliers in x_scores according to outlier_handling (see the outlier_handling parameter)
| static double transformScore_ (const String &engine, const PeptideHit &hit, const String &current_score_type) | static, private |
transforms different score types to a range and score orientation that the model can handle (the engine string is assumed to be upper-case)
| engine | the search engine name as in the SE param object | 
| hit | the PeptideHit to extract transformed scores from | 
| current_score_type | the current score type of the PeptideIdentification, which takes precedence | 
| void tryGnuplot | ( | const String & | gp_file | ) | 
tries to invoke gnuplot on the file to create a PDF automatically
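| static void updateScores (const PosteriorErrorProbabilityModel &PEP_model, const String &search_engine, const Int charge, const bool prob_correct, const bool split_charge, std::vector< ProteinIdentification > &protein_ids, std::vector< PeptideIdentification > &peptide_ids, bool &unable_to_fit_data, bool &data_might_not_be_well_fit) | static |
updates score entries with PEP (or 1-PEP) estimates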
| PEP_model | the PEP model used to update the scores | 
| search_engine | the score of search_engine will be updated | 
| charge | identifications with the given charge will be updated | 
| prob_correct | report 1-PEP | 
| split_charge | if charge states have been treated separately | 
| protein_ids | the protein identifications | 
| peptide_ids | the peptide identifications | 
| unable_to_fit_data | there was a problem fitting the data (probabilities are all smaller than 0 or larger than 1) | 
| data_might_not_be_well_fit | fit was successful but of bad quality (probabilities are all smaller than 0.8 and larger than 0.2) | 
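| GaussFitter::GaussFitResult correctly_assigned_fit_param_ | private |
stores Gauss parameters
| const String(PosteriorErrorProbabilityModel::* getNegativeGnuplotFormula_)(const GaussFitter::GaussFitResult &params) const | private |
points either to getGumbelGnuplotFormula or getGaussGnuplotFormula, depending on whether the Gumbel or the Gaussian distribution is used for incorrectly assigned sequences.
| const String(PosteriorErrorProbabilityModel::* getPositiveGnuplotFormula_)(const GaussFitter::GaussFitResult &params) const | private |
points to getGumbelGnuplotFormula
| GumbelMaxLikelihoodFitter::GumbelDistributionFitResult incorrectly_assigned_fit_gumbel_param_ | private |
| GaussFitter::GaussFitResult incorrectly_assigned_fit_param_ | private |
stores parameters for incorrectly assigned sequences. If the Gumbel fit was used, A can be ignored; in this case, x0 and sigma are the location parameter alpha and the scale parameter beta, respectively.
| double max_correctly_ | private |
peak of the Gauss distribution (correctly assigned sequences)
| double max_incorrectly_ | private |
peak of the incorrectly assigned sequences distribution
| double negative_prior_ | private |
stores the final prior probability for negative peptides
| double smallest_score_ | private |
smallest score that was used for fitting the model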
 1.8.16