|  | OpenMS
    2.6.0
    | 
2D hierarchical clustering implementation optimized for large data sets containing many small clusters i.e. dimensions of clusters << dimension of entire dataset More...
#include <OpenMS/COMPARISON/CLUSTERING/GridBasedClustering.h>
| Public Types | |
| typedef GridBasedCluster::Point | Point | 
| cluster centre, cluster bounding box, grid index  More... | |
| typedef GridBasedCluster::Rectangle | Rectangle | 
| typedef ClusteringGrid::CellIndex | CellIndex | 
| typedef std::multiset< MinimumDistance >::const_iterator | MultisetIterator | 
| typedef boost::unordered::unordered_multimap< int, MultisetIterator >::const_iterator | NNIterator | 
|  Public Types inherited from ProgressLogger | |
| enum | LogType { CMD, GUI, NONE } | 
| Possible log types.  More... | |
| Public Member Functions | |
| GridBasedClustering (Metric metric, const std::vector< double > &data_x, const std::vector< double > &data_y, const std::vector< int > &properties_A, const std::vector< int > &properties_B, std::vector< double > grid_spacing_x, std::vector< double > grid_spacing_y) | |
| initialises all data structures  More... | |
| GridBasedClustering (Metric metric, const std::vector< double > &data_x, const std::vector< double > &data_y, std::vector< double > grid_spacing_x, std::vector< double > grid_spacing_y) | |
| initialises all data structures  More... | |
| void | cluster () | 
| performs the hierarchical clustering (merges clusters until their dimension exceeds that of cell)  More... | |
| void | extendClustersY () | 
| extends clusters in y-direction if possible (merges clusters further in y-direction, i.e. clusters can now span multiple cells)  More... | |
| void | removeSmallClustersY (double threshold_y) | 
| removes clusters with bounding box dimension in y-direction below certain threshold  More... | |
| std::map< int, GridBasedCluster > | getResults () const | 
| returns final results (mapping of cluster indices to clusters)  More... | |
|  Public Member Functions inherited from ProgressLogger | |
| ProgressLogger () | |
| Constructor.  More... | |
| ~ProgressLogger () | |
| Destructor.  More... | |
| ProgressLogger (const ProgressLogger &other) | |
| Copy constructor.  More... | |
| ProgressLogger & | operator= (const ProgressLogger &other) | 
| Assignment Operator.  More... | |
| void | setLogType (LogType type) const | 
| Sets the progress log that should be used. The default type is NONE!  More... | |
| LogType | getLogType () const | 
| Returns the type of progress log being used.  More... | |
| void | startProgress (SignedSize begin, SignedSize end, const String &label) const | 
| Initializes the progress display.  More... | |
| void | setProgress (SignedSize value) const | 
| Sets the current progress.  More... | |
| void | endProgress () const | 
| Ends the progress display.  More... | |
| void | nextProgress () const | 
| increment progress by 1 (according to range begin-end)  More... | |
| Private Member Functions | |
| void | init_ (const std::vector< double > &data_x, const std::vector< double > &data_y, const std::vector< int > &properties_A, const std::vector< int > &properties_B) | 
| initialises all data structures  More... | |
| bool | mergeVeto_ (const GridBasedCluster &c1, const GridBasedCluster &c2) const | 
| checks if two clusters can be merged Each point in a cluster can (optionally) have two properties A and B. Properties A need to be the same, properties B need to differ in each cluster. Methods checks if this is violated in the merged cluster.  More... | |
| bool | findNearestNeighbour_ (const GridBasedCluster &cluster, int cluster_index) | 
| determines the nearest neighbour for each cluster  More... | |
| void | eraseMinDistance_ (const std::multiset< MinimumDistance >::const_iterator it) | 
| remove minimum distance object and its related data  More... | |
| Private Attributes | |
| Metric | metric_ | 
| metric for measuring the distance between points in the 2D plane  More... | |
| ClusteringGrid | grid_ | 
| grid on which the position of the clusters are registered used in cluster method  More... | |
| std::map< int, GridBasedCluster > | clusters_ | 
| list of clusters maps cluster indices to clusters  More... | |
| std::map< int, GridBasedCluster > | clusters_final_ | 
| list of final clusters i.e. clusters that are no longer merged  More... | |
| std::multiset< MinimumDistance > | distances_ | 
| list of minimum distances stores the smallest of the distances in the head  More... | |
| boost::unordered::unordered_multimap< int, std::multiset< MinimumDistance >::const_iterator > | reverse_nns_ | 
| reverse nearest neighbor lookup table for finding out which clusters need to be updated faster  More... | |
| boost::unordered::unordered_map< int, std::multiset< MinimumDistance >::const_iterator > | distance_it_for_cluster_idx_ | 
| cluster index to distance iterator lookup table for finding out which clusters need to be updated faster  More... | |
| Additional Inherited Members | |
|  Static Protected Member Functions inherited from ProgressLogger | |
| static String | logTypeToFactoryName_ (LogType type) | 
| Return the name of the factory product used for this log type.  More... | |
|  Protected Attributes inherited from ProgressLogger | |
| LogType | type_ | 
| time_t | last_invoke_ | 
| ProgressLoggerImpl * | current_logger_ | 
|  Static Protected Attributes inherited from ProgressLogger | |
| static int | recursion_depth_ | 
2D hierarchical clustering implementation optimized for large data sets containing many small clusters i.e. dimensions of clusters << dimension of entire dataset
The clustering problem therefore simplifies to a number of local clustering problems. Each of the local problems can be solved on a couple of adjacent cells on a larger grid. The grid spacing is determined by the expected typical cluster size in this region.
Each data point can have two additional properties A and B. In each cluster all properties A need to be the same, all properties B different.
| typedef ClusteringGrid::CellIndex CellIndex | 
| typedef std::multiset<MinimumDistance>::const_iterator MultisetIterator | 
| typedef boost::unordered::unordered_multimap<int, MultisetIterator>::const_iterator NNIterator | 
| typedef GridBasedCluster::Point Point | 
cluster centre, cluster bounding box, grid index
| typedef GridBasedCluster::Rectangle Rectangle | 
| 
 | inline | 
initialises all data structures
| data_x | x-coordinates of points to be clustered | 
| data_y | y-coordinates of points to be clustered | 
| properties_A | property A of points (same in each cluster) | 
| properties_B | property B of points (different in each cluster) | 
| grid_spacing_x | grid spacing in x-direction | 
| grid_spacing_y | grid spacing in y-direction | 
References GridBasedClustering< Metric >::init_().
| 
 | inline | 
initialises all data structures
| data_x | x-coordinates of points to be clustered | 
| data_y | y-coordinates of points to be clustered | 
| grid_spacing_x | grid spacing in x-direction | 
| grid_spacing_y | grid spacing in y-direction | 
References GridBasedClustering< Metric >::init_().
| 
 | inline | 
performs the hierarchical clustering (merges clusters until their dimension exceeds that of cell)
References ClusteringGrid::addCluster(), OpenMS::Constants::c, GridBasedClustering< Metric >::clusters_, GridBasedClustering< Metric >::distance_it_for_cluster_idx_, GridBasedClustering< Metric >::distances_, ProgressLogger::endProgress(), DBoundingBox< D >::enlarge(), GridBasedClustering< Metric >::eraseMinDistance_(), GridBasedClustering< Metric >::findNearestNeighbour_(), GridBasedCluster::getBoundingBox(), GridBasedCluster::getCentre(), ClusteringGrid::getIndex(), GridBasedCluster::getPoints(), GridBasedCluster::getPropertiesB(), GridBasedCluster::getPropertyA(), DPosition< D, TCoordinateType >::getX(), DPosition< D, TCoordinateType >::getY(), GridBasedClustering< Metric >::grid_, DIntervalBase< D >::maxPosition(), DIntervalBase< D >::minPosition(), ClusteringGrid::removeCluster(), GridBasedClustering< Metric >::reverse_nns_, ProgressLogger::setProgress(), and ProgressLogger::startProgress().
Referenced by GridBasedClustering< Metric >::extendClustersY(), GridBasedClustering< Metric >::findNearestNeighbour_(), and GridBasedClustering< Metric >::init_().
| 
 | inlineprivate | 
remove minimum distance object and its related data
Remove the distance object behind it from distances_ and remove all corresponding data from the auxiliary data structures reverse_nns_ and distance_it_for_cluster_idx_.
| it | Iterator of distance to be removed from distances_ | 
References GridBasedClustering< Metric >::distance_it_for_cluster_idx_, GridBasedClustering< Metric >::distances_, and GridBasedClustering< Metric >::reverse_nns_.
Referenced by GridBasedClustering< Metric >::cluster().
| 
 | inline | 
extends clusters in y-direction if possible (merges clusters further in y-direction, i.e. clusters can now span multiple cells)
References ClusteringGrid::addCluster(), GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::clusters_final_, ClusteringGrid::getClusters(), ClusteringGrid::getGridSpacingX(), ClusteringGrid::getGridSpacingY(), ClusteringGrid::getIndex(), GridBasedClustering< Metric >::grid_, ClusteringGrid::isNonEmptyCell(), and ClusteringGrid::removeCluster().
| 
 | inlineprivate | 
determines the nearest neighbour for each cluster
| cluster | cluster for which the nearest neighbour should be found | 
| cluster_index | index of cluster | 
| Should | the cluster be removed from the cluster list? | 
References GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::clusters_, GridBasedClustering< Metric >::clusters_final_, GridBasedClustering< Metric >::distance_it_for_cluster_idx_, GridBasedClustering< Metric >::distances_, GridBasedCluster::getCentre(), ClusteringGrid::getClusters(), ClusteringGrid::getIndex(), GridBasedClustering< Metric >::grid_, ClusteringGrid::isNonEmptyCell(), GridBasedClustering< Metric >::mergeVeto_(), GridBasedClustering< Metric >::metric_, and GridBasedClustering< Metric >::reverse_nns_.
Referenced by GridBasedClustering< Metric >::cluster(), and GridBasedClustering< Metric >::init_().
| 
 | inline | 
returns final results (mapping of cluster indices to clusters)
References GridBasedClustering< Metric >::clusters_final_.
| 
 | inlineprivate | 
initialises all data structures
| data_x | x-coordinates of points to be clustered | 
| data_y | y-coordinates of points to be clustered | 
| properties_A | property A of points (same in each cluster) | 
| properties_B | property B of points (different in each cluster) | 
References ClusteringGrid::addCluster(), GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::clusters_, GridBasedClustering< Metric >::findNearestNeighbour_(), ClusteringGrid::getIndex(), GridBasedClustering< Metric >::grid_, seqan::position(), and ClusteringGrid::removeCluster().
Referenced by GridBasedClustering< Metric >::GridBasedClustering().
| 
 | inlineprivate | 
checks if two clusters can be merged Each point in a cluster can (optionally) have two properties A and B. Properties A need to be the same, properties B need to differ in each cluster. Methods checks if this is violated in the merged cluster.
| c1 | cluster 1 | 
| c2 | cluster 2 | 
References seqan::find(), GridBasedCluster::getPropertiesB(), and GridBasedCluster::getPropertyA().
Referenced by GridBasedClustering< Metric >::findNearestNeighbour_().
| 
 | inline | 
removes clusters with bounding box dimension in y-direction below certain threshold
| threshold_y | minimal dimension of the cluster bounding box | 
References GridBasedClustering< Metric >::clusters_final_, DIntervalBase< D >::maxY(), and DIntervalBase< D >::minY().
| 
 | private | 
list of clusters maps cluster indices to clusters
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::findNearestNeighbour_(), and GridBasedClustering< Metric >::init_().
| 
 | private | 
list of final clusters i.e. clusters that are no longer merged
Referenced by GridBasedClustering< Metric >::extendClustersY(), GridBasedClustering< Metric >::findNearestNeighbour_(), GridBasedClustering< Metric >::getResults(), and GridBasedClustering< Metric >::removeSmallClustersY().
| 
 | private | 
cluster index to distance iterator lookup table for finding out which clusters need to be updated faster
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::eraseMinDistance_(), and GridBasedClustering< Metric >::findNearestNeighbour_().
| 
 | private | 
list of minimum distances stores the smallest of the distances in the head
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::eraseMinDistance_(), and GridBasedClustering< Metric >::findNearestNeighbour_().
| 
 | private | 
grid on which the position of the clusters are registered used in cluster method
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::extendClustersY(), GridBasedClustering< Metric >::findNearestNeighbour_(), and GridBasedClustering< Metric >::init_().
| 
 | private | 
metric for measuring the distance between points in the 2D plane
Referenced by GridBasedClustering< Metric >::findNearestNeighbour_().
| 
 | private | 
reverse nearest neighbor lookup table for finding out which clusters need to be updated faster
Referenced by GridBasedClustering< Metric >::cluster(), GridBasedClustering< Metric >::eraseMinDistance_(), and GridBasedClustering< Metric >::findNearestNeighbour_().
 1.8.16
 1.8.16