CEA, LITEN, Laboratoire des systèmes solaires (L2S), 50 av. du Lac Léman,
73375 LE BOURGET DU LAC - CEDEX
Lespinats, S., Grando, D., Maréchal, E., Hakimi, M.A., Tenaillon, O., and
Bastien, O. (2011) “How phylogenetic inference methods can benefit from
Multi Dimensional Scaling evolution.” Evolutionary Bioinformatics, 7,
pp. 61-85.
Abstract: Whatever the phylogenetic method, genetic sequences are often
described as strings of characters, so molecular sequences can be viewed
as elements of a multi-dimensional space. As a consequence, studying
motion in this space (i.e., the evolutionary process) must deal with the
surprising properties of high-dimensional spaces, such as the
concentration of measure phenomenon.
To study how these features might influence phylogeny reconstructions, we
examined a particularly popular method: the Fitch-Margoliash algorithm,
which belongs to the Least Squares methods. We show that the Least
Squares methods are closely related to Multi Dimensional Scaling.
Indeed, the criteria for Fitch-Margoliash and Sammon’s mapping are
similar. However, prolific research in Multi Dimensional Scaling has
since produced methods that outperform Sammon’s mapping.
Least Squares methods for tree reconstruction can now take advantage of
these improvements. However, “false neighborhoods” and “tears” are the
two main risks in the dimensionality reduction field: a “false
neighborhood” corresponds to widely separated data in the original space
that are found close together in the representation space, whereas
neighboring data that are displayed in remote positions constitute a
“tear”. To address this problem, we applied the concepts of “continuity”
and “trustworthiness” to the tree reconstruction field, which limit the
risk of “false neighborhoods” and “tears”. We also point out the
concentration of measure phenomenon as a source of error and introduce
here new criteria to build phylogenies with improved preservation of
distances and robustness.
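
To make the quantities named in the abstract concrete, here is a minimal Python sketch and not the authors' implementation. It evaluates the Fitch-Margoliash weighted least squares criterion and Sammon's stress on the same pair of distance matrices, then scores false neighborhoods and tears with rank-based trustworthiness and continuity. The abstract gives no formulas, so the standard textbook forms are assumed here (weights 1/d² for Fitch-Margoliash, 1/d for Sammon, and the rank-based definitions of Venna and Kaski, valid for k < n/2). All function names are hypothetical, and the "representation" is a simple coordinate projection of random points standing in for a tree or map.

```python
import numpy as np

def pairwise_distances(points):
    """Euclidean distance matrix for an (n, dim) array of points."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def fitch_margoliash_criterion(d_orig, d_repr):
    """Sum over pairs i<j of (d_ij - p_ij)^2 / d_ij^2 (assumed standard form)."""
    i, j = np.triu_indices_from(d_orig, k=1)
    d, p = d_orig[i, j], d_repr[i, j]
    return float(np.sum((d - p) ** 2 / d ** 2))

def sammon_stress(d_orig, d_repr):
    """Sum over pairs i<j of (d_ij - p_ij)^2 / d_ij, divided by the sum of d_ij."""
    i, j = np.triu_indices_from(d_orig, k=1)
    d, p = d_orig[i, j], d_repr[i, j]
    return float(np.sum((d - p) ** 2 / d) / np.sum(d))

def neighbour_ranks(dist):
    """ranks[i, j] = rank of j among neighbours of i (self = 0, nearest = 1)."""
    d = dist.copy()
    np.fill_diagonal(d, -1.0)  # force each item to be its own rank-0 neighbour
    order = np.argsort(d, axis=1, kind="stable")
    n = d.shape[0]
    ranks = np.empty_like(order)
    ranks[np.arange(n)[:, None], order] = np.arange(n)
    return ranks

def trustworthiness(d_orig, d_repr, k):
    """1 minus a normalised penalty for false neighbours (close only in d_repr)."""
    n = d_orig.shape[0]
    r_orig, r_repr = neighbour_ranks(d_orig), neighbour_ranks(d_repr)
    intruders = (r_repr >= 1) & (r_repr <= k) & (r_orig > k)
    penalty = np.sum((r_orig - k)[intruders])
    return float(1.0 - 2.0 * penalty / (n * k * (2 * n - 3 * k - 1)))

def continuity(d_orig, d_repr, k):
    """Same score with the roles swapped: penalises tears (close only in d_orig)."""
    return trustworthiness(d_repr, d_orig, k)

# Illustrative data only: 30 random points in 10 dimensions, "represented"
# by keeping their first two coordinates.
rng = np.random.default_rng(0)
points = rng.normal(size=(30, 10))
d_high = pairwise_distances(points)
d_low = pairwise_distances(points[:, :2])

print("Fitch-Margoliash criterion:", fitch_margoliash_criterion(d_high, d_low))
print("Sammon stress:             ", sammon_stress(d_high, d_low))
print("trustworthiness (k=5):     ", trustworthiness(d_high, d_low, 5))
print("continuity (k=5):          ", continuity(d_high, d_low, 5))
```

Under these assumed forms, the two least squares criteria differ only in how each squared residual is weighted by the original distance, which is the sense in which they are similar. A perfect representation gives zero for both criteria and 1.0 for both trustworthiness and continuity.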