METHOD OF MASS SPECTROMETRY IDENTIFICATION OF AN UNKNOWN MICROORGANISM SUB-GROUP FROM A SET OF REFER
Abstract:
A method of identifying, by mass spectrometry, the subgroup of an unknown microorganism from among a set of reference subgroups, comprising a step of constructing a knowledge base and an associated subgroup classification model from the acquisition of at least one set of learning spectra of microorganisms identified as belonging to the subgroups of a group, and comprising:
○ the construction of an adjustment model for correcting the mass-to-charge offsets of the acquired spectra from reference mass-to-charge ratios common to the different subgroups;
○ the mass-to-charge adjustment of all the peak lists of the learning spectra;
○ the construction of a subgroup classification model and the associated knowledge base from the adjusted learning spectra.
Publication number: FR3035410A1 Application number: FR1553731 Filing date: 2015-04-24 Publication date: 2016-10-28 Inventor: Valerie Monnin Applicant: Biomerieux SA; Main IPC class:
Description:
[0001] The invention relates to the field of the classification of microorganisms, in particular bacteria, by means of spectrometry. The invention finds particular application in the identification of microorganisms by mass spectrometry, for example MALDI-TOF (acronym for "matrix-assisted laser desorption/ionization time of flight"). It is known to use spectrometry or spectroscopy to identify microorganisms, and more particularly bacteria. To do this, a sample of an unknown microorganism to be identified is prepared, and a mass spectrum of the sample is acquired and pre-processed, in particular to remove noise, smooth the signal and subtract the baseline. A step of detecting the peaks present in the acquired spectra is then carried out. The peaks of the spectrum thus obtained are then classified using classification tools associated with the data of a knowledge base constructed from lists of reference peaks, each associated with an identified microorganism or group of microorganisms (strain, class, order, family, genus, species, etc.). More particularly, the identification of microorganisms by classification conventionally consists of:
- a first step of constructing, by supervised learning, a classification model associated with a knowledge base from so-called "learning" mass spectra of microorganisms whose groups, in particular species, are known beforehand, the classification model and the knowledge base defining a set of rules distinguishing these different groups;
- a second step of identifying a particular unknown microorganism by:
o acquiring a mass spectrum thereof; and
o applying to the acquired spectrum the classification model, in relation to the previously constructed associated knowledge base, to determine at least one group, more particularly one species, to which the unknown microorganism belongs.
Typically, a mass spectrometry identification apparatus comprises a mass spectrometer and an information processing unit that receives the measured spectra and implements the aforementioned second step. The first step is implemented by the manufacturer of the apparatus, who builds the knowledge base and the classification model and integrates them into the machine before it is operated by a customer. Some apparatuses, on the other hand, allow their users to develop their own knowledge bases and associated classification models.
[0002] In order to acquire a mass spectrum of a sample by MALDI-TOF spectrometry, the sample is deposited on a support comprising different receiving locations, also called a plate. The sample is then covered with a matrix that allows the crystallization of the sample.
[0003] The use of a mass spectrometry identification apparatus requires regular calibration in order to ensure the accuracy and precision of the mass-to-charge ratios measured in the analyzed spectrum. Two conventional techniques exist and are performed routinely to guarantee these parameters.
[0004] External calibration is a routine technique performed on most mass spectrometry apparatuses. For this technique, a standard mixture (or external calibrant) is deposited on the plate, which supports the sample in the apparatus, at a location distinct from that of the sample. External calibration consists in adjusting the mass-to-charge (m/z) axis of the mass spectra of the standard mixture, whose content is known, so that the observed peaks coincide with their theoretical positions, a list of reference peaks corresponding to characteristic mass-to-charge ratios having previously been defined for this standard. During external calibration, the presence of the reference peaks corresponding to these characteristic mass-to-charge ratios is sought in the list of peaks of the spectrum of the standard mixture, with a given tolerance on the expected position.
The spectrum of the standard mixture is then realigned according to the observed position of each of the reference mass-to-charge ratios found. Subsequently, the transformation applied to realign the spectrum of the standard mixture is applied to the spectrum of the sample to be analyzed in order to realign its position on the m/z axis. This method has the advantage of being able to work on very small quantities of sample without risk of signal suppression. However, external calibration is not precise enough for the classification of microorganisms, especially at taxonomic levels below the species level.
[0005] Internal calibration is used to obtain maximum measurement accuracy. This technique can complement external calibration in order to improve the precision of the positions of the mass-to-charge ratios of the spectrum. This calibration is termed internal because a standard mixture (or internal calibrant) is incorporated into the sample to be analyzed before the acquisition.
[0006] In the context of MALDI-TOF spectrometry, the matrix (α-cyano-4-hydroxycinnamic acid (α-HCCA), etc.) is deposited on the combined sample and standard in order to co-crystallize them. Thus, in the analysis of the acquired mass spectrum, the assignment of the known mass-to-charge ratios of the compounds of the standard mixture makes it possible to calculate calibration constants. These constants are then used to calculate the mass-to-charge ratios of the unknown compounds.
[0007] However, the main drawback of this method is the risk that the signal of the analyte ions present in the sample is suppressed by too high a concentration of the standard mixture. In the context of a biological sample preparation method using tryptic digestion, the mass-to-charge positions corresponding to trypsin can also be used as an internal calibrant.
[0008] It is known that the identification of certain species or subspecies of microorganisms by MALDI-TOF spectrometry requires high accuracy of the acquired spectra in order to differentiate groups of closely related species. More particularly, the distinction of closely related species and the identification of microorganisms at the subspecies or strain level (strains of different serotypes, pathotypes, genotypes, etc.) are notoriously complex. These subgroups in fact have very similar spectra, which cannot be distinguished with the knowledge bases and classification algorithms developed for identification at the group level, for example at the higher taxonomic level. This limit is due in particular to the resolution achieved by mass spectrometry apparatuses, but also to the variability of acquisitions on the same apparatus as well as between different apparatuses. For example, a shift in the position of the peaks of the spectra can be observed across several acquisitions of the same sample. This offset may be visible, for example, for acquisitions of a sample deposited at a single location or at multiple locations of the sample support. This variability leads to an uncertainty in the mass-to-charge measurement which does not interfere with identification at the group level, but which prevents discrimination at levels below the group, such as subgroups, typically at levels below the species of the microorganism. The object of the invention is to reduce this variability by improving the accuracy of the position of the peaks of the acquired mass spectra. The invention also aims to provide a method that does not modify existing sample preparation methods and can be used directly with existing protocols, in particular without the use of an additional external or internal standard.
Another object of the invention is to provide a method for identifying a microorganism at the subgroup level following identification at the group level. The invention thus relates to a method for identifying the group of an unknown microorganism, followed by the identification of the subgroup of this same microorganism, by mass spectrometry. For this purpose, the invention relates to a method of identifying by mass spectrometry the subgroup of an unknown microorganism among a set of reference subgroups, comprising:
- a first step of constructing a knowledge base and an associated group classification model from a set of learning spectra of microorganisms identified as belonging to said group;
- a second step of constructing a knowledge base and an associated subgroup classification model from the acquisition of at least one set of learning spectra of microorganisms identified as belonging to said subgroups of the group, comprising:
o the construction of an adjustment model for correcting the mass-to-charge offsets of the acquired spectra from reference mass-to-charge ratios common to the different subgroups;
o the mass-to-charge adjustment of all the peak lists of the learning spectra;
o the construction of a subgroup classification model and the associated knowledge base from the adjusted learning spectra;
- a third step of classifying an unknown microorganism into a subgroup, comprising:
o the acquisition of at least one spectrum of the unknown microorganism;
o the classification of said spectrum into a group according to said group classification model and said group knowledge base;
o the mass-to-charge adjustment of the entire peak list of said spectrum according to the adjustment model, correcting the mass-to-charge offsets of the spectrum of the unknown microorganism;
o the classification into a subgroup of said group by said subgroup classification model and the associated knowledge base.
The invention thus makes it possible to identify the group of an unknown microorganism, followed directly by the identification of the subgroup (subspecies, strain type, etc.) of this same microorganism by mass spectrometry, without making a second acquisition of the mass spectrum of the sample containing the unknown microorganism and without adding an internal standard. The invention thus has the same effect on mass-to-charge accuracy as the use of an internal standard, and makes it possible to offer the user of the mass spectrometry apparatus a routine procedure identical to that of a simple identification at the group level. In addition, the invention is particularly economical in the time required for the development of the knowledge base at the subgroup level and for the routine classification of unknown microorganisms, and incurs no additional cost for an external or internal standard. Most of the steps of the method according to the invention can also be automated in order to limit the number of interventions necessary for the construction of the classification model and the associated knowledge base, as well as for the routine analysis of unknown microorganisms.
The terms group and subgroup refer to a hierarchical tree representation of the types of reference microorganisms used in the construction of the knowledge bases, for example in terms of evolution and/or phenotype and/or genotype. The subgroup level always corresponds to a subset of the group. In the case of bacteria, the group can thus be a species in the sense of conventional analysis techniques, and a subgroup may then be a subspecies of the group or a particular phenotype of the group. However, a group may also consist of several species that are not distinguished by conventional analysis techniques, so that each corresponding subgroup may correspond to one or more of these species. Advantageously, a step of optimizing the list of reference mass-to-charge ratios, based on the quality of the adjustment obtained following at least one of the adjustment steps, can be carried out.
[0009] The identification and selection of reference mass-to-charge ratios common to the different subgroups can be based on mass-to-charge ratios known a priori or deduced according to statistical criteria of the frequency of presence of the peaks in each subgroup of the group. For this, the method according to the invention may comprise a step consisting in:
- discretizing the mass-to-charge space of each of the spectra of each subgroup;
- detecting the presence or absence of peaks around the mass-to-charge ratios defined by the discretization step, according to a tolerance factor;
- filtering said mass-to-charge ratios as a function of the frequency of presence of peaks for each of the subgroups;
- approximating the position of the mass-to-charge ratios retained.
The discretization step may advantageously be performed over a range of mass-to-charge ratios that is restricted compared to the mass-to-charge interval obtained following the acquisition of the spectrum.
The approximation step may advantageously consist in seeking a position representative of the distribution of the positions of the peaks present around each of the mass-to-charge ratios retained. The identification of the reference mass-to-charge ratios of the method can thus be based on a statistical analysis of the frequency of presence of the peaks of the spectra acquired for the construction of a knowledge base of the subgroups, both for the construction of the classification model and for its routine use.
[0010] Advantageously, the method comprises, during the step of constructing a knowledge base and an associated subgroup classification model:
- the construction of a second adjustment model for correcting the mass-to-charge offsets of the acquired spectra from reference mass-to-charge ratios common to the different subgroups;
- a second step of mass-to-charge adjustment of all the peak lists of the learning spectra using the second adjustment model.
Advantageously, the method comprises a step of checking the adjustment following at least one of the mass-to-charge adjustment steps during the step of constructing a knowledge base and an associated subgroup classification model.
[0011] The parameters of the adjustment model(s) can advantageously be obtained by a so-called robust estimation method. Advantageously, the reference mass-to-charge ratios common to the various known subgroups are selected by a step consisting in:
- detecting the presence or absence of peaks around the reference mass-to-charge ratios according to a tolerance factor;
- filtering said mass-to-charge ratios as a function of the frequency of presence of peaks for each of the subgroups; and/or
- approximating the position of the reference mass-to-charge ratios retained.
Advantageously, the step of constructing a knowledge base and an associated subgroup classification model comprises a step of discretizing the mass-to-charge ratios of the acquired spectra.
Advantageously, the step of constructing a knowledge base and an associated subgroup classification model comprises a step of processing the intensities of the acquired spectra. Advantageously, the step of constructing a knowledge base and an associated subgroup classification model comprises a step of checking the quality of the acquired spectra. According to one embodiment, the mass spectrometry is MALDI-TOF spectrometry. The invention also relates to a device for identifying a microorganism by mass spectrometry, comprising: a mass spectrometer capable of producing mass spectra of microorganisms to be identified; and a computing unit capable of identifying the microorganisms associated with the mass spectra produced by the spectrometer by implementing a method according to any one of the preceding claims.
[0012] BRIEF DESCRIPTION OF THE FIGURES
The invention will be better understood on reading the following description, given solely by way of example, with reference to the accompanying drawings, in which:
- FIG. 1 is a flowchart of the method according to the invention;
- FIG. 2 is a flowchart of step 100 of the method according to the invention;
- FIG. 3a is a flowchart of step 200 of the method according to the invention;
- FIG. 3b is a flowchart of step 240 of the method according to the invention;
- FIG. 3c is a flowchart of step 300 of the method according to the invention;
- FIG. 3d is a flowchart of step 400 of the method according to the invention;
- FIG. 4 is a plot, for each subgroup A to E of a group considered, of the frequency of each peak obtained on the spectra corresponding to said subgroup in the range 5330 Th to 5410 Th;
- FIGS. 5a to 5c are a plot of an example of iterative calculation, in three iterations, of three approximate mass-to-charge ratios;
- FIG. 6 is a plot, for two mass-to-charge ratios Alpha and Beta, of the frequency of presence of a peak for each subgroup A to F, of the median of the residuals for each subgroup, and of the interquartile range of the residuals for each subgroup;
- FIGS. 7a and 7b, FIGS. 8a and 8b, and FIGS. 9a and 9b are each a plot of the result of a first adjustment and of a second adjustment according to the invention;
- FIGS. 10a and 10b are a plot of the result of an adjustment according to the invention;
- FIGS. 11a and 11b illustrate the accuracy of an adjustment according to the invention;
- FIG. 12 is a plot of the result of an adjustment according to the invention;
- a further figure concerns the subgroup level of the microorganism.
DETAILED DESCRIPTION OF THE INVENTION
A method according to the invention will now be described in connection with the flowchart of FIG. 1.
[0013] The method comprises a first step 100 of constructing a knowledge base and a group classification model from a set of learning spectra of microorganisms identified as belonging to said group. In general, this step can be carried out in multiple ways in order to obtain, for one or more given groups, a knowledge base and a classification model for determining whether the mass spectrum of an unknown microorganism belongs to said group from the list of peaks of the acquired spectrum. An exemplary embodiment of this first step 100 is detailed in FIG. 2. Step 100 can thus begin with a step 110 of acquiring a set of learning mass spectra of one or more microorganisms identified as belonging to a group, and an external calibration mass spectrum, by means of MALDI-TOF mass spectrometry (acronym for "matrix-assisted laser desorption/ionization time of flight"). MALDI-TOF mass spectrometry is well known per se and will therefore not be described in more detail below. See, for example, Jackson O. Lay, "MALDI-TOF Spectrometry of Bacteria", Mass Spectrometry Reviews, 2001, 20, 172-194.
The acquired spectra are then pre-processed, in particular to denoise them, smooth them or remove their baseline if necessary, in a manner known per se. The acquisition of a mass spectrum may consist of several laser shots on the sample in question, at one or more positions of the sample on the support. The spectrum obtained then consists of a "synthetic" spectrum obtained by the sum, the mean, the median or any other method intended to weight the contribution of the intensities of the spectrum of each shot to the formation of the "synthetic" spectrum. This accumulation of shots, well known in itself, makes it possible in particular to increase the signal-to-noise ratio by limiting the influence of non-recurring phenomena due to the sample, the apparatus, the acquisition conditions, etc. A step of detecting the peaks present in the acquired spectra is then carried out at 120, for example by means of a peak detection algorithm based on the detection of local maxima. A list of peaks is thus produced for each acquired spectrum, comprising the location, also called the mass-to-charge value, and the intensity of the peaks of the spectrum. Advantageously, the peaks are detected in a predetermined range $[m_{min}; m_{max}]$ in Thomson (Th), preferably the range $[m_{min}; m_{max}] = [3000; 17000]$ Th. Indeed, it has been observed that sufficient information for the identification of microorganisms is contained in this mass-to-charge range, and that there is therefore no need to take a wider range into account.
[0014] The process continues, at 130, with an external calibration step using the acquired calibration mass spectrum. External calibration consists in adjusting the m/z axis of the mass spectra of a reference sample, the content of which is known, so that the observed peaks coincide with their theoretical positions.
For example, an Escherichia coli strain serves as an external standard for detecting deviations and correcting mass-to-charge offsets. A list of reference peaks corresponding to characteristic mass-to-charge ratios has previously been defined for this calibrant. During this calibration step, the presence of the reference peaks corresponding to these characteristic mass-to-charge ratios is sought in the list of peaks of the spectrum, with a given tolerance on the expected position. The spectrum is then realigned according to the observed position. The transformation used to realign the acquired calibrant peaks onto the reference peaks is then used to realign the peaks of the sample spectrum. According to an exemplary implementation of this step 130, for each acquisition group (for example 4x4 locations on an acquisition support for a VITEK® MS apparatus marketed by the Applicant), an Escherichia coli calibration strain (ATCC 8739) is deposited at the location reserved for the calibration of said acquisition group. Once the spectrum of the calibration strain has been acquired, the presence of 11 reference peaks corresponding to mass-to-charge ratios characteristic of Escherichia coli is sought, with a tolerance of 0.07% around the expected position of the peaks. If at least 8 of the 11 peaks are within the expected position range, the peaks of the spectrum of the calibration strain are realigned according to their reference positions. The transformation used to realign the acquired calibrant peaks onto the reference peaks, for example a first- or second-order polynomial transformation, is then used to realign the peaks of the spectra of all the other locations of the acquisition group. Optionally, and as a precaution, the acquisition operation can be stopped if a minimum number of detected reference peaks is not reached, for example if fewer than 8 characteristic mass-to-charge ratios are detected.
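The external calibration of step 130 can be illustrated with a minimal sketch. It assumes peak positions are available as plain arrays; the function names `match_reference_peaks` and `fit_calibration` are hypothetical, and matching each reference to its nearest observed peak is one possible reading of the search "with a given tolerance", not necessarily the one used in the apparatus. The thresholds (0.07%, at least 8 matches, first- or second-order polynomial) are those given in the text.

```python
import numpy as np
from numpy.polynomial import Polynomial

def match_reference_peaks(observed_mz, reference_mz, tolerance=0.0007):
    """Pair each reference m/z with the nearest observed peak within the relative tolerance."""
    observed_mz = np.asarray(observed_mz, dtype=float)
    pairs = []
    if observed_mz.size == 0:
        return pairs
    for ref in reference_mz:
        nearest = observed_mz[np.argmin(np.abs(observed_mz - ref))]
        if abs(nearest - ref) <= tolerance * ref:  # 0.07% window by default
            pairs.append((float(nearest), float(ref)))
    return pairs

def fit_calibration(pairs, min_matches=8, degree=1):
    """Fit a polynomial mapping observed m/z to reference m/z.

    Returns None when too few reference peaks were found, in which case
    the acquisition may be stopped as a precaution.
    """
    if len(pairs) < min_matches:
        return None
    observed, reference = zip(*pairs)
    return Polynomial.fit(observed, reference, deg=degree)
```

If a model is returned, the same transformation can be applied to the peak list of any other location of the acquisition group, for example `corrected = model(np.asarray(sample_mz))`.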
It is also possible to extend the tolerance around the expected positions of the reference peaks to 0.15%. In this case, if at least 5 characteristic mass-to-charge ratios are detected with the new, expanded tolerance, it is preferable first to realign the peaks of the calibrant spectrum and then to search a second time for a larger number of reference peaks with the initial tolerance of 0.07%. If a larger number of peaks is then found, the peaks of the spectrum are realigned a second time according to the transformation found. The acquisition, pre-processing and peak detection of the other samples making up the acquisition group can also be performed after the calibration step, by applying the transformation found to the peak lists corresponding to the spectra of the samples. Alternatively, step 130 may consist of, or be complemented by, an internal adjustment step using a calibrant mixed with the sample during the acquisition step 110. Following the calibration step 130, the method according to the invention can comprise a step 140 of checking the quality of the acquired spectra and/or a step 150 of discretizing the mass-to-charge ratios and/or a step 155 of processing the intensity of the spectra. The order in which steps 140, 150 and 155 are carried out may vary. Optionally, the process therefore continues, at 140, with a step of checking the quality of the acquired spectra. For example, it can be verified that the number of identified peaks is sufficient: too low a number of peaks does not allow the acquired spectrum to be used for the classification of the microorganism considered, while too high a number can be indicative of noise. In addition, a test based on the intensity of the detected peaks can also be performed during this step of checking the quality of the spectra. Following step 130, and optionally step 140, a step 150 of discretizing, or "binning", the mass-to-charge ratios can be carried out. To do this, the Thomson range $[m_{min}; m_{max}]$ is subdivided into intervals, or "bins", whose width is for example constant, or constant on a logarithmic scale. For each interval comprising several peaks, a single peak may be kept, advantageously the peak having the highest intensity. This method is therefore used to align the spectra and reduce the effects of slight mass-to-charge position errors, the alignment obtained being directly related to the size of the discretization intervals. A reduced list is thus produced from each of the lists of peaks of the measured spectra. Each component of the list corresponds to an interval of the discretization and has as its value the intensity of the peak kept for this interval, the value "0" signifying that no peak has been detected in this interval. Following step 130, optionally step 140, and optionally step 150, a step 155 of processing the intensity of the spectra can also be performed. Intensity is a quantity that varies greatly from one spectrum to another and/or from one spectrometer to another. Because of this variability, it is difficult to take the raw intensity values into account in the classification tools. This step can be performed on the raw spectra, before the mass-to-charge discretization, or after step 150. It can notably consist of a step of thresholding the intensities, the intensities below the threshold being considered as zero and the intensities above the threshold being kept. As a variant, the intensity lists obtained by this thresholding or following a discretization step can be "binarized" by setting the value of a component of the list to "1" when a peak is greater than the threshold or present in the corresponding discretization interval, and to "0" when a peak is below the threshold or when no peak is present in this discretization interval.
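The binning of step 150 and the thresholding and binarization of step 155 can be sketched as follows. This is an illustration under stated assumptions: constant-width bins over [3000; 17000] Th, and hypothetical function names (`bin_peaks`, `threshold`, `binarize`) and parameters (`bin_width`, `level`) that do not come from the original text.

```python
import numpy as np

def bin_peaks(mz, intensity, mz_min=3000.0, mz_max=17000.0, bin_width=1.0):
    """Constant-width binning: keep the highest intensity per bin, 0 where no peak falls."""
    n_bins = int(np.ceil((mz_max - mz_min) / bin_width))
    binned = np.zeros(n_bins)
    for m, inten in zip(mz, intensity):
        if mz_min <= m < mz_max:  # peaks outside the range are ignored
            idx = int((m - mz_min) // bin_width)
            binned[idx] = max(binned[idx], inten)
    return binned

def threshold(binned, level):
    """Set intensities below the threshold to zero and keep the others."""
    binned = np.asarray(binned, dtype=float)
    return np.where(binned >= level, binned, 0.0)

def binarize(binned, level=0.0):
    """1 where the bin holds a peak above the level, 0 otherwise."""
    return (np.asarray(binned) > level).astype(int)
```

With `level=0.0`, `binarize` simply encodes the presence or absence of a peak in each discretization interval.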
Alternatively, the intensity lists obtained are transformed to a logarithmic scale, the value of a component being set to "0" when no peak is present in the interval or when a peak is below the threshold. Finally, each of the intensity lists (i.e. raw, thresholded, "binarized" or transformed to a logarithmic scale) can be normalized.
[0015] Advantageously, the intensity lists are transformed to a logarithmic scale and then normalized. This makes the subsequent training of the classification algorithms more robust. From these lists of peaks, each corresponding to a learning spectrum of a microorganism identified as belonging to a group, the process continues with the creation, in step 160, of a knowledge base per group and, in step 170, of a group classification model. The knowledge base comprises the parameters of the classification model and the information on the group of each microorganism used for learning, and is used to classify an unknown microorganism among the groups of the learning microorganisms.
[0016] A group classification model is established in step 170 using known supervised classification algorithms such as the nearest neighbor method, logistic regression, discriminant analysis, classification trees, regression methods of the "LASSO" or "elastic net" type, or algorithms of the SVM type (acronym for "support vector machine").
[0017] According to FIG. 1, the method continues in step 200 with the construction of a knowledge base and a subgroup classification model from a set of learning spectra of microorganisms identified as belonging to the preceding group and to the subgroups of this group.
[0018] Step 200 is detailed in FIG. 3a. This step 200 comprises the acquisition 210 of at least one spectrum of a microorganism whose group and subgroup are known, for each of said subgroups. This acquisition step is performed similarly to step 110.
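The logarithmic transformation and normalization of the intensity lists, followed by a group classification as in steps 160 and 170, can be sketched as follows. The choices made here are assumptions for illustration only: `log(1 + x)` as the logarithmic scale, the Euclidean norm for normalization, and a 1-nearest-neighbor rule (one of the algorithms listed in the text) over a knowledge base represented as a list of `(vector, label)` pairs. All function names are hypothetical.

```python
import numpy as np

def log_normalize(binned):
    """Log-scale the non-zero intensities, then scale the vector to unit Euclidean norm."""
    binned = np.asarray(binned, dtype=float)
    out = np.zeros_like(binned)
    positive = binned > 0
    out[positive] = np.log1p(binned[positive])  # empty bins stay at 0
    norm = np.linalg.norm(out)
    return out / norm if norm > 0 else out

def nearest_neighbor_group(query, knowledge_base):
    """Return the group label of the closest learning vector (1-nearest-neighbor)."""
    vectors = np.array([vec for vec, _ in knowledge_base])
    labels = [label for _, label in knowledge_base]
    distances = np.linalg.norm(vectors - query, axis=1)
    return labels[int(np.argmin(distances))]
```

In practice any of the supervised algorithms named in the text could replace the nearest-neighbor rule; it is used here only because it needs no training step.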
The acquired spectrum is thus pre-processed, in particular to denoise it, smooth it or remove its baseline if necessary. The method continues at step 220 with the detection of the peaks of the spectra similarly to step 120, the external or internal calibration of each of the spectra similarly to step 130, and optionally the checking of their quality similarly to step 140. Preferably, step 210 can be performed simultaneously with step 110 of the method in order to limit the number of manual operations required for the acquisition steps. Steps 110 and 210 then consist of a single step of acquiring a spectrum of a microorganism whose group and subgroup are known. In the same way, step 220 is then performed simultaneously with steps 120 and 130 and optionally step 140.
[0019] Following step 220, the spectra of the microorganisms whose group and subgroup are known are represented as a set of peak lists, each peak list corresponding to a microorganism whose group and subgroup are known. From these lists of peaks, the method continues with a step 230 of constructing an adjustment model for correcting the mass-to-charge offsets of the acquired spectra. This construction step 230 comprises first of all a step of identifying and selecting reference mass-to-charge ratios common to the different subgroups. Indeed, a mass-to-charge ratio that is not common to the various subgroups of the group would be a discriminating mass-to-charge ratio, and an adjustment model based on it would therefore be biased. Ideally, these mass-to-charge ratios are common to the different subgroups and have no other peaks in their immediate vicinity in the spectrum, so as to obtain a list of mass-to-charge ratios particularly characteristic of the group. According to a first alternative 240, these reference mass-to-charge ratios common to the different subgroups are deduced from statistical criteria.
[0020] As illustrated in FIG. 3b, these reference mass-to-charge ratios can in particular be obtained by a first step 241 of discretizing the range of mass-to-charge ratios of interest. This step can be performed over a mass-to-charge interval of the peak lists that is restricted with respect to the mass-to-charge interval obtained from the acquisition and known to contain most of the mass-to-charge ratios characteristic of microorganisms, for example the range 3000 to 17000 Th. This interval is discretized:
o either with a regular mass-to-charge interval (e.g. 1 Th);
o or with an increasing mass-to-charge interval.
[0021] A set $\{m(i)\}$; $i = 1, \dots, I$ is thus obtained, corresponding to the set of mass-to-charge ratios obtained after discretization, each value $m(i)$ being separated from the value $m(i+1)$ by a mass-to-charge interval called the discretization step. A tolerance factor $t_1$ is defined, which defines an interval around each of the mass-to-charge ratios $m(i)$. For the method to work properly, the discretization chosen must guarantee at least that the interval defined by the tolerance factor $t_1$ around one mass-to-charge ratio overlaps the interval around the next one, ideally with an overlap of half the width of the interval. A fine discretization step is therefore preferable to too wide a discretization step, so as not to discard a mass-to-charge ratio that would be characteristic of the subgroups and therefore useful for the adjustment. This fine discretization step thus makes it possible to limit the loss of information. One way of guaranteeing the overlap of the interval of one mass-to-charge ratio with that of the next is to define the discretization iteratively by the formula $m(i+1) = m(i) + t_1 \cdot m(i)$, where $t_1$ is the tolerance factor and $m(1)$ is initialized to the minimum bound of the range of mass-to-charge ratios of interest. The discretization step is thus equal to $t_1 \cdot m(i)$.
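The iterative discretization of step 241 can be sketched as follows. The function name `discretize` and its parameter names are hypothetical; the recurrence itself, $m(i+1) = m(i) + t_1 \cdot m(i)$, is the one given in the text.

```python
def discretize(mz_min=3000.0, mz_max=17000.0, tol=0.0008):
    """Build the grid m(i+1) = m(i) + tol * m(i), starting at mz_min."""
    grid = [mz_min]
    while grid[-1] * (1.0 + tol) <= mz_max:
        grid.append(grid[-1] * (1.0 + tol))
    return grid

def tolerance_interval(m, tol=0.0008):
    """Interval [m - m*tol, m + m*tol] searched around a grid value m."""
    return (m - m * tol, m + m * tol)
```

With this construction the upper bound of the interval around $m(i)$ coincides with $m(i+1)$, the centre of the next interval, so consecutive intervals overlap by half the width of the next interval, which is the ideal overlap described above.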
For example, for the mass-to-charge range of interest from 3000 to 17000 Th with a tolerance t1 = 0.0008, the discretization step at 3000 Th is 2.4 Th while the discretization step at 17000 Th is 13.6 Th. [0022] Another, simpler, way of ensuring the overlap of the intervals of one mass-to-charge ratio with respect to the next is to define the discretization from the lower bound of the mass-to-charge range of interest by the formula m(i+1) = m(i) + t1 * m(1). For example, for the range 3000 to 17000 Th with a tolerance t1 = 0.0008, the discretization step applicable to the entire range is 3000 * 0.0008 = 2.4 Th. A second step 242 then consists in detecting the presence or absence of one or more peaks in the interval defined by t1 around each mass-to-charge ratio m(i) resulting from the discretization. [0023] For each spectrum, the tolerance t1 accounts for the uncertainty on the position of the sought mass-to-charge ratio in each acquired spectrum. Thus, let X = {x(s)}; s = 1, ..., S be the list of mass-to-charge ratios of the peaks of the spectrum under consideration and t1 the tolerance factor applied to the mass-to-charge ratios. The operation consists in searching for a peak among X = {x(s)}; s = 1, ..., S in the interval defined by the tolerance around the mass-to-charge ratio m(i) considered, namely the interval [m(i) - m(i) * t1; m(i) + m(i) * t1]. In order to optimize computation time, the presence of a single peak in the interval is coded as 1, and the absence of a peak or the presence of several peaks is coded as 0, so as to obtain a presence matrix in the form of Table 1 below, T being the number of acquired learning spectra:

                 Subgroup   m(1)   m(2)   ...   m(I-1)   m(I)
Spectrum (1)     A          0      0      ...   1        1
Spectrum (2)     A          0      0      ...   1        1
...
Spectrum (T-1)   B          0      1      ...   1        1
Spectrum (T)     B          1      1      ...   1        1
Table 1

From this matrix, a third step 243 consists in filtering the mass-to-charge ratios according to the frequency of presence of peaks per subgroup. The frequency of presence of a peak in the interval defined by the tolerance around each mass-to-charge ratio m(i) defined during the discretization step is calculated per subgroup and expressed as a percentage. [0024] This step is illustrated in FIG. 4, which shows, for each subgroup A to E of the group considered, the frequency of each peak obtained on the spectra of that subgroup in the interval 5330 Th to 5410 Th. The mass-to-charge ratios m(i) whose percentage of presence exceeds a threshold, for example 60% (represented by a dashed horizontal line in FIG. 4), for each of the subgroups to be discriminated are then retained. [0025] A set {m(j)}; j = 1, ..., J; J ≤ I of mass-to-charge ratios among {m(i)}; i = 1, ..., I retained after the frequency filtering step is thus obtained. For example, according to Table 2 below, only the mass-to-charge ratios m(I-1) and m(I) are retained after filtering.

Frequency (%) by subgroup   m(1)   m(2)   ...   m(I-1)   m(I)
A                           0      0      ...   100      100
B                           50     100    ...   100      100
Table 2

From this list of mass-to-charge ratios filtered according to a frequency threshold, the next step 244 consists in approximating the position of the retained mass-to-charge ratios. The retained mass-to-charge ratios have a coarse precision that depends on the discretization carried out in step 241. Their position is therefore approximated in order to obtain a position representative of the distribution of the peak positions found around the mass-to-charge ratio m(j). This representative position may for example be computed by estimating a Gaussian function representative of the distribution of the peaks and searching for the position of the extremum of this function.
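Steps 242 and 243 (presence coding, then filtering on the per-subgroup frequency) can be sketched as follows. This is an illustration only: the function names and the toy spectra are hypothetical, and the 60% threshold is the example value given in the description.

```python
from collections import defaultdict


def presence_row(peaks, grid, t1):
    """Code 1 when exactly one peak lies in [m - m*t1, m + m*t1], else 0."""
    row = []
    for m in grid:
        hits = sum(1 for x in peaks if m - m * t1 <= x <= m + m * t1)
        row.append(1 if hits == 1 else 0)
    return row


def frequent_in_all_subgroups(spectra, grid, t1, threshold=60.0):
    """Keep grid values whose presence frequency (%) exceeds the threshold in every subgroup."""
    counts = defaultdict(lambda: [0] * len(grid))
    sizes = defaultdict(int)
    for subgroup, peaks in spectra:
        sizes[subgroup] += 1
        for i, bit in enumerate(presence_row(peaks, grid, t1)):
            counts[subgroup][i] += bit
    return [
        m for i, m in enumerate(grid)
        if all(100.0 * counts[g][i] / sizes[g] > threshold for g in sizes)
    ]


# Toy data (hypothetical): two subgroups with two spectra each.
grid = [5338.0, 5340.0, 5360.0]
spectra = [
    ("A", [5339.6, 5500.0]),
    ("A", [5339.9]),
    ("B", [5339.7, 5360.5]),
    ("B", [5340.1, 5360.2]),
]
kept = frequent_in_all_subgroups(spectra, grid, t1=0.0008)
print(kept)
```

In this toy case the value 5360 Th is seen only in subgroup B and is therefore dropped, which is the behaviour the filtering step is meant to enforce: a ratio present in one subgroup only would be discriminant rather than common to the group.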
Another method of approximating the position may consist in iteratively calculating the median of the positions of the peaks present around the mass-to-charge ratio m(j). For this median-based method, let M(j) be the theoretical value of the position of the mass-to-charge ratio. Let M(j, 0) = m(j); M(j, n+1) is obtained by the following algorithm. For each spectrum, a peak is searched for among X = {x(s)}; s = 1, ..., S in the interval around the mass-to-charge ratio M(j, n), namely the interval [M(j, n) - M(j, n) * t2; M(j, n) + M(j, n) * t2], where t2 is a tolerance factor around the position of M(j, n), the tolerance factor t1 being greater than or equal to t2. The value of M(j, n+1) is then obtained as the median of the positions of the peaks retained over all the spectra in the interval around M(j, n). The stopping criterion of this optimization step may for example be a predefined number of iterations and/or a check on the increment. [0026] For example, where a predefined number of iterations is used: let N be the predefined number of iterations; M(j) is approximated by M̂(j) = M(j, N). Where the method checks the increment, let ε be a fixed tolerance for the approximate calculation of M(j). The iterations end as soon as |M(j, n+1) - M(j, n)| < ε, and M(j) is then approximated by M̂(j) = M(j, n+1). In order to ensure that the increment-controlled method terminates and to save computation time, a maximum number of iterations N can also be predefined. The stopping criterion based on a predefined number of iterations N = 3 is preferred for the implementation of the invention. An example of iterative calculation in three iterations is illustrated for three mass-to-charge ratios in FIGS. 5a to 5i. In FIG.
5a, the median M(j, 1) calculated from the positions of the peaks around M(j, 0) is equal to 5339.6 Th and is represented by a dashed vertical line. In a second iteration, illustrated in FIG. 5d, the median M(j, 2) is calculated from the positions of the peaks around M(j, 1); a new value equal to 5339.8 Th is then obtained. In FIG. 5d, M(j, 1) is represented by a solid vertical line and M(j, 2) by a dashed vertical line. In a third iteration, illustrated in FIG. 5g, the median M(j, 3) is calculated from the positions of the peaks around M(j, 2); a value equal to 5339.8 Th is again obtained, demonstrating the convergence of the method. In FIG. 5g, M(j, 2) is represented by a solid vertical line and M(j, 3) by a dashed vertical line. The calculation is stopped at this third iteration and the approximated value of 5339.8 Th is retained for the mass-to-charge ratio of 5338 Th retained by the discretization. [0027] A similar three-iteration calculation is performed for each of the theoretical mass-to-charge ratios obtained from the discretization. Thus, FIGS. 5b, 5e and 5h illustrate the convergence of the mass-to-charge ratio retained by the discretization, M(j+1, 0) = m(j+1), from a value of 5340 Th towards an approximated value M(j+1, 3) of 5339.8 Th. Similarly, FIGS. 5c, 5f and 5i illustrate the convergence of the mass-to-charge ratio retained by the discretization, M(j+2, 0) = m(j+2), from a value of 5342 Th to an approximated value M(j+2, 3) of 5339.8 Th. Following the approximation step 244, the method continues with a step 245 of removing identical approximated mass-to-charge ratios. Following the approximation, a list {m(j), M̂(j)}; j = 1, ..., J is obtained. Because the initial discretization is chosen so as to guarantee the overlap of the intervals of one mass-to-charge ratio with respect to the next, several retained mass-to-charge ratios m(j) can correspond to the same approximated mass-to-charge ratio.
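The iterative median approximation of step 244 can be sketched as follows. This is a simplified illustration: the function name, the toy peak positions and the value of t2 are hypothetical, and the sketch pools every peak found in the window across spectra, which is one possible reading of "the peaks retained over all the spectra".

```python
from statistics import median


def approximate_position(m_j, spectra, t2, n_iter=3, eps=None):
    """Iterate M(j, n+1) = median of the peaks found within M(j, n) * (1 +/- t2)."""
    current = m_j
    for _ in range(n_iter):
        low, high = current * (1.0 - t2), current * (1.0 + t2)
        found = [x for peaks in spectra for x in peaks if low <= x <= high]
        if not found:  # no peak in the window: keep the last estimate
            break
        updated = median(found)
        converged = eps is not None and abs(updated - current) < eps
        current = updated
        if converged:  # optional stop on the size of the increment
            break
    return current


# Toy peak lists (hypothetical), one peak per spectrum near 5339.8 Th.
spectra = [[5339.6], [5339.8], [5339.9], [5339.7], [5340.0]]
results = [approximate_position(m, spectra, t2=0.0005) for m in (5338.0, 5340.0, 5342.0)]
print(results)
```

Starting from three neighbouring grid values, the three estimates collapse onto the same position, which is why a de-duplication step is needed after the approximation.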
The approximations M̂(j) of such mass-to-charge ratios are in this case equal or almost equal, depending on the precision retained for the calculation. Table 3 below illustrates the positions of the retained and approximated mass-to-charge ratios in the range 5338 to 5398 Th for an exemplary implementation of the invention with a discretization step of 2 Th.

Position of retained        Position of approximated     Position of approximated
mass-to-charge ratios m(j)  mass-to-charge ratios M̂(j)   mass-to-charge ratios M̂(j) kept
5338                        5339.8                       5339.8
5340                        5339.8
5342                        5339.8
5378                        5381.2                       5381.2
5380                        5381.2
5382                        5381.5
5384                        5381.2
5394                        5397.4                       5397.4
5396                        5397.4
5398                        5397.4
Table 3

A single approximation is thus kept for each value. [0028] A new list R = {R(k)}; k = 1, ..., K; K < J of reference mass-to-charge ratios of the group is thus obtained. According to a second alternative 250, these mass-to-charge ratios common to the various subgroups are known a priori. This knowledge can for example be obtained from the list of peaks used as reference peaks for classification at the group level. Since these peaks are known to represent the group, they have a high probability of being usable as reference mass-to-charge ratios within the meaning of the present invention. These mass-to-charge ratios may also be known from prior analyses by mass spectrometry or by other analytical methods that give the theoretical mass-to-charge ratio of a peak for a molecule or protein characteristic of the various subgroups and therefore of the group considered. Optionally, in order to improve the selection of these mass-to-charge ratios, a step similar to step 242 of detecting the presence or absence of one or more peaks in a tolerance interval around each reference mass-to-charge ratio known a priori can be performed. This step may be followed by a step similar to step 243 of filtering the mass-to-charge ratios according to the frequency of presence of peaks per subgroup.
The frequency of presence of a peak in the interval defined by the tolerance around each reference mass-to-charge ratio known a priori is calculated per subgroup and expressed as a percentage. Alternatively or additionally, the step similar to step 242 may be followed by a step similar to step 244 of approximating the position of the reference mass-to-charge ratios known a priori. [0029] Once the list of reference mass-to-charge ratios has been obtained from step 240 or 250, the method continues by adjusting the mass-to-charge ratios of all the peak lists in step 260 of FIG. 3a. For each spectrum represented by a peak list, the objective of step 260 is to adjust the positions of all peaks by learning a transformation model from the positions of the reference mass-to-charge ratios. The parameters of this model are estimated so that the peaks observed on the spectrum coincide as closely as possible with the approximated positions of the reference mass-to-charge ratios obtained at the end of step 240 or with the theoretical positions of the reference mass-to-charge ratios obtained at the end of step 250. [0030] For each spectrum in peak-list format:
- let X = {x(s)}; s = 1, ..., S be the list of mass-to-charge ratios of the peaks of the spectrum considered;
- let R = {R(k)}; k = 1, ..., K be the list of reference mass-to-charge ratios;
- let t3 be the tolerance factor around the position of the mass-to-charge ratio R(k), for example t3 = 0.0004, the tolerance factor t2 being greater than or equal to t3.
For each reference mass-to-charge ratio R(k), the method searches for a mass-to-charge ratio among {x(s)}; s = 1, ..., S in the interval defined by the tolerance around R(k), namely the interval [R(k) - R(k) * t3; R(k) + R(k) * t3]. In some cases, for example when the mass-to-charge offset of the spectrum is too large or when the spectrum contains only a few peaks, no peak is observed in the interval considered.
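The search for observed peaks around each reference position can be sketched as follows. This is a hedged illustration: the function name and the toy values are hypothetical, and taking the closest peak when several fall in the window is an assumption, since the description does not say how such ties are resolved.

```python
def match_references(peaks, references, t3=0.0004):
    """Pair each reference R(k) with an observed peak inside R(k) * (1 +/- t3).

    Assumption: if several peaks fall in the window, the closest one is used.
    References with no peak in their window produce no pair.
    """
    pairs = []
    for ref in references:
        window = [x for x in peaks if abs(x - ref) <= ref * t3]
        if window:
            pairs.append((ref, min(window, key=lambda x: abs(x - ref))))
    return pairs


# Toy values (hypothetical): the third reference has no peak within tolerance.
references = [5339.8, 5381.2, 9000.0]
peaks = [5340.9, 5382.4, 9010.0]
pairs = match_references(peaks, references)
print(pairs)
```

The resulting (reference, observed) pairs are the observations on which the transformation model is then fitted; references that are not matched simply do not contribute.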
[0031] Let {R(l); x(l)}, l ∈ {1, ..., K}, be the sequence of observations, that is, the list of reference mass-to-charge ratios R(l) for which a peak at position x(l) has been observed on the spectrum considered. The transformation to be applied to the mass-to-charge ratios of the spectrum is modeled by R = f(x), where the model f may be:
- a linear regression model: R = β0 + β1 x, β0 and β1 being the constants of the model;
- a polynomial regression model of degree 2: R = β0 + β1 x + β2 x², β0, β1 and β2 being the constants of the model;
- a nonlinear or non-parametric regression model, such as local regression models of the spline, Loess or Lowess type, kernel regression models, etc.
A linear regression model is preferred for the implementation of the invention in order to limit the prediction error when the model is extrapolated outside the mass-to-charge domain used to estimate its parameters. The need to extrapolate arises, for example, when the selected reference mass-to-charge ratios cover only part of the mass-to-charge domain of interest or when the mass-to-charge offset of the spectrum considered is too large relative to the tolerance t3. [0032] The parameters of the model can be estimated by the ordinary least squares method. However, outliers can be observed for some mass-to-charge ratios, due for example to the specificity of the sample tested or to an initial mass-to-charge offset that is too large over part of the mass-to-charge range. The least squares method is very sensitive to outliers, even in small numbers. In order to obtain parameter estimates that are not influenced by outliers, it is preferable to use a so-called robust estimation method that simultaneously detects outliers and estimates the model parameters.
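A robust linear fit of R = β0 + β1 x can be sketched as follows, using Tukey biweight weights inside an iteratively reweighted least squares loop. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the tuning constant c = 4.685, the MAD-based scale estimate, the fixed number of iterations and the toy data are all choices made for the illustration.

```python
from statistics import median


def weighted_line(xs, ys, ws):
    """Weighted least-squares line y = b0 + b1 * x."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def tukey_irls(xs, ys, c=4.685, n_iter=20):
    """Robust line fit: IRLS with Tukey biweight weights (c is an assumed tuning constant)."""
    b0, b1 = weighted_line(xs, ys, [1.0] * len(xs))  # start from ordinary least squares
    for _ in range(n_iter):
        res = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
        scale = median(abs(r) for r in res) / 0.6745 or 1e-12  # MAD-based scale
        ws = [(1.0 - (r / (c * scale)) ** 2) ** 2 if abs(r) < c * scale else 0.0 for r in res]
        b0, b1 = weighted_line(xs, ys, ws)
    return b0, b1


# Toy data (hypothetical): observed positions xs, reference positions ys, one outlier.
xs = [4000.0, 5000.0, 6000.0, 7000.0, 8000.0, 9000.0, 10000.0]
noise = [0.05, -0.04, 0.03, -0.02, 0.04, -0.05, 0.01]
ys = [2.0 + 0.9995 * x + e for x, e in zip(xs, noise)]
ys[3] += 15.0  # outlier

ols = weighted_line(xs, ys, [1.0] * len(xs))
robust = tukey_irls(xs, ys)
print(ols[0] + ols[1] * 7000.0, robust[0] + robust[1] * 7000.0)
```

On this toy data the ordinary least squares line is pulled towards the outlier, whereas the reweighted fit ends up giving it a zero weight and stays close to the underlying line.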
Tukey's "biweight" estimator is thus preferred for the implementation of the invention, preferably solved with the iteratively reweighted least squares (IRLS) algorithm. Other robust estimation methods can of course be envisaged, among them the least median of squares (LMS) method, the least trimmed squares (LTS) method, and any method from the class of M-estimators, of which Tukey's "biweight" estimator is a particular example. The adjusted position of all the peaks of the spectrum is then inferred via the model previously learned on the reference mass-to-charge ratios. The mass-to-charge correction is thus extrapolated outside the mass-to-charge range used for the adjustment:
- for each mass-to-charge ratio x(s), the adjusted mass-to-charge ratio is obtained by x̂(s) = f(x(s));
- X̂ = {x̂(s)}; s = 1, ..., S denotes the list of the adjusted positions of the peaks of the spectrum.
Following the adjustment step 260, an optional step 265 may consist in optimizing the list of reference mass-to-charge ratios according to the quality of the adjustment obtained. [0033] The objective of this step is to ensure that the quality of each selected reference mass-to-charge ratio is similar across the subgroups of interest. For each reference mass-to-charge ratio of R = {R(k)}; k = 1, ..., K; K < J and each subgroup, the method comprises a step of calculating the frequency of presence of a peak for each subgroup, after adjustment of the mass-to-charge ratios of each spectrum, in the interval defined by the tolerance t3 around the mass-to-charge ratio R(k). This frequency is a first indicator. The method then comprises a step of calculating the deviation of the peak positions for each subgroup after adjustment with respect to the reference mass-to-charge ratio, for example by calculating the median or the mean of the residuals associated with the mass-to-charge ratio R(k). This deviation is a second indicator.
The method further comprises a step of calculating the dispersion of the peak positions for each subgroup after adjustment with respect to the reference mass-to-charge ratio, for example by calculating a standard deviation, a range or an interquartile range of the residuals associated with the mass-to-charge ratio R(k). In general, this dispersion can be calculated by any method that quantifies the dispersion of the observed peak positions. This dispersion is a third indicator. From these calculations, step 265 continues with a step of removing certain reference mass-to-charge ratios on the basis of the non-homogeneity of at least one of the three indicators between the subgroups of the group under consideration. [0034] FIG. 6 illustrates, for two mass-to-charge ratios Alpha and Beta, the calculation of:
- the frequency of presence of a peak for each subgroup A to F;
- the median of the residuals for each subgroup, represented by a line;
- the interquartile range of the residuals for each subgroup, represented by the extent of each box plot.
Each of these three indicators leads, for example, to keeping the mass-to-charge ratio Alpha and discarding the mass-to-charge ratio Beta. Indeed, the mass-to-charge ratio Alpha has a frequency of about 100% in all the subgroups, a median of the residuals close to 0 for each subgroup and a similar dispersion of the residuals in each subgroup. Conversely, it is relevant to exclude the mass-to-charge ratio Beta because the frequency of presence of a peak is less than 60% for 2 subgroups and the median of the residuals is shifted beyond a threshold of 1 or -1 for subgroup A, the median thresholds at 1 and -1 being shown as dashed lines. In addition, the interquartile range of the residuals is markedly higher for subgroups A and E. Calculating these three criteria therefore makes it possible to set thresholds for eliminating or keeping reference mass-to-charge ratios on a statistical basis.
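The three indicators of step 265 and a homogeneity test across subgroups can be sketched as follows. This is an illustration only: the function names and the toy residuals are hypothetical, the 60% and ±1 thresholds are the example values quoted above, and the max_iqr_ratio rule is an assumption introduced here because the description gives no numerical threshold for the dispersion.

```python
from statistics import median, quantiles


def indicators(residuals, n_spectra):
    """Frequency of presence (%), median and interquartile range of the residuals."""
    q1, _, q3 = quantiles(residuals, n=4)  # requires at least two residuals
    return {
        "frequency": 100.0 * len(residuals) / n_spectra,
        "median": median(residuals),
        "iqr": q3 - q1,
    }


def keep_reference(per_subgroup, min_frequency=60.0, max_abs_median=1.0, max_iqr_ratio=3.0):
    """Keep a reference only if the three indicators are homogeneous across subgroups."""
    stats = list(per_subgroup.values())
    iqrs = [s["iqr"] for s in stats]
    return (
        all(s["frequency"] >= min_frequency for s in stats)
        and all(abs(s["median"]) <= max_abs_median for s in stats)
        and max(iqrs) <= max_iqr_ratio * min(iqrs)  # assumed dispersion rule
    )


# Hypothetical residuals (adjusted peak position minus reference, in Th),
# for two subgroups of four spectra each.
alpha = {"A": indicators([0.1, -0.1, 0.0, 0.2], 4), "B": indicators([0.0, 0.1, -0.2, 0.1], 4)}
beta = {"A": indicators([1.5, 1.8, 2.0], 4), "B": indicators([0.0, 0.1], 4)}
print(keep_reference(alpha), keep_reference(beta))
```

Here the reference called alpha is kept, while beta is rejected both for its low frequency in subgroup B and for its shifted median in subgroup A.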
Step 265 then ends with a readjustment step similar to step 260 but performed only from the reference mass-to-charge ratios retained after the step of removing certain reference mass-to-charge ratios on the basis of the non-homogeneity of at least one of the three indicators between the subgroups of the group considered. Optionally, step 260 or step 265 may be followed by a step 270 of learning and constructing a second model for adjusting the mass-to-charge ratios over the mass-to-charge range of interest for classification by subgroup. Step 270 repeats step 230 of identifying and selecting reference mass-to-charge ratios common to the different subgroups and step 260 of learning and building a mass-to-charge adjustment model, in order to construct a second adjustment model from peak lists that have already undergone a first adjustment and whose mass-to-charge offsets are therefore assumed to be smaller. The first adjustment, at step 260, can indeed lead to an extrapolation of the mass-to-charge realignment over certain parts of the mass-to-charge range of interest when the initial mass-to-charge offset is large. A second model for adjusting the mass-to-charge ratios, using a polynomial regression model, for example of order 2, can then be learned and constructed in order to adjust the peak positions more finely over a wider range of mass-to-charge ratios. For this, steps 230 and 260, and possibly 265, are repeated to select a list of reference mass-to-charge ratios common to the different subgroups and to adjust the mass-to-charge ratios of all the peak lists over the mass-to-charge range of interest for subgroup classification. FIGS. 7a and 7b illustrate the benefit of this second adjustment step. FIG. 7a illustrates the result of a first adjustment via a linear regression model for a spectrum of a given subgroup A.
The black curve represents the difference between the reference mass-to-charge ratio and the observed position of the mass-to-charge ratio before adjustment. The gray curve represents the difference between the reference mass-to-charge ratio and the position of the mass-to-charge ratio after adjustment. Because of a large initial mass-to-charge offset, only the reference mass-to-charge ratios between 4000 Th and 8000 Th were detected. The mass-to-charge correction model is then extrapolated outside this interval to all the peaks of the spectrum under consideration. Using a linear model as a first step limits the extrapolation error. FIG. 7b illustrates the result of a second adjustment of the same spectrum via a second-order polynomial regression model. The black curve represents the difference between the reference mass-to-charge ratio and the position of the mass-to-charge ratio observed after the first adjustment but before the second adjustment. The gray curve represents the difference between the reference mass-to-charge ratio and the position of the mass-to-charge ratio after the second adjustment. It can be seen that the model was fitted on mass-to-charge ratios detected between 3000 Th and 12000 Th, which allows the peak positions to be adjusted more finely over a wider mass-to-charge range. [0035] Step 270 may optionally be repeated n times in order to construct an n-th adjustment model and thereby improve the adjustment of the spectra. The next steps finally consist in learning and building a knowledge base (step 280) and a dedicated classification algorithm (step 290) allowing the subgroups to be discriminated from the peak lists of the spectra that have undergone the mass-to-charge adjustment step or steps described above.
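The two-stage adjustment described above (a linear model fitted on the few references detected at first, then a second-order polynomial fitted on a wider set) can be sketched as follows. This is a simplified illustration: the function name polyfit, the simulated offset and the use of ordinary least squares instead of a robust estimator are assumptions made to keep the example short and dependency-free.

```python
def polyfit(xs, ys, degree):
    """Ordinary least-squares polynomial fit; returns a callable model f(x)."""
    mu = sum(xs) / len(xs)
    span = (max(xs) - min(xs)) or 1.0
    us = [(x - mu) / span for x in xs]  # rescale x for numerical stability
    n = degree + 1
    # Normal equations A c = b for the monomial basis in u.
    A = [[sum(u ** (i + j) for u in us) for j in range(n)] for i in range(n)]
    b = [sum(y * u ** i for u, y in zip(us, ys)) for i in range(n)]
    for col in range(n):  # Gaussian elimination with partial pivoting
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * p for a, p in zip(A[r], A[col])]
            b[r] -= f * b[col]
    c = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        c[i] = (b[i] - sum(A[i][j] * c[j] for j in range(i + 1, n))) / A[i][i]
    return lambda x: sum(ck * ((x - mu) / span) ** k for k, ck in enumerate(c))


# Simulated spectrum (hypothetical): reference positions and a slightly curved offset.
refs = [3000.0 + 1000.0 * k for k in range(10)]  # 3000 to 12000 Th
observed = [r + 3.0 + 2e-4 * r + 4e-8 * (r - 7000.0) ** 2 for r in refs]

first = polyfit(observed[1:6], refs[1:6], degree=1)   # stage 1: references at 4000 to 8000 Th only
once = [first(x) for x in observed]
second = polyfit(once, refs, degree=2)                # stage 2: all references now matched
twice = [second(x) for x in once]

err_once = max(abs(a - r) for a, r in zip(once, refs))
err_twice = max(abs(a - r) for a, r in zip(twice, refs))
print(round(err_once, 3), round(err_twice, 5))
```

In this simulation the linear stage leaves its largest residual at the extrapolated end of the range, and the quadratic stage removes most of it, which mirrors the behaviour described for FIGS. 7a and 7b.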
When the mass-to-charge adjustment step or steps have significantly improved the accuracy of the peak positions, the classification algorithm can be:
- based on the calculation of a tolerant distance, the tolerance being for example equal to or advantageously lower than that used for classification at the group level;
- based on a matrix of peaks, obtained for example by discretizing the mass-to-charge ratios as described in step 150, the discretization step being identical to or advantageously finer than that used for classification at the group level.
Any known classification algorithm can be used, such as logistic regression, discriminant analysis, classification trees, "LASSO" or "elastic net" regression methods, or SVM ("support vector machine") algorithms. The method according to the invention therefore yields a mass-to-charge adjustment model comprising 1 to n lists of reference mass-to-charge ratios and 1 to n mass-to-charge adjustment models, as well as a knowledge base and a classification algorithm dedicated to discriminating the subgroups of the group under consideration. From the knowledge base and classification algorithm dedicated to discriminating groups, and the knowledge base and classification algorithm dedicated to discriminating the subgroups of at least one of the groups considered, the method continues with a step of classifying an unknown microorganism. The method therefore continues, in FIG. 1, with a group classification step 300. As previously described, this step relies on the group knowledge base and the associated group classification algorithm, either pre-existing or constructed from a set of spectra of microorganisms whose groups were previously identified. [0036] The group classification step 300 starts, according to FIG. 3c, with a step 310 of acquiring at least one mass spectrum of said unknown microorganism.
Step 310 begins with the preparation of a sample of the unknown microorganism to be identified, followed by the acquisition of one or more mass spectra of the prepared sample by means of a mass spectrometer, for example a MALDI-TOF spectrometer. This step is performed similarly to step 110. Following the acquisition step, the method continues with a step 320 of detecting the peaks of the spectra, similarly to step 120, and a step 330 of external or internal calibration of these spectra, similarly to step 130. This step aims to align the peaks for group classification of said microorganism. As previously discussed, external calibration consists in adjusting the m/z axis of the mass spectrum of a reference sample, whose content is known and which is deposited at a different location on the plate than the sample, so that the observed peaks coincide with their theoretical positions. This step is thus carried out as in step 130, the peaks of the spectrum of the unknown microorganism being realigned according to the transformation applied to the spectrum of the calibrant. Following this step, the method comprises a step 340 of classifying the peak list or lists obtained. The group classification algorithm is applied in conjunction with the associated group knowledge base. One or more groups (family, genus, species, ...) are thus identified for the sample analyzed. Advantageously, in order to improve the group classification step, this step may be preceded by a step of checking the quality of the spectra, similar to step 140, and possibly by a step of discretizing the mass-to-charge ratios, similar to step 150, and/or a step of processing the intensities, similar to step 155. Alternatively, step 340 may be omitted when the group of the microorganism analyzed is known but its subgroup is unknown. In this case, the method proceeds directly to step 350.
[0037] In a next step 350, a result of the classification step is obtained, for example in the form of a score or probability that the unknown microorganism belongs to one or more groups. Where the selected group, or at least one of the selected groups, is represented in the knowledge base by subgroups, the method according to the invention continues with a subgroup classification step 400. As previously described, this step relies on the constructed subgroup knowledge base and the associated subgroup classification algorithm, obtained from a set of spectra of microorganisms whose groups and subgroups were previously identified. According to FIG. 3d, the subgroup classification step 400 begins with a step 410 of recognizing, in the classification result of step 350, a group for which a subgroup knowledge base and a subgroup classification algorithm exist. For example, a taxonomic group consisting of the species Escherichia coli and the genus Shigella may be associated with a knowledge base of taxonomic subgroups separating Escherichia coli non-O157 (subgroup A), Escherichia coli O157 (subgroup B) and the Shigella species: Shigella dysenteriae (subgroup C), Shigella flexneri (subgroup D), Shigella boydii (subgroup E), Shigella sonnei (subgroup F), etc. [0038] The next step 420 then consists in adjusting the mass-to-charge ratios of the peak list obtained following step 330 using the model obtained following step 260 and the reference mass-to-charge ratios characteristic of the group, either defined in step 240 or retained following step 250. Where a second adjustment model has been created, the peak list is then adjusted a second time using the adjustment model obtained following step 270, the characteristic mass-to-charge ratios used then being those of the second model.
In the same way, where an n-th adjustment model has been created, the peak list is adjusted an n-th time using the adjustment model obtained following step 270, the characteristic mass-to-charge ratios used then being those of the n-th model. [0039] Optionally, the method may continue with a step 430 of checking the quality of the mass-to-charge adjustment. For this, the number (or percentage) of reference mass-to-charge ratios detected on the acquired spectrum or spectra may be required to exceed a given threshold. Alternatively, or additionally, the root mean squared error (RMSE) between the theoretical position of each reference mass-to-charge ratio and the position of these mass-to-charge ratios after adjustment on the acquired spectrum or spectra may be required to be below a given threshold. The root mean squared error can be obtained by the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{L} \sum_{l=1}^{L} \left( R(l) - \hat{x}(l) \right)^{2}}$$

where {R(l)}; l = 1, ..., L is the list of the reference mass-to-charge ratios of the adjustment model obtained following step 260, or possibly 270, for which a peak has been observed on the spectrum considered, L being the number of such reference mass-to-charge ratios and x̂(l) being the adjusted mass-to-charge ratio obtained by x̂(l) = f(x(l)). Following step 420 or 430, the method continues with a step 440 of classifying the adjusted spectrum using the subgroup knowledge base and the classification algorithm for discriminating subgroups previously learned and defined. Advantageously, in order to improve the subgroup classification step, this step may be preceded by a mass-to-charge discretization step, similar to step 150, and/or an intensity processing step, similar to step 155. In a next step 450, a result of the subgroup classification step is obtained, for example in the form of a score or probability that the unknown microorganism belongs to one or more subgroups. [0040] Example of subgroup classification of a group formed by Escherichia coli and the genus Shigella.
The method according to the invention is applied to the classification of serogroups of the species Escherichia coli and of the Shigella species. The method thus aims to distinguish subgroups according to their pathogenicity. The method uses a VITEK® MS MALDI-TOF mass spectrometer (bioMérieux, France) marketed by the applicant, comprising a VITEK® MS v2.0.0 group knowledge base, also called the VITEK® MS v2.0.0 database. VITEK® MS also includes an associated group classification algorithm using a multivariate classification associated with the group knowledge base. A membership score for each of the groups is obtained after the algorithm classifies a spectrum of an unknown microorganism. [0041] The method according to the invention thus makes it possible to offer a two-stage classification, by group and then by subgroup, which can be performed routinely on a mass spectrometry instrument. First, the group, here a taxonomic group at the species level, is identified; in the case of the Escherichia coli/Shigella group, a second level of classification by subgroup is then offered to differentiate, within said group, the 4 Shigella species, the O157 serogroup of the species Escherichia coli and the non-O157 serogroups of the species Escherichia coli. A first batch A of 116 strains of microorganisms, whose membership of the Escherichia coli/Shigella group and whose subgroups have been identified by conventional phenotypic identification and serotyping techniques, is created. This batch is used to construct a knowledge base and a classification model by reference subgroup.
Batch A contains:
o 60 strains of Escherichia coli non-O157 (reference esh-col) constituting subgroup A
o 8 strains of Escherichia coli O157 (reference esh-o157) constituting subgroup B
o 12 strains of Shigella dysenteriae (reference shg-dys) constituting subgroup C
o 12 strains of Shigella flexneri (reference shg-flx) constituting subgroup D
o 12 strains of Shigella boydii (reference shg-boy) constituting subgroup E
o 12 strains of Shigella sonnei (reference shg-son) constituting subgroup F
These 116 microorganisms are not distinguished by the current VITEK® MS instrument, whose classification algorithm assigns them all to the "Escherichia coli/Shigella" group of the associated knowledge base. In order to acquire the spectra of the batch A microorganisms by mass spectrometry, the samples containing these microorganisms are prepared according to a standard protocol:
- Collection of a colony, after culture on an agar growth medium, using an inoculation loop
- Resuspension of the colony in a 2 mL Eppendorf tube containing 300 µL of demineralized water
- Addition of 0.9 mL of absolute ethanol and mixing (vortex)
- Centrifugation for 2 min at 10,000 rpm
- Removal of the supernatant using a pipette
- Addition of 40 µL of 70% formic acid and mixing (vortex)
- Addition of 40 µL of acetonitrile and mixing (vortex)
- Centrifugation for 2 min at 10,000 rpm
- Deposition of 1 µL of supernatant
- Drying
- Addition of 1 µL of HCCA matrix
A quantity of each sample of each strain is deposited on a MALDI plate for use with the VITEK® MS. Acquisitions are made in duplicate or quadruplicate.
The acquisition is carried out using the LaunchPad V2.8 software with the following parameters:
- Linear mode
- Rastering: "Regular circular"
- 100 profiles/sample
- 5 shots/profile
- Acquisition between 2000 and 20000 Thomsons
- Auto-quality parameter activated
Following the acquisition of these spectra, the VITEK® MS instrument performs pre-processing and an external calibration based on the acquisition of a spectrum of an Escherichia coli calibration strain (ATCC 8739) deposited at the location reserved for calibration in the acquisition group. Once the spectrum of the calibration strain has been acquired, the presence of 11 reference peaks corresponding to mass-to-charge ratios characteristic of Escherichia coli is sought, with a tolerance of 0.07% around the expected position of the peaks. If at least 8 of the 11 peaks are within the expected position range, the peaks of the spectrum of the calibration strain are realigned to their reference positions. The resulting transformation is used to realign the spectra of the acquired samples. A total of 388 spectra corresponding to the 116 strains of batch A thus allow the creation of a knowledge base at the group level and an associated classification algorithm. In order to confirm that the batch A microorganisms are not distinguished by the instrument and belong to the same group for the VITEK® MS v2.0.0 database and the associated algorithm, a group classification step is performed. The results of this classification for batch A are given in Table 4 below:

Batch samples   Escherichia coli/Shigella   Wrong group   No group     Total
                group identified            identified    identified
esh-o157        31                                                     31
shg-flx         46                                        1            47
(the other rows of the table are not legible in the source)
Table 4

99.7% of the spectra in batch A are correctly predicted as belonging to the Escherichia coli/Shigella group of the VITEK® MS v2.0.0 database.
A single spectrum, obtained from a strain of the species Shigella flexneri, is not identified although it is of good quality. It is nevertheless kept for the construction of the knowledge base at the subgroup level in the following steps. From this base of 388 spectra corresponding to lot A and to the Escherichia coli / Shigella group, a knowledge base at the subgroup level and an associated classification method are created.
[0042] For this, the mass-to-charge positions of the detected peaks are adjusted in two steps, by the successive construction of two adjustment models. In a first adjustment step, similar to steps 230, 240 and 260, mass-to-charge ratios characteristic of the group, known a priori for the Escherichia coli / Shigella group, located between 4000 and 10,000 Th and corresponding to mass-to-charge ratios of the calibrant, are sought in the 388 spectra. The tolerance around the position of these mass-to-charge ratios on each of the acquired spectra is fixed at t = 0.0005%. From the observed positions of these mass-to-charge ratios and their theoretical positions, a linear regression model is calculated in order to realign them to their theoretical positions. The resulting transformation is also applied to all the peaks of each of the acquired spectra.
Following this first step, a second adjustment step 270 is performed via a second-order polynomial regression model fitted to a list of reference mass-to-charge ratios determined statistically according to the method described in step 240. For this, each of the spectra adjusted in the first step is discretized over the mass-to-charge range of interest, with steps of 1 Th between 3000 and 6000 Th, 2 Th between 6000 and 10,000 Th, and 3 Th between 10,000 and 20,000 Th. Each spectrum is thus discretized into 8366 mass-to-charge intervals.
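The variable-step discretization described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `build_mz_grid`, the half-open segment boundaries and the choice to list grid positions rather than intervals are all assumptions, so the number of positions it produces need not match the interval count quoted in the text.

```python
def build_mz_grid(segments):
    """Return discretised m/z positions for a list of (start, stop, step) segments.

    Each segment contributes the positions start, start + step, ... that are
    strictly below stop (half-open boundaries are an assumption of this sketch).
    """
    grid = []
    for start, stop, step in segments:
        if step <= 0:
            raise ValueError("step must be positive")
        m = start
        while m < stop:
            grid.append(m)
            m += step
    return grid


# Step sizes in Th as quoted in the text: 1 Th, then 2 Th, then 3 Th.
SEGMENTS = [(3000, 6000, 1), (6000, 10000, 2), (10000, 20000, 3)]

grid = build_mz_grid(SEGMENTS)
```

A coarser step at high mass-to-charge ratios is consistent with the instrument accuracy being expressed in ppm, so that the absolute uncertainty grows with the mass-to-charge ratio.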
The presence or absence of peaks is sought with a tolerance of 0.0003% around each mass-to-charge ratio m(i) defined by the discretization, according to the method described in step 242. The mass-to-charge ratios m(i) thus obtained are then filtered according to the frequency of occurrence of peaks in each of the subgroups, according to the method described in step 243. 133 mass-to-charge ratios with a minimum presence frequency of 60% in each of the subgroups are thus retained. This makes it possible to select mass-to-charge ratios that are particularly characteristic of the group.
[0043] The position of these mass-to-charge ratios is then approximated according to a statistical model of the position of the retained mass-to-charge ratios. This step corresponds to step 244 described above. From the corrected positions, identical or near-identical approximated mass-to-charge ratios are removed, in order to retain a list of 46 unique mass-to-charge ratios characteristic of the group. Two mass-to-charge ratios are considered identical after approximation if the difference observed between them is less than 0.1 Th. This step corresponds to step 245 described above.
Position of selected mass-to-charge ratios (initial discretization) | Position of approximated mass-to-charge ratios after adjustment | Mass-to-charge ratios retained
5338 | 5339.8 | 5339.8
5340 | 5339.8 |
534… | 5339.8 |
537… | 5381.2 | 5381.2
5380 | 5381.2 |
5382 | 5381.2 |
5384 | 5381.2 |
5394 | 539….4 | 539….4
5396 | 539….4 |
5398 | … |
Table 5
Table 5 above illustrates, for the mass-to-charge interval 5338 to 5398 Th, the position of the mass-to-charge ratios selected on the discretized mass-to-charge space, the approximated value of these same mass-to-charge ratios, and the final list of mass-to-charge ratios retained after removal of identical values (digits shown as … are not legible in the source). Subsequently, an adjustment step is performed in a manner similar to step 270, starting from the positions of the retained mass-to-charge ratios.
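The selection of reference mass-to-charge ratios described above (presence detection within a tolerance, filtering on presence frequency in every subgroup, removal of near-identical positions) can be sketched as follows. This is a hedged illustration only: the function names, the data layout (a dict mapping a subgroup name to a list of peak lists) and the treatment of the tolerance as a relative fraction are assumptions, and the statistical approximation of step 244 is not reproduced here.

```python
def presence_frequency(peak_lists, mz, tol):
    """Fraction of spectra with at least one peak within mz * tol of mz."""
    if not peak_lists:
        return 0.0
    hits = sum(
        any(abs(p - mz) <= mz * tol for p in peaks) for peaks in peak_lists
    )
    return hits / len(peak_lists)


def select_reference_mz(spectra_by_subgroup, grid, tol, min_freq=0.60):
    """Keep grid positions whose presence frequency reaches min_freq in every subgroup."""
    return [
        mz
        for mz in grid
        if all(
            presence_frequency(peak_lists, mz, tol) >= min_freq
            for peak_lists in spectra_by_subgroup.values()
        )
    ]


def merge_close(positions, min_gap=0.1):
    """Drop positions that lie less than min_gap (in Th) from the last kept one."""
    kept = []
    for pos in sorted(positions):
        if not kept or pos - kept[-1] >= min_gap:
            kept.append(pos)
    return kept
```

Requiring the minimum frequency in every subgroup, rather than on the pooled spectra, keeps only peaks shared by the whole group, which is what makes them usable as common alignment references.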
An optional step of checking and optimizing the list of reference mass-to-charge ratios, based on the quality of the adjustment obtained, makes it possible to retain a reduced list of 37 final reference mass-to-charge ratios. This step is based on criteria as defined in step 265. Five mass-to-charge ratios are eliminated because, for at least one of the subgroups, they show either a peak presence percentage after adjustment below 60%, or a median of residuals greater than 1 Th, or an interquartile range of residuals greater than 2 Th.
From this reduced list of reference mass-to-charge ratios, the process continues with a readjustment of all the mass-to-charge ratios of the peak lists of the group. According to FIG. 8a, the method comprises a first adjustment, similar to step 260, via a linear regression model fitted to reference mass-to-charge ratios detected only between 5000 and 10,000 Th, because of a high initial offset of the mass-to-charge ratios. The mass-to-charge correction is extrapolated outside this interval. The use of a linear model for this first stage limits the extrapolation error on the mass-to-charge list of the spectrum considered. According to FIG. 8b, the method comprises a second adjustment, similar to step 270, via a second-order polynomial regression model fitted to mass-to-charge ratios detected between 3000 and 12,000 Th, making it possible to adjust the position of the peaks of the spectrum considered more precisely and over a wider mass-to-charge range.
[0044] Fig. 9a illustrates, for one mass-to-charge range, the peak positions observed among all the spectra of the corresponding group and subgroups before adjustment. Fig. 9b illustrates the position of the same peaks after the second adjustment, demonstrating the quality of the adjustment performed as well as the suitability of the mass-to-charge ratio selected as a reference.
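The first, linear stage of the readjustment described above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses ordinary least squares, whereas the method may rely on a robust estimator; the function names and the representation of matches as (observed, reference) pairs are hypothetical; and the second-order polynomial stage is not shown.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(x)
    if n < 2:
        raise ValueError("need at least two matched reference peaks")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("observed positions must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def linear_adjust(peaks, matches, lo=5000.0, hi=10000.0):
    """Fit on (observed, reference) pairs whose reference lies in [lo, hi],
    then apply the correction to every peak, extrapolating outside the range."""
    used = [(obs, ref) for obs, ref in matches if lo <= ref <= hi]
    a, b = fit_line([obs for obs, _ in used], [ref for _, ref in used])
    return [a * p + b for p in peaks]
```

Because a straight line has only two parameters, its behaviour outside the fitted interval stays predictable, which is the reason the text gives for using a linear model before the polynomial one.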
[0045] The accuracy claimed by the manufacturer after external calibration of the VITEK® MS is 400 ppm, which corresponds to an accuracy of 1.2 Th at 3000 Th and 4.4 Th at 11,000 Th. The accuracy in Thomsons observed after external calibration (Figure 10a) has a median of the order of the claimed accuracy on the dataset considered, namely of the order of 1.2 Th for mass-to-charge ratios at 3000 Th and of the order of 3 Th for mass-to-charge ratios at 11,000 Th. After the second adjustment of the mass-to-charge ratios by the method according to the invention (Figure 10b), the precision is of the order of 0.12 Th at 3000 Th and 0.44 Th at 11,000 Th, i.e. a precision of the order of 40 ppm. This increase in accuracy after adjustment by the method according to the invention demonstrates the relevance of the selected reference mass-to-charge ratios and the quality of the adjustment achieved.
A dedicated knowledge base and classification algorithm for discriminating the subgroups of the Escherichia coli / Shigella group from the adjusted peak lists of the spectra described above are then constructed according to the method described in steps 280 and 290. For this purpose, a knowledge base and a dedicated classification algorithm are constructed to distinguish the following six subgroups:
- Escherichia coli non-O157, subgroup A
- Escherichia coli O157, subgroup B
- Shigella dysenteriae, subgroup C
- Shigella flexneri, subgroup D
- Shigella boydii, subgroup E
- Shigella sonnei, subgroup F
As an example, Fig. 11a illustrates, for a mass-to-charge range containing a mass-to-charge ratio that allows the Escherichia coli O157 subgroup to be discriminated from the other subgroups, the position of the peaks observed among all the spectra of the corresponding group and subgroups before adjustment.
Figure 11b illustrates the position of the same peaks after the second adjustment, demonstrating that it is then possible to use the presence or absence of the 10139 Th peak, with a tolerance of +/- 2 Th, to detect the Escherichia coli O157 subgroup, in which this peak is absent.
[0046] In order to verify the capacity of the classification model and of the associated subgroup knowledge base to classify microorganisms by subgroup, a second lot B is also constituted, made up of 31 strains identified as belonging to the Escherichia coli / Shigella group and whose subgroups are known by conventional methods of analysis. This lot B, called the evaluation lot, contains 31 strains of Shiga toxin-producing Escherichia coli (STEC) of 6 different O serotypes: O26, O45, O103, O111, O121 and O145. The sample preparation protocol is identical to that previously used. Two spectra are acquired per strain, giving a list of 62 spectra distributed according to Table 6 below. The source table also gives the ATCC number (BAA series) of each strain; these entries are not legible here.
Serotype O | Number of spectra
O103 | 10
O111 | 12
O121 | 10
O145 | 10
O26 | 10
O45 | 10
Table 6
[0047] These strains are notably identified in the American Type Culture Collection (ATCC) publication: "Big Six" Non-O157 Shiga Toxin-Producing Escherichia coli (STEC) Research Materials.
To confirm that the microorganisms in lot B are not distinguished by the apparatus and the state-of-the-art knowledge base, and thus belong to the same group, a group classification step according to step 300 is carried out.
The results of this classification for lot B are given in Table 7 below (H antigens shown as … are not legible in the source):
Lot B samples | Wrong group identified | No group identified | Escherichia coli / Shigella group identified | Total
esh-col O103:… | | | 2 | 2
esh-col O103:… | | | 4 | 4
esh-col O103:H25 | | | 4 | 4
esh-col O111:… | | | 12 | 12
esh-col O121:H19 | | | 10 | 10
esh-col O145:H25 | | | 2 | 2
esh-col O145:H48 | | | 2 | 2
esh-col O145:Nonmotile | | | 6 | 6
esh-col O26:H11 | | | 10 | 10
esh-col O45:H2 | | | 10 | 10
Total | 0 | 0 | 62 | 62
Table 7
100% of the spectra are correctly predicted as belonging to the Escherichia coli / Shigella group by the classification algorithm and the VITEK® MS v2.0.0 knowledge base. All the spectra of lot B are retained for the evaluation of the knowledge base and the subgroup classification algorithm according to step 400.
The method according to the invention is implemented using the previously created subgroup knowledge base and the associated classification algorithm. The expected classification for lot B is the Escherichia coli non-O157 subgroup. For this, the mass-to-charge ratios of the peak list obtained during the group-level classification step are adjusted using the first and second mass-to-charge adjustment models previously defined.
Optionally, in order to improve the performance of the classification, a quality control of the mass-to-charge adjustment is performed. The quality criteria defined to ensure the quality of the mass-to-charge adjustment of each spectrum are as follows: for the spectrum considered, at least 28 mass-to-charge ratios must be detected among the 37 predefined reference mass-to-charge ratios, and the Root Mean Square Error (RMSE) between the theoretical position of each reference mass-to-charge ratio and its position after adjustment on the acquired spectrum must be less than 1. 5 spectra do not meet these criteria; 58 meet them.
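The per-spectrum quality control described above can be sketched as follows. This is an illustrative check only: the function name, the dict layout (reference position mapped to the adjusted observed position, or `None` when no peak was detected) and the choice to compute the RMSE over detected references only are assumptions; the thresholds of 28 detected references and an RMSE below 1 are those quoted in the text.

```python
import math


def passes_adjustment_qc(adjusted_by_ref, min_detected=28, max_rmse=1.0):
    """Return True when enough reference m/z are detected and the RMSE of
    (adjusted position - reference position) is below max_rmse.

    adjusted_by_ref maps each reference m/z to the adjusted position of the
    matching peak, or to None when no peak was detected for that reference.
    """
    residuals = [
        adjusted - ref
        for ref, adjusted in adjusted_by_ref.items()
        if adjusted is not None
    ]
    if not residuals or len(residuals) < min_detected:
        return False
    rmse = math.sqrt(sum(r * r for r in residuals) / len(residuals))
    return rmse < max_rmse
```

A spectrum failing this check would simply be excluded before subgroup classification, as happens for the spectra mentioned above.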
[0048] The 58 spectra selected are classified using the knowledge base and the classification algorithm previously defined for classification at the subgroup level. As shown in FIG. 12, all spectra are correctly assigned to the Escherichia coli non-O157 subgroup with high scores. In addition, the second-best score, obtained on another subgroup, is much lower, which indicates the robustness of the classification.
Claims:
Claims (15)
1. A method of identification by mass spectrometry of an unknown microorganism subgroup from a set of reference subgroups, comprising:
- A first step of constructing a knowledge base and an associated group classification model from a set of learning spectra of microorganisms identified as belonging to said group
- A second step of constructing a knowledge base and an associated subgroup classification model from the acquisition of at least one set of learning spectra of microorganisms identified as belonging to said subgroups of the group, comprising:
o The construction of an adjustment model for correcting the mass-to-charge offsets of the acquired spectra from reference mass-to-charge ratios common to the different subgroups
o The adjustment of the mass-to-charge ratios of all the peak lists of the learning spectra
o The construction of a subgroup classification model and the associated knowledge base from the adjusted learning spectra
- A third step of classifying an unknown microorganism into a subgroup, comprising:
o The acquisition of at least one spectrum of the unknown microorganism
o The classification of said spectrum into a group according to said group classification model and said group knowledge base
o The adjustment of the mass-to-charge ratios of the entire peak list of said spectrum according to the adjustment model for correcting the mass-to-charge offsets of the spectrum of the unknown microorganism
o The classification into a subgroup of said group by said subgroup classification model and the subgroup knowledge base
2. Identification method according to claim 1, comprising, during the step of constructing a knowledge base and an associated subgroup classification model:
- The construction of a second adjustment model for correcting the mass-to-charge offsets of the acquired spectra from reference mass-to-charge ratios common to the different subgroups
- A second step of adjusting the mass-to-charge ratios of all the peak lists of the learning spectra using the second adjustment model
3. Identification method according to claim 1 or 2, comprising a step of optimizing the list of reference mass-to-charge ratios based on the quality of the adjustment obtained following at least one of the adjustment steps.
4. Identification method according to claims 1 to 3, the construction of an adjustment model using a known list of reference mass-to-charge ratios common to the different subgroups.
5. Identification method according to claim 4, wherein the known reference mass-to-charge ratios common to the different subgroups are selected by a step of:
- Detecting the presence or absence of peaks around the reference mass-to-charge ratios according to a tolerance factor
- Filtering said mass-to-charge ratios as a function of the frequency of presence of peaks for each of the subgroups and/or approximating the position of the reference mass-to-charge ratios selected
6. Identification method according to claims 1 to 5, the construction of an adjustment model using a list of reference mass-to-charge ratios common to the different subgroups, deduced according to statistical criteria of frequency of presence of the peaks in each of the subgroups of the group.
7. Identification method according to claim 6, wherein the reference mass-to-charge ratios common to the different subgroups are deduced by a step of:
- Discretizing the mass-to-charge space of each of the spectra of each subgroup
- Detecting the presence or absence of peaks around the mass-to-charge ratios defined by the discretization step according to a tolerance factor
- Filtering said mass-to-charge ratios as a function of the frequency of presence of peaks for each of the subgroups
- Approximating the position of the mass-to-charge ratios retained
8. Identification method according to claim 7, the discretization step being performed over a mass-to-charge range that is restricted with respect to the mass-to-charge interval obtained following the acquisition of the spectrum.
9. Identification method according to one of claims 5 to 8, the approximation step consisting in seeking a position representative of the distribution of the positions of the peaks present around each mass-to-charge ratio retained.
10. Identification method according to one of the preceding claims, the step of constructing a knowledge base and an associated subgroup classification model comprising a step of discretizing the mass-to-charge ratios of the acquired spectra.
11. Identification method according to one of the preceding claims, the step of constructing a knowledge base and an associated subgroup classification model comprising a step of processing the intensities of the acquired spectra.
12. Identification method according to one of the preceding claims, the step of constructing a knowledge base and an associated subgroup classification model comprising a step of checking the quality of the acquired spectra.
13. Identification method according to one of the preceding claims, the parameters of the adjustment model or models being obtained by a so-called robust estimation method.
14. Identification method according to one of the preceding claims, the spectra acquired for the first step of constructing a knowledge base and an associated group classification model being used directly for the second step of constructing a knowledge base and an associated subgroup classification model, the groups and subgroups of the learning microorganisms being known.
15. A device for identifying a microorganism by mass spectrometry, comprising: a mass spectrometer capable of producing mass spectra of microorganisms to be identified; and a computing unit capable of identifying the microorganisms associated with the mass spectra produced by the spectrometer by implementing a method according to any one of the preceding claims.
Patent family:
Publication number | Publication date
JP2018513382A | 2018-05-24
CN107533593B | 2021-09-14
US20190049445A1 | 2019-02-14
FR3035410B1 | 2021-10-01
WO2016185108A1 | 2016-11-24
EP3286678A1 | 2018-02-28
JP6611822B2 | 2019-11-27
CN107533593A | 2018-01-02
Cited documents:
Publication number | Filing date | Publication date | Applicant | Title
EP2600385A1 | 2011-12-02 | 2013-06-05 | bioMérieux, Inc. | Method for identifying microorganisms by mass spectrometry
EP2600284A1 | 2011-12-02 | 2013-06-05 | bioMérieux, Inc. | Method for identifying micro-organisms by mass spectrometry and score normalisation
EP2648133A1 | 2012-04-04 | 2013-10-09 | Biomerieux | Identification of microorganisms by structured classification and spectrometry
FR2951548B1 | 2009-10-15 | 2011-11-11 | Biomerieux Sa | METHOD FOR CHARACTERIZING AT LEAST ONE MICROORGANISM BY MASS SPECTROMETRY
US9677109B2 | 2013-03-15 | 2017-06-13 | Accelerate Diagnostics, Inc. | Rapid determination of microbial growth and antimicrobial susceptibility
CN103245716A | 2013-05-23 | 2013-08-14 | 中国科学院化学研究所 | Quick high-sensitivity microbiological identification method based on micromolecular metabolic substance spectral analysis
US10930371B2 | 2017-07-10 | 2021-02-23 | Chang Gung Memorial Hospital, Linkou | Method of creating characteristic peak profiles of mass spectra and identification model for analyzing and identifying microorganizm
EP3660504A1 | 2018-11-30 | 2020-06-03 | Thermo Fisher ScientificGmbH | Systems and methods for determining mass of an ion species
WO2021245798A1 | 2020-06-02 | 2021-12-09 | 株式会社島津製作所 | Method for identifying microorganism identification marker
Legal status:
2016-04-25 | PLFP | Fee payment | Year of fee payment: 2
2016-10-28 | PLSC | Publication of the preliminary search report | Effective date: 20161028
2017-04-26 | PLFP | Fee payment | Year of fee payment: 3
2018-04-25 | PLFP | Fee payment | Year of fee payment: 4
2019-04-25 | PLFP | Fee payment | Year of fee payment: 5
2020-04-27 | PLFP | Fee payment | Year of fee payment: 6
2021-04-26 | PLFP | Fee payment | Year of fee payment: 7
Priority:
Application number | Filing date | Title
FR1553731A | FR3035410B1 | 2015-04-24 | 2015-04-24 | METHOD OF IDENTIFICATION BY MASS SPECTROMETRY OF AN UNKNOWN MICROORGANISM SUB-GROUP AMONG A SET OF REFERENCE SUB-GROUPS
JP2017555309A | JP6611822B2 | 2015-04-24 | 2016-04-21 | A method for identifying unknown microbial subgroups from a set of reference subgroups by mass spectrometry
CN201680023818.2A | CN107533593B | 2015-04-24 | 2016-04-21 | Method for identifying a subpopulation of unknown microorganisms from a collection of reference subpopulations by mass spectrometry
US15/569,005 | US20190049445A1 | 2015-04-24 | 2016-04-21 | Method for identifying by mass spectrometry an unknown microorganism subgroup from a set of reference subgroups
PCT/FR2016/050940 | WO2016185108A1 | 2015-04-24 | 2016-04-21 | Method for identifying by mass spectrometry an unknown microorganism subgroup from a set of reference subgroups
EP16726914.1A | EP3286678A1 | 2015-04-24 | 2016-04-21 | Method for identifying by mass spectrometry an unknown microorganism subgroup from a set of reference subgroups