Patent Abstract:
A processor-implemented method and a Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier system are provided. HR departments of large organizations are constantly flooded with questions that range from the mundane to the unanswerable; such a department therefore actively seeks automated assistance, especially to lighten the burden of routine but time-consuming tasks. Embodiments of the present disclosure provide a BiLSTM Siamese network-based classifier for identifying the target class of a query and providing responses to queries belonging to the identified target class, which acts as an automated assistant that lightens the burden of answering queries in well-defined domains. The Siamese model (SM) is trained for a epochs and then the same base network is used to train the classification model (CM) for b epochs, iteratively, until the best accuracy is observed on the validation set, where the SM ensures that the network learns which phrases are semantically similar/different while the CM learns to predict the target class of each user query. In the present context, a and b are hyperparameters tuned for best performance on the validation set.
Publication number: BR102018004799A2
Application number: R102018004799-0
Filing date: 2018-03-09
Publication date: 2019-03-26
Inventors: Puneet Agarwal; Prerna KHURANA; Gautam Shroff; Lovekesh Vig; Ashwin Srinivasan
Applicant: Tata Consultancy Services Limited
IPC main classification:
Patent Description:

Invention Patent Descriptive Report for "PROCESSOR-IMPLEMENTED METHOD AND BIDIRECTIONAL LONG SHORT-TERM MEMORY (BILSTM) NETWORK-BASED CLASSIFIER SYSTEM"
CROSS-REFERENCE TO RELATED APPLICATIONS AND PRIORITY
[001] This application claims priority to Patent Application No. IND 201721032101, filed on September 11, 2017.
TECHNICAL FIELD
[002] The disclosure herein relates, in general, to assistance systems for frequently asked questions (FAQ), and, more particularly, to a Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier for identifying the target class of queries and providing answers to them.
BACKGROUND
[003] Recently, deep learning algorithms have gained enormous popularity due to their remarkable performance in computer vision and speech recognition tasks. One of the seminal works on Natural Language Processing (NLP) solved tasks such as Part-of-Speech tagging, chunking, Named Entity Recognition and Semantic Role Labeling using convolutional neural networks (CNNs). CNNs have also been used for the text classification task using word-level as well as character-level approaches; these networks capture local features using convolutional filters. In particular, chatbots have attracted the attention of researchers and given rise to many different lines of work, such as open-domain question answering using large knowledge graphs. Yet another line of work is concerned with building generative models for dialogue generation; some use the sequence-to-sequence model, which takes a question as input and tries to generate the answer automatically. Similarly, another very prolific line of research uses reinforcement learning to answer users' questions in a dialogue-based system. The key issue with these generative models is that they often produce grammatically incorrect sentences, even though the answers are required to be legally correct.
SUMMARY
[004] Embodiments of the present disclosure present technological improvements as solutions to one or more of the above-mentioned technical problems recognized by the inventors in conventional systems. For example, in one aspect, a processor-implemented method is provided for identifying the target class of queries and providing responses thereto. The processor-implemented method comprises: obtaining, by a Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier system, via one or more hardware processors, one or more user queries, wherein the one or more user queries comprise a sequence of words, wherein the BiLSTM Siamese network-based classifier system comprises a Siamese model and a classification model, and wherein the Siamese model and the classification model comprise a common base network which includes an embedding layer, a single BiLSTM layer and a Time Distributed Dense (TDD) layer; iteratively performing: representing, in the embedding layer of the common base network, the one or more user queries as a sequence of vector representations of each word, learned using a word-to-vector model;
[005] wherein the word sequence is replaced by corresponding vectors and the corresponding vectors are initialized using the word-to-vector model, and wherein the corresponding vectors are updated continuously during training of the BiLSTM Siamese network-based classifier; inputting, into the single BiLSTM layer of the common base network, the sequence of vector representations of each word to generate hidden states at each time step 't', wherein the vector representation of each word is input in at least one of a forward order and a reverse order; processing, by the Time Distributed Dense (TDD) layer of the common base network, an output obtained from the BiLSTM layer to obtain a vector sequence; obtaining, using a maxpool layer of the classification model, the dimension-wise maximum value of the vector sequence to form a final vector; and determining, by a softmax layer of the classification model, at least one target class of the one or more queries based on the formed final vector and issuing a response to the one or more queries based on the target class
determined, wherein a square-root Kullback-Leibler divergence loss function is applied to the vector sequence to optimize the classification model.
[006] In one embodiment, the method may additionally include determining, during training of the BiLSTM Siamese network-based classifier, one or more errors that pertain to a set of queries, wherein the one or more errors comprise one or more target classes that are determined for the set of queries; generating a set of wrongly classified query-query pairs based on the one or more errors; and iteratively training the Siamese model using the set of wrongly classified query-query pairs together with one or more correct pairs to determine a target class and issue responses for one or more subsequent queries, wherein one or more weights of the common base network are shared between the Siamese model and the classification model during training of the BiLSTM Siamese network-based classifier.
[007] In one embodiment, the method may additionally include obtaining, with the use of the one or more shared weights, a plurality of query embeddings by passing the one or more queries through the Siamese model;
[008] applying a contrastive divergence loss on the plurality of query embeddings to optimize the Siamese model; and updating one or more parameters of the BiLSTM Siamese network-based classifier. In one embodiment, the step of applying a contrastive divergence loss comprises: calculating a Euclidean distance between the plurality of query embeddings; and computing the contrastive divergence loss based on the calculated Euclidean distance.
[009] In another aspect, a Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier system is provided for identifying the target class of queries and issuing responses thereto. The system comprises: a memory that stores instructions; one or more communication interfaces; and one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to: obtain, by the
Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier system, via the one or more hardware processors, one or more user queries, wherein the one or more user queries comprise a sequence of words, wherein the BiLSTM Siamese network-based classifier system comprises a Siamese model and a classification model, and wherein the Siamese model and the classification model comprise a common base network that includes an embedding layer, a single BiLSTM layer and a Time Distributed Dense (TDD) layer; iteratively perform: represent, in the embedding layer of the common base network, the one or more user queries as a sequence of vector representations of each word, learned using a word-to-vector model;
[010] wherein the word sequence is replaced by corresponding vectors and the corresponding vectors are initialized using the word-to-vector model, and wherein the corresponding vectors are updated continuously during training of the BiLSTM Siamese network-based classifier; insert, into the single BiLSTM layer of the common base network, the sequence of vector representations of each word to generate hidden states at each time step 't', wherein the vector representation of each word is input in at least one of a forward order and a reverse order; process, by the Time Distributed Dense (TDD) layer of the common base network, an output obtained from the BiLSTM layer to obtain a vector sequence; obtain, using a maxpool layer of the classification model, the dimension-wise maximum value of the vector sequence to form a final vector; and determine, using a softmax layer of the classification model, at least one target class of the one or more queries based on the final vector and issue a response to the one or more queries based on the determined target class, wherein a square-root Kullback-Leibler divergence loss function is applied to the vector sequence to optimize the classification model.
[011] In one embodiment, the one or more hardware processors may additionally be configured by the instructions to: determine, during training of the hybrid BiLSTM Siamese network-based classifier, one or more errors that pertain to a set of queries, wherein the one or more errors
belong to one or more target classes that are determined for the set of queries; generate a set of wrongly classified query-query pairs; and iteratively train the Siamese model using the set of wrongly classified query-query pairs along with one or more correct pairs to determine a target class and issue responses to one or more subsequent queries, wherein one or more weights of the common base network are shared between the Siamese model and the classification model while training the BiLSTM Siamese network-based classifier system. [012] In one embodiment, the one or more hardware processors may additionally be configured by the instructions to: obtain, with the use of the one or more shared weights, a plurality of query embeddings by passing the one or more queries through the Siamese model; apply a contrastive divergence loss on the plurality of query embeddings to optimize the Siamese model; and update one or more parameters of the BiLSTM Siamese network-based classifier system. In one embodiment, the contrastive divergence loss is applied by calculating a Euclidean distance between the plurality of query embeddings, and computing the contrastive divergence loss based on the calculated Euclidean distance.
[013] In yet another aspect, one or more non-transitory machine-readable information storage media comprising one or more instructions are provided. The one or more instructions, when executed by one or more hardware processors, cause: obtaining, by the Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier, via the one or more hardware processors, one or more user queries, wherein the one or more user queries comprise a sequence of words, wherein the BiLSTM Siamese network-based classifier system comprises a Siamese model and a classification model, and wherein the Siamese model and the classification model comprise a common base network that includes an embedding layer, a single BiLSTM layer and a Time Distributed Dense (TDD) layer; iteratively performing: representing, in the embedding layer of the common base network, the one or more user queries as a sequence of vector representations of each word, learned using a word-to-vector model;
[014] wherein the word sequence is replaced by corresponding vectors and the corresponding vectors are initialized using the word-to-vector model, and wherein the corresponding vectors are updated continuously during training of the BiLSTM Siamese network-based classifier; inserting, into the single BiLSTM layer of the common base network, the sequence of vector representations of each word to generate hidden states at each time step 't', wherein the vector representation of each word is input in at least one of a forward order and a reverse order; processing, by the Time Distributed Dense (TDD) layer of the common base network, an output obtained from the BiLSTM layer to obtain a vector sequence; obtaining, using a maxpool layer of the classification model, the dimension-wise maximum value of the vector sequence to form a final vector; and determining, by a softmax layer of the classification model, at least one target class of the one or more queries based on the final vector and issuing a response to the one or more queries based on the determined target class, wherein a square-root Kullback-Leibler divergence loss is applied to the vector sequence to optimize the classification model.
[015] In one embodiment, the instructions, when executed by the hardware processors, may additionally cause: determining, during training of the BiLSTM Siamese network-based classifier, one or more errors that pertain to a set of queries, wherein the one or more errors comprise one or more target classes that are determined for the set of queries; generating a set of wrongly classified query-query pairs based on the one or more errors; and iteratively training the Siamese model using the set of wrongly classified query-query pairs along with one or more correct pairs to determine a target class and issue responses to one or more subsequent queries, wherein one or more weights of the common base network are shared between the Siamese model and the classification model during training of the BiLSTM Siamese network-based classifier.
[016] In one embodiment, the instructions, when executed by the hardware processors, may additionally cause: obtaining, with the use of the one or more shared weights, a plurality of query embeddings
by passing the one or more queries through the Siamese model; applying a contrastive divergence loss on the plurality of query embeddings to optimize the Siamese model; and updating one or more parameters of the BiLSTM Siamese network-based classifier. In one embodiment, the step of applying a contrastive divergence loss comprises: calculating a Euclidean distance between the plurality of query embeddings; and computing the contrastive divergence loss based on the calculated Euclidean distance.
[017] It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[018] The accompanying drawings, which are incorporated in and constitute a part of this disclosure, illustrate exemplary embodiments and, together with the description, serve to explain the disclosed principles:
[019] Figure 1 illustrates an exemplary block diagram of a Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier system for identifying the target class of queries and outputting responses thereto, in accordance with an embodiment of the present disclosure.
[020] Figure 2 illustrates an exemplary flowchart of a method for identifying the target class of queries and generating responses thereto using the system of Figure 1, in accordance with an embodiment of the present disclosure.
[021] Figure 3 illustrates an example of a hybrid Siamese and classification model with a hybrid training procedure, in accordance with an embodiment of the present disclosure.
[022] Figure 4 is a graphical representation illustrating a predicted probability distribution (P), a new probability distribution obtained after taking the square root of P and normalizing, and a target distribution T, in accordance with an embodiment of the present disclosure.
[023] Figure 5 depicts a chatbot, called Watt, which answers questions about policies related to Leave and the Health Insurance Scheme (HIS), in accordance with an exemplary embodiment of the present disclosure.
[024] Figure 6 illustrates sample queries from the Health Insurance Scheme dataset, depicting similar queries from one cluster, according to an embodiment of the present disclosure.
[025] Figure 7 depicts (A) BiLSTM embeddings and (B) HSCM-IT embeddings obtained from the classification model of the system of Figures 1 and 2, according to an embodiment of the present disclosure.
[026] Figures 8A and 8B depict graphical representations illustrating the variation of the True Positive, Abstention and False Positive categories with respect to the entropy threshold, according to an embodiment of the present disclosure.
DETAILED DESCRIPTION OF EMBODIMENTS
[027] Exemplary embodiments are described with reference to the accompanying drawings. In the Figures, the leftmost digit(s) of a reference number identifies the Figure in which the reference number first appears. Wherever convenient, the same reference numbers are used throughout the drawings to refer to the same or similar parts. Although examples and features of the disclosed principles are described in this document, modifications, adaptations, and other implementations are possible without departing from the spirit and scope of the disclosed embodiments. The following detailed description is intended to be considered exemplary only, with the true scope and spirit being indicated by the following claims.
[028] Typically, companies have large numbers of employees spread across geographies. It is not surprising that the HR department of such a large organization is constantly flooded with questions that range from the mundane to the impossible to answer. Therefore, it is a department that actively seeks automated assistance, especially to relieve the burden of routine but time-consuming tasks. Embodiments of the present disclosure provide a BiLSTM Siamese network-based classifier to identify the target class of queries and thereby provide answers to queries that belong to the identified target class, which acts as an automated assistant that relieves the burden of responding to queries in well-defined domains, for example, but without limitation, leave management and health insurance. In the arena of automated assistants, this constitutes closed-domain question answering, which is known to have better performance
than answering queries on any topic, i.e., open-domain question answering. In fact, the embodiments of this disclosure focus on automatically mapping a query (or question) to a frequently asked question (FAQ) whose answer has been manually certified by the HR department. In principle, if the FAQs and their answers are already available, it may simply be a matter of finding the nearest FAQ and returning its answer (a simple application of nearest-neighbour search, using some sentence representation). But there are difficulties. First, the FAQ is not really a single question but several, each of which deals with the same subject and therefore has a common answer. In itself, this does not appear to pose any undue difficulty, as long as matching against a single question can be extended to matching against a set of questions, returning the answer associated with the set containing the best-matching question. The real difficulty arises from the second issue: how to measure the similarity of a new query (that is, one that has not been seen before) to the questions in the FAQ classes. A simple measurement based on bags-of-words usually does not work, since the questions are often semantically related but may contain only a few words in common. Consider a query like this: "I am deputed in Hyderabad, but my project location is Chennai. The Flexi leaves shown in the system are according to the Chennai holiday list. Can I avail Flexi leave from both locations?" (see Figure 5). It is unlikely to have any significant match to any question in a FAQ class simply on the basis of bag-of-words. Instead, the question is whether flexi leaves from one location apply to another. Thus, even if a set of FAQ classes and their responses has been manually curated, the difficulty of devising a measure of semantic similarity that allows the FAQ class for a new query to be decided accurately still remains and is faced repeatedly.
[029] Merely using a BiLSTM for classification may not be sufficient for the type of data set being worked on. An additional mechanism may be required for the separation of embeddings. With the intuition that the Siamese model, as well as the classification model, each tries to
individually separate the query embeddings, the embodiments of this disclosure combine the two approaches iteratively. For this, the Siamese model is trained for a epochs, and then the same base network is carried over to train the classification model for b epochs. This is done iteratively until the best accuracy is observed on the validation data set. Here, the first stage of an iteration (the Siamese model) ensures that the model learns which phrases are semantically similar/different, while the second stage of an iteration (the classification model) learns to predict the target class of each user query. In the present context, a and b are hyperparameters that are tuned for the best performance on the validation set, as sketched below.
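A minimal sketch of this alternating training schedule, under the assumption that `siamese_model` and `classification_model` are compiled Keras-style models sharing the same base network, is given below; `make_pairs`, `evaluate` and the data variables are illustrative placeholders, not part of the disclosure.

```python
# Hypothetical sketch of the iterative hybrid training procedure (HSCM-IT).
# `siamese_model` and `classification_model` are assumed to share the same base
# network, so weight updates in one are visible to the other.
a_epochs, b_epochs, max_iterations = 5, 10, 20   # a and b are tuned on the validation set
best_val_acc, best_weights = 0.0, None
misclassified_pairs = []                          # frequently confused class pairs, updated below

for _ in range(max_iterations):
    # Stage 1: train the Siamese model on query-query pairs for `a` epochs,
    # emphasising pairs drawn from classes confused in the previous iteration.
    pairs, pair_labels = make_pairs(train_queries, misclassified_pairs)
    siamese_model.fit(pairs, pair_labels, epochs=a_epochs, batch_size=64, verbose=0)

    # Stage 2: train the classification model on (query, class) data for `b` epochs.
    classification_model.fit(train_x, train_y, epochs=b_epochs, batch_size=64, verbose=0)

    # Track validation accuracy; keep the best weights seen so far.
    val_acc, misclassified_pairs = evaluate(classification_model, val_x, val_y)
    if val_acc > best_val_acc:
        best_val_acc, best_weights = val_acc, classification_model.get_weights()

classification_model.set_weights(best_weights)
```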
[030] Embodiments of the present disclosure achieve this by providing a BiLSTM Siamese network-based classifier (also referred to as the system) to identify the target class of queries and provide answers to them. In the example above, the system correctly finds the FAQ class for the Hyderabad-Chennai query. Received queries are mapped to one of a few hundred classes, each associated with an answer certified by the HR department as being a correct answer to all questions in that FAQ class.
PROBLEM FORMALIZATION:
[031] The training data (D) for the FAQ chatbot is available as D = {s_1, s_2, ..., s_n}, which is a set of query sets s_l. Here, each query set s_l comprises a set of semantically similar queries X_l = {x_1^l, x_2^l, ..., x_m^l} and their corresponding answer y_l, that is, s_l = (X_l, y_l). The objective of the problem attempted by the embodiments of the present disclosure is to predict the query set s* that corresponds to a user query x, so that the corresponding answer y can be shown to the user. This can also be viewed as a sentence classification problem with training data D: each query set s_l is assumed to be a class in a multi-class classification problem, that is, s* = argmax_{s_l ∈ D} P(s_l | x).
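As an illustration only (the query texts and class identifiers below are hypothetical, not taken from the disclosure), such training data can be held in a simple structure in which each query set carries its similar queries and one curated answer:

```python
# Hypothetical example of the training data layout D = {s_1, ..., s_n},
# where each query set s_l = (X_l, y_l) groups semantically similar queries
# with a single curated answer.
D = [
    {
        "class_id": "flexi_leave_location",                      # identifies s_l
        "queries": [                                             # X_l
            "Can I avail flexi leave as per my project location?",
            "Which holiday list applies when my base and project cities differ?",
        ],
        "answer": "Flexi leaves follow the holiday list of your base location.",  # y_l
    },
    # ... one entry per query set / target class
]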
[032] The training data D for a chatbot normally contains
a few hundred classes; to facilitate the management of these classes, they are grouped into high-level categories, for example, all classes related to sick leave can be grouped into one category. It was observed that the classes within a group have a high degree of concept overlap.
[033] Referring now to the drawings, and more particularly to Figures 1 to 8B, where similar reference characters denote corresponding features consistently throughout the Figures, preferred embodiments are shown and described in the context of the following exemplary system and/or method.
[034] Figure 1 illustrates an exemplary block diagram of a Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier system 100 for identifying the target class of queries and generating responses thereto, in accordance with an embodiment of the present disclosure. In one embodiment, system 100 includes one or more processors 104, communication interface devices or input/output (I/O) interfaces 106, and one or more data storage devices or memory 102 operatively coupled to the one or more processors 104. The one or more processors 104 can be one or more software processing modules and/or hardware processors. In one embodiment, the hardware processors can be implemented as one or more microprocessors, microcomputers, microcontrollers, digital signal processors, central processing units, state machines, logic circuits and/or any devices that manipulate signals based on operational instructions. Among other capabilities, the processors are configured to fetch and execute computer-readable instructions stored in memory. In one embodiment, device 100 can be deployed in a variety of computing systems, such as laptop computers, notebooks, hand-held devices, workstations, mainframe computers, servers, a network cloud and the like.
[035] The I/O interface devices 106 can include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like, and can facilitate multiple communications over
a wide variety of networks (N/W) and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite. In one embodiment, the I/O interface devices can include one or more ports for connecting a number of devices to one another or to another server.
[036] Memory 102 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read-only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical discs, and magnetic tapes. In one embodiment, a database 108 can be stored in memory 102, wherein database 108 can comprise, but is not limited to, information pertaining to the interaction of users with system 100, comprising queries and responses, etc. In one embodiment, memory 102 can store modeling techniques, for example, the Siamese model, the classification model and the like, which are executed by the one or more hardware processors 104 to carry out the methodology described in this document.
[037] Figure 2, with reference to Figure 1, illustrates an exemplary flowchart of a method for identifying the target class of queries and generating responses thereto using the system 100 of Figure 1, in accordance with an embodiment of the present disclosure. In one embodiment, the system 100 comprises one or more data storage devices or memory 102 operatively coupled to the one or more hardware processors 104 and is configured to store instructions for execution of the method steps by the one or more processors 104. System 100 stores values (and/or parameters) associated with the trained models (the Siamese model and the classification model). The steps of the method of the present disclosure will now be explained with reference to the components of system 100 as described in Figures 1 and 3, and the flowchart of Figure 2. In one embodiment of the present disclosure, in step 202, the Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier system 100 obtains, via the one or more hardware processors, one or more user queries. In one embodiment, each user query comprises a sequence of words x_k = (w_1, w_2, ..., w_n), of
varying length n. In one embodiment, the BiLSTM Siamese network-based classifier system 100 comprises a Siamese model 302 and a classification model 304, as shown in Figure 3, wherein each of the Siamese model 302 and the classification model 304 comprises a common base network 306 (also referred to below as the base network) which includes an embedding layer 308 (also referred to below as a recurrent neural network (RNN) embedding layer), a single BiLSTM layer 310 and a Time Distributed Dense (TDD) layer 312. The classification model 304 includes a maxpool layer 314 followed by a softmax layer (not shown in Figures 2 and 3). More specifically, Figure 3, with reference to Figures 1 and 2, illustrates an example of a hybrid Siamese and classification model with a hybrid training procedure in accordance with an embodiment of the present disclosure. In step 204, in the embedding layer 308 of the common base network, the one or more user queries are represented as a sequence of vector representations of each word, learned using a word-to-vector model on queries, responses and the related policy documents together. In one embodiment, the word sequence is replaced by corresponding vectors and the corresponding vectors are initialized using the word-to-vector model, and the corresponding vectors are updated continuously during the training of the BiLSTM Siamese network-based classifier system 100. The word-to-vector matrix (also referred to as word2vec below) is used to initialize the weights of an initial recurrent embedding layer, which takes a query as a sequence of 1-hot encoded words and outputs the sequence of word vectors. Thus, the embedding layer 308 learns the sequential representation of each user query as a sequence of its word vectors x_k = (v_1, v_2, ..., v_n). During training of the rest of the model (including system 100), the weights of this layer (i.e., w2v) are also updated through back-propagation.
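A minimal sketch of such an embedding layer, assuming TensorFlow/Keras and a pre-computed word2vec matrix `w2v_matrix` of shape (vocab_size, embed_dim), is shown below; the sizes and variable names are illustrative.

```python
# Hypothetical sketch: embedding layer 308 initialized from a word2vec matrix
# and kept trainable so its weights are updated by back-propagation (step 204).
import numpy as np
import tensorflow as tf

vocab_size, embed_dim, max_len = 5000, 100, 50          # assumed sizes
w2v_matrix = np.random.rand(vocab_size, embed_dim)      # placeholder for learned word2vec vectors

embedding_layer = tf.keras.layers.Embedding(
    input_dim=vocab_size,
    output_dim=embed_dim,
    embeddings_initializer=tf.keras.initializers.Constant(w2v_matrix),
    trainable=True,            # weights continue to be updated during classifier training
    mask_zero=True,            # index 0 reserved for padding
)

# A query arrives as a padded sequence of word indices (1-hot equivalent).
query_ids = tf.constant([[12, 87, 3, 5, 9] + [0] * (max_len - 5)])
word_vectors = embedding_layer(query_ids)               # shape: (1, max_len, embed_dim)
```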
[038] In one embodiment of the present disclosure, in step 206, the BiLSTM layer 310 of the Siamese model 302 receives the sequence of vector representations of each word as input to generate an output (one or more hidden states
h_t at each time step). In one embodiment, the vector representation of each word is input in at least one of a forward order and a reverse order, as a result of which each word in the query maintains the context of other words on both its left and right sides. Long Short-Term Memory networks, or LSTMs, are a variant of RNNs (Recurrent Neural Networks). LSTMs are designed to mitigate the vanishing gradient issue, which occurs when RNNs learn sequences with long-term patterns. A user query returned by the embedding layer 308 is represented as a sequence of vectors, one at each time stamp, that is, x_k = (v_1, v_2, ..., v_n), which is the input to the BiLSTM layer. The LSTM unit output is controlled by a set of gates, as a function of the previous hidden state h_{t-1} and the input at the current time step v_t, as defined below:
Input gate: i_t = σ(θ_vi · v_t + θ_hi · h_{t-1} + b_i)
Forget gate: f_t = σ(θ_vf · v_t + θ_hf · h_{t-1} + b_f)
Output gate: o_t = σ(θ_vo · v_t + θ_ho · h_{t-1} + b_o)
Candidate hidden state: g_t = tanh(θ_vg · v_t + θ_hg · h_{t-1} + b_g)     (1)
Internal memory: c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t
Hidden state: h_t = o_t ⊙ tanh(c_t)
[039] Here, σ is the logistic sigmoid function with output in [0, 1], tanh denotes the hyperbolic tangent function with output in [-1, 1], and ⊙ denotes element-wise multiplication. f_t can be seen as a function that decides how much information from the old memory will be forgotten, i_t controls how much new information will be stored in the current memory cell, and o_t controls the output based on the memory cell c_t. Bidirectional LSTM (BiLSTM) layers 310 are used for the classification model 304, as shown in Figure 3. As mentioned above, the sequence is given as input in forward and reverse orders, as a result of which each word in the query maintains the context of other words on both its left and right sides.
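For illustration, a direct NumPy transcription of the gate equations in (1) is given below; the weight shapes and values are placeholders, and a framework LSTM layer would normally be used instead.

```python
# Hypothetical sketch: one LSTM cell step implementing equation (1) with NumPy.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(v_t, h_prev, c_prev, theta, b):
    """One time step; theta[k] acts on the concatenated [v_t, h_prev] for gate k,
    which is equivalent to using separate theta_v and theta_h matrices."""
    x = np.concatenate([v_t, h_prev])
    i_t = sigmoid(theta["i"] @ x + b["i"])        # input gate
    f_t = sigmoid(theta["f"] @ x + b["f"])        # forget gate
    o_t = sigmoid(theta["o"] @ x + b["o"])        # output gate
    g_t = np.tanh(theta["g"] @ x + b["g"])        # candidate hidden state
    c_t = f_t * c_prev + i_t * g_t                # internal memory
    h_t = o_t * np.tanh(c_t)                      # hidden state
    return h_t, c_t

# Toy dimensions: 4-dimensional word vectors, 3 hidden units.
rng = np.random.default_rng(0)
theta = {k: rng.standard_normal((3, 7)) for k in "ifog"}
b = {k: np.zeros(3) for k in "ifog"}
h, c = np.zeros(3), np.zeros(3)
for v_t in rng.standard_normal((5, 4)):           # a query of 5 word vectors
    h, c = lstm_step(v_t, h, c, theta, b)
```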
[040] In one embodiment of the present disclosure, in step 208, the output obtained from the BiLSTM layer is processed by the Time Distributed Dense (TDD) layer 312 of the common base network 306 to obtain a vector sequence. In one embodiment of the present disclosure, in step 210, the maxpool layer 314 of the classification model 304 takes the maximum value, dimension by dimension, over the vector sequence to form a final vector. In one embodiment, the classification model 304 uses the common base network 306 above to obtain T hidden states, one at each time step. These hidden states are passed through the maxpool layer 314, which acts as a kind of attention layer in the network and identifies the most important semantic features of the one or more queries. In one embodiment, this maxpool layer 314 takes the maximum value per dimension to form a final vector, as illustrated in the sketch below.
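A minimal Keras-style sketch of the common base network 306 (embedding 308 → BiLSTM 310 → TDD 312) followed by the maxpool layer 314 and a softmax layer is shown below; layer sizes and `n_classes` are illustrative assumptions, not values fixed by the disclosure.

```python
# Hypothetical sketch: classification model 304 built on the common base network 306.
import tensorflow as tf

max_len, vocab_size, embed_dim, hidden_units, n_classes = 50, 5000, 100, 250, 199

inputs = tf.keras.Input(shape=(max_len,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)                   # embedding layer 308
x = tf.keras.layers.Bidirectional(
    tf.keras.layers.LSTM(hidden_units, return_sequences=True))(x)              # single BiLSTM layer 310
x = tf.keras.layers.TimeDistributed(tf.keras.layers.Dense(hidden_units))(x)    # TDD layer 312
x = tf.keras.layers.GlobalMaxPooling1D()(x)                                    # maxpool layer 314
outputs = tf.keras.layers.Dense(n_classes, activation="softmax")(x)            # softmax layer

classification_model = tf.keras.Model(inputs, outputs)
# The disclosure uses the SQRT-KLD loss; plain cross-entropy is shown here for brevity.
classification_model.compile(optimizer="adam", loss="categorical_crossentropy")
```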
[041] In one embodiment of the present disclosure, in step 212, a softmax layer of the classification model 304 determines at least one target class of the one or more queries based on the formed final vector and issues (or provides) a response to the one or more queries based on the determined target class. In one embodiment, system 100 provides a response from one or more predefined responses stored in database 108. In one embodiment, a square-root Kullback-Leibler divergence (KLD) loss function is applied to the vector sequence to optimize the classification model 304. In one embodiment, the cross-entropy loss function can be seen as the KL divergence between the predicted discrete probability distribution P(x_i), ∀ i ∈ {1, 2, ..., n}, and the target distribution T(x_i), which is an indicator function with a value of 1 for the right class and zero otherwise. These are represented as P_t and T_t correspondingly, that is, KLD(T_t || P_t) = Σ_i T_t(x_i) · log(T_t(x_i) / P_t(x_i)). In T_t, all terms except the one for the target class reduce to zero; as a result, the divergence reduces to -log(P_t(x_c)), where c is the target class, which is the well-known cross-entropy loss.
[042] In order to force the network to learn a better separation of the query embeddings, the loss above can be increased slightly for all predictions, that is, regardless of whether the prediction is correct or incorrect. To do this, the square root of all probabilities in the predicted distribution P_t is taken and then renormalized to obtain a new probability distribution Q_t. Q_t has a higher entropy than P_t, as shown in Figure 4. More specifically, Figure 4 is a graphical representation that illustrates a predicted probability distribution (P), a new probability distribution obtained
after taking the square root of P and normalizing, and a target distribution T, in accordance with an embodiment of the present disclosure. As can be seen from Figure 4, this reduces the probability of highly likely classes and slightly increases the probability of less likely classes. Instead of using the categorical cross-entropy loss, KLD(T_t || Q_t) is used, which, in the case of a deep network, is equivalent to scaling the activation input to the final softmax layer by half. As can be seen from the evaluation results presented in Tables 1, 2 and 3, this proposed approach helps to obtain better accuracy for BiLSTM classification, as well as when iteratively coupled to the Siamese network (explained later in this section). This suggests that an artificial increase in the loss helps with better separation of the query embeddings. A similar technique was used in a conventional approach, which took the square root of the predicted distribution and used it as an auxiliary target distribution for clustering in unsupervised settings, whereas the embodiments of the present disclosure take the square root of the predicted distribution and use it to increase the loss, in the context of classification.
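A minimal NumPy sketch of this SQRT-KLD loss, under the assumption that `p` is the softmax output and `t` a one-hot target, is shown below; in practice the same effect can be obtained inside a framework loss function.

```python
# Hypothetical sketch of the SQRT-KLD loss: take the square root of the predicted
# distribution P, renormalize to get Q, and compute KLD(T || Q).
import numpy as np

def sqrt_kld_loss(t, p, eps=1e-12):
    """t: one-hot target distribution, p: predicted softmax distribution."""
    q = np.sqrt(p)
    q = q / q.sum()                                            # renormalize: Q has higher entropy than P
    return float(np.sum(t * np.log((t + eps) / (q + eps))))    # KLD(T || Q)

p = np.array([0.70, 0.20, 0.05, 0.05])            # predicted distribution P
t = np.array([1.0, 0.0, 0.0, 0.0])                # target class indicator T
print(sqrt_kld_loss(t, p))                        # larger than the plain cross-entropy -log(0.70)
```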
[043] In the model above, it was observed that many of the user queries belonging to a class are often classified incorrectly. In order to improve the classification accuracy, in each iteration after execution of the classification model 304, frequently misclassified query pairs are identified, that is, cases where many queries from one class are often predicted as belonging to another class on the validation data set. In other words, during training of the BiLSTM Siamese network-based classifier, one or more errors pertaining to a set of queries were determined, wherein the one or more errors comprise one or more target classes that are determined for the set of queries, based on which a set of erroneously classified query-query pairs was generated. The Siamese model was then iteratively trained using the set of misclassified query-query pairs together with one or more correct pairs to determine a target class and issue responses for one or more subsequent queries. As a result, the Siamese model 302 tries to drive apart the corresponding query embeddings, and it becomes comparatively easier
for the classification model 304 to classify these queries accurately, leading to better precision, as described below. Here, the fact that the Siamese model 302 works on a pair of queries at a time is leveraged, which helps to drive apart the embeddings of queries from these classes in each iteration. In one embodiment, the one or more weights of the base network are shared between the Siamese model and the classification model during training of the BiLSTM Siamese network-based classifier. The Siamese model 302 is fed with many different pairs of queries {x_i, x_j}, some of which belong to the same class while others belong to different classes; that is, given a pair of queries, the objective of system 100 is to predict whether they belong to the same class {1} or not {0}. As a result, with the use of the one or more shared weights, a plurality of query embeddings is obtained by passing the one or more queries through the Siamese model 302 (for example, the same neural network architecture), wherein the contrastive divergence loss is applied on the plurality of query embeddings to update one or more parameters of the BiLSTM Siamese network-based classifier system 100 (or neural network) through back-propagation and, thus, to optimize the Siamese model. The Siamese network/model 302 contains the base network followed by a single BiLSTM layer, from which the final state is taken as the embedding of the input query. The BiLSTM layer (which is the penultimate layer of the Siamese model 302) returns the query embeddings e_s(x_i) and e_s(x_j) for each of the queries {x_i, x_j}. First, the Euclidean distance D_s between the pair of query embeddings e_s(x_i) and e_s(x_j) is calculated, and the contrastive divergence loss is computed based on the calculated Euclidean distance, as illustrated by the expression below:
L(s_i, s_j, C) = C · D_s + (1 - C) · max(0, m - D_s)     (2)
[044] Here, C ∈ {0, 1} is the target class for the pair of queries. When the two queries belong to the same class (C = 1), the first term becomes active and D_s alone becomes the loss, and the network tries to reduce the distance between the embeddings. When the two queries belong to different classes (C = 0), the second term of expression (2) becomes active, and if the distance between the embeddings is greater than the margin m the loss term becomes zero; otherwise the loss is (m - D_s), that
is, the network tries to drive the embeddings apart. This effectively pulls the embeddings of similar queries together and pushes the embeddings of different queries apart by at least the margin distance (m). Here, the pairs are sampled so that the ratio of positive pairs (belonging to the same class) to negative pairs (belonging to different classes) is 1:2. Negative pairs are sampled so that the queries have a higher Jaccard similarity to each other. A schematic diagram of the Siamese model 302 is shown in the upper rectangle of Figure 3.
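A minimal NumPy sketch of the Euclidean distance and the contrastive divergence loss of expression (2) is given below; the embeddings and the margin value are illustrative placeholders.

```python
# Hypothetical sketch of expression (2): contrastive divergence loss on a pair
# of query embeddings e_s(x_i), e_s(x_j) produced by the Siamese model.
import numpy as np

def contrastive_loss(e_i, e_j, same_class, margin=1.0):
    """same_class: 1 if the two queries belong to the same class, else 0."""
    d_s = np.linalg.norm(e_i - e_j)                     # Euclidean distance D_s
    return same_class * d_s + (1 - same_class) * max(0.0, margin - d_s)

e_i = np.array([0.2, 0.9, -0.4])
e_j = np.array([0.1, 0.8, -0.5])
print(contrastive_loss(e_i, e_j, same_class=1))         # pulls similar queries together
print(contrastive_loss(e_i, e_j, same_class=0))         # pushes dissimilar queries apart up to the margin
```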
RESULTS OF THE MODEL ASSESSMENT:
[045] Table 1 depicts general statistics for the three data sets (Health Insurance Scheme (HIS), Leave and 20Newsgroups) used to perform all evaluations. In addition, it shows the data splits used for training, validation and test data, along with the average sentence length and the number of classes in each data set. The Leave and HIS chatbot data sets are divided into training-validation-test sets at a ratio of 60-20-20.
TABLE 1
Property                 | Leave  | HIS    | 20Newsgroups
Training data            | 2,801  | 4,276  | 7,507
Validation data          | 934    | 1,426  | 1,670 (787)
Test data                | 934    | 1,426  | 5,415
Average sentence length  | 62     | 73     | 429
No. of classes           | 199    | 117    | 4
[046] 20Newsgroups (20NG): This consists of documents from 20 newsgroups. The bydate version was used and four major categories were selected (comp, politics, rec and religion). The standard 20NG split into training and test sets was used. In addition, 10% of the training data was used as a validation set. An early stopping criterion
was employed based on the validation loss of the classification model.
[047] Details and sample queries of the HR chatbot data are given below:
[048] Large organizations typically have human resources policies designed for employee benefits. These policies are often described in large documents, which are often difficult to read. Employees depend on the commonly held perception of these policies or seek assistance from human resources agents, which can become a bottleneck in large organizations, especially when queries reveal personal information, such as pregnancy or illness. The purpose of the embodiments of the present disclosure in developing a digital assistant was both to ensure that employee queries remained confidential and to provide accurate assistance in the form of curated responses, rather than mere pointers to a bulky policy document. System 100 for identifying the target class of queries and providing answers to them (for example, an FAQ assistant for HR-policy queries) was developed and integrated in this environment as a chatbot. Figure 5, with reference to Figures 1 to 4, depicts a chatbot, called Watt, which answers questions about policies related to Leave and the Health Insurance Scheme (HIS), in accordance with an exemplary embodiment of the present disclosure. Figure 6, with reference to Figures 1 to 5, illustrates sample queries from the Health Insurance Scheme dataset, depicting similar queries from one cluster, according to an embodiment of the present disclosure.
[049] To create the initial FAQs as well as a training set, a task force composed of specialists in the human resources field was formed and given its own separate collaboration group (called HR Bot Teachers). This team first created several sets of similar questions, each called a query set, where all the questions in a query set are such that they can be answered by a single answer. Then, the responses were curated by the teachers through careful reading of the policy documents, as well as deliberation and discussion. 199 such query sets were created for Leave policies and 117 for HIS policies. In the
process, the teachers ended up creating 10,000 different questions.
[050] After the creation of seed data as above, the first version of the system (also referred to below as the chatbot) was installed/deployed, and subsequent training and data creation was done from the chatbot interface itself, with the use of command-line instructions. Thus, it was possible to train the chatbot by providing the right query set id in the event that the prediction made was wrong; this feedback continuously produces additional training data, using which the HSCM-IT classifier is periodically retrained. In the interim, in the event that erroneously classified questions are repeated almost verbatim between retraining intervals, the correction initially provided through the trainer's feedback is returned instead of the classifier output, thus giving the impression of continuous learning.
DATA PROCESSING:
[051] Before being fed into system 100, these queries were preprocessed with the following steps: i) the queries were converted to lowercase, making the system case-insensitive; ii) special characters were removed from the text; and iii) all abbreviations were captured and replaced with their actual meaning, for example, "ml" is replaced by "maternity leave" and "sml" by "special maternity leave". Stop words were not removed, as it was observed that removing certain words from the text leads to a slight deterioration in the classifier's performance; therefore, it was concluded that all words are necessary for better prediction accuracy.
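For illustration, a minimal sketch of steps i)-iii) is shown below; the abbreviation map contains only the two expansions mentioned above and would be larger in practice.

```python
# Hypothetical sketch of the preprocessing steps applied to user queries.
import re

ABBREVIATIONS = {"ml": "maternity leave", "sml": "special maternity leave"}  # illustrative subset

def preprocess(query: str) -> str:
    query = query.lower()                                             # i) case-insensitive
    query = re.sub(r"[^a-z0-9\s]", " ", query)                        # ii) drop special characters
    tokens = [ABBREVIATIONS.get(tok, tok) for tok in query.split()]   # iii) expand abbreviations
    return " ".join(tokens)                                           # note: stop words are kept

print(preprocess("Am I eligible for SML after ML?"))
# -> "am i eligible for special maternity leave after maternity leave"
```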
[052] Distributed Word Vectors: After preprocessing the text, word2vec embeddings were learned using the skip-gram algorithm/technique. All policy documents, chatbot responses as well as questions from all query sets were used to learn these domain-specific vector representations of all words. In addition, incorporating general-purpose GloVe word embeddings learned on English Wikipedia® data was tried; however, it was observed that domain-specific word embeddings give better precision. This may be due to the many domain-specific terms or orthogonal meanings of words, such as
"leave".
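A minimal sketch of learning such domain-specific vectors with the skip-gram model, assuming the gensim library (version 4 or later API) and an illustrative two-sentence corpus, is shown below.

```python
# Hypothetical sketch: learning domain-specific word vectors with skip-gram (word2vec).
from gensim.models import Word2Vec

# `corpus` would hold tokenized sentences from policy documents, responses and query sets.
corpus = [
    ["special", "maternity", "leave", "is", "granted", "on", "medical", "grounds"],
    ["flexi", "leave", "follows", "the", "holiday", "list", "of", "the", "base", "location"],
]

w2v = Word2Vec(
    sentences=corpus,
    vector_size=100,   # embedding dimension
    sg=1,              # skip-gram (as opposed to CBOW)
    window=5,
    min_count=1,
    epochs=10,
)
word_vector = w2v.wv["leave"]      # domain-specific vector used to initialize layer 308
```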
TRAINING DETAILS:
[053] The base network and its weights were shared across both branches of the Siamese model and the classification model. A grid search over the network hyperparameters was also carried out, namely, the number of hidden units in the range {100 to 350} with a step size of 50 units, the batch size in the range {20, 40, 64, 128} and the learning rate in the range {0.1, 0.01, 0.001, 0.0001}, and the best set of parameters was obtained as chosen on the validation set. Finally, with the best choice of hyperparameters, each model was trained x times (say, 10 times) with different initializations, and the average precision/F1 score was observed on the unseen test data set. The best results were obtained with 250 hidden units in the base network for the HIS data and 300 for the Leave data, while 150 hidden units worked best on the 20NG data set. A batch size of 64 generated the best results across all data sets. The optimizer generated the best results across all data sets with a standard learning rate of 0.001. Finally, the hyperparameters a and b were also tuned for the best results on the validation set, and it was found that HSCM-IT performed best for a = 5 and b = 10.
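A minimal sketch of such a grid search, with illustrative placeholder functions `build_model` and `validation_accuracy`, is shown below.

```python
# Hypothetical sketch of the hyperparameter grid search over hidden units,
# batch size and learning rate, selecting the best setting on the validation set.
from itertools import product

hidden_units_grid = range(100, 351, 50)          # {100, 150, ..., 350}
batch_size_grid = [20, 40, 64, 128]
learning_rate_grid = [0.1, 0.01, 0.001, 0.0001]

best_acc, best_config = 0.0, None
for units, batch, lr in product(hidden_units_grid, batch_size_grid, learning_rate_grid):
    model = build_model(hidden_units=units, learning_rate=lr)        # placeholder builder
    model.fit(train_x, train_y, batch_size=batch, epochs=10, verbose=0)
    acc = validation_accuracy(model, val_x, val_y)                   # placeholder metric
    if acc > best_acc:
        best_acc, best_config = acc, (units, batch, lr)

print("best configuration (hidden units, batch size, learning rate):", best_config)
```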
[054] Regularization: LSTMs require a lot of training data and have a large number of parameters; as a result, they tend to overfit the training data easily. To avoid this, techniques including early stopping, L1/L2 regularization (weight decay) and batch normalization were used by system 100. Batch normalization is a relatively recent technique that reduces the internal covariate shift in the distribution of inputs to the model. It also resulted in faster convergence and better generalization of the RNNs.
PROGRESSION TO THE HYBRID MODEL (HSCM):
[055] The performance of the proposed HSCM-IT technique (F) was compared with a TF-IDF classifier that follows a bag-of-words approach (A). The main objective of the other reported results is to progressively compare the performance of individual components of HSCM-IT with that of the full model. The various components that are compared are: (B) a 2-layer Bidirectional LSTM, (C) the
Classification Model, (D) the Siamese Model, and (E) HSCM without the iterative training procedure. These results are reported in Table 2 for the chatbot data sets and in Table 3 for the public 20NG data set. For all of these models, the benefit of using the SQRT-KLD loss was also reported; that is, for each of (B), (C) and (F), two evaluations were performed, one with a cross-entropy loss function and another with the SQRT-KLD loss function. More particularly, Table 2 depicts the comparison of average accuracy (over 10 runs) between the baseline techniques and the proposed HSCM technique/algorithm, with the two loss functions Cross Entropy and SQRT-KLD, on the chatbot data sets. In Table 2, * indicates a single run only. Table 3 shows the comparison of the average F1 Score (over 10 runs) on the 20NG data set.
TABLE 2
Algorithm / Technique               | HIS    | Leave
(A) TF-IDF, 1-NN, Cosine Sim.       | 79.80  | 58.35
(B) BiLSTM + X-entropy              | 85.09  | 83.15
    BiLSTM + SQRT-KLD               | 87.23  | 83.48
(C) Classif. + X-entropy            | 86.26  | 83.44
    Classif. + SQRT-KLD             | 89.76  | 83.78
(D) Siamese model + 1-NN            | 72.15* | 63.85*
(E) HSCM + SQRT-KLD                 | 89.19  | 83.44
(F) HSCM-IT + X-entropy             | 89.12  | 83.87
    HSCM-IT + SQRT-KLD              | 90.53  | 84.93
TABLE 3
Algorithm / Technique           | 20NG
(A) TF-IDF, 1-NN, Cosine Sim.   | 90.20
(B) BiLSTM + X-entropy          | 93.56
    BiLSTM + SQRT-KLD           | 94.26
(C) Classif. + X-entropy        | 93.79
    Classif. + SQRT-KLD         | 94.22
(F) HSCM-IT + X-entropy         | 94.87
    HSCM-IT + SQRT-KLD          | 95.12
[056] Classification based on TF-IDF: The performance of the TF-IDF classifier was evaluated first; it is based on the bag-of-words approach, indicating how many times the characteristic words of each class are present in the data. For this, the TF-IDF vector for each query set, as well as for the user's query (which needs to be classified), was first calculated, and then the target class was found using the first nearest neighbour, with cosine similarity as the distance measure. The results indicate that the 20NG data set has many more class-specific words than the HIS and Leave data sets. This is also due to the number of classes in the chatbot data sets being much higher than that of the 20NG data set. On the HIS and Leave data, maximum gains of ~11% and ~26% in accuracy were observed using the HSCM-IT model compared to the TF-IDF model, while on 20NG the corresponding F1 Score gain was observed to be only ~6%. The pairwise Jaccard similarity of sentences in the three data sets was calculated, and it was observed that the average Jaccard similarity between classes in 20NG is 0.0911, while in HIS and Leave it is 0.1066 and 0.1264, respectively. This also indicates that the HIS and Leave data sets are more difficult to classify.
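A minimal sketch of this TF-IDF baseline with scikit-learn, using a tiny illustrative corpus, is shown below.

```python
# Hypothetical sketch of the TF-IDF + 1-NN (cosine similarity) baseline classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# One concatenated document per query set (class); texts are illustrative only.
class_docs = {
    "sick_leave": "how many sick leaves do I get can sick leave be carried forward",
    "health_insurance": "is my spouse covered under the health insurance scheme premium",
}
labels = list(class_docs.keys())

vectorizer = TfidfVectorizer()
class_vectors = vectorizer.fit_transform(class_docs.values())

def predict(query: str) -> str:
    query_vec = vectorizer.transform([query])
    sims = cosine_similarity(query_vec, class_vectors)[0]    # similarity to each class
    return labels[sims.argmax()]                              # 1-nearest neighbour by cosine

print(predict("can I carry forward my unused sick leave"))   # -> "sick_leave"
```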
[057] Deep Classification Model with BiLSTM: For the problem given in the description above, the first obvious choice is to use RNNs, as the data is sequential. Embodiments of the present disclosure therefore used (B) Bidirectional LSTMs as a starting point for the problem. The small gap between TF-IDF and BiLSTM on 20NG indicates that the chosen classes were quite orthogonal, while, on the other hand, the difference increased on the HIS data and was largest on the Leave data, which highlights the
fact that it is indeed the most difficult data set among the three.
[058] (C) Classification Model: This model uses an additional maxpool layer for attention. It can be seen that this model alone performs almost equally to (B) on the Leave and 20NG data sets, while a small gain was observed on the HIS data set. (D) Siamese model with 1-NN: The accuracy of the Siamese model alone was also measured, with a 1-NN classifier that uses the Euclidean distance between the embedding of the user's query x_u and the embeddings of the queries present in the training data D_train. It can be seen that the accuracy of this model is worse than that of the BiLSTM model itself.
[059] Hybrid Models: (E) HSCM and (F) HSCM-IT: Finally, it can be seen that the hybrid (E) HSCM + SQRT-KLD model does not perform better than the classification model itself. The proposed approach (F) HSCM-IT of system 100 performs better than all other approaches (A to E) on all data sets (HIS, Leave and 20NG), although at times by a small margin. These results demonstrate empirically that it is the iterative training procedure of the hybrid model that brings the main benefit over the other approaches and helps to separate the embeddings of different queries. Here, pairs of frequently misclassified queries, observed on the validation data and extracted from the training data, were included in the Siamese training in each iteration.
[060] SQRT-KLD Loss Benefit: Across all three data sets and all deep learning approaches, there was a consistent pattern that SQRT-KLD led to a gain in accuracy/F1 Score over the cross-entropy loss. The F1 Score gain on the 20NG data set is consistently ~1%, the accuracy gain using this loss function on the HIS data set is about 2 to 3%, and on the Leave data set this gain is small.
[061] Separation of Embeddings: To illustrate how the HSCM-IT technique/algorithm helps to drive the embeddings of queries of other classes away from each other, and vice versa, a subset of classes was taken from the HIS data set. The classes in the HIS and Leave datasets have been organized into a number of categories; for example, all classes
related to sick leave were placed in the same category, or all classes related to Health Insurance were grouped into one category. Classes within a category are found to have a high degree of concept overlap, making them difficult to classify accurately. The embeddings of the training data belonging to the classes of one such category were extracted, and the t-SNE dimensionality reduction technique was used to visualize the degree of separation. One such sample comparison is shown in Figure 7. More particularly, Figure 7 depicts (A) the BiLSTM embeddings and (B) the HSCM-IT embeddings obtained from the classification model 304 of system 100 of Figures 1 and 2, in accordance with an embodiment of the present disclosure. Here, queries in the same class share the same shape (for example, circle, square, rectangle, inverted triangle, diamond shape and ellipse shape). For example, all the circles depicted in Figure 7 correspond to class m only. Similarly, all the squares depicted in Figure 7 may correspond to class n only.
[062] Baseline comparison: The algorithm most similar to the proposed approach of finding query-query similarity for classifying the user's query to retrieve answers is the RCNN technique. The performance of the proposed technique/algorithm was compared with the RCNN technique on the chatbot data sets as well as on 20NG. The results shown in Table 4 are based on the present implementation of that algorithm. Here, it can be seen that HSCM-IT performs better than RCNN by 3% on the HIS data, and by 1% on the Leave data.
TABLE 4
Algorithm            | HIS (Accuracy) | Leave (Accuracy) | 20NG (F1 Score)
RCNN                 | 87.31          | 83.30            | 96.69* / 94.38
HSCM-IT + SQRT-KLD   | 90.53          | 84.93            | 95.12
DEPLOYMENT RESULTS:
[063] When deploying a machine-based question answering system
for human consumption, it is important in practice that the system either answers a query correctly or refrains from responding rather than giving wrong answers, as far as possible. The entropy of the discrete probability distribution produced by the HSCM-IT model was used to decide whether to abstain: if the entropy is higher than a chosen threshold τ, the system abstains from responding and instead forwards the user to a human responder. To analyze the performance in this configuration, the model predictions are divided into three categories: True-Positive (or True+ band), False-Positive (False+ band) and Abstention (or Abstention band). A plot for varying values of τ is shown in Figures 8A and 8B, for the HIS and Leave data sets, respectively. More particularly, Figures 8A and 8B, with reference to Figures 1 to 7, depict graphical representations that illustrate the variation of the True Positive, Abstention and False Positive categories with respect to the entropy threshold, according to an embodiment of the present disclosure. An appropriate entropy threshold can be identified so that the False-Positive and Abstention cases are kept within tolerable levels, with no significant drop in True-Positives. It can be seen from Figures 8A and 8B that the band indicating False+ is comparatively narrower in the HSCM-IT plots than in the RCNN plots (especially above 80% True+). This suggests that the HSCM-IT model is more deployable in practice than the RCNN model. It can be speculated that the higher precision of the proposed HSCM-IT model can be attributed to the separation of embeddings, which was one of the main objectives. Using the best-case true-positive ratio, it can be estimated that, after deploying such chatbots, the daily burden on the HR department of responding to policy-related queries should drop from the current level of 6,000 to less than 1,000.
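A minimal NumPy sketch of this entropy-based abstention rule is given below; the probability vectors and the threshold value are illustrative.

```python
# Hypothetical sketch: abstain from answering when the entropy of the predicted
# class distribution exceeds a chosen threshold tau.
import numpy as np

def entropy(p, eps=1e-12):
    return float(-np.sum(p * np.log(p + eps)))

def answer_or_abstain(p, tau):
    """p: softmax output over FAQ classes; returns a class index or None (abstain)."""
    if entropy(p) > tau:
        return None                      # forward the query to a human responder
    return int(np.argmax(p))

confident = np.array([0.90, 0.05, 0.03, 0.02])
uncertain = np.array([0.30, 0.28, 0.22, 0.20])
tau = 1.0
print(answer_or_abstain(confident, tau))   # -> 0 (answer with the top class)
print(answer_or_abstain(uncertain, tau))   # -> None (abstain)
```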
[064] Last but not least, it is verified again that, for each query, the system 100 first decides whether the query is about the insurance policy or about leave. The same model (HSCM-IT) is used to classify user queries into the two categories, HIS and License, and this was observed to be done with very high accuracy (> 96%).
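A purely illustrative sketch of this two-stage routing is given below; `category_model`, `his_model` and `license_model` are hypothetical trained classifiers exposing a `predict` method, and this interface is an assumption rather than part of the disclosure.

```python
# Hedged sketch of the two-stage routing described above; all names are hypothetical.
def route_and_answer(query: str, category_model, his_model, license_model):
    category = category_model.predict(query)            # "HIS" or "License"
    domain_model = his_model if category == "HIS" else license_model
    target_class = domain_model.predict(query)          # target class within the category
    return category, target_class
```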
[065] The embodiments of the present disclosure provide a method and system of a Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier to identify target classes of queries and provide responses to them, which acts as a natural language assistant that automatically answers FAQs. System 100 introduces a new loss function, SQRT-KLD, usable within the softmax layer of a neural network. The embodiments also demonstrated the effectiveness of the methodology through empirical evaluations and showed that it is better than a baseline approach both on public data sets and on real-life data sets. The experimental evaluation and its results clearly indicate that the HSCM-IT model has better precision and a better trade-off than the baseline technique, leading to an algorithm that is more deployable in practice. In addition, system 100 can reside (or is capable of residing, or resides) in dedicated hardware or in a computer system that comprises a Graphics Processing Unit (GPU) specifically used for machine learning or deep learning algorithms. Unlike conventional computer systems, system 100 comprises the GPU with high-end data processing components (for example, 1,000 to 10,000 cores), whereby system 100 processes a large amount of data while reducing query processing time; in addition, system 100 is trained on the GPU to improve accuracy by optimizing the Siamese model 302 and the classification model 304.
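For illustration only, the following Python (PyTorch) sketch renders one possible reading of the common base network (embedding layer, single BiLSTM layer and time-distributed dense layer), the classification head (dimension-wise max-pooling followed by softmax), and the SQRT-KLD loss named above. The layer sizes are assumptions, and since this section does not spell out the SQRT-KLD formula, the loss shown is only one literal reading (the square root of the Kullback-Leibler divergence between the one-hot target and the predicted distribution) and should not be taken as the patented definition.

```python
# Illustrative sketch only (not the patented implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class BaseNetwork(nn.Module):
    """Shared trunk assumed for both the Siamese model and the classification model."""

    def __init__(self, vocab_size: int, emb_dim: int = 300, hidden_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)           # word-to-vector layer
        self.bilstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)                    # single BiLSTM layer
        self.tdd = nn.Linear(2 * hidden_dim, hidden_dim)              # time-distributed dense

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) -> vector sequence (batch, seq_len, hidden_dim)
        vectors = self.embedding(token_ids)
        hidden_states, _ = self.bilstm(vectors)
        return torch.tanh(self.tdd(hidden_states))


class ClassificationModel(nn.Module):
    """Classification head: max-pool over the time dimension, then softmax."""

    def __init__(self, base: BaseNetwork, hidden_dim: int, num_classes: int):
        super().__init__()
        self.base = base
        self.out = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        sequence = self.base(token_ids)
        final_vector, _ = sequence.max(dim=1)   # dimension-wise maximum -> final vector
        return F.log_softmax(self.out(final_vector), dim=-1)


def sqrt_kld_loss(log_probs: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Assumed reading of SQRT-KLD: KL(one-hot target || prediction) equals the
    # negative log-likelihood of the target class; take its square root per example.
    kld = F.nll_loss(log_probs, targets, reduction="none")
    return torch.sqrt(kld + 1e-12).mean()
```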
[066] The written description describes the subject matter herein to enable any person skilled in the art to make and use the embodiments. The scope of the subject matter embodiments is defined by the claims and may include other modifications that occur to those skilled in the art. Such other modifications are intended to fall within the scope of the claims if they have similar elements that do not differ from the literal language of the claims, or if they include equivalent elements with insubstantial differences from the literal language of the claims.
[067] It should be understood that the scope of protection extends to such a program and, in addition, to a computer-readable medium having a message therein; such computer-readable storage media contain program code for implementing one or more steps of the method, when the program runs on a server or mobile device or any suitable programmable device. The hardware device can be any kind of device that can be programmed including, for example, any kind of computer such as a server or a personal computer, or the like, or any combination thereof. The device may also include means that could be, for example, hardware means such as an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), or a combination of hardware and software means, for example, an ASIC and an FPGA, or at least one microprocessor and at least one memory with software modules located therein. Thus, the means can include both hardware means and software means. The method embodiments described herein could be implemented in hardware and software. The device may also include software means. Alternatively, the embodiments may be implemented on different hardware devices, for example, using a plurality of CPUs.
[068] The embodiments herein can comprise hardware and software elements. The embodiments that are implemented in software include, but are not limited to, firmware, resident software, microcode, etc. The functions performed by the various modules described herein may be implemented in other modules or combinations of other modules. For the purposes of this description, a computer-usable or computer-readable medium can be any apparatus that can comprise, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
[069] The illustrated steps are set out to explain the exemplary embodiments shown, and it should be anticipated that ongoing technological development will change the manner in which particular functions are performed. These examples are presented herein for purposes of illustration, and not limitation. Further, the boundaries of the functional building blocks have been arbitrarily defined herein for convenience of description. Alternative boundaries can be defined so long as the specified functions and relationships thereof are appropriately performed. Alternatives (including equivalents, extensions, variations, deviations, etc., of those described herein) will be apparent to persons skilled in the relevant art(s) based on the teachings contained herein. Such alternatives fall within the scope and spirit of the disclosed embodiments. In addition, the words "comprising", "having", "containing" and "including", and other similar forms, are intended to be equivalent in meaning and open-ended, in that an item or items following any one of these words is not meant to be an exhaustive listing of such item or items, or meant to be limited to only the listed item or items. It must also be noted that, as used herein and in the appended claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise.
[070] In addition, one or more computer-readable storage media may be used in implementing embodiments consistent with the present disclosure. A computer-readable storage medium refers to any type of physical memory in which information or data readable by a processor can be stored. Thus, a computer-readable storage medium can store instructions for execution by one or more processors, including instructions for causing the processor(s) to perform steps or stages consistent with the embodiments described herein. The term "computer-readable medium" should be understood to include tangible items and to exclude carrier waves and transient signals, that is, to be non-transitory. Examples include random access memory (RAM), read-only memory (ROM), volatile memory, non-volatile memory, hard drives, CD-ROMs, Blu-ray discs, DVDs, flash drives, disks, and any other known physical storage media.
[071] It is intended that the disclosure and examples be considered as exemplary only, with the true scope and spirit of the disclosed embodiments being indicated by the following claims.
CLAIMS
1. A processor-implemented method, characterized by comprising: obtaining, by a Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier, via one or more hardware processors, one or more user queries, wherein the one or more user queries comprise a sequence of words, wherein the BiLSTM Siamese network-based classifier comprises a Siamese model and a classification model, and wherein the Siamese model and the classification model comprise a common base network that includes an embedding layer, a single BiLSTM layer and a time-distributed dense (TDD) layer;
iteratively performing:
representing, in the embedding layer of the common base network, the one or more user queries as a sequence of vector representations of each word, learned using a word-to-vector model;
feeding, into the single BiLSTM layer of the common base network, the sequence of vector representations of each word to generate hidden states at each timestep 't', wherein the vector representation of each word is fed in at least one of a forward order and a reverse order;
processing, by the time-distributed dense (TDD) layer of the common base network, an output obtained from the BiLSTM layer to obtain a vector sequence;
obtaining, using a maxpool layer of the classification model, a dimension-wise maximum value of the vector sequence to form a final vector; and
determining, by a softmax layer of the classification model, at least one target class of the one or more queries based on the final vector, and issuing a response to the one or more queries based on the determined target class.
2. The processor-implemented method according to claim 1, characterized in that a square-root Kullback-Leibler divergence loss function is applied to the vector sequence to optimize the classification model.
3. The processor-implemented method according to claim 1, characterized in that the sequence of words is replaced by corresponding vectors, wherein the corresponding vectors are initialized using the word-to-vector model and are continually updated during training of the BiLSTM Siamese network-based classifier.
4. The processor-implemented method according to claim 1, characterized by comprising:
determining, during training of the BiLSTM Siamese network-based classifier, one or more errors pertaining to a set of queries, wherein the one or more errors comprise one or more target classes that are determined for the set of queries;
generating a set of misclassified query-query pairs based on the one or more errors; and
iteratively training the Siamese model using the set of misclassified query-query pairs along with one or more correct pairs to determine a target class and issue responses to one or more subsequent queries, wherein one or more weights of the common base network are shared between the Siamese model and the classification model during training of the BiLSTM Siamese network-based classifier.
5. The processor-implemented method according to claim 4, characterized by comprising:
obtaining, using the one or more shared weights, a plurality of query embeddings of the one or more queries via the Siamese model;
applying a contrastive divergence loss on the plurality of query embeddings to optimize the Siamese model; and
updating one or more parameters of the BiLSTM Siamese network-based classifier.
6. The processor-implemented method according to claim 5, characterized in that the step of applying a contrastive divergence loss comprises:
calculating a Euclidean distance between the plurality of query embeddings; and
computing the contrastive divergence loss based on the calculated Euclidean distance.
7. A Bidirectional Long Short-Term Memory (BiLSTM) Siamese network-based classifier system, characterized by comprising:
a memory storing instructions;
one or more communication interfaces; and
one or more hardware processors coupled to the memory via the one or more communication interfaces, wherein the one or more hardware processors are configured by the instructions to:
obtain, by the BiLSTM Siamese network-based classifier system, one or more user queries, wherein the one or more user queries comprise a sequence of words, wherein the BiLSTM Siamese network-based classifier system comprises a Siamese model and a classification model, and wherein the Siamese model and the classification model comprise a common base network that includes an embedding layer, a single BiLSTM layer and a time-distributed dense (TDD) layer;
iteratively perform:
represent, in the embedding layer of the common base network, the one or more user queries as a sequence of vector representations of each word, learned using a word-to-vector model;
feed, into the single BiLSTM layer of the common base network, the sequence of vector representations of each word to generate hidden states at each timestep 't', wherein the vector representation of each word is fed in at least one of a forward order and a reverse order;
process, by the time-distributed dense (TDD) layer of the common base network, an output obtained from the single BiLSTM layer to obtain a vector sequence;
obtain, using a maxpool layer of the classification model, a dimension-wise maximum value of the vector sequence to form a final vector; and
determine, using a softmax layer of the classification model, at least one target class of the one or more queries based on the final vector, and issue a response to the one or more queries based on the determined target class.
8. The BiLSTM Siamese network-based classifier system according to claim 7, characterized in that a square-root Kullback-Leibler divergence loss function is applied to the vector sequence to optimize the classification model.
9. The BiLSTM Siamese network-based classifier system according to claim 7, characterized in that the sequence of words is replaced by corresponding vectors, wherein the corresponding vectors are initialized using the word-to-vector model and are continually updated during training of the BiLSTM Siamese network-based classifier system.
10. The BiLSTM Siamese network-based classifier system according to claim 7, characterized in that the one or more hardware processors are further configured by the instructions to:
determine, during training of the BiLSTM Siamese network-based classifier system, one or more errors pertaining to a set of queries, wherein the one or more errors belong to one or more target classes that are determined for the set of queries;
generate a set of misclassified query-query pairs; and
iteratively train the Siamese model using the set of misclassified query-query pairs along with one or more correct pairs to determine a target class and issue responses to one or more subsequent queries, wherein one or more weights of the common base network are shared between the Siamese model and the classification model while training the BiLSTM Siamese network-based classifier system.
11. The BiLSTM Siamese network-based classifier system according to claim 10, characterized in that the one or more hardware processors are further configured by the instructions to:
obtain, using the one or more shared weights, a plurality of query embeddings of the one or more queries via the Siamese model;
apply a contrastive divergence loss on the plurality of query embeddings to optimize the Siamese model; and
update one or more parameters of the BiLSTM Siamese network-based classifier system.
12. The BiLSTM Siamese network-based classifier system according to claim 11, characterized in that the contrastive divergence loss is computed by:
calculating a Euclidean distance between the plurality of query embeddings; and
computing the contrastive divergence loss based on the calculated Euclidean distance.
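For illustration only and outside the scope of the claims, the following sketch shows one common way of computing a margin-based contrastive loss from the Euclidean distance between two query embeddings, in the spirit of claims 6 and 12; the margin value and the exact margin-based form are assumptions not fixed by the claims.

```python
# Hedged sketch of a margin-based contrastive loss; names and margin are assumptions.
import torch
import torch.nn.functional as F

def contrastive_divergence_loss(emb_a: torch.Tensor,
                                emb_b: torch.Tensor,
                                similar: torch.Tensor,
                                margin: float = 1.0) -> torch.Tensor:
    # emb_a, emb_b: (batch, dim) embeddings of the two queries in each pair
    # similar:      (batch,) 1.0 if the pair belongs to the same class, else 0.0
    distance = F.pairwise_distance(emb_a, emb_b)                 # Euclidean distance
    loss_similar = similar * distance.pow(2)                     # pull similar pairs together
    loss_dissimilar = (1.0 - similar) * torch.clamp(margin - distance, min=0.0).pow(2)
    return (loss_similar + loss_dissimilar).mean()
```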