# Smart Reply: Automated Response Suggestion for Email

## Introduction

* Proposes a novel, end-to-end architecture for generating short email responses.
* The most important measure of its success is that it is deployed in Inbox by Gmail and assists with around 10% of all mobile responses.
* [Link to the paper.](https://arxiv.org/abs/1606.04870)

## Challenges in deploying Smart Reply in a user-facing product

* Responses must always be of high quality. This is ensured by constructing a target response set from which responses are selected.
* The likelihood that the user picks one of the suggested responses must be maximised. This is ensured by normalising the responses and enforcing diversity.
* The system should not add latency to email delivery. A triggering model decides whether an email is suitable for the response generation pipeline. Computation time is reduced further by searching for an approximate best response instead of the exact best one.
* Privacy is ensured by encrypting all the data, which makes it harder to verify the model's quality and to debug the system.

## Architecture

## Preprocess Email

* Perform steps such as language detection, tokenization and sentence segmentation on the input email.

## Triggering Model

* A feed-forward neural network (with an embedding layer and 3 fully connected hidden layers) that decides whether the input email is suitable for suggesting responses.

#### Data

* Training set of pairs *(o, y)* where *o* is the incoming message and *y* is a boolean variable indicating whether the message had a response.

#### Features

* Unigrams and bigrams from the messages.
* Signals such as whether the recipient is in the contact list of the sender.

## Response Selection

* An LSTM network predicts the approximate best response for an incoming message *o*.

#### Network

* Sequence to Sequence Learning.
* Read the input message (token by token) and encode it into a vector representation.
* Compute a softmax to get the probability of the first output token given the input token sequence.
* Keep feeding in the previous response tokens and the input token sequence to compute the probability of the next output token.
* During inference, approximate the most likely response either greedily, by taking the most likely token at each time step and feeding it back, or by using beam search.

## Response Set Generation

* Generate a set of high-quality responses that also captures the variability in the intent of the response.
* Canonicalize each email response by extracting its semantic structure using a dependency parser.
* Partition all response messages into "semantic" clusters.
* These semantic clusters define the response space for scoring and selecting possible responses and for promoting diversity among the responses.

## Semantic Intent Clustering

* Since a large, labelled dataset is not available, a graph-based, semi-supervised approach is used.

#### Graph Construction

* Manually define a few clusters with a small number of example responses for each cluster.
* Construct a graph with frequent response messages (including the labelled nodes) as response nodes (V<sub>R</sub>).
* For each response node, extract a set of feature nodes (V<sub>F</sub>) corresponding to features like skip-grams and n-grams, and add an edge between the response node and each of its feature nodes.
* Learn a semantic labelling for all response nodes by propagating semantic intent information (available because of the labelled nodes) throughout the graph.
* After some iterations, sample some of the unlabelled nodes from the graph, manually label these sampled nodes and repeat the algorithm until convergence.
* For validation, extract the top k members of each cluster and validate their quality with the help of human evaluators.

## Suggestion Diversity

* Provide users with a varied set of responses by omitting redundant responses (by not selecting more than one response from any semantic cluster) and by enforcing negative (or positive) responses.
* If the top two responses contain at least one positive (negative) response and none of the top three responses is negative (positive), the third response is replaced with a negative (positive) one.
* This is done by performing a second LSTM pass where the search is restricted to only positive (or negative) responses in the target set.

## Strengths

* The system is already in production and assists with around 10% of all mobile responses.
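
The inference step described under Response Selection (greedy decoding versus beam search) can be illustrated with a small, generic sketch. It is not the paper's implementation: the `next_probs` callback stands in for the LSTM's softmax output, and `EOS`, `TOY_MODEL` and `toy_next_probs` are hypothetical names invented for this example. Setting `beam_width=1` reduces the search to greedy decoding.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "<eos>"  # hypothetical end-of-response marker
Hypothesis = Tuple[Tuple[str, ...], float]  # (tokens so far, log-probability)


def beam_search(
    next_probs: Callable[[Tuple[str, ...]], Dict[str, float]],
    beam_width: int = 2,
    max_len: int = 5,
) -> Hypothesis:
    """Return the best hypothesis found; beam_width=1 is greedy decoding."""
    beams: List[Hypothesis] = [((), 0.0)]
    finished: List[Hypothesis] = []
    for _ in range(max_len):
        expanded: List[Hypothesis] = []
        for tokens, logp in beams:
            for tok, p in next_probs(tokens).items():
                if p <= 0.0:
                    continue
                hyp = (tokens + (tok,), logp + math.log(p))
                # Completed responses leave the beam; the rest keep growing.
                (finished if tok == EOS else expanded).append(hyp)
        if not expanded:
            break
        # Keep only the beam_width most probable partial responses.
        beams = sorted(expanded, key=lambda h: h[1], reverse=True)[:beam_width]
    return max(finished or beams, key=lambda h: h[1])


# Toy next-token distribution (an assumption standing in for the LSTM softmax).
TOY_MODEL = {
    (): {"yes": 0.6, "no": 0.4},
    ("yes",): {"please": 0.5, "thanks": 0.5},
    ("no",): {"thanks": 1.0},
}


def toy_next_probs(prefix: Tuple[str, ...]) -> Dict[str, float]:
    # Any prefix not listed above simply ends the response.
    return TOY_MODEL.get(prefix, {EOS: 1.0})


greedy = beam_search(toy_next_probs, beam_width=1)
beam = beam_search(toy_next_probs, beam_width=2)
```

In this toy distribution the greedy search commits to "yes" (probability 0.6) and ends with a response of total probability 0.3, while a beam of width 2 keeps "no" alive and finds "no thanks" with probability 0.4. This is the trade-off between the two inference strategies mentioned in the notes.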
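
The graph-based labelling described under Semantic Intent Clustering can be sketched as a simple label propagation over a bipartite response/feature graph. This is a simplified stand-in, not the algorithm used in the paper: the averaging update, the function name `propagate_labels`, and the toy responses and cluster names are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Optional, Set


def propagate_labels(
    response_features: Dict[str, Set[str]],
    seeds: Dict[str, str],
    iterations: int = 5,
) -> Dict[str, Optional[str]]:
    """Spread seed cluster labels from response nodes through feature nodes."""
    # Each response node holds a distribution over cluster labels.
    labels: Dict[str, Dict[str, float]] = {
        r: ({seeds[r]: 1.0} if r in seeds else {}) for r in response_features
    }
    for _ in range(iterations):
        # Feature nodes aggregate the label mass of the responses they touch.
        feature_labels: Dict[str, Dict[str, float]] = defaultdict(
            lambda: defaultdict(float)
        )
        for resp, feats in response_features.items():
            for feat in feats:
                for lab, weight in labels[resp].items():
                    feature_labels[feat][lab] += weight
        # Unlabelled responses re-estimate their labels; seeds stay clamped.
        for resp, feats in response_features.items():
            if resp in seeds:
                continue
            agg: Dict[str, float] = defaultdict(float)
            for feat in feats:
                for lab, weight in feature_labels[feat].items():
                    agg[lab] += weight
            total = sum(agg.values())
            labels[resp] = (
                {lab: w / total for lab, w in agg.items()} if total else {}
            )
    # Report the most probable cluster per response (None if nothing reached it).
    return {r: (max(d, key=d.get) if d else None) for r, d in labels.items()}


# Toy graph: unigram features only; two manually labelled seed responses.
toy_graph = {
    "thanks a lot": {"thanks", "a", "lot"},
    "thanks so much": {"thanks", "so", "much"},
    "see you then": {"see", "you", "then"},
    "see you soon": {"see", "you", "soon"},
}
toy_seeds = {"thanks a lot": "gratitude", "see you then": "farewell"}
clusters = propagate_labels(toy_graph, toy_seeds)
```

Here the unlabelled responses inherit a cluster through the features they share with a seed ("thanks", or "see" and "you"). In the workflow summarised in the notes, the remaining unlabelled nodes would then be sampled, labelled by hand and fed back as new seeds.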
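
The two rules described under Suggestion Diversity (one response per semantic cluster, then polarity enforcement) can be sketched as below. The candidate tuple layout, the sentiment tags and the function name `pick_diverse` are hypothetical, and picking the best-scoring opposite-polarity candidate is only a stand-in for the second, restricted LSTM pass.

```python
from typing import List, Tuple

# Hypothetical candidate record: (text, semantic cluster, sentiment, score),
# where sentiment is "positive", "negative" or "neutral".
Candidate = Tuple[str, str, str, float]


def pick_diverse(candidates: List[Candidate], k: int = 3) -> List[Candidate]:
    """Pick up to k suggestions: one per cluster, then enforce polarity."""
    ranked = sorted(candidates, key=lambda c: c[3], reverse=True)

    # Rule 1: keep at most one response from any semantic cluster.
    picked: List[Candidate] = []
    seen_clusters = set()
    for cand in ranked:
        if cand[1] not in seen_clusters:
            picked.append(cand)
            seen_clusters.add(cand[1])
        if len(picked) == k:
            break
    if len(picked) < k:
        return picked

    # Rule 2: if the top two contain one polarity and no pick has the
    # opposite polarity, replace the last pick with an opposite one.
    for have, want in (("positive", "negative"), ("negative", "positive")):
        top_two_has = any(c[2] == have for c in picked[:2])
        none_opposite = all(c[2] != want for c in picked)
        if top_two_has and none_opposite:
            kept_clusters = {c[1] for c in picked[:-1]}
            # Stand-in for the second pass restricted to the wanted polarity.
            pool = [
                c for c in ranked
                if c[2] == want and c[1] not in kept_clusters
            ]
            if pool:
                picked[-1] = pool[0]
            break
    return picked


toy_candidates = [
    ("Yes, I can make it.", "accept", "positive", 0.9),
    ("Sure, I'll be there.", "accept", "positive", 0.8),
    ("Sounds good!", "ack", "positive", 0.7),
    ("What time?", "ask_time", "neutral", 0.6),
    ("Sorry, I can't.", "decline", "negative", 0.2),
]
suggestions = [text for text, _, _, _ in pick_diverse(toy_candidates)]
```

With these toy scores, the second "accept" response is dropped as redundant, and the third slot, which would otherwise go to "What time?", is given to the negative response because the top two picks are positive and none of the three is negative.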