DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
 Claims 1, 4, 9, 13, 16, 18 and 19 have been amended by Applicant. Claims 21-24 have been added and no claims have been cancelled. Claims 1-24 are currently pending. 

Response to Arguments
The rejection of claims 1-6, 9-11, 14-20 under 35 U.S.C. 103 has been withdrawn in view of Applicant’s amendments to claims 1, 16 and 19. However, upon further consideration a new ground of rejection has been made under 35 U.S.C. 103. See Claim Rejections under 35 U.S.C. 103 section further below.
The rejection of claims 7 and 8 under 35 U.S.C. 103 has been withdrawn in view of Applicant’s amendments to claims 1, 16 and 19. However, upon further consideration a new ground of rejection has been made under 35 U.S.C. 103. See Claim Rejections under 35 U.S.C. 103 section further below.
The rejection of claim 12 under 35 U.S.C. 103 has been withdrawn in view of Applicant’s amendments to claims 1, 16 and 19. However, upon further consideration a new ground of rejection has been made under 35 U.S.C. 103. See Claim Rejections under 35 U.S.C. 103 section further below.
The rejection of claim 13 under 35 U.S.C. 103 has been withdrawn in view of Applicant’s amendments to claims 1, 16 and 19. However, upon further consideration a new ground of rejection has been made under 35 U.S.C. 103. See Claim Rejections under 35 U.S.C. 103 section further below.
Applicant’s arguments with respect to claims 1, 16, and 19 (as amended) and the claims depending therefrom have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
 
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.


Claims 1-6, 9-11, 14-18, and 21-24 are rejected under 35 U.S.C. 103 as being unpatentable over Vinyals et al. (U.S. Publication 20160180215-A1) in view of Voss (EP 3457332 A1), in further view of Li et al. “Joint Embedding of Hierarchical Categories and Entities for Concept Categorization and Dataless Classification” (2016).

Regarding claim 1, Vinyals teaches a classification method performed by one or more computers (Vinyals, fig.1 abstract), the method comprising: 

processing the sequence of inputs with an encoder recurrent neural network (RNN) to generate a respective encoder hidden state for ones of the sequence of inputs (Vinyals 0031: A method performed by one or more computers using an encoder LSTM recurrent neural network that receives an input text segment and generates an alternative representation from the input text segment using the encoder hidden state. Each hidden layer in the network is used as an input to produce an output. Vinyals, Paragraph [0032] further teaches the encoder LSTM neural network has been configured to process each word in a given input text segment to generate the alternative representation of the input text segment. In particular, the encoder LSTM neural network is configured to receive each word in the input text segment in the input order and, for a given received input, to update the current hidden state of the encoder LSTM neural network…; Paragraph [0004] further teaches a recurrent neural network receives an input sequence and generates an output sequence from the input sequence. An example of a recurrent neural network is a Long Short-term Memory (LSTM) neural network.; Also see [claim 1]), …;
processing the respective ones of the encoder hidden states with a decoder RNN to produce a sequence of outputs, the sequence of outputs including output word embeddings corresponding to the class labels, the sequence of outputs to be a directed hierarchical sequence of outputs representing a directed classification path for the input text block in the multi-level hierarchical classification taxonomy, …(Vinyals, 0032: the encoder LSTM neural network is configured to receive each word in the input text segment in the input order and, for a given received input, to update the current hidden state of the encoder LSTM neural network by processing the received input. The system processes the generated alternative representation [sequence of output] of the input text segment using the decoder LSTM neural network. Also see Abstract and [0033]), …  

However, Vinyals fails to distinctly disclose the remaining limitations.

Nevertheless, Voss teaches generating a sequence of inputs corresponding to an input text block by replacing respective words in the input text block with an input word embedding, the replacing based on mappings stored in an input dictionary… (Voss, Paragraph [0033] teaches before using an example sentence for training the neural network it is pre-processed.; Voss, Paragraph [0035] further teaches a word embedding is used to turn the input into a list of real-valued vectors. Word embedding is a technique in which words from a vocabulary are mapped to vectors of real numbers.), …
Before the effective filing date it would’ve been obvious to a person of ordinary skill in the art to modify the recurrent neural network for generating parse trees, as taught by Vinyals, to include the word embedding, as taught by Voss, in order to provide high quality of categorization. (Voss, Abstract). 
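
For illustration only, the dictionary-based replacement of words with word embeddings discussed above can be sketched as follows. The dictionary contents, the vector values, the `<unk>` fallback entry, and the function name are assumptions made for this sketch; none of them is drawn from Voss, Vinyals, or the application.

```python
from typing import Dict, List

# Hypothetical input dictionary: word -> dense vector (values are made up).
INPUT_DICTIONARY: Dict[str, List[float]] = {
    "red": [0.9, 0.1, 0.0],
    "apple": [0.2, 0.8, 0.1],
    "<unk>": [0.0, 0.0, 0.0],  # assumed fallback for out-of-vocabulary words
}


def embed_text_block(text: str, dictionary: Dict[str, List[float]]) -> List[List[float]]:
    """Replace each whitespace-separated word with its stored embedding."""
    unknown = dictionary["<unk>"]
    return [dictionary.get(word.lower(), unknown) for word in text.split()]


# The sequence of inputs that would be passed to an encoder RNN.
sequence = embed_text_block("Red apple pie", INPUT_DICTIONARY)
```

In this sketch each word of the input text block yields exactly one vector, so the sequence of inputs preserves the word order of the text block.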

However, the combination fails to teach the remaining limitations.

Nevertheless, Li teaches:
… wherein the input word embeddings are dense vectors that project both (a) words in the input dictionary and (b) class labels in a hierarchy structure dictionary into a learned continuous vector space to associate the ones of the words and the respective ones of the class labels, the class labels associated with nodes of a multi-level hierarchical classification taxonomy (Li, Abstract, teaches framework that embeds entities and categories into a semantic space by integrating structured knowledge and taxonomy hierarchy from large knowledge bases; Li, Introduction, teaches hierarchical category embedding model which extends category embedding model by integrating categories’ hierarchical structure…The final learned entity and category vectors can capture meaningful semantic relatedness between entities and categories.);

the sequence of outputs including output word embeddings corresponding to the class labels, the sequence of outputs to be a directed hierarchical sequence of outputs representing a directed classification path for the input text block in the multi-level hierarchical classification taxonomy (Li, Introduction, teaches hierarchical category embedding model which extends the category embedding model by integrating categories’ hierarchical structure. It considers all ancestor categories of one entity. The final learned entity and category vectors can capture semantic relatedness between entities and categories.; Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG). The final version of data contains 5,373,165 entities and 793,856 categories organized as a directed acyclic graph (DAG).);

converting the output word embeddings corresponding to the sequence of outputs into an output classification by replacing the output word embeddings with the class labels based on mappings stored in the hierarchy structure dictionary (Li, Section 3, teaches in order to find representations for categories and entities that can capture their semantic relatedness, we use existing hierarchical categories and entities labeled with these categories, and explore two methods: 1) Category Embedding model (CE model): it replaces the entities in the context with their directly labeled categories to build categories’ context; 2) Hierarchical Category Embedding (HCE model): it further incorporates all ancestor categories of the context entities to utilize the hierarchical information.);

Before the effective filing date it would’ve been obvious to a person of ordinary skill in the art to modify the recurrent neural network for generating parse trees, as taught by Vinyals, as modified by the word embedding, taught by Voss, to further include the category embedding model and hierarchical embedding model, as taught by Li, in order to handle both single-word concepts and multiple-word concepts with superior performance on concept categorization and yield state of the art results on dataless hierarchical classification. (Li, Abstract). 
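
For illustration only, the conversion of output word embeddings into class labels by way of a hierarchy structure dictionary can be sketched as follows. The nearest-neighbor lookup by cosine similarity is an assumption of this sketch, as are the labels, the vector values, and the function names; the claim recites only that the replacement is based on mappings stored in the dictionary, and Li is not represented as using this exact lookup.

```python
import math
from typing import Dict, List

# Hypothetical hierarchy structure dictionary: class label -> embedding.
HIERARCHY_DICTIONARY: Dict[str, List[float]] = {
    "food": [1.0, 0.0],
    "fruit": [0.7, 0.7],
    "apple": [0.0, 1.0],
}


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def to_class_labels(outputs: List[List[float]], dictionary: Dict[str, List[float]]) -> List[str]:
    """Replace each output embedding with the label of its most similar entry."""
    return [
        max(dictionary, key=lambda label: cosine(vec, dictionary[label]))
        for vec in outputs
    ]


# Assumed decoder outputs, one embedding per level of the classification path.
labels = to_class_labels([[0.9, 0.1], [0.6, 0.8], [0.1, 0.95]], HIERARCHY_DICTIONARY)
```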



Regarding claim 2, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 1, and the combination further teaches wherein the sequence of outputs is selected, in an output order, from a predetermined vocabulary of outputs representing respective class nodes in a rooted tree representation of the multi-level hierarchical classification taxonomy. (Vinyals, 0036: The set of possible outputs selected are symbols from a pre-determined vocabulary of symbols that determine the hierarchical relationship between other symbols in a sequence which maps to a tree structure representation. Also see [0009 and 0034]; [Note: Li, Section 5, teaches Hierarchical Category Embedding model wherein the category hierarchy is preprocessed to construct a directed acyclic graph (DAG)]).

Motivation to combine same as stated above for claim 1. 



Regarding claim 3, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 2, and the combination further teaches: 
wherein respective outputs of the sequence of outputs to be predicted at respective successive positions in the output order … (Vinyals 0043: the sequence of symbols from a pre-determined vocabulary arranged according to an output order. Also see [0057]).
respective outputs of the sequence of outputs to be predicted at respective successive positions in the output order corresponds to respective successive levels in the hierarchical classification taxonomy (Note: Li, Section 5, teaches Hierarchical Category Embedding model wherein the category hierarchy is preprocessed to construct a directed acyclic graph (DAG)).

Motivation to combine same as stated above for claim 1. 



Regarding claim 4, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 23, and the combination further teaches: 
wherein the parent-child relations and the subset relationships between corresponding ones of the parent-child classes directly impose interclass relationships between class nodes in the multi-level hierarchical classification taxonomy (Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG). The final version of data contains 5,373,165 entities and 793,856 categories organized as a directed acyclic graph (DAG).; Li, Section 3, teaches the hierarchical category embedding model incorporates all ancestor categories of the context entities to utilize the hierarchical information; Li, Section 3.2 further teaches extending the category embedding model to further incorporate the ancestor categories of the target entity when predicting the context entity. If a category is near an entity, its ancestor categories would also be close to that entity. [Note: ancestor categories understood to read on parent-child relations]).
Motivation to combine same as stated above for claim 1. 
	

Regarding claim 5, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 2, and Vinyals further teaches: 
	wherein processing the respective encoder hidden states includes, for respective positions in the output order, producing a decoder hidden state for the ones of the respective positions with the decoder RNN (Vinyals, 0042: The system processes each input using the encoder LSTM neural network to generate the alternative representation [output], which is a hidden state of the encoder LSTM. The system then processes the alternative representation [output] using a decoder LSTM neural network to generate linearized representation [output], each possible linearized representation including the corresponding selected possible output at the first position in the output order. Also see [Vinyals, 0032, 0043 and 0057]).

	processing the encoder hidden states and the decoder hidden state to generate a set of output scores for the outputs in the predetermined vocabulary. (Vinyals, 0035: The decoder LSTM neural network is an LSTM neural network that includes one or more LSTM layers and that is configured to receive a current output in a linearized representation and to generate a respective output score for each of a set of possible outputs.)
	

Regarding claim 6, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 5, and Vinyals further teaches including for respective ones of the positions in the output order, selecting a respective output in the predetermined vocabulary based on the output scores. (Vinyals, 0043: The linearized representation [output] is a sequence of symbols from a pre-determined parse tree vocabulary arranged according to an output order.)


Regarding claim 9, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 5, and Vinyals further teaches:
 further including for respective ones of the positions in the output order:
processing a current output with the decoder RNN to generate an updated decoder RNN hidden state for a first position in the output order; (Vinyals, 0013: Processing the output for the input text segment using the second LSTM neural network [decoder] may comprise initializing a hidden state of the second LSTM neural network [decoder] to the output for the input text segment.);

generating a set of attention scores for the first position from the updated decoder RNN hidden state for the first position and the encoder RNN hidden states for the inputs in the sequence; (Vinyals, 0058: The system generates a respective set of output scores for each maintained possible linearized representation [output] for the current position in the output order.);

normalizing the set of attention scores for the first position to derive a respective set of normalized attention scores for the first position; (Vinyals, 0065: the system can train the networks jointly by backpropagating gradients computed for the decoder LSTM neural network back to the encoder LSTM neural network to adjust the values of the parameters of the encoder LSTM neural network during the training technique.);

selecting an output for the first position based on the normalized attention scores and the updated decoder RNN hidden state for the first position in the output order (Vinyals, 0050: The system processes the selected output using the decoder LSTM neural network to generate a set of next output scores, the system processes the selected output in accordance with the updated hidden state of the network to generate the set of next output scores and to again update the hidden state of the network.).

Motivation to combine same as stated above for claim 1. 
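
For illustration only, the attention steps recited in claim 9 (generating attention scores from the decoder hidden state and the encoder hidden states, then normalizing them), together with the weighted combination of encoder hidden states recited in claim 10, can be sketched as follows. Dot-product scoring and softmax normalization are assumptions of this sketch; neither the claims as quoted here nor the cited passages specify a particular scoring or normalization function, and all vector values are made up.

```python
import math
from typing import List


def attention_scores(decoder_state: List[float], encoder_states: List[List[float]]) -> List[float]:
    """One raw score per encoder hidden state (assumed: dot product)."""
    return [sum(d * h for d, h in zip(decoder_state, state)) for state in encoder_states]


def softmax(scores: List[float]) -> List[float]:
    """Normalize scores so that they are positive and sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine(weights: List[float], encoder_states: List[List[float]]) -> List[float]:
    """Weighted sum of encoder hidden states under the normalized scores."""
    dim = len(encoder_states[0])
    return [sum(w * state[i] for w, state in zip(weights, encoder_states)) for i in range(dim)]


encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # one per input position
decoder_state = [0.5, 2.0]                             # updated decoder hidden state

scores = attention_scores(decoder_state, encoder_states)
weights = softmax(scores)
context = combine(weights, encoder_states)
```

Under these assumptions, the third input position receives the largest normalized attention score because its encoder hidden state has the largest dot product with the decoder hidden state.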


Regarding claim 10, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 9, and Vinyals further teaches: 
including combining the encoder RNN hidden states in accordance with the normalized attention scores to obtain a combination of encoder RNN hidden states for the first position (Vinyals, 0065: the system can train the networks jointly by backpropagating gradients computed for the decoder LSTM neural network back to the encoder LSTM neural network to adjust the values of the parameters of the encoder LSTM neural network during the training technique.), and

generating a next decoder RNN hidden state for a next position in the output order by combining the combination of encoder RNN hidden states for the position with the updated decoder RNN hidden state. (Vinyals, 0057: The system initializes the initial hidden state of the decoder LSTM neural network to the alternative representation of the input text segment and generates the set of initial output scores. With each initial score, it selects possible outputs and creates a linearized representation including the corresponding selected possible output at the first position in the output order.).




Regarding claim 11, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 1, and Vinyals further teaches wherein the encoder RNN and the decoder RNN are long short-term memory (LSTM) neural networks. (Vinyals, 0029: The system includes an encoder long short-term memory (LSTM) neural network and a decoder LSTM neural network 120. Also see [0031]).

Motivation to combine same as stated above for claim 1.


Regarding claim 14, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 1, and Vinyals further teaches wherein the processing of the respective encoder hidden states terminates when the decoder RNN produces a designated end-of-sequence placeholder output. (Vinyals, 0048: The system generates a set of initial output scores using the decoder LSTM neural network in accordance with the initial hidden state, processes an initial placeholder output, and updates the hidden state of the network until the highest-scoring output is the end-of-sentence token.)
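
For illustration only, decoding that terminates when a designated end-of-sequence output is produced can be sketched as follows. Greedy selection of the highest-scoring output, the `<start>` and `<eos>` tokens, the toy score table standing in for a trained decoder, and the `max_steps` guard are all assumptions of this sketch and are not taken from Vinyals or the application.

```python
from typing import Callable, Dict, List

EOS = "<eos>"      # assumed end-of-sequence placeholder
START = "<start>"  # assumed initial placeholder output


def greedy_decode(score_fn: Callable[[str], Dict[str, float]], max_steps: int = 10) -> List[str]:
    """Select the highest-scoring output at each position until EOS appears."""
    outputs: List[str] = []
    previous = START
    for _ in range(max_steps):  # guard against a decoder that never emits EOS
        scores = score_fn(previous)
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        outputs.append(best)
        previous = best
    return outputs


# Toy score table standing in for a trained decoder RNN (values are made up).
TOY_SCORES: Dict[str, Dict[str, float]] = {
    START: {"food": 0.8, "fruit": 0.1, EOS: 0.1},
    "food": {"fruit": 0.7, "apple": 0.2, EOS: 0.1},
    "fruit": {"apple": 0.6, EOS: 0.4},
    "apple": {EOS: 0.9, "food": 0.1},
}

path = greedy_decode(lambda prev: TOY_SCORES[prev])
```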



Regarding claim 15, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 1, and the combination further teaches:
further including outputting a text-based description of the ones of the classes in the multi-level hierarchical classification taxonomy corresponding to the ones of the outputs in the produced sequence of outputs. (Vinyals, 0034: The system processes the generated alternative representation of the input text segment using the decoder LSTM neural network to generate a linearized representation of the parse tree for the input text segment.)

comprising outputting a text-based description of the ones of the classes in the multi-level hierarchical classification taxonomy corresponding to the ones of the outputs in the produced sequence of outputs (Li, Section 5, teaches Hierarchical Category Embedding model wherein the category hierarchy is preprocessed to construct a directed acyclic graph (DAG)).

Motivation to combine same as stated above for claim 1. 




Regarding claim 16, Vinyals teaches a system comprising one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, cause the one or more computers to perform operations comprising: (Vinyals, 0066: Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory program carrier for execution by, or to control the operation of, data processing apparatus [computer].);
processing the sequence of inputs with an encoder recurrent neural network (RNN) to generate a respective encoder hidden state for ones of the sequence of inputs (Vinyals 0031: A method performed by one or more computers using an encoder LSTM recurrent neural network that receives an input text segment and generates an alternative representation from the input text segment using the encoder hidden state. Each hidden layer in the network is used as an input to produce an output. Also see [claim 1] and [0032]), … ;

processing the respective ones of the encoder hidden states with a decoder RNN to produce a sequence of outputs, the sequence of outputs to be a directed hierarchical sequence of outputs representing a structured classification path for the input text block in a multi-level hierarchical classification taxonomy wherein the sequence of outputs includes output word embeddings (Vinyals, 0032: the encoder LSTM neural network is configured to receive each word in the input text segment in the input order and, for a given received input, to update the current hidden state of the encoder LSTM neural network by processing the received input. The system processes the generated alternative representation [sequence of output] of the input text segment using the decoder LSTM neural network. Also see Abstract and [0033]); 

However, Vinyals fails to distinctly disclose the remaining limitations.
Nevertheless, Voss teaches generating a sequence of inputs corresponding to an input text block by replacing respective words in the input text block with an input word embedding, the replacing based on mappings stored in an input dictionary… (Voss, Paragraph [0033] teaches before using an example sentence for training the neural network it is pre-processed.; Voss, Paragraph [0035] further teaches a word embedding is used to turn the input into a list of real-valued vectors. Word embedding is a technique in which words from a vocabulary are mapped to vectors of real numbers.), …

Before the effective filing date it would’ve been obvious to a person of ordinary skill in the art to modify the recurrent neural network for generating parse trees, as taught by Vinyals, to include the word embedding, as taught by Voss, in order to provide high quality of categorization. (Voss, Abstract). 

However, the combination fails to teach the remaining limitations.

Nevertheless, Li teaches:
… wherein the input word embeddings are dense vectors that project both (a) words in the input dictionary and (b) class labels in a hierarchy structure dictionary into a learned continuous vector space to associate the ones of the words and the respective ones of the class labels, the class labels associated with nodes of a multi-level hierarchical classification taxonomy (Li, Abstract, teaches framework that embeds entities and categories into a semantic space by integrating structured knowledge and taxonomy hierarchy from large knowledge bases; Li, Introduction, teaches hierarchical category embedding model which extends category embedding model by integrating categories’ hierarchical structure…The final learned entity and category vectors can capture meaningful semantic relatedness between entities and categories.);

… the sequence of outputs to be a directed hierarchical sequence of outputs representing a structured classification path for the input text block in a multi-level hierarchical classification taxonomy wherein the sequence of outputs includes output word embeddings (Li, Introduction, teaches hierarchical category embedding model which extends the category embedding model by integrating categories’ hierarchical structure. It considers all ancestor categories of one entity. The final learned entity and category vectors can capture semantic relatedness between entities and categories.; Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG). The final version of data contains 5,373,165 entities and 793,856 categories organized as a directed acyclic graph (DAG).);

converting the output word embeddings corresponding to the sequence of outputs into an output classification by replacing the output word embeddings in the sequence of outputs with class labels in the output classification based on mapping stored in the hierarchy structure dictionary, wherein the sequence of outputs is produced, in an output order, from a predetermined vocabulary of outputs representing respective class nodes in a directed acyclic graph representation of the multi-level hierarchical classification taxonomy (Li, Section 3, teaches in order to find representations for categories and entities that can capture their semantic relatedness, we use existing hierarchical categories and entities labeled with these categories, and explore two methods: 1) Category Embedding model (CE model): it replaces the entities in the context with their directly labeled categories to build categories’ context; 2) Hierarchical Category Embedding (HCE model): it further incorporates all ancestor categories of the context entities to utilize the hierarchical information.; Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG).; Note: Vinyals, 0036 teaches The set of possible outputs selected are symbols from a pre-determined vocabulary of symbols that determine the hierarchical relationship between other symbols in a sequence which maps to a tree structure representation. Also see [0009 and 0034]);

Before the effective filing date it would’ve been obvious to a person of ordinary skill in the art to modify the recurrent neural network for generating parse trees, as taught by Vinyals, as modified by the word embedding, taught by Voss, to further include the category embedding model and hierarchical embedding model, as taught by Li, in order to handle both single-word concepts and multiple-word concepts with superior performance on concept categorization and yield state of the art results on dataless hierarchical classification. (Li, Abstract)


Regarding claim 17, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 16, and the combination further teaches the directed acyclic graph representation of the multi-level hierarchical classification taxonomy is a rooted tree, and a current output to be predicted at a successive position in the output order corresponds to a respective successive level in the hierarchical classification taxonomy (Vinyals, 0036: The set of possible outputs selected are symbols from a pre-determined vocabulary of symbols that determine the hierarchical relationship between other symbols in a sequence which maps to a tree structure representation. Also see [Vinyals, 0009 and 0034]; [Note: Li Section 5 teaches we preprocess the category hierarchy by pruning administrative categories and deleting bottom-up edges to construct a DAG. The final version of data contains 5,373,165 entities and 793,856 categories organized as a DAG with a maximum depth of 18. The root category is “main topic classifications”.]). 
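
For illustration only, the correspondence between successive positions in the output order and successive levels of a rooted-tree taxonomy can be sketched as a check that each output is a child of the output before it. The toy taxonomy, its labels, and the function name are assumptions of this sketch and do not reproduce the category hierarchy described in Li.

```python
from typing import Dict, List, Optional

# Hypothetical rooted tree: class label -> parent label (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "food": None,
    "fruit": "food",
    "vegetable": "food",
    "apple": "fruit",
}


def is_directed_path(path: List[str], parent: Dict[str, Optional[str]]) -> bool:
    """True if path starts at the root and descends exactly one level per position."""
    # The first output must be a known label whose parent is None (the root).
    if not path or parent.get(path[0], "missing") is not None:
        return False
    # Every later output must be a direct child of the output before it.
    return all(parent.get(child) == par for par, child in zip(path, path[1:]))


valid = is_directed_path(["food", "fruit", "apple"], PARENT)
skips_level = is_directed_path(["food", "apple"], PARENT)
```

Here `valid` holds because each label is the child of its predecessor, while `skips_level` does not because "apple" is not a direct child of "food" in the assumed tree.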


Regarding claim 18, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 16, and the combination further teaches:
wherein the one or more storage devices store classification data including  a trained neural network classification model that includes a neural network trained to map the input text block to an output classification corresponding to the sequence of outputs according to the multi-level hierarchical classification taxonomy (Vinyals, 0070: a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data and a recurrent neural network is a neural network that receives an input sequence and generates an output sequence from the input sequence. Also see [Vinyals, 0031]);

wherein the one or more storage devices store classification data comprising a trained neural network classification model that includes a neural network trained to map the input text block to an output classification corresponding to the sequence of outputs according to the multi-level hierarchical classification taxonomy (Li, Introduction, teaches hierarchical category embedding model which extends the category embedding model by integrating categories’ hierarchical structure. It considers all ancestor categories of one entity. The final learned entity and category vectors can capture semantic relatedness between entities and categories.; Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG). The final version of data contains 5,373,165 entities and 793,856 categories organized as a directed acyclic graph (DAG).);

processing the sequence of inputs includes using the trained neural network classification model to generate the respective encoder hidden state for respective ones of the inputs (Vinyals, 0032: the encoder LSTM neural network is configured to receive each word in the input text segment in the input order and, for a given received input, to update the current hidden state of the encoder LSTM neural network by processing the received input. Also see [Vinyals, Abstract]); and   

processing the sequence of outputs includes using the trained neural network …(Vinyals, 0005: system is implemented as computer programs on one or more computers and can generate an output for an input text segment using long short-term memory (LSTM) neural networks that employ one or more layers of nonlinear units to predict an output for a received input). 

processing the sequence of outputs includes using the trained neural network classification model to produce the sequence of outputs representing a classification path in the multi-level hierarchical classification taxonomy for the input text block (Li, Introduction, teaches hierarchical category embedding model which extends the category embedding model by integrating categories’ hierarchical structure. It considers all ancestor categories of one entity. The final learned entity and category vectors can capture semantic relatedness between entities and categories.; Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG). The final version of data contains 5,373,165 entities and 793,856 categories organized as a directed acyclic graph (DAG).)

Regarding claim 21, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 1, and the combination further teaches further including applying, during training of the encoder RNN and decoder RNN ( Vinyals 0065 teaches in order to configure the encoder LSTM neural network and the decoder LSTM neural network, the system can train the networks using conventional machine learning training techniques, e.g., using Stochastic Gradient Descent with backpropagation through time. In particular, the system can train the networks jointly by backpropagating gradients computed for the decoder LSTM neural network back to the encoder LSTM neural network to adjust the values of the parameters of the encoder LSTM neural network during the training technique.; Vinyals 0032 teaches the encoder LSTM neural network 110 has been configured, e.g., through training, to process each word in a given input text segment to generate the alternative representation of the input text segment in accordance with a set of parameters.), an embedded layer to learn word embeddings for respective ones of words in the input dictionary and ones of the class labels in the hierarchy structure dictionary (Li, Introduction, teaches training the category and entity vectors on Wikipedia, and then evaluating the methods from two applications concept categorization and dataless hierarchical classification; Li Introduction further teaches in this paper we propose two models to simultaneously learn entity and category representation from large-scale knowledge bases. The category embedding model extends the entity embedding method of (Hu et al. 2015) by using category information with entities to learn entity and category embeddings. The hierarchical category embedding model extends the category embedding model by integrating categories’ hierarchical structure…The final learned entity and category vectors can capture meaningful semantic relatedness between entities and categories.).

Motivation to combine same as stated above for claim 1. 

Regarding claim 22, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 1, and Li further teaches wherein the multi-level hierarchical classification taxonomy is a directed acyclic graph (Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG)).

	Motivation to combine same as stated above for claim 1. 

	Regarding claim 23, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 2, and the combination further teaches wherein the directed hierarchical sequence of outputs is structured by parent-child relations between respective class nodes, the parent-child relations to induce subset relationships between corresponding parent-child classes, wherein a classification region of a child class in the sequence of outputs is a subset of a classification region of a respective parent class (Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG). The final version of data contains 5,373,165 entities and 793,856 categories organized as a directed acyclic graph (DAG).; Li, Section 3, teaches the hierarchical category embedding model incorporates all ancestor categories of the context entities to utilize the hierarchical information; Li, Section 3.2 further teaches extending the category embedding model to further incorporate the ancestor categories of the target entity when predicting the context entity. If a category is near an entity, its ancestor categories would also be close to that entity. [Note: ancestor categories understood to read on parent-child relations]).  


	Regarding claim 24, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 1, and the combination further teaches wherein the sequence of outputs is a first sequence of outputs and the directed classification path is a first directed classification path, further including processing respective ones of the encoder hidden states with the decoder RNN to produce a second sequence of outputs that is different than the first sequence of outputs (Vinyals, Abstract, teaches obtaining an input text segment, processing the input text segment using a first long short term memory (LSTM) neural network to convert the input text segment into an alternative representation for the input text segment, and processing the alternative representation for the input text segment using a second LSTM neural network to generate a linearized representation of a parse tree for the input text segment), the second sequence of outputs representing a second directed classification path for the input text block in the multi-level hierarchical classification taxonomy that is different from the first directed classification path (Li, Section 3.1, teaches that in knowledge bases such as Wikipedia, category hierarchies are usually given as DAG or tree structures, and entities are categorized into one or more categories as leaves.).

13. 	Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Vinyals in view of Li. 

Regarding claim 19, Vinyals teaches one or more non-transitory computer storage media encoded with a computer program product comprising instructions that, when executed by one or more computers, cause the one or more computers to perform operations including at least: (Vinyals, 0066: Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non transitory program carrier for execution by, or to control the operation of, data processing apparatus [computer]. Also see [claim 20]).

processing a source sequence of inputs corresponding to an input text block with an encoder recurrent neural network (RNN) to generate a respective encoder hidden state for the source sequence of inputs (Vinyals 0031: A method performed by one or more computers using an encoder LSTM recurrent neural network that receives an input text segment and generates an alternative representation from the input text segment using the encoder hidden state. Each hidden layer in the network is used as an input to produce an output. Also see [claim 1] and [0032]),…;

processing the respective encoder hidden states with a decoder RNN to produce a sequence of outputs, the sequence of outputs to be a directed hierarchical sequence of outputs representing a directed classification path for the input text block in a multi-level hierarchical classification taxonomy, the sequence of outputs is to include output word embeddings corresponding to the class labels; (Vinyals, 0032: the encoder LSTM neural network is configured to receive each word in the input text segment in the input order and, for a given received input, to update the current hidden state of the encoder LSTM neural network by processing the received input. The system processes the generated alternative representation [sequence of outputs] of the input text segment using the decoder LSTM neural network. Also see Abstract and [0033]), … ; 

However, Vinyals fails to distinctly disclose the remaining limitations. 

Nevertheless, Li teaches …the sequence of outputs to be a directed hierarchical sequence of outputs representing a directed classification path for the input text block in a multi-level hierarchical classification taxonomy, the sequence of outputs is to include output word embeddings corresponding to the class labels (Li, Abstract, teaches a framework that embeds entities and categories into a semantic space by integrating structured knowledge and taxonomy hierarchy from large knowledge bases; Li, Introduction, teaches a hierarchical category embedding model which extends the category embedding model by integrating categories’ hierarchical structure…The final learned entity and category vectors can capture meaningful semantic relatedness between entities and categories; Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG).);

… wherein the source sequence of inputs includes input word embeddings converted from input text block based on mappings stored in an input dictionary, wherein word embeddings are dense vectors that project (a) words in the input dictionary and (b) class labels in a hierarchy structure dictionary into a learned continuous vector space (Li, Abstract, teaches framework that embeds entities and categories into a semantic space by integrating structured knowledge and taxonomy hierarchy from large knowledge bases; Li, Introduction, teaches hierarchical category embedding model which extends category embedding model by integrating categories’ hierarchical structure…The final learned entity and category vectors can capture meaningful semantic relatedness between entities and categories.);

converting the output word embeddings into an output classification by replacing the  output word embeddings in the sequence of outputs with class labels in the output classification based on mappings stored in the hierarchy structure dictionary wherein the sequence of outputs is produced, in an output order, from a predetermined vocabulary of outputs representing respective class nodes in a directed acyclic graph representation of the multi-level hierarchical classification taxonomy (Li, Section 3, teaches in order to find representations for categories and entities that can capture their semantic relatedness, we use existing hierarchical categories and entities labeled with these categories, and explore two methods: 1) Category Embedding model (CE model): it replaces the entities in the context with their directly labeled categories to build categories’ context; 2) Hierarchical Category Embedding (HCE model): it further incorporates all ancestor categories of the context entities to utilize the hierarchical information.; Li, Section 5, Experiments, teaches preprocessing the category hierarchy to construct a directed acyclic graph (DAG).; Note: Vinyals, 0036 teaches The set of possible outputs selected are symbols from a pre-determined vocabulary of symbols that determine the hierarchical relationship between other symbols in a sequence which maps to a tree structure representation. Also see [0009 and 0034]);

Before the effective filing date it would have been obvious to a person of ordinary skill in the art to modify the recurrent neural network for generating parse trees, as taught by Vinyals, to further include the category embedding model and hierarchical embedding model, as taught by Li, in order to handle both single-word concepts and multiple-word concepts with superior performance on concept categorization and yield state-of-the-art results on dataless hierarchical classification (Li, Abstract).



	Regarding claim 20, the combination of Vinyals in view of Li teaches all of the limitations of claim 19, and the combination further teaches wherein the directed acyclic graph representation of the multi-level hierarchical classification taxonomy is a rooted tree, and a current output to be predicted at a successive position in the output order corresponds to a respective successive level in the hierarchical classification taxonomy. (Vinyals, 0036: The set of possible outputs selected are symbols from a pre-determined vocabulary of symbols that determine the hierarchical relationship between other symbols in a sequence which maps to a tree structure representation. Also see [Vinyals, 0009 and 0034]; [Note: Li Section 5 teaches we preprocess the category hierarchy by pruning administrative categories and deleting bottom-up edges to construct a DAG. The final version of data contains 5,373,165 entities and 793,856 categories organized as a DAG with a maximum depth of 18. The root category is “main topic classifications”.]). 

Motivation to combine same as stated above for claim 19.
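
[Note: For illustration only, the claim 20 limitation that the output at each successive position in the output order corresponds to a successive level of a rooted tree can be sketched as a greedy descent from the root. This is a minimal sketch under stated assumptions and is not code from Vinyals, Li, or the application: the toy taxonomy, the names `TAXONOMY`, `decode_path`, and `toy_score`, and the fixed scores standing in for a decoder RNN are all hypothetical.]

```python
from typing import Callable, Dict, List

# Hypothetical rooted tree: each class node maps to its child class nodes.
TAXONOMY: Dict[str, List[str]] = {
    "<root>": ["science", "sports"],
    "science": ["physics", "biology"],
    "sports": ["soccer", "tennis"],
}

def decode_path(
    tree: Dict[str, List[str]],
    score_fn: Callable[[List[str], str], float],
    root: str = "<root>",
) -> List[str]:
    """Greedy descent: the output at position k is a class node at tree level k."""
    path: List[str] = []
    node = root
    # Stop when the current node has no children (a leaf class).
    while tree.get(node):
        # Only children of the preceding output are candidates at this position.
        node = max(tree[node], key=lambda child: score_fn(path, child))
        path.append(node)
    return path

# Fixed toy scores standing in for decoder outputs (assumption, not a trained model).
TOY_SCORES = {
    "science": 0.7, "sports": 0.3,
    "physics": 0.4, "biology": 0.6,
    "soccer": 0.9, "tennis": 0.1,
}

def toy_score(path: List[str], candidate: str) -> float:
    return TOY_SCORES[candidate]

print(decode_path(TAXONOMY, toy_score))  # ['science', 'biology']
```

In this sketch the length of the returned path equals the depth of the leaf reached, so each position in the output order lines up with exactly one level of the taxonomy.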



14.	Claims 7 and 8 are rejected under 35 U.S.C. 103 as being unpatentable over Vinyals in view of Voss and Li, and further in view of Redlich (U.S. Patent 9734169B2).

Regarding claim 7, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 6, however, the combination does not distinctly disclose wherein for respective ones of the positions in the output order, the selecting includes restricting the selection of the respective output to a respective subset of available class nodes in the rooted tree identified in an allow list of allowable class nodes associated with the preceding output.

Nevertheless, Redlich teaches wherein for respective ones of the positions in the output order, the selecting comprises restricting the selection of the respective output to a respective subset of available class nodes in the rooted tree identified in an allow list of allowable class nodes associated with the preceding output. (Redlich, Col. 61, lines 14-67 teaches the inputs may be processed through one or more simple filters extracting white list terms (inclusive/allow lists) or black list terms (exclusive/block lists) or terms not found in dictionaries).

Before the effective filing date it would have been obvious to a person of ordinary skill in the art to modify the recurrent neural network for generating parse trees, as taught by Vinyals, as modified by the word embedding, taught by Voss, as modified by the category embedding model and hierarchical embedding model, as taught by Li, to further include the content filter, as taught by Redlich, in order to allow for filtering of uncommon words, terms or data elements not found in a dictionary, thus enhancing accuracy in text classification and text relevance problems, including search.



Regarding claim 8, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 6, however, the combination does not distinctly disclose wherein for respective ones of the positions in the output order, the selecting includes refraining from selecting the respective output from a respective subset of available class nodes in the rooted tree identified in a block list of disallowed class nodes associated with the preceding output.

Nevertheless, Redlich teaches wherein for respective ones of the positions in the output order, the selecting comprises refraining from selecting the respective output from a respective subset of available class nodes in the rooted tree identified in a block list of disallowed class nodes associated with the preceding output. (Redlich, Col. 61, lines 14-67 teaches the inputs may be processed through one or more simple filters extracting white list terms (inclusive/allow lists) or black list terms (exclusive/block lists) or terms not found in dictionaries).

Motivation to combine same as stated above for claim 7. 
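
[Note: For illustration only, the allow-list and block-list limitations recited in claims 7 and 8 can be sketched as a filter applied to the candidate class nodes before the highest-scoring one is selected, with both lists keyed by the preceding output. This is a minimal sketch under stated assumptions and is not code from Redlich, Vinyals, Li, or the application: the function `select_output`, the toy lists, and the toy scores are hypothetical.]

```python
from typing import Dict, Optional, Set

def select_output(
    scores: Dict[str, float],
    preceding: str,
    allow: Optional[Dict[str, Set[str]]] = None,
    block: Optional[Dict[str, Set[str]]] = None,
) -> Optional[str]:
    """Pick the best-scoring class node permitted after `preceding`."""
    candidates = set(scores)
    if allow is not None:
        # Allow list: keep only nodes listed as allowable after the preceding output.
        candidates &= allow.get(preceding, set())
    if block is not None:
        # Block list: drop nodes listed as disallowed after the preceding output.
        candidates -= block.get(preceding, set())
    if not candidates:
        return None
    # Sort first so the result is deterministic if scores tie.
    return max(sorted(candidates), key=lambda node: scores[node])

# Hypothetical decoder scores for the current output position.
scores = {"physics": 0.2, "biology": 0.1, "soccer": 0.6, "tennis": 0.1}
# Hypothetical lists keyed by the preceding output "science".
allow_list = {"science": {"physics", "biology"}}
block_list = {"science": {"soccer", "tennis"}}

print(select_output(scores, "science"))                    # soccer (unrestricted)
print(select_output(scores, "science", allow=allow_list))  # physics
print(select_output(scores, "science", block=block_list))  # physics
```

Without either list the highest raw score wins even when it is not a child of the preceding output; with either list the selection stays within the children of that output.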



15.	Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Vinyals in view of Voss and Li, and further in view of Cho et al., “Empirical evaluation of gated recurrent neural networks on sequence modeling”.

Regarding claim 12, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 1, however, the combination does not distinctly disclose wherein the encoder RNN and the decoder RNN are gated recurrent unit (GRU) neural networks.

Nevertheless, Cho teaches wherein the encoder RNN and the decoder RNN are gated recurrent unit (GRU) neural networks. (Cho, Discussion § 2: both LSTM unit and Gated Recurrent Unit are similar in a way, both can keep the existing content and add the new content on top of it. It is easy for each unit to remember the existence of a specific feature in the input stream for a long series of steps.)

Before the effective filing date it would have been obvious to a person of ordinary skill in the art to modify the recurrent neural network for generating parse trees, as taught by Vinyals, as modified by the word embedding, taught by Voss, as modified by the category embedding model and hierarchical embedding model, as taught by Li, to further include the Gated Recurrent Unit (GRU), as taught by Cho. The motivation would be that GRU recurrent neural networks are indeed better than more traditional recurrent units (e.g., LSTM), as convergence in CPU time may be reached faster and the final solutions tend to be better (Cho, Abstract and Section 5). 


16.	Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Vinyals in view of Voss and Li, and further in view of Chan (U.S. Patent 9799327B1).

Regarding claim 13, the combination of Vinyals in view of Voss and Li teaches all of the limitations of claim 1, however, the combination does not distinctly disclose wherein a first input in the sequence is a designated start-of-sequence placeholder input. 

Nevertheless, Chan teaches wherein a first input in the source sequence is a designated start-of-sequence placeholder input. (Chan, 0008: In some implementations the generated sequence of substrings begins with a start of sequence token <sos> and ends with an end of sequence token <eos>. Also see [0064]). 

Before the effective filing date it would have been obvious to a person of ordinary skill in the art to modify the recurrent neural network for generating parse trees, as taught by Vinyals, as modified by the word embedding, taught by Voss, as modified by the category embedding model and hierarchical embedding model, as taught by Li, to further include Chan’s neural network model to process an input segment from start to end. The motivation would be that before the system starts to emit its inputs it needs a token of some kind to start with. Furthermore, knowing whether the current position in the sequence is the initial position allows the system to know when to update the initial hidden state of the attention-based RNN. (Chan, Col. 9, lines 30-34 and 45-55).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BEATRIZ RAMIREZ BRAVO whose telephone number is 571-272-2156. The examiner can normally be reached Mon. - Fri. 7:30 a.m.-5:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/B.R.B./Examiner, Art Unit 2123
/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123