Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Rationale for not rejecting claims 15-19 under 35 USC 101
Claims 15-19 were not rejected under 35 USC 101 as being directed to a computer readable medium covering signals per se because section 0024 of applicant’s published specification discloses that “a computer readable storage medium as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.”

Compact Prosecution
Examiner would like to suggest amending independent claims 1, 15 and 20 to include the limitations “an annotation effort that is focused on a particular area of interest and text samples that are likely to improve a text item recognition system, wherein the annotation includes a manual annotation of instances in a set and a rule based text item recognition system.”
These amendments will overcome the current rejection.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Rangarajan Sridhar et al. (US 2022/0093088) in view of Bui (US 2021/0326371). The combination of references will be referred to as Sridhar in view of Bui in this correspondence.
Claim 1, Sridhar discloses a computer-implemented method for managing a text-item recognition system, (Sridhar: Section 0245, lines 5-6: encoding the received sentence reads on text item recognition) the method comprising:
applying said system to a text corpus containing instances of text items to be recognized by the system; (Sridhar teaches in Section 0273, lines 9-11, a corpus that includes pairs of sentences acquired for training a machine learning model)
(Bui, the secondary reference, also teaches semantic text searching; determining associations between semantic meanings of words reads on a text item recognition system)
a set of instances of text items which the system recognized; (Section 0273, lines 8-9: “in the semantic tasks separate training corpus may be employed”; this means the pairs of sentences from the corpus are acquired for training the model; see Section 0273, lines 38-40)
tokenizing the text corpus such that each instance in said set is encoded as a single token; (Section 0244, lines 5-7 “number of tokens”; lines 15-21 show how the sentence breaks down into single tokens, “number of N”)
processing the tokenized text via a word embedding scheme to generate a word embedding matrix (Section 0246, lines 18-20: metrics such as cosine similarity, cosine distance, and Euclidean distance define the vector space) comprising vectors which indicate locations of respective tokens in a word embedding space; (Section 0245, lines 10-13: “token vector is embedded in a vector space”)
in response to selection of a seed token corresponding to an instance in said set, (Section 0260, lines 9-10: link or transition between a pair of sentences; one sentence is the token and the other is the set sentence)
performing a nearest-neighbor search of an embedding space to identify a set of neighboring tokens for the seed token and for at least a subset of the neighboring tokens, (Section 0259, lines 21-24: an approximate boundary (within the vector space) is determined; thus determining the approximate boundary reads on “neighboring tokens”)
identifying the text corresponding to each neighboring token as a potential instance of a text item to be annotated. (Section 0277, lines 19-25: the indicated pair of sentences reads on the neighboring tokens, such as “A man and woman are driving down the street in a jeep” and “A man and a woman are driving down the road in an open air vehicle”; see Section 0309, lines 1-2)
(Section 0028 of the secondary reference (Bui et al., US20210326371) also discloses a semantic relationship between two or more words, which reads on identifying a set of neighboring tokens: for example, for “hill,” neighboring tokens such as “mountain” and “valley” are identified)
Sridhar does not disclose selecting from the text corpus a text item.
Bui discloses a semantic text searching solution that uses a machine learning system. (Section 0025, lines 2-8: “return (Select) portions of a document (text item) that directly match the string entered by a user,” or returning each instance of “turn”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sridhar to include Bui’s teaching of selecting a portion of the collection of text. The motivation is that the system would not rely on made-up data or other unreliable text, but would instead use tested text data.
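For illustration only, the tokenizing and nearest-neighbor steps recited in claim 1 can be sketched as below. This is a minimal sketch under stated assumptions, not the applicant’s implementation and not code from Sridhar or Bui: the function names, the underscore-joined token convention, the toy three-dimensional vectors, and the use of cosine similarity are all hypothetical choices made for the example.

```python
import math

def merge_instances(words, instances):
    """Tokenize so that each recognized multi-word instance becomes a single token."""
    tokens, i = [], 0
    # Try longer instances first so the longest match wins.
    ordered = sorted((inst.split() for inst in instances), key=len, reverse=True)
    while i < len(words):
        for parts in ordered:
            if words[i:i + len(parts)] == parts:
                tokens.append("_".join(parts))  # hypothetical single-token encoding
                i += len(parts)
                break
        else:
            tokens.append(words[i])
            i += 1
    return tokens

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_neighbors(seed, embeddings, k):
    """Return the k tokens whose vectors are most similar to the seed token's vector."""
    scored = [(cosine(embeddings[seed], vec), tok)
              for tok, vec in embeddings.items() if tok != seed]
    scored.sort(reverse=True)
    return [tok for _, tok in scored[:k]]

tokens = merge_instances("she moved to new york last year".split(), ["new york"])

# Toy embedding matrix (assumed values); a real system would learn these vectors
# from the tokenized corpus with a word embedding scheme.
toy_embeddings = {
    "new_york": [0.9, 0.1, 0.0],
    "boston": [0.8, 0.2, 0.1],
    "chicago": [0.85, 0.05, 0.2],
    "moved": [0.0, 0.9, 0.3],
    "year": [0.1, 0.2, 0.9],
}
neighbors = nearest_neighbors("new_york", toy_embeddings, k=2)
print(tokens)
print(neighbors)
```

In this sketch the text behind each returned neighbor would be the candidate offered for annotation.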
Claim 2, Sridhar in view of Bui discloses wherein said system provides a confidence value for each instance of a text item recognized by the system, the method including selecting instances of text items having confidence values above a threshold for inclusion in said set of instances. (Sridhar: Section 0309, lines 1-5: the selection of a sentence as the second sentence of a pair is based on one or more distance criteria, e.g., one or more distance thresholds).
Claim 3, Sridhar in view of Bui discloses wherein said system comprises a named-entity recognition system and said instances of text items comprise instances of named entities. (Sridhar: Section 0252, lines 3-16: based on the input sentence “You’re beautiful,” a poem by William Shakespeare is retrieved from the database)
Claim 4, Sridhar in view of Bui discloses wherein said system comprises a machine learning model, (Sridhar: Section 0273, lines 17-23 “Supervised Machine learning”) the method including, for text identified as a said potential instance of a text item, storing at least one fragment of the text corpus which includes that text as a text sample to be annotated. (Sridhar: Section 0274, lines 18-20: the training sentences, which predict the semantic relationship)
Claim 5, Sridhar in view of Bui discloses including, in response to annotation of a set of said text samples, training the model on the set of annotated text samples. (Sridhar: Section 0275, lines 8-10: the training corpus includes a plurality of ordered pairs of sentences)
Claim 6, Sridhar in view of Bui discloses including extracting from the text corpus a plurality of seed text fragments each comprising a fragment of the text corpus which includes an instance corresponding to said seed token; (Sridhar: Section 0245, lines 2-4: within a vector space of a language model to generate a vector for the received sentence)
generating a vector representation of each seed text fragment based on a word-embedding of the fragment in an embedding space; (Sridhar: Section 0245, lines 5-6: a sentence vector is generated; Section 0026, lines 19-21: embedding model)
extracting from the text corpus a plurality of candidate text fragments each comprising a fragment of the text corpus which includes text corresponding to a neighboring token; (Sridhar: Section 0259, lines 18-30: semantic relationship between the vector embedding of the first sentence and the sentence)
generating a said vector representation of each candidate text fragment; computing, for each candidate text fragment, (Sridhar: Section 0268, lines 1-3: “word embedding model” reads on the vector representation)
a distance value indicative of distance between the vector representation of that candidate text fragment and the vector representations of the seed text fragments; (Sridhar: Section 0262, lines 10-12: “Euclidean distance or cosine distance” reads on the distance value)
and identifying the text corresponding to the neighboring token in each candidate text fragment with less than a threshold distance value as a said potential instance of a text item to be annotated. (Sridhar: Section 0309, lines 1-5: the selection of a sentence as the second sentence of a pair is based on one or more distance criteria, e.g., one or more distance thresholds).
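For illustration only, the fragment-distance filtering recited in claim 6 can be sketched as below. This is a minimal sketch under stated assumptions, not the applicant’s implementation and not code from either reference: the function names, the averaging of word vectors to represent a fragment, the use of the mean Euclidean distance to the seed fragments as the distance value, and the toy two-dimensional vectors are all hypothetical.

```python
import math

def fragment_vector(fragment, word_vectors):
    """Represent a fragment as the average of its known word vectors (assumed pooling)."""
    vecs = [word_vectors[w] for w in fragment.split() if w in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def select_candidates(seed_fragments, candidate_fragments, word_vectors, threshold):
    """Keep candidate fragments whose distance value is below the threshold."""
    seed_vecs = [fragment_vector(f, word_vectors) for f in seed_fragments]
    selected = []
    for frag in candidate_fragments:
        vec = fragment_vector(frag, word_vectors)
        # Distance value: mean distance to the seed fragment vectors (an assumption).
        dist = sum(euclidean(vec, s) for s in seed_vecs) / len(seed_vecs)
        if dist < threshold:
            selected.append(frag)
    return selected

# Toy context-word vectors (assumed values). The entity tokens themselves are
# absent from this table, so only the surrounding context is pooled.
word_vectors = {
    "flew": [1.0, 0.0],
    "to": [0.8, 0.2],
    "visited": [0.9, 0.1],
    "ate": [0.0, 1.0],
    "a": [0.2, 0.8],
}
seed_fragments = ["flew to new_york", "visited new_york"]
candidate_fragments = ["flew to boston", "ate a boston_cream_pie"]
selected = select_candidates(seed_fragments, candidate_fragments, word_vectors, threshold=0.5)
print(selected)
```

Here the candidate whose context resembles the seed contexts is kept, while the candidate used in a different context is dropped.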

Claim 7, Sridhar in view of Bui (Section 0061, lines 10-30: threshold similarity) discloses wherein said system provides a confidence value for each instance of a text item recognized by the system, (Sridhar: Section 0309, lines 4-6: “distance criteria (e.g., one or more distance threshold)”)
the method including selecting instances of text items having a confidence value above a first threshold for inclusion in said set of instances; (Sridhar: Section 0309, lines 2-3: “the second sentence vector may be selected from the plurality of other sentence vectors,” thus from the corpus of vectors)
and selecting said candidate text fragments from text fragments which include text, corresponding to a neighboring token, (Bui: Section 0025, lines 2-8) that the system recognized as an instance of a text item with a confidence value between the first threshold and a second lower threshold. (Sridhar: Section 0309, lines 2-8: the second sentence vector is selected from the plurality of other sentence vectors based on one or more distance criteria (e.g., one or more distance thresholds) and a distance between the first and second sentence vectors)
Claim 8, Sridhar in view of Bui discloses wherein said system comprises a machine learning model, the method including storing each candidate text fragment with less than a threshold distance value as a text sample to be annotated. (Sridhar: Section 0308: the corpus of sentences means that the text of the sentences is stored in a database so that a sentence can be selected based on one or more distances between the first sentence vector and the second sentence)
Claim 9, Sridhar in view of Bui discloses including, in response to annotation of a set of said text samples, training the model on the set of annotated text samples. (Sridhar: Section 0255, lines 1-3: “pre-trained word or token embedding model,” which is the sentence vector, also referred to as the language model)
Claim 10, Sridhar in view of Bui (Section 0032, lines 1-3: “remove duplicate tokens”) discloses including generating said vector representation of each seed text fragment after removing the instance corresponding to the seed token in that fragment; (Sridhar: Section 0281: the loss function reads on the text that is considered to be below the threshold, and therefore sentence embeddings are generated in Section 0281, lines 1-3)
and generating said vector representation of each candidate text fragment after removing the text corresponding to the neighboring token in that fragment. (Sridhar: Section 0282, lines 1-3: “generating sentence embedding and employing those sentence”)
Claim 11, Sridhar in view of Bui discloses including generating said vector representation of a fragment by supplying that fragment to a pretrained word-embedding model. (Sridhar: Section 0255, lines 1-3: “pre-trained word or token embedding model,” which is the sentence vector, also referred to as the language model)
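For illustration only, the two-threshold selection recited in claim 7 can be sketched as below. This is a minimal sketch under stated assumptions: the function name, the example confidence values, and the treatment of the band boundaries as inclusive are hypothetical and are not taken from the application or the cited references.

```python
def split_by_confidence(recognized, first_threshold, second_threshold):
    """Split recognized instances into a confident set and a borderline band.

    recognized: list of (text, confidence) pairs produced by the recognition system.
    Instances above first_threshold go to the confident set; instances between
    second_threshold and first_threshold (inclusive, by assumption) are borderline
    and would supply the candidate text fragments.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second threshold must be lower than the first threshold")
    confident = [text for text, conf in recognized if conf > first_threshold]
    borderline = [text for text, conf in recognized
                  if second_threshold <= conf <= first_threshold]
    return confident, borderline

# Hypothetical recognizer output.
recognized = [("New York", 0.97), ("Boston", 0.62), ("Apple", 0.55), ("the", 0.10)]
confident, borderline = split_by_confidence(recognized, first_threshold=0.9, second_threshold=0.5)
print(confident)
print(borderline)
```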

Claim 12, Sridhar in view of Bui discloses including providing a user interface for user-selection of said seed token via selection of an instance from said set of instances of text items which the system recognized. (Sridhar: Section 0233, lines 1-3: a second sentence from the corpus of sentences may be selected for recognition)
Claim 13, Sridhar in view of Bui discloses including, in response to user-selection of a plurality of seed tokens via the user interface, (Bui: Section 0063, lines 1-4: user interface shown in Figs. 6A and 6B)
performing said nearest-neighbor search, and identifying text corresponding to at least said subset of the neighboring tokens identified by that search, for each of those seed tokens. (Sridhar: Section 0309, lines 6-7: “nearest neighbors” in the vector space when considering all the sentence vectors in the plurality of other sentence vectors)
Claim 14, Sridhar in view of Bui discloses including, after performing the nearest-neighbor search for the seed token to identify said set of neighboring tokens, (Sridhar: Section 0309, lines 6-7: “nearest neighbors” in the vector space when considering all the sentence vectors in the plurality of other sentence vectors)
displaying in the interface the text corresponding to each neighboring token for user-selection of a restricted set of neighboring tokens. (Bui: Section 0063, lines 1-4: user interface shown in Figs. 6A and 6B)
Claim 15, Sridhar discloses a computer program product for managing a text-item recognition system, (Sridhar: Section 0245, lines 5-6: encoding the received sentence reads on text item recognition)
the computer program product comprising a computer readable storage medium having program instructions embodied therein, (CPU(s) 410 shown in Fig. 4 include program instructions)
the program instructions being executable by a computing apparatus to cause the computing apparatus to:
apply said system to a text corpus containing instances of text items to be recognized by the system; (Sridhar teaches in Section 0273, lines 9-11, a corpus that includes pairs of sentences acquired for training a machine learning model)
(Bui, the secondary reference, also teaches semantic text searching; determining associations between semantic meanings of words reads on a text item recognition system)
a set of instances of text items which the system recognized; (Section 0273, lines 8-9: “in the semantic tasks separate training corpus may be employed”; this means the pairs of sentences from the corpus are acquired for training the model; see Section 0273, lines 38-40)
tokenize the text corpus such that each instance in said set is encoded as a single token; (Section 0244, lines 5-7 “number of tokens”; lines 15-21 show how the sentence breaks down into single tokens, “number of N”)
process the tokenized text via a word embedding scheme to generate a word embedding matrix (Section 0246, lines 18-20: metrics such as cosine similarity, cosine distance, and Euclidean distance define the vector space) comprising vectors which indicate locations of respective tokens in a word embedding space; (Section 0245, lines 10-13: “token vector is embedded in a vector space”)
in response to selection of a seed token corresponding to an instance in said set, (Section 0260, lines 9-10: link or transition between a pair of sentences; one sentence is the token and the other is the set sentence)
perform a nearest-neighbor search of the embedding space to identify a set of neighboring tokens for the seed token; (Section 0259, lines 21-24: an approximate boundary (within the vector space) is determined; thus determining the approximate boundary reads on “neighboring tokens”)
and for at least a subset of the neighboring tokens, identify the text corresponding to each neighboring token as a potential instance of a text item to be annotated for improving operation of the system. (Section 0277, lines 19-25: the indicated pair of sentences reads on the neighboring tokens, such as “A man and woman are driving down the street in a jeep” and “A man and a woman are driving down the road in an open air vehicle”; see Section 0309, lines 1-2)
(Section 0028 of the secondary reference (Bui et al., US20210326371) also discloses a semantic relationship between two or more words, which reads on identifying a set of neighboring tokens: for example, for “hill,” neighboring tokens such as “mountain” and “valley” are identified)
Sridhar does not disclose selecting from the text corpus a text item.
Bui discloses a semantic text searching solution that uses a machine learning system. (Section 0025, lines 2-8: “return (Select) portions of a document (text item) that directly match the string entered by a user,” or returning each instance of “turn”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sridhar to include Bui’s teaching of selecting a portion of the collection of text. The motivation is that the system would not rely on made-up data or other unreliable text, but would instead use tested text data.
Claim 16, Sridhar in view of Bui discloses wherein said system provides a confidence value for each instance of a text item recognized by the system, said program instructions being adapted to cause the computing apparatus to select instances of text items having confidence values above a threshold for inclusion in said set of instances. (Sridhar: Section 0309, lines 1-5: the selection of a sentence as the second sentence of a pair is based on one or more distance criteria, e.g., one or more distance thresholds).

Claim 17, Sridhar in view of Bui discloses wherein said program instructions are adapted to cause the computing apparatus to:
extract from the text corpus a plurality of seed text fragments each comprising a fragment of the text corpus which includes an instance corresponding to said seed token; (Sridhar: Section 0245, lines 2-4: within a vector space of a language model to generate a vector for the received sentence)
(Bui, the secondary reference, teaches in Section 0025, lines 2-8, “return (Select) portions of a document (text item),” i.e., that the entered text string is selected from portions of an electronic document)
generate a vector representation of each seed text fragment based on a word-embedding of the fragment in an embedding space; (Sridhar: Section 0245, lines 5-6: a sentence vector is generated; Section 0026, lines 19-21: embedding model)
extract from the text corpus a plurality of candidate text fragments each comprising a fragment of the text corpus which includes text corresponding to a neighboring token; (Sridhar: Section 0259, lines 18-30: semantic relationship between the vector embedding of the first sentence and the sentence)
generate a said vector representation of each candidate text fragment; compute, for each candidate text fragment, (Sridhar: Section 0268, lines 1-3: “word embedding model” reads on the vector representation)
a distance value indicative of distance between the vector representation of that candidate text fragment and the vector representations of the seed text fragments; (Sridhar: Section 0262, lines 10-12: “Euclidean distance or cosine distance” reads on the distance value)
and identify the text corresponding to the neighboring token in each candidate text fragment with less than a threshold distance value as a said potential instance of a text item to be annotated. (Sridhar: Section 0309, lines 1-5: the selection of a sentence as the second sentence of a pair is based on one or more distance criteria, e.g., one or more distance thresholds).

Claim 18, Sridhar in view of Bui (Section 0061, lines 10-30: threshold similarity) discloses wherein said system provides a confidence value for each instance of a text item recognized by the system, (Sridhar: Section 0309, lines 4-6: “distance criteria (e.g., one or more distance threshold)”)
said program instructions being further adapted to cause the computing apparatus to select instances of text items having a confidence value above a first threshold for inclusion in said set of instances; (Sridhar: Section 0309, lines 2-3: “the second sentence vector may be selected from the plurality of other sentence vectors,” thus from the corpus of vectors)
and select said candidate text fragments from text fragments which include text, corresponding to a neighboring token, (Bui: Section 0025, lines 2-8) that the system recognized as an instance of a text item with a confidence value between the first threshold and a second, lower threshold. (Sridhar: Section 0309, lines 2-8: the second sentence vector is selected from the plurality of other sentence vectors based on one or more distance criteria (e.g., one or more distance thresholds) and a distance between the first and second sentence vectors)
Claim 19, Sridhar in view of Bui discloses wherein said system comprises a machine learning model, said program instructions being further adapted to cause the computing apparatus to store each candidate text fragment with less than a threshold distance value as a text sample to be annotated; (Sridhar: Section 0308: the corpus of sentences means that the text of the sentences is stored in a database so that a sentence can be selected based on one or more distances between the first sentence vector and the second sentence)
and in response to annotation of a set of said text samples, train the model on the set of annotated text samples. (Sridhar: Section 0255, lines 1-3: “pre-trained word or token embedding model,” which is the sentence vector, also referred to as the language model)

Claim 20, Sridhar discloses a computing apparatus for managing a text-item recognition system, (Sridhar: Section 0245, lines 5-6: encoding the received sentence reads on text item recognition) the apparatus comprising memory for storing a text corpus containing instances of text items to be recognized by the system, (Sridhar teaches in Section 0273, lines 9-11, a corpus that includes pairs of sentences acquired for training a machine learning model)
(Bui, the secondary reference, also teaches semantic text searching; determining associations between semantic meanings of words reads on a text item recognition system)
and control logic adapted to apply said system to the text corpus; (CPU(s) 410 shown in Fig. 4 include program instructions)
a set of instances of text items which the system recognized; (Section 0273, lines 8-9: “in the semantic tasks separate training corpus may be employed”; this means the pairs of sentences from the corpus are acquired for training the model; see Section 0273, lines 38-40)
tokenize the text corpus such that each instance in said set is encoded as a single token; (Section 0244, lines 5-7 “number of tokens”; lines 15-21 show how the sentence breaks down into single tokens, “number of N”)
process the tokenized text via a word embedding scheme to generate a word embedding matrix (Section 0246, lines 18-20: metrics such as cosine similarity, cosine distance, and Euclidean distance define the vector space) comprising vectors which indicate locations of respective tokens in a word embedding space; (Section 0245, lines 10-13: “token vector is embedded in a vector space”)
in response to selection of a seed token corresponding to an instance in said set, (Section 0260, lines 9-10: link or transition between a pair of sentences; one sentence is the token and the other is the set sentence)
perform a nearest-neighbor search of the embedding space to identify a set of neighboring tokens for the seed token and for at least a subset of the neighboring tokens, (Section 0259, lines 21-24: an approximate boundary (within the vector space) is determined; thus determining the approximate boundary reads on “neighboring tokens”)
identify the text corresponding to each neighboring token as a potential instance of a text item to be annotated for improving operation of the system. (Section 0277, lines 19-25: the indicated pair of sentences reads on the neighboring tokens, such as “A man and woman are driving down the street in a jeep” and “A man and a woman are driving down the road in an open air vehicle”; see Section 0309, lines 1-2)
(Section 0028 of the secondary reference (Bui et al., US20210326371) also discloses a semantic relationship between two or more words, which reads on identifying a set of neighboring tokens: for example, for “hill,” neighboring tokens such as “mountain” and “valley” are identified)
Sridhar does not disclose selecting from the text corpus a text item.
Bui discloses a semantic text searching solution that uses a machine learning system. (Section 0025, lines 2-8: “return (Select) portions of a document (text item) that directly match the string entered by a user,” or returning each instance of “turn”).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Sridhar to include Bui’s teaching of selecting a portion of the collection of text. The motivation is that the system would not rely on made-up data or other unreliable text, but would instead use tested text data.

Cited Art
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Gandhi (US20190325029) discloses a computer-implemented method for processing a document, where the method includes accessing a digital representation of the document; converting the digital representation into a document vector; applying a transformation matrix to the document vector to produce a transformed document vector, wherein the transformation matrix is determined by comparing a set of document vectors generated from a corpus of documents to a corresponding set of transformed document vectors, and further, wherein for each document vector in the set of document vectors.
Danielyan (US20180267958) discloses an example method for information extraction from logical document parts using ontology-based micro-models may comprise: identifying, in a natural language text, a logical part associated with a pre-defined category; performing a lexical analysis of a plurality of words comprised by the logical part of the natural language text to produce a plurality of lexical structures representing the logical part of the natural language text, wherein each lexical structure identifies a lexical meaning and a semantic class associated with a referenced word of the plurality of words; identifying an information extraction micro-model associated with the pre-defined category, the information extraction micro-model comprising a set of production rules associated with an ontology;

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Akwasi M Sarpong whose telephone number is (571)270-3438. The examiner can normally be reached Mon-Fri. 8:00am-4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, KING D POON can be reached on 571-272-7440. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





	
/AKWASI M SARPONG/
Primary Examiner, Art Unit 2675
12/14/2022