DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	This non-final action is responsive to the application filed on 5/22/20.
	Claims 1-20 are pending. 

Allowable Subject Matter
Claims 5 and 15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-4, 11-14, and 20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal et al. (US 20200311473, herein “Agarwal”) in view of Dumchev et al. (US 20160117954, herein “Dumchev”) in view of Levit et al. (US 20150325235, herein “Levit”).
Regarding claim 1, Agarwal teaches A system comprising:
a set of processing units; and
a non-transitory machine-readable medium storing instructions that when executed by at least one processing unit in the set of processing units cause the at least one processing unit (computer system ([0017] and [0018]; [0071])) to:
receive a set of data for training a transformer model (training using tokenization [0085]) for training of sentence data ([0017] to [0028] and [0082]), the set of data comprising a sequence of tokens and a set of position values (coordinates [0085]).

However, Agarwal fails to specifically teach wherein each position value in the set of position values represents a position of a token in the sequence of tokens relative to other tokens in the sequence of tokens.
Yet, in a related art, Dumchev discloses metadata including the position of each token in each sentence ([0007] and [0039]).
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the token positioning of Dumchev with the token analysis of Agarwal such that each position value in the set of position values represents a position of a token in the sequence of tokens relative to other tokens in the sequence of tokens. The combination would allow for, according to the motivation of Dumchev, using token position values [0040] to better determine the position of individual tokens within sentences, as when displaying tokens of sentences to users, and further to identify specific tokens within respective sentences for processing, as by way of start and end offsets of tokens within the respective sentences [0039].

However, Agarwal in view of Dumchev fails to specifically teach replace each token in the subset of the sequence of tokens with a first defined value to form a first set of defined values.
Yet, in a related art, Levit discloses token/phrase subsets of a corpus such as “I’d+like+to” and “amc sixteen,” each determined to be tokens of the corpus [0019], wherein a subset including “up in the air” and “amc sixteen” is determined to represent replaceable data and is replaced by “<movie>” and “<theater>,” respectively [0019].
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the token replacement of Levit with the token sequence processing of Agarwal in view of Dumchev to have replace each token in the subset of the sequence of tokens with a first defined value to form a first set of defined values. The combination would allow for, according to the motivation of Levit, better processing of text sequences based on token modeling for the purpose of having control over the text sequences, such as by rendering alternative representations of the parsed, tokenized text, using position data to replace certain content with annotated or additional content (i.e., tokens) ([0001] to [0004]).
Furthermore, Levit teaches or makes abundantly clear:
receive a set of data for training a transformer model (training corpus (fig. 1); [0054]), the set of data comprising a sequence of tokens and a set of position values (determined tokens such as “I’d+like+to” and “amc+theater” each representing specific positions within the corpus (e.g., respective sentence) [0019]), wherein each position value in the set of position values represents a position of a token in the sequence of tokens relative to other tokens in the sequence of tokens (using the replacement entity at the same position as the original, training data of the corpus, such as replacing “<theater>” for “amc theater” at the corresponding position);
select a subset of the sequence of tokens in the set of data (tokens for replacement, such as “up in the air” and “amc sixteen” [0019]);
select a subset of the set of position values in the set of data (position value associated with the respective token, such as the position value associated with “up in the air” in a given sentence [0019]);
replace each position value in the subset of the set of position values with a second defined value to form a second set of defined values (“<movie>,” for instance, is substituted for the position information associated with “up in the air,” and, as such, the word positions of the four words of “up in the air” are replaced by the single entity <movie> representing one position count [0019]); and
train the transformer model using the set of data (iterative training [0017]).
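
For illustration only, the two replacement steps recited in claim 1 can be sketched as follows. This is a minimal sketch and is not drawn from Agarwal, Dumchev, Levit, or the applicant's specification; the function name and the sentinel values `MASK_TOKEN` and `MASK_POSITION` are hypothetical stand-ins for the claimed "first defined value" and "second defined value."

```python
# Hypothetical sentinel values; the claim does not specify what they are.
MASK_TOKEN = "[MASK]"   # stands in for the "first defined value"
MASK_POSITION = -1      # stands in for the "second defined value"


def mask_tokens_and_positions(tokens, positions, token_idx, position_idx):
    """Return copies of the inputs with the selected entries replaced."""
    masked_tokens = list(tokens)
    masked_positions = list(positions)
    for i in token_idx:
        masked_tokens[i] = MASK_TOKEN
    for i in position_idx:
        masked_positions[i] = MASK_POSITION
    return masked_tokens, masked_positions


tokens = ["i'd", "like", "to", "see", "up", "in", "the", "air"]
positions = list(range(len(tokens)))  # position of each token in the sequence
masked_tokens, masked_positions = mask_tokens_and_positions(
    tokens, positions, token_idx=[3], position_idx=[1, 6]
)
```

In this sketch the token subset and the position-value subset are selected independently, which is one possible reading of the two separate "select" limitations.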

Regarding claim 2, Agarwal in view of Dumchev in view of Levit teaches the limitations of claim 1, as above.
Furthermore, Levit teaches The system of claim 1, wherein training the transformer model comprises:
determining a token embedding for each token in the sequence of tokens (token sequences, each token including a specific word subset, such as “Angelina Jolie” or “Angelina” and “Jolie” ([0016] and [0017]); further, determined phrase, word, and entity tokens [0019]); and
determining a position embedding for each position value in the set of position values (positions of each token with respect to other tokens [0019] such as token proximity or distance ([0020] and [0039]); even further, positions within specific sequences of tokens ([0016] to [0019], [0028], and [0036] to [0053])).

Regarding claim 3, Agarwal in view of Dumchev in view of Levit teaches the limitations of claims 1 and 2, as above.
Furthermore, Levit teaches The system of claim 2, wherein the set of data further comprises a set of sentence values, where each sentence value in the set of sentence values represents a sentence to which a token in the sequence of tokens belongs (sentence identification for identifying respective parsed tokens, such as may be used in identifying phrases/tokens with respective sentences, such as with respect to corresponding probabilities ([0051] and [0052])), wherein training the transformer model comprises:
determining a sentence embedding for each sentence value in the set of sentence values (a probability associated with a respective token embedded within a given sentence ([0050] to [0052])); and
for each token in the sequence of tokens, adding together the token embedding associated with the token, the position embedding associated with the token, and the sentence embedding associated with the token to form an aggregate embedding for the token (aggregated tokens forming a sentence within a respective sentence of the various sentence variations for the respective sentence [0049], the aggregation including the various statistics and values, such as with respect to the aggregated tokens of “I’d+like+to follow <actor>” [0053]).
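
For illustration only, the element-wise addition recited in claim 3 can be sketched as follows. This is a minimal sketch with made-up two-dimensional vectors; the function name and the values are hypothetical and are not taken from any cited reference.

```python
def aggregate_embeddings(token_emb, position_emb, sentence_emb):
    """Add the three embeddings of each token element by element."""
    return [
        [t + p + s for t, p, s in zip(tok_vec, pos_vec, sen_vec)]
        for tok_vec, pos_vec, sen_vec in zip(token_emb, position_emb, sentence_emb)
    ]


# One row per token; all three inputs share the same shape.
token_emb = [[1.0, 2.0], [3.0, 4.0]]
position_emb = [[0.5, 0.5], [1.0, 1.0]]
sentence_emb = [[0.0, 1.0], [0.0, 1.0]]

aggregate = aggregate_embeddings(token_emb, position_emb, sentence_emb)
```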

Regarding claim 4, Agarwal in view of Dumchev in view of Levit teaches the limitations of claims 1-3, as above.
Furthermore, Dumchev teaches The system of claim 3, wherein the instructions further cause the at least one processing unit to add a set of labels to the set of data, wherein each label in the set of labels comprises a position value in the subset of position values replaced by the second defined value (metadata for token position value ([0007] and [0039])).

Regarding claim 11, the claim recites similar limitations as claim 1 – see above. 

Regarding claim 12, the claim recites similar limitations as claim 2 – see above.

Regarding claim 13, the claim recites similar limitations as claim 3 – see above.

Regarding claim 14, the claim recites similar limitations as claim 4 – see above.

Regarding claim 20, Agarwal teaches A non-transitory machine-readable medium storing a program executable by at least one processing unit of a computer system, the program comprising sets of instructions (computer ([0004] and [0071])) for:
The claim recites similar limitations as claim 1 – see above.


Claim(s) 6, 7, 16, and 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal in view of Dumchev in view of Levit and further in view of Pringle et al. (US 6,470,306, Herein “Pringle”).
Regarding claim 6, Agarwal in view of Dumchev in view of Levit teaches the limitations of claim 1, as above.
Furthermore, Levit teaches The system of claim 1, wherein the instructions further cause the at least one processing unit to: group the subset of the sequence of tokens and the position values associated with the subsequence of tokens together; and rearrange the grouped subset of the sequence of tokens and the position values associated with the subsequence of tokens within the set of data (rearranging tokens such as by replacing “amc sixteen” with the “<theater>” and aligning the adjacent words “in” and “and” with the rearranged “<theater>” [0019]).
	
	However, Agarwal in view of Dumchev in view of Levit fails to specifically teach rearrange the grouped subset of the sequence of tokens and the position values associated with the subsequence of tokens within the set of data.
	Yet, in a related art, Pringle discloses rearranging tokens based on token positions, such as with respect to a plurality of sentences (col. 4, lines 53-67; col. 5).
	It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the rearranging of grouped subsets and position values of Pringle with the token processing based on positioning of Agarwal in view of Dumchev in view of Levit to have rearrange the grouped subset of the sequence of tokens and the position values associated with the subsequence of tokens within the set of data. The combination would allow for, according to the motivation of Pringle, based on trained data such as a document in a first language, creating a second configuration of tokens by rearranging the tokens so that, for instance, additional data may be inserted with respect to the determined positions of the various tokens, and a more meaningful (e.g., translated and annotated) text can be created for the user based on known positions of the respective tokens (cols. 4 and 5); further, tokenization and positioning provide enhanced control over formatting of tokens (col. 1, lines 22-67; col. 2, lines 1-34).

Regarding claim 7, Agarwal in view of Dumchev in view of Levit in view of Pringle teaches the limitations of claims 1 and 6, as above.
Furthermore, Levit teaches The system of claim 6, wherein the instructions further cause the at least one processing unit to:
group the subset of position values and the tokens associated with the subset of position values together (positions of tokens determined within respective sentence [0019]); and
rearrange the grouped subset of position values and the tokens associated with the subset of position values within the set of data (rearrange such as by inserting new, generic tokens such as <movie> [0019]).

	Furthermore, Pringle makes abundantly clear rearranging the tokens such as by inserting certain content at determined positions within the grouped subset of tokens and corresponding positions associated with the adjusted tokens and corresponding positions (col. 3, lines 15-67).
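
For illustration only, grouping tokens with their position values and rearranging the group within the set of data, as recited in claims 6 and 7, can be sketched as follows. This is a minimal sketch; moving the group to the end of the sequence is an assumed rearrangement chosen for simplicity, and the function name is hypothetical.

```python
def move_group_to_end(tokens, positions, group_idx):
    """Pair each token with its position value, then move the chosen pairs
    together to the end of the data so tokens and positions stay aligned."""
    pairs = list(zip(tokens, positions))
    chosen = set(group_idx)
    group = [pairs[i] for i in sorted(chosen)]
    rest = [pair for i, pair in enumerate(pairs) if i not in chosen]
    reordered = rest + group
    new_tokens = [tok for tok, _ in reordered]
    new_positions = [pos for _, pos in reordered]
    return new_tokens, new_positions


new_tokens, new_positions = move_group_to_end(
    ["a", "b", "c", "d"], [0, 1, 2, 3], group_idx=[1, 2]
)
```

Because each token travels with its original position value, the original order remains recoverable from the position values after the rearrangement.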

Regarding claim 16, the claim recites similar limitations as claim 6 – see above.

Regarding claim 17, the claim recites similar limitations as claim 7 – see above.


Claim(s) 8, 9, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal in view of Dumchev in view of Levit and in view of O’Kane et al. (US 20150019662, Herein “O’Kane”). 
Regarding claim 8, Agarwal in view of Dumchev in view of Levit teaches the limitations of claim 1, as above.
However, Agarwal in view of Dumchev in view of Levit fails to specifically teach The system of claim 1, wherein a number of tokens in the subsequence of tokens is a first defined percentage of a number of tokens in the sequence of tokens, wherein a number of position values in the subset of position values is a second defined percentage of a number of position values in the set of position values.
Yet, in a related art, O’Kane discloses a percentage of tokens for replacement [0127] and further a position value controlling position information, such as for transforming tokens into shorter tokens or for transforming only specific positional token data ([0128] to [0131]).
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the percentage processing of tokens of O’Kane with the token processing of subsets of Agarwal in view of Dumchev in view of Levit to have wherein a number of tokens in the subsequence of tokens is a first defined percentage of a number of tokens in the sequence of tokens, wherein a number of position values in the subset of position values is a second defined percentage of a number of position values in the set of position values. The combination would allow for, according to the motivation of O’Kane, controlling the extent to which tokens are used within the analysis such as with respect to a controlled minimum and maximum amount for the percentage of tokens, and further for controlling the token processing with respect to positions of tokens, such as for controlling transformation of certain tokens of various positions (e.g., all, first, variable positions, etc.) ([0119] to [0131]), thus allowing for finer control over token processing, such as for personalizing a token message ([0006] and [0007]).

Regarding claim 9, Agarwal in view of Dumchev in view of Levit in view of O’Kane teaches the limitations of claims 1 and 8, as above.
Furthermore, O’Kane teaches The system of claim 8, wherein the first defined percentage is different than the second defined percentage (percentage token versus positional data ([0127] to [0131])).
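
For illustration only, selecting the two subsets at different defined percentages, as recited in claims 8 and 9, can be sketched as follows. This is a minimal sketch; the 15% and 10% figures, the fixed seed, and the function name are assumptions made for the example and do not come from O’Kane or the application.

```python
import random


def select_indices(length, fraction, rng):
    """Pick round(length * fraction) distinct indices at random."""
    count = round(length * fraction)
    return sorted(rng.sample(range(length), count))


rng = random.Random(0)      # fixed seed so the sketch is repeatable
sequence_length = 20
token_idx = select_indices(sequence_length, 0.15, rng)     # first percentage
position_idx = select_indices(sequence_length, 0.10, rng)  # second percentage
```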

Regarding claim 18, the claim recites similar limitations as claims 8 and 9 – see above.


Claim(s) 10 and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Agarwal in view of Dumchev in view of Levit and in view of Kumar (US 8,892,422).
Regarding claim 10, Agarwal in view of Dumchev in view of Levit teaches the limitations of claim 1, as above.
However, Agarwal in view of Dumchev in view of Levit fails to specifically teach The system of claim 1, wherein training the transformer model further comprises generating a P x H matrix of probabilities, wherein P is a total number of masked positions, H is a total number of tokens in the sequence of tokens, and a (P_i, H_j) element in the P x H matrix stores a probability of masked position i being in position j in the sequence.
Yet, in a related art, Kumar discloses a matrix of probabilities for each position (col. 9, lines 53-62).
It would have been obvious to one of ordinary skill in the art at the time of the invention’s effective filing date to combine the matrix of Kumar with the token sequence analysis based on training of Agarwal in view of Dumchev in view of Levit to have wherein training the transformer model further comprises generating a P x H matrix of probabilities, wherein P is a total number of masked positions, H is a total number of tokens in the sequence of tokens, and a (P_i, H_j) element in the P x H matrix stores a probability of masked position i being in position j in the sequence. The combination would allow for, according to the motivation of Kumar, using a training set to determine a word or phrase likelihood based on knowledge of positions of the respective words or phrases within the dataset, thus providing for a better ability to identify phrases in a document based on the positional matrix (col. 1, lines 6-67; col. 2, lines 1-67; col. 3, lines 1-67; col. 4, lines 1-44).
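
For illustration only, a P x H matrix of probabilities of the kind recited in claim 10 can be sketched as follows, with one row per masked position and one column per position in the sequence. This is a minimal sketch; normalizing each row with a softmax is an assumption made for the example and is not stated in the claim or in Kumar, and the scores are made up.

```python
import math


def position_probabilities(scores):
    """Turn a P x H grid of raw scores into row-wise probabilities (softmax)."""
    matrix = []
    for row in scores:
        peak = max(row)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - peak) for s in row]
        total = sum(exps)
        matrix.append([e / total for e in exps])
    return matrix


# P = 2 masked positions, H = 3 tokens in the sequence.
scores = [[2.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
probs = position_probabilities(scores)
```

Each row sums to one, so element (i, j) can be read as the probability that masked position i belongs at position j of the sequence.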

Regarding claim 19, the claim recites similar limitations as claim 10 – see above.

Conclusion
Other art made of record: Svyatkovskiy et al. (US 20210034335, herein “Svyatkovskiy”).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JASON EDWARDS whose telephone number is (571) 272-5334. The examiner can normally be reached on Mon-Fri; 8am-5pm EST.
	If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Scott Baderman can be reached on 571-272-3644. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
	Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA or CANADA) or 571-272-1000.

	/JASON T EDWARDS/              Examiner, Art Unit 2144