DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
	Claims 1-20 are presented for examination.

Continued Examination under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on June 30, 2022 has been entered.
 
Response to Amendment
	Applicant’s amendment has obviated the remaining rejections under 35 USC § 101.  Therefore, those rejections are withdrawn.
	Applicant’s amendment has further obviated most, but not all, of the objections to the specification, drawings, and claims previously made.  To the extent that an objection or rejection appears in the previous Office Action(s) but not this Office Action, that objection or rejection is withdrawn.  To the extent that it appears both in the previous Office Action(s) and this Office Action, the objection or rejection is maintained.

Specification
The disclosure is objected to because of the following informalities:
In paragraph 3, “the software library” should be “the TENSORFLOW software library”.
In paragraph 34, “how similar or dissimilar are trainable models 141-142 are” should be “how similar or dissimilar trainable models 141-142 are”.
In paragraph 44, “what feature inputs do trainable models … expect” should be “what feature inputs trainable models … expect”; “trainable models … are” should be simply “trainable models”.
In paragraphs 128-29, “are their inferences … are” should be “their inferences … are”.
Appropriate correction is required.

Claim Objections
Claim 15 is objected to because of the following informalities: “probabilities and;” should be “probabilities; and”.  Appropriate correction is required.

Claim Rejections - 35 USC § 103
The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
Claims 1-3, 5, 13-16, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Vajda et al. (US 20190172224) (“Vajda”) in view of Vantrease et al. (US 10990650) (“Vantrease”) and further in view of Grayson (US 10949432) (“Grayson”).
Regarding claim 1, Vajda discloses “[a] method for predicting online user behavior, the method comprising a trainable tensor transformer performing: 
in a training mode, hosting a plurality of trainable models (machine learning model may include several high-level components, including a backbone neural network, a region proposal network (RPN), a keypoint head, a detection head, and a segmentation head; each of these components may be configured as a neural network [trainable model] – Vajda, paragraph 35; see also paragraphs 40, 49 (describing the training of the keypoint head and the RPN, so that the components may be hosted in a training mode)); 
in the training mode, training the plurality of trainable models using at least one of a plurality of different training techniques or a plurality of different subsets of training data (during training, each training image sample may be used to train the RPN and the trunk to obtain candidate regions of interest (RoI); these regions may then be used to train the trunk and the various heads; for example, the detection head may be trained to select RoI candidates likely to contain the object of interest, whereas the segmentation head may process the feature map associated with each RoI, generate a segmentation mask, compare the generated mask with the ground truth, and use the computed errors to update the network via backpropagation [i.e., the detection head and the segmentation head are trained using different training techniques]; the different heads may be trained in parallel – Vajda, paragraph 56); 
in an inference mode, … (ii) by executing a mapping of input tensors to converted tensors (to avoid performance issues, a technique that transforms a 4D tensor (i.e., the feature maps of the N or M RoIs) [input tensor] into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor [i.e., there is a mapping between the input tensor and the converted tensor, such that the dimensionality of the converted tensor has been reduced by one] – Vajda, paragraph 66), converting the plurality of input tensors of the input record into a plurality of converted tensors (to avoid performance issues, a technique that transforms [converts] a 4D tensor (i.e., the feature maps of the N or M RoIs) [input tensor] into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor – Vajda, paragraph 66), wherein a tensor of the plurality of converted tensors represents a feature of a plurality of features in a different format from its corresponding input tensor (to avoid performance issues, a technique that transforms [converts] a 4D tensor (i.e., the feature maps of the N or M RoIs) [input tensor] into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor [i.e., the converted tensor is in 3D format, which is different from the 4D format of the input tensor] – Vajda, paragraph 66), wherein the different format is capable of being processed by the plurality of trainable models (to avoid performance issues, a technique that transforms a 4D tensor (i.e., the feature maps of the N or M RoIs) into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor [i.e., the converted tensor represents the feature maps of the RoIs] – Vajda, paragraph 66; see also paragraph 56 (discussing the processing of the RoIs by the heads during training)); 
applying the plurality of trainable models to respective subsets of the plurality of converted tensors (after generating a combined regional feature map by combining the regional feature maps into one [i.e., converting the tensor], the combined regional feature map [subset of the plurality of converted tensors, since a separate combined tensor is produced for each image; see Fig. 6, ref. char. 610] is processed [i.e., each of the heads is applied] using one or more convolutional layers to generate another combined regional feature map – Vajda, paragraphs 74-76); 
generating an inference for the input record that is a collective prediction of the plurality of trainable models (after performing convolution on the combined feature map, the system generates information associated with object instances (e.g., segmentation masks, bounding boxes, keypoints) based on a portion of the second combined regional feature map; for each RoI, the system may identify a region in the second combined regional feature map that corresponds to that RoI; that region may be used to generate the desired output [inference], such as an instance segmentation mask, a keypoint mask, a bounding box, or a classification – Vajda, paragraph 77 and Fig. 6, ref. chars. 660-670; machine learning model is configured to output an object detection indicator (coordinates of a bounding box surrounding a person), keypoints (representing the pose of a detected person), and/or segmentation mask (identifying pixels that correspond to the detected person) [i.e., the collective prediction of the models is that the person is within a certain bounding box, is making a certain pose, and/or corresponds to certain pixels of the image] – id. at paragraph 34); [and]
converting the inference into a prediction tensor (after performing convolution on the combined feature map, the system generates information associated with object instances (e.g., segmentation masks, bounding boxes, keypoints) based on a portion of the second combined regional feature map; for each RoI, the system may identify a region in the second combined regional feature map that corresponds to that RoI; that region may be used to generate the desired output [inference], such as an instance segmentation mask, a keypoint mask, a bounding box, or a classification – Vajda, paragraph 77 and Fig. 6, ref. chars. 660-670; the segmentation mask may be represented as a matrix or grid of binary values [i.e., a two-dimensional tensor] that indicate whether a corresponding pixel belongs to a detected instance of the object – id. at paragraph 56)….”
Vajda appears not to disclose explicitly the further limitations of the claim.  However, Vantrease discloses “storing the prediction tensor and the plurality of input tensors into a plurality of output tensors of a respective output record for the input record (memory may be configured to store instructions, input data sets, and weights, and may also be configured to store outputs of the neural network processor that may be used by the host device to make predictions about an input image – Vantrease, col. 13, l. 57-col. 14, l. 4; computations may be performed for multi-dimensional arrays or tensors with multi-dimensional paddings – id. at col. 12, ll. 33-52 [i.e., the output and input data stored may be in the form of tensors]).”
Vantrease and the instant application both relate to machine learning and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vajda to store the output and the input tensors into an output record, as disclosed by Vantrease, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the results of the models are not lost so that they may be used as inputs in subsequent, downstream processing.  See Vantrease, col. 13, l. 57-col. 14, l. 4 and col. 6, ll. 9-28.
Neither Vajda nor Vantrease appears to disclose explicitly the further limitations of the claim.  However, Grayson discloses “(i) determining a plurality of input tensors associated with an input record, wherein the plurality of input tensors comprises at least one user tensor that represents profile data of at least one user of an online system (in a method for presenting recommended content to a user of a software application, a system requests, from a predictive model, predictive scores for a set of content items; the predictive model may be trained using vectors [input tensors] that include, inter alia, user activity history within the software application [profile data of the user] – Grayson, col. 2, ll. 38-63), at least one artifact tensor that represents at least one artifact available to the at least one user through the online system (in a method for presenting recommended content to a user of a software application, a system requests, from a predictive model, predictive scores for a set of content items; the predictive model may be trained using vectors [input tensors] that include, inter alia, content interaction history recorded for a plurality of users of the software application [content = artifact] – Grayson, col. 2, ll. 38-63), and at least one event tensor that represents an interaction by the at least one user in response to an artifact (user vector may comprise a pair of vectors, one of which represents temporally ordered user activity prior to execution of a triggering event and the other of which represents information about the content viewed subsequent to the triggering event; the user vector may map the list of events executed prior to the triggering event to content items viewed subsequent to execution of the triggering event [the mapping, which is part of the user vector, represents an interaction by the user in response to a content item] – Grayson, col. 19, ll. 16-31), … [and]
generating an inference … regarding the user’s online behavior (system requests, from a predictive model, predictive scores for a set of content items based on the obtained user activity history prior to execution of the triggering event; the predictive scores indicate a likelihood that each content item would be relevant to the user based on the obtained user activity history prior to execution of the triggering event [i.e., a high predictive score represents an inference that the content item is relevant to the user based on the user’s prior behavior] – Grayson, col. 2, ll. 38-63; see also col. 1, ll. 18-38 (disclosing that the software application may be deployed as a web application accessible over the internet, such that the user’s interaction with respect to the software application represents online behavior))….”
Grayson and the instant application both relate to machine learning-based prediction of user behavior and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda and Vantrease to allow the system to generate an inference about online user behavior based on user tensors, artifact tensors, and event tensors, as disclosed by Grayson, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would make the most relevant content items for the user more readily identifiable and allow the system to deliver relevant content to the user based on a specific user context.  See Grayson, col. 2, ll. 27-34.

Regarding claim 2, Vajda, as modified by Vantrease and Grayson, discloses “converting, by the trainable tensor transformer, for each training record of a plurality of training records, a plurality of training tensors of the training record into a second plurality of converted tensors, wherein each converted tensor of the second plurality of converted tensors represents a respective feature of the plurality of features (to improve performance, a technique may be used that transforms a 4D tensor (i.e., the feature maps of the RoIs) into a 3D tensor so that a single optimized convolution operation can be performed on all the feature maps [i.e., the tensors representing the RoIs each represent a set of features]; for instance, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor – Vajda, paragraph 66; see also Fig. 5B (showing the concatenation of feature maps 501-503 into one map 550; note that feature maps 501, 502, and 503 can each be regarded as individual tensors, such that they constitute a plurality of tensors both before and after conversion)); and 
applying, by the trainable tensor transformer, the plurality of trainable models to the second plurality of converted tensors to train the plurality of trainable models (for each training image, a training system may generate a feature map, identify RoIs, generate regional feature maps for the RoIs, and combine the regional feature maps into a larger combined regional feature map [plurality of converted tensors]; the neural network being trained may then process the combined regional feature map to generate a second combined regional feature map, then compare the results to the ground truths and use the comparison results to update [i.e., train] the neural network – Vajda, paragraph 79).”  

Regarding claim 3, Vajda, as modified by Vantrease and Grayson, discloses that “said training of the plurality of trainable models comprises simultaneously applying at least two trainable models of the plurality of trainable models (detection head, keypoint head, and segmentation head may perform their respective operations in parallel [i.e., simultaneously] – Vajda, paragraph 35).”  

Regarding claim 5, Vajda, as modified by Vantrease and Grayson, discloses that “said converting the plurality of input tensors comprises: 
associating each trainable model of the plurality of trainable models with respective one or more converted tensors of the plurality of converted tensors (after combining regional feature maps and performing convolution on the combined feature map [converted tensor], for each RoI, information is generated that is associated with object instances (e.g., segmentation masks, bounding boxes, keypoints) [i.e., each converted feature map tensor is associated with the models such as the detection head, keypoint head, and segmentation head by being processed by each] – Vajda, paragraph 77 and Fig. 6, ref. chars. 650-670); 
associating each tensor of the plurality of converted tensors with respective one or more input tensors of the plurality of input tensors (to avoid performance issues, a technique that transforms a 4D tensor (i.e., the feature maps of the N or M RoIs) [input tensor] into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor [i.e., the association between the input tensor and the converted tensor is that the dimensionality of the converted tensor has been reduced by one] – Vajda, paragraph 66); [and]
Inventor(s): Ma, Yiming; Examiner: Iqbal, Qamar; Application No.: 16/370,156; Art Unit: 2123
generating the plurality of converted tensors based on said associating each trainable model and said associating each tensor (to avoid performance issues, a technique that transforms a 4D tensor (i.e., the feature maps of the N or M RoIs) [input tensor] into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor [i.e., the tensors are converted based on the necessity of reducing their dimensionality, or of associating the converted tensors with the original tensors] – Vajda, paragraph 66; after combining regional feature maps and performing convolution on the combined feature map [converted tensor], for each RoI, information is generated that is associated with object instances (e.g., segmentation masks, bounding boxes, keypoints) [i.e., the tensors are converted based on a necessity of reducing their dimensionality for subsequent input into the plurality of models] – id. at paragraph 77 and Fig. 6, ref. chars. 650-670).”  

Regarding claim 13, Vajda, as modified by Vantrease and Grayson, discloses that “the inference comprises a probability that a particular user will manipulate a particular online artifact (machine learning model may include a long short-term memory neural network model the output of whose training may be a probabilistic model that generates a probability value [inference] that a user will interact with [manipulate] a given content item [online artifact] in a set of content items – Grayson, col. 14, l. 60-col. 15, l. 2).”  
Grayson and the instant application both relate to automated systems to determine user interaction with items and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda and Vantrease such that the system performs inferences representing probabilities that a user will manipulate an online artifact, as disclosed by Grayson, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system to present more relevant content to the user that is not obfuscated by typical but not useful content.  See Grayson, col. 2, ll. 27-34.

Regarding claim 14, Vajda, as modified by Vantrease and Grayson, discloses that “the particular online artifact comprises a hyperlink or an advertisement banner (content recommender can select a number n of content items for display to the user of an application; for example, content recommender can generate a result data set from the top n content items in the sorted list of content items and display links [hyperlink] to the n highest-ranked content items to a user of an application – Grayson, col. 13, ll. 21-42).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda and Vantrease such that the system performs inferences representing probabilities that a user will manipulate a hyperlink, as disclosed by Grayson, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system to present more relevant hyperlinks to the user that are not obfuscated by typical but not useful content.  See Grayson, col. 2, ll. 27-34.

Regarding claim 15, Vajda discloses “generating, by the trainable tensor transformer, a plurality of inferences (trainable neural network employing a technique that transforms a 4D tensor into a 3D tensor [i.e., is a trainable tensor transformer] outputs an object detection indicator, keypoints, and/or segmentation mask [plurality of inferences] – Vajda, paragraphs 34, 53, 66)….”
Grayson discloses “generating … a plurality of inferences, wherein each inference of the plurality of inferences comprises a respective probability that the particular user will manipulate a respective online artifact of a plurality of online artifacts (content recommender can sort content items into an ordered list based on predictive scores generated by content recommender; content recommender can select a number of content items [online artifacts] for display to a user of an application; for example, if content recommender sorts the content items from highest predictive score to lowest, the content recommender can generate a result data set from the top n [plurality of] content items and display links to the n highest-ranked content items to a user of an application – Grayson, col. 13, ll. 21-42; predictive scores indicate a likelihood [probability] that a user will interact with each content item – id. at col. 12, l. 59-col. 13, l. 20); 
ranking the plurality of online artifacts based on their respective probabilities (content recommender can select a number of content items [online artifacts] for display to a user of an application; for example, if content recommender sorts the content items from highest predictive score to lowest, the content recommender can generate a result data set from the top n content items and display links to the n highest-ranked content items to a user of an application – Grayson, col. 13, ll. 21-42; predictive scores indicate a likelihood [probability] that a user will interact with each content item – id. at col. 12, l. 59-col. 13, l. 20) …; [and]
selecting at least one online artifact of the plurality of online artifacts to present to the particular user based on said ranking (content recommender can select a number of content items [online artifacts] for display to a user of an application; for example, if content recommender sorts the content items from highest predictive score to lowest, the content recommender can generate a result data set from the top n content items and display links to [select for presentation] the n highest-ranked content items to a user of an application – Grayson, col. 13, ll. 21-42).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda and Vantrease to rank the online artifacts and present them to the user based on the ranking, as disclosed by Grayson, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system to present more relevant content to the user that is not obfuscated by typical but not useful content.  See Grayson, col. 2, ll. 27-34.
  
Regarding claim 16, Vajda, as modified by Vantrease and Grayson, discloses that “the inference comprises a probability that a particular search result or a particular employment opportunity is suited for a particular user (content recommender can present a set of content items determined to be relevant to the user based on user activity history within an application; the set of content items may be a “pre-search” result presented to the user in connection with a search interface that allows the user to execute a search for content if the content items identified in the “pre-search” result are not relevant to the user [i.e., the set of content items are search results with the highest probability of being relevant to a user] – Grayson, col. 7, ll. 29-55).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda and Vantrease such that the system performs inferences representing probabilities that a user will take an interest in a search result, as disclosed by Grayson, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system to present more relevant results to the user that are not obfuscated by typical but not useful content.  See Grayson, col. 2, ll. 27-34.

Regarding claim 18, Vajda, as modified by Vantrease and Grayson, discloses that “the plurality of input tensors comprises: 
one or more user tensors that represent at least one user, 
one or more artifact tensors that represent at least one online artifact, and/or
one or more event tensors that represent at least one event that occurred between a user and an artifact (a technique may be utilized that transforms a 4D tensor into a single 3D tensor so that a single optimized convolution operation may be performed on all the feature maps – Vajda, paragraph 66; process of optimizing convolutional operations on feature maps of RoIs may be performed on a still image posted on a social network [i.e., the image represented by the tensor may be an online artifact or a user, see Fig. 1B] – id. at paragraph 70).”  

Regarding claim 19, Vajda, as modified by Vantrease, discloses that “the plurality of input tensors comprises: 
a first one or more tensors that represent data of first user in a computer system and/or events that involved the first user interacting with the computer system (technique may be employed that transforms a 4D tensor into a 3D tensor to perform a single optimized convolution operation on all feature maps – Vajda, paragraph 66; the method for optimizing convolutional operations on the feature maps begins by accessing an image of interest – id. at paragraph 70 [so the feature map tensor represents the image]; image may contain a detected person [data of a first user in a computer system] – id. at paragraph 33), and 
a second one or more tensors that represent data of a second user in the computer system and/or events that involved the second user interacting with the computer system (technique may be employed that transforms a 4D tensor into a 3D tensor to perform a single optimized convolution operation on all feature maps – Vajda, paragraph 66; the method for optimizing convolutional operations on the feature maps begins by accessing an image of interest – id. at paragraph 70 [so the feature map tensor represents the image]; image may contain a detected person [second user] – id. at paragraph 33; see also Fig. 1B [showing that there may be multiple people/users in the image]); [and]
the plurality of trainable models are applied to converted tensors representing the first one or more tensors and the second one or more tensors (to avoid performance issues, a technique that transforms [converts] a 4D tensor (i.e., the feature maps of the N or M RoIs) into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor – Vajda, paragraph 66; system may then process the combined regional feature map using one or more convolutional layers [i.e., apply a trainable model] to generate another combined regional feature map [converted tensor] – id. at paragraph 76)….”
Grayson discloses that “the inference represents a probability that the first user and the second user have a defined degree of user similarity or that preferences of the first user and preferences of the second user have a defined degree of preferences similarity (content items with higher predictive scores may be content items that are predicted to be relevant to the user based on similar content selections by users who have executed similar activities in the application – Grayson, col. 16, ll. 9-16 [i.e., the prediction score represents a probability that the instant user has the requisite degree of similarity to another user who executed similar activities]).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda and Vantrease such that the model generates an inference representing a probability that two users are similar, as disclosed by Grayson, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the system to present more relevant content to the user that is not obfuscated by typical but not useful content.  See Grayson, col. 2, ll. 27-34.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Vajda in view of Vantrease.
Regarding claim 20, Vajda discloses “[o]ne or more non-transitory computer-readable media storing instructions that, when executed by one or more computers, cause for each input record of a plurality of input records, a trainable tensor transformer to perform: 
training a plurality of trainable models using at least one of a plurality of different training techniques or a plurality of different subsets of training data (a temporary trunk and a temporary region proposal network (RPN) may be trained together to generate a temporary functional model for generating region of interest (RoI) candidates; the training dataset at this stage may include image samples having a corresponding ground truth or label which may include bounding boxes or other indicators for RoIs – Vajda, paragraph 54; a trunk and downstream heads may then be trained; the training dataset for this stage may include image samples with labels that indicate known bounding boxes, known keypoints for object instances of interest, and known segmentation masks for object instances of interest [i.e., the training data at this stage are a different subset from those used by the temporary trunk and temporary RPN] – id. at paragraph 55);
by executing a mapping of input tensors to converted tensors, converting a plurality of input tensors of the input record into a plurality of converted tensors (to avoid performance issues, a technique that transforms [converts] a 4D tensor (i.e., the feature maps of the N or M RoIs) [input tensor] into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor [i.e., there is a mapping between the input tensor and the converted tensor, such that the dimensionality of the converted tensor has been reduced by one] – Vajda, paragraph 66), wherein a tensor of the plurality of converted tensors represents a feature of a plurality of features in a different format from its corresponding input tensor, wherein the different format is capable of being processed by a plurality of trainable models (to avoid performance issues, a technique that transforms a 4D tensor (i.e., the feature maps of the N or M RoIs) into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor [i.e., the converted tensor represents the feature maps of the RoIs and is in 3D format, which is different from the 4D format of the input tensor] – Vajda, paragraph 66; see also paragraph 56 (discussing the processing of the RoIs by the heads during training)); 
determining associations between input tensors and converted tensors (to avoid performance issues, a technique that transforms a 4D tensor (i.e., the feature maps of the N or M RoIs) [input tensor] into a single 3D tensor [converted tensor] so that the processing engine may be used to perform a single optimized convolution operation on all the feature maps; for example, three three-dimensional feature maps of the RoIs may be tiled together to form one large 3D tensor [i.e., the association between the input tensor and the converted tensor is that the dimensionality of the converted tensor has been reduced by one] – Vajda, paragraph 66); 
based on the associations, applying each of the plurality of trainable models to corresponding subsets of the plurality of converted tensors (after generating a combined regional feature map by combining the regional feature maps into one [i.e., converting the tensor], the combined regional feature map [subset of the plurality of converted tensors, since a separate combined tensor is produced for each image; see Fig. 6, ref. char. 610] is processed [i.e., each of the heads is applied] using one or more convolutional layers to generate another combined regional feature map – Vajda, paragraphs 74-76) to generate an inference for the input record (after performing convolution on the combined feature map, the system generates information associated with object instances (e.g., segmentation masks, bounding boxes, keypoints) based on a portion of the second combined regional feature map; for each RoI, the system may identify a region in the second combined regional feature map that corresponds to that RoI; that region may be used to generate the desired output [inference], such as an instance segmentation mask, a keypoint mask, a bounding box, or a classification – Vajda, paragraph 77 and Fig. 6, ref. chars. 660-670; machine learning model is configured to output an object detection indicator (coordinates of a bounding box surrounding a person), keypoints (representing the pose of a detected person), and/or segmentation mask (identifying pixels that correspond to the detected person) [i.e., the collective prediction of the models is that the person is within a certain bounding box, is making a certain pose, and/or corresponds to certain pixels of the image] – id. at paragraph 34); [and]
converting the inference into a prediction tensor (after performing convolution on the combined feature map, the system generates information associated with object instances (e.g., segmentation masks, bounding boxes, keypoints) based on a portion of the second combined regional feature map; for each RoI, the system may identify a region in the second combined regional feature map that corresponds to that RoI; that region may be used to generate the desired output [inference], such as an instance segmentation mask, a keypoint mask, a bounding box, or a classification – Vajda, paragraph 77 and Fig. 6, ref. chars. 660-670; the segmentation mask may be represented as a matrix or grid of binary values [i.e., a two-dimensional tensor] that indicate whether a corresponding pixel belongs to a detected instance of the object – id. at paragraph 56)….”
Vajda appears not to disclose explicitly the further limitations of the claim.  However, Vantrease discloses “storing the prediction tensor and the plurality of input tensors into a plurality of output tensors of a respective output record for the input record (memory may be configured to store instructions, input data sets, and weights, and may also be configured to store outputs of the neural network processor that may be used by the host device to make predictions about an input image – Vantrease, col. 13, l. 57-col. 14, l. 4; computations may be performed for multi-dimensional arrays or tensors with multi-dimensional paddings – id. at col. 12, ll. 33-52 [i.e., the output and input data stored may be in the form of tensors]); and
providing the output record to one or more downstream processors of a data stream (output Yout of a processing element (PE) may be provided to a downstream neighboring PE – Vantrease, col. 6, ll. 9-28).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vajda to store both the output and the input of the model and provide the result to a downstream processor, as disclosed by Vantrease, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the stability of the system by ensuring that output records are not lost and can be used in subsequent processing.  See Vantrease, col. 6, ll. 9-28 and col. 13, l. 57-col. 14, l. 4.

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Vajda in view of Vantrease and Grayson further in view of Theis (US 10623775) (“Theis”).
Regarding claim 4, Vajda, as modified by Vantrease, Grayson, and Theis, discloses that “the plurality of trainable models comprises a decision tree, a second-order optimization, an additive model, or an autoencoder (in an autoencoder system for image compression that includes an encoder, a decoder, and a learning module, a sub-pixel convolution layer of the decoder can include a reorganization of coefficients that can convert a tensor with many channels into a tensor of the same dimensionality but with fewer channels and larger spatial extent – Theis, col. 8, l. 57-col. 9, l. 8).”  
Theis and the instant application both relate to autoencoders and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda and Vantrease to include an autoencoder among the models, as disclosed by Theis, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would ensure that the system is capable of learning a more compact representation of the input data, thereby potentially saving computational resources. See Theis, col. 4, ll. 4-20.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Vajda in view of Vantrease and Grayson and further in view of Nair et al. (US 20190042094) (“Nair”).
Regarding claim 6, Vajda, as modified by Vantrease, Grayson, and Nair, discloses that “said converting the plurality of input tensors of the input record into the plurality of converted tensors comprises obtaining the input record from a queue (conversion of the numeric representation of tensor data may be “down-conversion,” in which the incoming number of meaningful bits per element is greater than the outgoing number of meaningful bits per element, or “up-conversion,” in which the opposite is true; in such a conversion, a tensor is stored in a memory array; a temporary storage such as a queue may be used to improve performance – Nair, paragraphs 146-48).”  
Nair and the instant application both relate to conversion of tensor data and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda, Grayson, and Vantrease to obtain the inputs to the model from a queue, as disclosed by Nair, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve the performance of the system relative to a system in which the input data must be retrieved directly from the permanent memory.  See Nair, paragraph 148.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Vajda in view of Vantrease and Grayson and further in view of Zawada et al. (US 20180192265) (“Zawada”).
Regarding claim 7, Vajda, as modified by Vantrease, Grayson, and Zawada, discloses “applying a second trainable tensor transformer to each respective output record (in a neural network for making predictions of a user, the output of a convolution-nonlinearity step may be a third-order tensor output; this output may be converted [transformed] to a first-order tensor – Zawada, paragraph 59).”  
Zawada and the instant application both relate to neural networks and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda and Vantrease to apply a tensor transformer to the output, as disclosed by Zawada, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the tensor to be used in subsequent calculations, thereby enhancing the utility of the tensor.  See Zawada, paragraph 59 (output may be converted into a first-order tensor and input into recurrent steps of a neural network).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Vajda in view of Vantrease, Grayson, and Zawada and further in view of Brownlee, A Gentle Introduction to the Gradient Boosting Algorithm for Machine Learning (2016), https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/ (“Brownlee”).
Regarding claim 8, Vajda, as modified by Vantrease, Grayson, and Zawada, discloses “training, by the trainable tensor transformer, the plurality of trainable models with a plurality of training records to generate a training inference with each output record of a plurality of output records (a method of training the model may involve pre-training the trunk; training together a temporary trunk and a temporary RPN; training the trunk and the heads using the RoIs from the temporary trunk and RPN; training the RPN with the trunk fixed; and training the heads with the trunk and RPN fixed [i.e., all heads/models are trained]; each training image sample [training record] in the training dataset during the final training of the heads has known ground-truth boxes, keypoints, and segmentation mask; errors between the generated masks [training inferences] and the ground truth may be used to update the network via backpropagation – Vajda, paragraphs 53-58; see also Fig. 4) ….”
Neither Vajda, Vantrease, nor Grayson appears to disclose explicitly the further limitations of the claim.  However, Brownlee discloses “hypothesis boosting by, for each output record of the plurality of output records: 
increasing a weight of the output record when the training inference comprises a metric that indicates inaccuracy or nonconfidence of the training inference (in the AdaBoost algorithm, observations [output records] that a weak learner can handle are left and new weak learners are developed to handle the remaining difficult observations; the observations are weighted, putting more weight on difficult to classify instances [i.e., those associated with a metric that indicates inaccuracy or nonconfidence] and less on those already handled well [i.e., on those associated with a metric that indicates accuracy or confidence] – Brownlee, sections entitled “The Origin of Boosting” and “AdaBoost the First Boosting Algorithm”), and 
decreasing the weight of the output record when said metric indicates accuracy or confidence of the training inference (in the AdaBoost algorithm, observations [output records] that a weak learner can handle are left and new weak learners are developed to handle the remaining difficult observations; the observations are weighted, putting more weight on difficult to classify instances [i.e., those associated with a metric that indicates inaccuracy or nonconfidence] and less on those already handled well [i.e., on those associated with a metric that indicates accuracy or confidence] – Brownlee, sections entitled “The Origin of Boosting” and “AdaBoost the First Boosting Algorithm”); [and] 
training the second [model] based on said hypothesis boosting (in the AdaBoost algorithm new weak learners [second models] are added sequentially that focus their training on the more difficult patterns – Brownlee, section entitled “AdaBoost the First Boosting Algorithm”).”  
Brownlee and the instant application both relate to boosting algorithms and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda, Zawada, Grayson, and Vantrease to train the model based on hypothesis boosting, as disclosed by Brownlee, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the accuracy of the models by focusing the training on poorly classified samples to improve the models’ performance.  See Brownlee, section entitled “AdaBoost the First Boosting Algorithm”.
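For illustration only, the weighting behavior relied upon above (more weight on poorly handled records, less on well-handled ones) may be sketched as follows.  The sketch assumes the standard textbook discrete AdaBoost update, which is not quoted from Brownlee or from the claims; the function name reweight and the sample values are hypothetical.

```python
import math

def reweight(weights, correct):
    """One AdaBoost-style reweighting round (hypothetical helper).

    weights: current per-record weights, assumed to sum to 1.
    correct: booleans, True where the weak learner handled the record correctly.
    """
    # Weighted error: total weight of the records the weak learner got wrong.
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if not 0.0 < err < 0.5:
        raise ValueError("weak learner must have weighted error in (0, 0.5)")
    # Learner coefficient from the textbook update (an assumption of this sketch).
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Decrease the weight of correct records, increase the weight of incorrect ones.
    updated = [w * math.exp(-alpha if ok else alpha)
               for w, ok in zip(weights, correct)]
    total = sum(updated)
    # Renormalize so the weights again sum to 1.
    return [w / total for w in updated]

weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, True, True, False]
new_weights = reweight(weights, correct)
```

Under these assumptions, the one misclassified record ends the round with a larger weight than it started with and each correctly classified record with a smaller one, so a subsequently trained learner concentrates on the difficult record.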

Claims 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Vajda in view of Vantrease and Grayson and further in view of Kiers et al. (WO 2019162204) (“Kiers”).
Regarding claim 9, Vajda, as modified by Vantrease, Grayson, and Kiers, discloses “applying a second trainable tensor transformer to the plurality of input records to generate a second inference (in a deep learning model that performs convolution with dilated kernels, an activation function non-linearly maps elements of an input tensor to an output tensor [i.e., transforms the tensor] – Kiers, paragraph 143; a predicted image [inference] is generated by applying the deep learning model using the input image [input record] of the training data – id. at paragraph 146); 
converting, by the second trainable tensor transformer, the second inference into a second prediction tensor (a predicted image is generated by applying the deep learning model using the input image of the training data; the predicted image may be an output tensor [i.e., the predicted image inferred is converted into a tensor form] – Kiers, paragraph 146); [and]
storing, by the second trainable tensor transformer, the second prediction tensor into said plurality of output tensors of said respective output record (a predicted image is generated by applying the deep learning model using the input image of the training data; the predicted image may be an output tensor – Kiers, paragraph 146; storage device may be used for storing information [such as the output tensor] and instructions to be executed by a processor – id. at paragraph 194).”  
Kiers and the instant application both relate to machine learning algorithms and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda, Grayson, and Vantrease to generate a second inference and store it in tensor form, as disclosed by Kiers, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the predictive power of the system by allowing it to make multiple inferences and allow the output data to be used for later purposes.  See Kiers, paragraphs 143, 146, 194.

Regarding claim 10, Vajda, as modified by Vantrease, Grayson, and Kiers, discloses that “said applying the second trainable tensor transformer to the plurality of input records comprises applying the second trainable tensor transformer to a subset of the plurality of input records that is based on sample bootstrap aggregating (bagging) (supervised learning is the machine learning task of inferring a function from labeled training data [input records]; the training data include a set of training examples, each of which comprises an input object represented in tensor form and a desired output value; supervised learning algorithm [trainable tensor transformer] analyzes the training data and produces an inferred model, which can be used for mapping new examples; one example of supervised learning is bagging – Kiers, paragraphs 129-30).”  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda and Vantrease to apply the model to input records based on bagging, as disclosed by Kiers, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would improve the accuracy of the models while helping to avoid overfitting.  See Kiers, paragraphs 129-30.
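For illustration only, the selection of subsets of input records by bootstrap aggregating may be sketched as sampling with replacement from the available records.  The function name bootstrap_subsets, the seed parameter, and the sample records are hypothetical and are not drawn from Kiers or from the claims.

```python
import random

def bootstrap_subsets(records, n_subsets, seed=0):
    """Draw n_subsets bootstrap samples from records (hypothetical helper).

    Each sample has the same length as records and is drawn with
    replacement, so a given record may appear several times or not at all.
    """
    rng = random.Random(seed)  # seeded only so the sketch is repeatable
    n = len(records)
    return [[records[rng.randrange(n)] for _ in range(n)]
            for _ in range(n_subsets)]

input_records = ["r0", "r1", "r2", "r3", "r4"]
subsets = bootstrap_subsets(input_records, n_subsets=3)
```

In a bagging arrangement, a separate model would typically be applied to or trained on each such subset and the resulting outputs combined.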

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Vajda in view of Vantrease, Grayson, and Kiers and further in view of Sandmann et al. (US 20180328904) (“Sandmann”).
Regarding claim 11, Vajda, as modified by Vantrease, Kiers, Grayson, and Sandmann, discloses that “the inference and the second inference are simultaneously generated (in a system and method for classifying a sample using a classification algorithm such as a neural network, multiple samples may be classified simultaneously – Sandmann, paragraph 24 and claim 4).”  
Sandmann and the instant application both relate to machine learning and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda, Kiers, Grayson, and Vantrease to generate both inferences simultaneously, as disclosed by Sandmann, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would increase the efficiency of the system and increase convenience for the user by ensuring that multiple records can be processed at the same time.  See Sandmann, paragraph 24 and claim 4.

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Vajda in view of Vantrease and Grayson and further in view of Arthur et al. (US 20190303740) (“Arthur”).
Regarding claim 12, Vajda, as modified by Vantrease, Grayson, and Arthur, discloses that “said converting the plurality of input tensors comprises receiving the plurality of input records from a first stream of individual records (in a system for transfer of neuron output values through data memory for neurosynaptic processors, a layer of the neural network may be represented as a tensor; a physical architecture for execution of logical cores of the network includes an interface that reads from and writes to data memory, processing an input stream and generating an output stream – Arthur, paragraphs 24, 32; see also Fig. 6, ref. chars. 608-11 [showing input records 610 and output records 611 organized as streams]); [and]
said storing into the plurality of output tensors of said respective output record comprises sending each said respective output record to a second stream of individual records (in a system for transfer of neuron output values through data memory for neurosynaptic processors, a layer of the neural network may be represented as a tensor; a physical architecture for execution of logical cores of the network includes an interface that reads from and writes to [stores] data memory, processing an input stream and generating an output stream – Arthur, paragraphs 24, 32; see also Fig. 6, ref. chars. 608-11 [showing input records 610 and output records 611 organized as streams]).” 
Arthur and the instant application both relate to machine learning processing and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda, Grayson, and Vantrease to receive the input from a first stream and send the output to a second stream, as disclosed by Arthur, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would allow the data to be handled sequentially, thereby reducing the need for processing power in any given clock cycle.  See Arthur, paragraph 32. 

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Vajda in view of Vantrease and Grayson and further in view of Shang et al., “Wisdom of the Crowd: Incorporating Social Influence in Recommendation Models,” in IEEE 17th Int’l Conf. Parallel and Distributed Sys. 835-40 (2011) (“Shang”).
Regarding claim 17, Vajda, as modified by Vantrease, Grayson, and Shang, discloses that “the inference represents a probability that a generalized user would manipulate a particular online artifact (social influence network theory is used to develop a recommendation model for groups; an N x 1 vector can be calculated that describes users’ final preference prediction under social influence; the average rating can then be calculated to form a recommendation list for the group [generalized user] – Shang, sec. IV(A) [i.e., the item with the highest rating is the one with which the group has the highest probability of interacting]; recommenders may give customized recommendations to online users on books, movies, and commodities according to their previous preference data – id. at sec. I, first paragraph [i.e., the recommendation may represent a likelihood that the user will interact with the online listing for the book, movie, or commodity]), [and]
the generalized user is based on multiple users (social influence model may be used to describe, for instance, three users’ opinion evolution based on their discussion of, for instance, which movie to watch [i.e., the recommendation is based on multiple users] – Shang, sec. IV, first paragraph).”  
Shang and the instant application both relate to recommender systems and are analogous.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Vajda, Grayson, and Vantrease to use the model to make inferences about generalized users comprising multiple users, as disclosed by Shang, and an ordinary artisan could reasonably expect to have done so successfully.  Doing so would provide a generic recommendation that takes the potentially differing opinions of multiple users into account.  See Shang, abstract.

Response to Arguments
Applicant's arguments filed June 30, 2022 (“Remarks”) have been fully considered but they are, except insofar as rendered moot by the withdrawal of a rejection or the entry of a new ground of rejection, not persuasive. 
Applicant’s arguments that claim 20 as amended is eligible under 35 USC § 101, Remarks 24-27, are persuasive.  That rejection has been withdrawn.
Applicant argues that the individual feature maps of the regions of interest of Vajda cannot be regarded as a plurality of converted tensors because Vajda tiles feature maps together to form a single 3D tensor.  Applicant further argues that this is merely an arrangement of the feature maps, which is not the same as a conversion thereof.  Remarks at 29-30.  However, first of all, Applicant never explicitly defines the term “convert” in either the specification or the claims, meaning that the plain meaning of the term as understood by an ordinary artisan before the effective filing date applies.  MPEP § 2111.01.  The Oxford English Dictionary defines “convert” as “to turn or change into something of different form or properties; to transform”.  Oxford Eng. Dictionary, definition 11 of “convert (v.),” https://www.oed.com/view/Entry/40777?rskey=YW7008&result=2&isAdvanced=false#eid.  Moreover, as explained further below, the claim does not require that the features themselves be converted, but rather only that the tensors be converted.  Using the above definition of “convert,” the system of Vajda does convert the feature maps comprising the original 4D tensor into a plurality of converted tensors.  Paragraph 66 indicates that the feature maps are tiled into a 3D tensor, with padding placed in between them.  That is, before conversion, the tensors/feature maps have the property that they are arranged in four dimensions, whereas after conversion they have the property of being arranged in three dimensions.  That is manifestly to change the tensors “into something of different form or properties”.
Applicant then argues that the individual feature maps making up the converted 3D tensor in Vajda cannot be “in a different format” from the corresponding input tensor because the individual feature maps are each in the same format after tiling as before tiling.  Remarks at 30-31.  However, the claim does not require that the feature maps themselves be reformatted, as Applicant appears to argue.  Rather, the claim as amended states that “a tensor of the plurality of converted tensors represents a feature of a plurality of features in a different format from its corresponding input tensor”.  In other words, according to the claim, only the representation of the feature must be reformatted into a converted tensor.  Vajda does this.  In particular, according to paragraph 66, three three-dimensional feature maps of the regions of interest are tiled together to form one large 3D tensor.  Thus, the individual feature maps are initially represented as constituent parts of a 4D tensor, but after conversion are represented as constituent parts of a 3D tensor.
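For illustration only, the change of representation discussed above may be sketched as follows: three feature maps held as a 4D tensor are tiled, with padding between them, into one 3D tensor.  The side-by-side layout along the width axis, the padding width, and the names tile_feature_maps and shape are assumptions of the sketch and are not taken from Vajda.

```python
def tile_feature_maps(maps, pad=1, fill=0.0):
    """Tile N feature maps into one 3D tensor (hypothetical helper).

    maps: a 4D tensor given as a list of N maps, each a [C][H][W] nested list.
    Returns a [C][H][N*W + (N-1)*pad] nested list in which the maps sit
    side by side along the width axis, separated by padding columns.
    """
    channels, height = len(maps[0]), len(maps[0][0])
    tiled = []
    for c in range(channels):
        plane = []
        for h in range(height):
            row = []
            for i, fmap in enumerate(maps):
                if i > 0:
                    row.extend([fill] * pad)  # padding between adjacent maps
                row.extend(fmap[c][h])        # the map's own values, unchanged
            plane.append(row)
        tiled.append(plane)
    return tiled

def shape(tensor):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)

# Three RoI feature maps, each 3 channels x 2 x 2, filled with the RoI index.
rois = [[[[float(i)] * 2 for _ in range(2)] for _ in range(3)] for i in range(3)]
tiled = tile_feature_maps(rois)
```

Under these assumptions the input has shape (3, 3, 2, 2) and the output has shape (3, 2, 8).  The values of each feature map are unchanged, but the tensor that represents them has one fewer dimension.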
Applicant then argues that Grayson allegedly does not disclose the use of at least one user tensor, at least one artifact tensor, and at least one event tensor because Grayson allegedly applies a single probabilistic model to an aggregated set of user activity records.  Remarks at 28. However, in the absence of a special definition of the term “tensor,” the term “tensor” may be construed in accordance with its plain meaning to refer to any n-dimensional array of numerical values, including a scalar (0th order tensor), vector (1st order tensor), matrix (2nd order tensor), and higher-dimensional tensors.  Grayson discloses a predictive model trained using a set of vectors each of which includes user activity history data (thereby rendering them user tensors), content interaction history data (thereby rendering them artifact tensors), and information that maps the user activity data to the content interaction history data (thereby rendering them event tensors, the “event” being the likelihood of the user interacting with certain items of content).  Thus, as the term is most broadly reasonably construed in light of the specification, Grayson discloses “input tensors” with the claimed properties.
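For illustration only, the plain-meaning construction of “tensor” applied above (scalar, vector, matrix, and higher-order arrays) may be sketched with a helper that reports the order of a nested array.  The name tensor_order is hypothetical, and the helper assumes a regular, non-ragged array.

```python
def tensor_order(value):
    """Return the order (number of dimensions) of a regular nested array."""
    order = 0
    # Descend one nesting level per dimension until a bare number is reached.
    while isinstance(value, (list, tuple)):
        order += 1
        if not value:  # an empty array has no further level to inspect
            break
        value = value[0]
    return order

scalar_order = tensor_order(3.0)                 # 0th order tensor
vector_order = tensor_order([1.0, 2.0, 3.0])     # 1st order tensor
matrix_order = tensor_order([[1, 2], [3, 4]])    # 2nd order tensor
```

Under this construction, a vector of user activity history values is a first-order tensor.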

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to RYAN C VAUGHN whose telephone number is (571)272-4849. The examiner can normally be reached M-R 7:50a-5:50p ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar, can be reached at 571-272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/RYAN C VAUGHN/Examiner, Art Unit 2125