DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
 (A) the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B) the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as "configured to" or "so that"; and 
(C) the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
For the claims in the instant application, the second presumption above stands unrebutted, and the claims accordingly have been found not to invoke 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.  Regarding the first rebuttable presumption above, Examiner notes that MPEP 2181 does not identify the term ‘model’ (see, for example, claims 9-17) as a generic placeholder/nonce term as identified in (A) above.  More importantly, instances of the term ‘model’ in the claims in question are preceded by the structural modifiers ‘machine-learned encoding/decoding/feature extraction’, which further limit the term ‘model’ structurally and in a manner consistent with and sufficient for the corresponding function, so that prong (C) above remains unsatisfied even where the term ‘model’ is understood to be a generic placeholder.  A person of ordinary skill in the art would also understand the limitations in question to be equivalents to those identified in [0048-0049], [0063], etc. of Applicant’s specification as filed, and at a minimum structurally limited by the housing hardware/memory disclosed therein, and definite accordingly.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-12 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lee et al (US Pat. Pub. No. 2019/0311202) in view of Tsai et al (US Pat. Pub. No. 2019/0158884).
Regarding claim 1, Lee et al discloses a computer-implemented method for localization of objects, the computer-implemented method comprising: accessing, by a computing system comprising one or more computing devices (Fig. 19 [computing system 1900]), source data and target data, the source data comprising a source representation of an environment comprising a source object (see at least paragraphs 31-32 and fig. 2, disclosing training image data and training video data); generating, by the computing system, a source feature representation based at least in part on the source representation and the one or more machine-learned feature extraction models (see at least paragraph 31 and fig. 7, disclosing encoders that extract features from a target video); and determining, by the computing system, a localized state of the source object with respect to the environment based at least in part on the source feature representation and the compressed target feature representation (see at least paragraphs 57 and 59 and fig. 7).
Lee et al fails to explicitly disclose the target data comprising a compressed target feature representation of the environment, wherein the compressed target feature representation is based at least in part on compression of a target feature representation of the environment produced by one or more machine-learned feature extraction models.  However, in the same field of endeavor, Tsai et al discloses the target data comprising a compressed target feature representation of the environment, wherein the compressed target feature representation is based at least in part on compression of a target feature representation of the environment produced by one or more machine-learned feature extraction models (see at least paragraph 15).  Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Lee et al to incorporate the above-mentioned feature as taught by Tsai et al, because compression enables the transfer of large and/or high-quality video/image information over limited network bandwidths, enabling a system with a distributed architecture to more efficiently perform object segmentation and related tasks.
Regarding claim 2, Lee et al discloses the determining, by the computing system, the localized state of the source object with respect to the environment based at least in part on the source feature representation and the compressed target feature representation comprises: generating, by the computing system, a reconstructed target feature representation based at least in part on the compressed target feature representation and a machine-learned reconstruction model (see at least fig. 6 [target mask 660 based on decoder 650 output] and paragraph 70), wherein the reconstructed target feature representation is a reconstruction of the target feature representation; and determining, by the computing system, the localized state of the source object based at least in part on one or more comparisons of the source feature representation to the reconstructed target feature representation (see at least paragraph 57 and 59 and fig. 7).  
Regarding claim 3, Lee et al discloses the determining, by the computing system, the localized state of the source object based at least in part on one or more comparisons of the source feature representation to the reconstructed target feature representation comprises: determining, by the computing system, one or more correlations between the reconstructed target feature representation and the source feature representation based at least in part on a probabilistic inference model configured to encode agreement between the source feature representation and the reconstructed target feature representation indexed at the localized state of the source object (see at least paragraph 44 and 59).  
Regarding claim 4, Lee et al discloses the compressed target feature representation is based at least in part on an encoding of the target feature representation using one or more lossless compression operations, and wherein the generating, by the computing system, the reconstructed target feature representation based at least in part on the compressed target feature representation and the machine-learned reconstruction model, wherein the reconstructed target feature representation is a reconstruction of the target feature representation comprises: generating, by the computing system, a decoded target feature representation of the compressed target feature representation based at least in part on the one or more lossless compression operations, wherein the one or more lossless compression operations comprise one or more lossless binary encoding operations (see at least paragraph 66); and generating, by the computing system, the target feature representation based at least in part on the decoded target feature representation and the machine-learned reconstruction model (see at least paragraph 70; fig. 6 [630, 615]).  
Regarding claim 5, Lee et al discloses the determining, by the computing system, the localized state of the source object with respect to the environment based at least in part on the source feature representation and the compressed target feature representation comprises: rotating, by the computing system, the source feature representation to a plurality of candidate angles; and determining, by the computing system, at each of the plurality of candidate angles, whether the source feature representation matches the compressed target feature representation (see at least paragraph 74).  
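For illustration only, the candidate-angle limitation recited in claim 5 can be sketched as follows. This is a minimal sketch and is not drawn from Lee et al, Tsai et al, or Applicant's specification. The sparse point/grid layout, the sum-of-products score, the threshold, and every function name below are assumptions introduced solely for the example.

```python
import math

def rotate_points(points, angle_deg):
    """Rotate (x, y, value) feature points about the origin by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(x * c - y * s, x * s + y * c, v) for x, y, v in points]

def match_score(points, target):
    """Sum of products of source values with the target cell each point lands in."""
    # target is a hypothetical sparse grid: {(col, row): feature value}
    return sum(v * target.get((round(x), round(y)), 0.0) for x, y, v in points)

def match_at_angles(source, target, candidate_angles, threshold):
    """For each candidate angle, report whether the rotated source matches."""
    results = {}
    for angle in candidate_angles:
        score = match_score(rotate_points(source, angle), target)
        results[angle] = score >= threshold  # assumed matching criterion
    return results

# Two source feature points on the x-axis; the target holds them on the y-axis.
source = [(1, 0, 1.0), (2, 0, 1.0)]
target = {(0, 1): 1.0, (0, 2): 1.0}
matches = match_at_angles(source, target, [0, 90, 180, 270], threshold=2.0)
```

In this toy input only the 90-degree candidate rotates both source points onto occupied target cells, so it is the only angle reported as a match.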
Regarding claim 6, Lee et al discloses the compressed target feature representation of the environment is based at least in part on an attended feature representation of the target feature representation generated by a machine-learned attention model configured to mask one or more portions of the target feature representation (see at least paragraph 66).  
Regarding claim 7, Lee et al discloses the source data is based at least in part on one or more sensor outputs from one or more sensors comprising at least one of: one or more light detection and ranging (LiDAR) devices, one or more sonar devices, one or more radar devices, or one or more cameras (see at least fig. 2 [250]).  
Regarding claim 8, Lee et al discloses the one or more machine-learned feature extraction models comprise a first machine-learned extraction model configured to generate the source feature representation and a second machine-learned model configured to generate the target feature representation (see at least paragraph 31 and fig. 7).
Regarding claim 9, Lee et al discloses a computing system comprising: one or more processors; one or more machine-learned feature extraction models configured to access training data comprising one or more representations of a training environment and generate one or more feature extracted representations of the training environment (see at least paragraphs 31-32 and fig. 2, disclosing training image data and training video data); and one or more tangible non-transitory computer-readable media storing computer-readable instructions that when executed by one or more processors cause the one or more processors to perform operations, the operations comprising: accessing training data comprising a source representation of the training environment and a target representation of the training environment (see at least paragraphs 31-32 and fig. 2, disclosing training image data and training video data), wherein the source representation is associated with a ground-truth state of a source object in the training environment (see at least paragraphs 31-32 and fig. 2, disclosing training image data and training video data); generating a source feature representation and a target feature representation based at least in part on the one or more machine-learned feature extraction models accessing the source representation and the target representation respectively (see at least paragraph 31 and fig. 7, disclosing encoders that extract features from a target video); determining a localized state of the source object within the target representation of the environment based at least in part on the source feature representation and the compressed target feature representation (see at least paragraphs 57 and 59 and fig. 7); determining a loss based at least in part on one or more comparisons of the localized state of the source object to the ground-truth state of the source object (see at least paragraphs 31 and 44 and fig. 11); and adjusting one or more parameters of the one or more machine-learned compression models based at least in part on the loss (see at least paragraphs 48 and 82-83).
Lee et al fails to explicitly disclose generating a compressed target feature representation of the target feature representation based at least in part on one or more machine-learned compression models.  However, in the same field of endeavor, Tsai et al discloses generating a compressed target feature representation of the target feature representation based at least in part on one or more machine-learned compression models (see at least paragraph 15).  Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Lee et al to incorporate the above-mentioned feature as taught by Tsai et al, because compression enables the transfer of large and/or high-quality video/image information over limited network bandwidths, enabling a system with a distributed architecture to more efficiently perform object segmentation and related tasks.
Regarding claim 10, Lee et al discloses the generating the compressed target feature representation of the target feature representation based at least in part on the one or more machine-learned compression models comprises: generating an encoded target feature representation based at least in part the target feature representation and a machine-learned encoding model (see at least paragraph 66); generating the compressed target feature representation based at least in part on use of one or more lossless binary encoding operations on the encoded target feature representation (see at least paragraph 66); and wherein adjusting the one or more parameters of the one or more machine-learned compression models based at least in part on the loss comprises adjusting the one or more parameters of the machine-learned encoding model based at least in part on the loss (see at least paragraph 48 and 82-83).  
Regarding claim 11, Lee et al discloses the generating the compressed target feature representation of the target feature representation based at least in part on the one or more machine-learned compression models comprises: generating an attention feature representation based at least in part on the target feature representation and a machine-learned attention model; and generating an attended target feature representation based at least in part on masking the target feature representation with the attention feature representation, wherein the compressed target feature representation is based at least in part on the attended target feature representation (see at least paragraph 36 and 43 and fig. 6).  
Regarding claim 12, Lee et al discloses the determining the localized state of the source object within the target representation of the environment based at least in part on the source feature representation and the compressed target feature representation comprises: determining one or more correlations between the source feature representation and the attended feature representation (see at least paragraph 57 and 59 and fig. 7).  
Regarding claim 16, Lee et al discloses the determining the loss based at least in part on one or more comparisons of the localized state of the source object to the ground-truth state of the source object comprises:  determining the loss based at least in part on an entropy of the compressed target feature representation, wherein the entropy is based at least in part on a data size of the compressed target feature representation, and wherein the entropy is positively correlated with the data size (see at least paragraph 57 and 59 and fig. 7).  
Regarding claim 17, Lee et al discloses the determining the loss based at least in part on one or more comparisons of the localized state of the source object to the ground-truth state of the source object comprises: determining the loss based at least in part on an accuracy of the localized state of the source object with respect to the ground-truth state of the source object, wherein the accuracy is inversely correlated with the loss and a distance of the localized state of the source object from the ground-truth state of the source object (see at least paragraph 31 and 44 and fig. 11).  
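For illustration only, the loss limitations recited in claims 16 and 17 can be read together as a loss that grows with both the distance of the localized state from the ground-truth state and the data size of the compressed target feature representation. The sketch below is not drawn from Lee et al, Tsai et al, or Applicant's specification. Using the byte length of the compressed representation as a stand-in for its entropy, the Euclidean distance, the additive form, and the `weight` parameter are all assumptions made for the example.

```python
import math

def localization_error(predicted_state, ground_truth_state):
    """Euclidean distance between the localized and ground-truth states."""
    return math.dist(predicted_state, ground_truth_state)

def size_in_bits(compressed: bytes) -> int:
    """Data size of the compressed representation (assumed proxy for entropy)."""
    return 8 * len(compressed)

def total_loss(predicted_state, ground_truth_state, compressed, weight=1e-3):
    """Loss rises with localization distance and with compressed data size."""
    # weight is a hypothetical trade-off factor between accuracy and size
    return (localization_error(predicted_state, ground_truth_state)
            + weight * size_in_bits(compressed))

# Same prediction error, two different compressed sizes.
small = total_loss((3.0, 4.0), (0.0, 0.0), b"ab", weight=0.5)
large = total_loss((3.0, 4.0), (0.0, 0.0), b"abcd", weight=0.5)
```

Under these assumptions a more accurate localized state (smaller distance) lowers the loss, and a larger compressed representation raises it.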
Regarding claim 18, Lee et al discloses a computing device comprising: one or more processors; a memory including one or more tangible non-transitory computer-readable media, the memory storing computer-readable instructions that when executed by the one or more processors cause the one or more processors to perform operations including: accessing source data including a representation of an environment comprising a source object (see at least paragraphs 31-32 and fig. 2, disclosing training image data and training video data); generating a source feature representation of the source data based at least in part on one or more machine-learned feature extraction models (see at least paragraph 31 and fig. 7, disclosing encoders that extract features from a target video); and determining a localized state of the source object with respect to the environment based at least in part on the source feature representation and the compressed target feature representation (see at least paragraphs 57 and 59 and fig. 7).
Lee et al fails to explicitly disclose accessing target data including a compressed target feature representation of the environment, wherein the compressed target feature representation is generated based at least in part on compression of a target feature representation of the environment produced by the one or more machine-learned feature extraction models.  However, in the same field of endeavor, Tsai et al discloses accessing target data including a compressed target feature representation of the environment, wherein the compressed target feature representation is generated based at least in part on compression of a target feature representation of the environment produced by the one or more machine-learned feature extraction models (see at least paragraph 15).  Therefore, it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to modify the system of Lee et al to incorporate the above-mentioned feature as taught by Tsai et al, because compression enables the transfer of large and/or high-quality video/image information over limited network bandwidths, enabling a system with a distributed architecture to more efficiently perform object segmentation and related tasks.
Regarding claim 19, Lee et al discloses the target data including the compressed target feature representation is stored in the memory of the device (see at least fig. 2 [250]).  
Regarding claim 20, Lee et al discloses controlling, based at least in part on the localized state of the source object with respect to the environment, one or more autonomous vehicle systems associated with operation of an autonomous vehicle, wherein the one or more autonomous vehicle systems comprise one or more engine systems, one or more motor systems, one or more steering systems, one or more braking systems, one or more electrical systems, or one or more communications systems (see at least paragraph 37).
Allowable Subject Matter
Claims 13-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims. The references of record fail, in any obvious combination, to teach each and every limitation required therein.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicant's disclosure. 
The following prior art is cited to show methods considered pertinent to the claimed invention:
Zhang et al (US Pat. Pub. No. 2010/0054615), directed toward a system for generating image segmentation data using a multi-branch neural network.
Kim et al (US Pat. Pub. No. 2020/0162751), directed toward image encoding/decoding.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LITON MIAH whose telephone number is (571)270-3124. The examiner can normally be reached Mon-Fri, 7:30am-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Rafael Perez-Gutierrez, can be reached at 571-272-7915. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/LITON MIAH/           Primary Examiner, Art Unit 2642