Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in response to amendments and remarks filed on 09/16/2022. In the current amendments, claims 1, 3-6, and 18 are amended, claims 2 and 19 are cancelled, and claims 21-22 are added. Claims 1, 3-18, and 20-22 are pending and have been examined.
In response to amendments and remarks filed on 09/16/2022, the 35 U.S.C. 101 rejection of claims 1, 10, 13-18, and 20 made in the previous Office Action has been withdrawn.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/21/2022 has been entered.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-4, 6-7, 10, 13, 15-16, 18, and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Attorre et al. (US 20200134377 A1) in view of Loxam et al. (US 20140225924 A1).
Regarding Claim 1,
Attorre et al. teaches a method comprising, by a server (Attorre et al., Para. [0046] and FIG. 6, “operations in flow chart 600 can be performed by one or more servers in a cloud computing environment as described below with respect to FIG. 7” teaches the server).
receiving, from a client computing device, one or more deep-learning (DL)-feature representations generated by a first machine learning model, wherein the one or more DL-feature representations are extracted from a region of interest detected by the client computing device within a first image of a real-world environment captured by the client computing device, the region of interest comprising a first depiction of a real-world object, and the one or more DL-feature representations are extracted at the client computing device by (Attorre et al., Para. [0005], “a method includes receiving a source image at one or more computing devices” teaches receiving from a computing device. Para. [0029] and FIG. 2, “A CNN in logo detection model 220 may then extract feature vectors from each candidate region and classify the candidate region based on the feature vectors” teaches a CNN in the logo detection model (corresponds to the machine learning model) that extracts feature vectors (corresponds to deep-learning feature representations) from each candidate region (corresponds to the region of interest detected). Para. [0034] and FIG. 4, “logo detection model 410 may detect generic logo patches or regions (i.e., regions that are likely to embody a logo). For example, in the example shown in FIG. 4, logo detection model 410 may receive an image 420, which may include images of one or more logos, and identify candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434” teaches determining identification of candidate regions (corresponds to the region of interest) of an image (corresponds to the first image of a real-world environment captured) from the logo detection model when an image is received from the computing device).
accessing the first image (Attorre et al., Para. [0007], “the first reference feature vector extracted from a first image of a first target logo in the set of target logos” teaches accessing the first image).
generating, by the first machine learning model, an initial feature map associated with the first image (Attorre et al., Para. [0029] and FIG. 2, “the input image may be fed to the CNN to generate a convolutional feature map” teaches utilizing a convolutional neural network (corresponds to the first machine learning model) to generate a feature map associated with the input image (corresponds to the first image)).
identifying the region of interest within the initial feature map (Attorre et al., Para. [0029] and FIG. 2, “candidate regions may be identified from the convolutional feature map” teaches identifying a candidate region (corresponds to region of interest) within the convolutional feature map).
wherein the region of interest is associated with at least a first real-world-object type, and wherein the region of interest is associated with a portion of the first image corresponding to the first depiction of the real-world object (Attorre et al., Para. [0029], “using selective search techniques and may be reshaped to a predetermined size using, for example, a region-of-interest (ROI) pooling layer” teaches utilizing a selective search technique to select the region of interest of the image that depicts the real-world object).  
extracting, from the region of interest, the one or more DL-feature representations, wherein each extracted DL-feature representation is an output of a second machine learning model that is trained to detect at least objects of the first real-world-object type (Attorre et al., Para. [0053], “To detect logos in a new source image that may include an image of the new target logo, candidate regions in the new source image that may embody a logo may be determined by the second model, and a feature vector may be extracted from each candidate region and compared with each reference feature vector in the embedding database (including the reference feature vectors extracted from the images of the new target logo) to find a match. As such, new target logos may be detected by existing models or networks without retraining such models or networks” teaches a second model trained to determine candidate regions and detect logos (corresponds to objects of the first real-world-object type) in a source image. Feature vectors are then extracted from the candidate regions).
generating, based on the patch, one or more local feature descriptors, wherein each of the one or more local feature descriptors corresponds to the patch within the region of interest within the first image and comprises information that encodes one or more visual features present in the patch (Attorre et al., Para. [0034] and FIG. 4, “logo detection model 410 may detect generic logo patches or regions (i.e., regions that are likely to embody a logo). For example, in the example shown in FIG. 4, logo detection model 410 may receive an image 420, which may include images of one or more logos, and identify candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434” teaches the logo detection model generating sub-images or identifications of candidate regions (corresponds to the one or more local feature descriptors that correspond to the patch within the region of interest) based on the detected generic logo patches or regions of the received image (corresponds to the first image)).
identifying a set of matching DL-feature representations based on a comparison of the received one or more DL-feature representations with a plurality of stored DL-feature representations associated with a plurality of augmented-reality (AR) targets, the comparison resulting in a determination that the set of matching DL-feature representations and the received one or more DL-feature representations are within a threshold region in a vector space (Attorre et al., Para. [0003], “Logo detection or recognition in images and videos can be used in many applications, such as copyright or trademark infringement detection, contextual advertise placement, intelligent traffic control based on vehicle logos, automated computation of brand-related statistics, augmented reality, and the like” teaches that the technique may be used in many applications, such as augmented reality. Para. [0053], “To detect logos in a new source image that may include an image of the new target logo, candidate regions in the new source image that may embody a logo may be determined by the second model, and a feature vector may be extracted from each candidate region and compared with each reference feature vector in the embedding database (including the reference feature vectors extracted from the images of the new target logo) to find a match. As such, new target logos may be detected by existing models or networks without retraining such models or networks” teaches comparing feature vectors (corresponds to DL-feature representations) with the reference feature vectors stored in the database that are associated with a target logo (corresponds to AR target) to identify potential matches. Para. [0020], “The target logo associated with the best matching reference feature vector is determined as present in the candidate region in the source image if the best matching score is greater than a threshold value” teaches determining a threshold value (corresponds to the threshold region in a vector space) for a set of matching reference feature vectors (corresponds to deep-learning feature representations)).
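For illustration only, the threshold-region comparison discussed above can be expressed as a short Python sketch. It is not the implementation of Attorre et al. or of the present application: cosine similarity as the measure of closeness in the vector space, the function names, and the default threshold of 0.8 are assumptions. A stored vector is treated as inside the threshold region when its score is strictly greater than the threshold, consistent with the “greater than a threshold value” language of Attorre et al., Para. [0020].

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_within_threshold(
    query: Sequence[float],
    stored: Dict[str, List[float]],
    threshold: float = 0.8,
) -> List[str]:
    """Return IDs of stored vectors lying inside the threshold region around the query.

    Assumption: "inside the region" means similarity strictly greater than the
    threshold. Results are ordered best match first.
    """
    scores = {target_id: cosine_similarity(query, vec) for target_id, vec in stored.items()}
    hits = [target_id for target_id, score in scores.items() if score > threshold]
    return sorted(hits, key=lambda target_id: scores[target_id], reverse=True)
```

For example, with stored vectors `{"logo_a": [1.0, 0.0], "logo_b": [0.7, 0.7]}` and the query `[1.0, 0.1]`, a threshold of 0.8 returns only `["logo_a"]`, while a threshold of 0.7 also admits `"logo_b"`.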
determining, from a set of matching AR targets associated with the set of matching DL-feature representations, a matching AR target based on a comparison of the received one or more local-feature descriptors with stored local-feature descriptors associated with the set of matching AR targets, wherein the stored local-feature descriptors are extracted from the set of matching AR targets (Attorre et al., Para. [0036], “The features extracted from each of sub-images 442, 444, and 446 may be compared with reference features stored in database 450 by a comparator 485 to determine if there is a match between any reference features stored in databased 450 and features extracted from sub-image 442, 444, or 446” teaches determining matching features by comparing features extracted from sub-images (corresponds to local-feature descriptors) with reference features stored in a database).
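For illustration only, the second-stage selection recited above (choosing one AR target from the coarse set of matching AR targets by comparing local-feature descriptors) can be sketched as follows. This is a hypothetical example, not the comparator 485 of Attorre et al.: it assumes binary descriptors stored as Python integers and compared by Hamming distance, and the names `select_target`, `max_distance`, and `min_matches` and their default values are illustrative.

```python
from typing import Dict, List, Optional


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two binary local-feature descriptors."""
    return bin(a ^ b).count("1")


def count_good_matches(query: List[int], reference: List[int], max_distance: int) -> int:
    """Count query descriptors whose closest reference descriptor is within max_distance bits."""
    if not reference:
        return 0
    return sum(1 for q in query if min(hamming(q, r) for r in reference) <= max_distance)


def select_target(
    query: List[int],
    candidates: Dict[str, List[int]],
    max_distance: int = 2,
    min_matches: int = 2,
) -> Optional[str]:
    """Pick, among the candidate AR targets from the coarse stage, the one whose
    stored local-feature descriptors best agree with the query descriptors.

    Returns None when no candidate reaches min_matches good matches.
    """
    best_id: Optional[str] = None
    best_count = 0
    for target_id, reference in candidates.items():
        count = count_good_matches(query, reference, max_distance)
        if count > best_count:  # ties keep the earlier candidate
            best_id, best_count = target_id, count
    return best_id if best_count >= min_matches else None
```

The coarse DL-feature stage is assumed to have already narrowed the candidates, so the exhaustive descriptor comparison here runs only over a small set of AR targets.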
Attorre et al. does not appear to explicitly teach selecting an AR effect associated with the determined matching AR target; sending, to the client computing device, the AR effect associated with the determined matching AR target, wherein the AR effect is rendered by the client computing device so that the AR effect is anchored to the real-world object.
However, Loxam et al. teaches selecting an AR effect associated with the determined matching AR target (Loxam et al., FIG. 3A and Para. [0042], “the augmentation engine 375 can start transmitting to the mobile computing device 300 the potential large augmented reality content files such as video files, and advertisements while the object recognition engine 320 determines what the object is. Thus, at approximately at the same time as the object recognition engine 320 is hierarchically filtering or narrowing down the possible known matching images/object to the transmitted features, the augmentation engine 375 can be preparing and selecting augmented reality content to be transmitted back to the video processing module on the mobile computing device 300 for display. Note, similarly, the augmentation engine 316 can be preparing and selecting augmented reality content to be overlaid onto the video frames while the trigger item identification is performing its operations” teaches an augmentation engine that prepares and selects augmented reality content to be overlaid (corresponds to the AR effect) onto the video frame based on the determined matching images or object (corresponds to the AR target)).
sending, to the client computing device, the AR effect associated with the determined matching AR target, wherein the AR effect is rendered by the client computing device so that the AR effect is anchored to the real-world object (Loxam et al., FIG. 3A and Para. [0042], “the augmentation engine 375 can start transmitting to the mobile computing device 300 the potential large augmented reality content files such as video files, and advertisements while the object recognition engine 320 determines what the object is. Thus, at approximately at the same time as the object recognition engine 320 is hierarchically filtering or narrowing down the possible known matching images/object to the transmitted features, the augmentation engine 375 can be preparing and selecting augmented reality content to be transmitted back to the video processing module on the mobile computing device 300 for display. Note, similarly, the augmentation engine 316 can be preparing and selecting augmented reality content to be overlaid onto the video frames while the trigger item identification is performing its operations” teaches an augmentation engine that prepares and selects augmented reality content to be overlaid (corresponds to the AR effect) onto the video frame based on the determined matching images or object (corresponds to the AR target). FIGS. 5-6 and Para. [0071], “The process described in FIGS. 5 and 6 may generally require the detect trigger item engine 370 to perform a one-to-many approach by positively matching the feature points of the one real world trigger item with the indexed feature points of the many known candidate trigger items” teaches the real-world trigger item (corresponds to the real-world object). Para. [0072], “When it is ready, the user can point the phone at the picture, and it will come to life. If the user was not told what picture to point the smart phone at, then the user can point the camera around the location and the augmented reality application will automatically detect the trigger items in view” teaches the augmented reality application automatically detecting the trigger items (corresponds to the real-world object) in view, which shows the trigger item is being tracked in real-time).
Attorre et al. and Loxam et al. are analogous art because they are from the same field of endeavor and the same problem-solving area. Namely, they pertain to the field of “image recognition.” Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Attorre et al. with Loxam et al., with the motivation of selecting an AR effect associated with the determined matching AR target and sending, to the client computing device, the AR effect associated with the determined matching AR target, wherein the AR effect is rendered by the client computing device so that the AR effect is anchored to the real-world object. “The systems and methods allow mobile computing devices to identify real world trigger items and to cause augmented reality scenarios associated with a real world trigger item to be presented on a display of the mobile computing device” (Loxam et al., Abstract). The proposed modification is beneficial in that it helps identify real-world trigger items and cause augmented reality scenarios associated with a real-world trigger item to be presented.
Regarding Claim 3,
The Attorre et al. in view of Loxam et al. combination of claim 1 teaches wherein the one or more local-feature descriptors are extracted at the client computing device by:
The combination, as described in the rejection of claim 1, further teaches extracting, from a portion of the first image associated with the region of interest, one or more local-feature descriptors associated with one or more detected points of interest, wherein each local-feature descriptor is generated based on information associated with a spatially bounded patch within the first image, the spatially bounded patch comprising a respective detected point of interest (Attorre et al., Para. [0034] and FIG. 4, “logo detection model 410 may detect generic logo patches or regions (i.e., regions that are likely to embody a logo). For example, in the example shown in FIG. 4, logo detection model 410 may receive an image 420, which may include images of one or more logos, and identify candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434” teaches determining patches/regions, sub-images, and identifications (corresponds to local-feature descriptors; the coordinates from the identifications correspond to the one or more detected points of interest) from the logo detection model when an image is received).
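For illustration only, a local-feature descriptor computed from a spatially bounded patch around a detected point of interest, as recited in claim 3, might look like the following minimal Python sketch. The square patch size, the mean-subtraction and L2 normalization, and the name `patch_descriptor` are assumptions chosen for clarity; neither Attorre et al. nor the claim specifies this particular descriptor.

```python
import math
from typing import List, Sequence, Tuple

Image = Sequence[Sequence[float]]  # grayscale image as rows of pixel intensities


def patch_descriptor(image: Image, point: Tuple[int, int], radius: int = 1) -> List[float]:
    """Descriptor from the (2*radius+1) x (2*radius+1) patch centered on a point of interest.

    Pixel values are mean-subtracted and L2-normalized, so a uniform brightness
    offset or a positive contrast scaling of the patch leaves the descriptor unchanged.
    """
    row, col = point
    height, width = len(image), len(image[0])
    if not (radius <= row < height - radius and radius <= col < width - radius):
        raise ValueError("patch would extend past the image border")
    # Collect the spatially bounded patch in row-major order.
    values = [image[r][c]
              for r in range(row - radius, row + radius + 1)
              for c in range(col - radius, col + radius + 1)]
    mean = sum(values) / len(values)
    centered = [v - mean for v in values]
    norm = math.sqrt(sum(v * v for v in centered))
    # A flat patch has zero norm; return the all-zero centered vector in that case.
    return [v / norm for v in centered] if norm > 0 else centered
```

Each returned list encodes the visual content of one patch, so one descriptor is produced per detected point of interest.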
Regarding Claim 4,
The Attorre et al. in view of Loxam et al. combination of claim 1 teaches the method of Claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein detecting the region of interest comprises (Attorre et al., Para. [0005], “detecting, in the source image and using a first logo detection model implemented by the one or more computing devices, a candidate region for determining a logo in the source image” teaches detecting the candidate region (corresponds to region of interest)).
calculating, for one or more portions of the first image, a confidence score based on a third machine learning model (Attorre et al., Para. [0046], “the one or more processing devices may implement one or more neural networks for one or more machine learning-based models” teaches an embodiment comprising multiple machine learning models. Para. [0005], “extracting, from the candidate region and by a neural network implemented using the one or more computing devices, a feature vector of the candidate region, and determining, for each reference feature vector from a set of reference feature vectors stored in a database, a respective matching score” teaches determining a matching score (corresponds to confidence score) based on the implemented neural network).
selecting the region of interest, one or more of the portions of the first image having a confidence score greater than a threshold confidence score (Attorre et al., Para. [0020], “The target logo associated with the best matching reference feature vector is determined as present in the candidate region in the source image if the best matching score is greater than a threshold value” teaches selecting a candidate region (corresponds to region of interest) in the source image (corresponds to the first image) if the best matching score (corresponds to confidence score) is greater than a threshold value).
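For illustration only, the selection step of claim 4 (keeping the portions of the first image whose confidence score is greater than a threshold confidence score) reduces to a filter over scored regions. In this hypothetical Python sketch, the `ScoredRegion` type, the (x, y, width, height) box convention, and the default threshold of 0.5 are assumptions; the scores are taken as given from an unspecified third model.

```python
from typing import List, NamedTuple, Tuple


class ScoredRegion(NamedTuple):
    box: Tuple[int, int, int, int]  # (x, y, width, height) in image pixels; assumed convention
    score: float  # confidence assumed to come from a third machine learning model


def select_regions_of_interest(
    regions: List[ScoredRegion], threshold: float = 0.5
) -> List[ScoredRegion]:
    """Keep portions whose confidence score is strictly greater than the threshold,
    ordered from highest to lowest confidence."""
    kept = [region for region in regions if region.score > threshold]
    return sorted(kept, key=lambda region: region.score, reverse=True)
```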
Regarding Claim 6,
The Attorre et al. in view of Loxam et al. combination of claim 1 teaches the method of Claim 1, wherein the stored DL-feature representations are determined by a process comprising:
The combination, as described in the rejection of claim 1, further teaches passing a plurality of second images comprising second depictions of the real-world object, wherein each of the plurality of second images comprises a variation of the first depiction of the real-world object (Attorre et al., Para. [0034], “Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420” teaches sub-images (corresponds to second images) that are cropped variations of the first image depiction, each sub-image depicting a brand logo (corresponds to the real-world object). Para. [0036], “Each of sub-images 442, 444, and 446 extracted from image 420 may be passed to a feature extractor 480” teaches passing a plurality of sub-images to the feature extractor).
extracting, from each of the plurality of second images, one or more DL-feature representations (Attorre et al., Para. [0036], “Each of sub-images 442, 444, and 446 extracted from image 420 may be passed to a feature extractor 480 to extract features from each of sub-images 442, 444, and 446 using a feature extractor” teaches utilizing a feature extractor to extract feature vectors from the sub-images).
Regarding Claim 7,
The Attorre et al. in view of Loxam et al. combination of claim 6 teaches the method of Claim 6, further comprising:
The combination, as described in the rejection of claim 6, further teaches representing the DL-feature representations extracted from the plurality of second images as vector representations (Attorre et al., Para. [0036], “The features extracted from each of sub-images 442, 444, and 446 may be compared with reference features stored in database 450 by a comparator 485… Comparator 485 may compare features (e.g., represented by feature vectors) to determine matching scores between features” teaches the features extracted from the sub-images (corresponds to second images) being represented by feature vectors).
based on the respective vector representations, associating the DL-feature representations with respective AR targets (Attorre et al., Para. [0005], “extracting, from the candidate region and by a neural network implemented using the one or more computing devices, a feature vector of the candidate region, and determining, for each reference feature vector from a set of reference feature vectors stored in a database, a respective matching score between the reference feature vector and the feature vector of the candidate region, where each reference feature vector in the set of reference feature vectors is extracted from a respective image of a target logo in a set of target logos” teaches associating each reference feature vector with the respective image of a target logo (corresponds to AR target)).
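For illustration only, the process of claims 6 and 7 (extracting a vector from each of several variation images of an object and associating each vector with its AR target) can be sketched as building a tagged reference index. In this hypothetical Python sketch, `build_reference_index` is an illustrative name and `embed` is a caller-supplied stand-in for the second machine learning model; no particular network or library is implied.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]


def build_reference_index(
    variations: Dict[str, Sequence[object]],
    embed: Callable[[object], Vector],
) -> List[Tuple[Vector, str]]:
    """Embed every variation image of every AR target and tag each resulting
    vector with the AR target it was extracted from.

    `embed` is a hypothetical placeholder for the second machine learning model.
    """
    index: List[Tuple[Vector, str]] = []
    for target_id, images in variations.items():
        for image in images:
            index.append((embed(image), target_id))
    return index
```

A toy `embed` such as `lambda img: [float(len(img))]` is enough to exercise the function; a real system would substitute the output of a trained network.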
Regarding Claim 10,
The Attorre et al. in view of Loxam et al. combination of claim 1 teaches the method of Claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein the comparison of the received one or more DL-feature representations with the plurality of stored DL-feature representations comprises a nearest-neighbor search (Attorre et al., Para. [0024], “To detect logos in a new source image that likely includes an image of the new target logo, candidate regions that likely embody a logo are determined by the agnostic logo detection model, a feature vector is extracted from each candidate region and compared with each reference feature vector in the database (including the reference feature vectors extracted from the images of the new target logo) to find a match. As such, new target logos can be detected by existing models or networks without retraining such models or networks using images of the new target logos” teaches a comparison process to identify potential matches (corresponds to a nearest-neighbor search) between feature vectors).
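For illustration only, comparing an extracted feature vector with each stored reference vector, as described in Attorre et al., Para. [0024], can be carried out as an exhaustive nearest-neighbor search. The following Python sketch assumes Euclidean distance, an index of (vector, target ID) pairs, and the illustrative name `nearest_neighbors`; none of these details comes from the cited reference or the claims.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(
    query: Sequence[float],
    index: List[Tuple[List[float], str]],
    k: int = 1,
) -> List[Tuple[float, str]]:
    """Exhaustive k-nearest-neighbor search over every stored vector.

    Returns the k (distance, target_id) pairs closest to the query, nearest first.
    """
    candidates = ((euclidean(query, vec), target_id) for vec, target_id in index)
    return heapq.nsmallest(k, candidates)
```

An exhaustive scan compares the query with every stored vector, which matches the “compared with each reference feature vector” language; large databases commonly replace it with an approximate nearest-neighbor index.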
Regarding Claim 13,
The Attorre et al. in view of Loxam et al. combination of claim 1 teaches the method of Claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein the real-world object is continuously tracked in real-time (Loxam et al., Para. [0072], “When it is ready, the user can point the phone at the picture, and it will come to life. If the user was not told what picture to point the smart phone at, then the user can point the camera around the location and the augmented reality application will automatically detect the trigger items in view” teaches the augmented reality application automatically detecting the trigger items (corresponds to the real-world object) in view, which shows the trigger item is being tracked in real-time).
Regarding Claim 15,
The Attorre et al. in view of Loxam et al. combination of claim 1 teaches the method of Claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein the AR effect is a filter effect (Loxam et al., Para. [0095], “The object recognition engine distributed across the IDOL server set applies a hierarchical set of filters to the transmitted identified points of interest and their associated major within each frame of a video stream to determine what that one or more potential trigger item are within that frame. Since this is a video feed of a series of closely related frames both in time and in approximate location, the pattern of identified major features of potential trigger item within each frame of a video stream helps to narrow down the matching known object stored in the object database” teaches applying a set of filters).  
Regarding Claim 16,
The Attorre et al. in view of Loxam et al. combination of claim 1 teaches the method of Claim 1, further comprising:
The combination, as described in the rejection of claim 1, further teaches authorizing a user of the client device to receive the AR effect associated with the determined matching AR target based on information associated with the user (Loxam et al., Para. [0036], “The augmentation engine 316 is also configured to allow a user to create augmented reality content from stock locations including any combination of 1) off of the local memory of the smart mobile computing device 300, 2) from Internet sources, 3) from an augment information database 360 maintained at the backend server, 4) from a links database 350, or 5) similar source. The augmentation engine 316 then also allows the user to associate that augmented reality content with at least one trigger item from the trigger item engine 314/330” teaches allowing the user to create augmented reality content (corresponds to AR effect) and associate the content with the trigger item (corresponds to AR target). Para. [0044], “The augmentation engine 375 may select the augmented reality information that is most relevant to the user” teaches that the authorization is based on information that is most relevant (corresponds to associated) to the user).
Regarding Claim 18,
Attorre et al. teaches one or more computer-readable non-transitory storage media embodying software that is operable when executed to (Attorre et al., Para. [0004], “Various inventive embodiments are described herein, including methods, systems, non-transitory computer-readable storage media storing programs, code, or instructions executable by one or more processors” teaches an embodiment comprising a non-transitory computer-readable storage medium that stores programs, code, or instructions executable by one or more processors):
receive, from a client computing device, one or more deep-learning (DL)-feature representations generated by a first machine learning model, wherein the one or more DL-feature representations are extracted from a region of interest detected by the client computing device within a first image of a real-world environment captured by the client computing device, the region of interest comprising a first depiction of a real-world object, and the one or more DL-feature representations are extracted at the client computing device by (Attorre et al., Para. [0005], “a method includes receiving a source image at one or more computing devices” teaches receiving from a computing device. Para. [0029] and FIG. 2, “A CNN in logo detection model 220 may then extract feature vectors from each candidate region and classify the candidate region based on the feature vectors” teaches a CNN in the logo detection model (corresponds to the machine learning model) that extracts feature vectors (corresponds to deep-learning feature representations) from each candidate region (corresponds to the region of interest detected). Para. [0034] and FIG. 4, “logo detection model 410 may detect generic logo patches or regions (i.e., regions that are likely to embody a logo). For example, in the example shown in FIG. 4, logo detection model 410 may receive an image 420, which may include images of one or more logos, and identify candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434” teaches determining identification of candidate regions (corresponds to the region of interest) of an image (corresponds to the first image of a real-world environment captured) from the logo detection model when an image is received from the computing device).
accessing the first image (Attorre et al., Para. [0007], “the first reference feature vector extracted from a first image of a first target logo in the set of target logos” teaches accessing the first image).
generating, by the first machine learning model, an initial feature map associated with the first image (Attorre et al., Para. [0029] and FIG. 2, “the input image may be fed to the CNN to generate a convolutional feature map” teaches utilizing a convolutional neural network (corresponds to the first machine learning model) to generate a feature map associated with the input image (corresponds to the first image)).
identifying the region of interest within the initial feature map (Attorre et al., Para. [0029] and FIG. 2, “candidate regions may be identified from the convolutional feature map” teaches identifying a candidate region (corresponds to region of interest) within the convolutional feature map).
wherein the region of interest is associated with at least a first real-world-object type, and wherein the region of interest is associated with a portion of the first image corresponding to the first depiction of the real-world object (Attorre et al., Para. [0029], “using selective search techniques and may be reshaped to a predetermined size using, for example, a region-of-interest (ROI) pooling layer” teaches utilizing a selective search technique to select the region of interest of the image that depicts the real-world object). 
extracting, from the region of interest, the one or more DL-feature representations, wherein each extracted DL-feature representation is an output of a second machine learning model that is trained to detect at least objects of the first real-world-object type (Attorre et al., Para. [0053], “To detect logos in a new source image that may include an image of the new target logo, candidate regions in the new source image that may embody a logo may be determined by the second model, and a feature vector may be extracted from each candidate region and compared with each reference feature vector in the embedding database (including the reference feature vectors extracted from the images of the new target logo) to find a match. As such, new target logos may be detected by existing models or networks without retraining such models or networks” teaches a second model trained to determine candidate regions and detect logos (corresponds to objects of the first real-world-object type) in a source image. Feature vectors are then extracted from the candidate regions).
generate, based on the patch, one or more local feature descriptors, wherein each of the one or more local feature descriptors corresponds to the patch within the region of interest within the first image and comprises information that encodes one or more visual features present in the patch (Attorre et al., Para. [0034] and FIG. 4, “logo detection model 410 may detect generic logo patches or regions (i.e., regions that are likely to embody a logo). For example, in the example shown in FIG. 4, logo detection model 410 may receive an image 420, which may include images of one or more logos, and identify candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434” teaches the logo detection model generating sub-images or identifications of candidate regions (corresponds to the one or more local feature descriptors that correspond to the patch within the region of interest) based on the detected generic logo patches or regions of the received image (corresponds to the first image)).
identify a set of matching DL-feature representations based on a comparison of the received one or more DL-feature representations with a plurality of stored DL-feature representations associated with a plurality of augmented-reality (AR) targets, the comparison resulting in a determination that the set of matching DL-feature representations and the received one or more DL-feature representations are within a threshold region in a vector space (Attorre et al., Para. [0003], “Logo detection or recognition in images and videos can be used in many applications, such as copyright or trademark infringement detection, contextual advertise placement, intelligent traffic control based on vehicle logos, automated computation of brand-related statistics, augmented reality, and the like” teaches the technique being used in many applications such as augmented reality. Para. [0053], “To detect logos in a new source image that may include an image of the new target logo, candidate regions in the new source image that may embody a logo may be determined by the second model, and a feature vector may be extracted from each candidate region and compared with each reference feature vector in the embedding database (including the reference feature vectors extracted from the images of the new target logo) to find a match. As such, new target logos may be detected by existing models or networks without retraining such models or networks” teaches comparing feature vectors (corresponds to DL-feature representations) with the reference feature vectors stored in the database that are associated with a target logo (corresponds to AR target) to identify potential matches. Para. [0020], “The target logo associated with the best matching reference feature vector is determined as present in the candidate region in the source image if the best matching score is greater than a threshold value” teaches determining a threshold value (corresponds to the threshold region in a vector space) for a set of matching reference feature vectors (corresponds to deep-learning feature representations)).
determine, from a set of matching AR targets associated with the set of matching DL-feature representations, a matching AR target based on a comparison of the received one or more local-feature descriptors with stored local-feature descriptors associated with the set of matching AR targets, wherein the stored local-feature descriptors are extracted from the set of matching AR targets (Attorre et al., Para. [0036], “The features extracted from each of sub-images 442, 444, and 446 may be compared with reference features stored in database 450 by a comparator 485 to determine if there is a match between any reference features stored in databased 450 and features extracted from sub-image 442, 444, or 446” teaches determining matching features by comparing features extracted from sub-images (corresponds to local-feature descriptors) with reference features stored in a database).
Attorre et al. does not appear to explicitly teach select an AR effect associated with the determined matching AR target; send, to the client computing device, the AR effect associated with the determined matching AR target, wherein the AR effect is rendered by the client computing device so that the AR effect is anchored to the real-world object.
However, Loxam et al. teaches select an AR effect associated with the determined matching AR target (Loxam et al., FIG. 3A and Para. [0042], “the augmentation engine 375 can start transmitting to the mobile computing device 300 the potential large augmented reality content files such as video files, and advertisements while the object recognition engine 320 determines what the object is. Thus, at approximately at the same time as the object recognition engine 320 is hierarchically filtering or narrowing down the possible known matching images/object to the transmitted features, the augmentation engine 375 can be preparing and selecting augmented reality content to be transmitted back to the video processing module on the mobile computing device 300 for display. Note, similarly, the augmentation engine 316 can be preparing and selecting augmented reality content to be overlaid onto the video frames while the trigger item identification is performing its operations” teaches an augmentation engine that prepares and selects augmented reality content to be overlaid (corresponds to the AR effect) onto the video frame based on the determined matching images or object (corresponds to the AR target)).
send, to the client computing device, the AR effect associated with the determined matching AR target, wherein the AR effect is rendered by the client computing device so that the AR effect is anchored to the real-world object (Loxam et al., FIG. 3A and Para. [0042], “the augmentation engine 375 can start transmitting to the mobile computing device 300 the potential large augmented reality content files such as video files, and advertisements while the object recognition engine 320 determines what the object is. Thus, at approximately at the same time as the object recognition engine 320 is hierarchically filtering or narrowing down the possible known matching images/object to the transmitted features, the augmentation engine 375 can be preparing and selecting augmented reality content to be transmitted back to the video processing module on the mobile computing device 300 for display. Note, similarly, the augmentation engine 316 can be preparing and selecting augmented reality content to be overlaid onto the video frames while the trigger item identification is performing its operations” teaches an augmentation engine that prepares and selects augmented reality content to be overlaid (corresponds to the AR effect) onto the video frame based on the determined matching images or object (corresponds to the AR target). FIGS. 5-6 and Para. [0071], “The process described in FIGS. 5 and 6 may generally require the detect trigger item engine 370 to perform a one-to-many approach by positively matching the feature points of the one real world trigger item with the indexed feature points of the many known candidate trigger items” teaches the real world trigger item (corresponds to the real-world object). Para. [0072], “When it is ready, the user can point the phone at the picture, and it will come to life. If the user was not told what picture to point the smart phone at, then the user can point the camera around the location and the augmented reality application will automatically detect the trigger items in view” teaches the augmented reality application automatically detecting the trigger items (corresponds to the real-world object) in view, which shows the trigger item is being tracked in real time).
Regarding Claim 20,
Attorre et al. teaches a system comprising: one or more processors; and one or more computer-readable non-transitory storage media coupled to one or more of the processors and comprising instructions operable when executed by one or more of the processors to cause the system to (Attorre et al., Para. [0058], “The depicted example of a computing system 800 includes a processor 802” teaches the system comprising a processor. Para. [0004], “Various inventive embodiments are described herein, including methods, systems, non-transitory computer-readable storage media storing programs, code, or instructions executable by one or more processors, and the like” teaches the embodiment comprising a non-transitory computer-readable storage media that stores programs, code, or instructions executable by one or more processors).
receive, from a client computing device, one or more deep-learning (DL)-feature representations generated by a first machine learning model, wherein the one or more DL-feature representations are extracted from a region of interest detected by the client computing device within a first image of a real-world environment captured by the client computing device, the region of interest comprising a first depiction of a real-world object, and the one or more DL-feature representations are extracted at the client computing device by (Attorre et al., Para. [0005], “a method includes receiving a source image at one or more computing devices” teaches receiving from a computing device. Para. [0029] and FIG. 2, “A CNN in logo detection model 220 may then extract feature vectors from each candidate region and classify the candidate region based on the feature vectors” teaches a CNN in the logo detection model (corresponds to the first machine learning model) that extracts feature vectors (corresponds to deep-learning feature representations) from each candidate region (corresponds to the region of interest detected). Para. [0034] and FIG. 4, “logo detection model 410 may detect generic logo patches or regions (i.e., regions that are likely to embody a logo). For example, in the example shown in FIG. 4, logo detection model 410 may receive an image 420, which may include images of one or more logos, and identify candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434” teaches determining identification of candidate regions (corresponds to the region of interest) of an image (corresponds to the first image of a real-world environment captured) from the logo detection model when an image is received from the computing device).
accessing the first image (Attorre et al., Para. [0007], “the first reference feature vector extracted from a first image of a first target logo in the set of target logos” teaches accessing the first image). 
generating, by the first machine learning model, an initial feature map associated with the first image (Attorre et al., Para. [0029] and FIG. 2, “the input image may be fed to the CNN to generate a convolutional feature map” teaches utilizing a convolutional neural network (corresponds to the first machine learning model) to generate a feature map associated with the input image (corresponds to the first image)).
identifying the region of interest within the initial feature map (Attorre et al., Para. [0029] and FIG. 2, “candidate regions may be identified from the convolutional feature map” teaches identifying a candidate region (corresponds to region of interest) within the convolutional feature map).
wherein the region of interest is associated with at least a first real-world-object type, and wherein the region of interest is associated with a portion of the first image corresponding to the first depiction of the real-world object (Attorre et al., Para. [0029], “using selective search techniques and may be reshaped to a predetermined size using, for example, a region-of-interest (ROI) pooling layer” teaches utilizing a selective search technique to select the region of interest of the image that depicts the real-world object).
extracting, from the region of interest, the one or more DL-feature representations, wherein each extracted DL-feature representation is an output of a second machine learning model that is trained to detect at least objects of the first real-world-object type (Attorre et al., Para. [0053], “To detect logos in a new source image that may include an image of the new target logo, candidate regions in the new source image that may embody a logo may be determined by the second model, and a feature vector may be extracted from each candidate region and compared with each reference feature vector in the embedding database (including the reference feature vectors extracted from the images of the new target logo) to find a match. As such, new target logos may be detected by existing models or networks without retraining such models or networks” teaches a second model trained to determine candidate regions and detect logos (corresponds to objects of the first real-world-object type) in a source image. Feature vectors are then extracted from the candidate regions).
generate, based on the patch, one or more local feature descriptors, wherein each of the one or more local feature descriptors corresponds to the patch within the region of interest within the first image and comprises information that encodes one or more visual features present in the patch (Attorre et al., Para. [0034] and FIG. 4, “logo detection model 410 may detect generic logo patches or regions (i.e., regions that are likely to embody a logo). For example, in the example shown in FIG. 4, logo detection model 410 may receive an image 420, which may include images of one or more logos, and identify candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434” teaches the logo detection model generating sub-images or identifications of candidate regions (corresponds to the one or more local feature descriptors corresponding to the patch within the region of interest) based on the detected generic logo patches or regions of the received image (corresponds to the first image)).
identify a set of matching DL-feature representations based on a comparison of the received one or more DL-feature representations with a plurality of stored DL-feature representations associated with a plurality of augmented-reality (AR) targets, the comparison resulting in a determination that the set of matching DL-feature representations and the received one or more DL-feature representations are within a threshold region in a vector space (Attorre et al., Para. [0003], “Logo detection or recognition in images and videos can be used in many applications, such as copyright or trademark infringement detection, contextual advertise placement, intelligent traffic control based on vehicle logos, automated computation of brand-related statistics, augmented reality, and the like” teaches the technique being used in many applications such as augmented reality. Para. [0053], “To detect logos in a new source image that may include an image of the new target logo, candidate regions in the new source image that may embody a logo may be determined by the second model, and a feature vector may be extracted from each candidate region and compared with each reference feature vector in the embedding database (including the reference feature vectors extracted from the images of the new target logo) to find a match. As such, new target logos may be detected by existing models or networks without retraining such models or networks” teaches comparing feature vectors (corresponds to DL-feature representations) with the reference feature vectors stored in the database that are associated with a target logo (corresponds to AR target) to identify potential matches. Para. [0020], “The target logo associated with the best matching reference feature vector is determined as present in the candidate region in the source image if the best matching score is greater than a threshold value” teaches determining a threshold value (corresponds to the threshold region in a vector space) for a set of matching reference feature vectors (corresponds to deep-learning feature representations)).
determine, from a set of matching AR targets associated with the set of matching DL-feature representations, a matching AR target based on a comparison of the received one or more local-feature descriptors with stored local-feature descriptors associated with the set of matching AR targets, wherein the stored local-feature descriptors are extracted from the set of matching AR targets (Attorre et al., Para. [0036], “The features extracted from each of sub-images 442, 444, and 446 may be compared with reference features stored in database 450 by a comparator 485 to determine if there is a match between any reference features stored in databased 450 and features extracted from sub-image 442, 444, or 446” teaches determining matching features by comparing features extracted from sub-images (corresponds to local-feature descriptors) with reference features stored in a database).
Attorre et al. does not appear to explicitly teach select an AR effect associated with the determined matching AR target; send, to the client computing device, the AR effect associated with the determined matching AR target, wherein the AR effect is rendered by the client computing device so that the AR effect is anchored to the real-world object.
However, Loxam et al. teaches select an AR effect associated with the determined matching AR target (Loxam et al., FIG. 3A and Para. [0042], “the augmentation engine 375 can start transmitting to the mobile computing device 300 the potential large augmented reality content files such as video files, and advertisements while the object recognition engine 320 determines what the object is. Thus, at approximately at the same time as the object recognition engine 320 is hierarchically filtering or narrowing down the possible known matching images/object to the transmitted features, the augmentation engine 375 can be preparing and selecting augmented reality content to be transmitted back to the video processing module on the mobile computing device 300 for display. Note, similarly, the augmentation engine 316 can be preparing and selecting augmented reality content to be overlaid onto the video frames while the trigger item identification is performing its operations” teaches an augmentation engine that prepares and selects augmented reality content to be overlaid (corresponds to the AR effect) onto the video frame based on the determined matching images or object (corresponds to the AR target)).
send, to the client computing device, the AR effect associated with the determined matching AR target, wherein the AR effect is rendered by the client computing device so that the AR effect is anchored to the real-world object (Loxam et al., FIG. 3A and Para. [0042], “the augmentation engine 375 can start transmitting to the mobile computing device 300 the potential large augmented reality content files such as video files, and advertisements while the object recognition engine 320 determines what the object is. Thus, at approximately at the same time as the object recognition engine 320 is hierarchically filtering or narrowing down the possible known matching images/object to the transmitted features, the augmentation engine 375 can be preparing and selecting augmented reality content to be transmitted back to the video processing module on the mobile computing device 300 for display. Note, similarly, the augmentation engine 316 can be preparing and selecting augmented reality content to be overlaid onto the video frames while the trigger item identification is performing its operations” teaches an augmentation engine that prepares and selects augmented reality content to be overlaid (corresponds to the AR effect) onto the video frame based on the determined matching images or object (corresponds to the AR target). FIGS. 5-6 and Para. [0071], “The process described in FIGS. 5 and 6 may generally require the detect trigger item engine 370 to perform a one-to-many approach by positively matching the feature points of the one real world trigger item with the indexed feature points of the many known candidate trigger items” teaches the real world trigger item (corresponds to the real-world object). Para. [0072], “When it is ready, the user can point the phone at the picture, and it will come to life. If the user was not told what picture to point the smart phone at, then the user can point the camera around the location and the augmented reality application will automatically detect the trigger items in view” teaches the augmented reality application automatically detecting the trigger items (corresponds to the real-world object) in view, which shows the trigger item is being tracked in real time).
Regarding Claim 21,
The Attorre et al. in view of Loxam et al. combination of claim 18 teaches the media of Claim 18,
The combination, as described in the rejection of claim 18, further teaches wherein the software is further operable when executed to extract the one or more DL-feature representations at the client computing device by (Attorre et al., Para. [0005], “a method includes receiving a source image at one or more computing devices” teaches receiving from a computing device. Para. [0068], “Suitable computing devices include multi-purpose microprocessor-based computer systems accessing stored software that programs or configures the computing system from a general purpose computing apparatus to a specialized computing apparatus implementing one or more embodiments of the present subject matter” teaches the software is further operable when executed. Para. [0029] and FIG. 2, “A CNN in logo detection model 220 may then extract feature vectors from each candidate region and classify the candidate region based on the feature vectors” teaches extracting feature vectors (corresponds to deep-learning feature representations) from each candidate region).
extracting, from a portion of the first image associated with the region of the interest, one or more local-feature descriptors associated with one or more detected points of interest, wherein each local-feature descriptor is generated based on information associated with a spatially bounded patch within the first image, the spatially bounded patch comprising a respective detected point of interest (Attorre et al., Para. [0034] and FIG. 4, “logo detection model 410 may detect generic logo patches or regions (i.e., regions that are likely to embody a logo). For example, in the example shown in FIG. 4, logo detection model 410 may receive an image 420, which may include images of one or more logos, and identify candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434” teaches determining patches/regions, sub-images, and identifications (corresponds to local-feature descriptors) from the identified candidate region (corresponds to the region of interest) when an image is received (corresponds to the first image)).
Regarding Claim 22,
The Attorre et al. in view of Loxam et al. combination of claim 18 teaches the media of Claim 18, wherein detecting the region of interest comprises:
The combination, as described in the rejection of claim 18, further teaches calculating, for one or more portions of the first image, a confidence score based on a third machine learning model (Attorre et al., Para. [0046], “the one or more processing devices may implement one or more neural networks for one or more machine learning-based models” teaches the embodiment comprising multiple machine learning models. Para. [0005], “extracting, from the candidate region and by a neural network implemented using the one or more computing devices, a feature vector of the candidate region, and determining, for each reference feature vector from a set of reference feature vectors stored in a database, a respective matching score” teaches determining a matching score (corresponds to confidence score) based on the implemented neural network).
selecting the region of interest, one or more of the portions of the first image having a confidence score greater than a threshold confidence score (Attorre et al., Para. [0020], “The target logo associated with the best matching reference feature vector is determined as present in the candidate region in the source image if the best matching score is greater than a threshold value” teaches selecting a candidate region (corresponds to region of interest) in the source image (corresponds to the first image) if the best matching score (corresponds to confidence score) is greater than a threshold value).
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Attorre et al. in view of Loxam et al. and in further view of Rao et al. (“A Mobile Outdoor Augmented Reality Method Combining Deep Learning Object Detection and Spatial Relationships for Geovisualization”)
Regarding Claim 5,
The Attorre et al. in view of Loxam et al. combination of claim 1 teaches the method of Claim 1,
The combination, as described in the rejection of claim 1, further teaches wherein the second machine learning model is a convolutional neural network (Attorre et al., Para. [0049], “the one or more processing devices may implement a second model that is trained to detect each candidate region in an input image that is likely to embody a logo. The second model may be, for example, a convolutional neural network, such as a Fast R-CNN or any other variation of a R-CNN network” teaches the second model being a convolutional neural network).
Attorre et al. in view of Loxam et al. does not appear to explicitly teach wherein each extracted DL-feature representation is an output of an average pooling layer of the convolutional neural network.
However, Rao et al. teaches wherein each extracted DL-feature representation is an output of an average pooling layer of the convolutional neural network (Rao et al., Figure 4, “This architecture follows a design similar to that of the original SSD. The main differences are that it takes a 224 × 224 pixel image as input and then uses a truncated SqueezeNet (rather than VGG-16) and a series of additional layers (at lower depths than the original) to extract features from the image. The features it uses for detection are selected from 5 layers: fire9 (the last fire module in the SqueezeNet), Ex1_2, Ex2_2, Ex3_2 (three convolutional layers) and GAP (a global average pooling layer)” teaches the output of the GAP (corresponds to the average pooling layer) of the convolutional neural network being extracted features (corresponds to DL-feature representation) from the image).
Attorre et al. in view of Loxam et al. in view of Rao et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “image detection” and “convolutional neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Attorre et al. and Loxam et al. with Rao et al., with motivation wherein each extracted DL-feature representation is an output of an average pooling layer of the convolutional neural network. “To significantly reduce the computational cost of the proposed lightweight SSD approach, we use a truncated SqueezeNet architecture (with conv10 and the softmax classifier removed) as the base network and append several additional feature layers (at lower depths than the original) with decaying spatial resolution” (Rao et al., Section 3.1). The proposed teaching is beneficial in that it helps significantly reduce the computational cost of the proposed lightweight Single Shot Detector approach.
Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Attorre et al. in view of Loxam et al. and in further view of Sladojevic et al. (“Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification”)
Regarding Claim 8,
The Attorre et al. in view of Loxam et al. combination of claim 6 teaches the method of Claim 6, further comprising:
Attorre et al. in view of Loxam et al. does not appear to explicitly teach wherein one or more of the plurality of second images are synthetically generated using a data augmentation process that automatically varies one or more conditions in the first image to generate one or more second images.
However, Sladojevic et al. teaches wherein one or more of the plurality of second images are synthetically generated using a data augmentation process that automatically varies one or more conditions in the first image to generate one or more second images (Sladojevic et al., Section 3.3 and Figure 2, “Transformations applied in augmentation process are illustrated in Figure 2, where the first row represents resulting images obtained by applying affine transformation on the single image; the second row represents images obtained from perspective transformation against the input image and the last row visualizes the simple rotation of the input image. The process of augmentation was chosen to fit the needs; the leaves in a natural environment could vary in visual perspective” teaches applying an augmentation process that generates images based on different transformations).
Attorre et al. in view of Loxam et al. in view of Sladojevic et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the field of “image recognition” and “convolutional neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Attorre et al. and Loxam et al. with Sladojevic et al., with motivation wherein one or more of the plurality of second images are synthetically generated using a data augmentation process that automatically varies one or more conditions in the first image to generate one or more second images. “The main purpose of applying augmentation is to increase the dataset and introduce slight distortion to the images which helps in reducing overfitting during the training stage” (Sladojevic et al., Section 3.3). The proposed teaching is beneficial in that it helps reduce overfitting during the training stage.
Regarding Claim 9,
The Attorre et al. in view of Loxam et al. in view of Sladojevic et al. combination of claim 8 teaches the method of Claim 8,
The combination, as described in the rejection of claim 8, further teaches wherein the one or more conditions comprise one or more of: perspectives, orientations, sizes, locations, and lighting conditions (Sladojevic et al., Section 3.3, “The image augmentation contained one of several transformation techniques including affine transformation, perspective transformation, and simple image rotations… Affine transformations were applied to express translations and rotations (linear transformations and vector addition, resp.) where all parallel lines in the original image are still parallel in the output image. To find a transformation matrix, three points from the original image were needed as well as their corresponding locations in the output image. For perspective transformation, a transformation matrix was required. Straight lines would remain straight even after the transformation. For the augmentation process, simple image rotations were applied, as well as rotations on the different axis by various degrees” teaches the one or more conditions, one of which comprises perspective transformation).
Claims 11-12 are rejected under 35 U.S.C. 103 as being unpatentable over Attorre et al. in view of Loxam et al. and in further view of Ribo et al. (“Hybrid tracking for outdoor augmented reality application”)
Regarding Claim 11,
The Attorre et al. in view of Loxam et al. combination of claim 3 teaches the method of Claim 3,
Attorre et al. in view of Loxam et al. does not appear to explicitly teach wherein the one or more detected points of interest are corners detected within the first image.
However, Ribo et al. teaches wherein the one or more detected points of interest are corners detected within the first image (Ribo et al., Section 7, “we used a complete georeferenced 3D model of a city section, shown in Figure 10a, to derive the most significant corners” teaches detecting significant corners in the 3D model (corresponds to first image). Figure 10a, “3D model with points of interest, roof lines, and camera positions (22 calibrated reference images)” teaches the detected corners being points of interest).
Attorre et al. in view of Loxam et al. in view of Ribo et al. are analogous art because they are from the same field of endeavor and are from the same problem solving area. Namely, they pertain to the field of “image detection” and “convolutional neural network”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filling date of the claimed invention to modify the invention of Attorre et al. and Loxam et al. with Ribo et al., with motivation wherein the one or more detected points of interest are comers detected within the first image. “Spatial subpixel analysis aims to estimate model parameters by analyzing the gray levels of the involved pixels within a small neighborhood. Our approach extends this work from edges to corners. Because corners are intersections of two or more edges that border different areas, we can use this approach to improve corner localization accuracy” (Ribo et al., Section 3). The proposed teaching is beneficial in that it helps improve corner localization accuracy.
Regarding Claim 12,
The Attorre et al. in view of Loxam et al. combination of claim 3 teaches the method of Claim 3,
Attorre et al. in view of Loxam et al. does not appear to explicitly teach wherein one or more of the detected points of interest are associated with the real-world object within the first image.
However, Ribo et al. teaches wherein one or more of the detected points of interest are associated with the real-world object within the first image (Ribo et al., Figure 10, “Georeferenced 3D model. (a) 3D model with points of interest, roof lines, and camera positions (22 calibrated reference images). (b) Some of the images used to compute the 3D model. The model was provided by the VRVis Research Center for Virtual Reality and Visualization” teaches a 3D model of a city section (corresponds to the real-world object) with detected points of interest).
Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Attorre et al. in view of Loxam et al. and in further view of Bae et al. (“Fast and scalable structure-from-motion based localization for high-precision mobile augmented reality systems”)
Regarding Claim 14,
The Attorre et al. in view of Loxam et al. combination of claim 1 teaches the method of Claim 1,
Attorre et al. in view of Loxam et al. does not appear to explicitly teach wherein the AR effect is configured to scale itself based on a location and orientation of the client computing device.
However, Bae et al. teaches wherein the AR effect is configured to scale itself based on a location and orientation of the client computing device (Bae et al., Para. 15, “Once the 3D physical model is available, a user can take a photo with a mobile device at a random location. HD4AR uses a new image-based localization approach, which takes advantage of a pre-constructed 3D point cloud of target scene to identify a mobile device’s relative location and orientation. The localization process compares the new photo to the generated 3D physical model and estimates the extrinsic camera parameters to find the relative position of the user’s camera. In addition, the HD4AR uses the client-server architecture to further increase the localization speed. The smartphone as the client uploads new photographs to the server for localization and the major image processing load is located on the server. The localization method using a direct 2D-to-3D matching algorithm takes at most few seconds to localize a photograph. After recovering a complete pose of the user’s camera, the server can decide what cyber-information should appear in the user’s photograph and send the cyber object and their associated information to the client. The client app will then draw cyber objects on top of the photograph” teaches determining what cyber information/object (corresponds to scaling the AR effect) will be drawn on a photograph based on processing by HD4AR, which takes the mobile device’s (corresponds to the client computing device) location and orientation into consideration).
Attorre et al. in view of Loxam et al. in view of Bae et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the fields of “image recognition” and “augmented reality”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Attorre et al. and Loxam et al. with Bae et al., with motivation wherein the AR effect is configured to scale itself based on a location and orientation of the client computing device. "The approach supports near real-time localization and information association regardless of size of physical objects, users location, and number of cyber-physical information items" (Bae et al., Conclusion). The proposed teaching is beneficial in that it helps support near real-time localization and information association.
Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Attorre et al. in view of Loxam et al. and in further view of Sanches et al. (“Aspects of User Profiles That Can Improve Mobile Augmented Reality Usage”)
Regarding Claim 17,
The Attorre et al. in view of Loxam et al. combination of claim 16 teaches the method of Claim 16,
Attorre et al. in view of Loxam et al. does not appear to explicitly teach wherein the information associated with the user comprises user affinity information, wherein the user affinity information comprises an affinity coefficient between the user and the AR effect.
However, Sanches et al. teaches wherein the information associated with the user comprises user affinity information, wherein the user affinity information comprises an affinity coefficient between the user and the AR effect (Sanches et al., Conclusion, “The results showed that the age of the user may be related to their performance in the application and this relation occurs due to the greater interest of certain age groups by the use of this type of applications. The affinity factor with games, in this case, may be implicit in the age factor. Factors related to aging were not relevant for performance reduction when the user touches the device screen to interact with an AR application. Young users, in general, performed the task faster than older users. However, the age factor has a higher correlation with the performance of the users when considering only male users” teaches how age, through the user's interest, factors into how users interact with and perform on an augmented reality application. Conclusion, “In applications whose target audience are young males, tasks may require more effort, since these users tend to perform well. On the other hand, if the target audience of the application is older users, the complexity of the task to be performed in the AR environment must be reduced so that interest in the application is maintained” teaches the affinity factor of age (corresponds to the affinity coefficient) helping predict the probability that a user will perform a particular action based on the user's interest in the action, and reducing the AR environment (corresponds to the AR effect) based on this factor).
Attorre et al. in view of Loxam et al. in view of Sanches et al. are analogous art because they are from the same field of endeavor and are from the same problem-solving area. Namely, they pertain to the fields of “image recognition” and “augmented reality”. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Attorre et al. and Loxam et al. with Sanches et al., with motivation wherein the information associated with the user comprises user affinity information, wherein the user affinity information comprises an affinity coefficient between the user and the AR effect. “The results of this research may help developers to use AR technology in their applications” (Sanches et al., Conclusion). The proposed teaching is beneficial in that it helps developers utilize augmented reality technology in their applications.

	Response to Arguments
Applicant's arguments filed 09/16/2022 with respect to the 35 U.S.C. 103 rejection of claims 1, 3-18, and 20-22 have been fully considered but they are not persuasive. Applicant asserts that “As an example, the proposed Attorre-Loxam combination fails to disclose, teach, or suggest generating, based on the patch, one or more local feature descriptors, wherein each of the one or more local feature descriptors corresponds to the patch within the region of interest within the first image and comprises information that encodes one or more visual features present in the patch, as independent Claim 1 recites. As discussed above, Attorre merely discloses "receiv[ing] an image 420, which may include images of one or more logos, and identify[ing] candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434." Id. The Examiner asserts these "identifications (e.g., coordinates)" correspond to local feature descriptors. Office Action at 32. Applicant respectfully disagrees. Attore merely discloses that these "identifications" are generated via the logo detection model (Attore at [0034]), which uses images (sequentially or in parallel) as inputs. Attore at [0028]. Attore further discloses merely that feature vectors may be extracted from candidate regions, and that candidate regions may be used to classify the image. Attore at [0029]. Loxam does not make up for the deficiencies of Attorre, either alone or in combination, and the Examiner does not assert otherwise. 
In contrast, Claim 1 as amended recites generating, based on the patch, one or more local feature descriptors, wherein each of the one or more local feature descriptors corresponds to the patch within the region of interest within the first image and comprises information that encodes one or more visual features present in the patch.” (Remarks, pg. 11-12).
Examiner’s Response:
The Examiner respectfully disagrees. Attorre et al. teaches “generating, based on the patch, one or more local feature descriptors, wherein each of the one or more local feature descriptors corresponds to the patch within the region of interest within the first image and comprises information that encodes one or more visual features present in the patch” (Attorre et al., Para. [0034] and FIG. 4, “logo detection model 410 may detect generic logo patches or regions (i.e., regions that are likely to embody a logo). For example, in the example shown in FIG. 4, logo detection model 410 may receive an image 420, which may include images of one or more logos, and identify candidate regions 430, 432, and 434 that likely embody a logo from image 420. Outputs 440 of logo detection model 410 may thus include sub-images 442, 444, and 446 that may be cropped out from image 420 or may include identifications (e.g., coordinates) of candidate regions 430, 432, and 434” teaches the logo detection model generating sub-images or identifications of candidate regions (corresponds to the one or more local feature descriptors corresponding to the patch within the region of interest) based on the detected generic logo patches or regions of the received image (corresponds to the first image). The Specification, Pg. 2 and Para. 4, “Local- feature descriptors may also be extracted from each of the likely regions of interest in the image. These local-feature descriptors may correspond to points of interest with a likely region of interest, and may be generated based on spatially bounded patches within the likely region of interest” discloses that the local-feature descriptors correspond to points of interest within a likely region of interest and are generated based on the patches within the region of interest. 
In Attorre et al., the output of the logo detection model consists of these generated sub-images or identifications, which are comparable to the points of interest within a likely region of interest; the detected generic logo patches or regions from which they are generated correspond to the spatially bounded patches within the likely region of interest).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Henry T Nguyen whose telephone number is (571)272-8860. The examiner can normally be reached Monday-Friday 8:00am-5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on (571) 272-7796. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/HENRY TRONG NGUYEN/
Examiner, Art Unit 2125
/BRIAN M SMITH/Primary Examiner, Art Unit 2122