Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
This is in response to applicant’s amendment/response filed on 06/27/2022, which has been entered and made of record. Claims 1-2, 6-9, 11-16, and 20 are amended. Claims 1-20 are pending in the application.
		
Response to Arguments
Applicant's arguments regarding the claim rejections under 35 U.S.C. 103 have been considered but are not persuasive.
Applicant argues:

[Images of applicant's arguments (media_image1.png, media_image2.png) are not reproduced here.]


Examiner disagrees. Marino ([0077]) teaches training a neural network. The training process includes receiving a set of digital image content and identifying a bounding box from the coordinates of a polygon, where the coordinates represent points on the polygon bounding box. In this embodiment, Marino does not explicitly teach that the bounding box predicted by the trained neural network includes dimension data about the bounding box. However, Marino ([0076]) teaches that the predicted bounding box corresponds to the host region, and Marino ([0081]) teaches representing the host region using a data structure that includes dimensions (e.g., a height dimension, a width dimension). It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined these two teachings of Marino to train the neural network to predict the bounding box using point coordinates and dimensional data, so that the predicted bounding box information can be used directly for virtual content overlay without further calculating the dimension data. The benefit would be improved system performance.
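
For illustration only, the relationship between the two representations discussed above (polygon corner coordinates versus a point plus dimensions) can be sketched as follows. This is a minimal sketch and is not taken from Marino or from the application; the function name `polygon_to_box` and the choice of the top-left corner as the reference point are assumptions made for the example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_to_box(points: List[Point]) -> Tuple[float, float, float, float]:
    """Return (x, y, width, height) of the axis-aligned box enclosing the points.

    Hypothetical helper: (x, y) is assumed to be the top-left corner.
    """
    if not points:
        raise ValueError("at least one point is required")
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    # Width and height are derived from the extreme coordinates.
    return (x_min, y_min, max(xs) - x_min, max(ys) - y_min)


# Four corners of a rectangular region, given as polygon coordinates.
corners = [(10, 20), (110, 20), (110, 70), (10, 70)]
box = polygon_to_box(corners)
```

The derivation is a single pass over the coordinates, which is the extra calculation that a point-plus-dimensions output would make unnecessary downstream.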

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Marino et al. (US 2017/0278289 A1).
Regarding claim 1, Marino teaches:
A computer implemented method for detecting space suitable for overlaying media content onto an image, ([0064], “the content integration system can retrieve an advertisement, retrieve one or more frames from a video, identify a region within one or more of those frames for integrating the advertisement, and integrate the advertisement onto the identified region.”) the method comprising: 
receiving by a media content insertion system a candidate image for a media content overlay; ([0063], “The content integration system is configured to retrieve a source digital content, retrieve a target digital content, identify a region within the target digital content for integrating the source digital content, and integrate the source digital content onto the identified region of the target digital content.”)
inputting the candidate image into a neural network, wherein the neural network has been trained with training data comprising a plurality of images and, for each image of the plurality of images, one or more corresponding bounding boxes; ([0076], “In some embodiments, the host region identification module is configured to identify host regions through the use of a machine learning system. The host region identification module can include, for example, one or more machine learning-based classifiers, such as a convolutional neural network, support vector machine, or random forest classifier, that are configured to determine whether a texture of a region in a target digital content is sufficiently bland and/or uniform so that the region could be classified as a host region.” [0077], “In some embodiments, the machine learning system can be trained using a training set of samples of digital content reflecting textures which are deemed as suitable for hosting a source digital content. Such samples of digital content can include samples of digital content reflecting brick walls, painted walls, and/or sky textures. This dataset can be collected by manually collecting samples of digital content that feature these textures and then manually demarcating the location of the texture in the content (either by cropping the digital content to those regions or capturing the location—e.g., to yield the best result, with the coordinates of a polygon bounding the location—as a feature).”)
the neural network trained to receive as input a candidate image and predict one or more coordinates representing a point associated with a shape and one or more dimensions of the shape, wherein the shape represents a bounding box. ([0077] teaches training a neural network. The training process includes receiving a set of digital image content and identifying a bounding box from the coordinates of a polygon. The coordinates represent points on the polygon bounding box. The bounding box is used as the host region. “In some embodiments, the machine learning system can be trained using a training set of samples of digital content reflecting textures which are deemed as suitable for hosting a source digital content. Such samples of digital content can include samples of digital content reflecting brick walls, painted walls, and/or sky textures. This dataset can be collected by manually collecting samples of digital content that feature these textures and then manually demarcating the location of the texture in the content (either by cropping the digital content to those regions or capturing the location—e.g., to yield the best result, with the coordinates of a polygon bounding the location—as a feature).” However, in this embodiment, Marino does not explicitly teach that the bounding box predicted by the trained neural network includes dimension data about the bounding box. On the other hand, Marino [0076] teaches that the predicted bounding box corresponds to the host region, and [0081] teaches representing the host region using a data structure that includes dimensions (e.g., a height dimension, a width dimension).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined these two teachings of Marino to train the neural network to predict the bounding box using a point coordinate and the dimensional data, so that the predicted bounding box information can be used directly for virtual content overlay. The benefit would be improved system performance.)
receiving, from the neural network, coordinates and one or more dimensions representing one or more bounding boxes for inserting media content into the candidate image; ([0077], “In some embodiments, the machine learning system can be trained using a training set of samples of digital content reflecting textures which are deemed as suitable for hosting a source digital content. Such samples of digital content can include samples of digital content reflecting brick walls, painted walls, and/or sky textures. This dataset can be collected by manually collecting samples of digital content that feature these textures and then manually demarcating the location of the texture in the content (either by cropping the digital content to those regions or capturing the location—e.g., to yield the best result, with the coordinates of a polygon bounding the location—as a feature).”)
transmitting a request for a media content item to be displayed in a bounding box of the one or more bounding boxes, the request comprising the one or more dimensions of the one or more bounding boxes; ([0140], “directs the client-side web browser or other client-side digital content viewing application to send, to the source digital content selection module 118, a request for selection of source digital content that includes the host region defining data, transformation objects, host region object, and/or data about the particular impression or about the particular viewer of the target digital content, including but not limited to time and location of impression or individual viewer ID, demographics, or prior content-viewing habits (“impression data”), and to receive the resulting selection.” [0412], “In some embodiments, at any point during or after host region identification, a procedure, function, process, application, computer, or device that is sitting in the network and is dedicated to the selection of source digital content to be placed upon the host region (source digital content selection module 118) receives the host region defining data, transformation objects, host region object, the target digital content, metadata about the target digital content, and/or impression data regarding one or more requested or anticipated views of the target digital content.” [0274] teaches that the host region defining data is coordinate data of bounding boxes: “In some embodiments, a convolutional neural network model is trained on frames from examples of target digital content whose labels are positive and/or negative examples of host region defining data (e.g., the coordinates the corners of the bounding box of a host region in that particular frame, or a list of the pixels it includes)”)
receiving the media content item in response to the request; ([0140], “directs the client-side web browser or other client-side digital content viewing application to send, to the source digital content selection module 118, a request for selection of source digital content that includes the host region defining data, transformation objects, host region object, and/or data about the particular impression or about the particular viewer of the target digital content, including but not limited to time and location of impression or individual viewer ID, demographics, or prior content-viewing habits (“impression data”), and to receive the resulting selection.”) and
causing a display of the candidate image and the media content item overlaid on top of the candidate image within the bounding box. ([0412], “In some embodiments, at any point during or after host region identification, a procedure, function, process, application, computer, or device that is sitting in the network and is dedicated to the selection of source digital content to be placed upon the host region (source digital content selection module 118) receives the host region defining data, transformation objects, host region object, the target digital content, metadata about the target digital content, and/or impression data regarding one or more requested or anticipated views of the target digital content.” [0064], “the content integration system can retrieve an advertisement, retrieve one or more frames from a video, identify a region within one or more of those frames for integrating the advertisement, and integrate the advertisement onto the identified region.”)
Applicant may argue that the above citations of Marino are from different embodiments. However, it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the different embodiments to produce a method for detecting space suitable for overlaying media content onto an image, in order to achieve an aesthetically pleasing, unobtrusive, and engaging image integration effect.

Regarding claim 2, Marino teaches:
The method of claim 1, further comprising:
receiving the plurality of images; ([0138], “dataset includes a set of RGB images”)
receiving for each of the plurality of images one or more vectors, each vector comprising a set of coordinates and a set of dimensions, wherein each vector of the one or more vectors represents a particular bounding box; ([0233], “Step 1402 illustrates starting at the first row of the frame, going in the vertical direction and for each element (e.g., pixel), counting the number of elements in the horizontal direction that satisfy the threshold and inserting this number into a histogram for the row. Step 1404 illustrates finding, once the histogram for the row is complete, the minimum value in the row, storing it for future use and then subdividing the row into any strings of non-zero histogram bars (sub-histograms), putting each histogram into an array representing the heights of the bars. Step 1406 illustrates finding the largest rectangle within each sub histogram by: (1) Initializing an empty integer stack and a variable for the maximum rectangle (“max_rectangle”) with area of 0; (2) Creating a pointer (“i”) to the first position of the bar height array and initializing it at 0; (3) Pushing the first array index to the stack and increment i (i=1); (4) While i<length of the histogram: (a) If the current height (histogram[i]) is bigger than the height of the bar that the index on top of the stack points to, pushing the current index to the stack and incrementing i; (b) If the current height (histogram[i]) is not bigger than the height of the bar that the index on top of the stack points to, popping the top item of the stack and: (i) If the stack is empty, calculating the area of the current rectangle by multiplying the height of the index you just popped and the index itself (the index is the width) and, if the area is bigger than the max_rectangle, replacing max_rectangle with that; (ii) If the stack is not empty, the width of the current rectangle can be equal to the current index popped—top index in the stack and the height will still be the height of the current index that was just popped and, if the area is bigger than the max_rectangle, replacing max_rectangle with that; (c) If, by the end of the for loop, the stack is not empty, keep popping the elements and, for each element, perform step 4-b; (d) At the end, your max_rectangle will contain the height, the width and the current index of the max rectangle in that sub histogram. To transform that into coordinates: (i) The x coordinate can be starting index (split point) of the sub histogram+current index (i) of the max_rectangle; (ii) The y coordinate can be the current row of the sub-histogram we are using; (iii) The w coordinate can be the width of the max_rectangle; (iv) The h coordinate can be the height of the max_rectangle.”)
training the neural network using each of the plurality of images and corresponding one or more vectors. ([0153], “… Combining the output of the previous layer with the output of the coarse grained network by combining the channels of both outputs, resulting in a feature vector of dimensions with 160 channels. [0154] d) Inputting the combined feature vector into the second (hidden) layer of the neural net, a 5×5 convolutional layer with a ReLU activation function and a learning rate of 0.01. The number of channels in the output is 64. [0155] e) Inputting the previous input into the third (hidden) layer of the neural net, a 5×5 convolutional layer with a ReLU activation function and a learning rate of 0.01. The number of channels in the output is 64.”)
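
For illustration only, the stack-based search for the largest rectangle in a histogram described in the [0233] passage quoted above can be sketched as follows. This is the common textbook formulation of the technique, not a verbatim transcription of Marino's steps; the names `largest_rectangle` and `to_box`, and the use of the rectangle's starting bar index as its position, are assumptions made for the example.

```python
from typing import List, Tuple

# (area, start_index, width, height) of a rectangle under the histogram bars
Rect = Tuple[int, int, int, int]


def largest_rectangle(heights: List[int]) -> Rect:
    """Find the largest rectangle that fits under the bars in one pass."""
    best: Rect = (0, 0, 0, 0)
    stack: List[int] = []  # indices of bars with strictly increasing heights
    # A trailing zero-height bar forces every remaining index to be popped.
    for i, h in enumerate(heights + [0]):
        while stack and heights[stack[-1]] >= h:
            top = stack.pop()
            height = heights[top]
            # The rectangle spans from just after the new stack top up to i.
            start = stack[-1] + 1 if stack else 0
            width = i - start
            if height * width > best[0]:
                best = (height * width, start, width, height)
        stack.append(i)
    return best


def to_box(row: int, split_point: int, rect: Rect) -> Tuple[int, int, int, int]:
    """Map a rectangle to (x, y, w, h), assuming x = split point + start index."""
    _, start, width, height = rect
    return (split_point + start, row, width, height)


rect = largest_rectangle([2, 1, 5, 6, 2, 3])
box = to_box(row=7, split_point=40, rect=rect)
```

Each bar index is pushed and popped at most once, so the search is linear in the number of bars in the sub-histogram.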

Regarding claim 3, Marino teaches:
The method of claim 1, wherein receiving the coordinates and the one or more dimensions representing the one or more bounding boxes for inserting the media content into the candidate image comprises receiving for each bounding box: 
a first coordinate representing a first offset along a horizontal axis of the candidate image and a second coordinate representing a second offset along a vertical axis of the candidate image, a first dimension extending from the first coordinate along the horizontal axis, a second dimension extending from the second coordinate along the vertical axis, ([0253], “In step 1506 the host region identification module 110 is configured to convolve the frame from the scene with a Prewitt, Sobel, combined Prewitt and Sobel, or other kernel in the horizontal (G.sub.x) and vertical directions (G.sub.y) and depositing the resulting frames onto frame buffers or memory areas.” Also [0233], FIG. 14) and a probability that a corresponding bounding box is located on the candidate image in an area suitable for inserting the media content into the candidate image. ([0254], “In step 1514, in the resulting map, finding the rectangles that represent likely host regions by setting a threshold (for a satisfactory host region score) somewhere between 0 and 1, and then relying on the algorithm in FIG. 14.” FIG. 14)

Regarding claim 4, Marino teaches:
The method of claim 3, further comprising: determining for each bounding box whether a corresponding probability meets a threshold probability; (FIG. 14, [0233], “Step 1412 illustrates sorting the list of rectangles by size. Step 1414 illustrates removing any rectangles that are below any minimum dimension thresholds for host regions.”) in response to determining that a probability for a particular bounding box does not meet the threshold probability, removing the particular bounding box from the request. (FIG. 14, [0233], “Step 1412 illustrates sorting the list of rectangles by size. Step 1414 illustrates removing any rectangles that are below any minimum dimension thresholds for host regions. Step 1416 illustrates eliminating overlapping rectangles by checking, for each rectangle in the list, if any of the corner coordinates lies between any of the corner coordinates of another rectangle and, if so, removing the smaller of the two rectangles from consideration.”)
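
For illustration only, removing bounding boxes whose probability does not meet a threshold, as recited in claim 4, can be sketched as follows. This is a minimal sketch and is not taken from Marino or from the application; the `Box` type, the `filter_boxes` function, and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box record: a point, two dimensions, a probability."""
    x: float
    y: float
    width: float
    height: float
    probability: float


def filter_boxes(boxes: List[Box], min_probability: float) -> List[Box]:
    """Keep only the boxes whose probability meets the threshold."""
    return [b for b in boxes if b.probability >= min_probability]


candidates = [
    Box(0, 0, 100, 50, probability=0.9),
    Box(20, 60, 40, 40, probability=0.3),
]
# Only boxes that pass the check would be included in the request.
kept = filter_boxes(candidates, min_probability=0.5)
```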

Regarding claim 5, Marino teaches:
The method of claim 1, further comprising: in response to the request, receiving a plurality of media content items corresponding to the one or more bounding boxes; ([0412], “In some embodiments, at any point during or after host region identification, a procedure, function, process, application, computer, or device that is sitting in the network and is dedicated to the selection of source digital content to be placed upon the host region (source digital content selection module 118) receives the host region defining data, transformation objects, host region object, the target digital content, metadata about the target digital content, and/or impression data regarding one or more requested or anticipated views of the target digital content.”) identifying from the plurality of media content items, a particular media content item corresponding to a bounding box with the highest probability; and selecting the particular media content item as the media content item. ([0424], “In some embodiments, the source digital content selection module 118 compares the target digital content's metadata with the source digital content's metadata with the existence of identical, similar, or compatible metadata used to accrue a score that is used to select or rank the source digital content that represents the best pairing with the host region.”)

Regarding claim 6, Marino teaches:
The method of claim 1, further comprising:
determining that the candidate image is a video frame associated with a video content item; ([0098], “In some embodiments, the host region identification module 110 can track the host region across the duration of the source digital content (e.g., video frames across a video stream), create one or more transformation objects associated with the host region,”)
retrieving a set of video frames of the video content item, wherein the set of video frames comprises video frames that are played subsequently to the candidate image; ([0117], “FIGS. 3A-3U illustrate exemplary source digital content, target digital content, and integrated digital content integrated using the content integration system in accordance with some embodiments. FIG. 3A illustrates a first frame of target digital content including a video. FIG. 3B illustrates a second frame of the target digital content, this one occurring sometime after the first frame illustrated in FIG. 3A in the sequence of frames comprising the video. FIG. 3C illustrates two host regions, demarcated by rectilinear bounding boxes, as identified in the first frame. FIG. 3D illustrates two host regions, defined and demarcated by rectilinear bounding boxes, as identified in the second frame.”)
inputting each video frame of the set of video frames into the neural network; ([0077], “In some embodiments, the machine learning system can be trained using a training set of samples of digital content reflecting textures which are deemed as suitable for hosting a source digital content. Such samples of digital content can include samples of digital content reflecting brick walls, painted walls, and/or sky textures. This dataset can be collected by manually collecting samples of digital content that feature these textures and then manually demarcating the location of the texture in the content (either by cropping the digital content to those regions or capturing the location—e.g., to yield the best result, with the coordinates of a polygon bounding the location—as a feature).”)
receiving, from the neural network for each video frame in the set of video frames, corresponding coordinates and corresponding one or more dimensions representing one or more bounding boxes; ([0117], “FIGS. 3A-3U illustrate exemplary source digital content, target digital content, and integrated digital content integrated using the content integration system in accordance with some embodiments. FIG. 3A illustrates a first frame of target digital content including a video. FIG. 3B illustrates a second frame of the target digital content, this one occurring sometime after the first frame illustrated in FIG. 3A in the sequence of frames comprising the video. FIG. 3C illustrates two host regions, demarcated by rectilinear bounding boxes, as identified in the first frame. FIG. 3D illustrates two host regions, defined and demarcated by rectilinear bounding boxes, as identified in the second frame.” [0077] teaches using the neural network to output the host region bounding box.)
identifying in each video frame of the set of video frames a bounding box matching a bounding box in each other video frame within the set of video frames; ([0117], FIGS. 3C and 3D: “FIG. 3C illustrates two host regions, demarcated by rectilinear bounding boxes, as identified in the first frame. FIG. 3D illustrates two host regions, defined and demarcated by rectilinear bounding boxes, as identified in the second frame.”) and
including the bounding box matching the bounding box of each other video frame in the request. ([0117], FIGS. 3C and 3D; [0140], “directs the client-side web browser or other client-side digital content viewing application to send, to the source digital content selection module 118, a request for selection of source digital content that includes the host region defining data, transformation objects, host region object, and/or data about the particular impression or about the particular viewer of the target digital content, including but not limited to time and location of impression or individual viewer ID, demographics, or prior content-viewing habits (“impression data”), and to receive the resulting selection.”)
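
For illustration only, one conventional way to decide whether a bounding box in one video frame matches a bounding box in another frame is intersection over union (IoU). Neither the claim nor the cited passages of Marino specify this measure; IoU, the function names `iou` and `boxes_match`, and the 0.5 threshold are assumptions made for the example.

```python
from typing import Tuple

# (x, y, width, height), with (x, y) assumed to be the top-left corner
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes, in [0, 1]."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Overlap extent along each axis; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def boxes_match(a: Box, b: Box, threshold: float = 0.5) -> bool:
    """Treat two boxes from different frames as the same region if IoU is high."""
    return iou(a, b) >= threshold


frame1_box = (100.0, 50.0, 200.0, 100.0)
frame2_box = (105.0, 52.0, 200.0, 100.0)  # slightly shifted in the next frame
same_region = boxes_match(frame1_box, frame2_box)
```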

Regarding claim 7, Marino teaches:
The method of claim 6, further comprising causing a display of the set of video frames and the media content item overlaid on top of each of a plurality of subsequent video frames within the bounding box. (FIGS. 3H-3M)

Regarding claim 8, Marino teaches:
A system for detecting space suitable for overlaying media content onto an image, the system comprising: memory with instructions encoded thereon; and one or more processors that, when executing the instructions, are caused to perform operations ([0005], “Some embodiments of the disclosed subject matter include a non-transitory computer readable medium having executable instructions. The executable instructions are operable to cause a processor to receive source digital content, receive target digital content and host region defining data associated with the target digital content, wherein the host region defining data specifies a location of a host region within the target digital content for integrating source digital content into the target digital content, and integrate the source digital content into the host region within the target digital content identified by the host region defining data.”) comprising: The remainder of claim 8 recites limitations similar to those of claim 1 and is therefore rejected using the same rationale.

Claims 9-14 recite limitations similar to those of claims 2-7, respectively, and are therefore rejected using the same rationale.

Regarding claim 15, Marino teaches:
A non-transitory computer readable medium storing instructions, the instructions when executed by one or more processors cause the one or more processors ([0005], “Some embodiments of the disclosed subject matter include a non-transitory computer readable medium having executable instructions. The executable instructions are operable to cause a processor to receive source digital content, receive target digital content and host region defining data associated with the target digital content, wherein the host region defining data specifies a location of a host region within the target digital content for integrating source digital content into the target digital content, and integrate the source digital content into the host region within the target digital content identified by the host region defining data.”) to: The remainder of claim 15 recites limitations similar to those of claim 1 and is therefore rejected using the same rationale.

Claims 16-20 recite limitations similar to those of claims 2-6, respectively, and are therefore rejected using the same rationale.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YANNA WU whose telephone number is (571)270-0725. The examiner can normally be reached Monday-Thursday 8:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YANNA WU/Primary Examiner, Art Unit 2611