DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claims 1-12 are pending.


Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in Russia on 8/22/2015. It is noted, however, that applicant has not filed a certified copy of the RU 2018130482 application as required by 37 CFR 1.55.


Claim Rejections - 35 USC § 103
The following is a quotation of pre-AIA 35 U.S.C. 103(a) which forms the basis for all obviousness rejections set forth in this Office action:
(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth in section 102 of this title, if the differences between the subject matter sought to be patented and the prior art are such that the subject matter as a whole would have been obvious at the time the invention was made to a person having ordinary skill in the art to which said subject matter pertains.  Patentability shall not be negatived by the manner in which the invention was made.

Examiner’s notes: the corresponding text descriptions of any figure(s) and table(s) cited from the prior art are incorporated herein for further details associated with the examiner’s review comments on the corresponding claims below.

Claim(s) 1-4 and 11-12 is/are rejected under 35 U.S.C. 103 as being unpatentable over Levanon (US2018/0284091) in view of Chen et al (Less Is More-Picking Informative Frames, 2018) and further in view of Nomura et al (US2020/0184265).

Regarding claims 1 and 11-12, Levanon teaches a method, comprising:
receiving, by a processor, a continuous video stream from at least one camera position over a table configured to receive prepared pizzas;
collecting, by the processor, a plurality of pizza containing video frames of a particular pizza from the video stream;
(Levanon, Fig. 1, food product 20 (“prepare the food product, for example, the type of food (e.g., pizza, sushi, hamburger, etc.)”, [0033]) sits on top of thermometer 107, which may be on a table; imager 105 takes one or more images (video) of the pizza (“The holder may be designed to hold user device 10 such that the imager of user device 10 may be directed towards the surface of food product 20, thus allow the imager to capture images of food product 20”, [0032]); e.g., Chen, Figure 1, video)
	Levanon does not expressly disclose but Chen teaches:
applying, by the processor, a first CNN to select a set of best pizza containing video frames of the particular pizza from the plurality of pizza containing video frames;
(Chen, Figures 1 and 3; use a CNN to select video frames of a target object (Figure 1(b), informative frames) from a series of redundant video frames (Figure 1(a)); “CNN and RNN”, sec. 2.1, p2:c2 (=> “first CNN”); the target object may be the pizza 20 of Levanon; the PickNet with an Encoder-Decoder architecture is pretrained in a supervised fashion for frame selection based on truth previous information; in the supervised training stage, “…the schedule sampling [ref 4: Bengio et al, “Scheduled Sampling for Sequence Prediction with Recurrent Neural Networks”] procedure is used, which feeds back the model’s own predictions and slowly increases the feedback probability during training”, sec. 3.3, p5)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Chen into the system or method of Levanon in order to use a CNN to select relevant video frames of a target object, for process automation and better consistency. The combination of Levanon and Chen also teaches other enhanced capabilities.
The combination of Levanon and Chen further teaches:
applying, by the processor, a first CNN to identify a best pizza containing image of the particular pizza from the set of best pizza containing video frames;
(Chen, Figure 4; e.g., PickNet picks one image out of the last and current images, sec. 3.2.1, p4)
applying, by the processor, the first CNN to localize at least one pizza portion of the particular pizza in the identified best pizza containing image;
(Chen, Figure 1; the selected frames localize a particular object (hands, or a pizza of Levanon, Fig. 1); “Yao et al. [39] took into account both the local and global temporal structure of videos to produce descriptions, and their model learned to automatically select the most relevant temporal segments given the text-generating RNN”, sec. 2.1, p2; local segments represent objects with a tight boundary, such as the tight boundary of a pizza image)
applying, by the processor, the first CNN to determine a type of the pizza of the particular pizza from the identified best pizza containing image;
(Chen, “perform informative frame picking in video captioning”, “reinforcement-learning”, [abstract]; video captioning is a process of video object classification or object type determination; Levanon, “a pizza coming out of the oven may be inspected automatically”, [0020], “a deep convolutional neural network algorithm (e.g., that may run on user device 10) may be used to identify food product 20 (e.g., a pizza) in one or more captured images”, [0036]; “identifying in the extracted prepared product data at least one of: the type of the food product, one or more ingredients that are visible on a surface of the food product and distribution of at least one ingredient on the surface of the food product”, [0037])
applying, by the processor, a second CNN to determine a map of pizza components of the particular pizza by automatically performing pizza image segmentation of the pizza portion based on at least the type of the pizza; and
(Levanon, “analyzing the received image may include identifying in the extracted prepared product data at least one of: the type of the food product, one or more ingredients that are visible on a surface of the food product and distribution of at least one ingredient on the surface of the food product”, [0037])
The combination of Levanon and Chen does not expressly disclose but Nomura teaches:
applying, by the processor, the second CNN to automatically score the particular pizza based on the determined map of pizza components, comprising:
	dividing, by the processor, the pizza portion of the identified best image into a plurality of slices;
	grading, by the processor, one of the plurality of slices of the particular pizza;
	repeating, by the processor, the grading step to grade the remaining slices of the plurality of slices; and
	determining, by the processor, a final score of the particular pizza based on the grading of the plurality of slices.
(Nomura, Figs. 5-6; partitioning an image into multiple partitions (these partitions may be slices of a pizza of Levanon), [0078-0079]; using “a neural network model” ([0068]) (e.g., a deep CNN of Levanon) to determine an index value of abnormality on each partition (pizza slice); using the pizza of Levanon as an example of the image object under inspection, it can be expected that when n slices of pizza have their respective index values of abnormality greater than a predetermined threshold, the pizza quality can be scored as a pizza with n abnormal slices; similarly, when none of the slices shows an index value of abnormality higher than the threshold, the pizza is considered to be of good quality, [0076-0082])
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Nomura into the modified system or method of Levanon and Chen in order to automatically determine a pizza quality using an image partitioning method for inspecting individual pizza slices with a quality score. The total number of bad slices that have a poor quality score indicates the overall pizza quality. The combination of Levanon, Chen and Nomura also teaches other enhanced capabilities.
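For illustration only, the slice-by-slice reasoning applied above (an abnormality index per partition, compared against a predetermined threshold, then aggregated) can be sketched in Python. This is a minimal sketch under stated assumptions, not the method of Nomura or of the claims: the function names, the example index values, the threshold of 0.5, and the choice to express the final score as the fraction of non-abnormal slices are all hypothetical.

```python
from typing import Sequence


def count_abnormal_slices(index_values: Sequence[float], threshold: float) -> int:
    """Count slices whose abnormality index is strictly greater than the threshold."""
    return sum(1 for value in index_values if value > threshold)


def final_score(index_values: Sequence[float], threshold: float) -> float:
    """Hypothetical aggregate: fraction of slices that are NOT abnormal."""
    if not index_values:
        raise ValueError("at least one slice is required")
    abnormal = count_abnormal_slices(index_values, threshold)
    return 1.0 - abnormal / len(index_values)


# Hypothetical abnormality index values for 8 slices (not taken from any reference).
slice_indices = [0.1, 0.2, 0.9, 0.05, 0.3, 0.7, 0.15, 0.1]
abnormal = count_abnormal_slices(slice_indices, threshold=0.5)
score = final_score(slice_indices, threshold=0.5)
```

With these example values, two slices exceed the threshold, so the pizza would be described as having two abnormal slices.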

Regarding claim 2, the combination of Levanon, Chen and Nomura teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the video frames of the video stream are categorized into cases comprising:
i) a first case for images that have no pizza present;
(Chen, Figure 1(b), the hand object may be a pizza; the 3rd picked image from left doesn’t show a hand (or pizza))
ii) a second case for images in which a pizza is present and off-centered;
(Chen, Figure 1(b); the hand (or pizza) in the 1st picked image from left is off center)  
iii) a third case for images in which the pizza is present and centered, and a pizza image has a resolution quality of X;
(Chen, Figure 5, right; the Rubik’s cube (or pizza) with a pattern of color grids on the top in the picked image at location (row=1, column=2) is at the center location)
iv) a fourth case for images in which the pizza is present and centered, and the pizza image has the resolution quality of Y, where Y is better than X;
(Chen, Figure 5, right; the Rubik’s cube (or pizza) in the picked image with a pattern of color grids on the top at location (1, 1) is at the center location; the color grids on the top have better resolution in the (1, 1) image than in the (1, 2) image)
v) a fifth case for images in which the pizza is present, centered, and a first type, and the pizza image has a desired resolution quality; and
(Chen, Figure 5, right; the Rubik’s cube (or pizza) in the picked image with a pattern of color grids on the top at location (1, 1) is at the center location and may be considered as having an acceptable resolution)
v) (vi) a sixth case for images in which the pizza is present, centered, and a second type, and the pizza image has a desired resolution quality.
(Chen, Figure 5, right; the Rubik’s cube (or pizza) in the picked image with another pattern of color grids on the top at location (5, 5) is at the center location and may be considered as having an acceptable resolution; Note: this limitation should be labeled as “vi)” instead of the repeated “v)”)

Regarding claim 3, the combination of Levanon, Chen and Nomura teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein selecting the set of best pizza containing video frames from the pizza containing video frames further comprises discounting each pizza containing video frame that has at least one of a motion blur or defocus blur.
(Chen, Figure 4; the reinforcement-learning-based PickNet generally provides lower rewards to fuzzy or blurred images, as expected)
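For illustration only, discounting blurred frames can be sketched with a simple sharpness measure. The sketch below uses the variance of a 4-neighbor Laplacian, a commonly used focus measure, as a stand-in; it is not the reward used by Chen's PickNet, and the function names, the toy 4x4 frames, and the minimum-sharpness value are hypothetical. Frames are assumed to be grayscale grids of at least 3x3 pixels.

```python
from typing import List

Image = List[List[float]]


def laplacian_variance(img: Image) -> float:
    """Variance of the 4-neighbor Laplacian over interior pixels (higher means sharper)."""
    height, width = len(img), len(img[0])
    values = [
        img[r - 1][c] + img[r + 1][c] + img[r][c - 1] + img[r][c + 1] - 4 * img[r][c]
        for r in range(1, height - 1)
        for c in range(1, width - 1)
    ]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def keep_sharp_frames(frames: List[Image], min_sharpness: float) -> List[Image]:
    """Discard frames whose sharpness falls below the (hypothetical) minimum."""
    return [frame for frame in frames if laplacian_variance(frame) >= min_sharpness]


# Toy frames: a featureless frame (as if fully blurred) and a high-contrast checkerboard.
flat = [[0.5] * 4 for _ in range(4)]
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
kept = keep_sharp_frames([flat, checker], min_sharpness=1.0)
```

In this toy case only the checkerboard frame survives, because the featureless frame has zero Laplacian variance.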

Regarding claim 4, the combination of Levanon, Chen and Nomura teaches its/their respective base claim(s).
The combination further teaches the method of claim 1, wherein the number of the plurality of slices is 8.
(Levanon, Fig. 1; it is a common commercial practice that a large pizza is typically cut into 8 slices (see, e.g., https://www.dominos.com/en/about-pizza/how-many-slices-are-in-a-large-pizza/))
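For illustration only, dividing a round pizza image into 8 equal slices can be sketched by assigning each pixel to an angular sector around the pizza center. This is a minimal sketch under stated assumptions: the function name, the coordinate convention, and the assumption that the center is already known are hypothetical and are not drawn from any cited reference.

```python
import math


def slice_index(x: float, y: float, cx: float, cy: float, num_slices: int = 8) -> int:
    """Return the index (0 .. num_slices - 1) of the angular sector containing (x, y)."""
    # Angle of the point around the center, normalized to [0, 2*pi).
    angle = math.atan2(y - cy, x - cx) % (2 * math.pi)
    sector_width = 2 * math.pi / num_slices
    # Clamp guards against floating-point rounding at the very end of the range.
    return min(int(angle / sector_width), num_slices - 1)


# Points around a center at the origin (hypothetical coordinates).
first = slice_index(1.0, 0.1, 0.0, 0.0)
fourth = slice_index(-1.0, 0.5, 0.0, 0.0)
seventh = slice_index(0.5, -1.0, 0.0, 0.0)
```

Each sector index could then be used to collect the pixels of one slice before grading it.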

Claim(s) 5-9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Levanon (US2018/0284091) in view of Chen et al (Less Is More-Picking Informative Frames, 2018) and further in view of Nomura et al (US2020/0184265) and Lu et al (US2019/0223725).

Regarding claim 5, the combination of Levanon, Chen and Nomura teaches its/their respective base claim(s).
The combination does not expressly disclose but Lu teaches the method of claim 1, wherein the second CNN has a contraction path and an expansion path.
(Lu, Fig. 2, “A U-net architecture is shown. The network architecture includes an encoder 21 and a decoder 23. The encoder 21 and decoder 23 are formed from various units 22, 24, 25, 27, 28. The architecture is a fully convolutional network (FCN), such that input samples of any size may be used”, [0039]; the encoder 21 path is a contraction path because the output samples of each set of a convolutional layer 22 and a max pooling unit 24 are down-sampled; the decoder 23 path is an expansion path because the output samples of each of the transposed-convolutional layers 25 (deconvolutional layers) are up-sampled; the encoder-decoder of Chen ([abstract]) may be implemented using a U-net FCN encoder-decoder architecture to preserve the image spatial information)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Lu into the modified system or method of Levanon, Chen and Nomura in order to use a fully convolutional U-net architecture for an encoder-decoder neural network to preserve the image spatial information. The combination of Levanon, Chen, Nomura and Lu also teaches other enhanced capabilities.
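For illustration only, the down-sampling and up-sampling behavior that distinguishes a contraction path from an expansion path can be sketched on a single-channel feature map. This is a minimal sketch under stated assumptions: 2x2 max pooling stands in for the pooling units, and nearest-neighbor up-sampling stands in for the transposed-convolutional layers, which it only approximates in spatial effect. The function names and the 4x4 example values are hypothetical.

```python
from typing import List

FeatureMap = List[List[float]]


def max_pool_2x2(feature: FeatureMap) -> FeatureMap:
    """Contraction step: halve height and width by taking the max of each 2x2 block."""
    return [
        [
            max(feature[r][c], feature[r][c + 1], feature[r + 1][c], feature[r + 1][c + 1])
            for c in range(0, len(feature[0]) - 1, 2)
        ]
        for r in range(0, len(feature) - 1, 2)
    ]


def upsample_2x(feature: FeatureMap) -> FeatureMap:
    """Expansion step: double height and width by repeating each value (nearest neighbor)."""
    out: FeatureMap = []
    for row in feature:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


feature = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
pooled = max_pool_2x2(feature)   # 4x4 -> 2x2
restored = upsample_2x(pooled)   # 2x2 -> 4x4
```

The restored map has the original spatial size but has lost fine detail, which is the information that skip connections in a U-net are intended to carry across.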

Regarding claim 6, the combination of Levanon, Chen, Nomura and Lu teaches its/their respective base claim(s).
The combination further teaches the method of claim 5, wherein the contraction path comprises a plurality of convolution and activation layers.
(Lu, Fig. 2, multiple convolutional units 22 in encoder 21; “For the convolution or encoder segment 21, convolutional units 22 (e.g., ReLU) and max pooling units 24 are used…Each convolutional or transposed-convolutional unit 22, 25, and 27 contains a batch normalization layer and a ReLU activation followed by a 3x3x3 or other size convolutional layer”, [0040])

Regarding claim 7, the combination of Levanon, Chen, Nomura and Lu teaches its/their respective base claim(s).
The combination further teaches the method of claim 6, wherein the contraction path further comprises a subsampling and batch normalization layer after a first convolution and activation layer.
(Lu, see comments on claim 6; Fig. 2, in encoder 21, convolutional layer 22 (1) => batch normalization layer + ReLU + max pooling layer => convolutional layer 22 (2) => batch normalization layer + ReLU + max pooling layer => convolutional layer 22 (3) =>…, [0040]; max pooling layer down-samples the image data)

Regarding claim 8, the combination of Levanon, Chen, Nomura and Lu teaches its/their respective base claim(s).
The combination further teaches the method of claim 6, wherein the contraction path further comprises a rectified linear unit (ReLU) layer and a pooling layer following each convolution and activation layer before proceeding to a subsequent convolution and activation layer.
(Lu, see comments on claim 6; Fig. 2, in encoder 21, convolutional layer 22 (1) => batch normalization layer + ReLU + max pooling layer => convolutional layer 22 (2) => batch normalization layer + ReLU + max pooling layer => convolutional layer 22 (3) =>…, [0040]; max pooling layer down-samples the image data)

Regarding claim 9, the combination of Levanon, Chen, Nomura and Lu teaches its/their respective base claim(s).
The combination further teaches the method of claim 5, wherein the expansive path comprises a sequence of up-convolutions and concatenations configured to combine feature spatial information with a predetermined resolution features from the contracting path.
(Lu, see comments on claim 5; Fig. 3, in decoder 23, transposed-convolutional layers 25 (deconvolutional layers) up-sample the image data; “The arrows 26 show this concatenation as skip connections. The skip connections skip one or more units. These skip connections at the same levels of abstraction are free of other units or include other units. Other skip connections from one level of abstraction to a different level of abstraction may be used”, [0042]; skip connections at different levels that use concatenation mix (combine) features at different resolutions, since different levels of layers generally represent different resolutions)
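For illustration only, a concatenating skip connection can be sketched as a channel-wise join of encoder features with decoder features of the same spatial size. This is a minimal sketch under stated assumptions: feature maps are represented as lists of 2-D channels, and the function names and example values are hypothetical rather than taken from Lu.

```python
from typing import List, Tuple

Channel = List[List[float]]


def spatial_shape(channels: List[Channel]) -> Tuple[int, int]:
    """Return (height, width) of the first channel."""
    return (len(channels[0]), len(channels[0][0]))


def concat_skip(encoder_feats: List[Channel], decoder_feats: List[Channel]) -> List[Channel]:
    """Join encoder and decoder features along the channel axis."""
    if spatial_shape(encoder_feats) != spatial_shape(decoder_feats):
        raise ValueError("skip concatenation requires matching spatial sizes")
    return list(encoder_feats) + list(decoder_feats)


# Two encoder channels and one up-sampled decoder channel, all 2x2 (hypothetical values).
enc = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
dec = [[[0.5, 0.5], [0.5, 0.5]]]
merged = concat_skip(enc, dec)
```

The merged result has three channels at the same spatial size, so later layers see both the detailed encoder features and the up-sampled decoder features.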

Claim(s) 10 is/are rejected under 35 U.S.C. 103 as being unpatentable over Levanon (US2018/0284091) in view of Chen et al (Less Is More-Picking Informative Frames, 2018) and further in view of Nomura et al (US2020/0184265) and Huang et al (US2019/0311070).

Regarding claim 10, the combination of Levanon, Chen and Nomura teaches its/their respective base claim(s).
The combination does not expressly disclose but Huang teaches the method of claim 1, wherein the applying the first CNN to localize the at least one pizza portion of the particular pizza in the identified best pizza containing image further comprises:
defining a bounding box; and
utilizing one or more pre-determined binary masks.
(Huang, Figs. 2A-2C and 3A, “The visual intent process 212 accesses the visual classification/detection API 214 to determine a general classification of the objects in the image and to automatically draw bounding boxes around the objects (block 304)”, [0025]; “The object detection portion of the API 215, responsive to the same image, may output bounding boxes around the major objects in the image and confidence values for the bounding boxes. The detection portion of the API 214 may return the image shown in FIG. 2B with a bounding box 252 around the image 208 of the cheese pizza, a bounding box 254 around the image 206 of the pepperoni pizza and a bounding box 256 around the image 210 of the sandwich”, [0027]; the bounding box technique may be applied to the video frame picking operations of Chen so that frame picking is further based on the major objects in the video frames)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teachings of Huang into the modified system or method of Levanon, Chen and Nomura in order to identify major objects in an image using bounding boxes, so that subsequent image processing can focus on those objects. The combination of Levanon, Chen, Nomura and Huang also teaches other enhanced capabilities.
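For illustration only, the relationship between a binary mask and a bounding box can be sketched as follows: the box is the tightest rectangle enclosing the nonzero mask pixels, and the mask can be applied to keep only the object pixels. This is a minimal sketch under stated assumptions and is not Huang's API; the function names, the (top, left, bottom, right) convention, and the example mask and image are hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]


def bounding_box(mask: Grid) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left, bottom, right), inclusive, of the nonzero mask region."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    if not rows:
        return None  # empty mask: nothing to localize
    return (rows[0], cols[0], rows[-1], cols[-1])


def apply_mask(image: Grid, mask: Grid) -> Grid:
    """Zero out every pixel that lies outside the binary mask."""
    return [
        [pixel if keep else 0 for pixel, keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


mask = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
image = [[9, 9, 9, 9], [9, 5, 6, 9], [9, 7, 8, 9], [9, 9, 9, 9]]
box = bounding_box(mask)
masked = apply_mask(image, mask)
```

Here the box covers rows 1 to 2 and columns 1 to 2, while the mask additionally excludes the one background pixel inside that box.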


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIANXUN (JAMES) YANG whose telephone number is (571)272-9874. The examiner can normally be reached on MON-FRI: 8AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Nay Maung, can be reached on (571)272-7882. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/JIANXUN YANG/Primary Examiner, Art Unit 2664
9/22/2022