DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Applicant's amendments filed on 27 June 2022 have been entered.  Claims 12 and 19 have been amended.  No claims have been canceled.  No claims have been added.  Claims 1-25 (1-11 are withdrawn) are still pending in this application, with claims 12 and 19 being independent.

Response to Arguments
Applicant’s arguments with respect to claims 12-25 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 19-25 are rejected under 35 U.S.C. 103 as being unpatentable over Buibas et al. (US Pub. 2021/0124994), hereinafter Buibas, in view of Gao et al. (US Pub. 2020/0151692), hereinafter Gao.
Regarding claim 19, Buibas discloses a method of generating models (Paragraph [0021]: an image processor that calculates a 3D model of the item from the images captured by the cameras. The processor may calculate the item's shape, size, or volume from the 3D model), comprising: placing an item with a first kind of position on a rotating platform (Fig. 1; Paragraph [0039]: item may be placed successively into the image capture system 110, which controls the imaging environment and manages the image capturing process. In the example of FIG. 1, an operator places item 102 into the system 110. In one or more embodiments, movement of items successively into image capture system 110 may be automated or semi-automated; for example, items may be placed onto a conveyor belt or a rotating platform that moves items into and out of the system 110, or a robotic system may successively transport items into and out of the system); taking a first set of images of the item with the first kind of position on the rotating platform, wherein multiple lighting levels and angles of the items are used to stimulate real store lighting conditions (Fig. 1; Fig. 4; Fig. 5; Paragraph [0039]: FIG. 1 shows an illustrative embodiment of the invention that may be used to capture and process images of three illustrative items 101, which may be offered for sale in an autonomous store. Stores may have thousands of items in their product catalogs, and representative images of every item must be captured to onboard a store for autonomous operation. Multiple images of each item may be needed for example to train a visual item classifier 130 that identifies items selected by shoppers when the store is in operation. Embodiments of the invention may greatly reduce the amount of time needed to capture these images. Each item may be placed successively into the image capture system 110, which controls the imaging environment and manages the image capturing process. In the example of FIG. 
1, an operator places item 102 into the system 110. In one or more embodiments, movement of items successively into image capture system 110 may be automated or semi-automated; for example, items may be placed onto a conveyor belt or a rotating platform that moves items into and out of the system 110, or a robotic system may successively transport items into and out of the system; Paragraph [0050]: FIG. 4 is illustrative; one or more embodiments may place monitor screens, cameras, and lights in any locations and orientations, to support image capture from any angles under any desired background and lighting conditions. In one or more embodiments, the transparent platform 401 may be a one-way mirror so that cameras may be placed directly underneath the platform without interfering with images captured from the cameras above the item); placing the item with a second kind of position on the rotating platform (Fig. 1; Fig. 4; Fig. 5; Paragraph [0039]: FIG. 1 shows an illustrative embodiment of the invention that may be used to capture and process images of three illustrative items 101, which may be offered for sale in an autonomous store. Stores may have thousands of items in their product catalogs, and representative images of every item must be captured to onboard a store for autonomous operation. Multiple images of each item may be needed for example to train a visual item classifier 130 that identifies items selected by shoppers when the store is in operation. Embodiments of the invention may greatly reduce the amount of time needed to capture these images. Each item may be placed successively into the image capture system 110, which controls the imaging environment and manages the image capturing process. In the example of FIG. 1, an operator places item 102 into the system 110. 
In one or more embodiments, movement of items successively into image capture system 110 may be automated or semi-automated; for example, items may be placed onto a conveyor belt or a rotating platform that moves items into and out of the system 110, or a robotic system may successively transport items into and out of the system; Paragraphs [0050]-[0051]: FIG. 4 is illustrative; one or more embodiments may place monitor screens, cameras, and lights in any locations and orientations, to support image capture from any angles under any desired background and lighting conditions. In one or more embodiments, the transparent platform 401 may be a one-way mirror so that cameras may be placed directly underneath the platform without interfering with images captured from the cameras above the item… FIG. 5 shows a flowchart of illustrative steps performed by one or more embodiments of the invention to capture item images under different orientations and conditions. Outer loop 500 is repeated for each item that needs to be recognized by the item classifier (for example, for all items in a store's catalog or inventory). In step 501, an item barcode or other identifier is read, for example by a barcode scanner or camera, which obtains the item identifier 521 (such as a SKU). Then loop 502 is repeated for each different pose into which the item must be placed for imaging. A prompt 503 may be generated to instruct the operator to place the item into the desired pose; the operator may perform step 504 to put the item into the imaging system in this pose 522); taking a second set of images of the item with the second kind of position on the rotating platform, multiple lighting levels and angles of the items are used to stimulate real store lighting conditions (Fig. 5; Paragraphs [0050]-[0053]: place monitor screens, cameras, and lights in any locations and orientations, to support image capture from any angles under any desired background and lighting conditions. 
In one or more embodiments, the transparent platform 401 may be a one-way mirror so that cameras may be placed directly underneath the platform without interfering with images captured from the cameras above the item…FIG. 5 shows a flowchart of illustrative steps performed by one or more embodiments of the invention to capture item images under different orientations and conditions. Outer loop 500 is repeated for each item that needs to be recognized by the item classifier (for example, for all items in a store's catalog or inventory). In step 501, an item barcode or other identifier is read, for example by a barcode scanner or camera, which obtains the item identifier 521 (such as a SKU). Then loop 502 is repeated for each different pose into which the item must be placed for imaging. A prompt 503 may be generated to instruct the operator to place the item into the desired pose; the operator may perform step 504 to put the item into the imaging system in this pose 522… item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition); generating a set of training images by synthetically combining the first set of images, and the second series of images (Figs. 6-8; Paragraphs [0045]-[0047]: Images 120 of item 102 captured by cameras 114a through 114h are then used to train the visual item classifier 130 that may be used to recognize items from images captured during store operations. The classifier training system 125 may first process the item images 120 to generate training images of the item. Illustrative steps for image processing operation 124 are illustrated below with respect to FIGS. 6 and 7. Training images of all items 101 are labeled with the item identities as captured by input device 111. The labeled images are added to a training dataset 121. 
The training dataset is input into a training process 122 that trains the visual item classifier… Training system 125 may include a processor or processors 123, which may for example perform image processing operation 124 and training operation 122. In one or more embodiments, controller processor 116 and training system process 123 may be identical or may share components. Processor or processors 123 may for example include GPUs to parallelize image processing and training operations. In one or more embodiments, processor or processors 123 and training dataset 121 may be remote from item imaging system 110, and images 120 may be transferred over a network connection to the training system…two major subsystems of the embodiment are item imaging system 110, and item classifier training system 125. Items 101 are placed into item imaging system 110; images and item identities are passed from the item imaging system to the item classifier training system. In item imaging system 110, controller 116 is coupled to and controls all other components, including monitor screen or screens 113, cameras 114, variable illumination lights 115, item identification input 111, and operator terminal 112. Item classifier training system 125 has a processor (or processors) 123, which is connected to training dataset 121 and to item classifier 130; processor 123 processes the images from cameras 114, builds the training dataset 121, and performs the training of the classifier 130. These components are illustrative; one or more embodiments may have different components, a subset of these components, or components organized with different connections; Paragraphs [0052]-[0055]: FIGS. 6 and 7 show illustrative steps to implement image processing step 124 that transforms images 523 and 524 into training data for the item classifier. These steps may be performed automatically by one or both of the imaging system controller or by the processor or processors of the training system. 
An initial processing step, illustrated in FIG. 6, may generate a mask of the item that may be used to separate the item image from the background. Variation of monitor screen background colors (in loop 505 of FIG. 5) facilitates this mask extraction step, since the item in the foreground can be identified as the portion of an image that does not change dramatically when the background color changes. An item mask may be generated for each camera… item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition. This process is illustrated in FIG. 7 for images 525 from the first camera. In step 701, mask 620 is applied to the images 525, yielding images 702 of the item alone (without a background). In one or more embodiments, these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf… Training dataset 121 containing labeled item images (transformed for example as shown in FIG. 6) may then be used to train the visual item classifier. One or more embodiments may use any type or types of classifier and any type or types of machine learning algorithms to train the classifier. FIG. 
8 shows an illustrative architecture that may be used in one or more embodiments. The visual item classification system 130 may be structured in two stages: an initial feature extractor phase 801 that maps images 800 (as pixel arrays) into feature vectors 802, and a classifier phase 803 that classifies images based on the feature vector 802 generated by the first phase 801. The feature extractor 801 may be for example any module that maps image pixels into a feature vector; examples include, without limitation, a neural network, a convolutional neural network, a color histogram vector, a histogram of oriented gradients, a bag of visual words histogram constructed from SURF or other traditional computer vision features, or a concatenation of any of the above. The classifier 803 may be for example, without limitation, a K-nearest neighbor classifier, logistic regression, a support vector machine, a random forest classifier, Adaboosted decision trees, and a neural network which may be for example fully connected); training a product recognition model by the set of training images on real time basis with a series of random augmentations, wherein the random augmentations comprise all of the following: brightness, color shift, and scales (Fig. 7; Paragraph [0042]: imaging system 110 may contain cameras and lights. The lights may for example be controllable to provide variable illumination conditions. Item images may be captured under different lighting conditions in order to make the training of the item classifier 130 more robust so that it works in the potentially varying conditions of an operating store. Illustrative lights 115a through 115e are shown mounted at different positions on the lower surface of the ceiling of imaging system 110. One or more embodiments may have any number of lights mounted in any positions and orientations. The lights 115a through 115e may support controllable variable illumination. 
Variations in illumination may consist of only on/off control, or in one or more embodiments the lights may be controllable for variable brightness, wavelengths, or colors. Variations in illumination may be discrete or continuous; Paragraphs [0051]-[0054]: item mask may be generated for each camera. For example, in FIG. 6, images 531 and 532 corresponding to a first camera with red and blue backgrounds, respectively, may be processed to generate item foreground mask 620. (For simplicity, this process is illustrated using only two images; one or more embodiments may use any number of images with different background colors to calculate an item mask for a camera). In the embodiment shown in FIG. 6, the mask is extracted by locating image areas where the hue of the image remains relatively fixed when the background color changes. Step 601 extracts the hue channel (for example in an HSV color space) from images 531 and 532, yielding images 611 and 612, respectively. Hues are shown as greyscale images, with the red background hue in image 531 corresponding to black (hue of 0), and the blue background hue in image 532 corresponding to a light grey (hue of 240). Differencing operation 613 on the hue channels 611 and 612 results in difference 614; the central black zone shows that the hue of the item foreground is very similar between images 531 and 532. Operation 615 then thresholds difference 614 (converting it to a binary image) and inverts the result, yielding binary image 616. Noise in this image is reduced in step 617 (for example using morphological operators or other filters), resulting in final item mask 62…these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 
7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf…feature extractor 801 may be for example any module that maps image pixels into a feature vector; examples include, without limitation, a neural network, a convolutional neural network, a color histogram vector, a histogram of oriented gradients, a bag of visual words histogram constructed from SURF or other traditional computer vision features, or a concatenation of any of the above. The classifier 803 may be for example, without limitation, a K-nearest neighbor classifier, logistic regression, a support vector machine, a random forest classifier, Adaboosted decision trees, and a neural network which may be for example fully connected); and testing the product recognition model with another set of images of the item in various conditions (Fig. 5; Paragraph [0051]: FIG. 5 shows a flowchart of illustrative steps performed by one or more embodiments of the invention to capture item images under different orientations and conditions. Outer loop 500 is repeated for each item that needs to be recognized by the item classifier (for example, for all items in a store's catalog or inventory). In step 501, an item barcode or other identifier is read, for example by a barcode scanner or camera, which obtains the item identifier 521 (such as a SKU). Then loop 502 is repeated for each different pose into which the item must be placed for imaging. 
A prompt 503 may be generated to instruct the operator to place the item into the desired pose; the operator may perform step 504 to put the item into the imaging system in this pose 522. Two inner loops 505 and 508 are then performed to cycle through background colors and lighting conditions, respectively. In inner loop 505, step 506 sets the monitor screen or screens to the desired background color, and step 507 captures images from the cameras with this background. Images captured in this loop 505 may be represented for example as table 523, which has an image for each combination of camera and background color. Illustrative table 523 has images for four different background colors: red, blue, black, and white. One or more embodiments may use any set of any number of background colors, including for example colors of different hues (such as red and blue). Illustrative image 531 is an image from a first camera with a red monitor background, and image 532 is an image from the same camera with a blue monitor background. In inner loop 508, step 509 sets the lights to the desired lighting condition (which may set different lights to different outputs), and step 510 captures images from the cameras with this lighting condition. Images captured in this loop 508 may be represented for example as table 524, which has an image for each combination of camera and lighting condition. For example, row 525 in table 524 contains the images captured from the first camera under the various lighting conditions. The monitor screen background color may be set for example to a neutral color (or turned off entirely) for inner loop 508. In illustrative table 524, lighting conditions are represented by an intensity of “left” lights and “right” lights; in one or more embodiments any combination of light intensities and colors for the entire set of lights may represent a distinct lighting condition).
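For illustration only, the hue-differencing mask extraction quoted above from Buibas (FIG. 6: hue extraction 601, differencing 613, threshold and invert 615) can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Buibas: the function names, the nested-list image representation, the 30-degree threshold, and the use of a circular hue distance are all hypothetical choices, and the noise-reduction step 617 is omitted.

```python
import colorsys


def hue_channel(image):
    """Return per-pixel hue in degrees [0, 360) for an RGB image.

    `image` is assumed to be a list of rows of (r, g, b) tuples in 0..255.
    """
    return [
        [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0] * 360 for (r, g, b) in row]
        for row in image
    ]


def foreground_mask(image_a, image_b, threshold=30.0):
    """Mark pixels whose hue stays nearly fixed between two background colors.

    Returns a binary mask (1 = item foreground, 0 = background). Comparing
    `diff < threshold` is equivalent to thresholding the difference image
    and then inverting it. The threshold value is an assumption.
    """
    hue_a = hue_channel(image_a)
    hue_b = hue_channel(image_b)
    mask = []
    for row_a, row_b in zip(hue_a, hue_b):
        mask_row = []
        for ha, hb in zip(row_a, row_b):
            diff = abs(ha - hb)
            diff = min(diff, 360 - diff)  # hue is an angle, so use circular distance
            mask_row.append(1 if diff < threshold else 0)
        mask.append(mask_row)
    return mask


# Hypothetical 2x2 captures: one green item pixel on a red, then a blue, background.
RED, BLUE, GREEN = (255, 0, 0), (0, 0, 255), (0, 255, 0)
on_red = [[RED, GREEN], [RED, RED]]
on_blue = [[BLUE, GREEN], [BLUE, BLUE]]
mask = foreground_mask(on_red, on_blue)
```

In this toy input only the green pixel keeps its hue when the background changes from red (hue 0) to blue (hue 240), so only that pixel is marked as foreground.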
	Buibas does not explicitly disclose wherein the random augmentations comprise contrast, compression artifacts, Gaussian blur, translations, and flipping.
	However, Gao teaches neural network object detection and model generation (Fig. 4; Paragraph [0115]), further comprising wherein the random augmentations comprise contrast, compression artifacts, Gaussian blur, translations, and flipping (Fig. 6; Paragraph [0024]: the augmentation operations include modifying one or more properties of the captured image, the properties including: brightness, contrast, a hue for each RGB channel, rotation, blur, sharpness, saturation, size, and padding; or performing one or more operations on the captured image, the operations including: histogram equalization, embossing, flipping, adding random noise, adding random dropout, edge detection, piecewise affine, pooling, and channel shuffle; Paragraph [0113]: training data set comprising the normalized images of the selected merchandise item is extended by applying one or more augmentation operations to each normalized image. The augmentation process thereby yields at least one, and in many embodiments, multiple augmented images for each normalized image. The training data comprising the combination of the normalized images and augmented images for the selected merchandise item is therefore extended multiple times over in comparison to the training data set (of step 614) comprising only the normalized images. As mentioned previously, the image augmentation operations can include, but are not limited to: brightness adjustment, contrast adjustment, adding random noise, independently adjusting hue of RGB channels (or channels within various color spaces, including but not limited to sRGB, Adobe RGB, ProPhoto, DCI-P3, Rec 709 or various other color spaces as would be appreciated by one of ordinary skill in the art), random dropout, rotation, blurring, adjusting sharpness, adjusting saturation, embossing, flipping, edge detection, piecewise affine transformation, pooling, scaling, padding, channel shuffling, etc. 
By applying different combinations of one or more image augmentation operations to a normalized image, the effect or impact of different lighting conditions can be simulated without having to make physical lighting adjustments during the original process of obtaining the series of captured images). Gao teaches that this will allow for different lighting conditions to be simulated without having to make physical lighting adjustments (Paragraph [0113]) and that captured images may contain images of the merchandise item in a variety of different perspectives, angles, positions, lighting conditions, etc., which can ultimately assist in creating a more robust training data set (Paragraph [0059]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Buibas with the features wherein the random augmentations comprise compression artifacts, Gaussian blur, contrast, translations, and flipping, as taught by Gao, so as to assist in creating a more robust training data set as presented by Gao.
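For illustration only, a chain of random augmentations of the kind recited in claim 19 (here brightness, contrast, flipping, and translation) might look like the following sketch. It is not taken from Buibas or Gao: the function names, the parameter ranges, and the grayscale nested-list image representation are assumptions, and color shift, scaling, Gaussian blur, and compression artifacts are omitted for brevity.

```python
import random


def adjust_brightness(image, delta):
    """Add `delta` to every pixel, clamped to 0..255."""
    return [[min(255, max(0, p + delta)) for p in row] for row in image]


def adjust_contrast(image, factor):
    """Scale pixel values around mid-gray (128), clamped to 0..255."""
    return [[min(255, max(0, round(128 + factor * (p - 128)))) for p in row] for row in image]


def flip_horizontal(image):
    """Mirror each row left to right."""
    return [row[::-1] for row in image]


def translate(image, dx, fill=0):
    """Shift the image right by `dx` pixels (left if negative), padding with `fill`."""
    width = len(image[0])
    dx = max(-width, min(width, dx))  # a shift beyond the width leaves only padding
    if dx >= 0:
        return [[fill] * dx + row[: width - dx] for row in image]
    return [row[-dx:] + [fill] * (-dx) for row in image]


def random_augment(image, rng):
    """Apply one randomly parameterized chain of augmentations.

    The ranges below are hypothetical. Calling this once per training step,
    rather than precomputing a fixed set, is one way to augment on the fly.
    """
    out = adjust_brightness(image, rng.randint(-40, 40))
    out = adjust_contrast(out, rng.uniform(0.7, 1.3))
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return translate(out, rng.randint(-1, 1))


sample = [[10, 128, 250], [0, 60, 200]]
augmented = random_augment(sample, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sketch reproducible; every augmented image keeps the dimensions of its input.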
Regarding claim 20, Buibas, in view of Gao, teaches the method of claim 19. Buibas further discloses wherein computer graphics technology is configured to change the multiple lighting levels and angles with software (Fig. 5; Paragraphs [0044]-[0047]: Imaging system 110 may contain or may be coupled to a controller 116, which may communicate with and control system components such as identification input device 111, operator terminal 112, monitor screen or screens 113, variable illumination lights 115a through 115e, and cameras 114a through 114h. This controller 116 may contain any type or types of processor, such as for example a microprocessor, microcontroller, or single board computer. In one or more embodiments the controller 116 may be a computer that is physically remote from but coupled to the physical imaging system 110. In one or more embodiments the operator terminal 112 may be a computer that also acts as controller 116. Controller 116 executes a sequence of operations, described below, to change the imaging environment and to capture images 120 of the item… Training system 125 may include a processor or processors 123, which may for example perform image processing operation 124 and training operation 122. In one or more embodiments, controller processor 116 and training system process 123 may be identical or may share components. Processor or processors 123 may for example include GPUs to parallelize image processing and training operations. In one or more embodiments, processor or processors 123 and training dataset 121 may be remote from item imaging system 110, and images 120 may be transferred over a network connection to the training system 125…FIG. 2 shows an architectural block diagram of the embodiment of FIG. 1. The two major subsystems of the embodiment are item imaging system 110, and item classifier training system 125. 
Items 101 are placed into item imaging system 110; images and item identities are passed from the item imaging system to the item classifier training system. In item imaging system 110, controller 116 is coupled to and controls all other components, including monitor screen or screens 113, cameras 114, variable illumination lights 115, item identification input 111, and operator terminal 112. Item classifier training system 125 has a processor (or processors) 123, which is connected to training dataset 121 and to item classifier 130; processor 123 processes the images from cameras 114, builds the training dataset 121, and performs the training of the classifier; Paragraphs [0050]-[0053]: place monitor screens, cameras, and lights in any locations and orientations, to support image capture from any angles under any desired background and lighting conditions. In one or more embodiments, the transparent platform 401 may be a one-way mirror so that cameras may be placed directly underneath the platform without interfering with images captured from the cameras above the item…FIG. 5 shows a flowchart of illustrative steps performed by one or more embodiments of the invention to capture item images under different orientations and conditions. Outer loop 500 is repeated for each item that needs to be recognized by the item classifier (for example, for all items in a store's catalog or inventory). In step 501, an item barcode or other identifier is read, for example by a barcode scanner or camera, which obtains the item identifier 521 (such as a SKU). Then loop 502 is repeated for each different pose into which the item must be placed for imaging. A prompt 503 may be generated to instruct the operator to place the item into the desired pose; the operator may perform step 504 to put the item into the imaging system in this pose 522).
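For illustration only, the software-controlled capture sequence that the cited passages of Buibas attribute to controller 116 (FIG. 5: pose loop 502, background-color loop 505, lighting loop 508) can be sketched as a schedule generator. This is a sketch under stated assumptions: the function name, the tuple layout, and the example pose, camera, color, and lighting labels are hypothetical, and no real hardware API is implied.

```python
def capture_schedule(poses, cameras, background_colors, lighting_conditions):
    """Yield (pose, camera, setting_kind, setting) tuples in nested-loop order.

    For each pose, every camera is captured once per background color,
    then once per lighting condition.
    """
    for pose in poses:
        for color in background_colors:        # analogous to inner loop 505
            for camera in cameras:
                yield (pose, camera, "background", color)
        for lighting in lighting_conditions:   # analogous to inner loop 508
            for camera in cameras:
                yield (pose, camera, "lighting", lighting)


# Hypothetical labels, chosen only to exercise the generator.
schedule = list(
    capture_schedule(
        poses=["upright", "on_side"],
        cameras=["cam1", "cam2"],
        background_colors=["red", "blue", "black", "white"],
        lighting_conditions=["left_bright", "right_bright", "both_dim"],
    )
)
```

With two poses, two cameras, four background colors, and three lighting conditions, the schedule contains 2 × (4 × 2 + 3 × 2) = 28 captures.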
Regarding claim 21, Buibas, in view of Gao, teaches the method of claim 19. Buibas further discloses wherein an object is placed near the item to achieve partial occultation (Fig. 7; Paragraph [0053]: item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition. This process is illustrated in FIG. 7 for images 525 from the first camera. In step 701, mask 620 is applied to the images 525, yielding images 702 of the item alone (without a background). In one or more embodiments, these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf).
Regarding claim 22, Buibas, in view of Gao, teaches the method of claim 19. Buibas further discloses wherein the item and the different backgrounds are composed to simulate images of real stores with occlusion and real store lighting condition (Fig. 7; Paragraph [0053]: item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition. This process is illustrated in FIG. 7 for images 525 from the first camera. In step 701, mask 620 is applied to the images 525, yielding images 702 of the item alone (without a background). In one or more embodiments, these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf).
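For illustration only, the composition described in the cited Paragraph [0053] of Buibas (an occlusion added to the masked item image, followed by background addition 720) can be sketched as follows. This is a minimal sketch, not the implementation of Buibas: the function names, the rectangular occluder, and the grayscale nested-list image representation are assumptions.

```python
def add_occlusion(image, top, left, height, width, value=0):
    """Overwrite a rectangle of the image with a flat occluder value."""
    return [
        [
            value if top <= r < top + height and left <= c < left + width else pixel
            for c, pixel in enumerate(row)
        ]
        for r, row in enumerate(image)
    ]


def composite(item, mask, background):
    """Paste item pixels over the background wherever the mask is 1."""
    return [
        [i if m else b for i, m, b in zip(item_row, mask_row, bg_row)]
        for item_row, mask_row, bg_row in zip(item, mask, background)
    ]


# Hypothetical 2x3 item image, foreground mask, and shelf-like background.
item = [[9, 9, 9], [9, 9, 9]]
mask = [[0, 1, 1], [0, 1, 1]]
shelf = [[1, 2, 3], [4, 5, 6]]

occluded = add_occlusion(item, top=0, left=2, height=1, width=1)
training_image = composite(occluded, mask, shelf)
```

The resulting image keeps the background where the mask is 0, shows the item where the mask is 1, and shows the occluder value where the occluding rectangle overlaps the item.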
Regarding claim 23, Buibas, in view of Gao, teaches the method of claim 19. Gao further discloses wherein the set of training images are mixed with real images in a real store in a randomized way (Paragraphs [0062]-[0063]: the dedicated image capture process is designed to introduce a similar or greater level of randomness or variance in the series of captured images 504, as compared to what would be seen in captured images obtained from shoppers. For example, the dedicated image capture process can require that the given merchandise item be rotated in front of the cameras for a pre-determined period of time and/or a pre-determined number of rotations, so as to better ensure that the training data includes views of the merchandise item from all angles…multiple ‘rounds’ of image capture might be performed for a single, given merchandise item. In order to better recreate the cart conditions expected when performing merchandise identification in a supermarket or retail environment, the series of captured images 504 can be framed such that the given merchandise item is located against a background consisting of a randomized or varying assortment of other merchandise items (e.g. other merchandise items from inventory 502). In this manner, the series of captured images 504 will include images of the given merchandise item that are taken from a variety of different angles, against a variety of different mixed backgrounds. In one embodiment, the dedicated image capture process can include five, five-second-long ‘rounds’ of image capture, where the background assortment of other merchandise items from inventory 502 is changed between each round).
Regarding claim 24, Buibas, in view of Gao teaches a method of Claim 19, Buibas discloses the set of training images are generated by a process of composition (Figs. 6-8; Paragraphs [0045]-[0047]: Images 120 of item 102 captured by cameras 114a through 114h are then used to train the visual item classifier 130 that may be used to recognize items from images captured during store operations. The classifier training system 125 may first process the item images 120 to generate training images of the item. Illustrative steps for image processing operation 124 are illustrated below with respect to FIGS. 6 and 7. Training images of all items 101 are labeled with the item identities as captured by input device 111. The labeled images are added to a training dataset 121. The training dataset is input into a training process 122 that trains the visual item classifier… Training system 125 may include a processor or processors 123, which may for example perform image processing operation 124 and training operation 122. In one or more embodiments, controller processor 116 and training system process 123 may be identical or may share components. Processor or processors 123 may for example include GPUs to parallelize image processing and training operations. In one or more embodiments, processor or processors 123 and training dataset 121 may be remote from item imaging system 110, and images 120 may be transferred over a network connection to the training system…two major subsystems of the embodiment are item imaging system 110, and item classifier training system 125. Items 101 are placed into item imaging system 110; images and item identities are passed from the item imaging system to the item classifier training system. In item imaging system 110, controller 116 is coupled to and controls all other components, including monitor screen or screens 113, cameras 114, variable illumination lights 115, item identification input 111, and operator terminal 112. 
Item classifier training system 125 has a processor (or processors) 123, which is connected to training dataset 121 and to item classifier 130; processor 123 processes the images from cameras 114, builds the training dataset 121, and performs the training of the classifier 130. These components are illustrative; one or more embodiments may have different components, a subset of these components, or components organized with different connections; Paragraphs [0052]-[0055]: FIGS. 6 and 7 show illustrative steps to implement image processing step 124 that transforms images 523 and 524 into training data for the item classifier. These steps may be performed automatically by one or both of the imaging system controller or by the processor or processors of the training system. An initial processing step, illustrated in FIG. 6, may generate a mask of the item that may be used to separate the item image from the background. Variation of monitor screen background colors (in loop 505 of FIG. 5) facilitates this mask extraction step, since the item in the foreground can be identified as the portion of an image that does not change dramatically when the background color changes. An item mask may be generated for each camera… item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition. This process is illustrated in FIG. 7 for images 525 from the first camera. In step 701, mask 620 is applied to the images 525, yielding images 702 of the item alone (without a background). In one or more embodiments, these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. 
A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf… Training dataset 121 containing labeled item images (transformed for example as shown in FIG. 6) may then be used to train the visual item classifier. One or more embodiments may use any type or types of classifier and any type or types of machine learning algorithms to train the classifier. FIG. 8 shows an illustrative architecture that may be used in one or more embodiments. The visual item classification system 130 may be structured in two stages: an initial feature extractor phase 801 that maps images 800 (as pixel arrays) into feature vectors 802, and a classifier phase 803 that classifies images based on the feature vector 802 generated by the first phase 801. The feature extractor 801 may be for example any module that maps image pixels into a feature vector; examples include, without limitation, a neural network, a convolutional neural network, a color histogram vector, a histogram of oriented gradients, a bag of visual words histogram constructed from SURF or other traditional computer vision features, or a concatenation of any of the above. The classifier 803 may be for example, without limitation, a K-nearest neighbor classifier, logistic regression, a support vector machine, a random forest classifier, Adaboosted decision trees, and a neural network which may be for example fully connected).
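For illustration only, the composition quoted above from Buibas (mask 620 applied to a captured image, followed by background addition step 720) can be sketched as below. This is a minimal sketch under stated assumptions, not code from Buibas: images are assumed to be equal-sized nested lists of pixel values, and the names `composite` and `make_training_images` are hypothetical.

```python
import random

def composite(item, mask, background):
    """Keep item pixels where the mask is True; use the background elsewhere.

    item, mask and background are assumed to be nested lists of equal size.
    """
    return [
        [item[r][c] if mask[r][c] else background[r][c]
         for c in range(len(item[0]))]
        for r in range(len(item))
    ]

def make_training_images(item, mask, backgrounds, label, n, seed=0):
    """Build n labeled training images, each on a randomly chosen background."""
    rng = random.Random(seed)  # seeded so the sketch is repeatable
    return [(composite(item, mask, rng.choice(backgrounds)), label)
            for _ in range(n)]
```

Each returned pair carries the item label, mirroring the statement that the composed images are added to the training dataset labeled with the item identifier.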
Regarding claim 25, Buibas, in view of Gao teaches a method of Claim 19, Buibas discloses the set of training images is configured to train a deep learning model to recognize a new product that has not been seen in real stores (Paragraph [0004]: “onboarding” process to set up the item images for a store can be extremely time-consuming, particularly for stores with thousands of items and high item turnover as packaging for items changes over time and new items introduced. A typical workflow used in the art for this onboarding process is to manually capture images of each product from various angles and under various conditions. Further manual processing is typically required to crop and prepare item images for a training dataset. The process to onboard a single item may take 15 to 30 minutes. For stores with large numbers of items, onboarding the store's complete catalog may take multiple months, at which time many of the product's packaging may have changed. There are no known systems that automate the onboarding process so that multiple item images can be captured and prepared quickly and with minimal labor).

Claims 12-18 are rejected under 35 U.S.C. 103 as being unpatentable over Buibas, in view of Buibas et al. (US Pub. 2021/0067744), hereinafter Bapst, and further in view of Gao.
Regarding claim 12, Buibas discloses a method of generating models (Paragraph [0021]: an image processor that calculates a 3D model of the item from the images captured by the cameras. The processor may calculate the item's shape, size, or volume from the 3D model), comprising: placing an item with a first kind of position on a rotating platform (Fig. 1; Paragraph [0039]: item may be placed successively into the image capture system 110, which controls the imaging environment and manages the image capturing process. In the example of FIG. 1, an operator places item 102 into the system 110. In one or more embodiments, movement of items successively into image capture system 110 may be automated or semi-automated; for example, items may be placed onto a conveyor belt or a rotating platform that moves items into and out of the system 110, or a robotic system may successively transport items into and out of the system); taking a first set of images of the item with the first kind of position on the rotating platform, wherein multiple lighting levels and angles of the items are used to simulate real store lighting conditions (Fig. 1; Fig. 4; Fig. 5; Paragraph [0039]: FIG. 1 shows an illustrative embodiment of the invention that may be used to capture and process images of three illustrative items 101, which may be offered for sale in an autonomous store. Stores may have thousands of items in their product catalogs, and representative images of every item must be captured to onboard a store for autonomous operation. Multiple images of each item may be needed for example to train a visual item classifier 130 that identifies items selected by shoppers when the store is in operation. Embodiments of the invention may greatly reduce the amount of time needed to capture these images. Each item may be placed successively into the image capture system 110, which controls the imaging environment and manages the image capturing process. In the example of FIG. 
1, an operator places item 102 into the system 110. In one or more embodiments, movement of items successively into image capture system 110 may be automated or semi-automated; for example, items may be placed onto a conveyor belt or a rotating platform that moves items into and out of the system 110, or a robotic system may successively transport items into and out of the system; Paragraph [0050]: FIG. 4 is illustrative; one or more embodiments may place monitor screens, cameras, and lights in any locations and orientations, to support image capture from any angles under any desired background and lighting conditions. In one or more embodiments, the transparent platform 401 may be a one-way mirror so that cameras may be placed directly underneath the platform without interfering with images captured from the cameras above the item); placing the item with a second kind of position on the rotating platform (Fig. 1; Fig. 4; Fig. 5; Paragraph [0039]: FIG. 1 shows an illustrative embodiment of the invention that may be used to capture and process images of three illustrative items 101, which may be offered for sale in an autonomous store. Stores may have thousands of items in their product catalogs, and representative images of every item must be captured to onboard a store for autonomous operation. Multiple images of each item may be needed for example to train a visual item classifier 130 that identifies items selected by shoppers when the store is in operation. Embodiments of the invention may greatly reduce the amount of time needed to capture these images. Each item may be placed successively into the image capture system 110, which controls the imaging environment and manages the image capturing process. In the example of FIG. 1, an operator places item 102 into the system 110. 
In one or more embodiments, movement of items successively into image capture system 110 may be automated or semi-automated; for example, items may be placed onto a conveyor belt or a rotating platform that moves items into and out of the system 110, or a robotic system may successively transport items into and out of the system; Paragraphs [0050]-[0051]: FIG. 4 is illustrative; one or more embodiments may place monitor screens, cameras, and lights in any locations and orientations, to support image capture from any angles under any desired background and lighting conditions. In one or more embodiments, the transparent platform 401 may be a one-way mirror so that cameras may be placed directly underneath the platform without interfering with images captured from the cameras above the item… FIG. 5 shows a flowchart of illustrative steps performed by one or more embodiments of the invention to capture item images under different orientations and conditions. Outer loop 500 is repeated for each item that needs to be recognized by the item classifier (for example, for all items in a store's catalog or inventory). In step 501, an item barcode or other identifier is read, for example by a barcode scanner or camera, which obtains the item identifier 521 (such as a SKU). Then loop 502 is repeated for each different pose into which the item must be placed for imaging. A prompt 503 may be generated to instruct the operator to place the item into the desired pose; the operator may perform step 504 to put the item into the imaging system in this pose 522); taking a second set of images of the item with the second kind of position on the rotating platform, wherein multiple lighting levels and angles of the items are used to simulate real store lighting conditions (Fig. 5; Paragraphs [0050]-[0053]: place monitor screens, cameras, and lights in any locations and orientations, to support image capture from any angles under any desired background and lighting conditions. 
In one or more embodiments, the transparent platform 401 may be a one-way mirror so that cameras may be placed directly underneath the platform without interfering with images captured from the cameras above the item…FIG. 5 shows a flowchart of illustrative steps performed by one or more embodiments of the invention to capture item images under different orientations and conditions. Outer loop 500 is repeated for each item that needs to be recognized by the item classifier (for example, for all items in a store's catalog or inventory). In step 501, an item barcode or other identifier is read, for example by a barcode scanner or camera, which obtains the item identifier 521 (such as a SKU). Then loop 502 is repeated for each different pose into which the item must be placed for imaging. A prompt 503 may be generated to instruct the operator to place the item into the desired pose; the operator may perform step 504 to put the item into the imaging system in this pose 522… item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition); taking a second series of images of different backgrounds (Fig. 3; Paragraph [0048]: described below with respect to FIG. 6, modifying the background color (or pattern) allows the system to extract a high-quality mask of the item being imaged. Any number of background colors (or patterns) may be used. After the background sequence (steps 301, 302, and similar steps for other backgrounds), controller 116 then cycles the lights through a sequence of lighting conditions, and captures images with each lighting condition; Paragraph [0056]: employ variations of the rapid onboarding system illustrated for example in FIG. 1 and FIG. 4. In particular, in one or more embodiments variably colored backgrounds may be provided using translucent panels illuminated from behind the panels with variably colored light, instead of (or in addition to) using monitor screens. 
In some situations these translucent panels may be more robust or less expensive than monitor screens. One or more embodiments may use backgrounds with any combination of monitor screens and translucent panels illuminated from behind with variably colored light); generating a set of training images by synthetically combining the first set of images, the second set of images, the first series of images and the second series of images, wherein the first set of images were segmented, wherein the second set of images were segmented, wherein the first series of images were segmented (Figs. 6-8; Paragraphs [0045]-[0047]: Images 120 of item 102 captured by cameras 114a through 114h are then used to train the visual item classifier 130 that may be used to recognize items from images captured during store operations. The classifier training system 125 may first process the item images 120 to generate training images of the item. Illustrative steps for image processing operation 124 are illustrated below with respect to FIGS. 6 and 7. Training images of all items 101 are labeled with the item identities as captured by input device 111. The labeled images are added to a training dataset 121. The training dataset is input into a training process 122 that trains the visual item classifier… Training system 125 may include a processor or processors 123, which may for example perform image processing operation 124 and training operation 122. In one or more embodiments, controller processor 116 and training system process 123 may be identical or may share components. Processor or processors 123 may for example include GPUs to parallelize image processing and training operations. 
In one or more embodiments, processor or processors 123 and training dataset 121 may be remote from item imaging system 110, and images 120 may be transferred over a network connection to the training system…two major subsystems of the embodiment are item imaging system 110, and item classifier training system 125. Items 101 are placed into item imaging system 110; images and item identities are passed from the item imaging system to the item classifier training system. In item imaging system 110, controller 116 is coupled to and controls all other components, including monitor screen or screens 113, cameras 114, variable illumination lights 115, item identification input 111, and operator terminal 112. Item classifier training system 125 has a processor (or processors) 123, which is connected to training dataset 121 and to item classifier 130; processor 123 processes the images from cameras 114, builds the training dataset 121, and performs the training of the classifier 130. These components are illustrative; one or more embodiments may have different components, a subset of these components, or components organized with different connections; Paragraphs [0052]-[0055]: FIGS. 6 and 7 show illustrative steps to implement image processing step 124 that transforms images 523 and 524 into training data for the item classifier. These steps may be performed automatically by one or both of the imaging system controller or by the processor or processors of the training system. An initial processing step, illustrated in FIG. 6, may generate a mask of the item that may be used to separate the item image from the background. Variation of monitor screen background colors (in loop 505 of FIG. 5) facilitates this mask extraction step, since the item in the foreground can be identified as the portion of an image that does not change dramatically when the background color changes. 
An item mask may be generated for each camera… item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition. This process is illustrated in FIG. 7 for images 525 from the first camera. In step 701, mask 620 is applied to the images 525, yielding images 702 of the item alone (without a background). In one or more embodiments, these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf… Training dataset 121 containing labeled item images (transformed for example as shown in FIG. 6) may then be used to train the visual item classifier. One or more embodiments may use any type or types of classifier and any type or types of machine learning algorithms to train the classifier. FIG. 8 shows an illustrative architecture that may be used in one or more embodiments. The visual item classification system 130 may be structured in two stages: an initial feature extractor phase 801 that maps images 800 (as pixel arrays) into feature vectors 802, and a classifier phase 803 that classifies images based on the feature vector 802 generated by the first phase 801. 
The feature extractor 801 may be for example any module that maps image pixels into a feature vector; examples include, without limitation, a neural network, a convolutional neural network, a color histogram vector, a histogram of oriented gradients, a bag of visual words histogram constructed from SURF or other traditional computer vision features, or a concatenation of any of the above. The classifier 803 may be for example, without limitation, a K-nearest neighbor classifier, logistic regression, a support vector machine, a random forest classifier, Adaboosted decision trees, and a neural network which may be for example fully connected); training a product recognition model by the set of training images on real time basis with a series of random augmentations, wherein the random augmentations comprise all of the following: brightness, color shift, and scales (Fig. 7; Paragraph [0042]: imaging system 110 may contain cameras and lights. The lights may for example be controllable to provide variable illumination conditions. Item images may be captured under different lighting conditions in order to make the training of the item classifier 130 more robust so that it works in the potentially varying conditions of an operating store. Illustrative lights 115a through 115e are shown mounted at different positions on the lower surface of the ceiling of imaging system 110. One or more embodiments may have any number of lights mounted in any positions and orientations. The lights 115a through 115e may support controllable variable illumination. Variations in illumination may consist of only on/off control, or in one or more embodiments the lights may be controllable for variable brightness, wavelengths, or colors. Variations in illumination may be discrete or continuous; Paragraphs [0051]-[0054]: item mask may be generated for each camera. For example, in FIG. 
6, images 531 and 532 corresponding to a first camera with red and blue backgrounds, respectively, may be processed to generate item foreground mask 620. (For simplicity, this process is illustrated using only two images; one or more embodiments may use any number of images with different background colors to calculate an item mask for a camera). In the embodiment shown in FIG. 6, the mask is extracted by locating image areas where the hue of the image remains relatively fixed when the background color changes. Step 601 extracts the hue channel (for example in an HSV color space) from images 531 and 532, yielding images 611 and 612, respectively. Hues are shown as greyscale images, with the red background hue in image 531 corresponding to black (hue of 0), and the blue background hue in image 532 corresponding to a light grey (hue of 240). Differencing operation 613 on the hue channels 611 and 612 results in difference 614; the central black zone shows that the hue of the item foreground is very similar between images 531 and 532. Operation 615 then thresholds difference 614 (converting it to a binary image) and inverts the result, yielding binary image 616. Noise in this image is reduced in step 617 (for example using morphological operators or other filters), resulting in final item mask 62…these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). 
Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf…feature extractor 801 may be for example any module that maps image pixels into a feature vector; examples include, without limitation, a neural network, a convolutional neural network, a color histogram vector, a histogram of oriented gradients, a bag of visual words histogram constructed from SURF or other traditional computer vision features, or a concatenation of any of the above. The classifier 803 may be for example, without limitation, a K-nearest neighbor classifier, logistic regression, a support vector machine, a random forest classifier, Adaboosted decision trees, and a neural network which may be for example fully connected); and testing the product recognition model with another set of images of the item in various conditions (Fig. 5; Paragraph [0051]: FIG. 5 shows a flowchart of illustrative steps performed by one or more embodiments of the invention to capture item images under different orientations and conditions. Outer loop 500 is repeated for each item that needs to be recognized by the item classifier (for example, for all items in a store's catalog or inventory). In step 501, an item barcode or other identifier is read, for example by a barcode scanner or camera, which obtains the item identifier 521 (such as a SKU). Then loop 502 is repeated for each different pose into which the item must be placed for imaging. A prompt 503 may be generated to instruct the operator to place the item into the desired pose; the operator may perform step 504 to put the item into the imaging system in this pose 522. Two inner loops are 505 and 508 are then performed to cycle through background colors and lighting conditions, respectively. 
In inner loop 505, step 506 sets the monitor screen or screens to the desired background color, and step 507 captures images from the cameras with this background. Images captured in this loop 505 may be represented for example as table 523, which has an image for each combination of camera and background color. Illustrative table 523 has images for four different background colors: red, blue, black, and white. One or more embodiments may use any set of any number of background colors, including for example colors of different hues (such as red and blue). Illustrative image 531 is an image from a first camera with a red monitor background, and image 532 is an image from the same camera with a blue monitor background. In inner loop 508, set 509 sets the lights to the desired lighting condition (which may set different lights to different outputs), and step 510 captures images from the cameras with this lighting condition. Images captured in this loop 508 may be represented for example as table 524, which has an image for each combination of camera and lighting condition. For example, row 525 in table 524 contains the images captured from the first camera under the various lighting conditions. The monitor screen background color may be set for example to a neutral color (or turned off entirely) for inner loop 508. In illustrative table 524, lighting conditions are represented by an intensity of “left” lights and “right” lights; in one or more embodiments any combination of light intensities and colors for the entire set of lights may represent a distinct lighting condition). 
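For illustration only, the mask extraction quoted above from Buibas (hue extraction step 601, differencing operation 613, and the threshold-and-invert operation 615) can be sketched as below. This is a minimal sketch under stated assumptions, not code from Buibas: images are assumed to be equal-sized nested lists of 8-bit RGB tuples, the 30-degree threshold and the circular hue distance are assumptions of this sketch, the noise-reduction step 617 is omitted, and the names `hue_deg` and `item_mask` are hypothetical.

```python
import colorsys

def hue_deg(pixel):
    """Return the HSV hue of an 8-bit (r, g, b) pixel in degrees [0, 360)."""
    r, g, b = pixel
    h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    return h * 360.0

def item_mask(img_a, img_b, threshold=30.0):
    """Mark pixels whose hue stays nearly fixed between two backgrounds.

    img_a and img_b are images of the same item from the same camera,
    captured against two different background colors. The threshold
    (in degrees) is an assumed value, not one taken from the reference.
    """
    mask = []
    for row_a, row_b in zip(img_a, img_b):
        mask_row = []
        for pa, pb in zip(row_a, row_b):
            d = abs(hue_deg(pa) - hue_deg(pb))
            d = min(d, 360.0 - d)           # hue is circular (assumption)
            mask_row.append(d < threshold)  # small change => item foreground
        mask.append(mask_row)
    return mask
```

With a red background (hue 0) in one image and a blue background (hue 240) in the other, background pixels differ strongly in hue and are excluded, while item pixels whose hue is unchanged are kept.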
	Buibas does not explicitly disclose taking a first series of images of hands from different individuals; or wherein the random augmentations comprise compression artifacts, Gaussian blur, contrast, translations, and flipping.
	However, Bapst teaches neural network object detection and model generation (Abstract; Paragraphs [0021]-[0023]), further comprising taking a first series of images of hands from different individuals (Fig. 42; Paragraphs [0295]-[0296]: camera 4231 on shelf 4212 observes items on shelf 4213. When user 4201 reaches for an item on shelf 4213, cameras on either or both of shelves 4212 and 4213 may detect entry of the user's hand into the shelf area, and may capture images of shelf contents that may be used to determine which item or items are taken or moved. This data may be combined with images from other store cameras, such as cameras 4231 and 4232, to track the shoppers and attribute item movements to specific shoppers…FIG. 43 shows an illustrative embodiment of a smart shelf 4212, viewed from the front. FIGS. 44 through 47 show additional views of this embodiment. Smart shelf 4212 has cameras 4301 and 4302 at the left and right ends, respectively, which face inward along the front edge of the shelf. Thus the left end camera 4301 is rightward-facing, and the right end camera 4302 is leftward-facing. These cameras may be used for example to detect when a user's hand moves into or out of the shelf area. These cameras 4301 and 4302 may be used in combination with similar cameras on shelves above and/or below shelf 4212 in a shelving unit (such as shelves 4211 and 4213 in FIG. 42) to detect hand events. For example, the system may use multiple hand detection cameras to triangulate the position of a hand going into a shelf. With two cameras observing a hand, the position of a hand can be determined from the two images. With multiple cameras (for example four or more) observing a shelf, the system may be able to determine the position of more than one hand at a time since the multiple views can compensate for potential occlusions. 
Images of the shelf just prior to a hand entry event may be compared to images of the shelf just after a hand exit event, in order to determine which item or items may have been taken, moved, or added to the shelf. In one or more embodiments other detection technologies may be used instead of or in addition to the cameras 4301 and 4302 to detect hand entry and hand exit events for the shelf; these technologies may include for example, without limitation, light curtains, sensors on a door that must be opened to access the shelf or the shelving unit, ultrasonic sensors, and motion detectors; Paragraph [0312]: FIG. 53B shows an item storage area before a shopper reaches into the shelf with hand 5302, and FIG. 53A shows this item storage area after the shopper interacts with the shelf to remove items. The entire item storage area 5320 is the volume between shelves 4213 and 4212. Detection of the interaction of hand 5302 with this item storage area may be performed for example by analyzing images from side-facing cameras 4301 and 4302 on shelf 4212. Side-facing cameras from other shelves may also be used, such as the cameras 5311 and 5312 on shelf 4213. In one or more embodiments other sensors may be used instead of or in addition to cameras to detect the interaction of the shopper with the item storage area. Typically the shopper interacts with an item storage area by reaching a hand 5302 into the area; however, one or more embodiments may track any type of interaction of a shopper with an item storage area, via any part of the shopper's body or any instrument or tool the shopper may use to reach into the area or otherwise interact with items in the area). Bapst teaches that this will allow for increasing accuracy of attribution of items with shoppers (Abstract). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Buibas with the feature of taking a first series of images of hands from different individuals, as taught by Bapst, so as to increase the accuracy of attribution of items to shoppers, as presented by Bapst.
	Further, Gao teaches neural network object detection and model generation (Fig. 4; Paragraph [0115]), further comprising wherein the random augmentations comprises contrast, compression artifacts, Gaussian blur, translations, and flipping (Fig. 6; Paragraph [0024]: the augmentation operations include modifying one or more properties of the captured image, the properties including: brightness, contrast, a hue for each RGB channel, rotation, blur, sharpness, saturation, size, and padding; or performing one or more operations on the captured image, the operations including: histogram equalization, embossing, flipping, adding random noise, adding random dropout, edge detection, piecewise affine, pooling, and channel shuffle; Paragraph [0113]: training data set comprising the normalized images of the selected merchandise item is extended by applying one or more augmentation operations to each normalized image. The augmentation process thereby yields at least one, and in many embodiments, multiple augmented images for each normalized image. The training data comprising the combination of the normalized images and augmented images for the selected merchandise item is therefore extended multiple times over in comparison to the training data set (of step 614) comprising only the normalized images. As mentioned previously, the image augmentation operations can include, but are not limited to: brightness adjustment, contrast adjustment, adding random noise, independently adjusting hue of RGB channels (or channels within various color spaces, including but not limited to sRGB, Adobe RGB, ProPhoto, DCI-P3, Rec 709 or various other color spaces as would be appreciated by one of ordinary skill in the art), random dropout, rotation, blurring, adjusting sharpness, adjusting saturation, embossing, flipping, edge detection, piecewise affine transformation, pooling, scaling, padding, channel shuffling, etc. 
By applying different combinations of one or more image augmentation operations to a normalized image, the effect or impact of different lighting conditions can be simulated without having to make physical lighting adjustments during the original process of obtaining the series of captured images). Gao teaches that this will allow for different lighting conditions to be simulated without having to make physical lighting adjustments (Paragraph [0113]) and that captured images may contain images of the merchandise item in a variety of different perspectives, angles, positions, lighting conditions, etc., which can ultimately assist in creating a more robust training data set (Paragraph [0059]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Buibas, in view of Bapst, with the features wherein the random augmentations comprise compression artifacts, Gaussian blur, contrast, translations, and flipping as taught by Gao so as to assist in creating a more robust training data set as presented by Gao.
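For illustration only, and not as a reproduction of any cited reference, the following minimal sketch shows how random augmentations of the kind discussed above (flipping, translation, contrast adjustment, and blur) might be applied to a small grayscale image held as a list of pixel rows. All function names are hypothetical, the 3x3 mean filter is a simple stand-in for Gaussian blur, and compression artifacts are not modeled.

```python
import random

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def translate(img, dx, fill=0):
    """Shift pixels dx columns right (left if dx is negative), padding with fill."""
    width = len(img[0])
    out = []
    for row in img:
        if dx >= 0:
            shifted = [fill] * min(dx, width) + row[:max(width - dx, 0)]
        else:
            shifted = row[-dx:] + [fill] * min(-dx, width)
        out.append(shifted)
    return out

def adjust_contrast(img, factor):
    """Scale each pixel's distance from mid-gray (128) by factor, clamped to 0..255."""
    return [[max(0, min(255, round(128 + factor * (p - 128)))) for p in row]
            for row in img]

def box_blur(img):
    """3x3 mean filter (assumed stand-in for Gaussian blur)."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neighbors = [img[j][i]
                         for j in range(max(0, y - 1), min(h, y + 2))
                         for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(round(sum(neighbors) / len(neighbors)))
        out.append(row)
    return out

def random_augment(img, rng):
    """Apply each operation with probability 0.5, using the supplied random.Random."""
    out = img
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    if rng.random() < 0.5:
        out = translate(out, rng.randint(-1, 1))
    if rng.random() < 0.5:
        out = adjust_contrast(out, rng.uniform(0.5, 1.5))
    if rng.random() < 0.5:
        out = box_blur(out)
    return out
```

Calling random_augment several times on the same image with a seeded random.Random yields multiple augmented variants per original image while preserving the image dimensions.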
Regarding claim 13, Buibas, in view of Bapst, and further in view of Gao teaches the method of claim 12, Buibas discloses wherein computer graphics technology is configured to change the multiple lighting levels and angles with software (Fig. 5; Paragraphs [0044]-[0047]: Imaging system 110 may contain or may be coupled to a controller 116, which may communicate with and control system components such as identification input device 111, operator terminal 112, monitor screen or screens 113, variable illumination lights 115a through 115e, and cameras 114a through 114h. This controller 116 may contain any type or types of processor, such as for example a microprocessor, microcontroller, or single board computer. In one or more embodiments the controller 116 may be a computer that is physically remote from but coupled to the physical imaging system 110. In one or more embodiments the operator terminal 112 may be a computer that also acts as controller 116. Controller 116 executes a sequence of operations, described below, to change the imaging environment and to capture images 120 of the item… Training system 125 may include a processor or processors 123, which may for example perform image processing operation 124 and training operation 122. In one or more embodiments, controller processor 116 and training system process 123 may be identical or may share components. Processor or processors 123 may for example include GPUs to parallelize image processing and training operations. In one or more embodiments, processor or processors 123 and training dataset 121 may be remote from item imaging system 110, and images 120 may be transferred over a network connection to the training system 125…FIG. 2 shows an architectural block diagram of the embodiment of FIG. 1. The two major subsystems of the embodiment are item imaging system 110, and item classifier training system 125. 
Items 101 are placed into item imaging system 110; images and item identities are passed from the item imaging system to the item classifier training system. In item imaging system 110, controller 116 is coupled to and controls all other components, including monitor screen or screens 113, cameras 114, variable illumination lights 115, item identification input 111, and operator terminal 112. Item classifier training system 125 has a processor (or processors) 123, which is connected to training dataset 121 and to item classifier 130; processor 123 processes the images from cameras 114, builds the training dataset 121, and performs the training of the classifier; Paragraphs [0050]-[0053]: place monitor screens, cameras, and lights in any locations and orientations, to support image capture from any angles under any desired background and lighting conditions. In one or more embodiments, the transparent platform 401 may be a one-way mirror so that cameras may be placed directly underneath the platform without interfering with images captured from the cameras above the item…FIG. 5 shows a flowchart of illustrative steps performed by one or more embodiments of the invention to capture item images under different orientations and conditions. Outer loop 500 is repeated for each item that needs to be recognized by the item classifier (for example, for all items in a store's catalog or inventory). In step 501, an item barcode or other identifier is read, for example by a barcode scanner or camera, which obtains the item identifier 521 (such as a SKU). Then loop 502 is repeated for each different pose into which the item must be placed for imaging. A prompt 503 may be generated to instruct the operator to place the item into the desired pose; the operator may perform step 504 to put the item into the imaging system in this pose 522).
Regarding claim 14, Buibas, in view of Bapst, and further in view of Gao teaches the method of claim 12, Buibas discloses wherein an object is placed near the item to achieve partial occultation (Fig. 7; Paragraph [0053]: item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition. This process is illustrated in FIG. 7 for images 525 from the first camera. In step 701, mask 620 is applied to the images 525, yielding images 702 of the item alone (without a background). In one or more embodiments, these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf).
Regarding claim 15, Buibas, in view of Bapst, and further in view of Gao teaches a method of Claim 12, Buibas discloses wherein the item and the different backgrounds are composed to simulate images of real stores with occlusion and real store lighting condition (Fig. 7; Paragraph [0053]: item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition. This process is illustrated in FIG. 7 for images 525 from the first camera. In step 701, mask 620 is applied to the images 525, yielding images 702 of the item alone (without a background). In one or more embodiments, these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf).
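For illustration only, the masking, occlusion, and background addition steps quoted above can be sketched as follows. This is a simplified, hypothetical example operating on small grayscale pixel grids; it assumes a binary mask (1 for item pixels, 0 for background) and is not the implementation of any cited reference.

```python
def apply_mask(image, mask):
    """Keep pixels where the mask is 1; mark all others as None (no background)."""
    return [[p if m else None for p, m in zip(irow, mrow)]
            for irow, mrow in zip(image, mask)]

def add_occlusion(foreground, top, left, height, width, value=0):
    """Overwrite a rectangle of the foreground to simulate a partial occlusion."""
    out = [row[:] for row in foreground]
    for y in range(top, min(top + height, len(out))):
        for x in range(left, min(left + width, len(out[0]))):
            out[y][x] = value
    return out

def composite(foreground, background):
    """Place the foreground over a background; None pixels show the background."""
    return [[b if f is None else f for f, b in zip(frow, brow)]
            for frow, brow in zip(foreground, background)]

# Hypothetical 2x2 example: extract the item, occlude one pixel, add a background.
image = [[10, 20], [30, 40]]
mask = [[1, 0], [1, 1]]
background = [[1, 2], [3, 4]]

item_only = apply_mask(image, mask)
occluded = add_occlusion(item_only, top=1, left=1, height=1, width=1)
training_image = composite(occluded, background)
```

Repeating the composite step with different backgrounds produces several training images from one extracted item image.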
Regarding claim 16, Buibas, in view of Bapst, and further in view of Gao teaches a method of Claim 12, Bapst discloses wherein the set of training images are mixed with real images in a real store in a randomized way (Figs. 57A-57B; Paragraphs [0329]-[0330]: Projected before and after images may be compared to determine an approximate region in which items may have been removed, added, or moved. This comparison is illustrated in FIG. 57A. Projected before image 5701b is compared to projected after image 5701a; these images are both from the same camera, and are both projected to the same surface. One or more embodiments may use any type of image comparison to compare before and after images. For example, without limitation, image comparison may be a pixel-wise difference, a cross-correlation of images, a comparison in the frequency domain, a comparison of one image to a linear transformation of another, comparisons of extracted features, or a comparison via a trained machine learning system that is trained to recognize certain types of image differences…FIG. 57B illustrates image differencing on before projected image 5711b and after projected image 5711a captured from an actual sample shelf. The difference image 5712 has a noisy region 5713 that is filtered and bounded to identify a change region).
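For illustration only, the pixel-wise difference and bounding of a change region quoted above can be sketched as follows. The threshold-based filtering and the function names are assumptions made for this example and are not drawn from the cited reference.

```python
def difference_image(before, after):
    """Absolute pixel-wise difference between two same-sized grayscale images."""
    return [[abs(a - b) for a, b in zip(arow, brow)]
            for arow, brow in zip(after, before)]

def change_region(diff, threshold):
    """Bounding box (top, left, bottom, right) of pixels whose difference
    exceeds the threshold, or None if no pixel does."""
    coords = [(y, x)
              for y, row in enumerate(diff)
              for x, value in enumerate(row)
              if value > threshold]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))

# Hypothetical 3x3 example: small noise at (0, 0), a real change at (1, 1)-(1, 2).
before = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
after = [[5, 0, 0], [0, 200, 180], [0, 0, 0]]

diff = difference_image(before, after)
region = change_region(diff, threshold=50)
```

The threshold discards the low-amplitude noise pixel so that only the larger change contributes to the bounded region.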
Regarding claim 17, Buibas, in view of Bapst, and further in view of Gao teaches a method of Claim 12, Buibas discloses the set of training images are generated by a process of composition (Figs. 6-8; Paragraphs [0045]-[0047]: Images 120 of item 102 captured by cameras 114a through 114h are then used to train the visual item classifier 130 that may be used to recognize items from images captured during store operations. The classifier training system 125 may first process the item images 120 to generate training images of the item. Illustrative steps for image processing operation 124 are illustrated below with respect to FIGS. 6 and 7. Training images of all items 101 are labeled with the item identities as captured by input device 111. The labeled images are added to a training dataset 121. The training dataset is input into a training process 122 that trains the visual item classifier… Training system 125 may include a processor or processors 123, which may for example perform image processing operation 124 and training operation 122. In one or more embodiments, controller processor 116 and training system process 123 may be identical or may share components. Processor or processors 123 may for example include GPUs to parallelize image processing and training operations. In one or more embodiments, processor or processors 123 and training dataset 121 may be remote from item imaging system 110, and images 120 may be transferred over a network connection to the training system…two major subsystems of the embodiment are item imaging system 110, and item classifier training system 125. Items 101 are placed into item imaging system 110; images and item identities are passed from the item imaging system to the item classifier training system. 
In item imaging system 110, controller 116 is coupled to and controls all other components, including monitor screen or screens 113, cameras 114, variable illumination lights 115, item identification input 111, and operator terminal 112. Item classifier training system 125 has a processor (or processors) 123, which is connected to training dataset 121 and to item classifier 130; processor 123 processes the images from cameras 114, builds the training dataset 121, and performs the training of the classifier 130. These components are illustrative; one or more embodiments may have different components, a subset of these components, or components organized with different connections; Paragraphs [0052]-[0055]: FIGS. 6 and 7 show illustrative steps to implement image processing step 124 that transforms images 523 and 524 into training data for the item classifier. These steps may be performed automatically by one or both of the imaging system controller or by the processor or processors of the training system. An initial processing step, illustrated in FIG. 6, may generate a mask of the item that may be used to separate the item image from the background. Variation of monitor screen background colors (in loop 505 of FIG. 5) facilitates this mask extraction step, since the item in the foreground can be identified as the portion of an image that does not change dramatically when the background color changes. An item mask may be generated for each camera… item foreground mask 620 (for each camera) may then be applied to the images 524 captured for each combination of camera and lighting condition. This process is illustrated in FIG. 7 for images 525 from the first camera. In step 701, mask 620 is applied to the images 525, yielding images 702 of the item alone (without a background). In one or more embodiments, these extracted item images 702 may be modified in various ways to generate training images that are added to training dataset 121. 
For example, any data augmentation techniques commonly applied to image data for machine learning may be applied to images 702. FIG. 7 shows illustrative examples of image rotation 711, scaling 712, color shifting 713, and adding occlusions 714. A background addition step 720 may then be applied to the transformed item foreground images, yielding for example images 721, 722, 723, and 724 that may be added to the training dataset 121 (labeled with the item identifier). Backgrounds may be selected randomly, or they may be selected to match possible backgrounds expected during store operations, such as patterns on store shelves or other items that may be placed on the same shelf… Training dataset 121 containing labeled item images (transformed for example as shown in FIG. 6) may then be used to train the visual item classifier. One or more embodiments may use any type or types of classifier and any type or types of machine learning algorithms to train the classifier. FIG. 8 shows an illustrative architecture that may be used in one or more embodiments. The visual item classification system 130 may be structured in two stages: an initial feature extractor phase 801 that maps images 800 (as pixel arrays) into feature vectors 802, and a classifier phase 803 that classifies images based on the feature vector 802 generated by the first phase 801. The feature extractor 801 may be for example any module that maps image pixels into a feature vector; examples include, without limitation, a neural network, a convolutional neural network, a color histogram vector, a histogram of oriented gradients, a bag of visual words histogram constructed from SURF or other traditional computer vision features, or a concatenation of any of the above. 
The classifier 803 may be for example, without limitation, a K-nearest neighbor classifier, logistic regression, a support vector machine, a random forest classifier, Adaboosted decision trees, and a neural network which may be for example fully connected).
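For illustration only, the two-stage arrangement quoted above (a feature extractor followed by a classifier) can be sketched with a histogram feature vector and a K-nearest neighbor classifier. The grayscale histogram, the bin count, and the labels are hypothetical choices for this example, and ties between labels are not handled when k is greater than 1.

```python
def color_histogram(pixels, bins=4):
    """Map grayscale pixels (0..255) to a normalized histogram feature vector."""
    counts = [0] * bins
    for p in pixels:
        counts[min(p * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [c / total for c in counts]

def knn_classify(feature, labeled_features, k=1):
    """Return the most common label among the k nearest labeled feature vectors,
    using squared Euclidean distance."""
    ranked = sorted(
        labeled_features,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(feature, item[0])),
    )
    top_labels = [label for _, label in ranked[:k]]
    return max(set(top_labels), key=top_labels.count)

# Hypothetical training set: one dark item image and one bright item image.
training = [
    (color_histogram([10, 20, 30, 40]), "dark"),
    (color_histogram([200, 220, 240, 250]), "bright"),
]

predicted = knn_classify(color_histogram([5, 15, 25, 60]), training)
```

Keeping the feature extractor separate from the classifier means either stage could be swapped, for example replacing the histogram with a learned feature vector, without changing the other.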
Regarding claim 18, Buibas, in view of Bapst, and further in view of Gao teaches a method of Claim 12, Buibas discloses the set of training images is configured to train a deep learning model to recognize a new product that has not been seen in real stores (Paragraph [0004]: “onboarding” process to set up the item images for a store can be extremely time-consuming, particularly for stores with thousands of items and high item turnover as packaging for items changes over time and new items introduced. A typical workflow used in the art for this onboarding process is to manually capture images of each product from various angles and under various conditions. Further manual processing is typically required to crop and prepare item images for a training dataset. The process to onboard a single item may take 15 to 30 minutes. For stores with large numbers of items, onboarding the store's complete catalog may take multiple months, at which time many of the product's packaging may have changed. There are no known systems that automate the onboarding process so that multiple item images can be captured and prepared quickly and with minimal labor).
	
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW D SALVUCCI whose telephone number is (571)270-5748. The examiner can normally be reached M-F: 7:30-4:00PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, XIAO WU, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MATTHEW SALVUCCI/Primary Examiner, Art Unit 2613