DETAILED ACTION
Response to Amendment
The amendment was received 3/4/2022. Claims 1-8 are pending.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 






As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 




Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
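For illustration only, the three-prong test of MPEP § 2181, subsection I, quoted above can be sketched as a small decision function. The names `Limitation` and `interpreted_under_112f` are hypothetical and do not come from the MPEP. Whether a term is a generic placeholder, and whether the recited structure, material, or acts are sufficient, remain legal determinations that the sketch simply takes as inputs.

```python
from dataclasses import dataclass


@dataclass
class Limitation:
    """Hypothetical record of findings already made about one claim limitation."""
    uses_means_or_step: bool            # literal "means" or "step" appears
    uses_generic_placeholder: bool      # nonce term used as a substitute for "means"
    has_functional_language: bool       # e.g. linked by "for" or "configured to"
    recites_sufficient_structure: bool  # structure, material, or acts to entirely perform the function


def interpreted_under_112f(lim: Limitation) -> bool:
    """Apply prongs (A), (B), and (C); all three must be met."""
    prong_a = lim.uses_means_or_step or lim.uses_generic_placeholder
    prong_b = lim.has_functional_language
    prong_c = not lim.recites_sufficient_structure
    return prong_a and prong_b and prong_c


# A "step" limitation that recites sufficient acts fails prong (C),
# so the presumption created by the word "step" is rebutted.
step_with_acts = Limitation(True, False, True, True)
print(interpreted_under_112f(step_with_acts))  # False
```

In this sketch, a limitation that uses “step” with functional language but also recites sufficient acts is not interpreted under 35 U.S.C. 112(f), which mirrors the rebuttal of the presumption described above.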

This application includes one or more claim limitations that use the word “means” or “step” but are nonetheless not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph because the claim limitation(s) recite(s) sufficient structure, materials, or acts to entirely perform the recited function.  Such claim limitation(s) is/are: 
“receiving…classifying…detecting…identifying…and identifying…by the computing device” in claim 1;

“determining…generating…and presenting…by the computing device” in claim 2;

“receiving…and generating…by the computing device” in claim 3; and

“receiving…verifying…and sending…by the computer device” in claim 4.

Because this/these claim limitation(s) is/are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.
If applicant intends to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to remove the structure, materials, or acts that performs the claimed function; or (2) present a sufficient showing that the claim limitation(s) does/do not recite sufficient structure, materials, or acts to perform the claimed function.






Accordingly the following definitions are “taken” via MPEP 2111.01 III. "PLAIN MEANING" REFERS TO THE ORDINARY AND CUSTOMARY MEANING GIVEN TO THE TERM BY THOSE OF ORDINARY SKILL IN THE ART, 3rd paragraph, emphasis added:
“It is also appropriate to look to how the claim term is used in the prior art, which includes prior art patents, published applications, trade publications, and dictionaries. Any meaning of a claim term taken from the prior art must be consistent with the use of the claim term in the specification and drawings. Moreover, when the specification is clear about the scope and content of a claim term, there is no need to turn to extrinsic evidence for claim interpretation. 3M Innovative Props. Co. v. Tredegar Corp., 725 F.3d 1315, 1326-28, 107 USPQ2d 1717, 1726-27 (Fed. Cir. 2013) (holding that "continuous microtextured skin layer over substantially the entire laminate" was clearly defined in the written description, and therefore, there was no need to turn to extrinsic evidence to construe the claim).”

The claimed “method” (as in “A method for object detection and identification, the method comprising” in claim 1) is interpreted in light of applicant’s disclosure, as discussed in the below Suggestions, and definition thereof via Dictionary.com wherein “orderly or systematic…sequence” is “taken” as the meaning of the claimed “method” via MPEP 2111.01 III:
method
noun
4	orderly or systematic arrangement, sequence, or the like.







The claimed “by” (as in “receiving, by a computing device, an image” in claim 1) is interpreted under the broadest reasonable interpretation as one of skill in the art would given applicant’s disclosure and via definition thereof via Dictionary.com, wherein definitions 1-24 are equally applicable:
by	
preposition
1	near to or next to:
a home by a lake.
2	over the surface of, through the medium of, along, or using as a route:
He came by the highway. She arrived by air.
3	on, as a means of conveyance:
They arrived by ship.
4	to and beyond the vicinity of; past:
He went by the church.
5	within the extent or period of; during:
by day; by night.
6	not later than; at or before:
I usually finish work by five o'clock.
7	to the extent or amount of:
The new house is larger than the old one by a great deal. He's taller than his sister by three inches.
8	from the opinion, evidence, or authority of:
By his own account he was in Chicago at the time. I know him by sight.

9	according to; in conformity with:
This is a bad movie by any standards.
10	with (something) at stake; on:
to swear by all that is sacred.
11	through the agency, efficacy, work, participation, or authority of:
The book was published by Random House.
12	from the hand, mind, invention, or creativity of:
She read a poem by Emily Dickinson. The phonograph was invented by Thomas Edison.
13	in consequence, as a result, or on the basis of:
We met by chance. We won the game by forfeit.
14	accompanied with or in the atmosphere of:
Lovers walk by moonlight.
15	in treatment or support of; for:
He did well by his children.
16	after; next after, as of the same items in a series:
piece by piece; little by little.
17	(in multiplication) taken the number of times as that specified by the second number, or multiplier:
Multiply 18 by 57.
18	(in measuring shapes) having an adjoining side of, as a width relative to a length:
a room 10 feet by 12 feet.
19	(in division) separated into the number of equal parts as that specified by the second number, or divisor:
Divide 99 by 33.
20	in terms or amounts of; in measuring units of:
Apples are sold by the bushel. I'm paid by the week.
21	begot or born of:
Eve had two sons by Adam.
22	(of quadrupeds) having as a sire:
Equipoise II by Equipoise.
23	Navigation. (as used in the names of the 16 smallest points on the compass) one point toward the east, west, north, or south of N, NE, E, SE, S, SW, W, or NW, respectively:
He sailed NE by N from Pago Pago.
24	into, at, or to:
Come by my office this afternoon.

The claimed “screenshot”, an adjective, (as in “the image is screenshot captured by the user device from a display” of claim 1) is interpreted as one of skill in the art would in light of applicant’s disclosure and definition thereof via Dictionary.com:
screenshot
noun
1	Also called screen cap·ture , screen·cap. 
a copy or image of what is seen on a computer monitor or other screen at a given time:
Save the screenshot as a graphics file.
verb (used with object) screen·shot or screen·shot·ted, screen·shot·ting.
2	to take a screenshot of:
You can screenshot the error message and send it to me.

BRITISH DICTIONARY DEFINITIONS FOR SCREENSHOT
screenshot
noun
1	an image created by copying part or all of the display on a computer screen at a particular moment, for example in order to demonstrate the use of a piece of software

The claimed “detecting” (as in “detecting, by the computing device, one or more objects contained within the image” in claim 1) is interpreted in light of applicant’s disclosure and definition thereof via Dictionary.com wherein “to perceive or notice” is “taken” as the meaning of the claimed “detecting” via MPEP 2111.01 III:
BRITISH DICTIONARY DEFINITIONS FOR DETECT
detect
verb (tr)
1	to perceive or notice: to detect a note of sarcasm

The claimed “non-” (as in “the one or more objects is or are delaminated into retail objects and non-retail objects” in claim 1) is interpreted in light of applicant’s disclosure and definition thereof via Dictionary.com wherein “indicating the opposite of something” is “taken” as the meaning of the claimed “non-” via MPEP 2111.01 III:
BRITISH DICTIONARY DEFINITIONS FOR NON-
non-
prefix
1	indicating negation: nonexistent

wherein “negation” is defined:
BRITISH DICTIONARY DEFINITIONS FOR NEGATION
negation
noun
1	the opposite or absence of something








The claimed “source” (as in “identifying…one or more sources” in claim 1) is interpreted in light of applicant’s disclosure and definition thereof via Dictionary.com wherein any one of:
a.	any thing or place from which something comes, arises, or is obtained; origin; or
b.	a book, statement, person, etc., supplying information; or
c.	a manufacturer or supplier

 is “taken” as the meaning of the claimed “source” via MPEP 2111.01 III:
source
noun
1	any thing or place from which something comes, arises, or is obtained; origin:
Which foods are sources of calcium?
3	a book, statement, person, etc., supplying information.
5	a manufacturer or supplier.

The claimed “the objects” in claim 2, last line is interpreted under the broadest reasonable interpretation in light of applicant’s disclosure in the context of “multiple objects to be identified and located within the same image” via applicant’s disclosure:
[0002] Humans are capable of looking at an image or watching a video and readily identifying, people, objects, scenes, and other visual details. Object recognition has become an ever increasingly important facet of modern technology. Object recognition, with respect to technology, is a computer vision technique for identifying objects in images or videos. Object recognition techniques may use various means to identify objects such as deep learning and machine learning algorithms. Further, object recognition techniques may be combined with object detection techniques. Object detection and object recognition are similar techniques for identifying objects, but they vary in their execution. Object detection is the process of finding instances of objects in images. In the case of deep learning, object detection is a subset of object recognition, where the object is not only identified but also located in an image. This allows for multiple objects to be identified and located within the same image.




The claimed “request” (as in “a request to acquire a retail object from the user device from one of the one or more sources” in claim 4) is interpreted in light of applicant’s disclosure and definition thereof via Dictionary.com, definitions 1-5 are equally applicable:
request, noun
1	the act of asking for something to be given or done, especially as a favor or courtesy; solicitation or petition:
At his request, they left.
2	an instance of this:
There have been many requests for the product.
3	a written statement of petition:
If you need supplies, send in a request.
4	something asked for:
to obtain one's request.
5	the state of being asked for; demand.


Response to Arguments
Applicant’s arguments, see remarks, pages 5-8, filed 3/4/2022, with respect to:
Claim objection of claims 1-8;
Double patenting rejection of claims 1-5,7 and 8;
Double patenting rejection of claim 6;
35 USC 103 rejection of claims 1,2,5,6 and 7;
35 USC 103 rejection of claim 3;
35 USC 103 rejection of claim 4; and
35 USC 103 rejection of claim 8 
in the Office action of 12/9/2021 have been fully considered and are persuasive.  
The:
Claim objection of claims 1-8;
Double patenting rejection of claims 1-5,7 and 8;
Double patenting rejection of claim 6;
35 USC 103 rejection of claims 1,2,5,6 and 7;
35 USC 103 rejection of claim 3;
35 USC 103 rejection of claim 4; and
35 USC 103 rejection of claim 8 
in the Office action of 12/9/2021 have been withdrawn. 
	Thus all objections and rejections are withdrawn.



Allowable Subject Matter
Claims 1-8 are allowed.
The following is an examiner’s statement of reasons for allowance:
The claims are allowed for the same reasons as in applicant’s remarks, pages 6-8.
For example, the cited art does not teach the claimed image as delaminated in the context of the last limitation (claim 1, lines 14-17: analyzed as shown below).
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Claim 1 is analyzed in view of IDS cited Rhoads et al. (US Patent App. Pub. No.: US 2014/0080428 A1) in view of previously cited RODRIGUEZ et al. (US Patent App. Pub. No.: US 2019/0294932 A1) and Oramas et al. (MULTI-LABEL MUSIC GENRE CLASSIFICATION FROM AUDIO, TEXT, AND IMAGES USING DEEP FEATURES) further in view of Xue et al. (Deep Texture Manifold for Ground Terrain Recognition) and Jung et al. (US 2017/0206579 A1).
Regarding claim 1, Rhoads teaches a method for object detection and identification, the method comprising:
receiving (expressing the result of receive via “received”, cited below: [0566], as indicated in figs. 1 and 3:zig-zag lines, represented in figs. 41,42,43,46A as an image of a drill, under the Eiffel Tower, a house and the statue of Prometheus), by (as indicated in fig. 3) a computing device (fig. 3:antenna: “CROWN CASTLE AMERICAN TOWER SBA COMM…”), an image (or “image” of “Godzilla”, cited below: [0015]) from a user device (or “FIG 0”: box with buttons), wherein the image is screenshot captured by the user device (said fig. 0: box with buttons) from a display (comprised by said fig. 0: box with buttons); 






classifying (expressing the result of the verb class resulting in “categories” represented in fig. 41:top-right: “…Image eigenvalues; Image classifier concludes ‘drill’…” and fig. 41: “SEARCH ENGINE”: detailed view: fig. 46A:bottom: “DETERMINE WHICH IMAGE METRICS RELIABLY GROUP LIKE-CLASSED IMAGE TOGETHER, AND DISTINGUISH DIFFERENTLY-CLASSED IMAGES”), by (said as indicated in fig. 3) the computing device (said fig. 3:antenna: “CROWN CASTLE AMERICAN TOWER SBA COMM…”), the image (said or “image” of “Godzilla”), wherein the image (said or “image” of “Godzilla”) is classified (into said “categories”) based on features (or “image features/characteristics/metrics”) present in the image (said or “image” of “Godzilla”); 














detecting (expressing the result of detect via comparing via fig. 46B: “PERFORM SIMILARITY TESTING BETWEEN INPUT IMAGE AND EACH IMAGE IN SET 1” represented in fig. 41: “SEARCH ENGINE”), by (said as indicated in fig. 3) the computing device (said fig. 3:antenna: “CROWN CASTLE AMERICAN TOWER SBA COMM…”), one or more objects (comprised by “objects in the captured frame”) contained within the image (said or “image” of “Godzilla”), wherein the one or more objects is or are delaminated (or divided via classification into “other classes”, [0743], 1st S) into retail (via “bookstore”, [0302], 1st S regarding fig. 41: bottom: “… ‘Buy,’  ‘Sell,’… ” or “consumer product/non-consumer product”, [0743], 2nd S, comprising “at either wholesale or retail”1) objects (via said “other classes” such that “The book is quickly recognized”, [0302] 2nd S in a respective class at wholesale or retail) and non-retail (via fig. 41:bottom: “… ‘Buy,’ ‘Sell,’… ” or “consumer product/non-consumer product”a, [0743], 2nd S, comprising “at either wholesale or retail”) objects (via said “other classes” regarding said “… ‘Buy,’ ‘Sell,’… ” at either wholesale or retail); and wherein each of the one or more objects (said comprised by “objects in the captured frame”) is a salient (via “salient” “feature” “metrics”) object (comprised by “objects in the captured frame” measured via said “salient” “feature” “metrics”); 






identifying (expressing the result of identify such that “each object is identified” via “classification techniques…used to identify”, [0474], 2nd S), by (said as indicated in fig. 3) the computing device (said fig. 3:antenna: “CROWN CASTLE AMERICAN TOWER SBA COMM…”), each of the one or more objects (comprised by “objects in the captured frame”) detected in the image (said or “image” of “Godzilla”), wherein each of the one or more objects is identified (via said such that “each object is identified”) using multi-modal learning techniques, and wherein the multi-modal learning techniques comprise a Barnes-Hut approximation; and 














identifying (expressing the result of identify via either of an “addressing” “scheme” or “identified” “sources” represented in fig. 41: “DATABASE OF PUBLIC IMAGES, IMAGE-RELATED FACTORS, AND OTHER METADATA”, twice), by (said as indicated in fig. 3) the computing device (said fig. 3:antenna: “CROWN CASTLE AMERICAN TOWER SBA COMM…”), one or more sources (or “the cloud resource” via said “addressing” “scheme”, represented in fig. 10A: “oval on the left”, or “Collections…and other content…resources can also serve as” “identified” “sources”) resulting in identified sources (via said “identified” “sources” “such as the GM trucks web site, Flickr, and a fan site devoted to identifying vehicles in Hollywood motion pictures: IMCDB-dot-com”, [0875], 2nd S, represented in said fig. 41: “DATABASE OF PUBLIC IMAGES, IMAGE-RELATED FACTORS, AND OTHER METADATA”, twice) of (“of” is “used to indicate possession, connection or association”: Dictionary.com) each of (“of” is “used to indicate possession, connection or association”) the one or more objects (comprised by “objects in the captured frame”: fig. 9: “My Car”: “ ’06 Vista”: vehicle objects) in the image (said or “image” of “Godzilla”), wherein the identified sources (or identified collections of images serving as sources of metadata such as said “My Car”: “ ’06 Vista”) include one or more locations (or store map locations via fig. 51: “PLACE”: “THING”: “GROCERY”: “FOOD”: “(c) Identify local stores that sell”) where the retail objects can be acquired (via “houses for sale nearest in location”, [0572], 2nd S or “cars for sale...including…seller location”, [0574]), and one or more venues (or a stadium: “the Pepsi Center in Denver”: “the venue”, [0149], 2nd and 3rd Ss) where the non-retail objects can be viewed (via:


“[0014] Certain aspects of the technology detailed herein are introduced in FIG. 0.  A user's mobile phone captures imagery (either in response to user command, or autonomously), and objects within the scene are recognized.  Information associated with each object is identified, and made available to the user through a scene-registered interactive visual "bauble" that is graphically overlaid on the imagery.  The bauble may itself present information, or may simply be an indicia that the user can tap at the indicated location to obtain a lengthier listing of related information, or launch a related function/application.”;

“[0015] In the illustrated scene, the camera has recognized the face in the foreground as "Bob" and annotated the image accordingly.  A billboard promoting the Godzilla movie has been recognized, and a bauble saying "Show Times" has been blitted onto the display--inviting the user to tap for screening information.”;

“[0105] Relatedly, it seems that there should be a common denominator set of "device-side" operations performed on visual data that will serve all cloud processes, including certain formatting, elemental graphic processing, and other rote operations.  Similarly, it seems there should be a standardized basic header and addressing scheme for the resulting communication traffic (typically packetized) back and forth with the cloud.”

“[0125] Elements of the foregoing are distilled in FIG. 10A, showing an implementation of aspects of the technology as a physical matter of (usually) software components.  The two ovals in the figure highlight the symmetric pair of software components which are involved in setting up a "human real-time" visual recognition session between a mobile device and the generic cloud or service providers, data associations and visual query results.  The oval on the left refers to "keyvectors" and more specifically "visual keyvectors." As noted, this term can encompass everything from simple JPEG compressed blocks all the way through log-polar transformed facial feature vectors and anything in between and beyond.  The point of a keyvector is that the essential raw information of some given visual recognition task has been optimally pre-processed and packaged (possibly compressed).  The oval on the left assembles these packets, and typically inserts some addressing information by which they will be routed.  (Final addressing may not be possible, as the packet may ultimately be routed to remote service providers--the details of which may not yet be known.) Desirably, this processing is performed as close to the raw sensor data as possible, such as by processing circuitry integrated on the same substrate as the image sensor, which is responsive to software instructions stored in memory or provided from another stage in packet form.”






“[0451] In turn, the cloud resource may alert the cell phone of any information it expects might be requested from the phone in performance of the expected operation, or action it might request the cell phone to perform, so that the cell phone can similarly anticipate its own forthcoming actions and prepare accordingly.  For example, the cloud process may, under certain conditions, request a further set of input data, such as if it assesses that data originally provided is not sufficient for the intended purpose (e.g., the input data may be an image without sufficient focus resolution, or not enough contrast, or needing further filtering).  Knowing, in advance, that the cloud process may request such further data can allow the cell phone to consider this possibility in its own operation, e.g., keeping processing modules configured in a certain filter manner longer than may otherwise be the case, reserving an interval of sensor time to possibly capture a replacement image, etc.”

“[0472] Collections of publicly-available imagery and other content are becoming more prevalent.  Flickr, YouTube, Photobucket (MySpace), Picasa, Zooomr, FaceBook, Webshots and Google Images are just a few.  Often, these resources can also serve as sources of metadata--either expressly identified as such, or inferred from data such as file names, descriptions, etc. Sometimes geo-location data is also available.”;

“[0477] After feature metrics for the image are determined, a search is conducted through one or more publicly-accessible image repositories for images with similar metrics, thereby identifying apparently similar images.  (As part of its image ingest process, Flickr and other such repositories may calculate eigenvectors, color histograms, keypoint descriptors, FFTs, or other classification data on images at the time they are uploaded by users, and collect same in an index for public search.) The search may yield the collection of apparently similar telephone images found in Flickr, depicted in FIG. 22.”;

“[0296] She touches the virtual shutter button, capturing a frame of high resolution imagery, and image analysis gets underway--trying to recognize what's in the field of view, so that the camera application can overlay graphical links related to objects in the captured frame.  (Or this may happen without user action--the camera may be watching proactively.)”;

“[0566] An illustrative usage model is as follows.  A system responds to an image 128 (either optically captured or wirelessly received) by displaying a collection of related images to the user, on the cell phone display.  For example, the user captures an image and submits it to a remote service.  The service determines image metrics for the submitted image (possibly after pre-processing, as detailed above), and searches (e.g., Flickr) for visually similar images.  These images are transmitted to the cell phone (e.g., by the service, or directly from Flickr), and they are buffered for display.  The service can prompt the user, e.g., by instructions presented on the display, to repeatedly press the right-arrow button 116b on the four-way controller (or press-and-hold) to view a sequence of pattern-similar images (130, FIG. 45A).  Each time the button is pressed, another one of the buffered apparently-similar images is displayed.”; 
“[0663] A fixed set of image assessment criteria can be applied to distinguish images in the three categories.  However, the detailed embodiment determines such criteria adaptively.  In particular, this embodiment examines the set of images and determines which image features/characteristics/metrics most reliably (1) group like-categorized images together (similarity); and (2) distinguish differently-categorized images from each other (difference).  Among the attributes that may be measured and checked for similarity/difference behavior within the set of images are dominant color; color diversity; color histogram; dominant texture; texture diversity; texture histogram; edginess; wavelet-domain transform coefficient histograms, and dominant wavelet coefficients; frequency domain transfer coefficient histograms and dominant frequency coefficients (which may be calculated in different color channels); eigenvalues; keypoint descriptors; geometric class probabilities; symmetry; percentage of image area identified as facial; image autocorrelation; low-dimensional "gists" of image; etc. (Combinations of such metrics may be more reliable than the characteristics individually.)
[0664] One way to determine which metrics are most salient for these purposes is to compute a variety of different image metrics for the reference images.  If the results within a category of images for a particular metric are clustered (e.g., if, for place-centric images, the color histogram results are clustered around particular output values), and if images in other categories have few or no output values near that clustered result, then that metric would appear well suited for use as an image assessment criteria.  (Clustering is commonly performed using an implementation of a k-means algorithm.)”;

and

a	Dictionary.com:
consumer
noun
2	Economics. a person or organization that uses a commodity or service.

wherein “commodity” is defined:
commodity
noun, plural com·mod·i·ties.
1	an article of trade or commerce, especially a product as distinguished from a service.

wherein “trade” is defined:
trade
noun
1	the act or process of buying, selling, or exchanging commodities, at either wholesale or retail, within a country or between countries:
domestic trade; foreign trade.).
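
As a rough illustration of the metric-selection idea quoted from Rhoads [0664] above (a metric appears well suited when its values cluster within one category and other categories have few or no values near that cluster), consider the following minimal sketch. The function name `metric_is_salient`, the category labels, the use of the mean and standard deviation to describe the cluster, and the `spread` multiplier are all assumptions made for illustration. Rhoads mentions k-means clustering and does not prescribe this particular computation.

```python
from statistics import mean, pstdev


def metric_is_salient(values_by_category, target, spread=2.0):
    """Return True when the target category's values form a cluster that no
    value from any other category falls inside.

    The cluster is approximated (an assumption) as the interval
    mean +/- spread * population standard deviation of the target values.
    """
    own = values_by_category[target]
    center = mean(own)
    radius = spread * pstdev(own)
    for category, values in values_by_category.items():
        if category == target:
            continue
        # Any outside value landing inside the cluster disqualifies the metric.
        if any(abs(v - center) <= radius for v in values):
            return False
    return True


# Hypothetical metric outputs for three hypothetical image categories.
color_histogram_peak = {
    "place": [0.50, 0.52, 0.49, 0.51],
    "person": [0.10, 0.90],
    "thing": [0.20, 0.85],
}
edginess = {
    "place": [0.1, 0.9, 0.5],
    "person": [0.4, 0.6],
}

print(metric_is_salient(color_histogram_peak, "place"))  # True
print(metric_is_salient(edginess, "place"))              # False
```

Here the first metric is tightly clustered for the place-centric images and the other categories fall well outside that cluster, while the second metric is too spread out to separate the categories.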



Thus, Rhoads does not teach, as indicated in bold above, the claimed:
A.	“retail objects and non-retail objects”; 
B.	“using multi-modal learning techniques, and wherein the multi-modal learning techniques comprise a Barnes-Hut approximation”;
C.	“wherein the identified sources include… one or more venues where the non-retail objects can be viewed”.


Accordingly Rodriguez teaches:
A.	wherein the one or more objects (fig. 5:510) are delaminated (or classified) into retail objects (via fig. 5:516:items for sale) and non-retail (“non-” by definition is “indicating the opposite of something”) objects (via fig. 5:512,514: people and shopping carts indicating the opposite of retail or selling: indicating purchasing: the person and the shopping cart are not performing the action of selling).
C.	wherein the identified sources (or giving “source”b  or giving any thing or place from which something comes, arises, or is obtained to specific goods via manufacturers with trademarks via “branded items”, [0040], 1st S, comprising “carrying the brand or trademark of a manufacturer”b) include…one or more venues where the non-retail objects can be viewed (via “the person’s field of view relative to certain items”, [0015], last S).

b	Dictionary.com:
branded
adjective
2	Commerce. carrying the brand or trademark of a manufacturer:
branded merchandise.

wherein “trademark” is defined:
trademark
noun
1	any name, symbol, figure, letter, word, or mark adopted and used by a manufacturer or merchant in order to designate specific goods and to distinguish them from those manufactured or sold by others. A trademark is proprietary and is usually registered with the Patent and Trademark Office to assure its exclusive use by its owner or licensee.

wherein “manufacturer” is defined:
manufacturer
noun
2	a person, group, or company that manufactures.

wherein “manufactures” is defined:
manufacture
noun
verb (used with object), man·u·fac·tured, man·u·fac·tur·ing.
4	to make or produce by hand or machinery, especially on a large scale.

wherein “make” is defined:
make1
verb (used with object), made, mak·ing.
9	to give rise to; occasion:
It's not worth making a fuss over such a trifle.

wherein “rise” is defined:
rise
noun
45	origin, source, or beginning:
the rise of a stream in a mountain.

wherein “source” is defined:
source
noun
1	any thing or place from which something comes, arises, or is obtained; origin:
Which foods are sources of calcium?


	Thus, one of ordinary skill in the art of classifiers with stores can modify Rhoads’ said classification into “other classes” with Rodriguez’s teaching of said fig. 5:510 by:
a)	having BOB go to the store as shown in Rodriguez’s fig. 2;
b)	making Rhoads’ said fig. 0: box with buttons be as Rodriguez’s fig. 3:330: “PERSONAL DEVICE” or fig. 3:335: “EMPLOYEE DEVICE” by capturing an image of people shopping as shown in Rodriguez’s fig. 5:510; 
c)	making Rhoads’ classification into “other classes” be as Rodriguez’s teaching of fig. 5:516: items for sale and Rodriguez’s teaching of fig. 5:512,514: people with shopping carts, indicating the opposite of retail: purchasing, by classifying said image of people with carts and items as shown in Rodriguez’s fig. 5:510; and
d)	recognizing that the modification is predictable or looked forward to because Rhoads teaches “analysis/identification of the image within other classes can naturally be employed”, Rhoads [0743], “as would be expected”, wherein “naturally” is defined via Dictionary.com:
naturally
adverb
3	of course; as would be expected; needless to say; and
Rodriguez’s teaching of figs 4 and 5 is “efficient processing of image data”, [0027], 1st S.

Thus, the combination of Rhoads and Rodriguez does not teach:
B.	“using multi-modal learning techniques, and wherein the multi-modal learning techniques comprise a Barnes-Hut approximation”; and
C.	wherein the identified sources include… one or more venues where the non-retail objects can be viewed.

Oramas teaches limitation B of claim 1:
B.	using (or “exploit”) multi-modal learning techniques (via a “multimodal…learning approach”), and wherein the multi-modal learning techniques (said via a “multimodal…learning approach”) comprise a Barnes-Hut approximation (via:
section: 1 INTRODUCTION, 2nd paragraph:
“To this end, we present MuMu, a new large-scale multimodal dataset for multi-label music genre classification. MuMu contains information of roughly 31k albums classified into one or more 250 genre classes. For every album we analyze the cover image, text reviews, and audio tracks, with a total number of approximately 147k audio tracks and 447k album reviews. Furthermore, we exploit this dataset with a novel deep learning approach to learn multiple genre labels for every album using different data modalities (i.e., audio, text, and image). In addition, we combine these modalities to study how the different combinations behave.”).
	
Thus, one of ordinary skill in the art of vector classification and image/video/audio recognition, as indicated in Rhoads:
“[0476] (Uses of vector characterizations/classifications and other image/video/audio metrics in recognizing faces, imagery, video, audio and other patterns are well known and suited for use in connection with certain embodiments of the present technology. See, e.g., patent publications 20060020630 and 20040243567 (Digimarc), 20070239756 and 20020037083 (Microsoft), 20070237364 (Fuji Photo Film), U.S. Pat. No. 7,359,889 and U.S. Pat. No. 6,990,453 (Shazam), 20050180635 (Corel), U.S. Pat. No. 6,430,306, U.S. Pat. No. 6,681,032 and 20030059124 (L-1 Corp.), U.S. Pat. No. 7,194,752 and U.S. Pat. No. 7,174,293 (Iceberg), U.S. Pat. No. 7,130,466 (Cobion), U.S. Pat. No. 6,553,136 (Hewlett-Packard), and U.S. Pat. No. 6,430,307 (Matsushita), and the journal references cited at the end of this disclosure. When used in conjunction with recognition of entertainment content such as audio and video, such features are sometimes termed content ‘fingerprints’ or ‘hashes.’)”

can modify Rhoads’ said “each object is identified”, as modified via the combination of Rodriguez, with Oramas’ said  “multimodal… learning approach” by:

a)	having “Jane” (Rhoads: cited below and fig. 62) and BOB (Rhoads: fig. 0: “BOB” and fig. 62) go see Godzilla! and attend a “Paul Simon” (Rhoads: cited below) concert;
b)	making Rhoads’ “object” of said “each object is identified”, as modified via the combination of Rodriguez, be the “image” of the “different data modalities (i.e., audio, text, and image)” of Oramas;
c)	making Rhoads’ “ ‘Jane's review: Pretty Good!’ ” be the “text” of said “different data modalities (i.e., audio, text, and image)” of Oramas via Rhoads:

“[0016] The phone has recognized the user's car from the scene, and has also identified--by make and year--another vehicle in the picture.  Both are noted by overlaid text.  A restaurant has also been identified, and an initial review from a collection of reviews ("Jane's review: Pretty Good!") is shown.  Tapping brings up more reviews.”;

d)	making Rhoads’ fig. 20A: “Image Classification”, as already modified via the combination of Rodriguez, or Rhoads’ fig. 20A: “Image/Facial Recognition” be as Oramas’ “classification from these images” via Oramas:
“5.3 Image-based Approach 
Every album in the dataset has an associated cover art image. To perform music genre classification from these images, we use Deep Residual Networks (ResNets) [11]. They are the state-of-the-art in various image classification tasks like ImageNet [35] and Microsoft COCO [19]. ResNet is a common feed-forward CNN with residual learning, which consists on bypassing two or more convolution layers. We employ a slightly modified version of the original ResNet 5 : the scaling and aspect ratio augmentation are obtained from [41], the photometric distortions from [12], and weight decay is applied to all weights and biases. The network we use is composed of 101 layers (ResNet101), initialized with pretrained parameters learned on ImageNet. This is our starting point to finetune the network on the genre classification task. Our ResNet implementation has a logistic regression final layer with sigmoid activations and uses the binary cross entropy loss.”;

e)	inputting Rhoads’ “ ‘Jane's review: Pretty Good!’ ” to “genre classification from text” via Oramas:	

“5.2 Text-based Approach 
In the presented dataset, each album has a variable number of customer reviews. We use an approach similar to [13, 29] for genre classification from text, where all reviews from the same album are aggregated into a single text. The aggregated result is truncated at 1000 characters, thus balancing the amount of text per album, as more popular artists tend to have a higher number of reviews. Then we apply a Vector Space Model approach (VSM) with tfidf weighting [47] to create a feature vector for each album. Although word embeddings [25] with CNNs are state-of-the-art in many text classification tasks [15], a traditional VSM approach is used instead, as it seems to perform better when dealing with large texts [31]. The vocabulary size is limited to 10k as it was a good balance of network complexity and accuracy.”;

f)	classifying or recognizing via said “multimodal…learning approach” based on the image of the “object” and Rhoads’ “ ‘Jane's review: Pretty Good!’ ”; 
g)	making a similar modification regarding “collection of reviews” (Rhoads: cited above [0016]) and “iTunes” and “image of Paul Simon” via Rhoads:
“[0151] As another example, consider a Facebook user who has earned, or paid for, or otherwise received credit that can be applied to certain services--such as for downloading songs from iTunes, or for music recognition services, or for identifying clothes that go with particular shoes (for which an image has been submitted), etc. These services may be associated with the particular Facebook page, so that friends can invoke the services from that page--essentially spending the host's credit (again, with suitable authorization or invitation by that hosting user). Likewise, friends may submit images to a facial recognition service accessible through an application associated with the user's Facebook page. Images submitted in such fashion are analyzed for faces of the host's friends, and identification information is returned to the submitter, e.g., through a user interface presented on the originating Facebook page. Again, the host may be assessed a fee for each such operation, but may allow authorized friends to avail themselves of such service at no cost.”; and

“[0750] In another example, a first user snaps an image of Paul Simon at a concert. The system automatically posts the image to the user's Flickr account--together with metadata inferred by the procedures detailed above. (The name of the artist may have been found in a search of Google for the user's geolocation; e.g., a Ticketmaster web page revealed that Paul Simon was playing that venue that night.) The first user's picture, a moment later, is encountered by a system processing a second concert-goer's photo of the same event, from a different vantage. The second user is shown the first user's photo as one of the system's responses to the second photo. The system may also alert the first user that another picture of the same event--from a different viewpoint--is available for review on his cell phone, if he'll press a certain button twice.”; 

and

h)	recognizing that the combination is predictable or looked forward to because the modification “improves the results” or achieves the “best” results (as shown in Oramas’ Table 2, in section 6.1 Audio Classification, showing different types of audio, text and image classifications) regarding “how accurate the classification is” and is more accurate than image classification or recognition alone, thus providing the improved classification accuracy, with respect to “single modality approaches”, of the “object”, such as Rhoads’ fig. 0: “GODZILLA!” or Paul Simon or shopping cart or shoppers, in the context of Rhoads’ “ ‘Jane's review: Pretty Good!’ ” via Oramas: 
section 4.2 Evaluation Metrics, 2nd paragraph:
“The output of a multi-label classifier is a label-item matrix. Thus, it can be evaluated either from the labels or the items perspective. We can measure how accurate the classification is for every label, or how well the labels are ranked for every item. In this work, the former point of view is evaluated with the AUC measure, which is computed for every label and then averaged. We are interested in classification models that strengthen the diversity of label assignments. As the taxonomy is composed of broad genres which are over-represented in the dataset (see Table 1), and more specific subgenres (e.g., Vocal Jazz, Britpop), we want to measure whether the classifier is focusing only on over-represented genres, or on more fine-grained ones. To this end, catalog coverage (also known as aggregated diversity) is an evaluation measure used in the extreme multi-label classification [14] and the recommender systems [32] communities. Coverage@k measures the percentage of normalized unique labels present in the top k predictions made by an algorithm across all test items. Values of k = 1, 3, 5 are typically employed in multi-label classification.”; and


section 6.4 Multimodal Classification, 2nd paragraph:
“Results suggest that the combination of modalities outperforms single modality approaches. As image features are learned using a LOGISTIC configuration, they seem to improve multimodal approaches with LOGISTIC configuration only. Multimodal approaches that include text features tend to improve the results. Nevertheless, the best approaches are those that exploit the three modalities of MuMu. COSINE approaches have similar AUC than LOGISTIC approaches but a much better catalog coverage, thanks to the spatial properties of the factor space.”.
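
For illustration only, the Coverage@k measure quoted above from Oramas, section 4.2, can be sketched as follows. This is a minimal sketch and not Oramas’ implementation: the function name coverage_at_k, the layout of the score matrix, and the reading of “normalized” as division by the total number of labels are all assumptions made for this example.

```python
def coverage_at_k(scores, k):
    """Fraction of all labels that occur in at least one item's top-k predictions.

    scores[i][j] is the (hypothetical) predicted score of label j for test item i.
    """
    if not scores:
        return 0.0
    num_labels = len(scores[0])
    seen = set()
    for item_scores in scores:
        # Rank label indices for this item from highest to lowest score.
        ranked = sorted(range(num_labels), key=lambda j: item_scores[j], reverse=True)
        seen.update(ranked[:k])
    # Assumption: "normalized" means dividing by the size of the label set.
    return len(seen) / num_labels


# Two test items scored against three labels (made-up numbers).
example_scores = [
    [0.9, 0.1, 0.5],
    [0.8, 0.2, 0.1],
]
coverage_top1 = coverage_at_k(example_scores, 1)
coverage_top2 = coverage_at_k(example_scores, 2)
```

Under these assumptions, a classifier that always ranks the same over-represented label first yields a low value at k = 1 even when its per-label accuracy is high, which is the behavior the quoted passage says the measure is meant to expose.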

	Thus, the combination does not teach, as indicated in bold above, the claimed “comprise a Barnes-Hut approximation”. Accordingly, Xue teaches:
comprise a Barnes-Hut approximation (or “Barnes-Hut tSNE” “to approximate the embedded distribution” via:
pages 2,3:
“The t-Distributed Stochastic Neighbor Embedding (tSNE) [20] provides a 2D embedding and Barnes-Hut tSNE [33] accelerates the original t-SNE from O(n^2) to O(n log n). Both t-SNE and Barnes-Hut t-SNE are non-parametric embedding algorithms, so there is no natural way to perform out-of-sample extension. Parametric t-SNE [32] and supervised t-SNE [23, 24] introduce deep neural networks into data embedding and realize non-linear parametric embedding. Inspired by this work, we introduce a method for texture manifolds that treats the embedded distribution from non-parametric embedding algorithms as an output, and use a deep neural network to predict the manifold coordinates of a texture image directly. This texture manifold uses the features of the DEP network and is referred to as DEP-manifold.”

page 7:
“5. Texture Manifold
Inspired by Parametric t-SNE [32] and supervised t-SNE [23, 24], we introduce a parametric texture manifold approach that learns to approximate the embedded distribution of non-parametric embedding algorithms [20, 33] using a deep neural network to directly predict the 2D manifold coordinates for the texture images. We refer to this manifold learning method using DEP feature embedding as DEP-manifold. Following prior work [24,32], the deep neural network structure is depicted in Figure 6. Input features are the feature maps before the classification layer of DEP, which means each image is represented by a 128 dimensional vector. Unlike the experiment in [24, 32], we add non-linear functions (Batch Normalization and ReLU) before fully connected layers, and we do not pre-train the network with a stack of Restricted Boltzmann Machines (RBMs) [13]. We train the embedding network from scratch instead of the three-stage training procedure (pre-training, construction and fine-tuning) in parametric t-SNE and supervised t-SNE. We randomly choose 60000 images from the multi-scale GTOS dataset for the experiment. We experiment with DEP-parametric t-SNE, and DEP-manifold based on outputs from the last fully connected layer of DEP.”).
	Thus one of ordinary skill in the art of t-SNE can modify Rhoads’ said “each object is identified” as modified via the combination with Xue’s teaching of Barnes-Hut t-SNE by:
a)	using said “Barnes-Hut t-SNE” instead of “the original t-SNE”; and
b)	recognizing that the modification is predictable or looked forward to because the modification “accelerates the original t-SNE” (Xue: cited above), thus providing an “informative” “visual style”, giving information or being instructive, as shown in Oramas’ fig. 2: “Particular of the t-SNE…”, regarding visual style, faster than originally “using t-SNE”, via Oramas, section 6.3 Image Classification, 2nd paragraph:
“In Figure 2 a set of cover images of five of the most frequent genres in the dataset is shown using t-SNE over the obtained image feature vectors. In the left top corner the ResNet recognizes women faces on the foreground, which seems to be common in Country albums (red). The jazz albums (green) on the right are all clustered together probably thanks to the uniform type of clothing worn by the people of their covers. Therefore, the visual style of the cover seems to be informative when recognizing the album genre. For instance, many classical music albums include an instrument in the cover, and Dance & Electronics covers are often abstract images with bright colors, rarely including human faces.”.
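
For illustration only, the general Barnes-Hut idea behind the quoted acceleration “from O(n2) to O(n log n)” can be sketched as follows. This is a minimal sketch and not code from Xue or Oramas, and it is not a full t-SNE: it only approximates the sum of the Student-t kernel 1/(1 + d^2) (the kernel t-SNE uses in the embedding space) over all points by summarizing distant quadtree cells by their center of mass. The names Cell, build_tree, approx_kernel_sum, exact_kernel_sum, MAX_DEPTH, and the acceptance threshold theta are assumptions made for this example.

```python
import math
import random

MAX_DEPTH = 40  # stop subdividing here so coincident points cannot recurse forever


class Cell:
    """Square quadtree cell storing the count and coordinate sums of its points."""

    def __init__(self, cx, cy, half):
        self.cx, self.cy, self.half = cx, cy, half
        self.count = 0
        self.sx = self.sy = 0.0
        self.point = None
        self.children = []

    def child_for(self, p):
        # Children are ordered (left,bottom), (left,top), (right,bottom), (right,top).
        return self.children[2 * (p[0] >= self.cx) + (p[1] >= self.cy)]

    def insert(self, p, depth=0):
        self.count += 1
        self.sx += p[0]
        self.sy += p[1]
        if self.count == 1:
            self.point = p  # first point: the cell stays a leaf
            return
        if depth >= MAX_DEPTH:
            return  # keep only the aggregate for (near-)coincident points
        if not self.children:
            h = self.half / 2
            self.children = [Cell(self.cx + dx * h, self.cy + dy * h, h)
                             for dx in (-1, 1) for dy in (-1, 1)]
            self.child_for(self.point).insert(self.point, depth + 1)
            self.point = None
        self.child_for(p).insert(p, depth + 1)


def build_tree(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    half = max(max(xs) - min(xs), max(ys) - min(ys)) / 2 + 1e-9
    root = Cell((min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2, half)
    for p in points:
        root.insert(p)
    return root


def approx_kernel_sum(cell, q, theta):
    """Approximate the sum over all points p of 1 / (1 + |q - p|^2)."""
    if cell.count == 0:
        return 0.0
    mx, my = cell.sx / cell.count, cell.sy / cell.count
    d2 = (q[0] - mx) ** 2 + (q[1] - my) ** 2
    # Accept the cell as one summary point when it is a leaf, or when its
    # width is small relative to its distance from q (width / distance < theta).
    if not cell.children or 2 * cell.half < theta * math.sqrt(d2):
        return cell.count / (1.0 + d2)
    return sum(approx_kernel_sum(c, q, theta) for c in cell.children)


def exact_kernel_sum(points, q):
    """Brute-force reference; includes q itself if q is one of the points."""
    return sum(1.0 / (1.0 + (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2) for p in points)


random.seed(0)
demo_points = [(random.random(), random.random()) for _ in range(300)]
demo_root = build_tree(demo_points)
demo_exact = exact_kernel_sum(demo_points, demo_points[0])
demo_approx = approx_kernel_sum(demo_root, demo_points[0], 0.5)
```

In this sketch, theta = 0 never accepts an internal cell and so reproduces the brute-force sum, while larger values of theta visit fewer cells per query point at the cost of some approximation error.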

Thus, the combination of Rhoads, Rodriguez, Oramas and Xue does not teach:
C.	wherein the identified sources include… one or more venues where the non-retail objects can be viewed.

Jung teaches:
C.	wherein the identified sources (via “recognize the manufacturers”, [0059], 2nd S: fig. 2: 102: “Mfg. by XYZ CO.”; fig. 5: 104: “Manufactured by ACME Company” each giving a source to clothing, cat food and widget) include one or more locations (via “goods…at other locations”, id.) where the retail (via a consumer, represented in fig. 2: hand holding 102 “in a retail location”, [0040], last S) objects (said “goods”) can be acquired, and one or more venues (via “a customer moves through…any suitable venue”, [0165], 1st S) where the non-retail (via figs. 1,3,4,5,6,7:104: “not a good, item, or service for sale…‘for display purposes only’”, [0049], 3rd S to last S) objects (or “other items”, [0049], 3rd S, not for sale) can be viewed.
	
Thus, one of ordinary skill in the art of venues can modify the Pepsi Center in Denver with Jung’s recognition of manufacturers by:
a)	making the combination’s store be as Jung’s figs. 2,3,4,5,6,7:100; fig. 8: “STORE”, fig.10:1000: “a particular check-out location”, [0147], last S (such as at the Pepsi Center);
b)	making Rhoads’ fig. 0: box with buttons, as modified via the combination of Rodriguez, be as Jung’s figs. 2,3,4,5,6,7:102: a shopping device;
c)	making the manufacturer’s cat food for sale to a consumer; 
d)	making a fake version of the cat food not for sale (for display purposes only) to a consumer;
e)	displaying the fake cat food to a consumer; and
f)	recognizing that the modification is predictable or looked forward to because the modification would “limit incentives for shoplifting, decrease expenses of acquiring stocked goods or items, and/or diminish expense associated with loss or damage of the exemplar”, Jung, [0049], 2nd S, such as a cat eating cat food in the store.

The combination does not result in claim 1. Rather, the combination changes the principle of operation (MPEP 2143.01 VI) of Rodriguez’s (and hence the combination’s) teaching of taking an image (Rodriguez, fig. 5:550) of retail (Rodriguez, fig. 5:516: groceries) and non-retail objects (Rodriguez, fig. 5:512,514: customer and shopping cart) and the final combination in view of Jung is in contrast to claim 1, lines 7,8’s “one or more objects contained within the image, wherein the one or more objects is or are delaminated into retail objects and non-retail objects”. 
Instead, the final combination is recognizing items (“recognizing an exemplar”, Jung, Abstract) that are not for sale, such as the clothing worn by a model, a fake can or cans of cat food and a fake widget (Jung, figs. 2,3,4,5:104). Thus, Jung does not teach the claimed delaminating into real food (the claimed “retail objects”) and into fake cat food (the claimed “non-retail objects”) and instead teaches “marked or stamped” (Jung, [0049], 5th S) fake food. Thus, recreating applicant’s claim 1 in view of the above references (via modifying the final combination by making Jung’s fig. 3:104 or Rodriguez’s fig. 1:145: items on a shelf have both real and fake cat food and taking a picture showing both real and fake cat food) would be more directed to improper hindsight of applicant’s disclosure than to what the art teaches to one of ordinary skill in the art of image classifiers and venues for shoppers.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENNIS ROSARIO whose telephone number is (571)272-7397. The examiner can normally be reached Monday-Friday, 9AM-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/DENNIS ROSARIO/Examiner, Art Unit 2667   

/MATTHEW C BELLA/Supervisory Patent Examiner, Art Unit 2667