Notice of Pre-AIA or AIA Status
	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
Claims 1-20 are pending.

DETAILED ACTION
In view of the Appeal Brief filed on 10/28/2020, PROSECUTION IS HEREBY REOPENED. A new ground of rejection is set forth below.
To avoid abandonment of the application, appellant must exercise one of the following two options:
(1) file a reply under 37 CFR 1.111 (if this Office action is non-final) or a reply under 37 CFR 1.113 (if this Office action is final); or,
(2) initiate a new appeal by filing a notice of appeal under 37 CFR 41.31 followed by an appeal brief under 37 CFR 41.37. The previously paid notice of appeal fee and appeal brief fee can be applied to the new appeal. If, however, the appeal fees set forth in 37 CFR 41.20 have been increased since they were previously paid, then appellant must pay the difference between the increased fees and the amount previously paid.

/BORIS GORNEY/Supervisory Patent Examiner, Art Unit 2158                                                                                                                                                                                                        





Response to Arguments:

Claim Rejections – 35 USC § 112(b):
Applicant’s arguments regarding claims 18-20 have been considered under 35 U.S.C. 112(b) and found unpersuasive for the same reasons applied to the claims under 35 U.S.C. 112(f). The rejection of the claims under 35 U.S.C. 112(b) is maintained accordingly.

Claim Rejections – 35 USC § 112(a):
Applicant’s arguments regarding claims 18-20 have been considered under 35 U.S.C. 112(a) and found unpersuasive for the same reasons applied to the claims under 35 U.S.C. 112(f). The rejection of the claims under 35 U.S.C. 112(a) is maintained accordingly.

Claim Rejections – 35 USC § 103:
Applicant’s arguments with respect to the rejection under 35 USC § 103 have been considered but are moot in view of the new ground(s) of rejection.


CLAIM INTERPRETATION
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

Claims 18-20 in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.

As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the function is recited with sufficient structure, material, or acts within the claim itself to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: “means for assigning second-vocabulary tags taken from a second-vocabulary set to the plurality of digital assets through machine learning, the plurality of digital assets having first-vocabulary tags taken from a first-vocabulary set, each said first-vocabulary tag indicative of a respective digital asset characteristic; means for determining that at least one said first-vocabulary tag includes a plurality of visual classes, the determining based on the assigning of at least one said second-vocabulary tag; means for collecting digital assets from the plurality of digital assets that correspond to one visual class of the plurality of visual classes; means for training the model using machine learning to assign the at least one said first-vocabulary tag to a subsequent digital image having the one visual class of the plurality of visual classes based on the collected digital assets; means for assigning a second-vocabulary tag taken from the second-vocabulary set to the subsequent digital asset through machine learning; means for locating the model from a plurality of said models based at least in part on the assigned second-vocabulary tag; and means for determining a probability based on the located model through machine learning that the subsequent digital asset corresponds to a digital asset characteristic associated with the located model; and means for assigning a respective said first-vocabulary tag associated with the located model to the subsequent digital asset based at least in part on the determined probability” in claim 18.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 112
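The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION. – The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.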

The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 18-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention.
Claim limitations “means for assigning second-vocabulary tags taken from a second-vocabulary set to the plurality of digital assets through machine learning, the plurality of digital assets having first-vocabulary tags taken from a first-vocabulary set, each said first-vocabulary tag indicative of a respective digital asset characteristic; means for determining that at least one said first-vocabulary tag includes a plurality of visual classes, the determining based on the assigning of at least one said second-vocabulary tag; means for collecting digital assets from the plurality of digital assets that correspond to one visual class of the plurality of visual classes; means for training the model using machine learning to assign the at least one said first-vocabulary tag to a subsequent digital image having the one visual class of the plurality of visual classes based on the collected digital assets; means for assigning a second-vocabulary tag taken from the second-vocabulary set to the subsequent digital asset through machine learning; means for locating the model from a plurality of said models based at least in part on the assigned second-vocabulary tag; and means for determining a probability based on the located model through machine learning that the subsequent digital asset corresponds to a digital asset characteristic associated with the located model; and means for assigning a respective said first-vocabulary tag associated with the located model to the subsequent digital asset based at least in part on the determined probability” in claim 18 invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. The Specification is devoid of adequate structure to perform the claimed function; thus, the metes and bounds of the claims are unknown.  
Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.

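Applicant may:

(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph;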
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
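(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.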

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a)  IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

	Claims 18-20 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for pre-AIA the inventor(s), at the time the application was filed, had possession of the claimed invention. As described above, the disclosure does not provide adequate structure to perform the claimed functions. The Specification does not demonstrate that Applicant has made an invention that achieves the claimed functions because the invention is not described with sufficient detail such that one of ordinary skill in the art can reasonably conclude that the inventor had possession of the claimed invention.


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(d):
(d) REFERENCE IN DEPENDENT FORMS.—Subject to subsection (e), a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), fourth paragraph:
Subject to the [fifth paragraph of 35 U.S.C. 112 (pre-AIA )], a claim in dependent form shall contain a reference to a claim previously set forth and then specify a further limitation of the subject matter claimed. A claim in dependent form shall be construed to incorporate by reference all the limitations of the claim to which it refers.


Claim 3 is rejected under 35 U.S.C. 112(d) or pre-AIA 35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends. The limitation “wherein the generated model is a support vector machine binary classifier model” fails to further limit the subject matter of the independent claim 1; in particular, the limitation is not a functional limitation. Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.

Claim 12 is rejected under 35 U.S.C. 112(d) or pre-AIA  35 U.S.C. 112, 4th paragraph, as being of improper dependent form for failing to further limit the subject matter of the claim upon which it depends, or for failing to include all the limitations of the claim upon which it depends.  The limitation “wherein the model is a support vector machine binary classifier model” fails to further limit the subject matter of the independent claim 10; in particular, the limitation is not a functional limitation.  Applicant may cancel the claim(s), amend the claim(s) to place the claim(s) in proper dependent form, rewrite the claim(s) in independent form, or present a sufficient showing that the dependent claim(s) complies with the statutory requirements.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-9 are rejected under 35 U.S.C. 103 as being unpatentable over US Pub No 2015/0036919 by Bourdev et al. in view of US Pub No 2016/0379091 by Lin, hereinafter “Lin-091”.

Regarding independent claim 1, Bourdev teaches “a method implemented by at least one computing device, the method comprising:”
“obtaining, by the at least one computing device, a plurality of digital assets having first-vocabulary tags taken from a first-vocabulary set, each said first-vocabulary tag indicative of a respective digital asset characteristic” ([0029-31])
([0029] A social networking system may also provide or support the ability to indicate, identify, categorize, label, describe, or otherwise provide information about an item of content or attributes about the content. One way to indicate such information is through a tag that may identify or otherwise relate to subject matter of the content or its attributes (“each said first-vocabulary tag indicative of a respective digital asset characteristic”). Another way to indicate such information is through global positioning system (GPS) coordinates of a user uploading content to identify the location of the upload, or where the content was captured. As described in more detail herein, there are many other ways to indicate information about content in social networking systems. Many such indicators, including tags (e.g., hashtag or other metadata tag) and GPS system coordinates, are nonvisual, and are not based on automated analysis of visual data in the content.
[0031] Although the subjectivity of nonvisual indicators helps users of social networking systems to creatively express and share a rich variety of content, the subjectivity of nonvisual indicators often makes it difficult to search user-uploaded images, such as photographs (“a plurality of digital assets”). For instance, an attempt to search images posted to a social networking system for "cats" may reveal an image of a user in a Catwoman costume on Halloween. An attempt to perform a graphical search for images of the Eiffel Tower in Paris, France, may lead to images of a dog named "Paris." An attempt to search for Super Bowl photographs may reveal personal photographs of a fan's family that may not be highly relevant to someone looking for first-hand accounts a football game (“first-vocabulary tags taken from a first-vocabulary set, each said first-vocabulary tag indicative of a respective digital asset characteristic”). In a sense, the nonvisual indicators associated with the images in these examples are "noisy" in that they may not accurately reflect the contents of the image they are associated with. It would be desirable to accurately search user-uploaded content in social networking systems.)
“assigning, by the at least one computing device, second-vocabulary tags taken from a second-vocabulary set to the plurality of digital assets through machine learning” ([0036] [0037])
([0036] An image class (“second-vocabulary tags”) may include, for example, objects (e.g., a cat, car, person, purse, etc.), brands or objects associated with brands (e.g., Coca-Cola.RTM., Ferrari.RTM.), professional sports teams (e.g., the Golden State Warriors.RTM.), locations (e.g., Mount Everest), activities (e.g., swimming), phrases or concepts (e.g., a red dress, happiness), and any other thing, action, or notion that can be associated with content. While many examples provided herein may refer to a single "image class," it is noted that the image class may refer to a plurality of image classes or one or more image classes comprising an amalgamation of objects, brands, professional sports teams, locations (“second-vocabulary tags taken from a second-vocabulary set”), etc.
[0037] In some embodiments, the image classification module 104 may use a trained classifier to compare visual attributes of an evaluation set of images with visual attributes of the image class and to determine whether visual attributes in an evaluation set of images can be sufficiently correlated with visual attributes of the image class (“assigning… second-vocabulary tags”). An evaluation set of images may include a group of images selected for classification by a classifier. In various embodiments, the evaluation set of images may include all or a portion of the images in a datastore, or all or a portion of the images in a social networking system. In an embodiment, the classifier may be trained by any suitable technique, such as machine learning (“through machine learning”).)
“determining, by the at least one computing device, that at least one said first-vocabulary tag includes a plurality of visual classes, the determining based on the assigning of at least one said second-vocabulary tag” ([0045])
([0045] During the training phase, the image classification training module 204 may be configured to identify and select contextual cues that correspond to the image class. In various embodiments, the image classification training module 204 may [...] For example, the image classification training module 204 may determine that one type of tags is likely to accompany a photo of a domestic housecat, while another type of tags is likely to accompany a photo of a user in a Catwoman costume on Halloween (“determining… that at least one said first-vocabulary tag includes a plurality of visual classes”). In such a case, the image classification training module 204 may select the type of tags that is likely to accompany a photo of a domestic housecat to correspond to the image class of a cat. As discussed in more detail herein, consideration of whether contextual cues apply to a particular image class may be based on many considerations, such as tags (e.g., the tag "#cat", the tag "#Halloween", etc.), the order of tags, whether particular tags are accompanied by other particular tags (e.g., whether the tag "#cat" is accompanied by the tag "#animal" or whether the tag "#cat" is accompanied by the tag "#Halloween"), etc. The image classification training module 204 may also be configured to rank and/or score the extent that the contextual cues associated with a particular image correspond to a particular image class (“based on the assigning of at least one said second-vocabulary tag”).)
“collecting, by the at least one computing device, digital assets from the plurality of digital assets that correspond to one visual class of the plurality of visual classes” ([0060] [0036])
([0060] In some embodiments, the image class correlation module 308 may analyze the syntax of image tags of the sample set of images. The image class correlation module 308 may determine how likely a specific syntax correlates with a given image class. In some embodiments, syntactical analysis of the image tags may involve assigning weights to the exact language of the image tags. That is, the image class correlation module 308 may determine that the exact wording of tags associated with an image indicates that the tags should be correlated with an image class. For instance, an image may be tagged with the image tag "#domestic housecat." The image class correlation module 308 may determine that a tag "#domestic housecat" correlates to a high degree with the image class for images of domestic housecats (“collecting…digital assets from the plurality of digital assets that correspond to one visual class”). As another example, the image class correlation module 308 may determine that a tag "#domestic house market" correlates to a low degree with the image class for domestic housecats.
[0036] An image class may include, for example, objects (e.g., a cat, car, person, purse, etc.), brands or objects associated with brands (e.g., Coca-Cola.RTM., Ferrari.RTM.), professional sports teams (e.g., the Golden State Warriors.RTM.), locations (e.g., Mount Everest), activities (e.g., swimming), phrases or concepts (e.g., a red dress, happiness), and any other thing, action, or notion that can be associated with content. While many examples provided herein may refer to a single "image class," it is noted that the image class may refer to a plurality of image classes or one or more image classes comprising an amalgamation of objects, brands, professional sports teams, locations, etc.)
Bourdev does not explicitly teach, “In a digital medium environment to generate a model usable to tag digital assets, a method implemented by at least one computing device, the method comprising: training, by the at least one computing device, the model using machine learning to assign the at least one said first-vocabulary tag to a subsequent digital image having the one visual class of the plurality of visual classes, the training based on the collected digital images; and outputting, by the at least one computing device, the trained model”.
Lin-091 teaches,
“In a digital medium environment to generate a model usable to tag digital assets; training, by the at least one computing device, the model using machine learning to assign the at least one said first-vocabulary tag to a subsequent digital image having the one visual class of the plurality of visual classes, the training based on the collected digital images” ([0025] [0031])
([0031] As used herein, the term "classifier algorithm" is used to refer to an algorithm executed by one or more processing devices that identifies one or more associations between the semantic content of an image and a class of semantically similar images (“In a digital medium environment to generate a model usable to tag digital assets”). For example, a classifier algorithm may analyze training images with certain recurring objects, color schemes, or other semantic content and determine that the objects, color schemes, or other semantic content are indicative of a certain class of content (e.g., "dogs," "vehicles," "trees," etc.). The classifier algorithm may apply the learned associations between different classes and different types of semantic content to classify subsequently received images. An example of a classifier algorithm is an algorithm that uses a neural network model to identify associations between certain semantic features and certain classes of semantic content. 
[0025] After training the classifier algorithm, the trained classifier algorithm can be used to accurately identify semantic similarities between untagged images and examples of tagged images (“training…the model using machine learning”). For [...] one or more tags (e.g. "dog" and "car") can be automatically generated for the untagged image (“assign the at least one said first-vocabulary tag to a subsequent digital image”) using the tags of the semantically similar tagged image (“having the one visual class of the plurality of visual classes”).)
“outputting, by the at least one computing device, the trained model” ([0026])
([0026] In some embodiments, a classifier algorithm can be trained using a publicly available set of training images and then provided to a private asset management system. For example, a server system with more processing power may be used to train a neural network or other classier algorithm using tagged images from an online image-sharing service. After training the neural network, the server system provides the trained neural network model to a computing system (“outputting, by the at least one computing device, the trained model”) that manages private image assets. The computing system can use the trained neural network [...])
Bourdev and Lin-091 are analogous art because they both are directed to the same field of automatically generating tags to be applied to images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have combined the teachings of Lin-091 with the method/system of Bourdev in order to provide users with a means for automatically selecting tags to be applied to an input image based on the semantic content of the input image as shown in [Lin-091: paragraph 0008].
	Bourdev in view of Lin-091 teaches “In a digital medium environment to generate a model usable to tag digital assets, a method implemented by at least one computing device”.

As to claim 2, Bourdev in view of Lin-091 teaches “wherein the respective digital asset characteristic identifies an object or a semantic class” (Bourdev: [0031]).

As to claim 3, Bourdev in view of Lin-091 teaches “wherein the generated model is a support vector machine binary classifier model” (Bourdev: [0070]).

As to claim 4, Bourdev in view of Lin-091 teaches “wherein the collecting includes collecting positive and negative examples of the one visual class from the plurality of digital assets and the generating of the model is based on the collected positive and negative examples” (Bourdev: [0070]).

As to claim 5, Bourdev in view of Lin-091 teaches “wherein at least one collected negative example includes a digital asset, to which, the at least one said first-vocabulary tag is not assigned” (Bourdev: [0070]).

As to claim 6, Bourdev in view of Lin-091 teaches “wherein the assigning of the second-vocabulary tags is performed using at least one model trained, by machine learning, using a plurality of training digital assets that are tagged using second-vocabulary tags taken from the second-vocabulary set” (Bourdev: [0058] [0037]).

As to claim 7, Bourdev in view of Lin-091 teaches “wherein the plurality of digital assets are digital images” (Bourdev: [0028]).

As to claim 8, Bourdev in view of Lin-091 teaches “wherein the first-vocabulary set includes first-vocabulary tags that are different than second-vocabulary tags of the second-vocabulary set” (Bourdev: [0045]).

As to claim 9, Bourdev in view of Lin-091 teaches “wherein the first-vocabulary set and the second-vocabulary set have different respective taxonomies” (Bourdev: [0045]).

Claims 10-17 are rejected under 35 U.S.C. 103 as being unpatentable over US Pub No 2015/0036919 by Bourdev et al. in view of US Pub No 2017/0004383 by Lin et al., hereinafter “Lin-383”.

	Regarding independent claim 10, Bourdev teaches “a processing system; and a computer-readable storage medium having instructions stored thereon that, responsive to execution by a computing device, causes the computing device to perform operations comprising:” ([0029-31])
([0029] A social networking system may also provide or support the ability to indicate, identify, categorize, label, describe, or otherwise provide information about an item of content or attributes about the content. One way to indicate such information is through a tag that may identify or otherwise relate to subject matter of the content or its attributes. Another way to indicate such information is through global positioning system (GPS) coordinates of a user uploading content to identify the location of the upload, or where the content was captured. As described in more detail herein, there are many other ways to indicate information about content in social networking systems. Many such indicators, including tags (e.g., hashtag or other metadata tag) and GPS system coordinates, are nonvisual, and are not based on automated analysis of visual data in the content.)
“assigning a second-vocabulary tag taken from a second-vocabulary set to the digital asset through machine learning” ([0036-37])
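([0036] An image class (“second-vocabulary tags”) may include, for example, objects (e.g., a cat, car, person, purse, etc.), brands or objects associated with brands (e.g., Coca-Cola®, Ferrari®), professional sports teams (e.g., the Golden State Warriors®), locations (e.g., Mount Everest), activities (e.g., swimming), phrases or concepts (e.g., a red dress, happiness), and any other thing, action, or notion that can be associated with content. While many examples provided herein may refer to a single "image class," it is noted that the image class may refer to a plurality of image classes or one or more image classes comprising an amalgamation of objects, brands, professional sports teams, locations (“second-vocabulary tags taken from a second-vocabulary set”), etc.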
[0037] In some embodiments, the image classification module 104 may use a trained classifier to compare visual attributes of an evaluation set of images with visual attributes of the image class and to determine whether visual attributes in an evaluation set of images can be sufficiently correlated with visual attributes of the image class (“assigning… second-vocabulary tags”). An evaluation set of images may include a group of images selected for classification by a classifier. In various embodiments, the evaluation set of images may include all or a portion of the images in a datastore, or all or a portion of the images in a social networking system. In an embodiment, the classifier may be trained by any suitable technique, such as machine learning (“through machine learning”).)
“a plurality of visual classes that are each associated with a single first-vocabulary tag taken from the first-vocabulary set; the locating of the model is based at least in part on the assigned second-vocabulary tag” ([0030] [0045])
([0030] In certain circumstances, nonvisual indicators may be subjective or potentially misleading. For example, although the tags that a content generator chooses to apply to his or her own content may describe the subject matter of the content from the perspective of the content generator, the tags may be deemed misdescriptive or even irrelevant from the perspective of others. A user posting a picture of herself dressing up as Catwoman on Halloween, for instance, may tag the picture as a "#cat," (“a single first-vocabulary tag”) even though the picture does not contain a domestic housecat. A user posting a picture of a dog named "Paris" may tag the picture with the tag "#paris," even though the picture does not depict Paris, France. A user posting images that he captured of his family at the Super Bowl in New Orleans on Super Bowl Sunday may have GPS coordinates and/or time stamps that indicate the images were captured at the Super Bowl, but the content of the images themselves may not relate to a football game.
[0045] During the training phase, the image classification training module 204 may be configured to identify and select contextual cues that correspond to the image class. In various embodiments, the image classification training module 204 may evaluate the attributes of a particular image class and may determine whether certain contextual cues are likely associated with that image class. For example, the image classification training module 204 may determine that one type of tags is likely to accompany a photo of a domestic housecat, while another type of tags is likely to accompany a photo of a user in a Catwoman costume on Halloween (“that are each associated with a single first-vocabulary tag”). In such a case, the image classification training module 204 may select the type of tags that is likely to accompany a photo of a domestic housecat to correspond to the image class of a cat. As discussed in more detail herein, consideration of whether contextual cues apply to a particular image class may be based on many considerations, such as tags (e.g., the tag "#cat", the tag "#Halloween", etc.), the order of tags, whether particular tags are accompanied by other particular tags (e.g., whether the tag "#cat" is accompanied by the tag "#animal" or whether the tag "#cat" is accompanied by the tag "#Halloween"), etc. The image classification training module 204 may also be configured to rank and/or score the extent that the contextual cues associated with a particular image correspond to a particular image class (“based at least in part on the assigned second-vocabulary tag”).)
Bourdev does not explicitly teach, “In a digital medium environment to use a model to tag a digital asset according to a first-vocabulary set, a system comprising: locating a model from a plurality of models, in which: the plurality of models correspond to a plurality of visual classes that are each associated with a single first-vocabulary tag taken from the first-vocabulary set; determining a probability that the digital asset corresponds to a digital asset characteristic associated with the single first-vocabulary tag”.
Lin-383 discloses, 
“In a digital medium environment to use a model to tag a digital asset according to a first-vocabulary set, a system comprising: locating a model from a plurality of models, in which: the plurality of models correspond to a plurality of visual classes” ([0029] [0056])
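([0029] Semantic similarity can be determined between two or more images by employing a neural network or other classifier algorithm (“to use a model… locating a model”) executed by one or more processing devices. The network or algorithm can identify one or more associations between the semantic content of an image and a class of semantically similar images (“visual class”). For example, a neural network or other classifier algorithm may analyze training images with certain recurring objects, color schemes, or other semantic content and determine that the objects, color schemes, or other semantic content are indicative of a certain class of content (e.g., "dogs," "vehicles," "trees," etc.). The neural network or other classifier algorithm may apply the learned associations between different classes and different types of semantic content to classify subsequently received images. An example of a classifier algorithm is an algorithm that uses a neural network model to identify associations between certain semantic features and certain classes of semantic content. As such, using the Eiffel tower example above, the neural network or classifier algorithm may look at the two separate images, one having the Eiffel tower isolated, front and center, the other having an image of a dog front and center with the Eiffel tower offset and in the background, as having at least some semantic similarity (i.e., both having the Eiffel tower depicted within the image).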
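[0056] In some embodiments, selection of the one or more relevant images can dynamically generate a new visually-based query based on the selected one or more relevant images. To this end, the user can initiate a subsequent visually-based search using the new visually-based query comprising the one or more selected relevant images. Consequently, the search results would become more relevant as this process is iteratively performed. For example, if the only search result images selected by the user were of the Eiffel tower having fireworks going off behind the tower, then it is likely that, at least, the images of the Eiffel tower having fireworks advertisements superimposed thereon, and the images of the Eiffel tower on fire, would be removed from consideration. Further, in subsequent iterations of the relevance feedback, if selected result images were further refined to identify only those images having the Eiffel tower with fireworks displayed behind the tower, it is possible that the desired images will result as relevance feedback is continuously provided. It is contemplated, however, that the intelligence of the visual image search engine is dependent on depth of the neural network or other classifier algorithms (“from a plurality of models”) and its capability of determining such differences.)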
“determining a probability that the digital asset corresponds to a digital asset characteristic associated with the single first-vocabulary tag” (Lin-383 [0025])
([0025] Upon obtaining the first set of result images, one or more images from the first set of result images can be used to generate a visually-based query. The one or more images from the first set of result images used to generate the visually-based query can be determined based on an association score generated by the text-based search engine (“determining a probability”). an image having at least two tags "Eiffel tower" and "fireworks" (“the digital asset corresponds to a digital asset characteristic”)) could produce a relatively high association score (“determining a probability”), whereas a partial hit of keywords from the text-based query (i.e., an image having one of tags "Eiffel tower" or "fireworks") could produce a relatively medium association score, whereas no or minimal hit of keywords from the text-based query (i.e., an image having neither tags "Eiffel tower" nor "fireworks", or maybe just "Eiffel" or "tower" or "fire") could produce a relatively low association score.)
Bourdev and Lin-383 are analogous art because they both are directed to the same field of automatically generating tags to be applied to images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have combined the teachings of Lin-383 with the method/system of Bourdev in order to provide users with a means for using images generated from the text-based image search to generate one or more image queries used to conduct a subsequent visually-based image search as shown in [Lin-383: para. 0005].
Bourdev in view of Lin-383 teaches “In a digital medium environment to use a model to tag a digital asset according to a first-vocabulary set, a system comprising: locating a model from a plurality of models, in which: the plurality of models correspond to a plurality of visual classes that are each associated with a single first-vocabulary tag taken from the first-vocabulary set; determining a probability based on the located model through machine learning that the digital asset corresponds to a digital asset characteristic associated with the single first-vocabulary tag of the located model”.

As to claim 11, Bourdev in view of Lin-383 teaches “wherein the digital asset characteristic identifies an object or a semantic class” (Bourdev: [0031]).

As to claim 12, Bourdev in view of Lin-383 teaches “wherein the model is a support vector machine binary classifier model” (Bourdev: [0070]).

As to claim 13, Bourdev in view of Lin-383 teaches “further comprising a tag assignment module implemented at least partially in hardware of the at least one computing device to assign the single first-vocabulary tag to the digital asset based at least in part on the determined probability” (Bourdev: [0058]).

As to claim 14, Bourdev in view of Lin-383 teaches “wherein the second-vocabulary tagging module is configured to assign the second-vocabulary tag based on at least one model trained, by machine learning, using a plurality of training digital assets that are tagged in compliance with the second-vocabulary set” (Bourdev: [0058] [0037]).

As to claim 15, Bourdev in view of Lin-383 teaches “wherein the plurality of digital assets are digital images” (Bourdev: [0028]).

As to claim 16, Bourdev in view of Lin-383 teaches “wherein the first-vocabulary set includes first-vocabulary tags that are different than second-vocabulary tags of the second-vocabulary set” (Bourdev: [0045]).

As to claim 17, Bourdev in view of Lin-383 teaches “wherein the first-vocabulary set and the second-vocabulary set have different respective taxonomies” (Bourdev: [0045]).

Claims 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over US Pub No 2015/0036919 by Bourdev et al in view of US Pub No 2016/0379091 by Lin, hereinafter “Lin-091”, further in view of US Pub No 2017/0004383 by Lin et al, hereinafter “Lin-383”.

Regarding independent claim 18, Bourdev teaches “In a digital medium environment to train and use a model to tag a subsequent digital asset according to a first-vocabulary set, a system comprising:”
“means for assigning second-vocabulary tags taken from a second-vocabulary set to the plurality of digital assets through machine learning, the plurality of digital assets having first-vocabulary tags taken from a first-vocabulary set, each said first-vocabulary tag indicative of a respective digital asset characteristic” ([0036-37][0029-0031])
([0036] An image class (“second-vocabulary tags”) may include, for example, objects (e.g., a cat, car, person, purse, etc.), brands or objects associated with brands (e.g., Coca-Cola®, Ferrari®), professional sports teams (e.g., the Golden State Warriors®), locations (e.g., Mount Everest), activities (e.g., swimming), phrases or concepts (e.g., a red dress, happiness), and any other thing, action, or notion that can be associated with content. While many examples provided herein may refer to a single "image class," it is noted that the image class may refer to a plurality of image classes or one or more image classes comprising an amalgamation of objects, brands, professional sports teams, locations (“second-vocabulary tags taken from a second-vocabulary set”), etc.
[0037] In some embodiments, the image classification module 104 may use a trained classifier to compare visual attributes of an evaluation set of images with visual attributes of the image class and to determine whether visual attributes in an evaluation set of images can be sufficiently correlated with visual attributes of the image class (“assigning… second-vocabulary tags”). An evaluation set of images may include a group of images selected for classification by a classifier. In various embodiments, the evaluation set of images may include all or a portion of the images in a datastore, or all or a portion of the images in a social networking system. In an embodiment, the classifier may be trained by any suitable technique, such as machine learning (“through machine learning”).
[0029] A social networking system may also provide or support the ability to indicate, identify, categorize, label, describe, or otherwise provide information about an item of content or attributes about the content. One way to indicate such information is through a tag that may identify or otherwise relate to subject matter of the content or its attributes (“each said first-vocabulary tag indicative of a respective digital asset characteristic”). Another way to indicate such information is through global positioning system (GPS) coordinates of a user uploading content to identify the location of the upload, or where the content was captured. As described in more detail herein, there are many other ways to indicate information about content in social networking systems. Many such indicators, including tags (e.g., hashtag or other metadata tag) and GPS system coordinates, are nonvisual, and are not based on automated analysis of visual data in the content.
[0031] Although the subjectivity of nonvisual indicators helps users of social networking systems to creatively express and share a rich variety of content, the subjectivity of nonvisual indicators often makes it difficult to search user-uploaded images, such as photographs (“a plurality of digital assets”). For instance, an attempt to search images posted to a social networking system for "cats" may reveal an image of a user in a Catwoman costume on Halloween. An attempt to perform a graphical search for images of the Eiffel Tower in Paris, France, may lead to images of a dog named "Paris." An attempt to search for Super Bowl photographs may reveal personal photographs of a fan's family that may not be highly relevant to someone looking for first-hand accounts a football game (“first-vocabulary tags taken from a first-vocabulary set, each said first-vocabulary tag indicative of a respective digital asset characteristic”). 
“means for determining that at least one said first-vocabulary tag includes a plurality of visual classes, the determining based on the assigning of at least one said second-vocabulary tag” ([0045])
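([0045] During the training phase, the image classification training module 204 may be configured to identify and select contextual cues that correspond to the image class. In various embodiments, the image classification training module 204 may evaluate the attributes of a particular image class and may determine whether certain contextual cues are likely associated with that image class. For example, the image classification training module 204 may determine that one type of tags is likely to accompany a photo of a domestic housecat, while another type of tags is likely to accompany a photo of a user in a Catwoman costume on Halloween (“determining that at least one said first-vocabulary tag includes a plurality of visual classes”). In such a case, the image classification training module 204 may select the type of tags that is likely to accompany a photo of a domestic housecat to correspond to the image class of a cat. As discussed in more detail herein, consideration of whether contextual cues apply to a particular image class may be based on many considerations, such as tags (e.g., the tag "#cat", the tag "#Halloween", etc.), the order of tags, whether particular tags are accompanied by other particular tags (e.g., whether the tag "#cat" is accompanied by the tag "#animal" or whether the tag "#cat" is accompanied by the tag "#Halloween"), etc. The image classification training module 204 may also be configured to rank and/or score the extent that the contextual cues associated with a particular image correspond to a particular image class (“based on the assigning of at least one said second-vocabulary tag”).)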
“means for collecting digital assets from the plurality of digital assets that correspond to one visual class of the plurality of visual classes” ([0060])
([0060] In some embodiments, the image class correlation module 308 may analyze the syntax of image tags of the sample set of images. The image class correlation module 308 may determine how likely a specific syntax correlates with a given image class. In some embodiments, syntactical analysis of the image tags may involve assigning weights to the exact language of the image tags. That is, the image class correlation module 308 may determine that the exact wording of tags associated with an image indicates that the tags should be correlated with an determine that a tag "#domestic housecat" correlates to a high degree with the image class for images of domestic housecats (“collecting digital assets from the plurality of digital assets”). As another example, the image class correlation module 308 may determine that a tag "#domestic house market" correlates to a low degree with the image class for domestic housecats.)
“means for assigning a second-vocabulary tag taken from the second-vocabulary set to the subsequent digital asset through machine learning” ([0036-37])
([0036] An image class (“second-vocabulary tags”) may include, for example, objects (e.g., a cat, car, person, purse, etc.), brands or objects associated with brands (e.g., Coca-Cola®, Ferrari®), professional sports teams (e.g., the Golden State Warriors®), locations (e.g., Mount Everest), activities (e.g., swimming), phrases or concepts (e.g., a red dress, happiness), and any other thing, action, or notion that can be associated with content. While many examples provided herein may refer to a single "image class," it is noted that the image class may refer to a plurality of image classes or one or more image classes comprising an amalgamation of objects, brands, professional sports teams, locations (“second-vocabulary tag taken from the second-vocabulary set”), etc.
[0037] In some embodiments, the image classification module 104 may use a trained classifier to compare visual attributes of an evaluation set of images with visual attributes of the image class and to determine whether visual attributes in an evaluation set of images can be sufficiently correlated with visual attributes of the image class (“assigning a second-vocabulary tag”). An evaluation set of images may include a group of images selected for classification by a classifier. In various embodiments, the evaluation set of images may include all or a portion of the images in a datastore, or all or a portion of the images in a social networking system. In an embodiment, the classifier may be trained by any suitable technique, such as machine learning (“through machine learning”).)
Bourdev does not explicitly teach the following; however, Lin-091 discloses:
“In a digital medium environment to train and use a model to tag a subsequent digital asset according to a first-vocabulary set; means for training the model using machine learning to assign the at least one said first-vocabulary tag to a subsequent digital image having the one visual class of the plurality of visual classes based on the collected digital assets” ([0029] [0056])
([0029] Semantic similarity can be determined between two or more images by employing a neural network or other classifier algorithm (“to train and use a model”) executed by one or more processing devices. The network or algorithm can identify one or more associations between the semantic content of an image and a class of semantically similar images (“visual class”). For example, a neural network or other classifier algorithm may analyze training images with certain recurring objects, color schemes, or other semantic content (“training the model”) and determine that the objects, color schemes, or other semantic content are indicative of a certain class of content (e.g., "dogs," "vehicles," "trees," etc.). The neural network or other classifier algorithm may apply the learned associations between different classes and different types of semantic content to classify subsequently received images (“using machine learning”). An example of a classifier algorithm is an algorithm that uses a neural network model to identify associations between certain semantic features and certain classes of semantic content. As such, using the Eiffel tower example above, the neural network or classifier algorithm may look at the two separate images, one having the Eiffel tower isolated, front and center, the other having an image of a dog front and center with the Eiffel tower offset and in the background, as having at least some semantic similarity (i.e., both having the Eiffel tower depicted within the image).
[0056] In some embodiments, selection of the one or more relevant images can dynamically generate a new visually-based query based on the selected one or more relevant images. To this end, the user can initiate a subsequent visually-based search using the new visually-based query comprising the one or more selected relevant images. Consequently, the search results would become more relevant as this process is iteratively performed. For example, if the only search result images selected by the user were of the Eiffel tower having fireworks going off behind the tower, then it is likely that, at least, the images of the Eiffel tower having fireworks advertisements superimposed thereon, and the images of the Eiffel tower on fire, would be removed from consideration. Further, in subsequent iterations of the relevance feedback, if selected result images were further refined to identify only those images having the Eiffel tower with fireworks displayed behind the tower, it is possible that the desired images will result as relevance feedback is continuously provided. It is contemplated, however, that the intelligence of the visual image search engine is dependent on depth of the neural network or other classifier algorithms (“from a plurality of models”) and its capability of determining such differences.)
“means for locating the model from a plurality of said models based at least in part on the assigned second-vocabulary tag” ([0029] [0056])
([0029] Semantic similarity can be determined between two or more images by employing a neural network or other classifier algorithm (“locating the model”) executed by one or more processing devices. The network or algorithm can identify one or more associations between the semantic content of an image and a class of semantically similar images. For example, a neural network or other classifier algorithm may analyze training images with certain recurring objects, color schemes, or other semantic content and determine that the objects, color schemes, or other semantic content are indicative of a certain class of content (e.g., "dogs," "vehicles," "trees," etc.). The neural network or other classifier algorithm may apply the learned associations between different classes and different types of semantic content to classify subsequently received images. An example of a classifier algorithm is an algorithm that uses a neural network model to identify associations between certain semantic features and certain classes of semantic content. As such, using the Eiffel tower example above, the neural network or classifier algorithm may look at the two separate images, one having the Eiffel tower isolated, front and center, the other having an image of a dog front and center with the Eiffel tower offset and in the background, as having at least some semantic similarity (i.e., both having the Eiffel tower depicted within the image).
[0056] In some embodiments, selection of the one or more relevant images can dynamically generate a new visually-based query based on the selected one or more relevant images. To this end, the user can initiate a subsequent visually-based search using the new visually-based query comprising the one or more selected relevant images. Consequently, the search results would become more relevant as this process is iteratively performed. For example, if the only search result images selected by the user were of the Eiffel tower having fireworks going off behind the tower, then it is likely that, at least, the images of the Eiffel tower having fireworks advertisements superimposed thereon, and the images of the Eiffel tower on fire, would be removed from consideration. Further, in subsequent iterations of the relevance feedback, if selected result images were further refined to identify only those images having the Eiffel tower with fireworks displayed behind the tower, it is possible that the desired images will result as relevance feedback is continuously provided. It is contemplated, however, that the intelligence of the visual image search engine is dependent on depth of the neural network or other classifier algorithms (“from a plurality of models”) and its capability of determining such differences.)
Bourdev and Lin-091 are analogous art because they both are directed to the same field of automatically generating tags to be applied to images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have combined the teachings of Lin-091 with the method/system of Bourdev in order to provide users with a means for automatically selecting tags to be applied to an input image based on the semantic content of the input image as shown in [Lin-091: paragraph 0008].
Bourdev in view of Lin-091 teaches “In a digital medium environment to train and use a model to tag a subsequent digital asset according to a first-vocabulary set”.
Bourdev-Lin-091 does not explicitly teach the following; however, Lin-383 discloses “means for determining a probability that the subsequent digital asset corresponds to a digital asset characteristic” (Lin-383: [0025])
([0025] Upon obtaining the first set of result images, one or more images from the first set of result images can be used to generate a visually-based query. The one or more images from the first set of result images used to generate the visually-based query can be determined based on an association score generated by the text-based search engine (“determining a probability”). an image having at least two tags "Eiffel tower" and "fireworks" (“the digital asset corresponds to a digital asset characteristic”)) could produce a relatively high association score (“determining a probability”), whereas a partial hit of keywords from the text-based query (i.e., an image having one of tags "Eiffel tower" or "fireworks") could produce a relatively medium association score, whereas no or minimal hit of keywords from the text-based query (i.e., an image having neither tags "Eiffel tower" nor "fireworks", or maybe just "Eiffel" or "tower" or "fire") could produce a relatively low association score.)
Bourdev-Lin-091 and Lin-383 are analogous art because they both are directed to the same field of automatically generating tags to be applied to images. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to have combined the teachings of Lin-383 with the method/system of Bourdev-Lin-091 in order to provide users with a means for using images generated from the text-based image search to generate one or more image queries used to conduct a subsequent visually-based image search as shown in [Lin-383: para. 0005].
Bourdev-Lin-091 in view of Lin-383 teaches “means for determining a probability based on the located model through machine learning that the subsequent digital asset corresponds to a digital asset characteristic associated with the located model; means for assigning a respective said first-vocabulary tag associated with the located model to the subsequent digital asset based at least in part on the determined probability”.

As to claim 19, Bourdev in view of Lin-091 and Lin-383 teaches “wherein the plurality of digital assets are digital images” (Bourdev: [0028]).

As to claim 20, Bourdev in view of Lin-091 and Lin-383 teaches “wherein the first-vocabulary set and the second-vocabulary set have different respective taxonomies” (Bourdev: [0045]).


Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BAO G TRAN whose telephone number is (571)270-3493.  The examiner can normally be reached on Mon-Fri 6:30-3:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Boris Gorney can be reached on (571)270-5626.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BAO G TRAN/Patent Examiner of Art Unit 2158
/BORIS GORNEY/Supervisory Patent Examiner, Art Unit 2158