DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
In response to the Office Action mailed 3/30/2022, applicant has submitted an amendment filed 7/1/2022.
Claims 1-2, 4-8, 10-11, 13-18, and 20 have been amended.
Response to Arguments
Applicant’s Arguments pertaining to Dzik have been acknowledged.
EXAMINER'S AMENDMENT
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.

Authorization for this examiner’s amendment was given in an interview with Marina Portnova on 7/22/2022.

The application has been amended as follows: 

	Amend the claims as follows:
1.	(Previously presented) A method comprising:
determining phoneme data of a text source, the text source comprising a sequence of words;
receiving audio data comprising a word spoken by a user as the user reads the text source aloud;
performing, by a processing device, a phonetic comparison of the phoneme data of the text source and phoneme data of the audio data; and
identifying a reading location in the text source as the user reads the text source aloud based on the phonetic comparison of the phoneme data of the text source and the phoneme data of the audio data.
2.	(Previously presented) The method of claim 1, wherein the text source is a book and the reading location is a current reading location in the book.
3.	(Previously presented) The method of claim 1, wherein the phoneme data of the text source comprises a phonetic encoding of the sequence of words, the phonetic encoding comprising one or more sequences of phonetic values.
4.	(Previously presented) The method of claim 1, wherein performing the phonetic comparison comprises calculating a phoneme edit distance between phoneme data of the audio data and phoneme data of the text source.
5.	(Previously presented) The method of claim 1, wherein performing the phonetic comparison comprises calculating a numeric value representing a similarity between two or more sequences of phonetic values.
6.	(Previously presented) The method of claim 1, wherein performing the phonetic comparison comprises performing a fuzzy match between phoneme data corresponding to the audio data and the phoneme data of the text source.
7.	(Previously presented) The method of claim 1, wherein performing the phonetic comparison comprises comparing the audio data and the text source without any conversion of the audio data to text using speech recognition.
8.	(Currently amended) The method of claim 1, wherein identifying the reading location in the text source comprises:
determining the spoken word matches a particular word in the sequence of words based on the phoneme data of the text source; and
selecting the location of the particular word based on the phoneme data of the text source.
9.	(Currently amended) The method of claim 1, further comprising:
accessing textual data of the text source;
generating [[the]] phoneme data based on the textual data; and
associating [[the]] phoneme data with the text source.

10.	(Previously presented) A system comprising:
a memory; and
a processing device, coupled to the memory, to:
determine phoneme data of a text source, the text source comprising a sequence of words;
receive audio data comprising a word spoken by a user as the user reads the text source aloud;
perform a phonetic comparison of the phoneme data of the text source and phoneme data of the audio data; and
identify a reading location in the text source as the user reads the text source aloud based on the phonetic comparison of the phoneme data of the text source and the phoneme data of the audio data.
11.	(Previously presented) The system of claim 10, wherein the text source is a book and the reading location is a current reading location in the book.
12.	(Previously presented) The system of claim 10, wherein the phoneme data of the text source comprises a phonetic encoding of the sequence of words, the phonetic encoding comprising one or more sequences of phonetic values.
13.	(Previously presented) The system of claim 10, wherein to perform the phonetic comparison, the processing device is further to calculate a phoneme edit distance between phoneme data of the audio data and phoneme data of the text source.
14.	(Previously presented) The system of claim 10, wherein to perform the phonetic comparison, the processing device is further to calculate a numeric value representing a similarity between two or more sequences of phonetic values.
15.	(Previously presented) The system of claim 10, wherein to perform the phonetic comparison, the processing device is further to perform a fuzzy match between phoneme data corresponding to the audio data and the phoneme data of the text source.
16.	(Currently amended) The system of claim 10, wherein to perform the phonetic comparison, the processing device is to compare the audio data and the text source without any conversion of the audio data to text using speech recognition.
17.	(Currently amended) The system of claim 10, wherein to identify the reading location in the text source, the processing device is to perform operations comprising:
determining the spoken word matches a particular word in the sequence of words based on the phoneme data of the text source; and
selecting the location of the particular word based on the phoneme data of the text source.
18.	(Currently amended) The system of claim 10, wherein the processing device is further to:
access textual data of the text source;
generate [[the]] phoneme data based on the textual data; and
associate [[the]] phoneme data with the text source.
19.	(Previously presented) The system of claim 10, wherein the system is configured to implement a virtual assistant.
20.	(Currently amended) A non-transitory computer readable medium storing program instructions, which when executed 
determining phoneme data of a text source, the text source comprising a sequence of words;
receiving audio data comprising a word spoken by a user as the user reads the text source aloud;
performing a phonetic comparison of the phoneme data of the text source and phoneme data of the audio data; and
identifying a reading location in the text source as the user reads the text source aloud based on the phonetic comparison of the phoneme data of the text source and the phoneme data of the audio data.
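The phonetic comparison recited in claims 1, 4-6, 8, and 9 can be illustrated with a minimal sketch. It is not the applicant's implementation. It assumes a hypothetical pronunciation dictionary (`LEXICON`) standing in for phoneme data generated from the textual data. It also assumes the spoken word has already been turned into phonemes by an acoustic front end (not shown), so no conversion of the audio to text is involved. The function names, the ARPAbet-style symbols, and the 0.5 threshold are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical pronunciation dictionary (ARPAbet-style symbols, stress omitted).
# It stands in for phoneme data determined from the textual data of the text source.
LEXICON: Dict[str, List[str]] = {
    "the": ["DH", "AH"],
    "cat": ["K", "AE", "T"],
    "sat": ["S", "AE", "T"],
    "on": ["AA", "N"],
    "mat": ["M", "AE", "T"],
}


def text_to_phonemes(words: Sequence[str]) -> List[List[str]]:
    """Generate phoneme data for each word of the text source (cf. claim 9)."""
    return [LEXICON[w.lower()] for w in words]


def phoneme_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance counted in phonemes rather than letters (cf. claim 4)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete pa
                            curr[j - 1] + 1,             # insert pb
                            prev[j - 1] + (pa != pb)))   # substitute (or keep)
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Numeric similarity in [0, 1] between two phoneme sequences (cf. claim 5)."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - phoneme_edit_distance(a, b) / longest


def locate(text_phonemes: Sequence[Sequence[str]], spoken: Sequence[str],
           start: int = 0, threshold: float = 0.5) -> Optional[int]:
    """Fuzzy-match a spoken word against the text (cf. claims 6 and 8).

    Returns the index of the best-scoring word at or after `start` whose
    similarity exceeds `threshold`, or None if no word qualifies.
    """
    best_idx: Optional[int] = None
    best_score = threshold
    for idx in range(start, len(text_phonemes)):
        score = similarity(text_phonemes[idx], spoken)
        if score > best_score:  # strict '>' keeps the earliest word on ties
            best_idx, best_score = idx, score
    return best_idx


if __name__ == "__main__":
    text = text_to_phonemes("The cat sat on the mat".split())
    # "mat" spoken with its final /T/ heard as /D/ still lands on word index 5.
    print(locate(text, ["M", "AE", "D"]))  # 5
```

Restricting the search to `start` onward (for example, the previously identified location) is one simple way to keep the tracked position moving forward through the text. This is a design assumption of the sketch, not a limitation recited in the claims.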

Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance: 

	As per claim 1 (and similarly claims 10 and 20, and consequently claims 2-9 and 11-19, which depend on claims 1 and 10), the prior art of record does not teach or suggest the combination of all limitations in claim 1, including (i.e., in combination with the remaining limitations in claim 1) a method comprising: determining phoneme data of a text source, the text source comprising a sequence of words; receiving audio data comprising a word spoken by a user as the user reads the text source aloud; performing, by a processing device, a phonetic comparison of the phoneme data of the text source and phoneme data of the audio data; and identifying a reading location in the text source as the user reads the text source aloud based on the phonetic comparison of the phoneme data of the text source and the phoneme data of the audio data.
As per claim 1, King et al. (US 2015/0170648) suggests a method comprising:… phoneme data of a text source, the text source comprising a sequence of words; receiving audio data comprising a word spoken by a user as the user reads the text source aloud; performing, by a processing device, a… comparison of… data of the text source and… data of the audio data; and identifying a reading location in the text source as the user reads the text source aloud based on… comparison of… data of the text source and… data of the audio data (Figures 4, 5A-5B, and 6A-6B; paragraphs 13, 22, 24, 29-30, 40-45, and 54-55; [all paragraphs and Figures are cited for each limitation, with “key” paragraphs and Figures pertaining to each limitation identified below, i.e., all other paragraphs and Figures not specifically referenced for any particular limitation are eligible to provide context and additional support]
“a method comprising:”: Figure 4; paragraph 13; a process can be interpreted as a “method”
“…phoneme data of a text source, the text source comprising a sequence of words;”: paragraphs 22, 24, 54; Figures 5A-5B and 6A-6B; associated phoneme pronunciation data [“phoneme data”] for textual content of an ebook which is displayed and which a user is reading aloud [i.e. the textual content is a “text” “source”-of-information-for-the-user-to-read], where textual content of an ebook is suggested to include a sequence of words [particularly because book content is commonly made of word sequences, see e.g. Figures 5A-5B and 6A-6B]
“receiving audio data comprising a word spoken by a user as the user reads the text source aloud;”: paragraphs 22, 24, 54; Figures 5A-5B and 6A-6B; receiving audio data of a user reading displayed ebook textual content [“the” displayed “text source”] aloud, where reading aloud is at least suggested to involve speaking the displayed words [such that the audio data is at least suggested to comprise “a word spoken by a user”]
“performing, by a processing device, a… comparison of… data of the text source and… data of the audio data; and identifying a reading location in the text source as the user reads the text source aloud based on… comparison of… data of the text source and… data of the audio data”: paragraphs 24, 29-30, 40-45, and 54-55; Figures 5A-5B and 6A-6B; comparing text data associated with a displayed portion of an ebook [“data of the text source”] with spoken text data [“data of the audio data”], and determining/“identifying” a user’s reading location based on the comparison of the text data associated with the displayed portion of the ebook to the spoken text data, where the user’s reading location is suggested to be the user’s reading location “in the text source” [paragraphs 44 and 55 describe the reading location as being “in the text data”, where “the text data” is suggested to be “text data associated with the displayed portion of the ebook”/text-data-“of the text source”; see also paragraphs 41 and 42])
King does not describe a phonetic comparison, used to identify the reading location, between determined phoneme data of a text source (the pronunciation data in King appears to be pre-stored rather than determined/generated) and phoneme data of user speech. Paragraphs 44-45 appear to describe a phonetic comparison performed after the reading location has been determined, and although paragraphs 29-30 and 45 suggest that spoken text data can be phoneme data (since the spoken text data can be divided into phonemes in paragraph 45), paragraphs 29-30 specifically state that the comparison is with “text data” (not pronunciation data).
Bocchieri et al. (US 5,329,608) teaches “The method of operating speech recognizing system 1 for recognizing a spoken word such as "dwd" comprises generating a phonetic transcription string "diydahbixlyuwdiy" from the word "dwd" and recording both the word "dwd" and generated phonetic transcription string "diydahbixlyuwdiy" in vocabulary lexicon database 1031, FIG. 3. Upon receiving the spoken word "dwd", the method of operating speech recognizing system 1 accesses subword model database 1032 and constructs a model string of phonemes "d iy d ah b ix l y uw d iy" characteristic of the sounds of the spoken word "dwd". The constructed phoneme string model "d iy d ah b ix l y uw d iy" is compared to ones of the lexicon vocabulary recorded phonetic transcription strings and when there is a match of the constructed phoneme string model "d iy d ah b ix l y uw d iy" with the vocabulary lexicon database 1031 recorded phonetic transcription string "diydahbixlyuwdiy", the spoken word is recognized as the word "dwd" recorded with the matched phonetic transcription string "diydahbixlyuwdiy"” (col. 5, line 50 – col. 6, line 19). Bocchieri describes matching a spoken word to a recognizable word by phonetically comparing a phonetic transcription of the spoken word with a phonetic transcription of the recognizable word. Neither Bocchieri nor King describes determining the phonetic transcriptions of the text source (in King, the pronunciation data is pre-stored, and in Bocchieri, the phonetic transcriptions are recorded [suggesting that they are pre-stored]).
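The matching step in the passage quoted from Bocchieri amounts to an exact lookup of a constructed phoneme string against recorded transcription strings. The sketch below is illustrative only: `vocabulary_lexicon` and `recognize` are hypothetical names, the lexicon holds only the "dwd" example from the quoted passage, and the phoneme string model is assumed to have already been constructed from subword models (not shown).

```python
from typing import Dict, List, Optional

# Hypothetical vocabulary lexicon: word -> recorded phonetic transcription string.
# The single entry is the "dwd" example from the quoted passage.
vocabulary_lexicon: Dict[str, str] = {
    "dwd": "diydahbixlyuwdiy",
}


def recognize(model_phonemes: List[str], lexicon: Dict[str, str]) -> Optional[str]:
    """Return the lexicon word whose recorded transcription matches the
    constructed phoneme string model, or None when nothing matches."""
    # Concatenate the model's phonemes so they can be compared with the recorded
    # transcription strings, which are stored without separators.
    constructed = "".join(model_phonemes)
    for word, transcription in lexicon.items():
        if transcription == constructed:
            return word
    return None


if __name__ == "__main__":
    model = "d iy d ah b ix l y uw d iy".split()
    print(recognize(model, vocabulary_lexicon))  # dwd
```

In this sketch the match is exact and yields a recognized word, not a position within a text source. Concatenating phonemes without separators can be ambiguous in general (different phoneme sequences can join into the same string); storing transcriptions as phoneme lists would avoid that.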
Yacoub (US 2005/0065790) teaches “In one type of ASR system, the written text of a word is received by a text-to-speech unit, such as TTS system 240, so the system can create a phoneme transcription of the written text using rules of text-to-speech conversion. The phoneme transcription of the written text is then compared with the phonemes derived from the operation of a speech recognition algorithm 250. The speech recognition algorithm, in turn, compares the utterances with the models of phonemes 260. The models of phonemes can be adjusted during this "model training" process until an adequate match is obtained between the phoneme derived from the text-to-speech transcription of the utterances and the phonemes recognized by the speech recognition algorithm 250” (paragraph 24). Yacoub describes comparing a phonetic transcription of written text with “phonemes derived from the operation of a speech recognition algorithm”, but appears to be directed to model training (not necessarily to recognizing which word is spoken). Additionally, because the pronunciation data in King and Bocchieri is already present (pre-stored), it would be unnecessary to use TTS to generate phoneme transcriptions of the displayed ebook words.
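The two steps in the passage quoted from Yacoub (a rule-based text-to-speech transcription of written text, then a comparison with recognized phonemes until an adequate match is obtained) can be sketched roughly as follows. Everything here is an assumption for illustration: the `RULES` table is a toy stand-in for real text-to-speech conversion rules, Yacoub does not specify the comparison metric, and Python's `difflib.SequenceMatcher` ratio with a 0.8 threshold is simply one plausible choice.

```python
from difflib import SequenceMatcher
from typing import List

# Hypothetical letter-to-sound rules; two-letter rules are tried before one-letter
# rules. Real TTS front ends use far richer rule sets or trained G2P models.
RULES = {"sh": "SH", "ch": "CH", "th": "TH", "a": "AE", "e": "EH", "i": "IH",
         "o": "AA", "u": "AH", "p": "P", "t": "T", "s": "S", "n": "N", "d": "D"}


def rule_transcription(text: str) -> List[str]:
    """Transcribe written text into phonemes by greedy longest-match rules."""
    phonemes: List[str] = []
    text = text.lower()
    i = 0
    while i < len(text):
        for size in (2, 1):
            chunk = text[i:i + size]
            if len(chunk) == size and chunk in RULES:
                phonemes.append(RULES[chunk])
                i += size
                break
        else:
            i += 1  # no rule covers this character (e.g. a space); skip it
    return phonemes


def adequate_match(expected: List[str], recognized: List[str],
                   threshold: float = 0.8) -> bool:
    """Decide whether recognized phonemes agree closely enough with the transcription."""
    return SequenceMatcher(None, expected, recognized).ratio() >= threshold


if __name__ == "__main__":
    expected = rule_transcription("ship")
    print(expected)                                       # ['SH', 'IH', 'P']
    print(adequate_match(expected, ["SH", "IY", "P"]))    # False
```

In a training loop of the kind the passage describes, a `False` result would prompt further adjustment of the phoneme models; that loop is not shown.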
Bickley et al. (US 2003/0069729) teaches predicting when a speech recognizer will confuse spoken phrases by using strings of phonemes as an intermediate representation of text forms (paragraphs 24 and 43), where a text form can also be an audio file (paragraph 19).
Hagen, A. (2006), Advances in children's speech recognition with application to interactive literacy tutors (Order No. 3207739), available from ProQuest Dissertations and Theses Professional (305355457), retrieved from https://dialog.proquest.com/professional/docview/305355457?accountid=131444, teaches using speech recognition to track reading position (see, e.g., the last paragraph of Section 4.2 on page 43; lines 3-5 of page 29; and lines 6-7 of page iii).

Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERIC YEN, whose telephone number is (571)272-4249. The examiner can normally be reached Monday through Friday, 12:00 PM to 8:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, RICHEMOND DORVIL, can be reached at (571)272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

EY 7/22/2022
/ERIC YEN/           Primary Examiner, Art Unit 2658