Predicting Native Language from Gaze


A system that predicts native language from gaze can benefit forensic, advertising, and educational applications.

Problems Addressed

Current frameworks for understanding cross-linguistic influence in multilingualism derive primarily from studies of language production. Such studies examine cross-linguistic influence, that is, how a first language affects second language processing, in writing produced by an individual. Analysis of this written text provides key insights for native language identification and for other aspects of language acquisition in learners. However, studies that focus on language production alone offer an incomplete account of cross-linguistic influence in language processing.

This system presents a novel framework for studying cross-linguistic influence in language comprehension. The end-to-end system predicts the native language of a reader based on their eye-movement patterns when reading free-form English. This analysis of eye-movement patterns during reading provides a basis for further inquiry into cross-linguistic influence in language comprehension. Combined with evidence from language production, this line of investigation can play a key role in advancing the linguistic theory of multilingualism.


The framework predicts a reader's native language from their eye movements while they read English. It has been tested on native speakers of Portuguese, Spanish, Japanese, and Mandarin Chinese, as well as English. The system uses an eye-tracking camera to record participants' gaze location as they read a small set of free-form sentences. From the gaze recording, it extracts a set of linguistically motivated features that characterize gaze patterns and then applies a machine learning algorithm to those features to predict the reader's native language.
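The two stages described above, feature extraction followed by classification, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the listing does not name the features or the learning algorithm, so the `Fixation` record, the three generic reading measures (mean fixation duration, regression rate, skip rate), and the nearest-centroid classifier are hypothetical stand-ins rather than the system's actual method.

```python
from dataclasses import dataclass
from math import dist
from statistics import mean


@dataclass
class Fixation:
    """One gaze fixation (hypothetical record format, not from the source)."""
    word_index: int     # index of the fixated word in the sentence
    duration_ms: float  # how long the gaze rested on that word


def extract_features(fixations, n_words):
    """Summarize one sentence reading as a small feature vector.

    These are generic reading measures chosen for illustration only.
    """
    if not fixations:
        raise ValueError("at least one fixation is required")
    # A regression is a fixation that lands on an earlier word than the previous one.
    regressions = sum(
        1
        for prev, cur in zip(fixations, fixations[1:])
        if cur.word_index < prev.word_index
    )
    transitions = max(len(fixations) - 1, 1)
    visited = {f.word_index for f in fixations}
    return [
        mean(f.duration_ms for f in fixations) / 1000.0,  # mean fixation duration (s)
        regressions / transitions,                        # regression rate
        1 - len(visited) / n_words,                       # share of words never fixated
    ]


def fit_centroids(samples):
    """Average the feature vectors of each label.

    samples: iterable of (feature_vector, label) pairs.
    Returns a dict mapping label -> centroid vector.
    """
    by_label = {}
    for vec, label in samples:
        by_label.setdefault(label, []).append(vec)
    return {
        label: [mean(column) for column in zip(*vecs)]
        for label, vecs in by_label.items()
    }


def predict(centroids, vec):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], vec))
```

In use, each labeled reading would be passed through `extract_features`, the resulting vectors would be given to `fit_centroids`, and `predict` would then be called on the features of a new reader. A nearest-centroid rule is used here only because it is short enough to read in full; any standard classifier could take its place.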

Moreover, the system can reliably distinguish between different native languages, as well as between native and non-native English speakers. For substantially different languages such as Japanese and Spanish, it differentiates between the two with over 90% accuracy. For similar languages such as Portuguese and Spanish, it distinguishes between them with above-chance accuracy.


  • Machine learning approach whose performance can improve with more data
  • Feature extraction and prediction occur in seconds
  • Applicable to other languages beyond English

Related Technologies

This technology is related to technology 19689.