GitHub | LinkedIn | Scholar | Twitter
fsamir at mail dot ubc dot ca
I’m a Ph.D. candidate in the Natural Language Processing Group at the University of British Columbia, where I’m advised by Jian Zhu. My research is supported by a Doctoral Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC) and an award from the Public Scholars Initiative. I’m broadly interested in applied and theoretical aspects of multilingual natural language processing, and I work with both text and audio modalities. Before coming to UBC, I completed my MSc in the Computational Linguistics group at the University of Toronto. Prior to graduate school, I completed my Bachelor of Science (Honours Computer Science) at the University of Toronto.
For the 2023-2024 academic year, I was a visiting researcher at the Paul G. Allen Center for Computer Science & Engineering.
I was also a research intern at Ai2 Aristo.
Efficient Identification of Low-Quality Language Partitions in Multilingual Datasets: A Case Study on a Large-Scale Multilingual Audio Dataset
In submission [preprint]
Farhan Samir, Emily P. Ahn, Shreya Prakash, Márton Sóskuthy, Vered Shwartz, Jian Zhu
Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia
EMNLP 2024 [preprint] [poster] [proceedings] [citation]
Farhan Samir, Chan Young Park, Anjalie Field, Vered Shwartz, Yulia Tsvetkov
The taste of IPA: Towards open-vocabulary keyword matching and forced alignment in any language
NAACL 2024
Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam
Understanding compositional data augmentation in typologically diverse morphological inflection
EMNLP 2023 [oral presentation] [🏆 outstanding paper award]
Farhan Samir, Miikka Silfverberg
[preprint]
One Wug, Two Wug+s: Transformer Inflection Models Hallucinate Affixes
ComputEL @ ACL 2022 [oral presentation]
Farhan Samir, Miikka Silfverberg
[paper]
Quantifying cognitive factors in lexical decline
TACL
David Francis, Ella Rabinovich, Farhan Samir, David Mortensen, Suzanne Stevenson
A formidable ability: Detecting adjectival extremeness with Distributional Semantic Models
ACL Findings 2021
Farhan Samir, Barend Beekhuizen, Suzanne Stevenson
Summer 2024, Seattle WA
PhD Research Intern, Aristo team. Mentors: Bodhisattwa Prasad Majumder, Bhavana Dalvi, Harshit Surana, Ben Bogin, Lucy Lu Wang, Peter Clark
Summer 2023, Sunnyvale CA
Applied Scientist Intern @ Amazon Science. Mentors: Daniel Elkind & Timothy Leffel.
Summer 2022, Toronto ON
Worked with Griffin Lacey and Graham Taylor on developing scalable Transformer-based Graph Neural Networks.
Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia
November 2024, Language, Cognition, and Computation Group, University of Toronto, Toronto, Canada. [slides]
November 2024, Linguistics Outside the Classroom (LOC) Colloquium, University of British Columbia, Vancouver, Canada. [slides]
clap-ipa: A lightweight tool for performing keyword spotting and forced alignment in multiple languages
flowmason: A library for managing complex computational experiments