
GitHub | LinkedIn | Scholar | Twitter

fsamir at mail dot ubc dot ca


I’m a 3rd-year PhD candidate in the Natural Language Processing Group at the University of British Columbia, where I’m advised by Jian Zhu and Vered Shwartz. My research is supported by a Doctoral Scholarship from the Natural Sciences and Engineering Research Council of Canada (NSERC) and an award from the Public Scholars Initiative. I’m broadly interested in applied and theoretical aspects of multilingual natural language processing, and I work with both text and audio modalities. Before coming to UBC, I completed my MSc in the Computational Linguistics group at the University of Toronto, where I also earned my Bachelor of Science (Honours Computer Science).

For the 2023-2024 academic year, I visited the Paul G. Allen Center for Computer Science & Engineering, where I was fortunate to work with Prof. Yulia Tsvetkov’s group.

I’m currently a research intern at Ai2 Aristo, working on understanding the capabilities and limitations of AI-assisted data analysis.


Selected publications


Locating Information Gaps and Narrative Inconsistencies Across Languages: A Case Study of LGBT People Portrayals on Wikipedia

Submitted.

Farhan Samir, Chan Young Park, Anjalie Field, Vered Shwartz, Yulia Tsvetkov

The taste of IPA: Towards open-vocabulary keyword matching and forced alignment in any language

NAACL 2024 [to appear]

Jian Zhu, Changbing Yang, Farhan Samir, Jahurul Islam

Understanding compositional data augmentation in typologically diverse morphological inflection

EMNLP 2023 [oral presentation] [🏆 outstanding paper award]

Farhan Samir, Miikka Silfverberg

[preprint]

One Wug, Two Wug+s: Transformer Inflection Models Hallucinate Affixes

ComputEL @ ACL 2022 [oral presentation]

Farhan Samir, Miikka Silfverberg

[paper]

Quantifying cognitive factors in lexical decline

TACL

David Francis, Ella Rabinovich, Farhan Samir, David Mortensen, Suzanne Stevenson

[paper][code and data]


A formidable ability: Detecting adjectival extremeness with Distributional Semantic Models

ACL Findings 2021

Farhan Samir, Barend Beekhuizen, Suzanne Stevenson

[paper][code and data]


Industry research experience


Summer 2024, Seattle WA

PhD Research Intern, Aristo team @ Ai2. Mentors: Bodhisattwa Prasad Majumder, Bhavana Dalvi, Harshit Surana, Ben Bogin, Lucy Lu Wang, Peter Clark.

Summer 2023, Sunnyvale CA

Applied Scientist Intern @ Amazon Science. Mentors: Daniel Elkind & Timothy Leffel.

Summer 2022, Toronto ON

Worked with Griffin Lacey and Graham Taylor on developing scalable Transformer-based Graph Neural Networks.

Undergraduate research advising

I'm interested in advising undergraduate students (or applied master's students) on high-quality multilingual NLP research, especially students with no prior NLP research experience. I'm open to working with students from any disciplinary background.

I don't have any more capacity for the upcoming term (Summer 2024), but I will be looking for new mentees to work with in the Fall. Feel free to reach out to me at fsamir at mail dot ubc dot ca to discuss opportunities.

Current mentees:

Benjamin Movassagh (UBC; co-advised with EunJeong Hwang)

Former mentees:

Angela Li (UBC; co-advised with Prof. Miikka Silfverberg)