This is my second English blog summarizing my research for the year, following the 2023 version. Your criticism, comments, and suggestions are welcome to help me, a junior researcher, learn and grow😄

The year 2024 was amazing and transformative for me in research, career, and life. For context: I will leave Microsoft and join William & Mary as an assistant professor starting January 2025 (a dramatic change in my career path and life). In research, this is the second year that many AI researchers worldwide have shifted their focus to large language models. Having made this transition early last year, I saw my productivity increase significantly. This year also marks when I began thinking about problems beyond individual papers, viewing them at the broader level of research directions.

I am proud to have pioneered several new research directions this year. Our major breakthroughs can be summarized into three key areas. These directions will remain my primary focus during my faculty career.

Machine learning with foundation models

As a machine learning researcher who believes ML remains vital in the era of generative AI, I have focused on bridging large foundation models with ML techniques. My most significant research breakthrough this year centers on our newly proposed research direction—Catastrophic Inheritance (CI). Given the profound impact of generative AI across disciplines and the complexity of its training, data, and adaptation processes, we introduced CI to address a critical challenge: biases in upstream pre-training data are inherited by models, leading to catastrophic consequences in downstream tasks. Our major research outcomes in this area are:

[Figure: summary of our research outcomes on Catastrophic Inheritance]

I certainly do not want to abandon traditional ML, since I firmly believe it remains useful in the era of LLMs. In the future, we will continue research on both CI and other promising new directions.

Philosophy of language models

I coined the term “philosophy of LM” to unify research that tries to understand LLMs in a more scientific way. It mainly consists of two threads: evaluation and enhancement.

LLM evaluation

We had several papers on LLM evaluation in top venues this year, and I would like to highlight only two: the DyVal series and ERBench.

Well, I have to say that LLM evaluation may be one of the easiest research areas to enter, since it requires no specific model training or mathematics. However, despite plenty of work in this area, the major players delivering frontier AI models still rely on old, contaminated benchmarks like MMLU and GSM8K. This is somewhat sad to see.

LLM enhancement: Culture-specific LLMs

Current LLMs handle multicultural contexts poorly if you are not from a Western country. This is the main reason we created the CultureLLM series: to extend LLMs’ reach to low-resource cultures.