Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Portfolio

Publications

MCDTB: A Macro-level Chinese Discourse TreeBank

Published in COLING, 2018

In view of the differences between the annotations of micro and macro discourse relationships, this paper describes the relevant experiments on the construction of the Macro Chinese Discourse Treebank (MCDTB), a higher-level Chinese discourse corpus. Following RST (Rhetorical Structure Theory), we annotate the macro discourse information, including discourse structure, nuclearity and relationship, and the additional discourse information, including topic sentences, lead and abstract, to make the macro discourse annotation more objective and accurate. Finally, we annotated 720 articles with a Kappa value greater than 0.6. Preliminary experiments on this corpus verify the computability of MCDTB.

Recommended citation: Feng Jiang, Sheng Xu, Xiaomin Chu, Peifeng Li, Qiaoming Zhu, Guodong Zhou: MCDTB: A Macro-level Chinese Discourse TreeBank. In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018): 3493-3504. https://aclanthology.org/C18-1296.pdf
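As a rough illustration of the kind of annotation MCDTB carries, the sketch below encodes a macro-level discourse tree over paragraph spans with relation and nuclearity labels. The class and field names are hypothetical and do not reflect the released corpus format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class MacroNode:
    """One node of a macro-level discourse tree over paragraph spans."""
    span: Tuple[int, int]                 # (first_paragraph, last_paragraph), inclusive
    relation: Optional[str] = None        # discourse relation label (internal nodes only)
    nuclearity: Optional[str] = None      # e.g. "NS", "SN", or "NN" (internal nodes only)
    children: List["MacroNode"] = field(default_factory=list)

# Toy three-paragraph article: paragraphs 1-2 elaborate on paragraph 0.
tree = MacroNode(
    span=(0, 2), relation="Elaboration", nuclearity="NS",
    children=[
        MacroNode(span=(0, 0)),
        MacroNode(span=(1, 2), relation="Joint", nuclearity="NN",
                  children=[MacroNode(span=(1, 1)), MacroNode(span=(2, 2))]),
    ],
)
print(tree.relation, [child.span for child in tree.children])
```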

Chinese Paragraph-level Discourse Parsing with Global Backward and Local Reverse Reading

Published in COLING, 2020

Discourse structure tree construction is the fundamental task of discourse parsing, and most previous work has focused on English. Due to cultural and linguistic differences, methods that succeed on English discourse parsing cannot be transferred to Chinese directly, especially at the paragraph level, which suffers from longer discourse units and fewer explicit connectives. To alleviate these issues, we propose two reading modes, i.e., global backward reading and local reverse reading, to construct Chinese paragraph-level discourse trees. The former processes discourse units from the end to the beginning of a document to exploit the left-branching bias of discourse structure in Chinese, while the latter reverses the position of paragraphs within a discourse unit to enhance the differentiation of coherence between adjacent discourse units. The experimental results on Chinese MCDTB demonstrate that our model outperforms all strong baselines.

Recommended citation: Feng Jiang, Xiaomin Chu, Peifeng Li, Fang Kong, Qiaoming Zhu: Chinese Paragraph-level Discourse Parsing with Global Backward and Local Reverse Reading. In Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020): 5749-5759. https://aclanthology.org/2020.coling-main.506.pdf
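The two reading modes above are essentially different visiting orders over discourse units. The sketch below only illustrates those orders under an assumed list-of-paragraphs input; the encode function is a stand-in for the paper's neural parser.

```python
def encode(paragraphs):
    """Placeholder for the neural encoding of one discourse unit."""
    return " | ".join(paragraphs)

def global_backward_reading(units):
    """Visit discourse units from the end of the document to the beginning,
    matching the left-branching bias of Chinese discourse structure."""
    return [encode(unit) for unit in reversed(units)]

def local_reverse_reading(unit):
    """Reverse the paragraph order inside one discourse unit to sharpen the
    coherence contrast with neighbouring units."""
    return encode(list(reversed(unit)))

document = [["P1", "P2"], ["P3"], ["P4", "P5"]]   # toy document with three units
print(global_backward_reading(document))
print(local_reverse_reading(document[0]))
```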

Hierarchical Macro Discourse Parsing Based on Topic Segmentation

Published in AAAI, 2021

Hierarchically constructing micro (i.e., intra-sentence or inter-sentence) discourse structure trees using explicit boundaries (e.g., sentence and paragraph boundaries) has proven to be an effective strategy. However, it is difficult to apply this strategy to document-level macro (i.e., inter-paragraph) discourse parsing, the more challenging task, due to the lack of explicit boundaries at the higher level. To alleviate this issue, we introduce a topic segmentation mechanism to detect implicit topic boundaries and thereby help the document-level macro discourse parser construct better discourse trees hierarchically. In particular, our parser first splits a document into several sections using the topic boundaries that the topic segmenter detects. Then it builds a smaller and more accurate discourse sub-tree in each section and sequentially forms a whole tree for the document. The experimental results on both Chinese MCDTB and English RST-DT show that our proposed method outperforms the state-of-the-art baselines significantly.

Recommended citation: Feng Jiang, Yaxin Fan, Xiaomin Chu, Peifeng Li, Qiaoming Zhu, Fang Kong: Hierarchical Macro Discourse Parsing Based on Topic Segmentation. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI 2021): 13152-13160. https://ojs.aaai.org/index.php/AAAI/article/view/17554/17361
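The pipeline described above can be pictured as "segment, parse each section, then merge". The control-flow sketch below assumes placeholder topic_segment and parse_section functions in place of the paper's neural components.

```python
def topic_segment(paragraphs):
    """Placeholder: index of the last paragraph in each detected topic section
    (all sections except the final one)."""
    return [1]

def parse_section(section):
    """Placeholder: build a discourse sub-tree over one topic section."""
    return {"span": section, "children": []}

def merge(subtrees):
    """Sequentially combine per-section sub-trees into one document-level tree."""
    root = subtrees[0]
    for subtree in subtrees[1:]:
        root = {"span": root["span"] + subtree["span"], "children": [root, subtree]}
    return root

def parse_document(paragraphs):
    boundaries = topic_segment(paragraphs) + [len(paragraphs) - 1]
    sections, start = [], 0
    for end in boundaries:
        sections.append(paragraphs[start:end + 1])
        start = end + 1
    return merge([parse_section(section) for section in sections])

print(parse_document(["P0", "P1", "P2", "P3"]))
```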

Not Just Classification: Recognizing Implicit Discourse Relation on Joint Modeling of Classification and Generation

Published in EMNLP, 2021

Implicit discourse relation recognition (IDRR) is a critical task in discourse analysis. Previous studies regard it only as a classification task and lack an in-depth understanding of the semantics of different relations. Therefore, we first view IDRR as a generation task and further propose a method that jointly models classification and generation. Specifically, we propose a joint model, CG-T5, to recognize the relation label and generate a target sentence containing the meaning of the relation simultaneously. Furthermore, we design three target sentence forms, including a question form, for the generation model to incorporate prior knowledge. To address the issue that large discourse units can hardly be embedded into the target sentence, we also propose a target sentence construction mechanism that automatically extracts core sentences from those large discourse units. Experimental results on both the Chinese MCDTB and English PDTB datasets show that our model CG-T5 achieves the best performance against several state-of-the-art systems.

Recommended citation: Feng Jiang, Yaxin Fan, Xiaomin Chu, Peifeng Li, Qiaoming Zhu: Not Just Classification: Recognizing Implicit Discourse Relation on Joint Modeling of Classification and Generation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021): 2418-2431. https://aclanthology.org/2021.emnlp-main.187.pdf
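As a loose illustration of the joint classification-and-generation setup, the sketch below assembles a source/target pair in which the target carries both the relation label and a sentence expressing its meaning. The connective-style template and the relation-to-connective mapping are hypothetical, not the three target forms used in the paper.

```python
# Hypothetical relation-to-connective mapping, used only for this illustration.
CONNECTIVE = {"Causation": "because", "Contrast": "however", "Expansion": "in other words"}

def build_example(arg1, arg2, relation):
    """Build one training pair for a T5-style joint model (illustrative format)."""
    source = f"arg1: {arg1} arg2: {arg2}"
    target = f"relation: {relation} | {arg1}, {CONNECTIVE[relation]} {arg2}"
    return source, target

src, tgt = build_example("the market fell sharply", "investors grew cautious", "Causation")
print(src)
print(tgt)
```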

Automated Chinese Essay Scoring from Multiple Traits

Published in COLING, 2022

Automatic Essay Scoring (AES) is the task of using the computer to evaluate the quality of essays automatically. Current research on AES focuses on scoring the overall quality or a single trait of prompt-specific essays. In the real world, however, users expect not only the overall score but also instant feedback on different traits to help their writing. Therefore, we first annotate a multi-trait dataset, ACEA, comprising 1220 argumentative essays labeled for four traits, i.e., essay organization, topic, logic, and language. We then design a hierarchical multi-task trait scorer, HMTS, to evaluate the quality of writing by modeling these four traits. Moreover, we propose an inter-sequence attention mechanism to enhance information interaction between different tasks and design trait-specific features for the various tasks in AES. The experimental results on ACEA show that our HMTS can effectively score essays on multiple traits, outperforming several strong models.

Recommended citation: Yaqiong He, Feng Jiang, Xiaomin Chu, Peifeng Li: Automated Chinese Essay Scoring from Multiple Traits. In Proceedings of the 29th International Conference on Computational Linguistics (COLING 2022): 3007-3016. https://aclanthology.org/2022.coling-1.266.pdf
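The sketch below shows the general shape of multi-trait scoring: one shared encoder feeding a separate regression head per trait. It is a minimal PyTorch stand-in; the hierarchy, inter-sequence attention, and trait-specific features from the paper are omitted, and all dimensions are made up.

```python
import torch
import torch.nn as nn

TRAITS = ["organization", "topic", "logic", "language"]

class MultiTraitScorer(nn.Module):
    def __init__(self, input_dim=64, hidden=128):
        super().__init__()
        self.encoder = nn.GRU(input_size=input_dim, hidden_size=hidden, batch_first=True)
        self.heads = nn.ModuleDict({t: nn.Linear(hidden, 1) for t in TRAITS})

    def forward(self, sentence_embeddings):
        _, last_state = self.encoder(sentence_embeddings)    # (1, batch, hidden)
        shared = last_state.squeeze(0)                        # (batch, hidden)
        return {t: self.heads[t](shared).squeeze(-1) for t in TRAITS}

scorer = MultiTraitScorer()
fake_essays = torch.randn(2, 30, 64)      # 2 essays, 30 sentence embeddings each
print({trait: score.shape for trait, score in scorer(fake_essays).items()})
```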

Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study

Published in LREC-COLING, 2024

Large language models, like ChatGPT, have shown remarkable capability in many downstream tasks, yet their ability to understand the discourse structures of dialogues remains less explored, as it requires higher-level capabilities of understanding and reasoning. In this paper, we aim to systematically inspect ChatGPT’s performance on two discourse analysis tasks: topic segmentation and discourse parsing, focusing on its deep semantic understanding of the linear and hierarchical discourse structures underlying dialogue. To instruct ChatGPT to complete these tasks, we initially craft a prompt template consisting of the task description, output format, and structured input. Then, we conduct experiments on four popular topic segmentation datasets and two discourse parsing datasets. The experimental results show that ChatGPT demonstrates proficiency in identifying topic structures in general-domain conversations yet struggles considerably in specific-domain conversations. We also find that ChatGPT hardly understands rhetorical structures that are more complex than topic structures. Our deeper investigation indicates that ChatGPT can give more reasonable topic structures than human annotations but only linearly parses the hierarchical rhetorical structures. In addition, we delve into the impact of in-context learning (e.g., chain-of-thought) on ChatGPT and conduct an ablation study on various prompt components, which can provide a research foundation for future work. The code is available at https://github.com/yxfanSuda/GPTforDDA.

Recommended citation: Yaxin Fan, Feng Jiang, Peifeng Li, and Haizhou Li: Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): 16998–17010. https://aclanthology.org/2024.lrec-main.1477.pdf
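The prompt template mentioned in the abstract has three parts: a task description, an output format, and structured input. The sketch below builds such a prompt for dialogue topic segmentation; the wording is illustrative and is not the template released with the paper.

```python
def topic_segmentation_prompt(utterances):
    """Assemble a three-part prompt: task description, output format, structured input."""
    task = "Task: split the following dialogue into topically coherent segments."
    fmt = "Output format: list the utterance index range of each segment, e.g. [1-3], [4-6]."
    body = "\n".join(f"{i + 1}. {u}" for i, u in enumerate(utterances))
    return f"{task}\n{fmt}\nDialogue:\n{body}"

dialogue = [
    "Hi, I need to book a flight to Singapore.",
    "Sure, which dates work for you?",
    "Also, what is the checked baggage allowance?",
]
print(topic_segmentation_prompt(dialogue))
```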

Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark

Published in LREC-COLING, 2024

Topic segmentation and outline generation strive to divide a document into coherent topic sections and generate corresponding subheadings, unveiling the discourse topic structure of a document. Compared with sentence-level topic structure, paragraph-level topic structure allows the overall context of a document to be grasped and understood quickly from a higher level, benefiting many downstream tasks such as summarization, discourse parsing, and information retrieval. However, the lack of large-scale, high-quality Chinese paragraph-level topic structure corpora has restrained related research and applications. To fill this gap, we build the Chinese paragraph-level topic representation, corpus, and benchmark in this paper. Firstly, we propose a hierarchical paragraph-level topic structure representation with three layers to guide the corpus construction. Then, we employ a two-stage human-machine collaborative annotation method to construct the largest Chinese Paragraph-level Topic Structure corpus (CPTS) with high quality. We also build several strong baselines, including ChatGPT, to validate the computability of CPTS on two fundamental tasks (topic segmentation and outline generation) and preliminarily verify its usefulness for the downstream task of discourse parsing.

Recommended citation: Feng Jiang, Weihao Liu, Xiaomin Chu, Peifeng Li, Qiaoming Zhu, and Haizhou Li: Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024): 495–506. https://aclanthology.org/2024.lrec-main.44.pdf
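The two benchmark tasks above can be thought of as predicting section boundaries over paragraphs and then one subheading per section. The sketch below shows one plausible input/output layout; it is a guess for illustration and does not mirror the CPTS release format.

```python
document = ["paragraph 0 ...", "paragraph 1 ...", "paragraph 2 ...", "paragraph 3 ..."]

# Topic segmentation output: index of the last paragraph in each topic section.
segmentation = [1, 3]                      # sections: paragraphs 0-1 and 2-3

# Outline generation output: one generated subheading per section.
outline = ["Background", "Proposed method"]

sections, start = [], 0
for end, heading in zip(segmentation, outline):
    sections.append({"heading": heading, "paragraphs": document[start:end + 1]})
    start = end + 1
print(sections)
```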

PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

Published in ACL, 2024

The unparalleled performance of closed-sourced ChatGPT has sparked efforts towards its democratization, with notable strides made by leveraging real user and ChatGPT dialogues, as evidenced by Vicuna. However, due to challenges in gathering dialogues involving human participation, current endeavors like Baize and UltraChat rely on ChatGPT conducting roleplay to simulate humans based on instructions, resulting in overdependence on seeds, diminished human-likeness, limited topic diversity, and an absence of genuine multi-round conversational dynamics. To address the above issues, we propose a paradigm to simulate human behavior better and explore the benefits of incorporating more human-like questions in multi-turn conversations. Specifically, we directly target human questions extracted from genuine human-machine conversations as a learning goal and provide a novel user simulator called Socratic. The experimental results show our response model, PlatoLM, achieves SoTA performance among LLaMA-based 7B models in MT-Bench. Our findings further demonstrate that our method introduces highly human-like questioning patterns and rich topic structures, which can teach the response model better than previous works in multi-round conversations.

Recommended citation: Chuyi Kong, Yaxin Fan, Xiang Wan, Feng Jiang, and Benyou Wang: PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) (ACL 2024): 7841–7863.
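The paradigm above pairs a user simulator trained on real human questions with a response model to synthesize multi-round dialogues. The loop below is a schematic stand-in: both generators are stubs, whereas in the paper Socratic and PlatoLM are LLaMA-based models.

```python
def user_simulator(history):
    """Stub for the Socratic-style simulator that proposes the next human-like question."""
    return f"(human-like question #{len(history) // 2 + 1})"

def response_model(history):
    """Stub for the PlatoLM-style responder that answers the latest question."""
    return f"(answer to: {history[-1]})"

def simulate_dialogue(rounds=3):
    """Alternate simulator and responder turns to build one multi-round conversation."""
    history = []
    for _ in range(rounds):
        history.append(user_simulator(history))
        history.append(response_model(history))
    return history

for turn in simulate_dialogue():
    print(turn)
```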

Talks

Teaching

Teaching experience 1

Undergraduate course, University 1, Department, 2014

This is a description of a teaching experience. You can use markdown like any other post.

Teaching experience 2

Workshop, University 1, Department, 2015

This is a description of a teaching experience. You can use markdown like any other post.