Research | Language Models for Social Good

Current Projects

ALAVON

Alavon is a community platform for discovering, sharing, and running high-quality AI prompts. Users can browse a live feed of community prompts, search by topic or tag, save favorites, vote, comment, follow creators, and share prompts with others. Prompts can also be executed directly in a built-in AI chat, so users move from discovery to results without copy-pasting between tools.
Features:

Discover community prompts sorted by recent, trending, and top-rated.

Create and publish prompts with titles, descriptions, tags, and cover images.

Generate prompt drafts with AI before publishing.

Execute any prompt in a built-in AI chat workflow.

Save, upvote, downvote, comment on, and share prompts.

Follow creators and view public profiles.

Send prompts and messages directly to other users.

Publications

Large Language Models for Manufacturing

Large Language Models (LLMs) are rapidly advancing and have strong potential to transform the manufacturing industry by improving efficiency, innovation, and process optimization. This paper explores how LLMs can enhance tasks across manufacturing, including product design, quality control, supply chain management, and workforce development. It highlights the capabilities of advanced models like GPT-4V in handling complex instructions, analyzing large datasets, and enabling better knowledge sharing. The paper also discusses future impacts such as improved education, automated coding, smarter robotics, and industrial metaverse applications, positioning LLMs as key drivers of sustainable growth.

Valuing Time in Silicon: Can Large Language Models Replicate Human Value of Travel Time

Large language models (LLMs) have the potential to transform transportation systems, but their ability to accurately mimic human behavior in complex scenarios is still uncertain. This study evaluates three LLMs by analyzing their value of travel time (VOT) across different contexts such as travel purpose, choice settings, and socio-demographic factors. The results show that LLMs exhibit strong similarities to human decision-making, including sensitivity to income and time-cost trade-offs, with consistent behavior across scenarios. However, differences between models highlight the need for further validation and refinement before LLMs can reliably serve as proxies for human travelers.
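
The summary above does not say how the study derives VOT. As a minimal sketch, assuming a linear utility of the form U = beta_time * time + beta_cost * cost (a common discrete-choice specification, not confirmed by the source), VOT is the ratio of the time coefficient to the cost coefficient. The function name, units, and coefficient values below are illustrative assumptions, not figures from the paper.

```python
def value_of_travel_time(beta_time_per_min: float, beta_cost_per_unit: float) -> float:
    """Return VOT in currency units per hour.

    Assumes a linear utility U = beta_time * minutes + beta_cost * cost,
    so VOT is the marginal rate of substitution between time and cost.
    """
    if beta_cost_per_unit == 0:
        raise ValueError("cost coefficient must be non-zero")
    # Ratio gives currency per minute; multiply by 60 for an hourly value.
    return 60.0 * beta_time_per_min / beta_cost_per_unit


# Hypothetical coefficients: each extra minute and each extra dollar lower utility.
vot = value_of_travel_time(beta_time_per_min=-0.05, beta_cost_per_unit=-0.2)
print(f"VOT: about {vot:.2f} per hour")
```

Comparing the value estimated from a model's choices with the value estimated from human survey data, context by context, is one way to check whether the model trades time against cost the way travelers do.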

AugGPT: Leveraging ChatGPT for Text Data Augmentation

Text data augmentation is used to address the challenge of limited and low-quality data in NLP, especially in few-shot learning scenarios. Existing methods often fail to maintain correct labels or generate sufficiently diverse data. The proposed approach, AugGPT, uses ChatGPT to create multiple semantically varied but conceptually similar versions of training samples. Results show that AugGPT improves model accuracy and produces more effective data distributions compared to prior methods.
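
The augmentation loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `augment` and `toy_paraphrase` are hypothetical names, and `toy_paraphrase` is a template-based stand-in for the call to a chat model that a real pipeline would make. The point it shows is that each generated variant inherits the label of the sample it was derived from.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (text, label)


def augment(
    samples: List[Sample],
    paraphrase: Callable[[str, int], List[str]],
    n_variants: int = 3,
) -> List[Sample]:
    """Return the original samples plus label-preserving paraphrases."""
    augmented: List[Sample] = []
    for text, label in samples:
        augmented.append((text, label))
        seen = {text}
        for variant in paraphrase(text, n_variants):
            # Skip empty or repeated outputs so the added data stays diverse.
            if variant and variant not in seen:
                seen.add(variant)
                augmented.append((variant, label))
    return augmented


def toy_paraphrase(text: str, n: int) -> List[str]:
    """Hypothetical stand-in for an LLM call that returns n rephrasings."""
    templates = ["In other words, {t}", "Put differently, {t}", "That is to say, {t}"]
    return [tpl.format(t=text) for tpl in templates[:n]]


train = [("the treatment reduced pain", "positive")]
for text, label in augment(train, toy_paraphrase, n_variants=2):
    print(label, "|", text)
```

Passing the paraphraser in as a function keeps the loop independent of any particular model or API, so the stand-in can be swapped for a real model call without changing the rest of the code.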

Prompt engineering for healthcare: Methodologies and applications

Prompt engineering is a key technique in NLP that focuses on designing inputs to improve model performance, and it has become especially important with the rise of large language models. Its impact is growing in the healthcare domain, where it supports applications like question answering, text summarization, and machine translation. Despite its importance, there is a lack of comprehensive reviews specifically addressing prompt engineering in medical NLP. This article aims to fill that gap by summarizing recent advances and providing guidance and inspiration for researchers in healthcare NLP.

TrustLLM: Trustworthiness in Large Language Models

This paper introduces TrustLLM, a comprehensive framework for evaluating the trustworthiness of large language models across multiple dimensions: truthfulness, safety, fairness, robustness, privacy, and ethics. The authors benchmark 16 major LLMs on over 30 datasets and find that higher-performing models generally exhibit greater trustworthiness, with proprietary models often outperforming open-source ones. Some open-source models, such as Llama2, achieve comparable results, while certain models become overly cautious in ways that reduce their usefulness. The study highlights ongoing challenges such as misinformation, bias, privacy risks, and ethical limitations, and emphasizes the need for improved methods, transparency, and collaboration to enhance LLM reliability.

Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models

This paper surveys research on ChatGPT (GPT-3.5 and GPT-4) and related large language models, examining their capabilities and applications across many fields. It highlights key innovations such as large-scale pre-training, instruction fine-tuning, and Reinforcement Learning from Human Feedback that improve performance and adaptability. The study analyzes 194 arXiv papers, showing growing interest in ChatGPT, especially in natural language processing, with expanding use in domains like education, medicine, and science. It also discusses the models’ implications, ethical concerns, and directions for future research.