🎓 Master's student in Software Engineering
🔬 Researching LLM reasoning enhancement and reliability
⚙️ Interested in LLM systems, RAG architectures, and inference infrastructure
I enjoy building systems that make large language models more reliable, scalable, and efficient.
LLM Systems & Infrastructure
- vLLM inference optimization
- high-throughput LLM serving
- GPU inference platforms
Retrieval-Augmented Generation (RAG)
- GraphRAG architectures
- knowledge graph retrieval
- multi-hop reasoning
AI Engineering
- distributed inference services
- experiment automation
- ML system performance optimization
A domain-specific QA system for food health recommendations.
Key features:
- Knowledge graph built from collected food standard data
- GraphRAG multi-hop retrieval
- Embedding + Rerank entity matching pipeline
- Improved entity-matching accuracy over embedding-only retrieval
Tech stack:
Python · FastAPI · Neo4j · Milvus · RAG
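The embedding + rerank stage can be pictured as a two-step pipeline: coarse recall over entity embeddings, then a finer reranking pass. This is only a minimal sketch with toy stand-ins; the embedder, reranker, and entity names below are assumptions, and the real system would query a Milvus vector index and a cross-encoder model instead.

```python
# Sketch of an embedding + rerank entity-matching pipeline (toy stand-ins).
import numpy as np

def embed(text: str) -> np.ndarray:
    """Toy deterministic embedding: character-bigram hashing into 64 dims."""
    v = np.zeros(64)
    for a, b in zip(text, text[1:]):
        v[(ord(a) * 31 + ord(b)) % 64] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

def recall(query: str, entities: list[str], top_k: int = 5) -> list[str]:
    """Stage 1: coarse recall by cosine similarity over normalized embeddings."""
    q = embed(query)
    return sorted(entities, key=lambda e: -float(embed(e) @ q))[:top_k]

def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stage 2: finer scoring; a real system would call a cross-encoder here."""
    q_tokens = set(query.lower().split())
    return sorted(candidates, key=lambda e: -len(q_tokens & set(e.lower().split())))

entities = ["vitamin C", "vitamin D", "sodium chloride", "dietary fiber"]
query = "daily vitamin C intake"
matches = rerank(query, recall(query, entities))
```

The two-stage split keeps the expensive scorer off the full entity set: cheap vector similarity narrows the candidates, and only those survivors reach the reranker.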
Designed and maintained a shared LLM inference platform for 30+ researchers.
Highlights:
- Centralized inference service architecture
- vLLM + Ollama model serving
- Prometheus monitoring and alert system
- Optimized 32B-model inference using 8-bit quantization + tensor parallelism
Performance improvement:
- Throughput increased from ~60 to 600+ tokens per second (TPS)
Tech stack:
vLLM · Docker · Linux · Prometheus · GPU servers
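The quantization + tensor-parallel setup above can be sketched as a vLLM engine configuration. This is a hypothetical sketch, not the platform's actual config: the model id, GPU count, and quantization backend are assumptions, and the actual launch (`vllm.LLM(**args)`) needs a multi-GPU host.

```python
# Sketch of a vLLM engine configuration for a quantized 32B model
# sharded across several GPUs. All concrete values are assumptions.

def vllm_engine_args(model: str, num_gpus: int = 8) -> dict:
    """Build keyword arguments for vllm.LLM (launch itself requires GPUs)."""
    return {
        "model": model,                     # assumed model id
        "quantization": "gptq",             # low-bit weight-only quantization
        "tensor_parallel_size": num_gpus,   # shard layers across GPUs
        "gpu_memory_utilization": 0.90,     # leave headroom for KV cache growth
    }

args = vllm_engine_args("Qwen/Qwen2.5-32B-Instruct")
# On a GPU host: llm = vllm.LLM(**args)
```

Quantization shrinks the weights so the model fits in fewer GPUs' memory, while tensor parallelism splits each layer's matrices across devices, which is where the throughput gain over single-GPU serving comes from.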
Currently exploring methods to improve the reliability of LLM reasoning using structured knowledge.
Topics of interest:
- reasoning-aware retrieval
- proposition graphs
- knowledge-grounded reasoning chains
- National Third Prize — Challenge Cup Competition
- MCM/ICM Mathematical Contest in Modeling — Honorable Mention
- Annual Author (technology writing platform)
Email: fanghejin@qq.com
GitHub: https://github.com/Xianyu39

