I'm Chaeyoung Kim. My expertise and research primarily focus on Natural Language Processing (NLP), Large Language Models (LLMs), and Multimodal AI. Most of my projects and experiences involve machine learning (ML) and deep learning (DL), particularly in building and evaluating models. I am also skilled in pipeline management, including data crawling, preprocessing, feature engineering, and model development, as well as data analysis using SQL. Check out the source code and the documentation on GitHub and here.
For more information, have a look at my curriculum vitae.
A multithreaded hyperparameter tuning system that leverages C's pthread library to parallelize grid search over different hyperparameter combinations. Each thread executes a Python subprocess to train a dummy classification model with specific parameters. All experiment results are tracked and visualized using MLflow, enabling systematic selection of the best-performing configuration.
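The fan-out pattern described above (one worker thread per hyperparameter combination, each launching a Python subprocess) can be sketched roughly as follows. This is only an illustration under stated assumptions: it swaps the project's C pthread layer for Python's `ThreadPoolExecutor`, the search space `GRID` and the scoring formula in `TRAIN_SNIPPET` are hypothetical stand-ins for the real training script, and MLflow logging is left out.

```python
import itertools
import json
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search space; the project's real parameters are not stated.
GRID = {"lr": [0.01, 0.1], "depth": [2, 4]}

# Stand-in for the training script: prints a made-up score as JSON.
TRAIN_SNIPPET = (
    "import json, sys; "
    "lr, depth = float(sys.argv[1]), int(sys.argv[2]); "
    "print(json.dumps({'lr': lr, 'depth': depth, "
    "'score': 1.0 - abs(lr - 0.1) - 0.01 * depth}))"
)


def run_trial(params):
    """Run one 'training' subprocess and return its parsed JSON result."""
    completed = subprocess.run(
        [sys.executable, "-c", TRAIN_SNIPPET,
         str(params["lr"]), str(params["depth"])],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


def grid_search(grid, max_workers=4):
    """Evaluate every combination in parallel and return (best, all_results)."""
    keys = list(grid)
    combos = [
        dict(zip(keys, values))
        for values in itertools.product(*(grid[k] for k in keys))
    ]
    # Threads spend their time waiting on subprocesses, so they overlap well.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_trial, combos))
    best = max(results, key=lambda r: r["score"])
    return best, results
```

Calling `grid_search(GRID)` returns the highest-scoring configuration together with the full list of trial results, which is the point where an experiment tracker such as MLflow would record each run.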
A lightweight intrusion detection system prototype that parses raw network packet logs using a C-based parser and applies a TensorFlow CNN model for attack classification. It simulates a real-world packet analysis pipeline by extracting IP, port, protocol, and flag features via bitwise operations and struct parsing, then feeding them into a deep learning model for binary classification of malicious behavior.
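The feature-extraction step described above can be illustrated with a small sketch. It is written in Python rather than C, and it assumes the input is a raw IPv4 packet carrying a TCP segment; the actual log format used by the project is not stated, and the function name `parse_ipv4_tcp` is hypothetical.

```python
import socket
import struct

# TCP flag bit masks (low bits of the offset/flags field).
TCP_FLAGS = {"fin": 0x01, "syn": 0x02, "rst": 0x04,
             "psh": 0x08, "ack": 0x10, "urg": 0x20}


def parse_ipv4_tcp(packet: bytes) -> dict:
    """Extract IP, port, protocol, and flag features from raw packet bytes."""
    # Fixed 20-byte IPv4 header, network byte order.
    (ver_ihl, _tos, _total_len, _ident, _frag, _ttl,
     proto, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])

    version = ver_ihl >> 4                 # high nibble: IP version
    header_len = (ver_ihl & 0x0F) * 4      # low nibble: length in 32-bit words

    features = {
        "version": version,
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "protocol": proto,
    }

    if proto == 6:  # TCP
        # First 14 bytes of the TCP header: ports, seq, ack, offset/flags.
        src_port, dst_port, _seq, _ack, off_flags = struct.unpack(
            "!HHIIH", packet[header_len:header_len + 14]
        )
        features["src_port"] = src_port
        features["dst_port"] = dst_port
        for name, mask in TCP_FLAGS.items():
            features[name] = bool(off_flags & mask)

    return features
```

The resulting dictionary would then be converted into a numeric feature vector before being passed to the classifier.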
An automated data pipeline for sequential user behavior data that integrates extraction, transformation, and loading (ETL) using Airflow.
A weather chatbot that recommends appropriate clothing and food based on the user's current location and weather conditions. It was built with a Retrieval-Augmented Generation (RAG) architecture, using the GPT-4 API and a custom database, hosted on GCP, that was populated through Python-based web crawling and NLP preprocessing.
Designed and developed a website hosted via GitHub Pages using the Jekyll framework. Additionally, created comprehensive documentation with the Python Sphinx framework, hosted on Read the Docs, to help users understand the project structure.
Designed and implemented a framework to automate the retrieval, parsing, and storage of research articles based on user-defined search queries or keywords. Used the PubMed and Europe PMC APIs to download papers in PDF and XML format, parsed them, and saved the results as Parquet files in Azure Blob Storage. Integrated this data with a GraphRAG system that retrieves the most relevant answers for a user's input. The GraphRAG system was built using the LangChain framework and a Neo4j database, with Cypher queries for efficient graph traversal.
Designed and implemented the UI/UX and customized C# scripts for an Oculus-based VR game using Unity, featuring a poker-themed cheating scenario where players aim to achieve a target amount in a Texas Hold'em casino setting. Configured the XR Interaction Toolkit and Meta XR SDK to create intuitive interfaces and interactions, enabling players to seamlessly act as both the dealer and player while executing sleight-of-hand tricks. Contributed to gameplay mechanics and integrated dynamic AI opponents using Barracuda and ONNX, delivering an immersive and engaging user experience.
A project that indexes game item data using OpenSearch and Sentence Transformers and exposes a REST API providing keyword search and vector-based recommendations.
A project that implements a simple profiler for measuring and analyzing the performance of a deep learning model (CNN) on an NPU.
Please feel free to contact me anytime!