📖 Education
Yonsei University, Seoul Campus
Bachelor of Economics (Major) & Applied Statistics (Minor)
- INTRODUCTION TO STATISTICS (A0)
- STATISTICAL METHOD (A+)
- CALCULUS (TBD)
- LINEAR ALGEBRA (B+)
- MATHEMATICAL STATISTICS 1 (A+)
- LINEAR REGRESSION (B+)
- R AND PYTHON PROGRAMMING (A+)
- DATA STRUCTURE (TBD)
- SPECIAL PROBLEMS IN COMPUTING (A0)
- SOCIAL INFORMATICS (A+)
- TIME SERIES ANALYSIS (A+)
Student Clubs
🏆 Competition Awards
Topic / Task | Result |
---|---|
Machine Reading Comprehension | 🥈 2nd (2/26) |
Korean Standard Industry Classification | 🎖 7th (7/311) |
KLUE benchmark Natural Language Inference | 🥇 1st (1/468) |
Python Code Clone Detection | 🥉 3rd (3/337) |
Stock Price Forecast on KOSPI & KOSDAQ | 🎖 6th (6/205) |
**Dacon is a Kaggle-like competition platform in Korea.
🛠 Multimodal Projects
KoDALLE: Text to Fashion (2021)
Generating dress outfit images based on given input text | 📄 Presentation
- Created training pipeline from VQGAN through DALLE
- Maintained versions of a dataset of 1 million image-caption pairs.
- Trained VQGAN and DALLE models from scratch.
- Established a live demo of KoDALLE on Huggingface Space via FastAPI.
🔐 Deep Learning Security Projects
Language Model Memorization (2022)
Implementation of Carlini et al. (2020), Extracting Training Data from Large Language Models
- Accelerated inference speed with parallel Multi-GPU usage.
- Mitigated the paper's 'low-quality repeated generations' problem with a repetition penalty and an n-gram restriction.
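The two decoding constraints named above can be sketched in plain Python. This is a minimal illustration, not the project's code: the function names, the dictionary-based logits, and the default values are assumptions made for the example. The penalty rule shown (divide positive scores, multiply negative ones) is the commonly used CTRL-style rule, and the n-gram check bans any token that would repeat an n-gram already generated.

```python
from typing import Dict, List, Set


def apply_repetition_penalty(logits: Dict[int, float], generated: List[int],
                             penalty: float = 1.2) -> Dict[int, float]:
    """Return a copy of `logits` with already-generated tokens down-weighted."""
    out = dict(logits)
    for tok in set(generated):
        if tok in out:
            score = out[tok]
            # Positive scores are divided and negative scores multiplied,
            # so a repeated token always becomes less likely.
            out[tok] = score / penalty if score > 0 else score * penalty
    return out


def banned_next_tokens(generated: List[int], n: int = 3) -> Set[int]:
    """Tokens that would complete an n-gram already present in `generated`."""
    if n < 1 or len(generated) < n - 1:
        return set()
    # The last n-1 tokens form the prefix of the n-gram about to be completed.
    prefix = tuple(generated[len(generated) - (n - 1):])
    banned: Set[int] = set()
    for i in range(len(generated) - n + 1):
        if tuple(generated[i:i + n - 1]) == prefix:
            banned.add(generated[i + n - 1])
    return banned


# Hypothetical token ids: the sequence ends in (5, 6), which was followed by 7 before.
print(banned_next_tokens([5, 6, 7, 5, 6], n=3))
```

In a sampling loop, the banned tokens would have their scores set to negative infinity before the next token is drawn.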
Membership Inference Attack (2022)
Implementation of Shokri et al. (2016), Membership Inference Attacks Against Machine Learning Models
- Prevented overfitting of shadow models by adding early stopping, regularizing with weight decay, and splitting the data into train/val/test sets.
- Referenced Carlini et al. (2021) to conduct further research on different types of models and metrics.
- Reproduced the attack metrics as follows.
MIA Attack Metrics | Accuracy | Precision | Recall | F1 Score |
---|---|---|---|---|
CIFAR10 | 0.7761 | 0.7593 | 0.8071 | 0.7825 |
CIFAR100 | 0.9746 | 0.9627 | 0.9875 | 0.9749 |
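For reference, the four columns relate to each other as sketched below. This is a minimal example, assuming the attack is scored as a binary member / non-member classifier. The helper names and the confusion counts in the usage lines are hypothetical; only the precision and recall values passed to `f1_from` come from the table.

```python
def attack_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1 from binary confusion counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Consistency check against the reported precision/recall columns.
print(round(f1_from(0.7593, 0.8071), 4))  # CIFAR10  -> 0.7825
print(round(f1_from(0.9627, 0.9875), 4))  # CIFAR100 -> 0.9749

# Hypothetical confusion counts, only to show the call shape.
print(attack_metrics(tp=8, fp=2, tn=7, fn=3))
```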
[Figures: MIA ROC Curve CIFAR10, MIA ROC Curve CIFAR100]
💬 Natural Language Processing Projects
KoQuillBot (2022) & T5 Translation (2022)
Paraphrasing tool with round trip translation utilizing T5 Machine Translation. | 🤗 KoQuillBot Demo & 🤗 Translator Demo
Direction | BLEU Score | Translation Result |
---|---|---|
Korean ➡️ English | 45.15 | 🔗 Inference Result |
English ➡️ Korean | - | - |
Deep Encoder Shallow Decoder (2022)
Implementation of Kasai et al. (2020), Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation | 📄 Translation Output
- Composed custom dataset, trainer, and inference code in pytorch and huggingface.
- Trained and hosted encoder-decoder transformers model using huggingface.
Direction | BLEU Score | Translation Result |
---|---|---|
Korean ➡️ English | 35.82 | 🔗 Inference Result |
English ➡️ Korean | - | - |
KLUE-RBERT (2021)
Extracting relations between subject and object entity in KLUE Benchmark dataset | ✍️ Blog Post
- Finetuned a RoBERTa model following the RBERT structure in pytorch.
- Applied stratified k-fold cross validation for the custom trainer.
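The idea behind stratified k-fold splitting can be sketched without any library. This is an illustrative version under stated assumptions, not the custom trainer itself: the function name, the seed, and the toy labels are hypothetical. In practice a library splitter such as scikit-learn's `StratifiedKFold` would typically fill this role.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence


def stratified_folds(labels: Sequence[str], k: int, seed: int = 0) -> List[List[int]]:
    """Assign sample indices to k folds so every label is spread evenly."""
    rng = random.Random(seed)
    by_label: Dict[str, List[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    folds: List[List[int]] = [[] for _ in range(k)]
    offset = 0  # rotate the starting fold so overall fold sizes stay balanced
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal the indices of one class round-robin across the folds.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds


# Toy relation labels (hypothetical): 10 samples of one class, 5 of another.
labels = ["no_relation"] * 10 + ["org:founded"] * 5
folds = stratified_folds(labels, k=5)
for i, val_idx in enumerate(folds):
    # Fold i is the validation set; all other folds form the training set.
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    print(i, len(train_idx), len(val_idx))
```

Each fold keeps roughly the class ratio of the full dataset, which matters when relation labels are heavily imbalanced.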
Conditional Generation with KoGPT (2021)
Sentence generation with given emotion conditions | 🤗 Huggingface Demo
- Finetuned KoGPT-Trinity with conditional emotion labels.
- Maintained huggingface hosted model and live demo.
Machine Reading Comprehension in Naver Boostcamp (2021)
Retrieved and extracted answers from Wikipedia texts for a given question | ✍️ Blog Post
- Attached bidirectional LSTM layers to the backbone transformers model to extract answers.
- Divided benchmark into start token prediction accuracy and end token prediction accuracy.
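Splitting the benchmark into start-token and end-token accuracy can be illustrated as follows. This is a minimal sketch, assuming predictions and gold answers are given as (start, end) token index pairs. The function name and the sample spans are hypothetical, and the exact-match figure is included only for comparison.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start token index, end token index)


def span_accuracies(pred: List[Span], gold: List[Span]) -> Dict[str, float]:
    """Score start and end positions separately, plus exact span match."""
    if len(pred) != len(gold) or not gold:
        raise ValueError("pred and gold must be non-empty and of equal length")
    n = len(gold)
    start_hits = sum(p[0] == g[0] for p, g in zip(pred, gold))
    end_hits = sum(p[1] == g[1] for p, g in zip(pred, gold))
    exact_hits = sum(tuple(p) == tuple(g) for p, g in zip(pred, gold))
    return {
        "start_acc": start_hits / n,
        "end_acc": end_hits / n,
        "exact_match": exact_hits / n,
    }


# Hypothetical spans: two correct starts, two correct ends, one exact match.
pred = [(3, 5), (10, 12), (7, 9), (0, 2)]
gold = [(3, 5), (10, 11), (6, 9), (1, 3)]
print(span_accuracies(pred, gold))
```

Reporting the two accuracies separately shows whether errors come mainly from locating the beginning of an answer or from deciding where it ends.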
Mathpresso Corporation Joint Project (2020)
Corporate joint project for mathematics problems classification task | 📄 Presentation
- Preprocessed Korean mathematics problems dataset based on EDA.
- Maintained version of preprocessing module.
Constructing Emotional Instagram Posts Dataset (2019)
Created Emotional Instagram Posts(글스타그램) dataset | 📄 Presentation
- Managed version control for the project Github Repository.
- Converted Korean text in image files into text files using the Google Cloud Vision API.
👀 Computer Vision Projects
ElimNet (2021)
Elimination based Lightweight Neural Net with Pretrained Weights | 📄 Presentation
- Constructed a lightweight CNN model with fewer than 1M parameters by removing top layers from pretrained CNN models.
- Assessed on the Trash Annotations in Context (TACO) dataset, sampled for 6 classes with 20,851 images.
- Compared metrics across VGG11, MobileNetV3 and EfficientNetB0.
Face Mask, Age, Gender Classification in Naver Boostcamp (2021)
Identifying 18 classes from given images: Age Range(3 classes), Biological Sex(2 classes), Face Mask(3 classes) | ✍️ Blog Post
- Optimized combination of backbone models, losses and optimizers.
- Created an additional labeled dataset (age, sex, mask) to resolve class imbalance.
- Cropped facial characteristics with MTCNN and RetinaFace to reduce noise in the image.
Realtime Desktop Posture Classification (2020)
Real-time desk posture classification through webcam | 📷 Demo Video
- Created real-time detection window using opencv-python.
- Converted image dataset into Yaw/Pitch/Roll numerical dataset using RetinaFace model.
- Trained and optimized a random forest classification model with 93% precision.
🕸 Web Projects
Exchange Program Overview Website (2020)
Overview for student life in foreign universities | ✈️ Website Demo
- 3400 Visitors within a year (2021.07 ~ 2022.07)
- 22000 Pageviews within a year (2021.07 ~ 2022.07)
- 3 minutes+ of Average Retention Time
- Collected and preprocessed 11,200 text reviews from the Yonsei website using pandas.
- Visualized department distribution and weather information using matplotlib.
- Ran sentiment analysis on satisfaction levels for foreign universities with a pretrained BERT model.
- Clustered universities by provided curriculum using K-means clustering.
- Hosted reports on universities using Gatsby.js, GraphQL, and Netlify.
fitcuration website (2020)
Search-based exercise retrieval web service | 📷 Demo Video
- Built retrieval algorithm based on search keyword using TF-IDF.
- Deployed the website using Docker, AWS RDS, AWS S3, AWS EBS.
- Constructed backend using Django, Django ORM & PostgreSQL.
- Composed client-side using Sass, Tailwind, HTML5.
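Keyword retrieval with TF-IDF, as mentioned in the first bullet above, can be sketched in a few lines. This is a simplified illustration rather than the service's actual algorithm: the whitespace tokenizer, the smoothed IDF formula, cosine ranking, the function names, and the sample exercise names are all assumptions made for the example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def build_index(docs: List[str]) -> Tuple[Dict[str, float], List[Dict[str, float]]]:
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return idf, vectors


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def search(query: str, docs: List[str], top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank documents by cosine similarity to the query; drop zero scores."""
    idf, vectors = build_index(docs)
    # Query terms that never occur in the corpus carry no weight.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)
    return [(docs[i], s) for s, i in scored[:top_k] if s > 0]


# Hypothetical exercise titles.
docs = ["barbell squat for legs", "push up for chest", "dumbbell chest press"]
for title, score in search("chest press", docs):
    print(f"{score:.3f}  {title}")
```

Rare terms such as "press" get a higher IDF weight than common ones such as "chest", so the document matching both query terms ranks first.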
💰 Quantitative Finance Projects
Forecasting Federal Rate with Lasso Regression Model (2022)
Federal Rate Prediction for the next FOMC Meeting
- Wrangled quantitative dataset with Finance Data Reader.
- Computed metrics and compared candidate regression models to find an adequate fit.
- Optimized hyperparameters for the candidate models.
Korean Spinoff Event Tracker (2020)
Get financial data of public companies involved in spinoff events on Google Spreadsheet | 🧩 Dataset Demo
- Wrangled the finance dataset that is displayed on Google Sheets.
🏷 Opensource Contributions
NVlabs/stylegan2-ada-pytorch (2021)
Fixed torch version comparison fallback error for source repo of NVIDIA Research | ✍️ Pull Request
- Skills: torch, torchvision
docker/docker.github.io (2020)
Updated PostgreSQL initialization for "Quickstart: dockerizing django" documentation | 🐳 Pull Request
- Skills: Docker, docker-compose, Django
🗄 ETCs
Covid19 Confirmed Cases Prediction (2020)
Predicting the early-stage spread of COVID-19 after it enters a country.
- Fixed existing errors on Github Repository.
- Wrote footnotes in both English and Korean.
- Achieved error within ±5% for one-day prediction.
- Achieved error within ±10% for 30-day prediction.
Indigo (2019)
Don't miss concerts for your favorite artists with KakaoTalk Chatbot | 📷 Demo Video
- Created API server for KakaoTalk chatbot with Flask, Pymongo and MongoDB.
- Deployed the API server on AWS EC2.
- Visualized concert schedules on user's Google Calendar.
- Created / Updated events in Google Calendar.
🛠 Skillsets
Data Analysis and Machine Learning
- Data Analysis Library: pandas, numpy
- Deep Learning: pytorch, transformers
- Machine Learning: scikit-learn, gensim, xgboost
Backend
- Python / Django - Django ORM, CRUD, OAuth
- Python / FastAPI(uvicorn) - CRUD API
- Python / Flask - CRUD API
Client
- HTML / Pug.js
- CSS / Sass, Tailwind, Bulma
- JavaScript / ES6
Deployment
- Docker, docker-compose
- AWS EC2, Google Cloud App Engine
- AWS S3, RDS (PostgreSQL)
- AWS Elastic Beanstalk, CodePipeline