Portfolio

September 25th 2022

📖 Education

Yonsei University, Seoul Campus

Bachelor of Economics (Major) & Applied Statistics (Minor)
  • INTRODUCTION TO STATISTICS (A0)
  • STATISTICAL METHOD (A+)
  • CALCULUS (TBD)
  • LINEAR ALGEBRA (B+)
  • MATHEMATICAL STATISTICS 1 (A+)
  • LINEAR REGRESSION (B+)
  • R AND PYTHON PROGRAMMING (A+)
  • DATA STRUCTURE (TBD)
  • SPECIAL PROBLEMS IN COMPUTING (A0)
  • SOCIAL INFORMATICS (A+)
  • TIME SERIES ANALYSIS (A+)

Student Clubs


🏆 Competition Awards

Topic / Task | Result
Machine Reading Comprehension | 🥈 2nd (2/26)
Korean Standard Industry Classification | 🎖 7th (7/311)
KLUE benchmark Natural Language Inference | 🥇 1st (1/468)
Python Code Clone Detection | 🥉 3rd (3/337)
Stock Price Forecast on KOSPI & KOSDAQ | 🎖 6th (6/205)

**Dacon is a Kaggle-like competition platform in Korea.


🛠 Multimodal Projects

KoDALLE: Text to Fashion (2021)

Generating dress outfit images based on given input text | 📄 Presentation

  • Created training pipeline from VQGAN through DALLE
  • Maintained versions of a dataset of 1 million image-caption pairs.
  • Trained VQGAN and DALLE models from scratch.
  • Established a live demo for KoDALLE on Huggingface Spaces via FastAPI.

🔐 Deep Learning Security Projects

Language Model Memorization (2022)

Implementation of Carlini et al. (2020), Extracting Training Data from Large Language Models

  • Accelerated inference with parallel multi-GPU usage.
  • Mitigated the paper's 'low-quality repeated generations' problem with a repetition penalty and an n-gram restriction.
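
The two decoding constraints named above can be illustrated with a minimal sketch in plain Python. It is not the project's code: the function names, the default penalty of 1.2, and the n-gram size of 3 are assumptions chosen for illustration, and greedy selection stands in for whatever sampling strategy was actually used.

```python
import math

def apply_repetition_penalty(logits, generated_ids, penalty=1.2):
    """Down-weight tokens that already appear in the generated sequence."""
    adjusted = list(logits)
    for token_id in set(generated_ids):
        score = adjusted[token_id]
        # Divide positive scores and multiply negative ones, so both become less likely.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted

def banned_next_tokens(generated_ids, ngram_size=3):
    """Return tokens that would complete an n-gram already present in the output."""
    prefix = tuple(generated_ids[-(ngram_size - 1):]) if ngram_size > 1 else ()
    banned = set()
    for i in range(len(generated_ids) - ngram_size + 1):
        ngram = tuple(generated_ids[i:i + ngram_size])
        if ngram[:-1] == prefix:
            banned.add(ngram[-1])
    return banned

def next_token(logits, generated_ids, penalty=1.2, ngram_size=3):
    """Greedy choice after applying both constraints (hypothetical helper)."""
    scores = apply_repetition_penalty(logits, generated_ids, penalty)
    for token_id in banned_next_tokens(generated_ids, ngram_size):
        scores[token_id] = -math.inf
    return max(range(len(scores)), key=scores.__getitem__)

# Token 3 has the highest raw logit, but (1, 2, 3) already occurred, so it is banned.
print(next_token([0.1, 0.2, 0.3, 5.0], generated_ids=[1, 2, 3, 1, 2]))
```

The penalty only lowers the score of seen tokens, while the n-gram restriction removes candidates outright, so the two act as a soft and a hard constraint respectively.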

Membership Inference Attack (2022)

Implementation of Shokri et al. (2016), Membership Inference Attacks Against Machine Learning Models

  • Prevented overfitting of the shadow models by adding early stopping, regularizing with weight decay, and splitting the data into train/val/test sets.
  • Referenced Carlini et al. (2021) to conduct further research on different types of models and metrics.
  • Reproduced the attack metrics as follows.
MIA Attack Metrics | Accuracy | Precision | Recall | F1 Score
CIFAR10 | 0.7761 | 0.7593 | 0.8071 | 0.7825
CIFAR100 | 0.9746 | 0.9627 | 0.9875 | 0.9749
[Figures: MIA ROC curve for CIFAR10; MIA ROC curve for CIFAR100]
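
As a quick consistency check on the reported numbers, the F1 score can be recomputed from precision and recall as their harmonic mean. The sketch below uses only the values reported above; the tolerance of 1e-3 is an assumption that allows for rounding to four decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values copied from the reported attack metrics.
reported = {
    "CIFAR10": {"precision": 0.7593, "recall": 0.8071, "f1": 0.7825},
    "CIFAR100": {"precision": 0.9627, "recall": 0.9875, "f1": 0.9749},
}

for name, metrics in reported.items():
    recomputed = f1_score(metrics["precision"], metrics["recall"])
    # Reported values are rounded, so compare with a small tolerance.
    assert abs(recomputed - metrics["f1"]) < 1e-3, name
    print(f"{name}: recomputed F1 = {recomputed:.4f}")
```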

💬 Natural Language Processing Projects

KoQuillBot (2022) & T5 Translation (2022)

Paraphrasing tool based on round-trip translation with T5 machine translation models. | 🤗 KoQuillBot Demo & 🤗 Translator Demo

Direction | BLEU Score | Translation Result
Korean ➡️ English | 45.15 | 🔗 Inference Result
English ➡️ Korean | - | -
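
Round-trip translation paraphrases a sentence by translating it into a pivot language and back. The sketch below shows only that control flow; the toy dictionary translators are hypothetical stand-ins for the T5 models, and `round_trip_paraphrase` is an illustrative name rather than the project's API.

```python
from typing import Callable

Translator = Callable[[str], str]

def round_trip_paraphrase(text: str, to_pivot: Translator, from_pivot: Translator) -> str:
    """Translate into the pivot language and back to obtain a paraphrase."""
    pivot_text = to_pivot(text)
    return from_pivot(pivot_text)

# Toy lookup tables standing in for real Korean->English and English->Korean models.
KO_TO_EN = {"안녕하세요": "hello"}
EN_TO_KO = {"hello": "안녕"}

def toy_ko_to_en(text: str) -> str:
    return KO_TO_EN.get(text, text)

def toy_en_to_ko(text: str) -> str:
    return EN_TO_KO.get(text, text)

print(round_trip_paraphrase("안녕하세요", toy_ko_to_en, toy_en_to_ko))
```

Passing the translators as callables keeps the paraphrasing step independent of any particular model, so either direction can be swapped out without touching the wrapper.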

Deep Encoder Shallow Decoder (2022)

Implementation of Kasai et al. (2020), Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation | 📄 Translation Output

  • Wrote custom dataset, trainer, and inference code in pytorch and huggingface.
  • Trained and hosted an encoder-decoder transformer model using huggingface.
Direction | BLEU Score | Translation Result
Korean ➡️ English | 35.82 | 🔗 Inference Result
English ➡️ Korean | - | -

KLUE-RBERT (2021)

Extracting relations between subject and object entity in KLUE Benchmark dataset | ✍️ Blog Post

  • Finetuned a RoBERTa model following the R-BERT structure in pytorch.
  • Applied stratified k-fold cross validation for the custom trainer.
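
Stratified k-fold keeps the label distribution of each validation fold close to that of the full dataset, which matters when relation classes are imbalanced. The following is a minimal standard-library sketch of the idea, not the project's trainer code; the function name, seed, and round-robin dealing scheme are assumptions.

```python
import random
from collections import defaultdict

def stratified_kfold(labels, n_splits=5, seed=42):
    """Yield (train_idx, val_idx) pairs whose label ratios roughly match the full set."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_label.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every fold receives a near-equal share of it.
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    for k in range(n_splits):
        val_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, val_idx

# Hypothetical labels: 10 examples of one relation and 5 of another.
labels = ["no_relation"] * 10 + ["org:founded"] * 5
for train_idx, val_idx in stratified_kfold(labels, n_splits=5):
    print(len(train_idx), [labels[i] for i in val_idx])
```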

Conditional Generation with KoGPT (2021)

Sentence generation with given emotion conditions | 🤗 Huggingface Demo

  • Finetuned KoGPT-Trinity with conditional emotion labels.
  • Maintained huggingface hosted model and live demo.

Machine Reading Comprehension in Naver Boostcamp (2021)

Retrieved and extracted answers from Wikipedia texts for a given question | ✍️ Blog Post

  • Attached bidirectional LSTM layers to the backbone transformer model to extract answers.
  • Split the benchmark into start-token prediction accuracy and end-token prediction accuracy.
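
Splitting the benchmark into start-token and end-token accuracy can be sketched as follows. This assumes each prediction and each gold answer is a `(start, end)` pair of token indices; the function name is hypothetical, and the exact-match figure is added here only for comparison.

```python
def span_accuracies(pred_spans, gold_spans):
    """Return start, end, and exact-match accuracy for (start, end) token index pairs."""
    if not gold_spans or len(pred_spans) != len(gold_spans):
        raise ValueError("pred_spans and gold_spans must be non-empty and equally long")
    total = len(gold_spans)
    pairs = list(zip(pred_spans, gold_spans))
    start_hits = sum(pred[0] == gold[0] for pred, gold in pairs)
    end_hits = sum(pred[1] == gold[1] for pred, gold in pairs)
    exact_hits = sum(tuple(pred) == tuple(gold) for pred, gold in pairs)
    return {
        "start_acc": start_hits / total,
        "end_acc": end_hits / total,
        "exact_match": exact_hits / total,
    }

# Hypothetical predictions and gold spans for four questions.
preds = [(1, 3), (2, 5), (0, 4), (7, 9)]
golds = [(1, 3), (2, 6), (1, 4), (7, 9)]
print(span_accuracies(preds, golds))
```

Reporting the two accuracies separately shows whether errors come mostly from locating the beginning of an answer or from deciding where it ends.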

Mathpresso Corporation Joint Project (2020)

Corporate joint project for mathematics problems classification task | 📄 Presentation

  • Preprocessed the Korean mathematics problems dataset based on EDA.
  • Maintained versions of the preprocessing module.

Constructing Emotional Instagram Posts Dataset (2019)

Created Emotional Instagram Posts(글스타그램) dataset | 📄 Presentation

  • Managed version control for the project's Github repository.
  • Converted Korean text in image files into text files using the Google Cloud Vision API.

👀 Computer Vision Projects

ElimNet (2021)

Elimination based Lightweight Neural Net with Pretrained Weights | 📄 Presentation

  • Constructed a lightweight CNN model with fewer than 1M parameters by removing top layers from pretrained CNN models.
  • Assessed on the Trash Annotations in Context (TACO) dataset, sampled for 6 classes with 20,851 images.
  • Compared metrics across VGG11, MobileNetV3 and EfficientNetB0.

Face Mask, Age, Gender Classification in Naver Boostcamp (2021)

Identifying 18 classes from given images: Age Range (3 classes), Biological Sex (2 classes), Face Mask (3 classes) | ✍️ Blog Post

  • Optimized combination of backbone models, losses and optimizers.
  • Created additional dataset with labels(age, sex, mask) to resolve class imbalance.
  • Cropped facial characteristics with MTCNN and RetinaFace to reduce noise in the image.
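
The 18 classes arise as the product of the three attributes (3 mask states × 2 sexes × 3 age ranges). One way to map between the combined class index and the individual attributes is sketched below; the category names and the ordering of the attributes are assumptions, since the original labeling scheme is not given here.

```python
# Hypothetical category names; only the counts (3, 2, 3) come from the task description.
MASK = ("wear", "incorrect", "not_wear")
SEX = ("male", "female")
AGE = ("young", "middle", "old")

def encode_label(mask: int, sex: int, age: int) -> int:
    """Combine three attribute indices into a single class id in [0, 18)."""
    return mask * len(SEX) * len(AGE) + sex * len(AGE) + age

def decode_label(class_id: int) -> tuple:
    """Invert encode_label back into (mask, sex, age) indices."""
    mask, rest = divmod(class_id, len(SEX) * len(AGE))
    sex, age = divmod(rest, len(AGE))
    return mask, sex, age

class_id = encode_label(mask=2, sex=1, age=2)
mask, sex, age = decode_label(class_id)
print(class_id, MASK[mask], SEX[sex], AGE[age])
```

Keeping the attributes decodable makes it possible to train separate heads per attribute and still report results on the combined 18-class label.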

Realtime Desktop Posture Classification (2020)

Real-time desk posture classification through webcam | 📷 Demo Video

  • Created real-time detection window using opencv-python.
  • Converted image dataset into Yaw/Pitch/Roll numerical dataset using RetinaFace model.
  • Trained and optimized a random forest classification model with a precision of 93%.

🕸 Web Projects

Exchange Program Overview Website (2020)

Overview for student life in foreign universities | ✈️ Website Demo

  • 3400 Visitors within a year (2021.07 ~ 2022.07)
  • 22000 Pageviews within a year (2021.07 ~ 2022.07)
  • 3 minutes+ of Average Retention Time
  • Collected and preprocessed 11200 text review data from the Yonsei website using pandas.
  • Visualized department distribution and weather information using matplotlib.
  • Ran sentiment analysis on satisfaction levels for foreign universities with a pretrained BERT model.
  • Clustered universities by their provided curricula using K-means clustering.
  • Hosted reports on universities using Gatsby.js, GraphQL, and Netlify.
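
For the K-means clustering step listed above, a minimal pure-Python version of Lloyd's algorithm looks like this. The two-dimensional feature vectors are made up for illustration, and seeding the centroids with the first k points is an assumption that keeps the sketch deterministic; how curricula were actually turned into features is not specified here.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Hypothetical 2-D features for six universities.
features = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
assignments, centroids = kmeans(features, k=2)
print(assignments)
```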

fitcuration website (2020)

Search-based exercise retrieval web service | 📷 Demo Video

  • Built retrieval algorithm based on search keyword using TF-IDF.
  • Deployed website using Docker, AWS RDS, AWS S3, AWS Elastic Beanstalk.
  • Constructed backend using Django, Django ORM & PostgreSQL.
  • Composed client-side using Sass, Tailwind, HTML5.
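
Keyword retrieval with TF-IDF, as mentioned in the first bullet above, can be sketched with the standard library alone. This is an illustrative version under assumptions: whitespace tokenization, smoothed IDF, cosine similarity for ranking, and made-up exercise descriptions. The service's actual tokenizer and scoring are not described here.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def build_index(documents):
    """Compute smoothed IDF weights and one TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothing keeps the weight positive even for terms present in every document.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in counts.items()})
    return idf, vectors

def cosine(a, b):
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query, documents, top_k=3):
    """Rank documents by cosine similarity between TF-IDF vectors."""
    idf, vectors = build_index(documents)
    counts = Counter(t for t in tokenize(query) if t in idf)
    total = sum(counts.values())
    query_vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    scored = [(cosine(query_vec, vec), doc) for vec, doc in zip(vectors, documents)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_k] if score > 0]

# Hypothetical exercise descriptions.
exercises = ["barbell squat for legs", "push up for chest", "plank for core stability"]
print(search("chest push", exercises, top_k=1))
```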

💰 Quantitative Finance Projects

Forecasting Federal Rate with Lasso Regression Model (2022)

Federal Rate Prediction for the next FOMC Meeting

  • Wrangled the quantitative dataset with Finance Data Reader.
  • Computed metrics and compared candidate regression models to find an adequate fit.
  • Optimized hyperparameters for the candidate models.

Korean Spinoff Event Tracker (2020)

Get financial data of public companies involved in spinoff events on Google Spreadsheet | 🧩 Dataset Demo

  • Wrangled the finance dataset that is displayed on Google Sheets.

🏷 Opensource Contributions

NVlabs/stylegan2-ada-pytorch (2021)

Fixed torch version comparison fallback error for source repo of NVIDIA Research | ✍️ Pull Request

  • Skills: torch, torchvision

docker/docker.github.io (2020)

Updated PostgreSQL initialization for "Quickstart: dockerizing django" documentation | 🐳 Pull Request

  • Skills: Docker, docker-compose, Django

🗄 ETCs

Covid19 Confirmed Cases Prediction (2020)

Predicts the spread of COVID-19 in the early stage after its arrival in a country.

  • Fixed existing errors on Github Repository.
  • Wrote footnotes in both English and Korean.
  • ±5% accuracy for one-day prediction.
  • ±10% accuracy for 30-day prediction.

Indigo (2019)

Don't miss concerts for your favorite artists with KakaoTalk Chatbot | 📷 Demo Video

  • Created API server for KakaoTalk chatbot with Flask, Pymongo and MongoDB.
  • Deployed the API server on AWS EC2.
  • Visualized concert schedules on user's Google Calendar.
  • Created / Updated events in Google Calendar.

🛠 Skillsets

Data Analysis and Machine Learning

  • Data Analysis Library: pandas, numpy
  • Deep Learning: pytorch, transformers
  • Machine Learning: scikit-learn, gensim, xgboost

Backend

  • Python / Django - Django ORM, CRUD, OAuth
  • Python / FastAPI(uvicorn) - CRUD API
  • Python / Flask - CRUD API

Client

  • HTML / Pug.js
  • CSS / Sass, Tailwind, Bulma
  • JavaScript / ES6

Deployment

  • Docker, docker-compose
  • AWS EC2, Google Cloud App Engine
  • AWS S3, RDS (PostgreSQL)
  • AWS Elastic Beanstalk, CodePipeline
©snoop2head