Education

Experience

Intel Corporation - Big Data Software Engineer Intern

June 2021 - August 2021

  • Worked on the Analytics Zoo team, which aims to scale TensorFlow/PyTorch applications to distributed big data frameworks.
  • Encapsulated commonly used big data pre-processing logic into 10+ PySpark DataFrame functions (e.g., transforming continuous columns into categorical columns), all of which were merged into the Analytics Zoo GitHub repository.
  • Built a recommender system using the DeepFM model on a WeChat dataset of more than 10 million video feeds to predict the probability of users' interactions (likes, clicks on avatars, favorites) from historical n-day behavior data and video information.
  • Scaled out PySpark data pre-processing and the PyTorch prediction model from a single node to YARN clusters using the Orca framework, reducing model processing time by 20%.

UMich Foreseer Group - Research Assistant Instructed by Prof. Qiaozhu Mei

October 2019 - December 2020

  • Proposed a novel method that improves many existing graph fingerprinting methods by parameterizing them with a multi-channel fuzzy histogram and a CNN, increasing their graph-level prediction accuracy by 4% to 7% on average and consistently matching or outperforming advanced higher-order GNN methods.
  • Developed smooth fuzzy histograms with PyTorch to convert node-level fingerprints to universal graph representations, and visualized hidden layers and graph structures with TensorBoard and NetworkX to find performance bottlenecks.
  • Designed and ran experiments on synthetic and real-world datasets using the lab's CUDA machines and AWS EC2.
  • Publication: A Simple Yet Effective Method Improving Graph Fingerprints for Graph-Level Prediction. Jiaxin Ying*, Jiaqi Ma*, Qiaozhu Mei. The WebConf (WWW) 2021 Workshop on Graph Learning Benchmarks.

UMich Situated Language & Embodied Dialogue Group - Research Assistant Instructed by Prof. Joyce Y. Chai

May 2020 - January 2021

  • Augmented the ACT dataset by adding human-labeled bounding boxes with annotations and cleansed the data.
  • Designed and implemented two noun grounding modules by adapting Faster R-CNN and attention networks to determine bounding boxes from inputs of images, captions, and noun embeddings from ResNet, BERT, and word2vec.
  • Helped run experiments comparing our verb acquisition model, which uses a mental attention layer pre-trained on the noun grounding task, against baseline models.
  • Wrote the first draft of the paper's Related Work section and contributed multiple suggestions for other sections.

Projects

Intelligent Fall Detection System for The Elderly Living Alone

May 2021 - August 2021

  • Worked on a capstone project sponsored by Cambricon to build an intelligent fall detection system for the elderly that uses cameras and deep learning algorithms to send an alarm in an emergency.
  • Developed the detection model with OpenPose and VGG16 to classify a given sequence (5 consecutive frames) into 3 categories (stand/fall/lie) in real time, achieving an accuracy of 95.2%.
  • Used the Twilio API to send text messages on fall events and displayed the corresponding camera video in a web app.

Projects for UMich EECS 485 Web Systems

January 2021 - April 2021

  • Developed an Instagram-like web application using React for the front end, Flask for the back end, and SQLite for the database, supporting features such as user sign-in, posts, and comments.
  • Built a multi-worker, fault-tolerant MapReduce server in Python that processes user-submitted tasks.
  • Implemented a search engine from scratch, based on text segmentation, Hadoop MapReduce indexing, and tf-idf scores.

Iterative Data Processing with Spark

December 2020

  • Analyzed a Twitter social graph with Spark RDDs and DataFrames in a Zeppelin notebook, and implemented the PageRank algorithm to find the most influential users.
  • Debugged the programs using the YARN and Spark UIs and identified performance bottlenecks through DAG visualization.
  • Deployed the Spark applications to Azure Databricks and reduced the end-to-end execution time by 12% compared to Azure HDInsight.

Projects for EECS 484 Database Management Systems

January 2020 - April 2020

  • Designed the relational database schema for a Facebook-like service and developed a Java application that executes SQL via JDBC to support features such as user information queries, nearby event discovery, and friend suggestions.
  • Migrated the Facebook dataset from an Oracle relational database to a NoSQL database in JSON format for greater schema flexibility, and translated the SQL queries into MongoDB queries written in JavaScript.

Breast Cancer Dataset Analysis - STATS 415 Data Mining Group Project

November 2019 - December 2019

  • Applied principal component analysis to the "Breast Cancer Wisconsin Data Set", which records characteristics of cell nuclei, to explore the most informative combinations of predictors.
  • Compared the performance of logistic regression, random forest, SVM, LDA, and QDA implemented in R, and found that QDA with subset selection predicted the diagnosis most effectively.
  • Plotted the ROC curve for QDA and adjusted the posterior probability threshold to control the positive and negative error rates.

Projects & Labs for VE 280/EECS 281 Data Structures and Algorithms

May 2019 - December 2019

  • Used inheritance and basic dynamic polymorphism to implement sorted, binary heap, and pairing heap priority queues built on templated generic code.
  • Applied a branch and bound algorithm to solve the TSP on a complete weighted graph, using an MST to obtain a lower bound on the remaining cost, and explored various heuristics to reach a near-optimal solution.
  • Implemented a C++ version of the game 2048 that responds to the player's keystrokes and supports customized tile values, such as Unicode emojis, read from files provided by the player.

Group Project for VV 471 Numerical Methods

June 2019 - August 2019

  • Solved a nonlinear optimization task in MATLAB to find the minimum of an objective function that integrates the exponential of a degree-101 polynomial over the standard 3-simplex, and proved that the local minimum is global by analyzing the convexity of the objective function.
  • Applied a quasi-Newton method to improve time efficiency, Gauss-Lobatto quadrature for the integration, and line search to find the optimal step size, balancing time efficiency, accuracy, and stability.

Skills

NumPy
5 / 5
PyTorch
5 / 5
scikit-learn
5 / 5
Python
5 / 5
Flask
4 / 5
NLTK
4 / 5
Bash
4 / 5
C++
4 / 5
Git
4 / 5
Java
4 / 5
Jupyter Notebook
4 / 5
MATLAB
4 / 5
Matplotlib
4 / 5
SQL
4 / 5
MySQL
4 / 5
SQL*Plus
4 / 5
AWS
4 / 5
Spark
4 / 5
CSS
3 / 5
Docker
3 / 5
React
3 / 5
OpenCV
3 / 5
TensorFlow
3 / 5
HTML
3 / 5
JavaScript
3 / 5
MongoDB
3 / 5
Pandas
3 / 5
Scala
3 / 5
Azure
3 / 5

Selected Courses

  • EECS 281:

    Data Structures and Algorithms

  • EECS 376:

    Foundations of Computer Science

  • EECS 440:

    System Design of a Search Engine

  • EECS 445:

    Machine Learning

  • EECS 477:

    Introduction to Algorithms

  • EECS 484:

    Database Management Systems

  • EECS 485:

    Web Systems

  • MATH 420:

    Advanced Linear Algebra

  • MATH 525:

    Probability Theory

  • MATH 526:

    Discrete State Stochastic Processes

  • MATH 565:

    Combinatorics and Graph Theory

  • STATS 415:

    Data Mining

  • VE 280:

    Programming & Elem. Data Structures

  • VV 471:

    Introduction to Numerical Methods