- Worked on the Analytics Zoo team, which aims to scale TensorFlow/PyTorch applications to distributed big data frameworks.
- Encapsulated commonly used big data pre-processing logic into 10+ PySpark DataFrame functions (e.g., transforming continuous columns into categorical columns), all merged into the Analytics Zoo GitHub repository.
- Built a recommender system using the DeepFM model on a WeChat dataset of more than 10 million video feeds to predict the probability of users’ interactive actions (likes, clicks on avatars, favorites) from historical n-day behavior data and video information.
- Scaled out PySpark data pre-processing and a PyTorch prediction model from a single node to YARN clusters using the Orca framework, reducing model processing time by 20%.
- Proposed a novel method that improves many existing graph fingerprinting methods by parameterizing them with a multi-channel fuzzy histogram and a CNN, increasing their graph-level prediction accuracy by 4% to 7% on average and consistently matching or outperforming advanced higher-order GNN methods.
- Developed smooth fuzzy histograms with PyTorch to convert node-level fingerprints to universal graph representations, and visualized hidden layers and graph structures with TensorBoard and NetworkX to find performance bottlenecks.
- Designed and ran the experiments on synthetic and real-world datasets using the lab's CUDA machines and AWS EC2.
- Publication: A Simple Yet Effective Method Improving Graph Fingerprints for Graph-Level Prediction. Jiaxin Ying*, Jiaqi Ma*, Qiaozhu Mei. The WebConf (WWW) 2021 Workshop on Graph Learning Benchmarks.
- Augmented the ACT dataset by adding human-labeled bounding boxes with annotations and cleansed the data.
- Designed and implemented two noun grounding modules by adapting Faster R-CNN and attention networks to determine bounding boxes from inputs of images, captions, and noun embeddings from ResNet, BERT, and word2vec.
- Helped run experiments comparing our verb acquisition model, which uses a mental attention layer pre-trained on the noun grounding task, against baseline models.
- Wrote the first draft of the paper's Related Work section and contributed multiple suggestions for other sections.
- Worked on a capstone project sponsored by Cambricon to build an intelligent fall detection system for the elderly that uses cameras and deep learning to send an alarm in an emergency.
- Developed the detection model with OpenPose and VGG16 to classify a given sequence (5 consecutive frames) into 3 categories (stand/fall/lie) in real time, achieving an accuracy of 95.2%.
- Used the Twilio API to send text messages on fall events and displayed the corresponding camera video in a web app.
- Developed an Instagram-like web application using React for the front end, Flask for the back end, and SQLite for the database, supporting features such as user sign-in, posts, and comments.
- Built a multi-worker, fault-tolerant MapReduce server in Python that processes user-submitted tasks.
- Implemented a search engine from scratch, based on text segmentation, Hadoop MapReduce indexing, and tf-idf scores.
- Analyzed a Twitter social graph with Spark RDD and DataFrame on Zeppelin Notebook, and implemented the PageRank algorithm to find the most influential users.
- Debugged the programs using the YARN and Spark UIs and identified performance bottlenecks through DAG visualization.
- Deployed the Spark applications to Azure Databricks and reduced the end-to-end execution time by 12% compared to Azure HDInsight.
- Designed the relational database schema for a Facebook-like service and developed a Java application that executes SQL through JDBC to support features such as user information queries, nearby event discovery, and friend suggestions.
- Applied principal component analysis to the "Breast Cancer Wisconsin Data Set", which records characteristics of cell nuclei, to explore the most informative combinations of predictors.
- Compared the performance of logistic regression, random forest, SVM, LDA, and QDA implemented in R, and found that QDA with subset selection was the most effective at predicting the diagnosis.
- Plotted the ROC curve for QDA and adjusted the posterior probability threshold to control the positive and negative error rates.
- Used inheritance and basic dynamic polymorphism to implement sorted, binary heap, and pairing heap priority queues built on templated generic code.
- Applied a branch-and-bound algorithm to solve the TSP on a complete weighted graph, using an MST to lower-bound the remaining cost, and explored various heuristic approaches to obtain near-optimal solutions.
- Implemented a C++ version of the game 2048 that responds to the player's keystrokes and supports customized tile values, such as Unicode emoji, read from files provided by the player.
- Solved a nonlinear optimization problem in MATLAB: minimized an objective function that integrates the exponential of a degree-101 polynomial over the standard 3-simplex, and proved that the local minimum is global by analyzing the convexity of the objective function.
- Applied a quasi-Newton method to improve time efficiency, Gauss-Lobatto quadrature for the integration, and line search to find the optimal step size, balancing time efficiency, accuracy, and stability.
Data Structures and Algorithms
Foundations of Computer Science
System Design of a Search Engine
Introduction to Algorithms
Database Management Systems
Advanced Linear Algebra
Discrete State Stochastic Processes
Combinatorics and Graph Theory
Programming & Elem. Data Structures
Introduction to Numerical Methods