Link prediction using hebbian graph embeddings (Issued, 2022)
A patent that uses the hebbian graph embeddings algorithm for link prediction. Hebbian graph embeddings learns the embeddings of the nodes in a graph and uses these embeddings for link prediction and recommendations.
Similarity learning-based device attribution (Issued, 2021)
Using the probabilities outputted from a Gaussian Mixture Model as features, this patented algorithm uses a random forest classifier for cross device cookie matching.
JCOL: A Java package for solving the graph coloring problem (Journal of Open Source Software, 2019)
JCOL is a Java graph coloring package that does a cascade of three randomized graph coloring algorithms including DSATUR, randomized Iterated Greedy and Min-Conflicts local search. Results show that it is able to outperform ColPack on some DIMACS instances.
GKNAP: A Java and C++ package for solving the multidimensional knapsack problem (Journal of Open Source Software, 2019)
GKNAP is a package to solve the multiconstraint (multidimensional) knapsack problem. It is implemented in both Java and C++. The algorithm uses Lagrangian multipliers as constraint weights in a greedy algorithm. GKNAP is able to solve almost all benchmarks to optimality.
Model Adaptation via Model Interpolation and Boosting for Web Search Ranking (EMNLP, 2009)
By interpolating between several models (from different domains, but the same target), the algorithm creates a weighted ensemble of ranking algorithms which outperform each individual algorithm for search engine ranking.
Three Proofs that the Square Root of 2 Is Irrational
This short article surveys three proofs that the square root of 2 is irrational. The first proof is a simple proof by contradiction and the other two proofs use field theory from abstract alggebra.
Comparison of Stochastic Forecasting Models
This paper explores five stochastic forecasting methods like stochastic differential equations, arima, kalman filter, and the bayesian filter. It tries to forecast stock prices of Microsoft, Tesla and Target, 54 days into the future, and finds interesting results.
Comparison of Three Recent Personalization Algorithms
This article compares three recent personalization algorithms including Taxonomy Discovery for Personalized Recommendations, Multi-Matrix Factorization and Bayesian Personalized Ranking. It computes the hit rate @10 for all three algorithms on a sample of 1 million users and 200 thousand items.
Hebbian Graph Embeddings
Using an error-free (derivative-free) associative learning update rule (Gaussian perturbed, as in annealing), this algorithm learns the embeddings of the nodes (vertices) of a graph. Using nearest neighbors, the algorithm is applied to link prediction, reconstruction and to a recommender system at Target Corporation. The algorithm outperforms almost all current algorithms (except SEAL) in the task of link prediction and reconstruction.
Genetic Algorithm for a class of Knapsack Problems
A genetic algorithm is applied to the task of solving hard knapsack problems introducted by D. Pisinger called Spanner knapsack instances. The algorithm is able to solve all Spanner instances to optimality, and so, it can be applied to larger and harder instances for which the optimum solution is unknown.
Analysis of Greenhouse Gases
Using counterfactuals (through extrapolation), this work tries to predict the temperature increase in the atmosphere when the CO2 and CH4 levels increase.
Randomized heuristic for the maximum clique problem
A fast random search algorithm for the maximum clique problem. The algorithm finds good large cliques in DIMACS benchmarks very fast, and these cliques can be used in further applications, like graph coloring or recommender systems.
Other Technical Papers:
Markowitz Portfolio Optimization
This is a course project on Markowitz Portfolio Optimization as part of a masters degree course. Portfolio Optimization aims to generate proportions of the stocks to invest into.
A One Pager on Bayesian Statistics in Baseball
This short one page tutorial explains how Bayesian statistics and hierarchical models can be used for analyzing the batting hits of baseball players.
Introduction to Matrix Factorization for Recommender Systems
This short expository tutorial on matrix factorization introduces the reader to PCA, SVD and general principles of matrix factorization for recommender systems including describing briefly on how to parallelize the algorithm to run on potentially millions of dimensions.
Notes on Icebreaker
This is an expository tutorial on Icebreaker, a new publication from Microsoft Research, which is able to achieve state of the art performance on tasks in which there is inherent missing data. Icebreaker is able to suggest which attributes to impute to gain the maximum benefit, using mutual information. The tutorial talks about the Bayes Rule, Autoencoders, the evidence lower bound, Variational Methods and Autoencoders, Deep Sets (set based machine learning) and finally leading to Icebreaker.
Analysis of E-Commerce Product Graphs
This work analyses the e-commerce product graph at Target Corporation using the Erdos-Renyi random graph model. Results show that the graph exhibits assortive mixing, the small world property, and a scale free degree distribution (power law).
Open Source Code:
Stock Market Forecasting Exercise (R)
This R code has implementations of five forecasting models (using R libraries) which try to predict stock prices 54 days into the future.
Bayesian Personalized Ranking (BPR) (Python)
Bayesian Personalized Ranking (BPR) is a recommender systems algorithm that can be used for personalization in movie rental services, retail stores, online book stores and so on. This is a python implementation which achieves 51% test set hit rate @ 10 on the MovieLens data set.
Genetic Algorithm for the Maximum Clique Problem (C++)
A genetic algorithm for the maximum clique problem that uses innovative crossovers. The algorithm is able to solve publicly available instances quite efficiently. Results are on the git page.
Heuristic for Graph Coloring (Java)
A very powerful Java implementation of a heuristic that requires very little computational resources. The code is able to solve several of the DIMACS instanes in a short amount of time.
Sudoku Solvers and Puzzle Generator (C++ and Java)
A backtracking Sudoku solver which is non-recursive. It is inspired by Knuth's dancing links algorithm. The backtracking solver is quite fast and solves most Sudokus in less than a second. A Java based puzzle generator generates Sudoku puzzles which have a unique solution.
Iterative Local Search - Quadratic Assignment Problem (C++)
The QAP (quadratic assignment problem) is a strongly NP-hard problem which is difficult to solve even for very small instances. This algorithm is able to solve the nugXX (e.g. nug22) instances quite fast. More experiments are needed to see if the algorithm scales to very large instances.
Branch and Bound Algorithm for the 0/1 Knapsack Problem (Java)
Branch and bound is an exact algorithm that can solve the knapsack problem with provable optimality. This Java code implements branch and bound for the 0/1 knapsasck problem using Lagrangian relaxation to prune the search tree. It uses best first search and is able to solve moderately large instances quite fast (like 2000 variables).
Monty Hall Simulation (Java)
A simulation of the Monty hall paradox in probability theory.
Education:
M.S. Applied and Computational Mathematics, Johns Hopkins University
M.S. Computer Science, University of Southern California