AI Scientist at Target Corporation
Grad Student at Johns Hopkins University
Previously Research Engineer at Microsoft Bing
| Git Repositories
- JCOL: A Java package for solving the graph coloring problem
JCOL is a Java graph coloring package that does a cascade of three randomized graph coloring algorithms including DSATUR, randomized Iterated Greedy and Min-Conflicts local search. Results show that it is able to outperform ColPack on some DIMACS instances.
- GKNAP: A Java and C++ package for solving the multidimensional knapsack problem
GKNAP is a package to solve the multiconstraint (multidimensional) knapsack problem. It is implemented in both Java and C++. The algorithm uses Lagrangian multipliers as constraint weights in a greedy algorithm. GKNAP is able to solve almost all benchmarks to optimality.
- Model Adaptation via Model Interpolation and Boosting for Web Search Ranking
By interpolating between several models (from different domains, but the same target), the algorithm creates a weighted ensemble of ranking algorithms which outperform each individual algorithm for search engine ranking.
- Similarity learning-based device attribution
Using the probabilities outputted from a Gaussian Mixture Model as features, this patented algorithm uses a random forest classifier for cross device cookie matching.
- Genetic Algorithm for the 0/1 Multidimensional Knapsack Problem
This algorithm is a hybrid greedy genetic algorithm to solve the multidimensional knapsack problem. A different version of the algorithm was published on the Journal of Open Source Software.
- Implementation of iterative local search (ILS) for the quadratic assignment problem
A population based iterative local search for the quadratic assignment problem. The algorithmis quite fast and is able to find optimum solutions to benchmark instances.
- Simulated Annealing Algorithm for the Multiple Choice Multidimensional Knapsack Problem
A randomized algorithm based on simulated annealing for the multiple choice multi-constraint knapsack problem. The implementation is open source and uses new ways of making an infeasible solution valid using a randomized greedy approach.
- Hebbian Graph Embeddings
Using an error-free (derivative-free) associative learning update rule (Gaussian perturbed, as in annealing), this algorithm learns the embeddings of the nodes (vertices) of a graph. Using nearest neighbors, the algorithm is applied to link prediction, reconstruction and to a recommender system at Target Corporation. The algorithm outperforms almost all current algorithms (except SEAL) in the task of link prediction and reconstruction.
- Introduction to Matrix Factorization for Recommender Systems
This short expository tutorial on matrix factorization introduces the reader to PCA, SVD and general principles of matrix factorization for recommender systems including describing briefly on how to parallelize the algorithm to run on potentially millions of dimensions.
- Genetic Algorithm for a class of Knapsack Problems
A genetic algorithm is applied to the task of solving hard knapsack problems introducted by D. Pisinger called Spanner knapsack instances. The algorithm is able to solve all Spanner instances to optimality, and so, it can be applied to larger and harder instances for which the optimum solution is unknown.
- Parallel Taxonomy Discovery
This paper talks about a parallel implementation of the algorithm in the paper "Taxonomy Discovery for Personalized Recommendation". The algorithm is implemented on Apache Spark and it is several order of magnitude faster.
- GCLIQUE: An Open Source Genetic Algorithm for the Maximum Clique Problem
A genetic algorithm for the maximum clique problem which finds good solutions very fast to most DIMACS instances.
- Notes on Icebreaker
This is an expository tutorial on Icebreaker, a new publication from Microsoft Research, which is able to achieve state of the art performance on tasks in which there is inherent missing data. Icebreaker is able to suggest which attributes to impute to gain the maximum benefit, using mutual information. The tutorial talks about the Bayes Rule, Autoencoders, the evidence lower bound, Variational Methods and Autoencoders, Deep Sets (set based machine learning) and finally leading to Icebreaker.
- Analysis of E-Commerce Product Graphs
This work analysis the e-commerce product graph at Target Corporation using the Erdos-Renyi random graph model. Results show that the graph exhibits assortive mixing, the small world property, and a scale free degree distribution (power law).
- Analysis of Greenhouse Gases
Using counterfactuals (through extrapolation), this work tries to predict the temperature increase in the atmosphere when the CO2 and CH4 levels increase.
- Randomized heuristic for the maximum clique problem
A fast random search algorithm for the maximum clique problem. The algorithm finds good large cliques in DIMACS benchmarks very fast, and these cliques can be used in further applications, like graph coloring or recommender systems.
Open Source Code:
- Randomized Heuristic for the Maximum Clique Problem (Java)
A simple randomized heuristic for the maximum clique problem in Java. The algorithm is very fast and requires little memory.
- Genetic Algorithm for the Maximum Clique Problem (C++)
A genetic algorithm for the maximum clique problem that uses innovative crossovers. The algorithm is able to solve publicly available instances quite efficiently. Results are on the git page.
- Genetic Algorithm - 0/1 Multi-Constraint Knapsack Problem (C++ and Java)
Implementation of a genetic algorithm for the 0/1 multiconstraint (multidimensional) knapsack problem in C++ as well as Java.
- Heuristic for Graph Coloring (Java)
A very powerful Java implementation of a heuristic that requires very little computational resources. The code is able to solve several of the DIMACS instanes in a short amount of time.
- Sudoku Solvers and Puzzle Generator (C++ and Java)
A backtracking Sudoku solver which is non-recursive (so no stack overflows). It is inspired by Knuth's dancing links algorithm. The backtracking solver is quite fast and solves most Sudokus in less than a second. A Java based puzzle generator generates Sudoku puzzles which have a unique solution.
- Simulated Annealing Algorithm - Multiple Choice Multidimensional knapsack problem (C++)
A simulated annealing implementation for the multiple choice multidimensional knapsack problem with C++ code.
- Iterative Local Search - Quadratic Assignment Problem (C++)
The QAP (quadratic assignment problem) is a strongly NP-hard problem which is difficult to solve even for very small instances. This algorithm is able to solve the nugXX (e.g. nug22) instances quite fast. More experiments are needed to see if the algorithm scales to very large instances.
- Branch and Bound Algorithm for the 0/1 Knapsack Problem (Java)
Branch and bound is an exact algorithm that can solve the knapsack problem with provable optimality. This Java code implements branch and bound for the 0/1 knapsasck problem using Lagrangian relaxation to prune the search tree. It uses best first search and is able to solve moderately large instances quite fast (like 2000 variables).
- Monty Hall Simulation (Java)
A simulation of the Monty hall paradox in probability theory.
shah DOT shalin AT gmail DOT com