If you need custom work, please find me on Upwork
Patents:
Publications:
- Causal Effects of ESG Risk Scores on Stock Prices Using Instrumental Variables
This paper studies the causal effects of ESG risk scores on yearly stock market returns.
The results show that as the risk increases, the causal effects in a 2-stage least squares model are negative.
The yearly stock returns decrease as the environmental, social and governance risk increases.
- Multi-Document Financial Question Answering using LLMs
This work explores financial question answering using knowledge graphs, semantic tagging and semantic RAG. The
results show that financial question answering can be improved using semantic tagging and knowledge graphs. The
financial questions used for evaluation are esoteric and difficult to answer, and the answers are not completely
obvious.
- Topic
Driven Text Summarization with Defragmentation using LLMs
The paper describes a prompt based method to do text summarization by first planning to get a set of topics and
then generating a final summary from the topic summaries. The work can also be used to do conditional
summarization based on a given set of topics.
- JCOL: A Java package for solving the graph coloring
problem (Journal of Open Source Software, 2019)
JCOL is a Java graph coloring package that does a cascade of three randomized graph coloring algorithms including
DSATUR, randomized Iterated Greedy and Min-Conflicts local search. Results show that it is able to outperform
ColPack on some DIMACS instances.
- GKNAP: A Java and C++ package for solving the
multidimensional knapsack problem (Journal of Open Source Software, 2019)
GKNAP is a package to solve the multiconstraint (multidimensional) knapsack problem. It is implemented in both
Java and C++. The algorithm uses Lagrangian multipliers as constraint weights in a greedy algorithm. GKNAP is able
to solve almost all benchmarks to optimality.
- Model
Adaptation via Model Interpolation and Boosting for Web Search Ranking (EMNLP, 2009)
By interpolating between several models (from different domains, but the same target), the algorithm creates a
weighted ensemble of ranking algorithms which outperform each individual algorithm for search engine ranking.
- Three
Proofs that the Square Root of 2 Is Irrational
This short article surveys three proofs that the square root of 2 is irrational. The first proof is a simple proof
by contradiction and the other two proofs use field theory from abstract alggebra.
- Genetic Algorithm for the 0/1 Multidimensional Knapsack
Problem
This algorithm is a genetic algorithm to solve the multidimensional knapsack problem. A different version of the
algorithm was published on the Journal of Open Source Software.
- Multi-Task Triplet Loss for Named Entity Recognition using
Supplementary Text
This work uses multi-task learning for named entity recognition, through a triplet loss that uses supplementary
text.
- Comparison of Stochastic Forecasting Models
This paper explores five stochastic forecasting methods like stochastic differential equations, arima, kalman
filter, and the bayesian filter. It tries to forecast stock prices of Microsoft, Tesla and Target, 54 days into
the future, and finds interesting results.
- Multi-Task Learning of Query Intent and Named Entities using Transfer
Learning
This paper explores multi-task learning of named entity recognition and query intent using BERT transfer learning.
We find interesting results in which the NER loss and the intent loss both affect each other in interesting ways.
- A Survey of Latent Factor Models for Recommender Systems
and Personalization
This paper surveys four latent factor models for recommender systems. The methods include hebbian graph
embeddings, bayesian personalized ranking, singular value decomposition and non-negative matrix factorization.
- Comparison of Three Recent Personalization
Algorithms
This article compares three recent personalization algorithms including Taxonomy Discovery for Personalized
Recommendations, Multi-Matrix Factorization and Bayesian Personalized Ranking. It computes the hit rate @10 for
all three algorithms on a sample of 1 million users and 200 thousand items.
- Implementation
of iterative local search (ILS) for the quadratic assignment problem
A population based iterative local search for the quadratic assignment problem. The algorithmis quite fast and is
able to find optimum solutions to benchmark instances.
- Simulated
Annealing Algorithm for the Multiple Choice Multidimensional Knapsack Problem
A randomized algorithm based on simulated annealing for the multiple choice multi-constraint knapsack problem. The
implementation is open source and uses new ways of making an infeasible solution valid using a randomized greedy
approach.
- Hebbian Graph Embeddings
Using an error-free (derivative-free) associative learning update rule (Gaussian perturbed, as in annealing), this
algorithm learns the embeddings of the nodes (vertices) of a graph. Using nearest neighbors, the algorithm is
applied to link prediction, reconstruction and to a recommender system at Target Corporation. The algorithm
outperforms almost all current algorithms (except SEAL) in the task of link prediction and reconstruction.
- Genetic Algorithm for a class of Knapsack Problems
A genetic algorithm is applied to the task of solving hard knapsack problems introducted by D. Pisinger called
Spanner knapsack instances. The algorithm is able to solve all Spanner instances to optimality, and so, it can be
applied to larger and harder instances for which the optimum solution is unknown.
- GCLIQUE:
An Open Source Genetic Algorithm for the Maximum Clique Problem
A genetic algorithm for the maximum clique problem which finds good solutions very fast to most DIMACS instances.
- Analysis of Greenhouse Gases
Using counterfactuals (through extrapolation), this work tries to predict the temperature increase in the
atmosphere when the CO2 and CH4 levels increase.
- Randomized
heuristic for the maximum clique problem
A fast random search algorithm for the maximum clique problem. The algorithm finds good large cliques in DIMACS
benchmarks very fast, and these cliques can be used in further applications, like graph coloring or recommender
systems.
Other Technical Papers:
- Markowitz
Portfolio Optimization
This is a course project on Markowitz Portfolio Optimization as part of a masters degree course. Portfolio
Optimization aims to generate proportions of the stocks to invest into.
- A
One Pager on Bayesian Statistics in Baseball
This short one page tutorial explains how Bayesian statistics and hierarchical models can be used for analyzing
the batting hits of baseball players.
- Introduction to Matrix Factorization for Recommender Systems
This short expository tutorial on matrix factorization introduces the reader to PCA, SVD and general principles of
matrix factorization for recommender systems including describing briefly on how to parallelize the algorithm to
run on potentially millions of dimensions.
- Notes on Icebreaker
This is an expository tutorial on Icebreaker, a new publication from Microsoft Research, which is able to achieve
state of the art performance on tasks in which there is inherent missing data. Icebreaker is able to suggest which
attributes to impute to gain the maximum benefit, using mutual information. The tutorial talks about the Bayes
Rule, Autoencoders, the evidence lower bound, Variational Methods and Autoencoders, Deep Sets (set based machine
learning) and finally leading to Icebreaker.
- Analysis of
E-Commerce Product Graphs
This work analyses the e-commerce product graph at Target Corporation using the Erdos-Renyi random graph model.
Results show that the graph exhibits assortive mixing, the small world property, and a scale free degree
distribution (power law).
Open Source Code:
- Taxonomy for Recommender Systems (Python)
Implementation of the methods in the paper Taxonomy Discovery for Personalized Recommendation. The algorithm
alternates between path sampling (Gibbs sampling) and latent factor updates (stochastic gradient descent).
- Bayesian Personalized Ranking (BPR) (Python)
Bayesian Personalized Ranking (BPR) is a recommender systems algorithm that can be used for personalization in
movie rental services, retail stores, online book stores and so on. This is a python implementation which achieves
51% test set hit rate @ 10 on the MovieLens data set.
- Randomized Heuristic for the Maximum Clique Problem (Java)
A simple randomized heuristic for the maximum clique problem in Java. The algorithm is very fast and requires
little memory.
- Genetic Algorithm for the Maximum Clique Problem (C++)
A genetic algorithm for the maximum clique problem that uses innovative crossovers. The algorithm is able to solve
publicly available instances quite efficiently. Results are on the git page.
- Genetic Algorithm - 0/1 Multi-Constraint Knapsack
Problem (C++ and Java)
Implementation of a genetic algorithm for the 0/1 multiconstraint (multidimensional) knapsack problem in C++ as
well as Java.
- Heuristic for Graph Coloring (Java)
A very powerful Java implementation of a heuristic that requires very little computational resources. The code is
able to solve several of the DIMACS instanes in a short amount of time.
- Sudoku Solvers and Puzzle Generator (C++ and Java)
A backtracking Sudoku solver which is non-recursive. It is inspired by Knuth's dancing links algorithm. The
backtracking solver is quite fast and solves most Sudokus in less than a second. A Java based puzzle generator
generates Sudoku puzzles which have a unique solution.
- Simulated Annealing Algorithm - Multiple Choice
Multidimensional knapsack problem (C++)
A simulated annealing implementation for the multiple choice multidimensional knapsack problem with C++ code.
- Iterative Local Search - Quadratic Assignment Problem
(C++)
The QAP (quadratic assignment problem) is a strongly NP-hard problem which is difficult to solve even for very
small instances. This algorithm is able to solve the nugXX (e.g. nug22) instances quite fast. More experiments are
needed to see if the algorithm scales to very large instances.
- Branch and Bound Algorithm for the 0/1 Knapsack Problem
(Java)
Branch and bound is an exact algorithm that can solve the knapsack problem with provable optimality. This Java
code implements branch and bound for the 0/1 knapsasck problem using Lagrangian relaxation to prune the search
tree. It uses best first search and is able to solve moderately large instances quite fast (like 2000 variables).
- Monty Hall Simulation (Java)
A simulation of the Monty hall paradox in probability theory.
Education:
- M.S. Applied and Computational Mathematics, Johns Hopkins University
- M.S. Computer Science, University of Southern California
- Computational Linguistics, University of Washington
Graduate Courses:
- Statistical Methods
- Real Analysis
- Computational Statistics
- Stochastic Differential Equations
- Matrix Theory
- Monte Carlo Methods
- Bayesian Statistics
- Probability and Stochastic Processes I and II
- Data Mining
- Natural Language Processing I and II
- Quantitative Social Science Methods
- Software Engineering
- Analysis of Algorithms
- Compilers
- Programming Languages
Some Links: