Howdy, I'm Aditya Kumar

Software Engineer in Seattle, WA

About me!


I am from New Delhi, India, and currently work as a Software Engineer at Microsoft in Redmond, Washington. I have over two years of experience as a software engineer and have previously worked at Amazon (SDE Intern, 2019) and Sandvine Technologies.

In May 2020, I graduated from Texas A&M University with a Master of Computer Science (MCS) degree. I completed my bachelor's degree in Computer Science in 2016 and then worked as a software developer for two years, on problems involving internet traffic classification, optimization, and management.

Although cloud computing and Machine Learning/AI have piqued my interest over the last few years, I strongly believe that all fields of Computer Science are deeply interlinked, and that a successful career requires developing skills across all of them. My academic and professional experience so far has prepared me well to work efficiently with large, complicated systems. I hope to keep developing my skills and to use them to solve hard problems with real business value.

In my spare time, I enjoy cooking and have recently started cultivating the habit of reading books. I also love watching, playing, and talking about cricket.

My Journey in this Tech world!

Social Profiles


LinkedIn

Join my professional network, check out my skills, and see what my peers have to say about me.

GitHub Repo

Do visit my GitHub profile to check out all the projects I have worked on.

Goodreads

Books I've read or am currently reading!

My Journey so far!


  • June 2020 - Present

    Software Engineer at Microsoft

I am part of the Visual Studio team, responsible for developing production services such as Dev Spaces, Azure Container Registry, Azure Cache for Redis, and Azure DevTest Labs. The role focuses on cloud services, web UI, data processing, and analytics.

  • June 2019 - August 2019

Software Development Engineer Intern at Amazon

    • Developed and launched a highly scalable internal service (1,000 TPS) based on a service-oriented architecture (SOA) using AWS technologies such as DynamoDB, Lambda, and S3 (Java, Python, SQL, shell).
    • Automated data migrations using distributed data pipelines and job schedulers.
    • Enhanced the service to work with various data providers, such as files and S3 objects.
    • Built a responsive single-page application (SPA) in React.js for the service, used for data analytics by various stakeholders.

  • August 2018 - May 2020

    Master of Computer Science (MCS) from Texas A&M University

    • GPA: 4.0/4.0
    • Worked as Graduate Assistant (GA) in the Department of Information Technology
    • Served as Treasurer of the Computer Science and Engineering Graduate Student Association (CSEGSA)
    • Major Coursework: Cloud Computing, Artificial Intelligence, Machine Learning, Information Retrieval, NLP

  • June 2016 - July 2018

    Software Engineer at Sandvine Technologies

    • Automated parameter calibration in a fuzzy control system by developing a C++ service capable of monitoring network traffic across over 100,000 locations.
    • Designed REST APIs for traffic shapers in C++, enabling dynamic policy enforcement without system reloads and saving an average of 9 hours of maintenance time per month.
    • Developed a hash-map and timer based internet traffic classification mechanism in C++, improving identification of applications that rely on third-party services by 90% on average.

  • July 2012 - May 2016

    Bachelor of Technology from NIT Calicut, India

    • Major: Computer Science and Engineering
    • GPA: 9.37/10.0
    • Final Project: A kernel-based simulation study of RSDM-A, which enhanced the memory management unit of the Xen hypervisor by adding an in-memory table to prevent rollback attacks; the research was published at IEEE ISCBD 2017
    • Major Coursework: Data Structures, Operating Systems, Compiler Design, NLP

Projects


Languages and Technologies!


Languages: C++ (Proficient), Java (Prior Experience), C, Python, MySQL, Shell Scripting
Cloud Services: DynamoDB, S3, Lambda, Elasticsearch
Web Technologies: HTML, JavaScript, jQuery, PHP, CSS
Tools: Git, GitHub, PyTorch, GNU Debugger, Lex, Yacc, Laravel, Rational ClearCase

Download My Resume Here!

Download Now!

Let's Get In Touch!


Deep Person Re-Identification


Automated person re-identification is essential for making sense of the vast quantity of visual data generated by the rapid expansion of large-scale distributed multi-camera systems. Person re-identification, a tool used in intelligent video surveillance, is the task of correctly identifying individuals across multiple images captured under varied conditions by multiple cameras. The problem is inherently challenging because of low-resolution images, illumination changes, unconstrained poses, and occlusions.


In this project, we aim to develop a person re-identification model using deep neural networks (DNNs) that can handle variable-size input images. Specifically, we implement two preprocessing techniques that reduce the chances of overfitting: Random Erasing, a data augmentation technique that makes the model robust to occlusion, and a pose-normalized Generative Adversarial Network (GAN) that reduces the influence of pose variations on the learned features.
Along with this, we also implement and integrate a Part-based Convolutional Baseline (PCB) to further improve the results. We briefly describe the models trained, along with their evaluation results on the Market-1501 dataset and the provided validation and test sets.
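As one concrete example, Random Erasing is available out of the box in torchvision; below is a minimal sketch of a training transform pipeline (the resize and normalization values are illustrative assumptions, not the project's exact settings):

    import torch
    from torchvision import transforms

    # Typical person re-ID preprocessing with Random Erasing appended.
    train_transform = transforms.Compose([
        transforms.Resize((256, 128)),       # person crops are taller than wide
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
        # Randomly erase a rectangle in the image tensor to simulate occlusion.
        transforms.RandomErasing(p=0.5, scale=(0.02, 0.33), ratio=(0.3, 3.3)),
    ])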

Reverse Image Captioning


This task is the reverse of image captioning: instead of describing an image, we try to generate the image that best fits a given textual description.


Here's the sample output of the network.

YouTube video demonstrating the GUI


Architecture of Generative Adversarial Network (GAN)

Base code for the DNN model can be found below:
Base code reference
The author's version


The base model is based on an earlier paper, Generative Adversarial Text to Image Synthesis. The model described in that paper uses a pre-trained text embedding, trained with character-level processing (Char-CNN-RNN), which was learned from images and their corresponding descriptions together. The resulting embedding is a vector of size 1024. In my work, I have replaced this character-level encoding with the much more robust Skip-Thought Vectors (Code). These vectors encode the text description of an image into a 4800-dimensional vector. I use a reducer layer that takes this large vector and returns a 1024-dimensional vector, which is then fed to the GAN. The parameters of this layer are learned during training.
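A minimal PyTorch sketch of such a reducer layer (the class name and choice of activation are my own assumptions; the actual code is linked above):

    import torch
    import torch.nn as nn

    class SkipThoughtReducer(nn.Module):
        """Projects a 4800-d Skip-Thought vector down to the 1024-d
        embedding expected by the GAN; learned jointly during training."""
        def __init__(self, in_dim=4800, out_dim=1024):
            super().__init__()
            self.proj = nn.Linear(in_dim, out_dim)
            self.act = nn.LeakyReLU(0.2)

        def forward(self, skip_thought_vec):
            return self.act(self.proj(skip_thought_vec))

    # e.g. a batch of 16 encoded captions:
    emb = SkipThoughtReducer()(torch.randn(16, 4800))  # -> shape (16, 1024)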

Following is a diagram showing the high-level design of the neural network:

Amazon Elasticsearch Service


What is Elasticsearch?

Elasticsearch is a search engine based on the Lucene library, which uses BM25 under the hood. Apart from being open source, it is distributed by design, meaning it can scale easily, and it works with schema-free JSON documents. To learn more about Lucene and open-source Elasticsearch, please do look at the detailed Lucene and Elasticsearch Spotlight by Minreng.
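For reference, BM25 scores a document D against a query Q with terms q_1..q_n roughly as follows (a sketch of the standard formula; k_1 and b are tunable constants and avgdl is the average document length in the index):

    \mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}

Here f(q_i, D) is the term frequency of q_i in D, so longer documents are penalized and repeated terms see diminishing returns.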


Amazon Elasticsearch Service

Amazon Elasticsearch Service is a fully managed service that makes it easy for developers to deploy Elasticsearch cost-effectively at scale. It also provides various security features that you can use to secure your APIs. The service supports the open-source Elasticsearch APIs, managed Kibana, integration with Logstash and other AWS services, and built-in alerting and SQL querying.


Benefits

Easy to deploy and manage - be ready in minutes!
Highly scalable and available - up to 3 PB of data in a single cluster
Highly secure - isolation using Amazon Virtual Private Cloud (VPC) and encryption
Cost-effective - pay only for what you use
Near real-time search - slight latency (~1 s) to index a document

Disadvantages

Elasticsearch does not have multi-language support
Works only with JSON, unlike Apache Solr, which can work with CSV, XML, and JSON formats

Key Concepts

Node - a single running instance of Elasticsearch
Cluster - a collection of one or more nodes that together provide indexing and search capabilities
Index - a collection of documents of different types and their properties
Document - a collection of fields; the basic unit of information that can be indexed
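To make these concepts concrete, here is a minimal sketch using the official Python client (the index name, document, and host are hypothetical, and the call style assumes elasticsearch-py 8.x):

    from elasticsearch import Elasticsearch

    # Connect to a single node (a one-node cluster).
    es = Elasticsearch("http://localhost:9200")

    # Index a document (a JSON object) into the "books" index.
    es.index(index="books", id=1, document={
        "title": "Lucene in Action",
        "topic": "search engines",
    })

    # Full-text search over the index.
    resp = es.search(index="books", query={"match": {"title": "lucene"}})
    for hit in resp["hits"]["hits"]:
        print(hit["_source"]["title"])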

Comparison with Relational database

Elasticsearch             RDBMS
Cluster                   Collection of databases
Shard                     Shard
Index                     Database
Type                      Table
Mapping                   Schema
Field                     Column
Document (JSON object)    Row
ID                        Primary key

Detailed Notebook

https://nbviewer.jupyter.org/github/aditya30394/Spotlight-on-AWS-Elasticsearch/blob/master/Elasticsearch.ipynb

Decentralized Howdy: A decentralized application for student record storage


As students move from one learning environment to another, it is essential that they carry with them proof of their previous learning experiences and achievements. In most cases, this takes the form of grade cards or certificates of completion, and as such, academic records and transcripts play a vital role in the life of every student. Official records showing the courses taken and the grades earned are continuously sought by recruiters as well as graduate schools. These grade cards are used by both academia and industry for enrollment and administrative decisions, but most importantly for verifying credentials. Traditionally, these records are stored either as documents or as unofficial copies on centralized servers. While this may be sufficient for decision-making, it has the consequence of delaying the on-boarding process: to obtain an official copy, students are required to file an application with the academic department, and the whole process usually takes about 10 days. The delay is primarily caused by the records being validated at the source, which usually follows a convoluted process. The challenge is to offer a platform on which all of a student's credentials can be verified readily without actually approaching the source, thereby expediting access to official records.


Approach and Contribution

We propose to use blockchain technology as the back end to solve the above-mentioned problem. The primary attraction of blockchain is that it offers immutable ledgers, and we intend to leverage this property. The fact that blockchain data is tamper-proof gives it built-in validation as an independent, transparent, and permanent database. The aim of the project is to build a decentralized application for student record storage. A decentralized application (dapp for short) is an application executed by multiple users over a decentralized network. Bitcoin is the first generation of blockchain, and its transactions involved currency. Ethereum can be seen as the next generation, offering programmable transactions in the form of smart contracts. In this project, we use smart contracts to associate students with their official records and use the inherent properties of the blockchain to make the records tamper-proof. With this application, individuals can access their records from virtually any device. The records cannot be altered by students, but the application gives them control over exactly what they show to various groups of people. We also designed a minimalistic web-based interface that lets students check their records and lets the department/registrar add or modify student records.

To summarize, our two important contributions are:
1. Provide a decentralized application to eliminate the delay in obtaining official records.
2. Provide a mechanism to make the records tamper-proof so that they can be trusted.


Smart Contract

Our smart contract has two map-based data structures:

  1. studentRecords: This data structure maps student IDs to their respective data. The data is another structure (similar to a struct in C) containing the hash of the student's record, their password, and a boolean flag indicating whether the record is valid. This flag is required in Solidity because a mapping declaration creates a namespace in which all keys exist and all values are initialized to 0 or false; as a result, there is no direct way to check whether a key exists. We set this flag to true when the department implants a new student record in the blockchain.

  2. authList: This map simply maps authentication IDs to a boolean flag. Solidity does not provide a Set data structure, and the only other alternative was to use an array to store the authentication IDs; however, iterating through arrays increased gas usage. Therefore, to make lookups more efficient, we decided to use another map to store the IDs. Again, for the reason explained above, we use a boolean flag to indicate the presence of a valid ID. These IDs are set up when the web app starts, via a call to the smart contract's constructor. (See the sketch after this list for the overall pattern.)
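To illustrate the valid-flag idiom outside of Solidity, here is a small Python model of the two mappings (field and function names are hypothetical; the real logic lives in the smart contract):

    # Python model of the contract's storage; in Solidity both maps would be
    # `mapping` types whose missing keys silently read as zero/false.
    studentRecords = {}  # studentId -> {"recordHash", "password", "isValid"}
    authList = {}        # authId -> bool (True means the ID is authorized)

    def add_record(student_id, record_hash, password):
        # The explicit isValid flag mirrors Solidity, where a mapping cannot
        # tell "key absent" apart from "value is zero/false".
        studentRecords[student_id] = {
            "recordHash": record_hash,
            "password": password,
            "isValid": True,
        }

    def record_exists(student_id):
        entry = studentRecords.get(student_id)
        return entry is not None and entry["isValid"]

    def is_authorized(auth_id):
        # O(1) map lookup instead of iterating an array (the array version
        # was increasing gas usage in the contract).
        return authList.get(auth_id, False)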

Geeky WhatsApp


A multithreaded, client-server chat application built using Java socket programming. The server continuously listens for connection requests from clients across the network, or even from the same machine. Clients connect to the server using an IP address and port number, and each client must provide a unique username when connecting; this username is treated as the client's unique identifier.

All messages from a client are sent to the server using Java's ObjectOutputStream. After receiving a message, the server broadcasts it unless it is a private message. A private message, detected by an '@' followed by a valid username, is sent only to that user.
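The routing decision itself is straightforward; below is a minimal sketch of that logic, written in Python for brevity (the actual application is in Java, and the names here are hypothetical):

    def route_message(sender, text, clients):
        """clients: dict mapping username -> connection object."""
        if text.startswith("@"):
            target, _, body = text[1:].partition(" ")
            if target in clients:
                # Private message: deliver only to the named user.
                clients[target].send(f"[private] {sender}: {body}")
                return
        # Otherwise broadcast to everyone except the sender.
        for name, conn in clients.items():
            if name != sender:
                conn.send(f"{sender}: {text}")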

All the messages sent from various clients can be seen on the server console.


Screenshots

In the screenshots below, three clients, namely "Aditya", "Abhishek", and "Himanshu", are active on the chat server.

Console of the Chat Server

Console of user Aditya

Console of user Himanshu

Kick it up!


In a world filled with brilliant entrepreneurs eager to make their ideas a reality, finding the funding necessary for such initiatives can be a challenging endeavour. Crowd-funding websites such as Kickstarter provide a way for savvy entrepreneurs to leverage the collective resources of the crowd to obtain the capital needed to get their projects started. Kickstarter's core function is to act as a global crowd-funding platform where users (known as backers) are offered tangible rewards or experiences in exchange for pledges of monetary support. While Kickstarter publishes many articles outlining best practices for project creation, large-scale analysis has shown that approximately half of all campaigns still fail.

We believe that meaningful indicators exist that can predict whether a project is more likely to get successfully funded as compared to others. By identifying these indicators, it is our goal to be able to provide insights into what types of decisions project creators should focus their scarce time and resources on. We believe that project creators across all categories would be able to use this information to inform their decisions and would both desire and benefit from these insights.

We feel that there are many ideas that exist in the world that, for one reason or another, do not have the financial resources or social visibility to be recognized by the public. It is our hope that by providing information to project creators about what features are most predictive of a successful campaign that all good ideas will be given equal representation and net social good will be promoted through the flourishing of novel innovations.


Questions to Answer?

Through group discussion and previous literature review, we have determined that there are 4 main questions that we wish to address through this work.

  1. Perform sentiment analysis on both the project 'blurb' and 'name' sections of each project to determine if there is any relationship between the predicted sentiment and the success or failure of the project.

  2. Determine if there is a specific time of year when projects can be launched where they are more likely to be funded.

  3. Find out whether projects with "small" goal amounts are more likely to be successful as compared to projects with "large" goal amounts.

  4. See whether duration impacts a project's success. We want to study the impact of several durations, such as the gap between the creation of a project on Kickstarter and its official launch date (when the owners start accepting funds), as well as the campaign length, which is the difference between the launch date and the deadline (the date on which owners stop accepting funds). A sketch of how these durations can be computed follows this list.
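Here is a rough sketch of how the durations in question 4 could be derived with pandas; the file and column names (created_at, launched_at, deadline, state) are assumptions about the Kickstarter dataset:

    import pandas as pd

    # Hypothetical Kickstarter dump with Unix-timestamp columns.
    df = pd.read_csv("kickstarter.csv")
    for col in ["created_at", "launched_at", "deadline"]:
        df[col] = pd.to_datetime(df[col], unit="s")

    # Question 4's two durations:
    df["prep_days"] = (df["launched_at"] - df["created_at"]).dt.days
    df["campaign_days"] = (df["deadline"] - df["launched_at"]).dt.days

    # Success rate by campaign length, binned into weeks.
    df["weeks"] = df["campaign_days"] // 7
    print(df.groupby("weeks")["state"]
            .apply(lambda s: (s == "successful").mean()))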

Multikernel Simulation: A New Approach to Study Rollback Sensitive Memory Architecture


In today's cloud-centred business environment, the security of cloud solutions is a critical issue. Since virtualization is the foundational element of cloud computing and helps to achieve its benefits, securing virtualization becomes a major goal for cloud-based systems.

Virtualization aims to create virtual versions of resources such as processors, memory, storage, network interfaces, and devices for virtual machines (VMs), allowing the same set of resources to be shared among various VMs so that they can run together on the same hardware without knowing of each other's presence. Despite its indisputable benefits, data security vulnerabilities and performance degradation from the user's viewpoint remain the main causes of concern, motivating fervent research into hardware and software improvements to alleviate the two. However, implementing and properly testing these innovations is not easy with hardware, hybrid, or API-based software simulators, due to high implementation costs, the absence of simulators capable of testing solutions that span multiple levels of hardware and software, and the different privilege levels of instructions. This is especially relevant when the proposed improvements include instruction set modifications at different privilege levels.

In this paper, we propose a different simulation approach: multikernel simulation. It differs from conventional software-based simulation techniques in that we utilize the different privilege levels of the various kernels running on the server and leverage them to distribute the components and logic to be simulated into different levels of software, simulating the effect of running them at the required privilege level. To accomplish this, we identify unused bits in the kernel software and use them to simulate hardware conditions. We implement and demonstrate this simulation technique for the Extended-HyperWall and RSDM architecture [1], [2], a hardware-based solution that improves the security of virtual machines in a fully virtualized environment against rollback-based attacks, in the presence of an untrusted hypervisor (the entity that manages VMs). Our simulation works in a fully virtualized environment and demonstrates the security of the proposed enhancement, without hardware prototypes, in a cost-effective manner.

Self Evaluation Portal


The motivation behind the Self Evaluation project is to help students assess their preparation for the graduate-level Artificial Intelligence course. The project helps students understand the course prerequisites: by using the evaluation portal, students can make an informed decision about whether they are sufficiently prepared to register for the class. The requirements for the project were provided by Dr. Duncan Walker, Professor, Texas A&M University. The key stakeholders are prospective students, teaching assistants, and the course instructors.


Features Supported

Instructors
1. Can log in and add/edit/update questions and topics
2. Can choose between multiple-choice and short-answer question types
3. Can add math to questions
4. Can recover their password

Students
1. Can take the quiz and evaluate their performance using the feedback and remarks generated at the end
2. Can select the topics they want to be tested on
3. Can go through the questions like flashcards

Important Links

Heroku App
Customer Interview
Application Demo
Pivotal tracker

Toxic Content Detection On Social Media Platforms


Cyberbullying and online abuse have been rising at an alarming rate. They are detrimental to the mental state of teenagers and are among the biggest factors leading to depression. Manually determining the toxicity of comments within the plethora of data generated daily is an impossible task, so automating the detection and censorship of toxic comments on social media platforms can go a long way toward solving this issue. Detecting toxic comments, however, is a daunting task because of factors such as context, perception, and vocabulary.

In this work, we propose adding a human element to the system to work alongside the automated model, improving results by requesting feedback. We use a recurrent neural network (RNN) with long short-term memory (LSTM) cells and a convolutional neural network (CNN).

Additionally, we use the pre-trained FastText embeddings from Facebook and the Global Vectors for Word Representation (GloVe) embeddings as the backbone of the automated system, and we introduce human feedback into the recurring training. Instead of performing a binary classification of comments, we provide a score on a range, and in our human feedback system we likewise request feedback on a scale rather than a simple yes/no.
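As a minimal PyTorch sketch of the LSTM branch, scoring comments on a 0-1 range rather than with a hard binary label (the dimensions and the frozen GloVe matrix here are illustrative assumptions, not the project's exact configuration):

    import torch
    import torch.nn as nn

    class ToxicityScorer(nn.Module):
        def __init__(self, glove_weights, hidden=128):
            super().__init__()
            # glove_weights: (vocab_size, 300) tensor of pre-trained embeddings.
            self.emb = nn.Embedding.from_pretrained(glove_weights, freeze=True)
            self.lstm = nn.LSTM(glove_weights.size(1), hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, token_ids):
            x = self.emb(token_ids)       # (batch, seq, 300)
            _, (h, _) = self.lstm(x)      # h: (1, batch, hidden)
            # Sigmoid gives a toxicity score on a 0-1 range instead of a
            # hard yes/no, matching the scaled human feedback.
            return torch.sigmoid(self.head(h[-1])).squeeze(-1)

    # e.g. score a batch of 8 comments, each padded to 50 tokens:
    model = ToxicityScorer(torch.randn(5000, 300))
    scores = model(torch.randint(0, 5000, (8, 50)))  # -> 8 values in (0, 1)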


System Design