I am Aditya Kumar, a results-driven software engineer based in Redmond, Washington. My journey in software engineering began in New Delhi, India, and has spanned more than seven years, including experiences at Amazon and Sandvine Technologies. In 2020, I earned my Master of Computer Science (MCS) degree from Texas A&M University, building on the Bachelor's degree in Computer Science I earned in 2016. My academic background, coupled with my professional experience, has equipped me to tackle complex challenges within large-scale systems efficiently.
At Microsoft, I contribute to the GitHub Codespaces team, where I'm deeply passionate about crafting and implementing robust, scalable, and dependable software solutions. I thrive in collaborative settings, finding immense satisfaction in mentoring junior developers, sharing my knowledge, and continuously expanding my own skillset through shared learning experiences.
Throughout my time at Microsoft, I've had the privilege of making substantial contributions to the Codespaces service, directly enhancing the user experience and driving business growth. My key achievements include significantly reducing infrastructure costs, spearheading the migration of critical components to the Basis infrastructure, and consistently championing performance improvements across the service. I'm committed to technical excellence: proactively mitigating issues, enhancing security, and fostering a culture of continuous learning within the team.
While my technical expertise lies at the heart of my work, I also recognize the profound impact of a positive and inclusive team environment. I actively mentor junior developers, providing guidance and support to empower their growth and foster a sense of shared ownership.
Looking forward, I'm excited to embrace new challenges and contribute to the ongoing evolution of the Codespaces service, ensuring its continued success in empowering developers globally.
As a Senior Software Engineer at Microsoft, I build scalable and reliable cloud-based solutions, contributing to GitHub Codespaces and leveraging various Azure services. I focus on designing automation systems, optimizing performance, and improving operational efficiency through cross-team collaboration.
Languages: C++ (Proficient), Java (Prior Experience), C, Python, SQL (MySQL), Shell Scripting
Cloud Services: DynamoDB, S3, Lambda, Elasticsearch
Web Technologies: HTML, JavaScript, jQuery, PHP, CSS
Tools: Git, GitHub, PyTorch, GNU Debugger, Lex, Yacc, Laravel, Rational ClearCase
Automated person re-identification is essential for making sense of the vast quantity of visual data generated by the rapid expansion of large-scale distributed multi-camera systems. Person re-identification, a core tool in intelligent video surveillance, is the task of correctly identifying individuals across multiple images captured under varied conditions by multiple cameras. The problem is inherently challenging because of low-resolution images, illumination changes, unconstrained poses, and occlusions.
This task is the reverse of image captioning: here, we try to generate the images that best fit a given textual description.
Here is a sample output of the network.
A YouTube video showing use of the GUI and the base code for the DNN model can be found below:
Base code reference
The author's version
The base model is based on an earlier paper, Generative Adversarial Text to Image Synthesis. The model described in the paper uses a pre-trained text embedding, trained with character-level processing (Char-CNN-RNN), which was learned jointly from images and their corresponding descriptions; the resulting embedding is a vector of size 1024. In my work, I have replaced this character-level encoding with the much more robust Skip-Thought Vectors (Code), which encode the text description of an image into a 4800-dimensional vector. A reducer layer takes this large vector and returns a 1024-dimensional vector, which is then fed into the GAN. The parameters of this layer are learned during training.
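To make the reducer concrete, here is a minimal PyTorch sketch of such a projection layer. The dimensions follow the description above; the activation and exact wiring are assumptions, not the original implementation.

```python
import torch
import torch.nn as nn

class SkipThoughtReducer(nn.Module):
    """Learned projection from 4800-dim skip-thought vectors to the
    1024-dim conditioning vector consumed by the GAN (illustrative)."""
    def __init__(self, in_dim=4800, out_dim=1024):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, skip_thought_vec):
        # Trained jointly with the GAN, so the projection is task-specific.
        return torch.relu(self.proj(skip_thought_vec))

# A batch of 8 skip-thought encodings -> 8 conditioning vectors.
reduced = SkipThoughtReducer()(torch.randn(8, 4800))
assert reduced.shape == (8, 1024)
```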
The following diagram shows the high-level design of the neural network:
Elasticsearch is a search engine built on the Lucene library and uses BM25 under the hood. Apart from being open source, it is distributed in nature, meaning it scales easily and works with schema-free JSON documents. To learn more about Lucene and open-source Elasticsearch, see the detailed Lucene and Elasticsearch Spotlight by Minreng.
AWS Elasticsearch Service is a fully managed service making it easier for developers to deploy Elasticsearch cost-effectively at scale. It also provides various security features that you can use to make your APIs secure. The service provides support for open-source Elasticsearch APIs, managed Kibana, integration with Logstash and other AWS services, and built-in alerting and SQL querying.
Elasticsearch | RDBMS |
---|---|
Cluster | Collection of databases |
Shard | Shard (partition) |
Index | Database |
Type | Table |
Mapping | Schema |
Field | Column |
Document (JSON Object) | Row |
ID | Primary key |
https://nbviewer.jupyter.org/github/aditya30394/Spotlight-on-AWS-Elasticsearch/blob/master/Elasticsearch.ipynb
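As a taste of the kind of calls the notebook walks through, here is a minimal sketch using the official `elasticsearch` Python client (v8-style API); the endpoint, index name, and documents are illustrative:

```python
from elasticsearch import Elasticsearch

# Connects to a local node; for AWS Elasticsearch Service you would
# point this at your domain endpoint instead.
es = Elasticsearch("http://localhost:9200")

# Index (database) -> document (row): index a JSON document.
es.index(index="books", id=1, document={
    "title": "Elasticsearch in Action",
    "summary": "Distributed search built on Lucene",
})

# Full-text search, scored with BM25 under the hood.
hits = es.search(index="books", query={"match": {"summary": "lucene search"}})
for hit in hits["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```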
As students move from one learning environment to another, it is essential that they carry with them proof of their previous learning experiences and achievements. In most cases this takes the form of grade cards or certificates of completion, so academic records and transcripts play a vital role in the life of every student. Official records of the courses taken and the grades earned are continually sought by recruiters as well as graduate schools; academia and industry use these grade cards for enrollment and administrative decisions, but above all for verifying credentials. Traditionally, these records are stored either as documents or as unofficial copies on centralized servers. While this may suffice for decision-making, it delays the onboarding process: to obtain an official copy, students must file an application with the academic department, and the whole process usually takes about ten days. The delay arises primarily because the records are validated at the source, which usually involves a convoluted process. The challenge is to offer a platform on which all of a student's credentials can be verified readily without approaching the source, expediting access to official records.
To summarize, our two important contributions are:
1. Provide a decentralized application to eliminate the delay in obtaining official records
2. Provide a mechanism to make the records tamper-proof so that they can be trusted
Our smart contract has two map-based data structures:

studentRecords: This maps student IDs to their respective data. The data is another struct-like structure (similar to a struct in C) containing the hash of the student record, the student's password, and a boolean flag indicating whether the record is valid. This flag is required in Solidity because a mapping declaration creates a namespace in which all keys exist, with values initialized to 0 or false; as a result, there is no direct way to check whether a key exists. We set this flag to true when the department implants a new student record in the blockchain.

authList: This is another map that simply maps authentication IDs to a boolean flag. Solidity does not provide a Set data structure, and the only alternative was to store the authentication IDs in an array. However, iterating through arrays increased gas usage, so to make lookups efficient we decided to use another map to store the IDs. Again, for the reason explained above, we use a boolean flag to indicate the presence of a valid ID. These IDs are set up when the web app starts, via a call to the smart contract's constructor.
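To illustrate the design reasoning, here is a hypothetical Python model of the two mappings (not the Solidity contract itself); `defaultdict` mimics Solidity's zero-initialized mapping semantics, which is exactly why the explicit validity flag is needed:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class StudentData:
    record_hash: str = ""
    password_hash: str = ""
    is_valid: bool = False  # mirrors the Solidity validity flag

class RecordsContractModel:
    """Illustrative model of the contract's two mappings.

    defaultdict mimics Solidity mappings: every key 'exists' with a
    zero/false default, so an explicit flag distinguishes real
    entries from defaults.
    """
    def __init__(self, auth_ids):
        self.student_records = defaultdict(StudentData)
        self.auth_list = defaultdict(bool)
        for auth_id in auth_ids:            # done at startup, as in the
            self.auth_list[auth_id] = True  # smart contract's constructor

    def add_record(self, caller_id, student_id, record_hash, password_hash):
        assert self.auth_list[caller_id], "caller not authorized"
        self.student_records[student_id] = StudentData(
            record_hash, password_hash, is_valid=True)

    def verify(self, student_id, record_hash):
        data = self.student_records[student_id]
        return data.is_valid and data.record_hash == record_hash
```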
A multithreaded, client-server chat application built with Java socket programming. The server continuously listens for connection requests from clients across the network, or even from the same machine. Clients connect to the server using an IP address and port number, and each client must provide a unique username when connecting; this username serves as that client's unique identifier.
All messages from a client are sent to the server using Java's ObjectOutputStream. After receiving a message, the server broadcasts it if it is not a private message; if it is a private message, detected by an '@' followed by a valid username, the server sends it only to that user.
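For illustration, here is a minimal Python sketch of the server's dispatch logic; the actual implementation is in Java with ObjectOutputStream, and all names here are hypothetical:

```python
# Sketch of the broadcast vs. private-message dispatch (hypothetical names).
def dispatch(message, sender, clients):
    """clients maps username -> a send() callable for that connection."""
    if message.startswith("@"):
        target, _, body = message[1:].partition(" ")
        if target in clients:                # valid username => private message
            clients[target](f"[private] {sender}: {body}")
            return
    for name, send in clients.items():       # otherwise broadcast to everyone
        if name != sender:
            send(f"{sender}: {message}")
```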
All the messages sent from various clients can be seen on the server console.
In the screenshots below, three clients, namely "Aditya", "Abhishek" and "Himanshu", are active on the chat server.
In a world filled with brilliant entrepreneurs eager to make their ideas a reality, finding the funding necessary for such initiatives can be a challenging endeavor. Crowd-funding websites such as Kickstarter let savvy entrepreneurs leverage the collective resources of the crowd to obtain the capital needed to get their projects started. Kickstarter's core function is to act as a global crowd-funding platform where users (known as backers) are offered tangible rewards or experiences in exchange for pledges of monetary support. While Kickstarter publishes many articles outlining best practices for project creation, large-scale analysis has shown that approximately half of all campaigns still fail.
We believe that meaningful indicators exist that can predict whether a project is more likely than others to be successfully funded. By identifying these indicators, our goal is to provide insight into the kinds of decisions on which project creators should focus their scarce time and resources. We believe project creators across all categories could use this information to inform their decisions and would both desire and benefit from these insights.
Many ideas in the world, for one reason or another, lack the financial resources or social visibility to be recognized by the public. We hope that by showing project creators which features are most predictive of a successful campaign, all good ideas will be given equal representation and net social good will be promoted through the flourishing of novel innovations.
Through group discussion and a review of previous literature, we have determined four main questions that we wish to address through this work (a sketch of the corresponding feature engineering follows the list):
1. Perform sentiment analysis on both the 'blurb' and 'name' sections of each project to determine whether the predicted sentiment is related to the project's success or failure.
2. Determine whether there is a specific time of year when launched projects are more likely to be funded.
3. Find out whether projects with "small" goal amounts are more likely to succeed than projects with "large" goal amounts.
4. See whether duration impacts a project's success. We study two durations: the gap between a project's creation on Kickstarter and its official launch date (when the owners start accepting funds), and the campaign length, i.e., the difference between the launch date and the deadline (the date on which owners stop accepting funds).
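Here is a sketch, in pandas, of how these features might be derived; the column names are hypothetical placeholders for the actual Kickstarter dataset schema:

```python
import pandas as pd

# Hypothetical file and column names; the real dataset schema may differ.
df = pd.read_csv("kickstarter_projects.csv",
                 parse_dates=["created_at", "launched_at", "deadline"])

# Question 2: launch month as a seasonal feature.
df["launch_month"] = df["launched_at"].dt.month

# Question 3: bucket goals into "small" vs. "large" at the median.
df["large_goal"] = df["goal"] > df["goal"].median()

# Question 4: the two duration features described above.
df["prep_days"] = (df["launched_at"] - df["created_at"]).dt.days
df["campaign_days"] = (df["deadline"] - df["launched_at"]).dt.days

# First look at question 3: success rate by goal size.
print(df.groupby("large_goal")["successful"].mean())
```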
In today's cloud-centered business environment, the security of cloud solutions is a critical issue. Since virtualization is the foundational element of cloud computing and underpins its benefits, securing virtualization becomes a major goal for cloud-based systems.
Virtualization creates virtual versions of resources such as processors, memory, storage, network interfaces, and devices for virtual machines (VMs), allowing the same set of resources to be shared among multiple VMs so that they can run together on the same hardware without being aware of each other. Despite its indisputable benefits, data-security vulnerabilities and performance degradation from the user's viewpoint remain the main causes of concern, motivating fervent research into hardware and software improvements to alleviate them. However, implementing and properly testing these innovations is not easy with hardware, hybrid, or API-based software simulators, owing to high implementation costs, the absence of simulators capable of testing solutions that span multiple levels of hardware and software, and the different privilege levels of instructions. This is especially relevant when the proposed improvements include instruction-set modifications at different privilege levels.
In this paper, we propose a different simulation approach: the multikernel simulation approach. It differs from conventional software-based simulation techniques in that we utilize the different privilege levels of the various kernels running on the server, leveraging them to distribute the components and logic to be simulated across different levels of software and thereby simulate the effect of running at the required privilege level. To accomplish this, we identify unused bits in the kernel software and use them to simulate hardware conditions. We implement and demonstrate this simulation technique for the Extended-HyperWall and RSDM architecture [1], [2], a hardware-based solution for improving the security of virtual machines in a fully virtualized environment against rollback-based attacks in the presence of an untrusted hypervisor (the entity that manages VMs). Our simulation works in a fully virtualized environment and demonstrates the security of the proposed enhancement without hardware prototypes, in a cost-effective manner.
The motivation behind the Self Evaluation project is to help students in assessing their preparation for the graduate-level Artificial Intelligence course. This project can assist students in understanding the course prerequisites. By using this evaluation portal, the students can make an informed decision as to whether they are sufficiently prepared to register for the class. The requirements for the project were provided by Dr. Duncan Walker, Professor, Texas A&M University. The key stakeholders of this project are prospective students, teaching assistants, and the course instructors.
Cyberbullying and online abuse have been rising at an alarming rate. They are detrimental to the mental state of teenagers and are among the biggest factors leading to depression. Manually determining the toxicity of comments in the plethora of data generated daily is an impossible task, so automating the detection and censorship of toxic comments on social media platforms can go a long way toward solving this issue. But detecting toxic comments is daunting because of factors such as context, perception, and vocabulary.
In this work, we propose adding a human element to the system: the automated model requests human feedback to improve its results. We use a recurrent neural network (RNN) with long short-term memory (LSTM) cells and a convolutional neural network (CNN).
Additionally, we use Facebook's pre-trained FastText embedding and the Global Vectors for Word Representation (GloVe) embedding as the backbone of the automated system, and we introduce the human feedback into recurring training. Instead of classifying comments as simply toxic or non-toxic, we provide a score on a range; likewise, our human feedback system requests feedback on a scale rather than a simple yes/no.
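A minimal PyTorch sketch of such an LSTM+CNN scorer over frozen pretrained embeddings is shown below; the hyperparameters and layer wiring are assumptions, not the trained model:

```python
import torch
import torch.nn as nn

class ToxicityScorer(nn.Module):
    """Sketch of an LSTM+CNN scorer over pretrained embeddings.

    `embedding_matrix` would hold FastText or GloVe vectors;
    the hyperparameters here are illustrative.
    """
    def __init__(self, embedding_matrix, hidden=64, filters=64):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(embedding_matrix, freeze=True)
        emb_dim = embedding_matrix.size(1)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, filters, kernel_size=3, padding=1)
        self.head = nn.Linear(filters, 1)

    def forward(self, token_ids):
        x = self.embed(token_ids)                     # (batch, seq, emb_dim)
        x, _ = self.lstm(x)                           # (batch, seq, 2*hidden)
        x = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, filters, seq)
        x = x.max(dim=2).values                       # global max pool over time
        return torch.sigmoid(self.head(x))            # toxicity score in [0, 1]

# Example: score a batch of 2 comments of length 10 with a dummy vocabulary.
vocab_size, emb_dim = 1000, 50
dummy_embeddings = torch.randn(vocab_size, emb_dim)
model = ToxicityScorer(dummy_embeddings)
scores = model(torch.randint(0, vocab_size, (2, 10)))
```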
Social Profiles
LinkedIn
Join my professional network, check out my skills, and see what my peers have to say about me.
GitHub Repo
Visit my GitHub repo to check out the projects I have worked on.
Goodreads
Books I've read or am currently reading!