Selected Work

Real Estate Value Estimator

In this work, a prediction model is developed to estimate real estate value based on critical attributes including location, number of bedrooms and bathrooms, property size and lot size. The primary objective of this model is to allow potential sellers to evaluate a house's value without the intervention of a real estate agent.

The prediction model is developed using the Python Scikit-Learn libraries and is updated periodically with the most recent real-estate data. Serialized prediction models are loaded by Django to provide an interactive interface through the Apache web server. JavaScript, the GoogleMaps API, D3 and WordPress templates provide a user-friendly experience, and the Python Scrapy framework is used to collect the real-estate data used in this work.
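
The retrain-and-serialize cycle described above can be sketched as follows. To stay self-contained, this uses a hand-rolled one-feature least-squares fit as a stand-in for the Scikit-Learn regressor used in the actual project; the feature (size), file path and function names are illustrative only.

```python
import pickle

def fit_price_model(sizes, prices):
    """Ordinary least squares for price ~ slope * size + intercept
    (a stand-in for the Scikit-Learn regressor used in the project)."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(prices) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, prices))
    var = sum((x - mean_x) ** 2 for x in sizes)
    slope = cov / var
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def retrain_and_serialize(sizes, prices, path="price_model.pkl"):
    """Periodic retrain step: fit on the latest listings and pickle the
    model so the web app can load it without retraining per request."""
    model = fit_price_model(sizes, prices)
    with open(path, "wb") as f:
        pickle.dump(model, f)
    return model

def predict(model, size):
    return model["slope"] * size + model["intercept"]
```

The Django application would then unpickle the stored model once at startup and call `predict` for each incoming request.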

Frameworks & Software used in this project

  • Python
  • Django
  • Apache Web Server
  • JavaScript
  • GoogleMaps API
  • D3
  • Python Scrapy

More details about the project and a demo can be found in the following links. 

Service Availability Predictor

Fixed wireless Internet service availability at a particular location depends on several factors, including the location itself, geographical data such as elevation, area-specific characteristics such as the types of surrounding trees, and RF noise at the location. Technicians normally have to perform an onsite survey to determine the service availability of a particular location. In this work, a machine learning based prediction model is developed to determine the likelihood of service availability without an onsite survey. Several characteristics, including existing customers in the proximity, geographical data and tower locations, are used as input features to build the prediction model.

The application is developed using Python, Django, MySQL, Scikit-Learn, the GoogleMaps API and JavaScript. The prediction process is fully streamlined: once a potential customer submits a service availability request through the company website, technicians receive a comprehensive report including the probability of service availability, neighbourhood data, RF parameters of the location and the elevation path.

The prediction model is automatically re-trained at regular intervals to provide the most accurate estimations.
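
A serviceability probability of this kind can be sketched with plain logistic regression, implemented below with only the standard library. This is a stand-in for the Scikit-Learn classifier used in the actual project; the feature layout (e.g. distance to tower, elevation difference) and all names are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Stochastic gradient-descent logistic regression. Each row of X is
    a feature vector, e.g. [distance_to_tower_km, elevation_delta_m];
    y holds 1 (serviceable) or 0 (not serviceable)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi                      # gradient of the log-loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def availability_probability(w, b, x):
    """Probability of service availability for one candidate location."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```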

Frameworks & Software used in this project

  • Python
  • Django
  • MySQL
  • JavaScript
  • GoogleMaps API
  • Scikit-Learn

More details about the project and a demo can be found in the following links. 

Traffic Usage Calculator

This module is developed for an Internet service provider to monitor customer usage patterns. Customers are able to monitor the Internet traffic usage of their fixed-wireless high-speed connection in near-real time. The Network Operations team also uses these data and graphs to troubleshoot various issues including link utilization, interference and link faults. Historical traffic data is used to build several prediction and regression analysis models for traffic engineering, infrastructure planning and troubleshooting. A near-real-time visual representation of the traffic utilization of wireless links is developed using JavaScript, D3, Django and Python. The data used in this work is extracted from customer premises equipment (CPE) in near-real time to produce more realistic results. The following software and frameworks are used to complete the project.
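
As a small illustration of the counter handling involved, CPE interfaces typically expose cumulative byte counters, so per-interval throughput is derived from the difference between successive readings. The sketch below assumes a 32-bit counter (an assumption, not a statement about the actual hardware) and handles the wrap-around case.

```python
COUNTER32_MAX = 2 ** 32

def throughput_mbps(prev_bytes, curr_bytes, interval_s):
    """Convert two successive cumulative byte-counter readings into the
    average throughput in Mbit/s over the polling interval, allowing for
    a 32-bit counter wrapping past 2**32 between readings."""
    delta = curr_bytes - prev_bytes
    if delta < 0:                     # counter wrapped since last poll
        delta += COUNTER32_MAX
    return delta * 8 / interval_s / 1_000_000
```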

  • Python
  • MySQL
  • Django
  • Scikit-Learn
  • JavaScript
  • D3
  • Apache Web Server

Source code of critical elements and access to a demo can be found in the following links.

RF Property Monitor

RF Property Monitor provides valuable statistics for wireless radio links operating in both licensed and unlicensed frequencies. The project comprises multiple sub-tasks pipelined together to exchange data between them. SNMP and low-level programming methods are used to retrieve wireless data from 1000+ nodes in near-real time. The formatted data is fed into Django for real-time processing, and a web interface is developed to present a graphical interpretation of the findings. The processed data is stored in a structured database for prediction modeling, troubleshooting, infrastructure maintenance and regression analysis. Similar to the previous project, Python, MySQL, Django, Scikit-Learn, JavaScript and D3 are used to develop the project.
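
The fan-out stage of polling 1000+ nodes can be sketched as below. The real SNMP GET (e.g. a pysnmp call) is abstracted behind a `fetch` callable so the sketch stays self-contained; the worker count and the shape of the returned stats are illustrative assumptions, not project code.

```python
from concurrent.futures import ThreadPoolExecutor

def poll_all(nodes, fetch, max_workers=50):
    """Poll many radio nodes concurrently. `fetch` stands in for the
    real SNMP query: it takes a node address and returns a dict of RF
    stats. A node that fails to answer is recorded as None instead of
    aborting the whole polling sweep."""
    def safe_fetch(node):
        try:
            return node, fetch(node)
        except Exception:
            return node, None
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(safe_fetch, nodes))
```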

Link to the source code and a demo access can be found below.

News Aggregator

News Aggregator collects news items exclusively from HTML sources. The application uses a multi-threaded scraping mechanism to collect the text content of detailed news items from a list of selected websites. The model is also capable of providing statistical analysis, including the most discussed words and phrases. The objective of this project is to design an NLP model that creates news items based on the frequently used words and phrases of the day and predicts potential trends based on historical events. The following software packages and frameworks are used in this project.
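
The most-discussed-words statistic can be sketched with nothing but the standard library; the stopword list and length threshold below are illustrative choices, not the project's actual NLP pipeline.

```python
import re
from collections import Counter

# Minimal illustrative stopword list; the real pipeline would use a
# fuller one.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on"}

def most_discussed(articles, top_n=10):
    """Tokenize scraped article bodies and return the most frequent
    non-stopword terms across all articles."""
    counts = Counter()
    for text in articles:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS and len(word) > 2:
                counts[word] += 1
    return counts.most_common(top_n)
```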

  • Scrapy
  • Python
  • MySQL
  • PyTorch
  • JavaScript
  • D3
  • Apache/Django

Source code and a demo access request can be found at the following link.

Emergency Alerts Generator

Emergency alerts provide valuable safety information to travellers. The objective of this project is to develop a machine learning based severity prediction model for emergency alerts. In the first stage of this project, emergency alerts are collected from various resources including RSS feeds, websites and other APIs. Subsequently, the alerts are manually labelled with the corresponding category and severity. A number of categories, such as political unrest, severe weather, terrorist attacks, riots, health epidemics, accidents and communication interruptions, are used in alert classification. Several software modules, including

  • PHP
  • Python
  • PostgreSQL
  • Beautifulsoup
  • Apache

are used in this project. Source code related to some critical elements can be found in the following link.

Contact Extractor

Diplomatic representatives' contact details, including the postal addresses, telephone numbers and email addresses of embassies, consulates and high commissions, are stored in different formats such as text, HTML and PDF. The Contact Extractor was developed for a company specializing in travel insurance to collect the most accurate contact information of diplomatic representatives. The application has been developed using

  • Java
  • Python
  • PHP
  • PDF Parser
  • PostgreSQL
  • Beautifulsoup
  • Apache

Restricted portion of the source code and a sample of extracted data can be found in the following link.

EHR Extractor

An Electronic Health Record (EHR) contains the medical history of a particular patient. However, some legacy systems are unable to provide electronic data for a particular test. In this work, a data extraction module is developed to collect over 50 attributes of a cardiovascular test from a legacy cardiovascular instrument. The extracted data is further analyzed using machine learning methods to discover certain health anomalies. The objective of the prediction model is to minimize patients' exposure to radiation by reducing the use of X-rays and other screening techniques such as MRI and ultrasound. The application is developed using the Java programming language.

A restricted portion of the source code and an extracted data sample can be found in the following link.

Data Collection Projects

A majority of my recent work is related to predictive and statistical analysis and is heavily dependent on reliable, accurate data. However, raw data comes in various formats, and different techniques had to be used to transform it into usable, application-friendly inputs. A few of the data collection methods used in various projects are listed below.

Auto Inventory Data

In this work, vehicle listing data is collected from a well-known auto resale web portal. A web crawler application is developed using Python Scrapy to navigate recursively through vehicle listing pages and collect all available information, including make, model, condition, year, mileage, body type, drivetrain, fuel type, interior/exterior colour, engine, transmission, doors and seats. A number of experiments are investigated, including a prediction model to approximate vehicle resale value based on the above attributes and descriptive analytics such as market trends and demand. (This data is used exclusively for academic research; the auto inventory data is not used in any commercial application.)
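
The attribute-extraction step of such a crawler can be sketched with the standard library's `html.parser`. The markup here is a hypothetical simplification (each attribute in a `<span data-field="…">` tag); the real portal's HTML and the Scrapy selectors used in the project differ.

```python
from html.parser import HTMLParser

class ListingParser(HTMLParser):
    """Extract vehicle attributes from hypothetical listing markup where
    each attribute is tagged, e.g. <span data-field="make">Toyota</span>."""
    def __init__(self):
        super().__init__()
        self._field = None
        self.record = {}

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            self._field = dict(attrs).get("data-field")

    def handle_data(self, data):
        if self._field:
            self.record[self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

def parse_listing(html):
    parser = ListingParser()
    parser.feed(html)
    return parser.record
```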

Source code for the web crawler module and a sample set of data extracted from the listings can be found in the following links.

Real Estate Data

Real estate data collected from the North American housing market has been used in prediction and regression analysis. A web crawler application has been developed to scrape a real estate listing site dynamically and retrieve the most up-to-date property values. The Python Scrapy module has been used to build a multi-threaded model that recursively traverses the listing site to collect the data related to each property.

Source code and a sample set of listing data can be obtained by following the links below.

News Crawler

News Crawler is a web scraping application designed using the Python Scrapy module. A multi-threaded agent recursively scrapes a pre-defined set of URLs to extract only the text content of each news item. The extracted unique news items are stored in a SQL database for further analysis.

Source code and a sample set of listing data can be obtained by following the links below.

PDF Extractor

This project has been designed to extract data from printed copies of sales invoices for vehicle transactions. The scanned documents are stored in PDF format, and Python based PDF scraper libraries are used to extract transaction details including the VIN, dates, costs, vehicle details and contact information. The objective of this project was to design a machine learning model that identifies specific values such as the VIN, customer contact and sale price from different types of invoices.
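
The VIN-candidate step can be sketched with a regular expression: a VIN is 17 alphanumeric characters excluding I, O and Q. This is only the pattern-matching stage; deciding which candidate is the actual vehicle VIN is the job of the model described above.

```python
import re

# 17 characters, alphanumeric excluding I, O and Q (per the VIN format).
VIN_RE = re.compile(r"\b[A-HJ-NPR-Z0-9]{17}\b")

def extract_vins(text):
    """Return candidate VINs found in text recovered from a scanned
    invoice, normalized to upper case."""
    return VIN_RE.findall(text.upper())
```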

 

Experimental Work

A number of experiments were carried out as part of my PhD research. Some of the more interesting attempts related to machine learning and deep learning are listed below.

Back Propagation

In machine learning, backpropagation is used to compute and distribute the cost/error contributed by each trainable parameter, including the weights and biases. In this work, the backpropagation mechanism is implemented from scratch in a feed-forward neural network with various optimization and cost functions, without using any machine learning libraries.
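
A minimal version of such a from-scratch implementation is sketched below: a one-hidden-layer sigmoid network with squared-error loss, where the backward pass applies the chain rule by hand. The network size, learning rate and XOR-style usage are illustrative choices, not the exact configuration used in the research.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, hidden=4, lr=0.5, epochs=2000, seed=0):
    """Train a 2-layer sigmoid network with hand-coded backpropagation
    (no ML libraries). Returns the parameters and per-epoch mean loss."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):
        total = 0.0
        for x, y in samples:
            # Forward pass.
            h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                 for row, b in zip(W1, b1)]
            out = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
            total += (out - y) ** 2
            # Backward pass: chain rule through output, then hidden layer.
            d_out = 2 * (out - y) * out * (1 - out)
            d_h = [d_out * w * hi * (1 - hi) for w, hi in zip(W2, h)]
            W2 = [w - lr * d_out * hi for w, hi in zip(W2, h)]
            b2 -= lr * d_out
            for j in range(hidden):
                W1[j] = [w - lr * d_h[j] * xi for w, xi in zip(W1[j], x)]
                b1[j] -= lr * d_h[j]
        losses.append(total / len(samples))
    return (W1, b1, W2, b2), losses
```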

Support Vector Machine

Support Vector Machines (SVMs) are used heavily in machine learning to solve various classification problems. SVM training can be viewed as a constrained optimization problem, and in this experiment an SVM is implemented in Python (without any ML libraries) to understand its behaviour.
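
One standard library-free way to train a linear SVM is stochastic sub-gradient descent on the regularized hinge loss (a Pegasos-style update). The sketch below takes that route as an illustration; it is an assumption about the approach, not necessarily the formulation used in the original experiment. Labels must be +1/-1.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM via stochastic sub-gradient descent on the hinge loss
    with L2 regularization (Pegasos-style step size 1/(lam*t))."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            lr = 1.0 / (lam * t)
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Sub-gradient of the regularized hinge loss.
            if margin < 1:
                w = [(1 - lr * lam) * wj + lr * y[i] * xj
                     for wj, xj in zip(w, X[i])]
                b += lr * y[i]
            else:
                w = [(1 - lr * lam) * wj for wj in w]
    return w, b

def classify(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```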

K Nearest Neighbors

The K-Nearest Neighbors (KNN) algorithm is a non-parametric algorithm mainly used in classification problems. In this work, KNN is implemented from scratch to understand its underlying computations.
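
The core of a from-scratch KNN classifier fits in a few lines: rank the training points by distance to the query and take a majority vote among the k closest. The sketch below uses squared Euclidean distance; all names are illustrative.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    points. `train` is a list of (feature_vector, label) pairs; squared
    Euclidean distance is enough since only the ordering matters."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    neighbors = sorted(train, key=lambda item: sq_dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]
```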

Classification model based on Bayes Theorem

Prediction models based on Bayes' theorem are simple probabilistic models that are able to produce fast but highly accurate predictions. In this research, a Naive Bayes prediction model is developed using only Python's built-in libraries.
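
A built-ins-only Naive Bayes can be sketched as below, using the Gaussian variant for continuous features: per class, store the prior and each feature's mean/variance, then pick the class with the highest log-posterior. The Gaussian choice is an illustrative assumption about the variant used.

```python
import math
from collections import defaultdict

def fit_gnb(X, y):
    """Fit Gaussian Naive Bayes: per class, the prior plus the mean and
    variance of every feature (with a tiny floor to avoid zero variance)."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    model = {}
    for cls, rows in groups.items():
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[cls] = (n / len(X), means, variances)
    return model

def predict_gnb(model, x):
    """Return the class with the highest log-posterior for sample x."""
    best, best_lp = None, float("-inf")
    for cls, (prior, means, variances) in model.items():
        lp = math.log(prior)
        for xi, m, var in zip(x, means, variances):
            lp += -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
        if lp > best_lp:
            best, best_lp = cls, lp
    return best
```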

Mean Shift

Similar to KNN, Mean Shift is a non-parametric feature-space analysis technique for locating the maxima of a density function. Mean Shift is a well-known cluster analysis algorithm and can also be used in data preprocessing. In this work, Mean Shift is implemented from scratch to understand how the maxima of a density function are computed.
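
The iteration at the heart of Mean Shift can be sketched in one dimension: repeatedly move each point toward the kernel-weighted mean of its neighbourhood until it settles near a density maximum; points that settle together form one cluster. The Gaussian kernel, bandwidth and fixed iteration count are illustrative choices.

```python
import math

def mean_shift_1d(points, bandwidth=1.0, iters=50):
    """Shift each point toward the Gaussian-weighted mean of all points
    until it converges near a mode (maximum) of the estimated density."""
    modes = list(points)
    for _ in range(iters):
        new_modes = []
        for x in modes:
            weights = [math.exp(-((x - p) ** 2) / (2 * bandwidth ** 2))
                       for p in points]
            new_modes.append(sum(w * p for w, p in zip(weights, points))
                             / sum(weights))
        modes = new_modes
    return modes
```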

Feed Forward Neural Networks

A Feed-Forward Neural Network (FNN) is a basic neural network model with multiple layers and multiple nodes in each layer. In this research, different aspects of neural networks, including activation functions, cost functions and various optimization techniques, are implemented without using any machine learning libraries.

Recurrent Neural Networks

Recurrent Neural Networks (RNNs) are used to build prediction models for sequential and time series input data. In this work, a skeleton of an RNN is designed using Python to understand the RNN and the backpropagation-through-time process associated with the algorithm.
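
The forward pass of such a skeleton can be sketched as a minimal Elman-style cell: at each step the hidden state is tanh(Wxh·x + Whh·h + bh). This is an illustrative reconstruction, not the original code; backpropagation through time would run the chain rule backwards over the same sequence of states.

```python
import math

def rnn_forward(xs, Wxh, Whh, bh, h0=None):
    """Minimal Elman RNN forward pass. xs is a list of input vectors;
    Wxh and Whh are weight matrices (lists of rows), bh the hidden bias.
    Returns the hidden state after every time step."""
    n_h = len(bh)
    h = h0 if h0 is not None else [0.0] * n_h
    states = []
    for x in xs:
        # New hidden state from current input and previous hidden state.
        h = [math.tanh(sum(wx * xi for wx, xi in zip(Wxh[j], x))
                       + sum(wh * hi for wh, hi in zip(Whh[j], h))
                       + bh[j])
             for j in range(n_h)]
        states.append(h)
    return states
```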

Deep Learning experimental work with PyTorch

PyTorch provides a number of libraries for building deep learning models. The capabilities of PyTorch are examined using publicly available data sets related to computer vision and NLP. Some of the experimental work related to CNNs, RNNs, LSTMs and GRUs can be found in the following link.

Experimental work with TensorFlow/Keras

Similar to PyTorch, TensorFlow provides a rich set of libraries for working with deep learning models. TensorFlow also provides an interface to the user-friendly, high-level Keras libraries. Some experiments related to CNNs, RNNs and other deep learning algorithms using TF native libraries and Keras can be found in the following link.

Deep Learning experimental work with Apache Spark

Apache Spark is a cluster computing framework that provides machine learning libraries for working with unstructured big data sets. In this portion of the research, the capabilities of Spark in machine learning and streaming, as well as supported frameworks such as the Hadoop Distributed File System, are investigated. Some interesting work related to Apache Spark can be found in the following link.

DEMO Access Request Form

Please complete the following form to request access to a functioning version of each application. Most applications operate on real data, so respecting data privacy is highly appreciated.
