Data Scientist III · Tucson, AZ

Vikas Pal

Applying machine learning and data science to solve real clinical and industry problems — from predicting transplant outcomes to deploying models at scale.

1+ · Publication
2+ · Years clinical research
6+ · Years in industry
Clinical AI · ML & Data Science focus
Tucson, Arizona · Open to remote & relocation

Turning complex data into decisions

I'm a Data Scientist specializing in applying machine learning to high-stakes domains — particularly clinical medicine, where predictive accuracy has real consequences for patients.

My research and applied work spans image processing, data extraction pipelines, and predictive modeling. I'm published in Kidney International Reports (Elsevier), where my work focused on predicting post-transplant outcomes using ML — work that directly informs organ allocation decisions.

I'm actively seeking opportunities to collaborate with research teams, industry partners, and organizations building data-driven solutions at scale.

GitHub ↗ LinkedIn ↗ Google Scholar ↗ ORCID ↗
Core skills
Machine Learning · Data Science · Clinical AI · Image Processing · Data Extraction · Predictive Modeling
Languages & tools
Python · SQL · R · scikit-learn · pandas · NumPy · Git · Jupyter
Domains
Nephrology & Transplant · Healthcare ML · Research & Analysis
Education

Academic background

2023
Master of Science · Business Analytics & Data Science
University of Texas, Dallas
Specialization in machine learning, statistical modeling, and large-scale data analysis.
2022
Graduate Certificate · Applied Machine Learning & Data Science
University of Texas, Dallas
Coursework in Advanced Statistics for Data Science, Applied Machine Learning, Applied Deep Learning, and Applied Natural Language Processing.
2016
PG Certificate · Big Data Analytics & Data Science
SP Jain School of Global Management, Mumbai
Advanced training in big data technologies, analytics, and applied data science methodologies.
2013
Bachelor of Engineering · Information Technology
University of Mumbai
Foundation in quantitative and analytical thinking across science and mathematics.
Research

Publications

Kidney International Reports · Elsevier · 2025 · Peer Reviewed · Open Access

Predicting Simultaneous Heart Kidney Allocation and Posttransplant Adverse Kidney Outcomes

Mutlu Mete, Mehmet U.S. Ayvaci, Ahmet B. Gungor, Faris Araj, Deepak Acharya, Benjamin Hippen, Xingxing S. Cheng, Miklos Z. Molnar, Tarek Alhamad, Enver Akalin, Neeraj Singh, Prince M. Anand, Gaurav Gupta, Matthias Peltz, Venkatesh K. Ariyamuthu, Abd A. Qannus, Iyad S. Mansour, Maryam Emami, Vikas Pal, Bekir Tanriover

DOI: 10.1016/j.ekir.2025.10.005 · Manuscript ID: KIR-03-25-0356.R1
Transplantation & Transplant Surgery · Forthcoming · In Review

Next publication in progress

Submitted to editorialoffice@journal.tts.org

Details to be announced upon acceptance
Work

Projects

Clinical & Research
01 —

Medicare Claims Cost Analysis

Led comprehensive analysis of kidney transplant Medicare claims using the U.S. Renal Data System (USRDS), spanning 2000–2020 across 100,000+ patients. Identified cost reduction strategies with potential to save the government millions in healthcare spending.

USRDS · SQL · Power BI · HIPAA Cloud
100,000+ patients · 2000–2020 dataset · government cost reduction identified
02 —

Billion-Row Transplant Hospitalization Analysis

Managed and analyzed over 1 billion rows of pre- and post-transplant hospitalization and physician/supplier claims in a HIPAA-compliant cloud environment. The insights are being used to understand the impact of drug consumption, optimize donor-recipient matching, and improve patient quality of life.

Power BI · Cloud (HIPAA) · SQL
1 billion+ rows · donor-recipient optimization · cost-effectiveness insights
03 —

PDF Extraction for Pre/Post-Transplant Patient Data

Extracted and analyzed data from PDF reports containing both computer-generated text and handwritten clinical notes. The data covered pre-transplant and post-organ-failure tests, including blood pump metrics, heart rate, and oxygen monitoring readings at each timestamp.

PDF Extraction · OCR · Python · Image Processing
Handwritten + digital records · patient recovery trajectory analysis
Banking — India's 3rd Largest Bank
04 —

Credit Risk Scorecard

Built a preemptive scorecard for non-delinquent credit card customers to optimize collection costs and restrict flow into higher delinquency buckets. Used bureau and demographic variables as predictors, with roll rate and vintage analysis to define the target variable.

SAS · Logistic Regression · Roll Rate Analysis · Vintage Analysis
Saved INR ~4.6M (~$55K) yearly in collection costs
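
For readers unfamiliar with roll rate analysis, here is a minimal Python sketch of the idea. It is not the original SAS work: the bucket labels, the function names, and the eight sample accounts are all hypothetical. It tabulates how accounts move between delinquency buckets from one month to the next and reports the share that rolls into a worse bucket, which is one common input when choosing the "bad" definition for a scorecard target.

```python
from collections import Counter, defaultdict

# Hypothetical delinquency buckets, ordered from current to most delinquent.
BUCKETS = ["current", "1-29", "30-59", "60-89", "90+"]


def roll_rate_matrix(transitions):
    """Return {from_bucket: {to_bucket: share}} from (from, to) bucket pairs."""
    counts = defaultdict(Counter)
    for start, end in transitions:
        counts[start][end] += 1
    matrix = {}
    for start, row in counts.items():
        total = sum(row.values())
        matrix[start] = {end: n / total for end, n in row.items()}
    return matrix


def roll_forward_rate(matrix, start):
    """Share of accounts in `start` that moved to a worse bucket."""
    rank = BUCKETS.index(start)
    return sum(
        share
        for end, share in matrix.get(start, {}).items()
        if BUCKETS.index(end) > rank
    )


# Made-up month-over-month bucket pairs for eight accounts.
sample = [
    ("current", "current"), ("current", "current"), ("current", "current"),
    ("current", "1-29"),
    ("1-29", "current"), ("1-29", "30-59"),
    ("30-59", "60-89"), ("30-59", "60-89"),
]

matrix = roll_rate_matrix(sample)
for bucket in ("current", "1-29", "30-59"):
    print(bucket, roll_forward_rate(matrix, bucket))
```

In this toy data the roll-forward share rises with each bucket, which is the kind of pattern an analyst would look for when deciding at which bucket an account is unlikely to cure.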
05 —

Customer Acquisition Predictive Scorecard

Deployed a logistic regression model in SAS to identify prospects likely to buy a product. Validated accuracy metrics, performed stress testing, and monitored model stability over time. Targeted the top three deciles through scorecard-driven campaigns.

SAS · Logistic Regression · Decile Analysis
~6% customer base growth in 2 months
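
As an illustration of decile-based targeting, the sketch below ranks prospects by model score, splits them into ten equal groups, and measures how many actual responders fall in the top three. It is an assumption-laden toy in Python, not the SAS implementation: the function names, the scores, and the response flags are invented for the example.

```python
def assign_deciles(scores):
    """Return a decile (1..10) per score; decile 1 holds the highest scores."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles


def top_decile_capture(scores, responded, top_k=3):
    """Share of all responders that sit in the top `top_k` deciles."""
    deciles = assign_deciles(scores)
    total = sum(responded)
    captured = sum(r for d, r in zip(deciles, responded) if d <= top_k)
    return captured / total if total else 0.0


# Made-up scores for 20 prospects and made-up response flags.
scores = [i / 20 for i in range(20)]
responders = {5, 14, 16, 17, 18, 19}
responded = [1 if i in responders else 0 for i in range(20)]

capture = top_decile_capture(scores, responded, top_k=3)
print(f"Top-3 deciles capture {capture:.0%} of responders")
```

A capture rate well above 30% for the top three deciles suggests the score ranks prospects better than random selection, which is the usual argument for concentrating a campaign there.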
06 —

Cross-Sell Opportunity Identification

Identified cross-selling opportunities among existing customers through customer profiling and clustering based on product usage patterns. Surfaced actionable segments for targeted outreach.

Clustering · Customer Profiling · SQL
INR ~5M (~$60K) additional annual business opportunity identified
07 —

Digital Payment Gateway Impact Analysis

Analyzed digital payment gateway customers across POS, web-based payments, and third-party apps, benchmarked against mobile and net banking users. Quantified digital channel value to inform investment decisions.

SQL · Power BI · Cohort Analysis
40% increased funding secured for online gateway businesses
Analytics Consulting — Tech & Gaming Clients
08 —

Revenue Recovery via Statistical Modeling

Built a linear regression model for a major gaming client to pinpoint causes of revenue decline. Developed A/B testing strategies for Android and iOS users to validate and deploy interventions at scale.

Linear Regression · A/B Testing · Python · Periscope
~$4M monthly savings · $1K per 4,000 subscribers
09 —

Sentiment & Consumer Analysis via Twitter API

Leveraged twitteR in R and SQL on Redshift to generate 360° client-centric metrics — interest segments, marketing buckets, sentiment scores, and peak active hours — for a major e-commerce client.

R · Twitter API · Redshift · SQL · NLP
2x increase in targeted consumer touchpoints
Get in touch

Open to research, consulting, and collaboration