James's Peredutions
Visualising Decision Trees, Random Forests, Gradient Boosted Trees

I wrote a series of posts for Towards Data Science on visualising the various types of tree-based models and using those visualisations to tune hyperparameters. Read the full series here:
  • A Visual Guide to Tuning Decision-Tree Hyperparameters
  • A Visual Guide to Tuning Random Forest Hyperparameters
  • A Visual Guide to Tuning Gradient Boosted Tree Hyperparameters

  • Data Science
  • Regression
  • Decision Trees
  • Random Forests
  • Gradient Boosted Trees
  • Hyperparameters
  • Python
Tuesday, October 7, 2025 | 1 minute Read
UK House Prices Map

I built an interactive map for UK house prices. You can see things such as:
  • What was the median property price in W1 in 2015?
  • Where is the median price for a flat under £300,000?
  • Where can I afford a detached house on 15x salary?
  • How have office prices changed over time?
It’s a bit slow, as it’s hosted on a minimal GCP Cloud Run instance to save costs and there is a lot of data. If it crashes, just refresh.

  • Data Science
  • Visualisation
  • Maps
  • Cost of Living
  • Housing
  • Python
Friday, September 19, 2025 | 1 minute Read
Kaggle: Predict the Introverts from the Extroverts

I’ve spent the last few months (and continue to spend them) working for an EV startup - as such, I’ve had a lot less time for personal projects such as this. The Kaggle notebook for this project can be found here: https://www.kaggle.com/code/jamesdeluk/ieclassifier It’s a quick and dirty one, not particularly tidy - I just felt the urge to have a play. Don’t judge me.

Intro

A simple challenge from Kaggle’s Playground series, called Predict the Introverts from the Extroverts. It can be found here: https://www.kaggle.com/competitions/playground-series-s5e7

  • Data Science
  • Classification
  • Clustering
  • Regression
  • Python
Friday, July 25, 2025 | 7 minutes Read
Basic APIs: Diamond Pricing with FastAPI, Docker, GitHub Actions, and Azure

This has been a long time coming! Basic APIs part one was posted back in January; since then I’ve started a new job and have had some contract/freelance work, so I didn’t get around to completing this. I’ve also moved from Windows to Mac, and started using uv instead of pyenv/pip, so things may look a little different. The repo for this project can be found here: https://github.com/jamesdeluk/data-projects/tree/main/basic-apis/diamond-price

The goal

Send (POST) diamond criteria to an online API, and get a predicted price as the response.

  • Data Science
  • Regression
  • Random Forest
  • APIs
  • Docker
  • FastAPI
  • Azure
  • Cloud
  • Python
Thursday, March 6, 2025 | 12 minutes Read
Bonus: DVLA Data Dashboard in Power BI

This project follows on from my one on analysing DVLA data. If you haven’t already read it, you may want to do so - you can find it here: https://www.jamesgibbins.com/analysing-dvla-data/

Intro

Time for something different - a dashboard! Sure, Python is great for transforming and analysing data, but it’s not the most end-user-friendly. This is why business intelligence (BI) / data visualisation tools exist, the most popular of which is Microsoft’s Power BI.

  • Data Analysis
  • Data Visualisation
  • Business Intelligence
Saturday, January 18, 2025 | 7 minutes Read
Analysing DVLA Data for Vintage Comebacks

GitHub repo: https://github.com/jamesdeluk/data-projects/tree/main/dvla-vehicle-statistics

Intro

The DVLA website provides vehicle licensing data, updated every few months. As a car and data enthusiast, I wanted to have a poke around. I started with two primary questions:
  • What are the most popular vehicles in the UK?
  • Which old vehicles are having a comeback? → This might suggest they will appreciate in value!

Selecting the dataset

There are a few different datasets, available here: https://www.gov.uk/government/statistical-data-sets/vehicle-licensing-statistics-data-files The one(s) I decided to use were:

  • Data Analysis
Friday, January 17, 2025 | 21 minutes Read
Basic APIs: Sentiment Analysis with Flask, Docker, and AWS

The repo for this project can be found here: https://github.com/jamesdeluk/data-projects/tree/main/basic-apis/sentiment-analysis

The goal

Send (POST) a word or phrase to an online API, and get a sentiment (positive or negative) and score as a response.

Step 1: Build the Python script

First we need the script that can take a word or phrase as an input and return a sentiment.

    import json
    from textblob import TextBlob

    def analyse(data):
        text = data.get('text', '')
        if not text:
            return json.dumps({'error': 'No text provided'}), 400
        blob = TextBlob(text)
        sentiment_score = blob.sentiment.polarity
        return json.dumps({
            'input': text,
            'sentiment_score': sentiment_score,
            'sentiment': (
                'positive' if sentiment_score > 0
                else 'negative' if sentiment_score < 0
                else 'neutral'
            )
        })

The API will be JSON-based, so the script is too. The function takes some text and uses TextBlob to give it a sentiment score between 1 (positive) and -1 (negative), which it returns, along with a positive/negative/neutral label, in JSON form.
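As a rough illustration of the client side of that goal, the sketch below builds the POST request such an API could receive. It uses only the standard library and does not send anything. The `text` key matches what the function above reads; the URL, the `API_URL` constant and the `build_sentiment_request` helper are hypothetical names used purely for illustration.

```python
import json
import urllib.request

# Hypothetical endpoint: the real URL depends on where the API is deployed.
API_URL = "http://localhost:5000/analyse"


def build_sentiment_request(text, url=API_URL):
    """Build (but do not send) a POST request carrying the text as JSON."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_sentiment_request("I love this")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Once an API is actually running at the chosen address, passing the request to `urllib.request.urlopen` would send it and return the JSON response.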

  • Data Science
  • Sentiment Analysis
  • APIs
  • Docker
  • Flask
  • AWS
  • Cloud
  • Python
Wednesday, January 8, 2025 | 7 minutes Read
Podcast Sentiment and Topic Analysis

Intro

I am a bit of a podcast addict. One of the podcasts I listen to most, The Jordan Harbinger Show, has Feedback Friday every week. It’s effectively an agony aunt, except with two uncles. I always loved it, but over time I’ve felt it has become too negative - too many stories about addiction and abusive relationships. They’re interesting to discuss, but they’re a bit depressing. I wanted to test this hypothesis. Jordan is nice enough to provide all the transcripts on his website, so I thought I’d analyse the FBF episodes from the last year and see if they really are negative. While I had the data, I thought it would be interesting to extract common themes and topics too.

  • Data Science
  • Data Analysis
  • Natural Language Processing
  • Python
Thursday, January 2, 2025 | 11 minutes Read
Spotify Wrapped DIY

I was a bit disappointed with my Spotify Wrapped this year, and it seems I wasn’t the only one. The insights weren’t as interesting, the graphics weren’t as exciting, and some of it seemed irrelevant. Plus, of course, you only get one a year. So I thought I’d create my own. The code isn’t pretty; it’s something quick and dirty thrown together over a couple of hours. You can see it on my GitHub: https://github.com/jamesdeluk/data-projects/tree/main/spotify-wrapped-diy

  • Data Analysis
Wednesday, December 18, 2024 | 4 minutes Read
Disaster Tweets Natural Language Processing

Intro

I have a dataset of tweets, each labelled with whether or not it refers to a disaster. The goal is to build a model that takes a tweet and predicts whether it refers to a disaster. This could be useful during an actual disaster to ensure only the most relevant tweets are shown to emergency responders. The full code for this project can be found on my GitHub: https://github.com/jamesdeluk/data-projects/tree/main/nlp-with-disaster-tweets

Exploring and cleaning the data

I started by looking at the raw data in a text editor; as it was only a few hundred kilobytes, it was easy enough to do:

  • Data Science
  • Data Analysis
  • Natural Language Processing
  • Regression
  • Python
Wednesday, December 11, 2024 | 26 minutes Read
SQL Murder Mystery Solution and Walkthrough

I recently heard of SQL Murder Mystery (https://mystery.knightlab.com/), a website that uses gamification to learn/practise SQL skills. There’s been a murder, and by searching the SQL database you can find out whodunit. Seemed like a fun challenge. Here’s the prompt:

A crime has taken place and the detective needs your help. The detective gave you the crime scene report, but you somehow lost it. You vaguely remember that the crime was a murder that occurred sometime on Jan.15, 2018 and that it took place in SQL City. Start by retrieving the corresponding crime scene report from the police department’s database.
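A minimal sketch of that first step, run from Python against an in-memory SQLite database. The table name, the column names and the integer `YYYYMMDD` date format are assumptions modelled on the game's schema, and the rows are made-up placeholders rather than real game data, so treat this as an illustration of the filter rather than the actual solution.

```python
import sqlite3

# In-memory stand-in for the game's database; schema and rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE crime_scene_report "
    "(date INTEGER, type TEXT, description TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO crime_scene_report VALUES (?, ?, ?, ?)",
    [
        (20180115, "murder", "placeholder: the report we want", "SQL City"),
        (20180115, "theft", "placeholder: wrong crime type", "SQL City"),
        (20180115, "murder", "placeholder: wrong city", "Elsewhere"),
        (20180116, "murder", "placeholder: wrong date", "SQL City"),
    ],
)

# Filter on the three facts given in the prompt: date, crime type and city.
query = """
    SELECT date, type, description, city
    FROM crime_scene_report
    WHERE date = ? AND type = ? AND city = ?
"""
rows = conn.execute(query, (20180115, "murder", "SQL City")).fetchall()

for row in rows:
    print(row)
```

Passing the values as parameters rather than formatting them into the SQL string keeps the query reusable for other dates or cities.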

  • Data Science
  • Data Analysis
  • SQL
Thursday, December 5, 2024 | 5 minutes Read
Customer Analysis Part IV: Brand Analytics and Elasticity

This is part four of a multi-part series. Part one, segmentation and clustering, can be found here. Part two, classification, is here. Part three, purchase analytics, is here. As before, the code below is simply snippets. The full code for this section can be found in the repo: https://github.com/jamesdeluk/data-projects/blob/main/customer-analysis/ca4_brands.ipynb

Intro

At last we reach our final part in this series. Here we’re looking at brands, quantities, and elasticities.

Data

The first thing is to reimport the data, the same as in part three. I also created a NumPy array of prices that I’ll use later for testing and simulation. We know the prices in our dataset range from 1.10 to 2.80; for nice numbers, I picked 1.00 to 3.00, in 0.01 intervals:
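The author's exact call isn't shown in this excerpt, so the following is only one possible way to build such an array. The name `price_range` is hypothetical. Generating whole pence first and then dividing avoids the off-by-one element that a floating-point step in `np.arange` can sometimes produce.

```python
import numpy as np

# Prices from 1.00 to 3.00 inclusive in 0.01 steps.
# Build integer pence (100..300) first, then scale to pounds, so the
# number of elements does not depend on floating-point rounding.
price_range = np.arange(100, 301) / 100

print(price_range[:3], price_range[-1], price_range.size)
```

`np.linspace(1.00, 3.00, 201)` would be a reasonable alternative if you prefer to state the number of points explicitly.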

  • Data Science
  • Data Analysis
  • Purchase Analytics
  • Customer Analysis
  • Regression
  • Python
Sunday, December 1, 2024 | 20 minutes Read
Contact me:
  • james@gibbins.me
  • jamgib

Home page image source: 홍지우