
Chris Ballard

Data Science Consultant

Helping you make an impact with your unstructured data.
Natural Language Processing, Machine Learning and AI Strategy Consultancy
Photo of Chris Ballard speaking at Data Science Festival 2023

Who am I?

Principal Data Scientist and Data Science Consultant with 25 years' experience building applied Machine Learning and Natural Language Processing systems, shaping AI strategy, and leading Data Science and Machine Learning teams. I have implemented solutions using Large Language Models, Transformers, PyTorch and Python for many use cases, including classification, record linkage and semantic search. My technical and product background across diverse sectors gives me a unique perspective focused on business outcomes.

Using data to make a difference

I am passionate about using data to make a difference. Across all the sectors I have worked in (edtech, ecommerce, market research, climate tech and security), I have wanted to use my technical and business skills to help organisations turn the data they collect into meaningful impact.

Achievements

Get in touch

I'd love to hear from you. Please drop me an email, or fill in the form, and I'll get back to you.

If you have a project you'd like to discuss, please book a meeting with me. I'd be very happy to chat!

Testimonials

Chris is one of those people that can take a step back and think about the problem in a holistic manner and then propose a clear and understandable and technically feasible plan. If you need someone to tackle a difficult technical problem and then explain the solution to a non-technical person, Chris is who you want.

Daniel Linder, Senior Director of Data Science, Tessian

Chris delivered above and beyond expectations. His deep expertise of Natural Language Recognition enabled the development of a solution now globally deployed that allowed Nielsen to keep pace with the fast growing volume of data.

Deborah Fassi, Senior Vice President Customer Service Europe, Asia, MEA, NielsenIQ

Technically speaking, he is superb, able to understand complex details in a short timeframe and provide long-term solutions that can deliver a lot of benefits for any company or department. Strategically speaking, he is able to take one step back and understand the big picture and find a good tradeoff as a solution. As an individual, he cares about the people surrounding him and is always looking to help.

Antonio Hurtado, VP R&D, NielsenIQ

Few-shot email topic classification and malicious conversation starter detection

Using few-shot learning, I trained a multi-label machine learning classifier to classify emails into topics from the text of the email. As there was no existing labelled data, I used SetFit to fine-tune a Sentence Transformer using contrastive learning. The model was deployed in 8 weeks to an AWS SageMaker endpoint, which serves 50k requests per minute in production. The deployed model detected previously undetected malicious “conversation starter” phishing emails with a precision of 90%, and reduced the level of spam by ~20%. This allowed the company to demonstrate detection improvements to potential clients as part of sales PoVs.

20%

reduction in amount of spam

Machine learning classifier for malicious email detection using email metadata

Led the research and deployment of the first full end-to-end machine learning phishing email detection algorithm at Tessian. Trained a CatBoost machine learning classifier on features derived from email metadata, and developed an associated feature engineering pipeline in PySpark. Devised a method to estimate the precision of the system on production data, which reduced time to deployment. The model detected 20% more phishing emails than the existing system. This enabled the company to demonstrate to potential clients that the system was more effective than competitors' systems.

20%

increase in detected phishing emails

Climate policy document semantic search

Developed a prototype semantic search system using dense text passage embeddings, Elasticsearch and zero-shot classification. This allowed users to search complex government climate policy PDF documents to understand important climate change themes. Led the team which delivered a highly functional prototype with a Next.js front end on time, despite a stretching target of 8 weeks. It enabled the company to demonstrate their capability at the COP26 international climate conference and secure partnerships with potential funders and organisations.

8

weeks to develop prototype product and demonstrate at international conference

Machine learning product attribution

Responsible for the research and development of a machine learning system to automate the classification of attributes from product descriptions sourced from retailers. After demonstrating the potential of the system to senior leaders and securing funding for development, I delivered a scalable system using FastText which automated the classification of thousands of product attributes with high accuracy. It was estimated to deliver ~$2M in annual cost savings compared to manual data entry, which led to the system being deployed and integrated into the company’s global data entry system.

$2M

annual reduction in manual data entry costs

Retailer receipt matching

Built a text data linkage system to match product descriptions obtained from consumer receipts to descriptions received from retailers. The system needed to be delivered quickly to remove a significant roadblock which would have prevented a major new service being made available to clients. I developed a baseline model using multiple matching features coupled with a Logistic Regression model. As part of this, I employed cross-validation to overcome issues with domain shift when the model was applied to new retailers. The baseline model was developed in 4 weeks to demonstrate that the problem could be solved and to secure additional funding for development of the full solution.

4

weeks to develop prototype to resolve blocker for product launch

My Expertise

Natural Language Processing

Multiclass and multilabel text classification, named entity recognition, semantic search, topic modelling, document layout analysis, parsing unstructured documents

LLMs and Transformers

Fine-tuning transformers, sentence transformers, zero- and few-shot text classification

PyTorch and Hugging Face

Zero/few-shot text classification using SetFit, custom PyTorch Seq2Seq and ConvNet models, fine-tuning transformers for classification problems.

Python and SciPy

15 years' experience with Python, scikit-learn, NumPy and pandas. Development of ML pipelines using Luigi and Dagster. REST API endpoints using FastAPI.

Machine Learning

Experienced Machine Learning practitioner with an understanding of algorithms from the ground up. Reproducible research principles and application.

AWS SageMaker

SageMaker endpoints for model deployment, SageMaker notebooks/studio, labelling workflows.

Data and Databases

Querying and designing SQL and NoSQL storage, including SQL Server, PostgreSQL, MongoDB, Elasticsearch, PySpark, data lake storage in S3.

AI Strategy and Planning

I take a business-first perspective. AI project planning and implementation, stakeholder management, business data and AI strategy planning.

Technical Communication

I enjoy explaining complex technical projects to stakeholders, including clients and C-level management.

Team Leadership

I have led large data science teams involved in applied ML research. I love to foster collaborative and fun working environments.
