Starting from scratch
When I introduce myself as the Data Scientist of a credit card startup, people usually nod in approval, then proceed to remain rather confused.
Why does a startup care about Data Science? Why does a credit card care about Data Science? Also, what is Data Science anyway?
Even within Yonder, sometimes our new joiners might still remark: “James, I have no idea what you do, but it seems darn impressive.”
While I am humbled by the praise, this level of mysteriousness is truly unintended. Hopefully this blog will clear things up and show you what Data Science means for Yonder.
Alright, what is Data Science?
Many seem to think that Data Science is just plotting graphs and calculating averages. No. Boring. The great thing is, as Yonder’s first Data Scientist, I get to define it my way.
For me, Data Science is just statistics: It’s about seeing patterns in chaos, and using this insight to guide and augment Yonder’s business.
You see, at Yonder, aside from testing our founders (amusing as it is), we actually act out our company Vision and Mission: 1) to provide fair and conscious financial services, and 2) to make money rewarding and credit empowering.
The witchcraft of Behavioural Data Science.
This is the context of Data Science at Yonder: through seeing patterns in the chaos of human behaviours, we are able to make fairer and more conscious lending decisions, and personalise for our members to make their experiences more rewarding.
In other words, at Yonder, we practise the witchcraft of Behavioural Data Science.
Personally, this is a powerful combination of my degree training in Behavioural Sciences from Cambridge and my research training in mathematical statistics: I can now use actual Machine Learning to help people get a fairer credit line. How cool is that?
But with great power comes great responsibility. If my statistics training taught me anything, it’s to be extremely careful with two things: Chance, and Causality.
We must be careful about Chance. Say we saw that people applying on Fridays tend to be a certain type: is this observation real, or did it arise just by chance? This is why we run statistical tests to check how likely it is that our observations would arise by chance alone. And no, don’t worry if you applied on Friday, we saw no statistically significant patterns there.
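As a minimal sketch of what testing against chance can look like, here is a permutation test in plain Python. It is only an illustration, not necessarily the test used at Yonder: the function name `permutation_test`, the two groups, and every number in them are made up for the example. The idea is to shuffle the group labels many times and count how often a difference at least as large as the observed one appears purely by chance.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means.

    Returns an estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))

    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical scores for Friday applicants vs. everyone else (invented data).
friday = [52, 61, 47, 58, 66, 49, 55, 60]
other_days = [54, 59, 50, 63, 57, 48, 62, 53, 56, 51]

p_value = permutation_test(friday, other_days)
print(f"p-value: {p_value:.3f}")
```

A large p-value means the gap between the two groups is the kind of thing that shows up all the time when labels are assigned at random, so there is no evidence of a real Friday effect. A small one (conventionally below 0.05) would suggest the pattern deserves a closer look.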
We must also be careful about Causality. Between 2000 and 2009, cheese consumption in the US closely tracked the number of people who died from being tangled in their bedsheets. Does eating cheese make people tangled up in bedsheets? No, stupid. Correlation is not Causation. Even if the pattern is not down to chance, there might still be an unseen factor causing both cheese eating and bedsheet tangles. Maybe it’s the weather, who knows?
Check out spurious correlations for more examples of statistically significant (i.e., unlikely to have occurred by chance) correlations that are obviously not causal.
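To make the unseen-factor point concrete, here is a small simulation sketch. Everything in it is invented for illustration: a hidden `weather` variable drives both `cheese` and `bedsheets`, and the two never influence each other. They still end up clearly correlated, and the correlation all but disappears once the hidden factor is regressed out of both.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, zs):
    """What is left of ys after a least-squares straight-line fit on zs."""
    n = len(ys)
    mean_y, mean_z = sum(ys) / n, sum(zs) / n
    slope = (sum((z - mean_z) * (y - mean_y) for y, z in zip(ys, zs))
             / sum((z - mean_z) ** 2 for z in zs))
    return [(y - mean_y) - slope * (z - mean_z) for y, z in zip(ys, zs)]


rng = random.Random(42)
n_samples = 5000

# The hidden common cause (purely synthetic).
weather = [rng.gauss(0, 1) for _ in range(n_samples)]
# Both variables depend on the weather plus independent noise, not on each other.
cheese = [w + rng.gauss(0, 1) for w in weather]
bedsheets = [w + rng.gauss(0, 1) for w in weather]

raw = pearson(cheese, bedsheets)
adjusted = pearson(residuals(cheese, weather), residuals(bedsheets, weather))

print(f"raw correlation:              {raw:.2f}")
print(f"after removing hidden factor: {adjusted:.2f}")
```

In practice the hidden factor is rarely measured so conveniently, which is exactly why a significant correlation on its own should not be read as a cause.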
Now then, what Data Science magic do we get up to at Yonder? Since Data Science is about recognising patterns, there are 5 types of patterns that we care about:
- Tendency Patterns. Do members who go to cafes a lot also tend to go to pubs? And more generally, what are people’s habits?
- Temporal Patterns. What time of the day do people eat out or do shopping? What day of the week? Are there individual differences?
- Spatial Patterns. Where do Yonder members go for dining out? What about grocery shopping? Are there tube lines people love or hate?
- Textual Patterns. How are people talking about Yonder on social media? What sort of topics come up in our Member Support chats?
- Social Patterns. Do our members refer friends similar to themselves? Who refers more than others? Are there social clusters of super-users?
To extract these patterns, we mostly use methods like regression models, longitudinal conjugates, sentence transformers, and network analyses. First we might use these simple methods to visualise data, and if there are intuitive patterns, we’d then adopt fancier methods, sometimes even borrowing from physics - it’s all just maths in the end.
Once these patterns are captured, we then use some linear algebra to merge them into a big input matrix, match it with some outcome variables (e.g., credit risk, churn), then use them to train a Machine Learning model that can make predictions from unseen data.
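The merge-then-train step can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not Yonder's pipeline: the two feature blocks, the churn outcome, and the helper names `merge_blocks`, `train_logistic`, and `predict_proba` are all hypothetical, the data is synthetic, and a simple logistic regression stands in for whatever model is actually used.

```python
import math
import random


def merge_blocks(*blocks):
    """Join feature blocks column-wise into one input matrix (one row per member)."""
    return [[value for block_row in rows for value in block_row]
            for rows in zip(*blocks)]


def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(model, row):
    weights, bias = model
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def train_logistic(X, y, lr=0.5, epochs=500):
    """Fit logistic regression with plain batch gradient descent."""
    n, n_features = len(X), len(X[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, target in zip(X, y):
            error = predict_proba((weights, bias), row) - target
            for j, x_j in enumerate(row):
                grad_w[j] += error * x_j
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Synthetic stand-ins for two pattern types (invented, not real member data).
rng = random.Random(7)
n_members = 400
tendency = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_members)]
temporal = [[rng.gauss(0, 1)] for _ in range(n_members)]

X = merge_blocks(tendency, temporal)  # 400 rows x 3 columns
# Hypothetical outcome: churn depends on the features plus some noise.
y = [1 if 2 * a - b + c + rng.gauss(0, 0.5) > 0 else 0 for a, b, c in X]

# Train on the first 200 members, then predict for 200 unseen ones.
model = train_logistic(X[:200], y[:200])
predictions = [1 if predict_proba(model, row) > 0.5 else 0 for row in X[200:]]
accuracy = sum(p == t for p, t in zip(predictions, y[200:])) / 200
print(f"accuracy on unseen members: {accuracy:.2f}")
```

Holding back half of the members matters here: a model is only useful if it predicts well on data it has never seen, so accuracy is measured on the unseen half rather than on the rows used for training.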
One of the most powerful Machine Learning methods is the Deep Neural Network, where the model is trained (that is, the pattern in the input “x” is discovered) using hidden layers of “neurones” that adapt until they recognise the right patterns. This is loosely inspired by how our brains learn to recognise patterns as well.
By “we”, what I really meant was “I”. For now, all of this is just me. As you can possibly imagine, a big challenge that comes with this is coordinating with the wider team - sure, we can predict what a member wants for supper, but how do we build it into the product?
For this, we have been slowly building - rather, more like “growing” - a Standard Process for Data Science projects. Researchers might work alone, but Data Scientists mustn’t. Because a researcher’s work might never have any impact, but a Data Scientist’s work can directly impact thousands, if not millions and more.
Finding patterns in chaos
It’s honestly been great having input from everyone, learning automatic testing from our engineers and documentation from our product managers. That’s another great thing about Yonder: we own and shape our functions, and everyone can and will help.
To summarise, at Yonder, Data Science is to see patterns in chaos, and thereby predict the future. So, in a way, I am the Oracle of Yonder. Yes, that should be my new job title.
We use lots of fancy methods, but we always try to focus on our mission and vision, which are to provide fair and conscious financial services and to make money rewarding for our members.
It’s both terrifying and exciting that I am the one to start all this at Yonder. What would I like to build in the coming years? Definitely a track record of smooth model deployment, and definitely a top-notch team that produces consistent academic outputs.
Thus the prelude begins. Join us for the ride!