I'm

Rasmus Arpe Fogh Egebæk

Data Scientist, Machine Learning Engineer, Programming Enthusiast, Proud Father

About

I'm an ambitious programmer who specializes in machine learning and data analysis. My professional passion is to have a positive influence on the world through data-centric solutions - whether that means automating tasks, providing stakeholders with insights through machine learning, or simplifying and explaining complex data through visualizations and statistics. I enjoy a steep learning curve and I'm quick to get familiar with new business terminology and technologies.

I have professional experience with machine learning research, data pipelines, data visualization, development of machine learning models, preparing machine learning models for production, and machine learning operations (MLOps) from my time as Chief Technical Officer of Alvenir and its precursor project, DanSpeech.

I have strong software development skills from working in an IT consultancy firm, and I know my way around microservice and monolith architectures, databases, front-end development, cloud infrastructure, cloud services, and various APIs, frameworks, and programming languages.

I have experience working with agile methodologies such as Scrum and SAFe, and I find well-planned working procedures important for making efficient progress.

Skills

Highlighted skills

Python
Proficient

Python is my go-to language for machine learning and data analysis, and I'm very proficient at writing clean Python code.

Kubernetes
Proficient

I used Kubernetes (GKE and EKS) at Alvenir (and at DTU before we became Alvenir) to run our speech recognition platform. We managed our applications using Helm and Helmfile, and I set everything up: the clusters, custom Helm charts, integrations, and deployment scripts. During my time at Netcompany, I also worked with OpenShift.
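As a rough illustration of what a Helm + Helmfile setup like the one above looks like, here is a minimal helmfile.yaml sketch. The release names, chart paths, and values files are invented for illustration; this is not Alvenir's actual configuration.

```yaml
# Illustrative helmfile.yaml: one third-party chart and one custom chart.
repositories:
  - name: bitnami
    url: https://charts.bitnami.com/bitnami

releases:
  # Third-party chart pulled from a public repository
  - name: kafka
    namespace: streaming
    chart: bitnami/kafka
    version: 26.4.2

  # In-house application packaged as a local custom chart
  - name: asr-api
    namespace: speech
    chart: ./charts/asr-api
    values:
      - environments/prod/asr-api.yaml
```

With a file like this, `helmfile apply` reconciles all releases in one command, which is what makes Helmfile convenient for managing several Helm charts per environment.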

Machine Learning
Proficient

Since the second year of my studies, I have worked with machine learning. I have trained state-of-the-art ASR and NLP models and deployed them to production. I know how to optimize inference, e.g. by using ONNX or pre-built Docker images. I'm proficient in many machine learning frameworks and can quickly learn new ones.

Set of skills

I have worked with many different technologies and more Python frameworks than is reasonable to list. Here is a curated list of my skills.

  • All
  • Programming Languages
  • Frameworks
  • Infrastructure
  • Other

Python

Kafka

Pytorch

Tensorflow

Numpy

DevOps

MLOps

Pandas

ONNX

FastAPI

Django

(Huggingface) Transformers

Scikit-learn

Kubernetes

Seldon

Machine Learning

Data visualization

d3

Linux

Amazon Web Services (AWS)

Google Cloud Platform (GCP)

Docker

Helm

Helmfile

Terraform

Jenkins

Java

Kotlin

Groovy

Javascript

Typescript

Angular

Gradle

Ebean

Spring (Boot, Web, Cloud, Security)

Liquibase

Flyway

Hibernate

Bash

css/scss

html

Git

SQL

Timeline

Filter my timeline using the buttons below.

  • All
  • Work
  • Studies
  • Personal

Open Source Projects

I am a big fan of open-source technology! I think sharing and collaborating are very important for moving forward faster and for ensuring that essential high-quality machine learning models (e.g. speech recognition models) are available to everyone and not just a few companies. I sometimes contribute to open-source libraries in my spare time because I really enjoy the challenge of getting familiar with a completely new codebase. Below is a list of open-source contributions I have been a major part of.

  • All
  • Models
  • Packages
  • Datasets

punctfix

Punctuation restoration Python package for Danish, English, and German. The punctuation models are based on the BERT architecture.

danspeech

Python package for automatic speech recognition based on DeepSpeech 2 architecture.

Pretraining of Danish wav2vec2-large model

A Danish pre-training of the wav2vec2-large architecture using ~120,000 hours of speech data. Collaboration with Aarhus University.

Finetuning of Danish wav2vec2-large model

A finetuning with NST data (approximately 200 hours) and the Danish part of Common Voice 9. Collaboration with Aarhus University.

Pretraining of Danish wav2vec2-base model

A Danish pre-training of the wav2vec2-base architecture using ~1,300 hours of speech data.

Finetuning of Danish wav2vec2-base model

A finetuning of the wav2vec2-base model with NST data (approximately 200 hours).

Danish finetuning of BERT for punctuation restoration

A finetuning on a custom-filtered subset of the Danish part of mC4 that learns where to restore punctuation.

Alvenir ASR Da evaluation

An evaluation dataset for Danish speech recognition consisting of ~5 hours of speech.

Contact

If you want to get in touch because:

  • You want to collaborate on open-source technology
  • You need my assistance / skills for something

Then feel free to write me an email at rasmus.arpe[at]gmail.com or DM me on LinkedIn. :-)