Evaluating job postings requires considerable manual effort and domain knowledge, for example about the common frameworks of a programming language. A proof of concept was built to show how well programming languages and frameworks can be labelled automatically, in order to reduce the manual effort of evaluating job postings and to identify trends in the demand for particular frameworks and programming languages.
Activities
· Creation of static annotation rules for programming languages & frameworks. To obtain a list of candidate frameworks, a small Python script was written that crawls the Python Package Index (for Python frameworks) and the npm registry (for JavaScript frameworks) and translates the results into annotation rules.
· The data basis was initially annotated using these static annotation rules.
· Creation of a training set to train a NER (Named Entity Recognition) component with the NLP framework spaCy. As a starting point, a sample of the pre-annotated dataset was drawn and manually supplemented using the tool Label Studio.
· Training of the NER component with the NLP framework spaCy. The trained model was then deployed as a Python package and integrated into the existing application.
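The translation of crawled package names into annotation rules could look roughly like the sketch below. The hard-coded package lists stand in for the real crawl results from PyPI and the npm registry, and the label name FRAMEWORK is an assumption; the pattern format is the one spaCy's EntityRuler expects.

```python
# Minimal sketch: turn crawled package names into spaCy EntityRuler patterns.
# The lists stand in for real crawl results; "FRAMEWORK" is an assumed label.

python_frameworks = ["django", "flask", "fastapi"]
js_frameworks = ["react", "vue", "express"]

def to_patterns(names, label):
    """One token-level pattern per package name (case-insensitive match)."""
    return [{"label": label, "pattern": [{"LOWER": name.lower()}]}
            for name in names]

patterns = (to_patterns(python_frameworks, "FRAMEWORK")
            + to_patterns(js_frameworks, "FRAMEWORK"))

# The patterns can then be loaded into a spaCy pipeline via
#   ruler = nlp.add_pipe("entity_ruler"); ruler.add_patterns(patterns)
print(patterns[0])
```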
Main focus
- Programming in Python
In software projects there is often a risk that knowledge monopolies form. The goal of this project is to identify knowledge monopolies in the source code. For this purpose, meta-information on source code repositories is collected and processed with the software created in this project. An interactive visualisation was created from the processed data that allows the user to identify possible knowledge monopolies. The user navigates interactively through the project directory, where possible knowledge monopolies are highlighted in colour.
Development environment, tools, methods
Python, pyspark, HTML, JavaScript, D3.js, Git, Zeppelin notebook
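One plausible way to quantify a knowledge monopoly from repository meta-information is the share of a file's changes contributed by its most active author (a bus-factor proxy). The commit data and the metric below are illustrative assumptions, not the project's actual data model.

```python
from collections import Counter

# Hypothetical per-file commit authorship, as it could be extracted from
# `git log --name-only --pretty=%an`; not the project's real data model.
commits = [
    ("alice", "src/app.py"),
    ("alice", "src/app.py"),
    ("bob",   "src/app.py"),
    ("alice", "src/db.py"),
    ("alice", "src/db.py"),
]

def monopoly_score(commits, path):
    """Fraction of a file's commits made by its most active author.
    A score close to 1.0 indicates a possible knowledge monopoly."""
    authors = Counter(a for a, p in commits if p == path)
    return max(authors.values()) / sum(authors.values())

print(monopoly_score(commits, "src/app.py"))  # 2 of 3 commits by alice
print(monopoly_score(commits, "src/db.py"))   # all commits by alice
```

In a visualisation, files scoring near 1.0 would be the ones highlighted in colour.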
Description
The content of this project was the rebuild of an existing data warehouse, which was used to generate reports on business development.
The motivation for the rebuild was the high maintenance effort, the susceptibility to errors and the lack of flexibility of the old solution, as reports could only be generated once a day. The old solution was also unable to propagate manual interventions in the production database, which occur in day-to-day business (e.g. cancellations), to the data warehouse.
The newly built data warehouse enables the individual departments to generate reports with Microsoft Power BI in order to monitor business development or make strategic decisions. By implementing a history log, reports can be generated for every data status in the past.
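The point-in-time reporting enabled by the history log can be illustrated with a small sketch: replaying the logged operations up to a cut-off timestamp reconstructs the table state at that moment. The log format and field names here are assumptions for illustration; in the real solution the log is written by database triggers.

```python
# Each entry: (timestamp, operation, row id, payload). Assumed format;
# the real log is recorded by database triggers in the production DB.
history = [
    (1, "INSERT", 10, {"status": "open"}),
    (2, "INSERT", 11, {"status": "open"}),
    (3, "UPDATE", 10, {"status": "shipped"}),
    (4, "DELETE", 11, None),  # e.g. a manual cancellation
]

def snapshot(history, as_of):
    """Replay the history log up to `as_of` to get the table state then."""
    state = {}
    for ts, op, row_id, payload in history:
        if ts > as_of:
            break
        if op == "DELETE":
            state.pop(row_id, None)
        else:  # INSERT and UPDATE both set the row's current payload
            state[row_id] = payload
    return state

print(snapshot(history, as_of=2))  # both orders still open
print(snapshot(history, as_of=4))  # order 11 cancelled, order 10 shipped
```

An ETL pipeline can apply the same replay logic to materialise a reporting data model for any past date.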
Activities
- Maintenance and improvement of the existing Python scripts for generating the daily business reports. Among other things, the time required to generate the daily business report (Excel) was reduced from over 2 hours to about 1 minute.
- Building the new data warehouse based on a self-developed history log and Apache Airflow. The history log is a solution based on database triggers that records all operations in the production database. The ETL pipeline built in Apache Airflow transforms this history log into a data model from which reports can be created using Microsoft Power BI.
- Building reports using Microsoft Power BI. In addition to the migration of the existing business reports, further visualisations were built for exploration, e.g. a map representation of the sales activities maintained in the CRM, which can be filtered by sales area/activity type.
- Support with the introduction of Power BI Cloud in the company. After a trial phase with Power BI Desktop, it was decided to switch to Power BI Cloud. For this, an operational concept was created and Power BI Gateway was installed and set up so that the on-premise database could be accessed for report creation. Training materials were also created and the users were subsequently trained.
- Migration of the reports to Power BI Cloud. The reports created in the trial phase were migrated to Power BI Cloud. The different data models were merged and additional security rules were implemented so that users can only access the data intended for them.
Main focus
- Programming in Python
- Exploratory data analysis
- Building reports using Microsoft Power BI
- Creation of new & migration of existing reports in Power BI Cloud
- Conducting training sessions in Power BI Cloud
- Support with the introduction and administration of Power BI Cloud
Development environment, tools, methods
Python, git, Apache Airflow, Docker, Microsoft Power BI
Industry
B2B platform for the brokerage of vehicle financing
The content of this project is the improvement of an online marketplace for freight transports through machine learning. One component of the marketplace is a web interface through which freight forwarders can post offers for freight transport. The marketplace is to be improved in the future through the use of machine learning, for example to support customers in pricing or to identify faulty offers. By creating a prototype, the first step is to evaluate how well the prices of the freight transport offers can be predicted with the help of machine learning.
Development environment, tools, methods
Python, pandas, sklearn, zeppelin notebook, git
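A first baseline for the price prediction could be a simple linear fit of price on transport distance. The sample data below is made up for illustration, and the actual prototype used pandas and scikit-learn with more features; this is only a stdlib sketch of the baseline idea.

```python
# Made-up sample data: (distance in km, offered price in EUR).
# The actual prototype used pandas and sklearn with more features.
distances = [120, 250, 430, 610, 820]
prices = [180, 320, 510, 700, 930]

def ols_fit(xs, ys):
    """Ordinary least squares for a single feature: y ~ slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

slope, intercept = ols_fit(distances, prices)

def predict(distance_km):
    return slope * distance_km + intercept

print(round(predict(500)))  # estimated price for a 500 km transport
```

Comparing such a trivial baseline against the sklearn models helps judge whether the machine learning approach adds real predictive value.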