Natalie Vollert

Data Science Consultant & Project Manager, Applied Statistics, Klagenfurt & Vienna, Austria

Natalie is a Data Science Consultant at TÜV Austria Data Intelligence located in Vienna. She is responsible for supporting companies in their digitization processes by developing custom data-driven solutions based on statistical methods. Her work is focused on gaining maximum value from data, starting from data engineering and feature extraction as a basis for machine learning algorithms, up to providing BI tools and software solutions to clients.

After completing her master’s degree in Technical Mathematics at the Alpen-Adria University of Klagenfurt in 2015, she started a research career at Carinthian Tech Research (CTR). Her main focus was her Ph.D. work on modeling computer simulation output with Gaussian process surrogates. In 2018 she moved into the field of data science by joining Applied Statistics. She completed her doctoral studies in 2020.

 

Technical Vision Talk: “Natural Language Processing for the classification of documents in an industrial environment”

Most companies, especially production facilities, need to handle thousands to millions of documents efficiently, including operating instructions, technical drawings, licensing documentation and more. Even when data warehouse concepts are already in place to store this huge amount of data centrally, finding specific information of interest can still be cumbersome. For such big data problems, a full-text search is no longer practical. Sorting documents into predefined groups is therefore of particular interest, because it makes the required information accessible. However, performing this classification manually is very expensive and time-consuming, so the benefit of replacing it with an AI tool is clear.

The basic idea is to retrieve information directly from the document texts, a task that belongs to the field of natural language processing. The general workflow can be summarized as follows: 1. preprocess the raw texts to generate a vocabulary; 2. generate features based on word counts or term frequency–inverse document frequency (TF-IDF) transformations; 3. use these features as input variables for machine learning algorithms. This approach will be demonstrated in more detail using an example in which machine-readable documents of different types need to be classified into approximately 200 groups describing the document content. A linear support vector machine was trained on 2.3 million documents and evaluated on 800,000 separate test documents, achieving an accuracy of 90%. For this use case, the successful completion of the project corresponded to savings of €150k per year in capital expenditures for the client.
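
The three workflow steps above can be illustrated with a small sketch. The abstract does not say which tools or solver the project used, so everything below is an assumption made for illustration: the six training texts, the three labels and all function names are invented, and the classifier is a one-vs-rest linear SVM trained with Pegasos-style subgradient steps on the hinge loss, written with the Python standard library only. At the scale described in the talk, an established implementation (for example scikit-learn's `TfidfVectorizer` and `LinearSVC`) would be the more realistic choice.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Step 1: lowercase the raw text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def fit_idf(docs):
    """Step 2a: build the vocabulary and smoothed IDF weights from training texts."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def tfidf_vector(doc, idf):
    """Step 2b: turn one text into a sparse, L2-normalised TF-IDF vector."""
    counts = Counter(tok for tok in tokenize(doc) if tok in idf)
    vec = {term: count * idf[term] for term, count in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {term: v / norm for term, v in vec.items()}


def dot(w, x):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(term, 0.0) * v for term, v in x.items())


def train_linear_svm(vectors, labels, epochs=50, lam=0.01):
    """Step 3: one-vs-rest linear SVM, hinge loss, Pegasos-style subgradient steps."""
    weights = {}
    for cls in sorted(set(labels)):
        w, t = {}, 0
        for _ in range(epochs):
            for x, label in zip(vectors, labels):
                t += 1
                y = 1.0 if label == cls else -1.0
                violated = y * dot(w, x) < 1.0  # margin check uses the old weights
                for term in w:                  # L2 shrinkage: w <- (1 - 1/t) * w
                    w[term] *= 1.0 - 1.0 / t
                if violated:                    # step size eta = 1 / (lam * t)
                    for term, v in x.items():
                        w[term] = w.get(term, 0.0) + y * v / (lam * t)
        weights[cls] = w
    return weights


def predict(doc, idf, weights):
    """Assign the class whose linear score is highest."""
    x = tfidf_vector(doc, idf)
    scores = {cls: dot(w, x) for cls, w in weights.items()}
    return max(scores, key=scores.get)


# Invented toy data: three document groups instead of the roughly 200 in the talk.
train = [
    ("operating instructions for starting the pump", "manual"),
    ("operating instructions for the compressor shutdown", "manual"),
    ("technical drawing of the pump housing with dimensions", "drawing"),
    ("technical drawing of the valve flange with dimensions", "drawing"),
    ("license certificate for the software", "license"),
    ("license certificate renewal for the plant", "license"),
]
test = [
    ("updated drawing with dimensions", "drawing"),
    ("new operating instructions", "manual"),
    ("expired license certificate", "license"),
]

train_texts = [text for text, _ in train]
idf = fit_idf(train_texts)
weights = train_linear_svm([tfidf_vector(text, idf) for text in train_texts],
                           [label for _, label in train])

# Evaluate on held-out texts, mirroring the separate test set in the abstract.
predictions = [predict(text, idf, weights) for text, _ in test]
accuracy = sum(p == label for p, (_, label) in zip(predictions, test)) / len(test)
print(predictions, accuracy)
```

Words that never occurred in the training texts are simply dropped when a new document is vectorised, which is why the vocabulary built in step 1 determines what the classifier can see. The sparse dictionaries stand in for the sparse matrices that a production pipeline would use for millions of documents.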

__________________

Fri. Oct 1 | 9:30 am – Technical Vision Talk: “Natural Language Processing for the classification of documents in an industrial environment”

 

WiDS Villach is an independent event organized by Olivia Pfeiler and Anita Kloss-Brandstätter in cooperation with AI Carinthia as part of the annual WiDS Worldwide conference organized by Stanford University and an estimated 200+ locations worldwide, which features outstanding women doing outstanding work in the field of data science. All genders are invited to attend all WiDS Worldwide conference events.

Join us in the heart of the Alps-Adriatic-region at Carinthia University of Applied Sciences!

office@widsvillach.org

Europastraße 4, 9524 Villach, Austria
