Training machine learning methods for handwriting recognition requires text data with corresponding images. The text data are often available in TEI format, which offers many ways to mark up textual and semantic phenomena; projects can even introduce their own tags or markup styles. This paper describes a parameterizable tool, developed in the EU project READ, that can handle different TEI markup styles and produces page-level text files. These files can be used to align text with image data (text-to-image) and thus serve to prepare training data for handwriting recognition models. All examples and applications shown come from projects that made their data available to READ.
BibTex | DOI: 10.18420/infdh2018-11
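The page-wise splitting described in the abstract above can be illustrated roughly as follows. This is only a minimal sketch and not the READ tool: the function name teiToPages is hypothetical, and it assumes that page breaks are marked with <pb/> milestones, line breaks with <lb/>, and that all other markup can simply be dropped. Real TEI projects differ in their markup style, which is exactly why the tool in the paper is parameterizable.

```javascript
// Hypothetical helper, not the READ tool: turns a TEI string into one
// plain-text string per page.
// Assumptions: pages are separated by <pb/> milestones, <lb/> marks a
// line break, and every other tag can be discarded.
function teiToPages(tei) {
    // Restrict the input to the transcription body if there is one.
    const bodyMatch = tei.match(/<body[^>]*>([\s\S]*?)<\/body>/);
    const body = bodyMatch ? bodyMatch[1] : tei;

    return body
        .split(/<pb\b[^>]*\/?>/)                 // one chunk per page break
        .map((chunk) => chunk
            .replace(/<lb\b[^>]*\/?>/g, "\n")    // line breaks become newlines
            .replace(/<[^>]+>/g, "")             // drop all remaining markup
            .split("\n")
            .map((line) => line.trim())
            .filter((line) => line.length > 0)
            .join("\n"))
        // Empty chunks (e.g. text before the first <pb/>) are dropped.
        // Note that this also drops blank pages, so page numbers would
        // have to be tracked separately to match the images.
        .filter((page) => page.length > 0);
}
```

Each returned string could then be written to its own text file and paired with the image of the corresponding page.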
I updated the API URLs to be more in line with conventional standards. All current API endpoints can now be found under /api/v1/. For reference, the current endpoints are:
In addition, I have rewritten the OAI-PMH Django app and moved it into a separate Git repository. You can find it on GitHub.
In this thesis, different architectures of Convolutional Neural Networks (CNNs) and their suitability for object recognition were investigated, using Egyptian hieroglyphs as an example.
First, the basic principles of artificial neural networks and the components of CNNs, such as convolutional layers, are introduced and explained, followed by a description of the data sets used and the difficulties associated with them. We then present libraries for the concrete implementation and use of artificial neural networks and describe the steps of the object recognition evaluation.
The CNNs were trained and evaluated with different numbers of classes and the associated number of images. The experiments are grouped by the three training methods used. For the first, the CNNs were trained with the help of autoencoders. For the second, the CNNs were trained block-wise, and for the third, deeper network architectures were investigated. Various network architectures described in the literature, such as Residual Networks (ResNet) and Densely Connected Convolutional Networks, were implemented and evaluated.
The results of the experiments show that it is possible to train CNNs with up to 69 convolutional layers, to classify about 6500 different Egyptian hieroglyphs, and finally to carry out object recognition with very good results. The best object detection result, 0.92362, was achieved with a CNN with 6465 classes and 13 convolutional layers.
BibTex | PDF
In this thesis, different machine learning algorithms are evaluated for the task of multi-label classification. The evaluation is done with the binary classifiers naive Bayes and support vector machine (SVM) and the multi-class classifier supervised latent Dirichlet allocation (SLDA). To enable naive Bayes and SVM to do multi-label classification, the RAkEL transformation is used, and for SLDA a topic model multi-label learner is developed and used.
The Reuters-21578 corpus is used. Since not all texts have labels and not all labels occur with sufficient frequency, only a selection of the texts was used. Two corpora were created from it and used for classification.
The classification results show that the best results are achieved with SVM. Naive Bayes and SLDA give very similar results, but SLDA has a very long runtime.
BibTex | PDF
I recently read an interesting post about the target="_blank" vulnerability. This vulnerability leaves a user open to a very simple phishing attack and is not widely known. When a link uses the target="_blank" attribute without an accompanying rel="noopener" attribute (or, in the case of Firefox, rel="noopener noreferrer"), the opening site gives the new site access to the existing window through the window.opener API, which grants it a few permissions. Some of these permissions are automatically negated by cross-domain restrictions, but window.location is fair game.
To see this vulnerability in action, you can use this link. It will open the post in a new tab/window and redirect this window to another page.
The code below shows what the newly opened page needs in order to redirect the opening site to a new location through the window.opener API.
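// Runs on the newly opened page. window.opener is only set if the page
// was opened from a link or script that did not use rel="noopener".
if (window.opener) {
    window.opener.location = "https://jnphilipp.org/pages/page/gone-phishing/?referrer=" + encodeURIComponent(document.referrer);
}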
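The redirect above only works when the link lacks the rel attribute described earlier. As a minimal sketch of a defensive measure, the helper below adds both tokens to links that open a new tab or window. The function name hardenRel is hypothetical and not part of this site's code; the browser part assumes it runs after the links exist in the DOM.

```javascript
// Hypothetical helper: returns a rel value that contains both
// "noopener" and "noreferrer", keeping any tokens already present.
function hardenRel(rel) {
    const tokens = new Set((rel || "").split(/\s+/).filter(Boolean));
    tokens.add("noopener");
    tokens.add("noreferrer");
    return Array.from(tokens).join(" ");
}

// In a browser, apply it to every link that opens a new tab or window.
// The guard keeps the snippet harmless outside a browser (e.g. in Node).
if (typeof document !== "undefined") {
    for (const link of document.querySelectorAll('a[target="_blank"]')) {
        link.rel = hardenRel(link.getAttribute("rel"));
    }
}
```

With these tokens set, window.opener is null on the opened page, so the check in the redirect snippet fails and the opening window stays where it is.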
Because of that post, I removed all target="_blank" attributes from the links. I also had a few other changes that had piled up and that I hadn't gotten around to putting online. Most are on the back end. On the front end, I mainly changed the color of the sidebar.