Masterthesis: Objekterkennung mit Hilfe von Convolutional Neural Networks am Beispiel ägyptischer Hieroglyphen

Monday, 27. February 2017 in publications :: #thesis
en | de

In this thesis different architectures of Convolutional Neural Networks (CNN) and their suitability for object recognition were investigated by using the example of Egyptian hieroglyphs.

First, basic principles for artificial neural networks and the components of CNNs, such as convolutional layers, are introduced and explained, followed by explanations of the used data sets and the associated difficulties. We then present libraries for the concrete implementation and use of artificial neural networks and describe the sequence of the evaluation of object recognition.

The CNNs were trained and evaluated with different numbers of classes and the associated number of images. The experiments are divided by the three used training methods. For the first, the CNNs were trained with the help of autoencoders. For the second, the CNNs were trained block-wise, and for the third deeper network architectures were investigated. Various network architectures such as Residual Networks (ResNet) and Densely Connected Convolutional Networks, described in the literature, were implemented and evaluated.

The results of the experiments show that it is possible to train CNNs with up to 69 convolutional layers, to classify about 6500 different Egyptian hieroglyphs and finally to carry out an object recognition with very good results. The best results for object detection 0.92362 were achieved for a CNN with 6465 classes and 13 convolutional layers.

BibTex | PDF

Bachelorthesis: Multi-Label Klassifikation am Beispiel sozialwissenschaftlicher Texte

Saturday, 04. February 2017 in publications :: #thesis
en | de

In this thesis different machine learning algorithms are evaluated for the task of multi-label classification. The evaluation is done with the binary classifiers naive Bayes and support vector machine (SVM) and the multi-class classifier supervised latent Dirichlet allocation (SLDA). To enable naive Bayes and SVM to do multi-label classification the RAkEL transformation is used and for SLDA a topic model multi-label learner is developed and used.

The Reuters-21578 corpus is used. Since not all texts have labels and not all labels occur in sufficient frequency a selection of texts was used. Two corpora were created and used for classification.

The classification results show that the best results are archived with SVM. Naive Bayes and SLDA give very similar results, but SLDA has a very long runtime.

BibTex | PDF

Vulnerabilities and other stuff

Tuesday, 06. September 2016 in misc :: #coding, #stuff

I recently read an interresting post about the target="_blank" vulnerability. This vulnerability leaves a user open to a very simple phishing attack and is quite unknown. When a link uses the target="_blank" attribute not accompanied with the rel="noopener" attribute or in the case of Firefox rel="noopener noreferrer" the opening site gives the new site access to the existing window through the window.opener API, allowing a few permissions. Some of these permissions are automatically negated by cross-domain restrictions, but window.location is fair game.

To see this vulnerability in action you can use this link. It'll open the post in a new tab/window and redirect this window to an other page.

The code below shows the necessary code for the window.opener API to redirect the opening site to a new location.

:::javascript
if ( window.opener ) {
    window.opener.location = "https://jnphilipp.org/pages/page/gone-phishing/?referrer=" + document.referrer;
}

Because of that post, I removed all target="_blank" attributes from the links. I had also a few other changes that had pilled up and which I hadn't gotten around to put online. Most are on the back end side. On front end side I changed manly the color of the sidebar.

TIMA: EOL

Monday, 03. October 2016 in projects :: #coding, #science, #tima

I must sadly announce the end of life for TIMA. Or at least the end of the TIMA website at https://tima.jnphilipp.org. This is due to the practically non existent traffic and my inability to maintain the site. The EOL will be at the end of the month, the 30th of September 2016. I will upload a database dump with the associations to this post after the shutdown.

Update: So the EOL of the TIMA website is reached. As promised a dump of the associations can be downloaded here as a JSON-file. For each word the language, count, identifier and associations are given, here the count indicates how often the word was answered. An association has the same informations, but here the count indicates how often the association was given to the word.

New features

Wednesday, 09. March 2016 in misc :: #api, #coding

Over the last few weeks I added a few new features. The most extensive feature I added is the API. The API consists of two parts, the first is to retrieve the posts and projects as JSON. The other is an OAI-PMH endpoint, which returns XML. At the moment I only support the metadata in the Dublin Core format, but I plan to add CMDI. For details on the API I added a page to the project section. The second feature I added was inspired by this post about signing web content using PGP. I added signatures to the posts and projects which can be view in the source code and verified using my public key or with Keybase. On a side note, I got new certificates from Let’s Encrypt and forcing HTTPS now.