Viblo is a platform for technical knowledge sharing
Natural language processing
Are you curious about our services?
This is how we did it.
We use a graph-based n-gram technique, together with other methods for handling Unicode characters, to detect the language a post is written in. Knowing the language helps Viblo users find posts that suit their needs. The language detector also helps the Elasticsearch engine return more accurate results.
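To make the idea concrete, here is a minimal sketch of n-gram language detection. It is not the graph-based method mentioned above: it swaps in a plain character-trigram profile compared by cosine similarity, plus a Unicode range check for Japanese kana. The sample sentences, the language codes, and every function name (`char_ngrams`, `has_kana`, `detect_language`) are assumptions made for illustration.

```python
import unicodedata
from collections import Counter


def char_ngrams(text, n=3):
    """Count character n-grams after Unicode NFC normalization and lowercasing."""
    # NFC makes composed and decomposed forms of accented letters compare equal.
    normalized = unicodedata.normalize("NFC", text).lower()
    padded = f" {normalized} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sum(v * v for v in a.values()) ** 0.5
    norm_b = sum(v * v for v in b.values()) ** 0.5
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def has_kana(text):
    """True if the text contains Hiragana or Katakana (U+3040 to U+30FF)."""
    return any("\u3040" <= ch <= "\u30ff" for ch in text)


# Tiny hypothetical training samples; a real profile would need far more text.
SAMPLES = {
    "en": "the quick brown fox jumps over the lazy dog and this is a post about programming",
    "vi": "đây là một bài viết về lập trình và chia sẻ kiến thức công nghệ cho mọi người",
}
PROFILES = {lang: char_ngrams(sample) for lang, sample in SAMPLES.items()}


def detect_language(text):
    """Return the best-matching language code for the given text."""
    if has_kana(text):
        return "ja"
    grams = char_ngrams(text)
    return max(PROFILES, key=lambda lang: cosine(grams, PROFILES[lang]))
```

The detected code could then be stored with each post and used to filter search results or to pick a language-specific analyzer; how Viblo actually wires this into Elasticsearch is not described here.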
Viblo currently hosts many technical posts (over 20K posts on about 100 different subjects), and the number is growing rapidly. As a result, users often face information overload and have difficulty finding posts that suit their needs. We have therefore built a recommendation system that suggests relevant technical posts to users based on their interests.
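One simple way to recommend posts by interest is content-based filtering over subject tags. The sketch below is only an assumption about how such a system might work, not a description of Viblo's recommender: it builds an interest profile from the tags of posts a user has read and ranks candidate posts by cosine similarity to that profile. The function name `recommend`, the tag data, and the post identifiers are all hypothetical.

```python
import math
from collections import Counter


def recommend(history_tags, candidates, top_k=2):
    """Rank candidate posts by similarity between their tags and the user's interests.

    history_tags: list of tag lists, one per post the user has read.
    candidates:   dict mapping post id to that post's list of tags.
    """
    # Interest profile: how often each tag appears in the reading history.
    interest = Counter(tag for tags in history_tags for tag in tags)
    user_norm = math.sqrt(sum(v * v for v in interest.values()))

    def score(tags):
        unique = set(tags)
        if not unique or not user_norm:
            return 0.0
        dot = sum(interest[tag] for tag in unique)
        return dot / (user_norm * math.sqrt(len(unique)))

    ranked = sorted(candidates, key=lambda post: score(candidates[post]), reverse=True)
    return ranked[:top_k]


history = [["python", "nlp"], ["python", "docker"]]
posts = {
    "post-a": ["python", "nlp"],
    "post-b": ["php", "laravel"],
    "post-c": ["docker"],
}
print(recommend(history, posts))
```

A production system would likely combine this kind of signal with others, such as votes or reading time, but those details are outside what the text states.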
With a huge number of published technical posts, we have to cope with the problem of plagiarism. We do not want the content of a published post to be a copy-and-paste job. For that reason, we aim to build a system capable of detecting plagiarism automatically, so that the integrity of a post can be easily verified.
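A common baseline for detecting copied text is to compare overlapping word sequences (shingles) between two documents. The sketch below illustrates that baseline under stated assumptions; it is not the system described above. The shingle size of 3, the 0.5 threshold, and the names `shingles`, `jaccard`, and `looks_copied` are all chosen for illustration.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in the text, ignoring case and punctuation."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def looks_copied(candidate, published, threshold=0.5):
    """Flag a candidate post whose shingle overlap with a published post is high."""
    return jaccard(shingles(candidate), shingles(published)) >= threshold
```

Comparing every new post against every published one does not scale well, so a larger system would typically add an indexing step such as MinHash before this comparison; that choice is left open here.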