Distributed Systems and Algorithms
Date:
Scalable all-pairs similarity search in metric spaces on a distributed system;
Building a language model on the Sogou dataset.
Main Work
- Building a Rocks cluster on 20 bare-metal machines and configuring Hadoop and Spark on it.
- Reproducing the paper “Scalable All-Pairs Similarity Search in Metric Spaces” and using hierarchical clustering to compress the results.
- Building a language model on the Sogou dataset and applying a smoothing method to improve it.
Technology
- Language: Java
- Rocks Cluster
- Hadoop and Spark