Showing 1 changed file with 15 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,2 +1,16 @@ | ||
# Sequence-Clustering | ||
Cluster sequences into inliers and outliers and generate a novel prototypical sequence for each cluster. | ||
Cluster sequences into inliers/outliers and generate a novel prototypical sequence for each cluster. | ||
|
||
## Description | ||
Consider the following scenario: a process generates a set of sequences, each encoded as a string of characters. There is an undisclosed number of distinct processes, so it should be possible to group the sequences into clusters of similar sequences. In addition, some sequences have been generated by another, unrelated process and form outliers. Each instance is either an inlier or an outlier. | ||
|
||
## Tasks | ||
1. Cluster the inliers into an appropriate number of groups. | ||
2. Generate a novel prototypical sequence for each cluster, i.e. the sequence most representative of that cluster. The prototypical sequence must be novel: it must not be one of the provided sequences. | ||
|
||
## Data | ||
A text file, `test.txt`, is provided which contains a random mixture of inlier and outlier sequences in no particular order. Each row of this file contains an integer identifier for the sequence and the sequence itself. | ||
|
||
## Outputs | ||
1. Print each sequence's identifier together with the ID of the cluster it belongs to, or a note that it is an outlier. | ||
2. Print one novel prototypical sequence for each cluster you have found. |
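
The tasks and outputs described in the README could be approached along the following lines. This is a minimal sketch, not the repository's method, and it rests on several assumptions the README does not confirm: rows are whitespace-separated `id sequence` pairs, similarity is measured by Levenshtein edit distance, clusters are connected components of a graph linking sequences within `eps` edits, components smaller than `min_size` are treated as outliers, and the prototype is the best single-edit variant of the cluster medoid that does not appear in the data. The names `parse_rows`, `cluster`, `prototype`, `eps`, and `min_size` are hypothetical, and the sample rows are made up for illustration.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def parse_rows(lines):
    """Parse 'id sequence' rows (whitespace separation is an assumption)."""
    data = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            data[int(parts[0])] = parts[1]
    return data


def cluster(data, eps=2, min_size=2):
    """Map each id to a cluster label, or -1 for outliers."""
    ids = sorted(data)
    neighbours = {i: set() for i in ids}
    for i, j in combinations(ids, 2):
        if edit_distance(data[i], data[j]) <= eps:
            neighbours[i].add(j)
            neighbours[j].add(i)
    labels, seen, next_label = {}, set(), 0
    for start in ids:
        if start in seen:
            continue
        # Collect the connected component containing `start`.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nb in neighbours[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        is_cluster = len(component) >= min_size
        for node in component:
            labels[node] = next_label if is_cluster else -1
        next_label += is_cluster
    return labels


def prototype(members, all_sequences):
    """Best single-edit variant of the medoid that is not in the data."""
    def cost(s):
        return sum(edit_distance(s, m) for m in members)

    medoid = min(members, key=cost)
    alphabet = sorted(set("".join(members)))
    candidates = set()
    for k in range(len(medoid) + 1):
        for ch in alphabet:
            candidates.add(medoid[:k] + ch + medoid[k:])          # insertion
            if k < len(medoid):
                candidates.add(medoid[:k] + ch + medoid[k + 1:])  # substitution
        if k < len(medoid):
            candidates.add(medoid[:k] + medoid[k + 1:])           # deletion
    candidates -= set(all_sequences)   # enforce novelty
    candidates.discard("")
    # Raises ValueError if every candidate is already in the data.
    return min(candidates, key=lambda s: (cost(s), s))


# Made-up rows standing in for the contents of test.txt.
rows = ["1 AAAA", "2 AAAT", "3 AATA", "4 CCCC", "5 CCCG", "6 GTGTGTGT"]
data = parse_rows(rows)
labels = cluster(data)
for seq_id in sorted(labels):
    print(seq_id, "outlier" if labels[seq_id] == -1 else labels[seq_id])

protos = {}
for label in sorted(set(labels.values()) - {-1}):
    members = [data[i] for i in sorted(labels) if labels[i] == label]
    protos[label] = prototype(members, data.values())
    print("cluster", label, "prototype", protos[label])
```

The pairwise distance computation is quadratic in the number of sequences, so a larger input would likely need a cheaper similarity measure or an indexed neighbour search. Suitable values for `eps` and `min_size` depend on the data and would have to be tuned.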