Update README.md
ushashwat authored Apr 12, 2024
1 parent df2e439 commit 8066273
Showing 1 changed file with 15 additions and 1 deletion.
16 changes: 15 additions & 1 deletion README.md
@@ -1,2 +1,16 @@
# Sequence-Clustering
Cluster sequences into inliers and outliers and generate a novel prototypical sequence for each cluster.
Cluster sequences into inliers/outliers and generate a novel prototypical sequence for each cluster.

## Description
Consider the following scenario: a process generates a set of sequences, each encoded as a string of characters. There is an undisclosed number of distinct processes, so it should be possible to group the sequences into clusters of similar sequences. In addition, some sequences have been generated by another, unrelated process and form outliers. Each instance is either an inlier or an outlier.

## Tasks
1. Cluster the inliers into an appropriate number of groups.
2. Generate a novel prototypical sequence for each cluster, i.e. the sequence that is most representative of that cluster. Note that the prototypical sequence must be novel: it must not be one of the provided sequences.
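
The two tasks above can be sketched as follows. This is only one possible approach and not necessarily the method used in this repository: it assumes Levenshtein edit distance as the similarity measure, a greedy "leader" clustering with a distance threshold, and a prototype obtained by perturbing the cluster medoid with a single substitution. The names `edit_distance`, `cluster`, `novel_prototype`, the parameters `max_dist` and `min_size`, and the `-1` outlier label are all hypothetical.

```python
from typing import Dict, List, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using a two-row dynamic programme."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cluster(seqs: Dict[int, str], max_dist: int, min_size: int) -> Dict[int, int]:
    """Greedy leader clustering (assumed approach).

    A sequence joins the first cluster whose leader is within max_dist,
    otherwise it starts a new cluster. Clusters smaller than min_size are
    treated as outliers and labelled -1 (an illustrative convention).
    """
    leaders: List[str] = []
    members: List[List[int]] = []
    for ident, seq in seqs.items():
        for k, leader in enumerate(leaders):
            if edit_distance(seq, leader) <= max_dist:
                members[k].append(ident)
                break
        else:
            leaders.append(seq)
            members.append([ident])

    labels: Dict[int, int] = {}
    next_label = 0
    for group in members:
        if len(group) >= min_size:
            for ident in group:
                labels[ident] = next_label
            next_label += 1
        else:
            for ident in group:
                labels[ident] = -1
    return labels


def novel_prototype(cluster_seqs: List[str], provided: Set[str]) -> str:
    """Return the best single-substitution variant of the medoid that is
    not among the provided sequences."""
    def cost(s: str) -> int:
        return sum(edit_distance(s, t) for t in cluster_seqs)

    medoid = min(cluster_seqs, key=cost)
    alphabet = sorted(set("".join(cluster_seqs)))
    candidates = {
        medoid[:i] + c + medoid[i + 1:]
        for i in range(len(medoid))
        for c in alphabet
    } - provided
    if not candidates:
        raise ValueError("no unseen single-substitution variant of the medoid")
    # Sort first so that ties are broken deterministically.
    return min(sorted(candidates), key=cost)
```

The threshold `max_dist` and the minimum cluster size `min_size` would need tuning on the actual data; greedy leader clustering is also order-dependent, so a hierarchical or density-based method may be a better fit in practice.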

## Data
A text file, `test.txt`, is provided which contains a random mixture of inlier and outlier sequences in no particular order. Each row of this file contains an integer identifier for the sequence and the sequence itself.
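
A minimal sketch of reading such a file is shown below. The README does not state the delimiter, so this assumes the identifier and the sequence are separated by whitespace; the function name `parse_sequences` is hypothetical.

```python
from typing import Dict, Iterable


def parse_sequences(lines: Iterable[str]) -> Dict[int, str]:
    """Map each integer identifier to its sequence.

    Assumes one "<id> <sequence>" pair per row, separated by whitespace.
    Blank rows are skipped.
    """
    sequences: Dict[int, str] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ident, seq = line.split(maxsplit=1)
        sequences[int(ident)] = seq
    return sequences
```

Because the function accepts any iterable of lines, it can be used with an open file handle, for example `with open("test.txt") as fh: seqs = parse_sequences(fh)`.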

## Outputs
1. Print each sequence's identifier together with the ID of the cluster it belongs to, or a note that it is an outlier.
2. Print one novel prototypical sequence for each cluster you have found.
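
One way the two outputs could be laid out is sketched below. The exact output format is not specified in the README, so the tab-separated layout, the function name `format_report`, and the use of `-1` as the outlier label are assumptions made for illustration.

```python
from typing import Dict, List


def format_report(labels: Dict[int, int], prototypes: Dict[int, str]) -> List[str]:
    """Build the report lines.

    labels maps sequence identifier -> cluster ID, with -1 meaning outlier
    (an assumed convention). prototypes maps cluster ID -> novel prototype.
    """
    lines: List[str] = []
    for ident in sorted(labels):
        label = labels[ident]
        status = "outlier" if label == -1 else f"cluster {label}"
        lines.append(f"{ident}\t{status}")
    for label in sorted(prototypes):
        lines.append(f"prototype for cluster {label}: {prototypes[label]}")
    return lines


if __name__ == "__main__":
    # Toy inputs, not taken from test.txt.
    for line in format_report({1: 0, 2: 0, 3: -1}, {0: "ABAA"}):
        print(line)
```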
