abstract |
According to one embodiment of the invention, a characteristic of a biological sequence is represented in a fingerprint comprising a set of bits; the fingerprint may also comprise counts, strings, or consecutive values for the characteristic. The fingerprints can be used with machine learning and statistical methods. This is particularly advantageous for, but not limited to, drug discovery processes. The method enables structure-activity relationship (SAR) and quantitative structure-activity relationship (QSAR) studies to be performed on biological sequences. |
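
As a minimal illustrative sketch, and not the encoding claimed in the patent, the idea of a bit-set fingerprint with optional counts can be shown in Python. The choice of overlapping k-mers as the "characteristic", the CRC32 hash used to fold each k-mer into a fixed number of bits, and the names `kmers`, `bit_fingerprint`, and `count_fingerprint` are all assumptions introduced here for illustration.

```python
import zlib
from collections import Counter


def kmers(sequence, k=3):
    """Return all overlapping substrings of length k (assumed 'characteristic')."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def bit_fingerprint(sequence, k=3, n_bits=64):
    """Fold each k-mer into a fixed-length list of 0/1 bits.

    CRC32 is used only because it is deterministic across runs;
    it is an illustrative choice, not taken from the source.
    """
    bits = [0] * n_bits
    for kmer in kmers(sequence, k):
        index = zlib.crc32(kmer.encode("ascii")) % n_bits
        bits[index] = 1
    return bits


def count_fingerprint(sequence, k=3):
    """Return k-mer occurrence counts, the 'counts' variant of a fingerprint."""
    return Counter(kmers(sequence, k))


if __name__ == "__main__":
    peptide = "MKTAYIAKQR"  # hypothetical example sequence
    print(bit_fingerprint(peptide))
    print(count_fingerprint(peptide))
```

Fixed-length vectors of this kind could then be passed as feature rows to a machine learning or statistical model, which is how a SAR or QSAR study on sequences might be set up under these assumptions.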