Hello. I want to write a pet project using Data Mining technologies, but I seem to be a little behind the times in this area, so I am asking for advice on which technologies would suit it best.
So, my task:
1) There is a set of several million text files in Russian, English and Ukrainian, each of which contains a set of features in the form of a plain-text description. I found a solution for extracting them for English, but nothing for Russian and Ukrainian.
2) The original data will be stored on the server as text files; the prepared data will be stored in a database: an ID, a set of attributes and a link to the original file.
3) The data will be processed by several Data Mining algorithms (building a decision tree (CART or C4.5), classification (kNN), clustering, etc.). The results will be delivered to the end user through a Web UI or a REST API.
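The storage layout from point 2 can be sketched roughly as follows. This is only a minimal illustration in Python, using the standard-library `sqlite3` module as a stand-in for PostgreSQL or MySQL; the table name `documents`, the column names, the `lang` column and storing the attributes as a JSON string are all assumptions, not a settled design.

```python
import json
import sqlite3


def create_store(conn):
    # Hypothetical schema: ID, extracted attributes, link to the original file.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS documents (
            id          INTEGER PRIMARY KEY,
            lang        TEXT NOT NULL,   -- 'ru', 'en' or 'uk' (assumed column)
            attributes  TEXT NOT NULL,   -- attribute set serialized as JSON
            source_path TEXT NOT NULL    -- link to the original text file
        )
        """
    )


def save_document(conn, lang, attributes, source_path):
    # Insert one prepared record and return its generated ID.
    cur = conn.execute(
        "INSERT INTO documents (lang, attributes, source_path) VALUES (?, ?, ?)",
        (lang, json.dumps(attributes, ensure_ascii=False), source_path),
    )
    conn.commit()
    return cur.lastrowid


def load_document(conn, doc_id):
    # Fetch one record by ID; returns None if it does not exist.
    row = conn.execute(
        "SELECT lang, attributes, source_path FROM documents WHERE id = ?",
        (doc_id,),
    ).fetchone()
    if row is None:
        return None
    lang, attributes, source_path = row
    return {
        "id": doc_id,
        "lang": lang,
        "attributes": json.loads(attributes),
        "source_path": source_path,
    }
```

With a relational DBMS the fixed columns stay structured, and only the attribute set, which may differ between files, is kept in a flexible field.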
The choices I have to make:
1) DBMS: I am thinking of using PostgreSQL or MySQL. MongoDB is also an option, but all my data is structured, so I am not sure I need a NoSQL database.
2) A technology for finding the features in the text. I did not find anything suitable for Russian and Ukrainian, so it seems I will have to parse by keywords and then check the quality manually.
3) The Data Mining solution itself. I found several libraries, for example:
But there is very little information about them online, not enough to make a choice. As an alternative, I am considering the www.h2o.ai service, but I am put off by its excessive complexity.
In addition, I would like to use a single language for the whole backend, rather than one module in Java, another in Python, and so on.
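The keyword parsing mentioned in choice 2 could look roughly like this. It is a minimal sketch in Python using only the standard library; the `KEYWORDS` table, the feature names `color` and `material`, and the "first match wins" rule are hypothetical and only illustrate the idea. It matches exact word forms, so inflected forms in Russian and Ukrainian would be missed without a stemmer or morphological analyzer, which is part of why a manual quality check would still be needed.

```python
import re

# Hypothetical keyword table: lowercase keyword -> (feature name, value).
KEYWORDS = {
    "red": ("color", "red"),
    "красный": ("color", "red"),
    "червоний": ("color", "red"),
    "wooden": ("material", "wood"),
    "деревянный": ("material", "wood"),
    "дерев'яний": ("material", "wood"),
}

# A token is a run of Unicode letters, optionally joined by apostrophes
# (needed for Ukrainian words such as "дерев'яний").
TOKEN_RE = re.compile(r"[^\W\d_]+(?:['’][^\W\d_]+)*")


def extract_features(text):
    """Return a dict of features found in the text by exact keyword match."""
    features = {}
    for token in TOKEN_RE.findall(text.lower()):
        hit = KEYWORDS.get(token)
        if hit is not None:
            name, value = hit
            # The first matching keyword for a feature wins; later ones are ignored.
            features.setdefault(name, value)
    return features
```

For example, `extract_features("Красный деревянный стол")` yields both the color and the material, while the inflected form "красная" alone is not recognized.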