Their training requires an ever-increasing volume of samples, and they are also unable to explain why a particular decision was made. Structural approaches to Machine Learning that avoid these drawbacks exist; this article describes the software implementation of one of them. This is the author's English translation of his original article.
We describe one of the national approaches to Machine Learning, called the «VKF-method of Machine Learning based on Lattice Theory». The origin and choice of the name are explained at the end of this article.
The initial system was created by the author as a console C++ application; it then gained support for MariaDB DBMS databases (through the mariadb++ library), and was later converted into a CPython library (using the pybind11 package).
Several datasets from the UCI machine learning repository were selected to validate the concept. The mushrooms dataset contains descriptions of 8124 North American mushrooms, and the system achieves 100% accuracy on it. More precisely, the initial data was randomly divided into a training sample (2,088 edible and 1,944 poisonous mushrooms) and a test sample (2,120 edible and 1,972 poisonous mushrooms). After computing about 100 hypotheses about the causes of edibility, all test cases were predicted correctly. Since the algorithm uses a coupled Markov chain, the number of hypotheses sufficient for this may vary; it was often enough to generate 50 random hypotheses. Note that when generating the causes of poisonousness the number of required hypotheses clusters around 120; however, all test cases are predicted correctly in this case too.
Kaggle.com hosts a competition on this dataset where quite a few authors have achieved 100% accuracy. However, most of the solutions are neural networks. Our approach allows a mushroom picker to memorize only about 50 rules. Moreover, most features are insignificant, hence each hypothesis is a conjunction of a small number of values of essential features, which makes them easy to remember. After that, a person can go mushroom picking without fear of taking a toadstool or skipping an edible mushroom.
Here is a positive hypothesis that leads to the assumption that a mushroom is edible:
[('gill_attachment', 'free'), ('gill_spacing', 'close'), ('gill_size', 'broad'), ('stalk_shape', 'enlarging'), ('stalk_surface_below_ring', 'scaly'), ('veil_type', 'partial'), ('veil_color', 'white'), ('ring_number', 'one'), ('ring_type', 'pendant')]
Please note that only 9 of the 22 features are listed, since the similarity between the edible mushrooms that generated this cause is empty on the remaining 13 attributes.
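To make this concrete, prediction by analogy reduces to a containment test: a hypothesis fires on a test example exactly when every (feature, value) pair of the hypothesis occurs in the example's description. Here is a minimal sketch (not the library code; the helper name and description dict are illustrative), using the positive hypothesis listed above:

```python
# A hypothesis is a set of (feature, value) pairs; it "fires" on a test
# example when it is a subset of the example's description.
hypothesis = {
    ('gill_attachment', 'free'), ('gill_spacing', 'close'),
    ('gill_size', 'broad'), ('stalk_shape', 'enlarging'),
    ('stalk_surface_below_ring', 'scaly'), ('veil_type', 'partial'),
    ('veil_color', 'white'), ('ring_number', 'one'),
    ('ring_type', 'pendant'),
}

def predicts_edible(description):
    """True when every (feature, value) pair of the hypothesis
    occurs in the description (a dict: feature -> value)."""
    return all(description.get(feature) == value
               for feature, value in hypothesis)
```

A mushroom description normally carries all 22 attributes; the extra 13 do not matter, since only containment of the 9 listed pairs is checked.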
The second dataset was SPECT Hearts. There, the accuracy of predicting test examples reached 86.1%, which turned out to be slightly higher than the result (84%) of the CLIP3 Machine Learning system (Cover Learning with Integer Programming, version 3) used by the authors of the data. I believe that, due to the structure of the description of heart tomograms, which are already pre-encoded by binary attributes, it is impossible to significantly improve the quality of the forecast.
Recently the author discovered (and implemented) an extension of his approach to the processing of data described by continuous (numeric) features. In some ways, this approach is similar to the C4.5 system of Decision Tree Learning. The variant was tested on the Wine Quality dataset, which describes the quality of Portuguese wines. The results are encouraging: if you take high-quality red wines, the hypotheses fully explain their high ratings.
Now students at the Intelligent Systems Department of RSUH are developing a series of web servers for different research areas (using Nginx + Gunicorn + Django).
Here I'll describe a different variant (based on aiohttp, aiojobs, and aiomysql). The aiomcache module was rejected due to well-known security problems.
The proposed variant has several advantages:
It has obvious disadvantages (with respect to Django):
The two options target different strategies for working with the web server. The synchronous strategy (in Django) is aimed at a single-user mode, in which an expert works with a single database at a time. Although the probabilistic procedures of the VKF method parallelize well, it is nevertheless theoretically possible that the Machine Learning procedures will take a significant amount of time. Therefore, the second option is aimed at several experts, each of whom can work simultaneously (in different browser tabs) with different databases that differ not only in data but also in the way the data is represented (different lattices on the values of discrete features, different significant regressions, and different numbers of thresholds for continuous ones). In this case, after starting a VKF computation in one tab, the expert can switch to another, where she can prepare or analyze an experiment with other data and/or parameters.
There is an auxiliary (service) database 'vkf' with two tables, 'users' and 'experiments', to keep track of multiple users, their experiments, and the stages those experiments are at. The table 'users' stores the login and password of every registered user. The table 'experiments' saves, for each experiment, the names of its auxiliary and main tables together with their statuses. We rejected the aiohttp_session module, because we still need an Nginx proxy server to protect critical data.
The structure of the table 'experiments' is the following:
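The original table layout is not reproduced here; the following is a hypothetical reconstruction inferred from the column indices used later in 'models.py' (row[2] is the encoder table name, row[3] its status flag, and so on). Except for the status columns referenced in the UPDATE statements (encoderStatus, complexStatus, hypothesesStatus), all column names are assumptions:

```python
# Hypothetical DDL for the 'experiments' table, reconstructed from the
# positional indices used in models.py; column names are guesses.
EXPERIMENTS_DDL = """
CREATE TABLE IF NOT EXISTS experiments (
    id INT AUTO_INCREMENT PRIMARY KEY,             -- row[0]
    dbname VARCHAR(64),                            -- row[1] experiment database
    encoder VARCHAR(64),    encoderStatus BOOL,    -- row[2],  row[3]
    lattices VARCHAR(64),   latticesStatus BOOL,   -- row[4],  row[5]
    complexTbl VARCHAR(64), complexStatus BOOL,    -- row[6],  row[7]
    verges VARCHAR(64),     vergesStatus BOOL,     -- row[8],  row[9]
    vergesTotal INT,                               -- row[10]
    trains VARCHAR(64),     trainsStatus BOOL,     -- row[11], row[12]
    tests VARCHAR(64),      testsStatus BOOL,      -- row[13], row[14]
    hypotheses VARCHAR(64), hypothesesStatus BOOL, -- row[15], row[16]
    expType VARCHAR(16)                            -- row[17]: discrete/continuous/full
)
"""
```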
It should be noted that there are certain sequences of data preparation steps for ML experiments which, unfortunately, differ radically between the discrete and continuous cases.
The case of mixed attributes combines both types of requirements.
discrete: => goodLattices (semi-automatic)
discrete: goodLattices => goodEncoder (automatic)
discrete: goodEncoder => goodTrains (semi-automatic)
discrete: goodEncoder, goodTrains => goodHypotheses (automatic)
discrete: goodEncoder => goodTests (semi-automatic)
discrete: goodTests, goodEncoder, goodHypotheses => (automatic)
continuous: => goodVerges (manual)
continuous: goodVerges => goodTrains (manual)
continuous: goodTrains => goodComplex (automatic)
continuous: goodComplex, goodTrains => goodHypotheses (automatic)
continuous: goodVerges => goodTests (manual)
continuous: goodTests, goodComplex, goodHypotheses => (automatic)
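The prerequisites above can be summarized as a dependency map; the sketch below (an illustration, not part of the server code) shows the discrete pipeline and computes which steps are currently runnable given the steps already finished:

```python
# Prerequisite map for the discrete preparation steps listed above.
# Key: a step; value: the set of steps that must be finished first.
DISCRETE_PIPELINE = {
    'goodLattices':   set(),
    'goodEncoder':    {'goodLattices'},
    'goodTrains':     {'goodEncoder'},
    'goodHypotheses': {'goodEncoder', 'goodTrains'},
    'goodTests':      {'goodEncoder'},
    'prediction':     {'goodTests', 'goodEncoder', 'goodHypotheses'},
}

def ready_steps(done, pipeline):
    """Steps whose prerequisites are all in `done` but which are not done yet."""
    return {step for step, deps in pipeline.items()
            if deps <= done and step not in done}
```

The continuous pipeline has the same shape with goodVerges and goodComplex in place of goodLattices and goodEncoder.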
The Machine Learning library is named vkf.cpython-36m-x86_64-linux-gnu.so under Linux, or vkf.cp36-win32.pyd under Windows (36 is the version of Python the library was built for).
The term «automatic» means this library is used; «semi-automatic» means the auxiliary library 'vkfencoder.cpython-36m-x86_64-linux-gnu.so' is used. Finally, the «manual» mode corresponds to external programs that process data with continuous features; they are now being transferred into the vkfencoder library.
We follow the «View/Model/Control» (MVC) paradigm during web server creation.
The Python code is distributed among 5 files:
The file 'app.py' has a standard form:
#!/usr/bin/env python
import asyncio

import jinja2
import aiohttp_jinja2
from aiohttp import web
from aiojobs.aiohttp import setup

from settings import SITE_HOST as siteHost
from settings import SITE_PORT as sitePort
from views import routes


async def init(loop):
    app = web.Application(loop=loop)
    # install aiojobs.aiohttp
    setup(app)
    # install jinja2 templates
    aiohttp_jinja2.setup(app,
        loader=jinja2.FileSystemLoader('./template'))
    # add routes from api/views.py
    app.router.add_routes(routes)
    return app


loop = asyncio.get_event_loop()
try:
    app = loop.run_until_complete(init(loop))
    web.run_app(app, host=siteHost, port=sitePort)
except:
    loop.stop()
I don't think anything needs to be explained here. The next file is 'views.py':
import aiohttp_jinja2
from aiohttp import web  # , WSMsgType
from aiojobs.aiohttp import spawn  # , get_scheduler

from models import User
from models import Expert
from models import Experiment
from models import Solver
from models import Predictor

routes = web.RouteTableDef()


@routes.view(r'/tests/{name}', name='test-name')
class Predict(web.View):
    @aiohttp_jinja2.template('tests.html')
    async def get(self):
        return {'explanation': 'Please, confirm prediction!'}

    async def post(self):
        data = await self.request.post()
        db_name = self.request.match_info['name']
        analogy = Predictor(db_name, data)
        await analogy.load_data()
        job = await spawn(self.request, analogy.make_prediction())
        return await job.wait()


@routes.view(r'/vkf/{name}', name='vkf-name')
class Generate(web.View):
    # @aiohttp_jinja2.template('vkf.html')
    async def get(self):
        db_name = self.request.match_info['name']
        solver = Solver(db_name)
        await solver.load_data()
        context = {
            'dbname': str(solver.dbname),
            'encoder': str(solver.encoder),
            'lattices': str(solver.lattices),
            'good_lattices': bool(solver.lattices),
            'verges': str(solver.verges),
            'good_verges': bool(solver.good_verges),
            'complex': str(solver.complex),
            'good_complex': bool(solver.good_complex),
            'trains': str(solver.trains),
            'good_trains': bool(solver.good_trains),
            'hypotheses': str(solver.hypotheses),
            'type': str(solver.type)
        }
        response = aiohttp_jinja2.render_template('vkf.html',
            self.request, context)
        return response

    async def post(self):
        data = await self.request.post()
        step = data.get('value')
        db_name = self.request.match_info['name']
        # string comparison must use '==', not 'is'
        if step == 'init':
            location = self.request.app.router['experiment-name'].url_for(
                name=db_name)
            raise web.HTTPFound(location=location)
        solver = Solver(db_name)
        await solver.load_data()
        if step == 'populate':
            job = await spawn(self.request, solver.create_tables())
            return await job.wait()
        if step == 'compute':
            job = await spawn(self.request, solver.compute_tables())
            return await job.wait()
        if step == 'generate':
            hypotheses_total = data.get('hypotheses_total')
            threads_total = data.get('threads_total')
            job = await spawn(self.request, solver.make_induction(
                hypotheses_total, threads_total))
            return await job.wait()


@routes.view(r'/experiment/{name}', name='experiment-name')
class Prepare(web.View):
    @aiohttp_jinja2.template('expert.html')
    async def get(self):
        return {'explanation': 'Please, enter your data'}

    async def post(self):
        data = await self.request.post()
        db_name = self.request.match_info['name']
        experiment = Experiment(db_name, data)
        job = await spawn(self.request, experiment.create_experiment())
        return await job.wait()
I have shortened this file by dropping classes that serve auxiliary routes:
The remaining classes correspond to the routes responsible for the Machine Learning procedures:
To pass a large number of parameters to the vkf.html form, the system uses the aiohttp_jinja2 construction:
response = aiohttp_jinja2.render_template('vkf.html',
    self.request, context)
return response
Note the usage of spawn from aiojobs.aiohttp:
job = await spawn(self.request,
    solver.make_induction(hypotheses_total, threads_total))
return await job.wait()
This is necessary to safely call coroutines defined in the file 'models.py', which processes user and experiment data stored in a database managed by the MariaDB DBMS:
import asyncio

import aiomysql
from aiohttp import web

from settings import AUX_NAME as auxName
from settings import AUTH_TABLE as authTable
from settings import AUX_TABLE as auxTable
from settings import SECRET_KEY as secretKey
from settings import DB_HOST as dbHost

from control import createAuxTables
from control import createMainTables
from control import computeAuxTables
from control import induction
from control import prediction


class Experiment():
    def __init__(self, dbName, data, **kw):
        self.encoder = data.get('encoder_table')
        self.lattices = data.get('lattices_table')
        self.complex = data.get('complex_table')
        self.verges = data.get('verges_table')
        self.verges_total = data.get('verges_total')
        self.trains = data.get('training_table')
        self.tests = data.get('tests_table')
        self.hypotheses = data.get('hypotheses_table')
        self.type = data.get('type')
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.secret = secretKey
        self.dbname = dbName

    async def create_db(self, pool):
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                await cur.execute("CREATE DATABASE IF NOT EXISTS " +
                    str(self.dbname))
                await conn.commit()
        await createAuxTables(self)

    async def register_experiment(self, pool):
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "INSERT INTO " + str(self.auxname) + "." + str(self.auxtable)
                sql += " VALUES(NULL, '"
                sql += str(self.dbname)
                sql += "', '"
                sql += str(self.encoder)
                sql += "', 0, '"  # goodEncoder
                sql += str(self.lattices)
                sql += "', 0, '"  # goodLattices
                sql += str(self.complex)
                sql += "', 0, '"  # goodComplex
                sql += str(self.verges)
                sql += "', 0, "  # goodVerges
                sql += str(self.verges_total)
                sql += ", '"
                sql += str(self.trains)
                sql += "', 0, '"  # goodTrains
                sql += str(self.tests)
                sql += "', 0, '"  # goodTests
                sql += str(self.hypotheses)
                sql += "', 0, '"  # goodHypotheses
                sql += str(self.type)
                sql += "')"
                await cur.execute(sql)
                await conn.commit()

    async def create_experiment(self, **kw):
        pool = await aiomysql.create_pool(host=self.dbhost,
            user='root', password=self.secret)
        task1 = self.create_db(pool=pool)
        task2 = self.register_experiment(pool=pool)
        tasks = [asyncio.ensure_future(task1), asyncio.ensure_future(task2)]
        await asyncio.gather(*tasks)
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)


class Solver():
    def __init__(self, dbName, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey

    async def load_data(self, **kw):
        pool = await aiomysql.create_pool(host=dbHost, user='root',
            password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE expName='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                row = cur.fetchone()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        self.encoder = str(row.result()[2])
        self.good_encoder = bool(row.result()[3])
        self.lattices = str(row.result()[4])
        self.good_lattices = bool(row.result()[5])
        self.complex = str(row.result()[6])
        self.good_complex = bool(row.result()[7])
        self.verges = str(row.result()[8])
        self.good_verges = bool(row.result()[9])
        self.verges_total = int(row.result()[10])
        self.trains = str(row.result()[11])
        self.good_trains = bool(row.result()[12])
        self.hypotheses = str(row.result()[15])
        self.good_hypotheses = bool(row.result()[16])
        self.type = str(row.result()[17])

    async def create_tables(self, **kw):
        await createMainTables(self)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root',
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET encoderStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                await conn.commit()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)

    async def compute_tables(self, **kw):
        await computeAuxTables(self)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root',
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET complexStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                await conn.commit()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/vkf/' + self.dbname)

    async def make_induction(self, hypotheses_total, threads_total, **kw):
        await induction(self, hypotheses_total, threads_total)
        pool = await aiomysql.create_pool(host=self.dbhost, user='root',
            password=self.secret, db=self.auxname)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "UPDATE "
                sql += str(self.auxtable)
                sql += " SET hypothesesStatus=1 WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                await conn.commit()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        raise web.HTTPFound(location='/tests/' + self.dbname)


class Predictor():
    def __init__(self, dbName, data, **kw):
        self.auxname = auxName
        self.auxtable = auxTable
        self.dbhost = dbHost
        self.dbname = dbName
        self.secret = secretKey
        self.plus = 0
        self.minus = 0

    async def load_data(self, **kw):
        pool = await aiomysql.create_pool(host=dbHost, user='root',
            password=secretKey, db=auxName)
        async with pool.acquire() as conn:
            async with conn.cursor() as cur:
                sql = "SELECT * FROM "
                sql += str(auxTable)
                sql += " WHERE dbname='"
                sql += str(self.dbname)
                sql += "'"
                await cur.execute(sql)
                row = cur.fetchone()
                await cur.close()
        pool.close()
        await pool.wait_closed()
        self.encoder = str(row.result()[2])
        self.good_encoder = bool(row.result()[3])
        self.complex = str(row.result()[6])
        self.good_complex = bool(row.result()[7])
        self.verges = str(row.result()[8])
        self.trains = str(row.result()[11])
        self.tests = str(row.result()[13])
        self.good_tests = bool(row.result()[14])
        self.hypotheses = str(row.result()[15])
        self.good_hypotheses = bool(row.result()[16])
        self.type = str(row.result()[17])

    async def make_prediction(self, **kw):
        if self.good_tests and self.good_hypotheses:
            await induction(self, 0, 1)
            await prediction(self)
            message_body = str(self.plus)
            message_body += " correct positive cases. "
            message_body += str(self.minus)
            message_body += " correct negative cases."
            raise web.HTTPException(body=message_body)
        else:
            raise web.HTTPFound(location='/vkf/' + self.dbname)
Again, some auxiliary classes are omitted:
The remaining classes correspond to the main procedures:
It is important to use the create_pool() procedure from aiomysql. It creates multiple connections to a database simultaneously. For safe termination of database communication, the system uses the ensure_future() and gather() procedures from the asyncio module.
pool = await aiomysql.create_pool(host=self.dbhost,
    user='root', password=self.secret)
task1 = self.create_db(pool=pool)
task2 = self.register_experiment(pool=pool)
tasks = [asyncio.ensure_future(task1), asyncio.ensure_future(task2)]
await asyncio.gather(*tasks)
pool.close()
await pool.wait_closed()
The construction row = cur.fetchone() returns a future, hence row.result() corresponds to a record of the table, from which field values can be extracted (for example, str(row.result()[2]) extracts the name of the table with the encoding of discrete feature values).
pool = await aiomysql.create_pool(host=dbHost, user='root',
    password=secretKey, db=auxName)
async with pool.acquire() as conn:
    async with conn.cursor() as cur:
        await cur.execute(sql)
        row = cur.fetchone()
        await cur.close()
pool.close()
await pool.wait_closed()
self.encoder = str(row.result()[2])
Key system parameters are imported from the file '.env' or (if it is absent) directly from the file 'settings.py'.
from os.path import isfile
from envparse import env

if isfile('.env'):
    env.read_envfile('.env')

AUX_NAME = env.str('AUX_NAME', default='vkf')
AUTH_TABLE = env.str('AUTH_TABLE', default='users')
AUX_TABLE = env.str('AUX_TABLE', default='experiments')
DB_HOST = env.str('DB_HOST', default='127.0.0.1')
DB_PORT = env.int('DB_PORT', default=3306)
DEBUG = env.bool('DEBUG', default=False)
SECRET_KEY = env.str('SECRET_KEY', default='toor')
SITE_HOST = env.str('HOST', default='127.0.0.1')
SITE_PORT = env.int('PORT', default=8080)
It is important to note that localhost must be specified by IP address; otherwise aiomysql will try to connect to the database via a Unix socket, which may not work under Windows.
Finally, the file 'control.py' has the following form:
import os
import asyncio

import vkf


async def createAuxTables(db_data):
    # string comparison must use '==' / '!=', not 'is'
    if db_data.type != "discrete":
        await vkf.CAttributes(db_data.verges, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        await vkf.DAttributes(db_data.encoder, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
        await vkf.Lattices(db_data.lattices, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)


async def createMainTables(db_data):
    if db_data.type == "continuous":
        await vkf.CData(db_data.trains, db_data.verges, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
        await vkf.CData(db_data.tests, db_data.verges, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type == "discrete":
        await vkf.FCA(db_data.lattices, db_data.encoder, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
        await vkf.DData(db_data.trains, db_data.encoder, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
        await vkf.DData(db_data.tests, db_data.encoder, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type == "full":
        await vkf.FCA(db_data.lattices, db_data.encoder, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
        await vkf.FData(db_data.trains, db_data.encoder, db_data.verges,
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)
        await vkf.FData(db_data.tests, db_data.encoder, db_data.verges,
            db_data.dbname, '127.0.0.1', 'root', db_data.secret)


async def computeAuxTables(db_data):
    if db_data.type != "discrete":
        async with vkf.Join(db_data.trains, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret) as join:
            await join.compute_save(db_data.complex, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)
        await vkf.Generator(db_data.complex, db_data.trains, db_data.verges,
            db_data.dbname, db_data.dbname, db_data.verges_total, 1,
            '127.0.0.1', 'root', db_data.secret)


async def induction(db_data, hypothesesNumber, threadsNumber):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
    async with vkf.Induction() as induction:
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)
        if hypothesesNumber > 0:
            await induction.add_hypotheses(hypothesesNumber, threadsNumber)
        if db_data.type == "continuous":
            await induction.save_continuous_hypotheses(qualifier,
                db_data.hypotheses, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.save_discrete_hypotheses(encoder,
                db_data.hypotheses, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.save_full_hypotheses(encoder, qualifier,
                db_data.hypotheses, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)


async def prediction(db_data):
    if db_data.type != "discrete":
        qualifier = await vkf.Qualifier(db_data.verges, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
        beget = await vkf.Beget(db_data.complex, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
    if db_data.type != "continuous":
        encoder = await vkf.Encoder(db_data.encoder, db_data.dbname,
            '127.0.0.1', 'root', db_data.secret)
    async with vkf.Induction() as induction:
        if db_data.type == "continuous":
            await induction.load_continuous_hypotheses(qualifier, beget,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "discrete":
            await induction.load_discrete_hypotheses(encoder,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "full":
            await induction.load_full_hypotheses(encoder, qualifier, beget,
                db_data.trains, db_data.hypotheses, db_data.dbname,
                '127.0.0.1', 'root', db_data.secret)
        if db_data.type == "continuous":
            async with vkf.TestSample(qualifier, induction, beget,
                    db_data.tests, db_data.dbname,
                    '127.0.0.1', 'root', db_data.secret) as tests:
                db_data.plus = await tests.correct_positive_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "discrete":
            async with vkf.TestSample(encoder, induction, db_data.tests,
                    db_data.dbname, '127.0.0.1', 'root',
                    db_data.secret) as tests:
                db_data.plus = await tests.correct_positive_cases()
                db_data.minus = await tests.correct_negative_cases()
        if db_data.type == "full":
            async with vkf.TestSample(encoder, qualifier, induction, beget,
                    db_data.tests, db_data.dbname, '127.0.0.1', 'root',
                    db_data.secret) as tests:
                db_data.plus = await tests.correct_positive_cases()
                db_data.minus = await tests.correct_negative_cases()
I retain this file, since it shows the names and the call order of the arguments of the VKF-method procedures from the library 'vkf.cpython-36m-x86_64-linux-gnu.so'. All arguments after dbname can be omitted, since the default values in the CPython library are set to standard values.
Anticipating the question of professional programmers as to why the logic of controlling the VKF experiment is exposed through numerous if statements rather than hidden through polymorphism over types, the answer is this: unfortunately, dynamic typing in Python does not allow shifting the decision about the type of the object used onto the system, so this sequence of nested if statements would occur in any case. Therefore, the author preferred explicit (C-like) syntax to make the logic as transparent (and efficient) as possible.
Let me comment on the missing components:
The author has been engaged in data mining for more than 30 years. After graduating from the Mathematics Department of Lomonosov Moscow State University, he was invited to join a group of researchers under the leadership of Professor Victor K. Finn (VINITI, USSR Academy of Sciences). Victor K. Finn has been researching plausible reasoning and its formalization by means of multi-valued logics since the early 1980s.
The key ideas proposed by V. K. Finn are the following:
It should be noted that V. K. Finn attributes some of his ideas to foreign authors. Perhaps only the logic of argumentation can rightfully be considered his own invention. The idea of accounting for counter-examples V. K. Finn borrowed, by his own account, from K. R. Popper. The origins of the verification of the completeness of inductive generalization lie in the (completely obscure, in my opinion) works of the American mathematician and logician C. S. Peirce. He considers the generation of hypotheses about causes by means of the similarity operation to be borrowed from the ideas of the British economist, philosopher and logician J. S. Mill. Therefore he named his set of ideas the «JSM-method» in honor of J. S. Mill.
Strangely, the much more useful ideas of Professor Rudolf Wille (Germany), which appeared in the late 1970s and form a modern part of algebraic Lattice Theory (the so-called Formal Concept Analysis, FCA), are not respected by Professor V. K. Finn. In my opinion, the reason for this is its unfortunate name, which repels a person who graduated first from a faculty of philosophy and then from the engineer requalification program at the Mathematics Department of Lomonosov Moscow State University.
As a continuation of the work of his teacher, the author named his approach the «VKF-method» in his honor. However, there is another interpretation in Russian: a probabilistic-combinatorial formal ('veroyatnostno kombinatornyi formalnyi') method of Machine Learning based on Lattice Theory.
Now V. K. Finn's group works at the Dorodnitsyn Computing Center of the Russian Academy of Sciences and at the Intelligent Systems Department of the Russian State University for the Humanities (RSUH).
For more information about the mathematics of the VKF-solver, see or his (the author is grateful to A. B. Verevkin and N. G. Baranets for organizing lectures and processing their recordings).
The full package of source files is stored on .
Source files (in C++) for the vkf library are in the process of being approved for placement on savannah.nongnu.org. If the decision is positive, the download link will be added here.
One final note: the author started learning Python on April 6, 2020. Prior to this, the only language he had programmed in was C++. But this fact does not absolve him of the charge of possible inaccuracies in the code.
The author expresses his heartfelt gratitude to Tatyana A. Volkova for her support, constructive suggestions, and critical comments, which made it possible to significantly improve the presentation (and even significantly simplify the code). However, the author is solely responsible for the remaining errors and for the decisions made (even against her advice).