Meta Learning
This is part of a series of blog posts on the automated creation of machine learning models, the datasets used for training neural networks, and Model-Agnostic Meta-Learning. If you are interested in the background of the story, you may scroll to the bottom of the post to get the links to previous blog posts. You may also head to the Use SERP Data to Build Machine Learning Models page to get a clear idea of what kind of automated machine learning models you can create, or how to utilize them for meta-learning.
In previous weeks, I have showcased an example of a form to create machine learning algorithms. It was possible via storing hyperparameter meta-data of machine learning algorithms. This week, I will explain how to store the meta-data of every machine learning training and testing process to pave the way for meta-learning algorithms.
What is a Meta-Learning task?
Meta-learning is using the meta-data of previously acquired and categorized learning processes to tackle new learning tasks not encountered before by the machine learning algorithm. A meta-learning task is learning to learn.
Deep learning models today can’t do a variety of different tasks. For example, I showcased an image classifier in previous weeks. Although adjusting hyperparameters was easy for that specific task, I didn’t produce an effective image classification model. Tuning was taking my time, so I only showcased how to make the process more automated. Even if I created well-optimized convolutional neural networks, I would still face the problem of the limited scope of objects to be classified. I could address that by using reinforcement learning, exposing the model to constant training, and fetching its iterations whenever I need them. But that wouldn’t solve the issue of how to deal with objects that closely resemble each other. For that, I could use a previous subset of the deep learning training process to gain knowledge on the issue and apply a formula for it within the training process.
This list goes on and on. But it doesn’t resolve one issue: how can I spend less time tweaking the model? This is where meta-learning comes in as a savior. Meta-learning approaches can be applied to drastically reduce the time required to create a model.
But, how? We would need to dive into how we as humans perceive such problems in the first place in order to apply it to meta-learning. For example, how do we know a machine learning training process has gone bad? By observing the loss function. The reason we look at the loss function is that it is derived meta-data on how well deep neural networks perform in a specific task. How do we know that for sure? A model with a lower loss will generally give us statistically higher rates of correct prediction. This knowledge alone is sufficient to provide us with insight into how optimization-based meta-learning techniques should operate: by observing the effectiveness of the optimization on the loss function.
It is also essential to store the deep learning training process and its subsequent accuracy results in an object in order to make a cross-comparison using meta-learning methods.
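To make the idea of reading the loss as meta-data a bit more concrete, here is a minimal sketch that flags a run whose loss has not meaningfully improved. The function name `looks_stalled`, the window size, and the tolerance are hypothetical choices for illustration, not part of the project.

```python
from statistics import mean


def looks_stalled(losses, window=10, min_improvement=0.01):
    """Return True when the mean loss over the last `window` epochs is not
    at least `min_improvement` (relative) lower than over the first `window`."""
    if len(losses) < 2 * window:
        return False  # not enough history to judge
    early = mean(losses[:window])
    late = mean(losses[-window:])
    if early == 0:
        return True  # nothing left to improve on
    return (early - late) / early < min_improvement


# Hypothetical loss histories: one steadily improving, one hovering in place.
improving = [2.0 * (0.95 ** i) for i in range(40)]
flat = [2.2, 2.19, 2.21, 2.2] * 10

print(looks_stalled(improving))
print(looks_stalled(flat))
```

A check like this only needs the stored list of losses, which is one reason keeping them per training attempt is useful.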
What is necessary for Meta-Learning?
Alongside the general consensus, I see a few other necessities that I expect will become requirements for meta-learning.
Let me break down three common meta-learning approaches to see what they need:
Model Based Meta-Learning: Proposed models use the internal or external memory of a machine learning process in order to achieve better learning. That is, if you have a dog to classify and you have classified other dogs before, you may automatically reuse the previous model to achieve that goal with ease. The downside of meta-learning approaches of this kind is the necessity to tag objects.
Metrics Based Meta-Learning: Proposed models use different metrics to decide whether or not learning tasks are similar in their process. If you have to distinguish a human being from a bird, you may use the previously acquired Mammal and Avian classification meta-data to achieve good results. However, the downside of this meta-learning type shows up with the inclusion of a bat: a Mammal that looks like an Avian.
Optimization Based Meta-Learning: Proposed models use the meta-data of the optimization hyperparameters of a previously acquired deep learning training to maximize the outcome with meta-learning. A pure meta-learning approach of this kind will need a highly intensive machine learning process.
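As a very rough illustration of reusing optimization meta-data, the sketch below picks the optimizer settings of the most accurate earlier attempt that shares a label with the new task. This is a naive warm start, not MAML or any published method. The function name `best_prior_optimizer` and the sample records are hypothetical; only the field names (`status`, `accuracy`, `training_commands`, `label_names`, `optimizer`) follow the stored items shown later in this post.

```python
def best_prior_optimizer(attempts, label_names):
    """Return the optimizer settings of the most accurate completed attempt
    that shares at least one label with the new task, or None."""
    wanted = set(label_names)
    candidates = [
        a for a in attempts
        if a["status"] == "Complete"
        and wanted & set(a["training_commands"]["label_names"])
    ]
    if not candidates:
        return None
    best = max(candidates, key=lambda a: a["accuracy"])
    return best["training_commands"]["optimizer"]


# Hypothetical stored attempts, trimmed to the fields the function reads.
stored = [
    {"status": "Complete", "accuracy": 0.16,
     "training_commands": {"label_names": ["Boston Terrier", "Toy Fox Terrier"],
                           "optimizer": {"name": "SGD", "lr": 0.001, "momentum": 0.9}}},
    {"status": "Complete", "accuracy": 0.42,
     "training_commands": {"label_names": ["Boston Terrier", "Boykin Spaniel"],
                           "optimizer": {"name": "SGD", "lr": 0.01, "momentum": 0.9}}},
    {"status": "Training", "accuracy": 0.0,
     "training_commands": {"label_names": ["Boston Terrier"],
                           "optimizer": {"name": "SGD", "lr": 0.1, "momentum": 0.0}}},
]

suggestion = best_prior_optimizer(stored, ["Boston Terrier", "Alaskan Malamute"])
print(suggestion)
```

Sharing a label is a crude stand-in for task similarity; a metrics-based approach would replace that test with a proper distance between tasks.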
Now, to counter the weakness of the Model Based Meta-Learning process, I have already suggested a solution in one of my old blog posts. I used SerpApi’s Google Images Scraper API to scrape images with a specific tag using the chips parameter to create datasets at scale (and only images of a specific size, to automate preprocessing).
I don’t have a full solution for countering the weakness of Metrics Based Meta-Learning. However, I have noticed that many search engines are moving towards enriching their knowledge graphs, answer boxes, and related search items such as related questions and related searches, which could help reveal the connections between new tasks. But, of course, this is a vague idea. You may take a look at the documentation with examples for SerpApi’s Google Knowledge Graph Scraper API, SerpApi’s Google Answer Box Scraper API, and other related documentation to get a better idea about how to utilize them in meta-learning. You may also Register to Claim Free Credits.
I don’t have a solution for countering the weakness of Optimization Based Meta-Learning either. However, this week I will show how to store the machine learning training process with asynchronous calls, which is vital for this kind of meta-learning approach. Asynchronous processing in computer science refers to the distribution of tasks that run alongside each other without affecting each other’s progress. In this context, it saves us from waiting for the training process to be over, and lets us run multiple calls at once.
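The general idea of asynchronous processing can be sketched with Python's standard `asyncio` module. This is only an illustration of the concept, not the mechanism used later in this post (which relies on background tasks in the web framework); `fake_training`, the in-memory `store`, and the sleep durations are all hypothetical.

```python
import asyncio


async def fake_training(name, epochs, store):
    """Pretend to train a model, recording progress in a shared store."""
    store[name] = {"status": "Training", "training_losses": []}
    for epoch in range(epochs):
        await asyncio.sleep(0.01)  # stands in for one epoch of real work
        store[name]["training_losses"].append(1.0 / (epoch + 1))
    store[name]["status"] = "Complete"


async def main():
    store = {}
    # Both runs are started before either one finishes.
    tasks = [
        asyncio.create_task(fake_training("model_a", 3, store)),
        asyncio.create_task(fake_training("model_b", 5, store)),
    ]
    # While they run, the caller is free to look at their progress.
    await asyncio.sleep(0.015)
    snapshot = {name: doc["status"] for name, doc in store.items()}
    await asyncio.gather(*tasks)
    return snapshot, store


snapshot, store = asyncio.run(main())
print(snapshot)
print({name: doc["status"] for name, doc in store.items()})
```

The point is that the caller gets control back while the work continues, and the stored record is what lets it observe progress in the meantime.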
It is vital to have good machine learning frameworks that can store training examples, compare them in either external objects or data points, and store meta-data of the training data for future use. It is also useful to have SGD (Stochastic Gradient Descent), RNN (Recurrent Neural Networks), Regression, Few-Shot Learning, etc. in one place with a generalized syntax to achieve Transfer Learning with MAML (Model-Agnostic Meta-Learning). The aim of this blog post series is to achieve at least a percentage of what is proposed in this blog post. Once it is open-source on SerpApi’s GitHub page, I hope my mistakes (especially on the front-end) will be covered with the help of others. Much like the best-performing programmers in the real world, the aim is to need minimal customization when training a model for a specific problem, and also to give the trained model the ability to perform multi-task operations. Of course, at the start, I don’t expect a supervised learning model that can do multiple classification tasks to write a poem. But the ability to do meta-training, at least by human observation and by cross-comparing benchmarks of different model parameters, is an exciting step for someone like me who is trying to acquire new skills.
Storing Machine Learning Models
I have created an Attempt item to be stored in the storage server under the model scope:
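from pydantic import BaseModel


class Attempt(BaseModel):
    id: int | None = None          # unique id of the attempt
    name: str | None = None        # also used as the model's file name
    training_commands: dict = {}   # commands that triggered the training
    training_losses: list = []     # one loss value per epoch
    n_epoch: int = 0
    testing_commands: dict = {}    # commands that triggered the testing
    accuracy: float = 0.0
    status: str = "incomplete"
    limit: int = 0                 # limit used in the testing process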
It houses a unique id to be used in the following weeks to call a training process, a name that doubles as the file name of the model, the training commands dictionary we use to trigger a training, the training losses for observing the state of the training at each backpropagation, the number of epochs for creating a live visual graphic, the testing commands dictionary that triggers the testing process, the accuracy of the model, a status to observe its state, and the limit used in the testing process.
Let’s initialize the class to communicate with the models database:
from datetime import timedelta

# Import paths below follow the Couchbase Python SDK 4.x layout.
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions, ClusterTimeoutOptions, QueryOptions


class ModelsDatabase:
    def __init__(self):
        username = "<Storage Server Username>"
        password = "<Storage Server Password>"
        bucket_name = "images"
        auth = PasswordAuthenticator(
            username,
            password
        )
        timeout_opts = ClusterTimeoutOptions(kv_timeout=timedelta(seconds=10))
        self.cluster = Cluster('couchbase://localhost', ClusterOptions(auth, timeout_options=timeout_opts))
        self.cluster.wait_until_ready(timedelta(seconds=5))
        cb = self.cluster.bucket(bucket_name)
        self.cb_coll = cb.scope("model").collection("attempt")

    def insert_attempt(self, doc: Attempt):
        # The attempt's name is used as the document key.
        doc = doc.dict()
        print("\nInsert CAS: ")
        try:
            key = doc["name"]
            result = self.cb_coll.insert(key, doc)
            print(result.cas)
        except Exception as e:
            print(e)

    def get_attempt_by_name(self, name):
        # Returns None (after printing the error) when nothing matches.
        try:
            sql_query = 'SELECT attempt FROM `images`.model.attempt WHERE name = $1'
            row_iter = self.cluster.query(
                sql_query,
                QueryOptions(positional_parameters=[name]))
            rows_arr = []
            for row in row_iter:
                rows_arr.append(row)
            return rows_arr[0]['attempt']
        except Exception as e:
            print(e)

    def get_attempt_by_id(self, id):
        # Returns None (after printing the error) when nothing matches.
        try:
            sql_query = 'SELECT attempt FROM `images`.model.attempt WHERE id = $1'
            row_iter = self.cluster.query(
                sql_query,
                QueryOptions(positional_parameters=[id]))
            rows_arr = []
            for row in row_iter:
                rows_arr.append(row)
            return rows_arr[0]['attempt']
        except Exception as e:
            print(e)

    def update_attempt(self, doc: Attempt):
        # upsert overwrites the stored document that has the same name.
        try:
            key = doc.name
            self.cb_coll.upsert(key, doc.dict())
        except Exception as e:
            print(e)

    def get_latest_index(self):
        # The number of stored attempts doubles as the next free id.
        try:
            sql_query = 'SELECT COUNT(*) as latest_index FROM `images`.model.attempt'
            row_iter = self.cluster.query(
                sql_query,
                QueryOptions())
            for row in row_iter:
                return row['latest_index']
        except Exception as e:
            print(e)
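The difference between inserting and upserting matters for the flow in this post: a key-value insert fails when the key already exists, while an upsert creates or overwrites. The sketch below mimics that behaviour with a plain dictionary so it can be tried without a storage server. `InMemoryAttempts` and its method names are hypothetical stand-ins, not part of the Couchbase SDK or of the project.

```python
class InMemoryAttempts:
    """Hypothetical stand-in for the attempt collection, keyed by name."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, doc):
        # Like a key-value insert: refuse to overwrite an existing key.
        if key in self._docs:
            raise KeyError(f"document {key!r} already exists")
        self._docs[key] = dict(doc)

    def upsert(self, key, doc):
        # Create the document, or replace it if the key is already there.
        self._docs[key] = dict(doc)

    def get_by_name(self, name):
        doc = self._docs.get(name)
        return dict(doc) if doc is not None else None

    def latest_index(self):
        # Mirrors the COUNT(*) query: the document count is the next id.
        return len(self._docs)


attempts = InMemoryAttempts()
attempts.insert("model_a", {"id": attempts.latest_index(), "name": "model_a", "status": "Training"})
attempts.upsert("model_a", {"id": 0, "name": "model_a", "status": "Complete"})
attempts.upsert("model_b", {"id": attempts.latest_index(), "name": "model_b", "status": "Training"})

try:
    attempts.insert("model_a", {"id": 99, "name": "model_a"})
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True

print(attempts.get_by_name("model_a"), duplicate_rejected)
```

One thing this makes visible: using the document count as the next id only stays unique as long as attempts are never deleted.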
Also some helper endpoints in the main file:
@app.post("/create_attempt")
def create_attempt(a: Attempt):
    db = ModelsDatabase()
    db.insert_attempt(a)
    return {"status": "Success"}


@app.post("/find_attempt/")
def find_attempt(name: str):
    db = ModelsDatabase()
    attempt = db.get_attempt_by_name(name)
    return attempt


@app.post("/update_attempt")
def update_attempt(a: Attempt):
    db = ModelsDatabase()
    db.update_attempt(a)
    return {"status": "Success"}


@app.post("/latest_attempt_index/")
def return_index():
    db = ModelsDatabase()
    index = db.get_latest_index()
    return {"status": index}
Let’s update the training endpoint to create a model object for us in the database:
@app.post("/train/")
async def train(tc: TrainCommands, background_tasks: BackgroundTasks):
    def background_training(tc):
        # eval() resolves a model class by name; only safe with trusted input.
        if 'name' in tc.model and tc.model['name'] != "":
            model = eval(tc.model['name'])
        else:
            model = CustomModel

        try:
            # Reuse the existing attempt if one is stored under this name.
            a = find_attempt(name=tc.model_name)
            a["status"] = "Training"
            a["training_losses"] = []
            a = Attempt(**a)
            update_attempt(a)
            index = a.id
        except Exception:
            # find_attempt returned None, so create a fresh attempt.
            index = return_index()['status']
            a = Attempt(name=tc.model_name, training_commands=tc.dict(), status="Training", n_epoch=tc.n_epoch, id=index)
            create_attempt(a=a)

        trainer = Train(tc, model, CustomImageDataLoader, CustomImageDataset, ImagesDataBase)
        trainer.train()
        model = None
        try:
            torch.cuda.empty_cache()
        except Exception:
            pass

    # The response is returned right away; training continues in the background.
    background_tasks.add_task(background_training, tc)
    return {"status": "Complete"}
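The find-or-create step at the start of the background task can be written without relying on an exception. Here is a minimal sketch of that flow against a plain dictionary; `start_training` and the `store` dictionary are hypothetical stand-ins for the endpoint helpers and the storage server.

```python
def start_training(store, model_name, training_commands, n_epoch):
    """Mark an attempt as training, creating it first if it does not exist.
    Returns the attempt id."""
    existing = store.get(model_name)
    if existing is not None:
        # Retraining: keep the id, reset the recorded losses.
        existing["status"] = "Training"
        existing["training_losses"] = []
        return existing["id"]
    # New attempt: the current document count becomes its id.
    new_id = len(store)
    store[model_name] = {
        "id": new_id,
        "name": model_name,
        "training_commands": training_commands,
        "training_losses": [],
        "n_epoch": n_epoch,
        "status": "Training",
    }
    return new_id


store = {}
first_id = start_training(store, "model_a", {"n_epoch": 100}, 100)
store["model_a"]["training_losses"].append(2.2)
second_id = start_training(store, "model_a", {"n_epoch": 100}, 100)
other_id = start_training(store, "model_b", {"n_epoch": 50}, 50)
print(first_id, second_id, other_id)
```

An explicit check for a missing attempt keeps unrelated failures, such as a lost database connection, from silently creating a new record.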
Let’s collect our losses in the training process (also lr scheduler for gradient steps):
def train(self):
    # Imported here rather than at module level to avoid a circular import with main.
    from main import find_attempt, update_attempt

    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    Epoch = [x for x in range(0, self.n_epoch)]
    Loss = [0] * self.n_epoch
    # Lowers the learning rate when the loss stops decreasing.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(self.optimizer, 'min')
    self.model.to(device)

    for epoch in range(self.n_epoch):
        running_loss = 0.0
        inputs, labels = self.loader.iterate_training()
        inputs, labels = inputs.to(device), labels.to(device)
        self.optimizer.zero_grad()

        outputs = self.model(inputs)
        loss = self.criterion(outputs, labels.squeeze())

        loss.backward()
        self.optimizer.step()
        running_loss = running_loss + loss.item()
        scheduler.step(running_loss)

        # Store the loss of this epoch so it can be observed while training runs.
        a = find_attempt(name=self.model_name)
        a['training_losses'].append(running_loss)
        a = Attempt(**a)
        update_attempt(a)

        if epoch % 5 == 4:
            print(f'[Epoch: {epoch + 1}, Progress: {((epoch+1)*100/self.n_epoch):.3f}%] loss: {running_loss:.6f}')
        running_loss = 0.0

    torch.save(self.model.state_dict(), "models/{}.pt".format(self.model_name))
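To get a feel for what the plateau scheduler does with the loss it is fed, here is a simplified pure-Python sketch of the rule. It assumes the PyTorch defaults (factor 0.1, patience 10, relative threshold 1e-4) and leaves out cooldown and the minimum learning rate, so it is an approximation for intuition rather than a replacement for `ReduceLROnPlateau`. The function name `plateau_schedule` is hypothetical.

```python
def plateau_schedule(losses, lr=0.001, factor=0.1, patience=10, threshold=1e-4):
    """Return the learning rate in effect after each loss value is seen."""
    best = float("inf")
    bad_epochs = 0
    history = []
    for loss in losses:
        if loss < best * (1 - threshold):
            # Meaningful improvement: remember it and reset the counter.
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            # No improvement for more than `patience` epochs: shrink the rate.
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history


# Hypothetical loss curves: one that never improves, one that always does.
stuck = plateau_schedule([1.0] * 12)
improving = plateau_schedule([1.0 / (i + 1) for i in range(30)])
print(stuck[10], stuck[11])
print(improving[-1])
```

With a loss that hovers around the same value, as in the stored item further down, a rule like this would keep shrinking the learning rate, which is worth keeping in mind when reading the results.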
Another update on the testing process to be asynchronous, and also in communication with the storage server:
@app.post("/test/")
async def test(tc: TestCommands, background_tasks: BackgroundTasks):
    def background_testing(tc):
        # eval() resolves a model class by name; only safe with trusted input.
        if 'name' in tc.model and tc.model['name'] != "":
            model = eval(tc.model['name'])
        else:
            model = CustomModel

        try:
            a = find_attempt(name=tc.model_name)
            a["testing_commands"] = tc.dict()
            a["status"] = "Testing"
            a = Attempt(**a)
            update_attempt(a)
        except Exception:
            # The return value of a background task is discarded; this only stops the task.
            return {"status": "No Model Attempt by that Name"}

        tester = Test(tc, CustomImageDataset, ImagesDataBase, model)
        accuracy = tester.test_accuracy()

        # Store the accuracy and mark the attempt as finished.
        a = find_attempt(name=tc.model_name)
        a["accuracy"] = accuracy
        a["status"] = "Complete"
        a = Attempt(**a)
        update_attempt(a)
        model = None
        try:
            torch.cuda.empty_cache()
        except Exception:
            pass

    background_tasks.add_task(background_testing, tc)
    return {"status": "Success"}
Here is the resulting Storage Item from the training:
{
"id": 4,
"name": "american_dog_species_3",
"training_commands": {
"batch_size": 4,
"criterion": {
"name": "CrossEntropyLoss"
},
"image_ops": [{
"resize": {
"resample": "Image.ANTIALIAS",
"size": [500, 500]
}
}, {
"convert": {
"mode": "'RGB'"
}
}],
"label_names": ["American Hairless Terrier imagesize:500x500", "Alaskan Malamute imagesize:500x500", "American Eskimo Dog imagesize:500x500", "Australian Shepherd imagesize:500x500", "Boston Terrier imagesize:500x500", "Boykin Spaniel imagesize:500x500", "Chesapeake Bay Retriever imagesize:500x500", "Catahoula Leopard Dog imagesize:500x500", "Toy Fox Terrier imagesize:500x500"],
"model": {
"layers": [{
"in_channels": 3,
"kernel_size": 5,
"name": "Conv2d",
"out_channels": 6
}, {
"inplace": true,
"name": "ReLU"
}, {
"kernel_size": 2,
"name": "MaxPool2d",
"stride": 2
}, {
"in_channels": "auto",
"kernel_size": 5,
"name": "Conv2d",
"out_channels": 16
}, {
"inplace": true,
"name": "ReLU"
}, {
"kernel_size": 2,
"name": "MaxPool2d",
"stride": 2
}, {
"in_channels": "auto",
"kernel_size": 5,
"name": "Conv2d",
"out_channels": 32
}, {
"inplace": true,
"name": "ReLU"
}, {
"kernel_size": 2,
"name": "MaxPool2d",
"stride": 2
}, {
"name": "Flatten",
"start_dim": 1
}, {
"in_features": 111392,
"name": "Linear",
"out_features": 120
}, {
"inplace": true,
"name": "ReLU"
}, {
"in_features": "auto",
"name": "Linear",
"out_features": 84
}, {
"inplace": true,
"name": "ReLU"
}, {
"in_features": "auto",
"name": "Linear",
"out_features": "n_labels"
}],
"name": ""
},
"model_name": "american_dog_species_3",
"n_epoch": 100,
"n_labels": 0,
"optimizer": {
"lr": 0.001,
"momentum": 0.9,
"name": "SGD"
},
"target_transform": {
"ToTensor": true
},
"transform": {
"Normalize": {
"mean": [0.5, 0.5, 0.5],
"std": [0.5, 0.5, 0.5]
},
"ToTensor": true
}
},
"training_losses": [2.1530826091766357, 2.2155375480651855, 2.212409019470215, 2.171882152557373, 2.193148374557495, 2.174982786178589, 2.2089200019836426, 2.166707992553711, 2.1700942516326904, 2.196320056915283, 2.228410243988037, 2.2278425693511963, 2.1531643867492676, 2.1904003620147705, 2.1973652839660645, 2.1950249671936035, 2.1686930656433105, 2.182337999343872, 2.2186434268951416, 2.2066121101379395, 2.172186851501465, 2.217101573944092, 2.2250301837921143, 2.22577166557312, 2.2089788913726807, 2.1954753398895264, 2.19649338722229, 2.1682443618774414, 2.2124178409576416, 2.1765542030334473, 2.15944766998291, 2.2267537117004395, 2.1671102046966553, 2.218825101852417, 2.2200405597686768, 2.1963484287261963, 2.199852705001831, 2.2375543117523193, 2.1804018020629883, 2.2097158432006836, 2.1749439239501953, 2.213040351867676, 2.2149901390075684, 2.1947004795074463, 2.164980411529541, 2.1940670013427734, 2.229835033416748, 2.2061691284179688, 2.2089390754699707, 2.207270622253418, 2.235719680786133, 2.185238838195801, 2.222529411315918, 2.1917202472686768, 2.214961528778076, 2.181013584136963, 2.2280330657958984, 2.2193360328674316, 2.2151079177856445, 2.1822409629821777, 2.181617498397827, 2.213880777359009, 2.2002997398376465, 2.221768379211426, 2.1861824989318848, 2.191596508026123, 2.2087886333465576, 2.1659762859344482, 2.1675500869750977, 2.1987595558166504, 2.2219362258911133, 2.2185418605804443, 2.2019474506378174, 2.2085072994232178, 2.168557643890381, 2.1841750144958496, 2.206641674041748, 2.165733814239502, 2.193709373474121, 2.2362961769104004, 2.1809918880462646, 2.1982641220092773, 2.237257242202759, 2.2146575450897217, 2.197037935256958, 2.193465232849121, 2.1990575790405273, 2.193073272705078, 2.2431421279907227, 2.204183578491211, 2.235936164855957, 2.221945285797119, 2.185289144515991, 2.1666038036346436, 2.1959757804870605, 2.171337604522705, 2.1832592487335205, 2.2154834270477295, 2.168503761291504, 2.2134923934936523],
"n_epoch": 100,
"testing_commands": {
"criterion": {
"name": "CrossEntropyLoss"
},
"ids": [],
"image_ops": [{
"resize": {
"resample": "Image.ANTIALIAS",
"size": [500, 500]
}
}, {
"convert": {
"mode": "'RGB'"
}
}],
"label_names": ["American Hairless Terrier imagesize:500x500", "Alaskan Malamute imagesize:500x500", "American Eskimo Dog imagesize:500x500", "Australian Shepherd imagesize:500x500", "Boston Terrier imagesize:500x500", "Boykin Spaniel imagesize:500x500", "Chesapeake Bay Retriever imagesize:500x500", "Catahoula Leopard Dog imagesize:500x500", "Toy Fox Terrier imagesize:500x500"],
"limit": 200,
"model": {
"layers": [{
"in_channels": 3,
"kernel_size": 5,
"name": "Conv2d",
"out_channels": 6
}, {
"inplace": true,
"name": "ReLU"
}, {
"kernel_size": 2,
"name": "MaxPool2d",
"stride": 2
}, {
"in_channels": "auto",
"kernel_size": 5,
"name": "Conv2d",
"out_channels": 16
}, {
"inplace": true,
"name": "ReLU"
}, {
"kernel_size": 2,
"name": "MaxPool2d",
"stride": 2
}, {
"in_channels": "auto",
"kernel_size": 5,
"name": "Conv2d",
"out_channels": 32
}, {
"inplace": true,
"name": "ReLU"
}, {
"kernel_size": 2,
"name": "MaxPool2d",
"stride": 2
}, {
"name": "Flatten",
"start_dim": 1
}, {
"in_features": 111392,
"name": "Linear",
"out_features": 120
}, {
"inplace": true,
"name": "ReLU"
}, {
"in_features": "auto",
"name": "Linear",
"out_features": 84
}, {
"inplace": true,
"name": "ReLU"
}, {
"in_features": "auto",
"name": "Linear",
"out_features": "n_labels"
}],
"name": ""
},
"model_name": "american_dog_species_3",
"n_labels": 0,
"target_transform": {
"ToTensor": true
},
"transform": {
"Normalize": {
"mean": [0.5, 0.5, 0.5],
"std": [0.5, 0.5, 0.5]
},
"ToTensor": true
}
},
"accuracy": 0.16500000000000006,
"status": "Complete",
"limit": 0
}
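One way to read a stored item like this is to compare its loss with what a model guessing uniformly would score. For cross-entropy over K classes, a uniform prediction gives a loss of ln(K). The sketch below does that comparison for the nine labels above, using the first and last three loss values copied from the stored item. The helper name `chance_level_loss` is hypothetical, and the chance accuracy of 1/K assumes balanced classes.

```python
import math
from statistics import mean


def chance_level_loss(n_classes):
    """Cross-entropy of a uniform prediction over n_classes."""
    return math.log(n_classes)


n_classes = 9  # number of entries in label_names
baseline = chance_level_loss(n_classes)

# First and last three values of training_losses from the stored item.
first_losses = [2.1530826091766357, 2.2155375480651855, 2.212409019470215]
last_losses = [2.2154834270477295, 2.168503761291504, 2.2134923934936523]

mean_first = mean(first_losses)
mean_last = mean(last_losses)
chance_accuracy = 1 / n_classes
stored_accuracy = 0.16500000000000006

print(f"chance-level loss: {baseline:.4f}")
print(f"mean of first losses: {mean_first:.4f}, mean of last losses: {mean_last:.4f}")
print(f"chance accuracy: {chance_accuracy:.3f}, stored accuracy: {stored_accuracy:.3f}")
```

Read this way, the loss stays close to the chance level throughout the run, which is exactly the kind of signal that stored training meta-data makes easy to spot and compare across attempts.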
Conclusion
I am grateful to the reader for their attention and to the Brilliant People of SerpApi for making this blog post possible. I would like to share another opinion in this writing. I find the works of ICLR, ICML, and several publications on arXiv fascinating (I plan to share my opinions on them in the following weeks). But I also see an important need for open source projects on meta-learning for fast adaptation of deep networks. I see no contradiction between professionals and enthusiasts at this stage of the progress we are making in artificial intelligence. I believe we can greatly speed up humanity’s achievements if we progress collectively in meta-learning. Whether it be the automatic provision of datasets or the automatic optimization of deep learning, meta-learning has many points to which everyone can contribute. Setting aside the hype and generalization around the term meta-learning, or learning to learn, the subject is fascinating and deeply logical in nature. I would love to wake up to a world where a model with multi-task ability is still learning tasks by itself. Even a primitive version of such an achievement is exciting.
Originally published at https://serpapi.com on August 12, 2022.