{"id":1013329,"date":"2021-08-13T12:00:32","date_gmt":"2021-08-13T16:00:32","guid":{"rendered":"https:\/\/www.kdnuggets.com\/?p=131194"},"modified":"2021-08-13T12:00:32","modified_gmt":"2021-08-13T16:00:32","slug":"how-to-train-a-bert-model-from-scratch","status":"publish","type":"station","link":"https:\/\/platodata.io\/plato-data\/how-to-train-a-bert-model-from-scratch\/","title":{"rendered":"How to Train a BERT Model From Scratch"},"content":{"rendered":"\n
\n

How to Train a BERT Model From Scratch
Tags: BERT, Hugging Face, NLP, Python, Training


Meet BERT's Italian cousin, FiliBERTo.


By James Briggs, Data Scientist


BERT, but in Italy (image by author)
Many of my articles have focused on BERT, the model that arrived and dominated the world of natural language processing (NLP), marking a new age for language models.

For those of you who may not have used transformer models (BERT being one example) before, the process looks a little like this (sketched in code just after the list):

• `pip install transformers`
• Initialize a pre-trained transformer model with `from_pretrained`.
• Test it on some data.
• Maybe fine-tune the model (train it some more).
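In code, that typical workflow looks something like the sketch below. The checkpoint name and the example sentence are illustrative assumptions; any pre-trained BERT-style model from the Hugging Face hub would work the same way.

```python
# Step 1 is run in the shell: pip install transformers

from transformers import pipeline

# Load a pre-trained BERT checkpoint behind a fill-mask pipeline.
# "bert-base-uncased" is simply a common example checkpoint.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# Test it on some data: BERT predicts the word hidden by [MASK].
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```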

Now, this is a great approach, but if we only ever do this, we lack the understanding behind creating our own transformer models.

And if we cannot create our own transformer models, we must rely on there being a pre-trained model that fits our problem, which is not always the case:


A few comments asking about non-English BERT models

So in this article, we will explore the steps we must take to build our own transformer model, specifically a further-developed version of BERT called RoBERTa.

An Overview

     
There are a few steps to the process, so before we dive in, let's first summarize what we need to do. In total, there are four key parts:

• Getting the data
• Building a tokenizer
• Creating an input pipeline
• Training the model

Once we have worked through each of these sections, we will take the tokenizer and model we have built and save them both, so that we can then use them in the same way we usually would with `from_pretrained`.
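As a rough preview of that final step, loading our saved artifacts back in looks just like loading any other checkpoint. The directory name and the RoBERTa classes below are assumptions for illustration rather than the article's exact code:

```python
from transformers import RobertaTokenizerFast, RobertaForMaskedLM

# Assume the tokenizer and model were saved earlier with save_pretrained()
# into this directory; the path itself is only an example.
tokenizer = RobertaTokenizerFast.from_pretrained("./filiberto")
model = RobertaForMaskedLM.from_pretrained("./filiberto")

# From here on, both behave exactly like any pre-trained checkpoint.
inputs = tokenizer("ciao, come va?", return_tensors="pt")
outputs = model(**inputs)
print(outputs.logits.shape)
```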

Getting The Data

       
As with any machine learning project, we need data. In terms of data for training a transformer model, we really are spoilt for choice; we can use almost any text data.
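As one illustration (not necessarily the exact corpus used later in the article), the Hugging Face `datasets` library can stream large web-text collections such as OSCAR, which has per-language subsets that suit a project like an Italian BERT:

```python
from datasets import load_dataset

# Stream the Italian subset of the OSCAR web corpus so nothing huge is
# downloaded up front; the dataset and config names here are examples,
# and almost any large body of raw text would work just as well.
dataset = load_dataset("oscar", "unshuffled_deduplicated_it",
                       split="train", streaming=True)

# Peek at the first document.
sample = next(iter(dataset))
print(sample["text"][:200])
```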

