
{"id":15198,"date":"2022-09-06T15:59:29","date_gmt":"2022-09-06T19:59:29","guid":{"rendered":"https:\/\/ipullrank.com\/?p=15198"},"modified":"2025-07-31T16:07:21","modified_gmt":"2025-07-31T20:07:21","slug":"generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2","status":"publish","type":"post","link":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2","title":{"rendered":"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2"},"content":{"rendered":"\n<p>My 2020 MozCon presentation was a short film created by the iPullRank team.<\/p>\n\n\n\n<p>I\u2019m not going to spoil it because I\u2019d rather you watched it, but the movie is one part \u201cBatman: The Animated Series\u201d and one part \u201cMr. Robot\u201d presented in a mixed-media format.<\/p>\n\n\n\n<p><em>If you haven\u2019t seen it, we\u2019ve just released a Director\u2019s Cut as well as all the related resources and code (all the tactics and code are real) from the film, so&nbsp;<a href=\"https:\/\/ipullrank.com\/runtime\" target=\"_blank\" rel=\"noreferrer noopener\">please have a look<\/a>. 
We made it for you!<\/em><\/p>\n\n\n\n<p>What I want to highlight today, though, is the scene toward the end of the film wherein the concept of scalable text generation is explored.<\/p>\n\n\n\n<p>Have a look:<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-embed-handler wp-block-embed-embed-handler wp-embed-aspect-16-9 wp-has-aspect-ratio\"><div class=\"wp-block-embed__wrapper\">\n<iframe title=\"GPT-2 Tactic Excerpt from iPullRank&#039;s Runtime\" width=\"800\" height=\"450\" src=\"https:\/\/www.youtube.com\/embed\/ZIJZ4f7My08?feature=oembed&#038;enablejsapi=1&#038;origin=https:\/\/ipullrank.com\" frameborder=\"0\" allow=\"accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture; web-share\" referrerpolicy=\"strict-origin-when-cross-origin\" allowfullscreen><\/iframe>\n<\/div><\/figure>\n\n\n\n<p>In this scene we\u2019re depicting our protagonist, Casey Robins, figuring out how to programmatically generate copy for ecommerce category pages and incorporating data into that copy based on the JSON object used to populate that page.<\/p>\n\n\n\n<p>Yeah, that was a mouthful, but it\u2019s the coolest tactic that I\u2019ve devised in the past five years, so bear with me!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">DataToText Is Still Academic, But Here\u2019s a Hack<\/h2>\n\n\n\n<p>As I mention in the dialog, there\u2019s a field of Natural Language Generation study called DataToText wherein people are taking structured data and using it to generate copy.<\/p>\n\n\n\n<p>In academic research, engineers have highlighted use cases like giving recaps on sports games and also for generating copy for ecommerce product pages.<\/p>\n\n\n\n<p>Here\u2019s an example of copy generated for a sports game recap from the paper,&nbsp;<a href=\"https:\/\/arxiv.org\/pdf\/1912.10011v1.pdf\" target=\"_blank\" rel=\"noreferrer noopener\">A hierarchical model for data to text generation<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image 
aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/a-hierarchical-model-for-data-to-text-generation-5f187d780a47b.png\" alt=\"A hierarchical model for data to text generation\" class=\"wp-image-375287\"\/><\/figure>\n\n\n\n<p>Here\u2019s an example of copy being generated for a product detail page from the paper,&nbsp;<a href=\"https:\/\/www.aclweb.org\/anthology\/P19-4009\/\" target=\"_blank\" rel=\"noreferrer noopener\">Storytelling from Structured Data and Knowledge Graphs<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/storytelling-from-structured-data-and-knowledge-graphs-5f187d972ebe5.png\" alt=\"Storytelling from Structured Data and Knowledge Graphs\" class=\"wp-image-375288\"\/><\/figure>\n\n\n\n<p>Naturally those use cases are a direct reflection of things that would support scalable content creation for SEO.<\/p>\n\n\n\n<p>So, I figured DataToText would be ready to roll and I could just hand some structured data off to an API and be all set.<\/p>\n\n\n\n<p>So I skim-read&nbsp;<a href=\"https:\/\/paperswithcode.com\/task\/data-to-text-generation\" target=\"_blank\" rel=\"noreferrer noopener\">a few of these papers<\/a>&nbsp;and tried to run some of the code.<\/p>\n\n\n\n<p>I was, frankly, out of my technical depth and not willing to commit to reading thoroughly enough to truly figure out how to do it.<\/p>\n\n\n\n<p><em>What do you want from me? 
I have two children and I\u2019m responsible for two businesses during a global pandemic.<\/em><\/p>\n\n\n\n<p>So, instead, I\u2019ve identified a shortcut based on what I already know how to do.<\/p>\n\n\n\n<p>Many ecommerce sites are built on Single Page Applications.<\/p>\n\n\n\n<p>This means that there is an API endpoint somewhere that the client-side code accesses in order to populate its content when a page is being constructed or updated.<\/p>\n\n\n\n<p>By design, many of those API endpoints are open and available to us and the authentication is often little to none.<\/p>\n\n\n\n<p>We can use these same endpoints to gather features and derived data points to generate unique and relevant content.<\/p>\n\n\n\n<p>We can use this data to develop a series of sentences with a significant amount of variance to get the data into a paragraph.<\/p>\n\n\n\n<p>Then we can use a natural language generation library (hello GPT-2!) to complete those paragraphs.<\/p>\n\n\n\n<p>Varying the length of the paragraphs and where that varied sentence falls in a given paragraph will yield a wealth of completely unique and relevant content that features our key data points.<\/p>\n\n\n\n<p>Ok, but before we get into how we do that, let\u2019s talk about how we got here.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How I Was Wrong About the Future of Text Generation<\/h2>\n\n\n\n<p>When I gave a talk on Machine Learning at SearchLove a few years back, I said that SEOs should avoid ML-driven content creation because it wasn\u2019t good enough.<\/p>\n\n\n\n<p>There have been well-known instances of this type of content going sideways when used to generate financial reports and sports recaps.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/did-a-robot-reporter-screw-up-aps-netflix-earnings-story-5f187e677c35a.png\" alt=\"Did a robot reporter screw up AP's Netflix earnings story\" 
class=\"wp-image-375292\"\/><\/figure>\n\n\n\n<p>There have also been companies such as Narrative Science and Automated Insights that have occupied this space for years.<\/p>\n\n\n\n<p>Those solutions heavily rely on templates with variation and the insertion of variables rather than the generation of truly unique text.<\/p>\n\n\n\n<p>At the time I gave that talk, I didn\u2019t think we\u2019d have anything viable for effective and distinct copy generation for at least another decade.<\/p>\n\n\n\n<p>I was wrong and I later corrected myself two years ago at TechSEOBoost. I\u2019d predicted that in the next five years that text generation would be ubiquitous.<\/p>\n\n\n\n<p>Well, the time to begin to capitalize on this technology for SEO is right now.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">An Incomplete &amp; Self-Referential History of Content Generation in SEO<\/h2>\n\n\n\n<p>Earlier in my SEO career, I didn\u2019t know much about the SEO community at large, so I reinvented the wheel a few times.<\/p>\n\n\n\n<p>I say this because the concept of content spinning is explained effectively by how I arrived at it.<\/p>\n\n\n\n<p>In 2007, there was a song by the rapper Canibus called \u201cPoet Laureate Infinity.\u201d<\/p>\n\n\n\n<p>The core premise of that song was that the emcee had recorded five versions of a 200 bar verse that you could use to form a new verse.<\/p>\n\n\n\n<p>The verses were multi-tracked on top of each other, but bar one in one version rhymed with bar two in another verse and so on.<\/p>\n\n\n\n<p>The song was presented with a visual interface on the Canibus.com website that allowed the listener to toggle which version of the verse was playing at any given moment.<\/p>\n\n\n\n<p>Effectively, you could have infinite permutations of the 200 bar verse you heard based on what you toggled.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" 
src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/poet-laureate-infinity-5f187eb65445f.png\" alt=\"Poet Laureate Infinity\" class=\"wp-image-375293\"\/><\/figure>\n\n\n\n<p>At the time that was released, I was working for a company that had a network of sites that really differed only by their location and company name. I thought to myself, we can apply the same concept to content.<\/p>\n\n\n\n<p>So, after using some math from my Discrete Structures textbook to build a business case, I worked with a copywriter on my team (what\u2019s up, Jacques!) to write five versions of a piece of content.<\/p>\n\n\n\n<p>He gave every paragraph the same number of sentences and made the sentences in each version work with the sentences in the next.<\/p>\n\n\n\n<p>I also instructed him to leave markers in it for locations and company names so that we could populate them into the content.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/array-1-5f187f2f78b38.jpg\" alt=\"an example of an array\" class=\"wp-image-375294\"\/><\/figure>\n\n\n\n<p>I put them all in a multi-dimensional array, randomized them, did a string replace on the markers, and generated copy.<\/p>\n\n\n\n<p>I also compared the edit distance to other versions of the content that were generated and only spit out a version that was at least 70% unique.<\/p>\n\n\n\n<p>This was 2007, before algorithm updates named after animals were on the scene, so the tactic worked like magic.<\/p>\n\n\n\n<p>Naturally, the SEO community is full of very smart people who\u2019d already arrived at this same conclusion without the help of a nerdy rapper.<\/p>\n\n\n\n<p>Some of the SEOs that I now call friends were doing more sophisticated implementations of this using Markov chains and variants on the word and phrase level.<\/p>\n\n\n\n<p>I\u2019d later come to know that there are 
programs that do this using what is called \u201cspintax\u201d as seen below.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/blog_images_code_quote-5f187f4c1f285.jpg\" alt=\"Blog_Images_Code_Quote\" class=\"wp-image-375295\"\/><\/figure>\n\n\n\n<p>This tactic is known as content spinning.<\/p>\n\n\n\n<p>Oftentimes, it is used on other people\u2019s articles to duplicate existing articles by just changing the words around.<\/p>\n\n\n\n<p>Content spinning as I described above is in the bucket of tactics that people reference when they talk about \u201cSEO content.\u201d<\/p>\n\n\n\n<p>It tends to be very keyword-rich, repetitive, and generally not for human consumption.<\/p>\n\n\n\n<p>In a lot of cases, the content does not even make sense because people don\u2019t check the output before publishing.<\/p>\n\n\n\n<p>Both \u201cSEO content\u201d and content spinning are not tactics I recommend.<\/p>\n\n\n\n<p>However, the reality is that there are many ecommerce sites out there that have seen significant organic search traffic improvements from madlibbed copy and duplicate content.<\/p>\n\n\n\n<p>Case in point, in the graph below, you\u2019re seeing the growth of visibility for a large ecommerce site.<\/p>\n\n\n\n<p>The green line represents all of its organic search visibility.&nbsp;The light blue line represents the visibility for a directory that houses nothing but duplicate content.<\/p>\n\n\n\n<p>Those lines get closer together after the deployment of a few sentences of madlibbed copy.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/example-ecommerce-site-visibility-5f187f76d92e8.png\" alt=\"Example ecommerce site visibility\" class=\"wp-image-375296\"\/><\/figure>\n\n\n\n<p>If you\u2019ve done SEO for large websites with high authority, you know 
that just following best practices means losses in opportunity.<\/p>\n\n\n\n<p>However, because this works so well, it\u2019s worth considering a much better approach that can yield content with more utility for the user.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Enter GPT-2<\/h2>\n\n\n\n<p>Natural Language Generation is what content spinning wanted to be when it grew up.<\/p>\n\n\n\n<p>There are still tools out there like&nbsp;<a href=\"https:\/\/rosaenlg.org\/rosaenlg\/1.1.0\/index.html\" target=\"_blank\" rel=\"noreferrer noopener\">RosaeNLG<\/a>&nbsp;that refer to themselves as natural language generation tools, but they are simply good content spinners.<\/p>\n\n\n\n<p>The Natural Language Processing, Understanding, and Generation fields have evolved dramatically in recent years on the backs of neural networks and the ubiquity of processing power.<\/p>\n\n\n\n<p>If you\u2019ve paid attention to Google Search\u2019s announcements regarding natural language in the past couple of years, you may have heard of&nbsp;<a href=\"https:\/\/huggingface.co\/blog\/bert-101\">BERT<\/a>.<\/p>\n\n\n\n<p>BERT stands for Bidirectional Encoder Representations from Transformers.<\/p>\n\n\n\n<p>The keyword here is \u201ctransformer.\u201d<\/p>\n\n\n\n<p>To oversimplify, Transformer technology is built on the idea of learning from content and using that learning to estimate which word is most likely to come next, based on the previous word or the previous series of words.<\/p>\n\n\n\n<p>This is the technology behind the predictive text in your Gmail and in your text messages, if you\u2019re on Android.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/the-transformer-model-5f187fc493966.png\" alt=\"The Transformer model\" class=\"wp-image-375297\" style=\"width:658px;height:800px\"\/><\/figure>\n\n\n\n<p>GPT has revolutionized the text 
generation space by using that concept to create copy.<\/p>\n\n\n\n<p>GPT-2 is the second generation of Elon Musk\u2019s OpenAI team\u2019s Generative Pre-trained Transformer model, and it is capable of writing copy that you will find very difficult to distinguish from copy written by a human.<\/p>\n\n\n\n<p>When they first announced GPT-2, it was in the wake of the fake news outbreak, and they said it was too dangerous to release to the public. (Note: the antagonist in Runtime references this directly).<\/p>\n\n\n\n<p>Ultimately, they did release it, but at first only in smaller versions than the full model they\u2019d trained.<\/p>\n\n\n\n<p><a href=\"https:\/\/openai.com\/blog\/gpt-2-1-5b-release\/\" target=\"_blank\" rel=\"noreferrer noopener\">Rather than 1.5 billion parameters<\/a>, they originally released it with a maximum of 774 million parameters, and only later published the full 1.5-billion-parameter model.<\/p>\n\n\n\n<p>Even the smaller pre-trained models are very good at generating copy, and you can fine-tune the model based on any data you want to provide it.<\/p>\n\n\n\n<p>In the film, Casey fine-tunes the model by using scraped copy from a competitor\u2019s website.<\/p>\n\n\n\n<p>If your site has a substantial amount of copy, you should train it on that.<\/p>\n\n\n\n<p>In practice, you give GPT-2 a prompt and some parameters that tell it how much text to generate.<\/p>\n\n\n\n<p>With the tactic that I\u2019m describing, I\u2019m recommending that you use content spinning as your mechanism for feeding it the prompt.<\/p>\n\n\n\n<p>Effectively, you get your data into a sentence using your content spinner and then feed that spun content into GPT-2 as a prompt.<\/p>\n\n\n\n<p>From there you\u2019d generate&nbsp;<strong>n<\/strong>&nbsp;words, then pull your next sentence, and keep doing that until you have however much copy you want.<\/p>\n\n\n\n<p>It is important to note that GPT and its successors are not the only models for this type of text generation.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Google has one 
called\u00a0<a href=\"https:\/\/ai.googleblog.com\/2020\/02\/exploring-transfer-learning-with-t5.html\" target=\"_blank\" rel=\"noreferrer noopener\">T5<\/a>\u00a0that was trained on a cleaned version of the Common Crawl.<\/li>\n\n\n\n<li>Salesforce has one called\u00a0<a href=\"https:\/\/blog.einstein.ai\/introducing-a-conditional-transformer-language-model-for-controllable-generation\/\" target=\"_blank\" rel=\"noreferrer noopener\">CTRL<\/a>\u00a0with 1.6 billion parameters.<\/li>\n\n\n\n<li>Microsoft has one called\u00a0<a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/turing-nlg-a-17-billion-parameter-language-model-by-microsoft\/\" target=\"_blank\" rel=\"noreferrer noopener\">TuringNLG<\/a>\u00a0with 17 billion parameters.<\/li>\n<\/ul>\n\n\n\n<p>So, it\u2019s not just Elon Musk that is about to cause an explosion in machine-created content that floods the SERPs.<\/p>\n\n\n\n<p>GPT-2 perhaps has the simplest implementations that I\u2019ve come across, likely due to its popularity.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/text-generation-models-5f187ff012b3a.png\" alt=\"text generation models\" class=\"wp-image-375298\"\/><\/figure>\n\n\n\n<p>And while I\u2019m on the subject of different models, I\u2019d like to give a shoutout to&nbsp;<a href=\"https:\/\/www.ranksense.com\/author\/admin\/\">Hamlet Batista<\/a>,&nbsp;who has been leading the thinking in our space around practical applications of NLP and NLG.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How to Find the Data That Populates a Page<\/h2>\n\n\n\n<p>In the film, Casey identifies the endpoint and HTTP request from the Network tab in Chrome that contains the data points used to build the category page.<\/p>\n\n\n\n<p>You shouldn\u2019t have to do this for your own site because you should have direct access to the data model.<\/p>\n\n\n\n<p>For illustrative 
purposes, you\u2019d typically find that data by limiting your Network transactions to XHR.<\/p>\n\n\n\n<p>Once you\u2019ve rifled through the AJAX requests and found the ones with all the data, right-click the request in Chrome to pull the details.<\/p>\n\n\n\n<p>Here\u2019s an example from Nike.com. This is their Men\u2019s Basketball page.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/nike-men---s-basketball-page-5f18801d826d7.png\" alt=\"Nike Men\u2019s Basketball page\" class=\"wp-image-375299\"\/><\/figure>\n\n\n\n<p>Nike.com is built with the Single Page Application framework React.&nbsp; The site has an API endpoint that lives at https:\/\/api.nike.com\/cic\/browse\/v1.<\/p>\n\n\n\n<p>That endpoint is used to populate products on pages.<\/p>\n\n\n\n<p>Within Chrome, you can get the exact details of the HTTP request as a fetch, NodeJS, or cURL command used to ping that endpoint by right-clicking the request.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/nike-products-endpoint-curl-5f18804b5ca2c.gif\" alt=\"nike-products-endpoint-curl\" class=\"wp-image-375300\"\/><\/figure>\n\n\n\n<p>In this example the cURL request looks like this:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/curl-request-example-5f18806b473aa.png\" alt=\"cURL request example\" class=\"wp-image-375301\"\/><\/figure>\n\n\n\n<p>Granted, some sites are sophisticated in that they don\u2019t allow you to easily access these endpoints from a command line and Nike is one such site.<\/p>\n\n\n\n<p>So, you\u2019d likely want to use Puppeteer to load the page,&nbsp;<a href=\"https:\/\/www.npmjs.com\/package\/puppeteer-request-spy\" target=\"_blank\" rel=\"noreferrer 
noopener\">intercept and save the XHR responses<\/a>&nbsp;from api.nike.com.<\/p>\n\n\n\n<p>The JSON payload you\u2019re looking for tends to have several nested nodes.<\/p>\n\n\n\n<p>You can see this in the Network tab by going through the individual requests in the waterfall.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/nike-products-endpoint-5f18808331749.gif\" alt=\"nike-products-endpoint\" class=\"wp-image-375302\"\/><\/figure>\n\n\n\n<p>If we zoom in a bit, we can see the features in the data model that Nike makes available in the construction of this page.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/nike-data-model-example-5f1880ac96238.png\" alt=\"Nike data model example\" class=\"wp-image-375303\"\/><\/figure>\n\n\n\n<p>Consider in this case that we have the following available to us:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Color Description<\/strong>: The high-level colorway of the sneaker.<\/li>\n\n\n\n<li><strong>inStock, isJersey, isMemberExclusive, isNBA, isNFL, isSustainable:<\/strong>\u00a0Boolean features to indicate key characteristics of the item.<\/li>\n\n\n\n<li><strong>Price<\/strong>: In case you know someone who works for Nike, you now know the employee price.<\/li>\n\n\n\n<li><strong>Subtitle<\/strong>: Type of the item.<\/li>\n\n\n\n<li><strong>Title<\/strong>: The name of the item.<\/li>\n<\/ul>\n\n\n\n<p>Now you have the endpoint. 
You can take a look at a series of different pages to better understand the parameters in the request so you can extract what you need.<\/p>\n\n\n\n<p>Review the data thoroughly and think through which data points you can use directly and which data points you can derive from those.<\/p>\n\n\n\n<p>For instance, if you have prices for 10 items, you can derive an average price from those data points. If you have in-stock numbers, you can potentially derive popularity.<\/p>\n\n\n\n<p>In the example above, we could derive an understanding of the sustainability of the sneakers and put together some language related to that.<\/p>\n\n\n\n<p>To simplify, these variables could be used to build a sentence like:<\/p>\n\n\n\n<p><em>Looking for the best Basketball Shoe? We\u2019ve got the Kyrie 6 Trophies.<\/em><\/p>\n\n\n\n<p>The template for that sentence becomes:<\/p>\n\n\n\n<p><em>Looking for the best {subtitle}? We\u2019ve got the {title}.<\/em><\/p>\n\n\n\n<p>From that sentence GPT-2 could generate this paragraph:<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/gpt-2-sample-paragraph-5f1880e2479d5.png\" alt=\"GPT-2 sample paragraph\" class=\"wp-image-375304\" style=\"width:671px;height:519px\"\/><\/figure>\n\n\n\n<p>You can play around with this idea on the&nbsp;<a href=\"https:\/\/transformer.huggingface.co\/doc\/gpt2-large\" target=\"_blank\" rel=\"noreferrer noopener\">WriteWithTransformer.com<\/a>&nbsp;site by the HuggingFace team.<\/p>\n\n\n\n<p>Keep in mind that this site allows you to generate copy using the GPT-2 large model (774 million parameters) rather than the 355-million-parameter version being used in the examples below.<\/p>\n\n\n\n<p>If you get stuck on identifying variants of the phrases you\u2019re preparing, you can use data from&nbsp;<a href=\"https:\/\/paraphrase.org\/\" target=\"_blank\" rel=\"noreferrer noopener\">Paraphrase.org<\/a>.<\/p>\n\n\n\n<figure 
class=\"wp-block-image aligncenter is-resized\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/paraphrase.org-sample-data-5f18815f86c3c.png\" alt=\"Paraphrase.org sample data\" class=\"wp-image-375305\" style=\"width:800px;height:394px\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">How to Generate Copy for Category Pages<\/h2>\n\n\n\n<p>Now for the moment you\u2019ve been waiting for.<\/p>\n\n\n\n<p>Here\u2019s a&nbsp;<a href=\"https:\/\/colab.research.google.com\/drive\/1co0VdAhRTem_ZBfDXK3q3tGrogTzSkEf?usp=sharing\" target=\"_blank\" rel=\"noreferrer noopener\">Colab notebook<\/a>&nbsp;using&nbsp;<a href=\"https:\/\/github.com\/minimaxir\/gpt-2-simple\" target=\"_blank\" rel=\"noreferrer noopener\">GPT-2-Simple<\/a>&nbsp;to illustrate the text generation concept.<\/p>\n\n\n\n<p>To paraphrase Britney Muller, it\u2019s all just Control+Enter.<\/p>\n\n\n\n<p>Here\u2019s the step-by-step explanation of what needs to be done to get to this point:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">1. Review Your Site\u2019s Data Model<\/h3>\n\n\n\n<p>Looking at the data that is available will drive how you develop the data-injected blocks.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">2. Generate a Series of Sentences That Incorporate Those Data Points<\/h3>\n\n\n\n<p>Identify phrase variants from Paraphrase.org.<\/p>\n\n\n\n<p>This part is optional. If you run out of phrases or don\u2019t have a copywriter you can work with, you can download the data from the Paraphrase.org project, break your own phrases down into n-grams, and look those up to get a list of alternative phrases.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">3. 
Collect\/Scrape as Much Text Content as You Can That Is Relevant to Your Space<\/h3>\n\n\n\n<p>If your site has a wealth of copy, you can pull that.&nbsp;Otherwise, pull from your competitors.<\/p>\n\n\n\n<p>In this case, I pulled from the #main &gt; div &gt; section:nth-child(4) element on Foot Locker\u2019s category pages and fed it to the model.<\/p>\n\n\n\n<p>Save everything you scrape into one text file, with an end-of-page marker at the end of each page. In this case, I\u2019ve used \u201c&lt;|endoftext|&gt;\u201d.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>4. Fine-Tune a GPT-2 Model<\/strong><\/h3>\n\n\n\n<p>Feed the text file into GPT-2 to build a model that you can use going forward.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">5. Populate the Sentences With Data From the Data Model<\/h3>\n\n\n\n<p>Generate your individual sentences through content spinning using a library like wink-nlp-utils.<\/p>\n\n\n\n<p>It has a function called composeCorpus that can be used like any other content spinning tool.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">6. Use Your Data-Driven Sentences as Prompts<\/h3>\n\n\n\n<p>Each of those sentences is then fed to GPT-2 as a prompt to generate as much copy as you\u2019d like.<\/p>\n\n\n\n<p>You can vary the length of the generated content and the placement.<\/p>\n\n\n\n<p>For instance, you could place one sentence and ask for 50 words, or place two sentences and then prompt it to give you 200 words.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/use-your-data-driven-sentences-as-prompts-5f188196de2b5.png\" alt=\"Use your data driven sentences as prompts\" class=\"wp-image-375306\"\/><\/figure>\n\n\n\n<p>Change the copy in red to see what happens when you add your own prompt.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>7. 
Review &amp; Edit<\/strong><\/h3>\n\n\n\n<p>Once you\u2019ve got your copy, make sure it makes sense and edit whatever doesn\u2019t, or have the model generate another option.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>8. Deploy<\/strong><\/h3>\n\n\n\n<p>Once the copy meets your brand voice, tone, and user expectations, put it on the site and watch your keyword-relevance soar.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Can You Do This With Editorial Content?<\/h2>\n\n\n\n<p>This was already examined by&nbsp;<a href=\"https:\/\/www.vox.com\/future-perfect\/2019\/8\/30\/20840194\/ai-art-fiction-writing-language-gpt-2\" target=\"_blank\" rel=\"noreferrer noopener\">Vox<\/a>&nbsp;and the&nbsp;<a href=\"https:\/\/www.newyorker.com\/magazine\/2019\/10\/14\/can-a-machine-learn-to-write-for-the-new-yorker\" target=\"_blank\" rel=\"noreferrer noopener\">New Yorker<\/a>, but I was curious about it as I started writing this post.<\/p>\n\n\n\n<p>So, I scraped all the blog posts that I\u2019ve written about digital marketing and used them to fine-tune a GPT-2 model using 355m parameters.<\/p>\n\n\n\n<p>I took some of the beginnings of sentences that I wrote in this post and looked to see what it might give me.<\/p>\n\n\n\n<p>Here are some examples of what it spit out:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-style-default is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em><strong>When I gave a talk on Machine Learning at SearchLove a few years back, I said that SEOs should avoid ML-driven content creation because it wasn\u2019t good enough.<\/strong>\u00a0I also said at the time that SEOs should avoid using models that didn\u2019t align with the viewpoints of the people that the content is for.<\/em><\/p>\n\n\n\n<p><em>I was obviously incredibly misguided, and very much the definition of an optimizer lies within those words. 
However, I also fundamentally believe that the results of using those words are that we waste time and Google is right to weed out the bad actors.<\/em><\/p>\n\n\n\n<p><em>I talk at length about how I arrived at this position, why I changed my mind and why I changed my recommendation at the end of June. I also share a secret weapon that I use in the incubation phase of every New SEO project.<\/em><\/p>\n\n\n\n<p><em>It is precisely this shared interest in content that makes the two of us unique and why so many people tell us they want to help us grow together. I imagine there are plenty of opportunities for us to share more about what we come up with together, or at the very least, have someone print out our whiteboard based presentations.<\/em><\/p>\n<\/blockquote>\n\n\n\n<p>This one was especially funny to me:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-style-default is-layout-flow wp-block-quote-is-layout-flow\">\n<p><em><strong>Earlier in my SEO career, I didn\u2019t know much about the SEO community at large, so I reinvented the wheel a few times.<\/strong>\u00a0I said I wanted to be a renowned speaker, and I did. I made a lot of friends while I was doing it. It was a great experience; I learned a lot, and many people I now interact with everyday are incredibly talented.<\/em><\/p>\n\n\n\n<p><em>But it was the internet that got me. It was through one of my very first interactions that I learned about the Social Side of SEO.<\/em><\/p>\n<\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\">Getting This Into Production<\/h2>\n\n\n\n<p>Another point I make in the film is that Python is great, but JavaScript is much easier to get into production.<\/p>\n\n\n\n<p>This is not to say that Python is impossible to get into production. 
Rather, it is reflective of how all tech stacks for websites natively support JavaScript.<\/p>\n\n\n\n<p>If the site is a Single Page Application, you can be sure that NodeJS is configured and every modern browser runs JavaScript.<\/p>\n\n\n\n<p>However, you can\u2019t be sure that the server has Python installed. Even if it does, you\u2019d need to build an API on top of getting the libraries to work.<\/p>\n\n\n\n<p>That said, a pipeline that is relatively easy to deploy would be spaCyJS with the Hugging Face transformer add-on.<\/p>\n\n\n\n<p>This gives us easy access to GPT-2. Effectively, spaCyJS is a NodeJS API for spaCy\u2019s primary Python library. Using that, it is not difficult to use this pipeline to build an API endpoint.<\/p>\n\n\n\n<p>However, to keep this as easy as possible, we can create a serverless&nbsp;<a href=\"https:\/\/github.com\/minimaxir\/gpt-2-cloud-run\" target=\"_blank\" rel=\"noreferrer noopener\">API using Google Cloud Run<\/a>.<\/p>\n\n\n\n<p>Once you\u2019ve configured the API, you can send text as prompts and get back generated copy.<\/p>\n\n\n\n<p>You\u2019d definitely want to pre-process and edit this content rather than populating the copy directly onto a site on the fly.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Some Caveats<\/h2>\n\n\n\n<p>The most important thing to know about GPT-2 is that it is trained on web pages, so the text that is generated is not always factually accurate.<\/p>\n\n\n\n<p>The more parameters you use and the bigger your training set is, the better it becomes.<\/p>\n\n\n\n<p>Here are a few more details that the devil may be hiding out in:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>This Type of Content Can Be Detected<\/strong><\/h3>\n\n\n\n<p>When OpenAI first developed the GPT technology, they said they were not going to release it because it was too dangerous.<\/p>\n\n\n\n<p>In typical Elon Musk fashion, they have walked that back, recently developing a third generation of the software they are releasing 
commercially.<\/p>\n\n\n\n<p>That said, there are also a series of tools available to identify generated content. Here are a few examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><a href=\"https:\/\/huggingface.co\/openai-detector\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/huggingface.co\/openai-detector\/<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/github.com\/openai\/gpt-2-output-dataset\/tree\/master\/detector\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/github.com\/openai\/gpt-2-output-dataset\/tree\/master\/detector<\/a><\/li>\n\n\n\n<li><a href=\"https:\/\/gltr.io\/\" target=\"_blank\" rel=\"noreferrer noopener\">https:\/\/gltr.io\/<\/a><\/li>\n<\/ul>\n\n\n\n<p>As these mechanisms for content generation become more popular, it would only make sense for search engines to consider using these detection libraries as part of content classifiers.<\/p>\n\n\n\n<p>I imagine it being added to subsequent iterations of the Panda algorithm if Google determines text generation to be a problem.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Editorial Teams Are Not Dead<\/h3>\n\n\n\n<p>Natural Language Generation does not supplant editorial teams \u2013 yet.<\/p>\n\n\n\n<p>As of now, you cannot generate copy and not expect to have a writing professional review it. 
GPT-2 focuses on word probabilities rather than identifying brand voice and tone.<\/p>\n\n\n\n<p>So, you should use it as a first draft for editorial staff to review and adjust before deploying live.<\/p>\n\n\n\n<p>Although, it is worth knowing that Google has indicated that its algorithms don\u2019t solve for accuracy.<\/p>\n\n\n\n<p>It is possible that the features of inaccurate content could satisfy processing and ranking algorithms.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">You Should Not Deploy This Ad Nauseam<\/h3>\n\n\n\n<p>Fight the urge to generate hundreds of millions of pages.<\/p>\n\n\n\n<p>Content blocks such as these are best deployed through A\/B testing to make sure they truly have a positive impact.<\/p>\n\n\n\n<p>Although, again, there are sites that have seen meaningful traffic impacts from three sentences of Madlib copy, so who am I to tell you not to blast the index with 9 billion pages of original, relevant, and potentially valuable content?<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Tools in the NLG Space<\/h2>\n\n\n\n<p>As mentioned above, the key players in the content generation space have historically been Automated Insights and Narrative Science.<\/p>\n\n\n\n<p>At the time of this writing, those companies are not doing what I\u2019ve described in this post.<\/p>\n\n\n\n<p>In fact, folks at Narrative Science are&nbsp;<a href=\"https:\/\/www.zdnet.com\/article\/we-wont-have-the-sexiest-ai-but-everything-it-says-is-true-says-narrative-science\/\" target=\"_blank\" rel=\"noreferrer noopener\">actively against<\/a>&nbsp;creating on top of technology like GPT-2 because it does not generate factual content.<\/p>\n\n\n\n<p>That\u2019s certainly not a reason to not use the technology.<\/p>\n\n\n\n<p>So here are a couple of other companies that I\u2019m aware of that are using tech similar to what I\u2019ve described here.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong><a href=\"https:\/\/inferkit.com\/\" target=\"_blank\" rel=\"noreferrer 
noopener\">InferKit<\/a>:<\/strong>\u00a0The person behind InferKit previously had a demo site called TalkToTransformer, which allowed you to see what GPT-2 would generate based on the prompt that you provided it. He recently took that site down and built a SaaS API that does exactly what I\u2019ve demonstrated above. InferKit allows you to use the core GPT-2 model or fine-tune it using training content that you provide.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/openai.com\/blog\/openai-api\/\" target=\"_blank\" rel=\"noreferrer noopener\">OpenAI<\/a>:<\/strong>\u00a0Elon Musk\u2019s team recently indicated that they will be rolling out a commercial API for GPT-3 soon.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/blog.marketmuse.com\/how-marketmuse-uses-ai-to-enhance-your-content\/\" target=\"_blank\" rel=\"noreferrer noopener\">MarketMuse First Draft<\/a>:<\/strong>\u00a0Closer to home, orbiting the SEO space is MarketMuse. The company released their First Draft product, which uses NLG to generate 300-500 word drafts, presumably based on the content briefs you create.<\/li>\n<\/ul>\n\n\n\n<p>You can certainly expect plenty more companies to pop up in the near future to generate this type of content.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">GPT-3 Is Here &amp; It Is&nbsp;<em>Amazing<\/em><\/h2>\n\n\n\n<p>The latest version, GPT-3, boasts 175 billion parameters, where GPT-2 has only 1.5 billion.<\/p>\n\n\n\n<p>As you might imagine, with a model that much larger and a much bigger library of text to learn from, it\u2019s even better than what we\u2019ve just played with.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter is-resized\"><img decoding=\"async\" src=\"https:\/\/cdn.searchenginejournal.com\/wp-content\/uploads\/2020\/07\/gpt-3--5f1881cf5db55.png\" alt=\"GPT-3\" class=\"wp-image-375307\" style=\"width:346px;height:316px\"\/><\/figure>\n\n\n\n<p>In fact, this content generation use case for SEO is actually a bit pedestrian compared to what people are 
figuring out now with GPT-3.<\/p>\n\n\n\n<p>As of late, my timeline is full of developers who have gotten early access to GPT-3; they are using it to not just write copy, but to generate code and build search engines.<\/p>\n\n\n\n<figure class=\"wp-block-embed is-type-rich is-provider-twitter wp-block-embed-twitter\"><div class=\"wp-block-embed__wrapper\">\n<blockquote class=\"twitter-tweet\" data-width=\"550\" data-dnt=\"true\"><p lang=\"en\" dir=\"ltr\">I made a fully functioning search engine on top of GPT3. <br><br>For any arbitrary query, it returns the exact answer AND the corresponding URL.<br><br>Look at the entire video. It&#39;s MIND BLOWINGLY good. <br><br>cc: <a href=\"https:\/\/twitter.com\/gdb?ref_src=twsrc%5Etfw\">@gdb<\/a> <a href=\"https:\/\/twitter.com\/npew?ref_src=twsrc%5Etfw\">@npew<\/a> <a href=\"https:\/\/twitter.com\/gwern?ref_src=twsrc%5Etfw\">@gwern<\/a> <a href=\"https:\/\/t.co\/9ismj62w6l\">pic.twitter.com\/9ismj62w6l<\/a><\/p>&mdash; Paras Chopra (@paraschopra) <a href=\"https:\/\/twitter.com\/paraschopra\/status\/1284801028676653060?ref_src=twsrc%5Etfw\">July 19, 2020<\/a><\/blockquote><script async src=\"https:\/\/platform.twitter.com\/widgets.js\" charset=\"utf-8\"><\/script>\n<\/div><\/figure>\n\n\n\n<p>The more I think about it, the use case for propaganda generation is alarming considering the upcoming election, but that\u2019s a problem outside my wheelhouse.<\/p>\n\n\n\n<p>I expect that there will be many creative use cases that will change the web in the coming months.<\/p>\n\n\n\n<p>Suffice to say, if you aren\u2019t considering using this tech, your competitors are and I\u2019m encouraging them to.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">This Is the Future of This Type of Content<\/h2>\n\n\n\n<p>Over the past two years, you may have heard me talk a lot about what I call \u201c<a href=\"https:\/\/www.slideshare.net\/ipullrank\/technical-content-optimization-149169748\" target=\"_blank\" rel=\"noreferrer noopener\">Technical 
Content Optimization<\/a>.\u201d<\/p>\n\n\n\n<p>We have leveraged text analysis concepts such as Latent Dirichlet Allocation and Named Entity Recognition to inform how we optimize sites and create content.<\/p>\n\n\n\n<p>I believe there is a world wherein Attention and Transformer technologies are used against a SERP or a corpus from the Common Crawl and a given site\u2019s data model to generate perfectly optimized content for a query.<\/p>\n\n\n\n<p>I\u2019m imagining a world where you could ingest the text content on every page that ranks for your target keyword and use it to train your model.<\/p>\n\n\n\n<p>Then you could give it a prompt and it would write content that scores perfectly.<\/p>\n\n\n\n<p>At that point, though, it\u2019s all algorithms vs algorithms and it will be back on us humans to stand out creatively.<\/p>\n\n\n\n<p>It won\u2019t be about whether or not we have the resources to create content.<\/p>\n\n\n\n<p>It will be about how we generate content that penetrates filter bubbles.<\/p>\n\n\n\n<p>The real question is, are you ready?<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/ipullrank.com\/resources\/guides-ebooks\/machine-learning-guide\"><img fetchpriority=\"high\" decoding=\"async\" width=\"801\" height=\"301\" src=\"https:\/\/ipullrank.com\/wp-content\/uploads\/2021\/07\/Ebook-Blog-Tile-blue.jpg\" alt=\"\" class=\"wp-image-13756\" srcset=\"https:\/\/ipullrank.com\/wp-content\/uploads\/2021\/07\/Ebook-Blog-Tile-blue.jpg 801w, https:\/\/ipullrank.com\/wp-content\/uploads\/2021\/07\/Ebook-Blog-Tile-blue-300x113.jpg 300w, https:\/\/ipullrank.com\/wp-content\/uploads\/2021\/07\/Ebook-Blog-Tile-blue-768x289.jpg 768w\" sizes=\"(max-width: 801px) 100vw, 801px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>My 2020 MozCon presentation was a short film created by the iPullRank team. 
I\u2019m not going to spoil it because I\u2019d rather you watched it, but the movie is one part \u201cBatman: The Animated Series\u201d and one part \u201cMr. Robot\u201d presented in a mixed-media format. If you haven\u2019t seen it, we\u2019ve just released a Director\u2019s [&hellip;]<\/p>\n","protected":false},"author":3,"featured_media":19466,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"content-type":"","footnotes":""},"categories":[1,260,26],"tags":[],"diagnosis-deliverable":[224],"class_list":["post-15198","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized","category-relevance-engineering","category-seo","diagnosis-deliverable-keyword-portfolio"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2 - iPullRank<\/title>\n<meta name=\"description\" content=\"Interested in scalable text generation? Learn how to programmatically generate copy for ecommerce category pages using a transformer-based language model.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2 - iPullRank\" \/>\n<meta property=\"og:description\" content=\"Interested in scalable text generation? 
Learn how to programmatically generate copy for ecommerce category pages using a transformer-based language model.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2\" \/>\n<meta property=\"og:site_name\" content=\"iPullRank\" \/>\n<meta property=\"article:published_time\" content=\"2022-09-06T19:59:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-07-31T20:07:21+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png\" \/>\n\t<meta property=\"og:image:width\" content=\"699\" \/>\n\t<meta property=\"og:image:height\" content=\"400\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Mike King\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@ipullrankagency\" \/>\n<meta name=\"twitter:site\" content=\"@ipullrankagency\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Mike King\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"22 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#article\",\"isPartOf\":{\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2\"},\"author\":{\"name\":\"Mike King\",\"@id\":\"https:\/\/ipullrank.com\/#\/schema\/person\/82831a4b9f4b8be81d5a9bfed4cb9b20\"},\"headline\":\"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2\",\"datePublished\":\"2022-09-06T19:59:29+00:00\",\"dateModified\":\"2025-07-31T20:07:21+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2\"},\"wordCount\":4552,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/ipullrank.com\/#organization\"},\"image\":{\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png\",\"articleSection\":[\"Content\",\"Relevance Engineering\",\"SEO\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2\",\"url\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2\",\"name\":\"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2 - 
iPullRank\",\"isPartOf\":{\"@id\":\"https:\/\/ipullrank.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#primaryimage\"},\"image\":{\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#primaryimage\"},\"thumbnailUrl\":\"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png\",\"datePublished\":\"2022-09-06T19:59:29+00:00\",\"dateModified\":\"2025-07-31T20:07:21+00:00\",\"description\":\"Interested in scalable text generation? Learn how to programmatically generate copy for ecommerce category pages using a transformer-based language model.\",\"breadcrumb\":{\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#primaryimage\",\"url\":\"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png\",\"contentUrl\":\"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png\",\"width\":699,\"height\":400},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ipullrank.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ipullrank.com\/#website\",\"url\":\"https:\/\/ipullrank.com\/\",\"name\":\"iPullRank\",\"description\":\"Digital Marketing Agency in 
NYC\",\"publisher\":{\"@id\":\"https:\/\/ipullrank.com\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ipullrank.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/ipullrank.com\/#organization\",\"name\":\"iPullRank\",\"url\":\"https:\/\/ipullrank.com\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/ipullrank.com\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/ipullrank.com\/wp-content\/uploads\/2025\/03\/Logo_-_Layers.svg\",\"contentUrl\":\"https:\/\/ipullrank.com\/wp-content\/uploads\/2025\/03\/Logo_-_Layers.svg\",\"width\":177,\"height\":36,\"caption\":\"iPullRank\"},\"image\":{\"@id\":\"https:\/\/ipullrank.com\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/x.com\/ipullrankagency\",\"https:\/\/www.linkedin.com\/company\/ipullrank\/\",\"https:\/\/www.youtube.com\/@iPullRankSEO\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/ipullrank.com\/#\/schema\/person\/82831a4b9f4b8be81d5a9bfed4cb9b20\",\"name\":\"Mike King\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/ipullrank.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/d57e62b40de6db99771f85cbce3ab1d29071b8cd0d643c8dcf2fc55818e1769f?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/d57e62b40de6db99771f85cbce3ab1d29071b8cd0d643c8dcf2fc55818e1769f?s=96&d=mm&r=g\",\"caption\":\"Mike King\"},\"description\":\"Mike King is the Founder and CEO of iPullRank. Deeply technical and highly creative, Mike has helped generate over $4B in revenue for his clients. 
A rapper and recovering big agency guy, Mike's greatest clients are his two daughters: Zora and Glory.\",\"url\":\"https:\/\/ipullrank.com\/author\/ipullrank\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2 - iPullRank","description":"Interested in scalable text generation? Learn how to programmatically generate copy for ecommerce category pages using a transformer-based language model.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2","og_locale":"en_US","og_type":"article","og_title":"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2 - iPullRank","og_description":"Interested in scalable text generation? Learn how to programmatically generate copy for ecommerce category pages using a transformer-based language model.","og_url":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2","og_site_name":"iPullRank","article_published_time":"2022-09-06T19:59:29+00:00","article_modified_time":"2025-07-31T20:07:21+00:00","og_image":[{"width":699,"height":400,"url":"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png","type":"image\/png"}],"author":"Mike King","twitter_card":"summary_large_image","twitter_creator":"@ipullrankagency","twitter_site":"@ipullrankagency","twitter_misc":{"Written by":"Mike King","Est. 
reading time":"22 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#article","isPartOf":{"@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2"},"author":{"name":"Mike King","@id":"https:\/\/ipullrank.com\/#\/schema\/person\/82831a4b9f4b8be81d5a9bfed4cb9b20"},"headline":"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2","datePublished":"2022-09-06T19:59:29+00:00","dateModified":"2025-07-31T20:07:21+00:00","mainEntityOfPage":{"@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2"},"wordCount":4552,"commentCount":0,"publisher":{"@id":"https:\/\/ipullrank.com\/#organization"},"image":{"@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#primaryimage"},"thumbnailUrl":"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png","articleSection":["Content","Relevance Engineering","SEO"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#respond"]}]},{"@type":"WebPage","@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2","url":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2","name":"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2 - 
iPullRank","isPartOf":{"@id":"https:\/\/ipullrank.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#primaryimage"},"image":{"@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#primaryimage"},"thumbnailUrl":"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png","datePublished":"2022-09-06T19:59:29+00:00","dateModified":"2025-07-31T20:07:21+00:00","description":"Interested in scalable text generation? Learn how to programmatically generate copy for ecommerce category pages using a transformer-based language model.","breadcrumb":{"@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#primaryimage","url":"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png","contentUrl":"https:\/\/ipullrank.com\/wp-content\/uploads\/2022\/09\/Frame-1597879965.png","width":699,"height":400},{"@type":"BreadcrumbList","@id":"https:\/\/ipullrank.com\/generate-data-driven-copy-for-ecommerce-category-pages-with-gpt-2#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ipullrank.com\/"},{"@type":"ListItem","position":2,"name":"How to Generate Data-Driven Copy for Ecommerce Category Pages with GPT-2"}]},{"@type":"WebSite","@id":"https:\/\/ipullrank.com\/#website","url":"https:\/\/ipullrank.com\/","name":"iPullRank","description":"Digital Marketing Agency in 
NYC","publisher":{"@id":"https:\/\/ipullrank.com\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ipullrank.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/ipullrank.com\/#organization","name":"iPullRank","url":"https:\/\/ipullrank.com\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ipullrank.com\/#\/schema\/logo\/image\/","url":"https:\/\/ipullrank.com\/wp-content\/uploads\/2025\/03\/Logo_-_Layers.svg","contentUrl":"https:\/\/ipullrank.com\/wp-content\/uploads\/2025\/03\/Logo_-_Layers.svg","width":177,"height":36,"caption":"iPullRank"},"image":{"@id":"https:\/\/ipullrank.com\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/x.com\/ipullrankagency","https:\/\/www.linkedin.com\/company\/ipullrank\/","https:\/\/www.youtube.com\/@iPullRankSEO"]},{"@type":"Person","@id":"https:\/\/ipullrank.com\/#\/schema\/person\/82831a4b9f4b8be81d5a9bfed4cb9b20","name":"Mike King","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/ipullrank.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/d57e62b40de6db99771f85cbce3ab1d29071b8cd0d643c8dcf2fc55818e1769f?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/d57e62b40de6db99771f85cbce3ab1d29071b8cd0d643c8dcf2fc55818e1769f?s=96&d=mm&r=g","caption":"Mike King"},"description":"Mike King is the Founder and CEO of iPullRank. Deeply technical and highly creative, Mike has helped generate over $4B in revenue for his clients. 
A rapper and recovering big agency guy, Mike's greatest clients are his two daughters: Zora and Glory.","url":"https:\/\/ipullrank.com\/author\/ipullrank"}]}},"_links":{"self":[{"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/posts\/15198","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/comments?post=15198"}],"version-history":[{"count":0,"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/posts\/15198\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/media\/19466"}],"wp:attachment":[{"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/media?parent=15198"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/categories?post=15198"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/tags?post=15198"},{"taxonomy":"diagnosis-deliverable","embeddable":true,"href":"https:\/\/ipullrank.com\/wp-json\/wp\/v2\/diagnosis-deliverable?post=15198"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}