00:00:00
In a previous video I showed you how to scrape websites using ChatGPT by giving simple prompts, for example "scrape book titles from books.toscrape.com". But these simple prompts don't always help us
00:00:12
scrape websites with ChatGPT. In this video I'm going to show you how to scrape any website using ChatGPT: not the version that you're seeing here, but this one, the Playground version, which is way
00:00:24
faster than the basic version and doesn't have limitations. In this video we're going to scrape Amazon and Twitter by creating the right prompt, so let's get started. Okay, to scrape websites like Amazon and Twitter we
00:00:36
cannot just copy and paste the link and say "scrape this website", for example, "with Python". We cannot do that because ChatGPT is not able, at least right now it's
00:00:48
not able, to create the code to scrape this website. What we have to do is give some instructions so it understands how we want to scrape the website. To explain this much better, I want
00:01:01
to use this website, which has very basic HTML code, to show you how this works. So first I'm going to delete all of this. To do this you don't have to know so
00:01:13
much Python or JavaScript, but you do need some knowledge of HTML. So let's do this. First, let's go to this website, subslikescript.com/movies, and what we
00:01:26
have to do is right-click and click on Inspect. So we click on Inspect and then we have the developer tools with the HTML code. What we have to do first is identify the element that
00:01:39
represents the text that we want to extract. For example, here I want to extract the movie titles in this list, so I'm gonna select this first element and we get this. So we want
00:01:51
to extract these a elements, and as we can see, these a elements are inside this rectangle, which is represented by this element that has the ul tag and the
00:02:03
scripts-list class. So what we have to do is tell ChatGPT to locate this element, the one that has the ul tag, and then extract the text
00:02:15
inside the a elements. So let's create the right instructions. First I'm going to copy the link of the website and say something like this. So we write "scrape this website", and here I have
00:02:28
the website that I copied before, and here I'm indicating "with Python and BeautifulSoup". You can change the libraries; it doesn't matter for this simple example. So first we start with this and then we have to specify the
00:02:42
elements that we want to scrape. In this case we want first this element, the rectangle that contains all the movie titles, and then the a elements. So what we have to do is
00:02:55
write the following. I previously created an instruction and now I'm going to explain it to you. First: "locate the element with tag ul and class scripts-list", and this is the element that
00:03:07
represents the rectangle that contains all the movie titles. Then I'm saying "scrape all the a elements inside". So basically we're giving instructions to ChatGPT on how to scrape this
00:03:21
website: first the rectangle and then the a elements that contain the text that we want to extract. Finally, what I'm going to say is "get the text attribute", because so far we only have
00:03:35
the a element but we want the text, so "get the text attribute and print". These are all the instructions, and we have created the right prompt that we need to get the code that scrapes this website.
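The prompt described above has a fixed shape: one opening line naming the URL and the libraries, then one instruction per step. A minimal sketch of assembling it as text follows. The helper name `build_scrape_prompt` is hypothetical and not something shown in the video, and the URL is the one the video appears to use.

```python
from __future__ import annotations


def build_scrape_prompt(url: str, tools: str, steps: list[str]) -> str:
    """Assemble a scraping prompt: an opening line, then one instruction per line."""
    lines = [f"Scrape this website: {url} with {tools}."]
    # Normalise each step so it ends with exactly one full stop.
    lines.extend(step.rstrip(".") + "." for step in steps)
    return "\n".join(lines)


# Steps as dictated in the video for the movie-titles page.
prompt = build_scrape_prompt(
    url="https://subslikescript.com/movies",
    tools="Python and BeautifulSoup",
    steps=[
        "First, locate the element with tag ul and class scripts-list",
        "Scrape all the a elements inside",
        "Get the text attribute and print it",
    ],
)
print(prompt)
```

Keeping the steps in a list makes it easy to swap in a different container element or library without retyping the whole prompt.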
00:03:48
So I'm going to click on Submit and then ChatGPT is gonna generate the code very fast. As you can see, we already finished with the code generation, and now I'm gonna copy this code.
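The generated code is not shown in the transcript, so the following is only a sketch of the same three steps (find the `ul` with class `scripts-list`, take the `a` elements inside it, print their text). To stay dependency-free it swaps BeautifulSoup for the standard library's `html.parser` and runs on a made-up HTML sample instead of the live page. The class name `ListLinkText` and the sample movie titles are assumptions for illustration.

```python
from html.parser import HTMLParser

# Made-up stand-in for the page; the real site is not fetched here.
SAMPLE_HTML = """
<html><body>
  <a href="/">Home</a>
  <ul class="scripts-list">
    <li><a href="/movie/a">Movie A (2001)</a></li>
    <li><a href="/movie/b">Movie B (2002)</a></li>
  </ul>
</body></html>
"""


class ListLinkText(HTMLParser):
    """Collect the text of <a> elements that sit inside a <ul> with a given class."""

    def __init__(self, list_class):
        super().__init__()
        self.list_class = list_class
        self.ul_depth = 0  # greater than 0 while inside the target <ul>
        self.in_link = False
        self.titles = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "ul" and (self.ul_depth or self.list_class in classes):
            self.ul_depth += 1
        elif tag == "a" and self.ul_depth:
            self.in_link = True
            self.titles.append("")

    def handle_endtag(self, tag):
        if tag == "ul" and self.ul_depth:
            self.ul_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data


parser = ListLinkText("scripts-list")
parser.feed(SAMPLE_HTML)
titles = [title.strip() for title in parser.titles]
for title in titles:
    print(title)

# With BeautifulSoup the same steps would read roughly:
#   box = soup.find("ul", class_="scripts-list")
#   for a in box.find_all("a"):
#       print(a.get_text())
```

The "Home" link is skipped because it sits outside the list element, which is the reason the prompt names the container first and the `a` elements second.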
00:04:00
I'm gonna paste it here. So I paste it here and hopefully this is gonna work; it doesn't always work at the first try. But as you can see, here we got the movie titles, so we got all the movie
00:04:13
titles listed on this website. So we got all the movie titles that are here, and that's how it works. Now that you understand how this approach works, let's scrape the data on Amazon and
00:04:27
also Twitter. The approach is going to be exactly the same, so we only need to give the right instructions. Let's try with Amazon first. Let's say that you want to extract some book titles, so you go
00:04:40
here to amazon.com and then you write "self help books" and then you have the results. Here, what you have to do to extract the book titles is again right-click and then click on Inspect. Then
00:04:54
what we have to do is click on this button. We select this one, and this is the element that represents the book title for each book that you see here, so for book number one, two, three and so on. So all the books that you see here
00:05:07
have the same HTML element, and it's this one. We got this by clicking here and then selecting the title of the book. So great, now that we have this element we
00:05:21
have to tell ChatGPT that we want to locate this element. By the way, in this case I'm not gonna use BeautifulSoup; I'm gonna tell ChatGPT that I want to use Selenium and Python. In this
00:05:35
case Selenium, because it's not possible to scrape Amazon with BeautifulSoup, so we have to tell ChatGPT to use Selenium. So let's do this. First, again, I'm going to copy the link that we used for this,
00:05:48
that we got after searching "self-help books". Then we go to Playground, we delete this, and we give the first instruction. So first, "let's scrape this website", and this is the website, or sorry, the link
00:06:02
that we got, so I copied and pasted it. Then I'm saying "do it with Python, Selenium and ChromeDriver", and here I'm specifying ChromeDriver because if you don't specify it, sometimes ChatGPT
00:06:15
generates code with Firefox, so with the Firefox driver. In my case I don't have the Firefox driver, I only have ChromeDriver, so I specify ChromeDriver. Then what we have to do is tell ChatGPT to create an
00:06:28
XPath that locates this element. So this is the element that we want to locate, and if you don't know Selenium: in Selenium we have to create XPaths, and basically you can build an XPath with
00:06:41
the elements that you see here: the tag name, the attribute name, and the attribute value, which is in orange. So let's tell ChatGPT to do all of this. Here I have the instruction and I'm gonna
00:06:54
copy and paste it and then I'm going to explain what it means. First, "wait five seconds". In Selenium we can wait some seconds, and I'm telling it to wait five seconds. And then the part I was telling
00:07:08
you about before: "locate all the elements with the following XPath". Here I'm saying all the elements because I want to extract multiple books, and I want to locate all the elements that are similar
00:07:19
to this one highlighted in blue. Then I say the span tag, and basically this is the span tag that we have here. Then the class attribute, and this is
00:07:32
the class that we have here, so we're only describing the names of the elements that we have here. Finally, we have to specify the attribute value, which is in orange. So basically we're only
00:07:44
describing the elements that are here, and then ChatGPT is gonna build the XPath with these elements. So that's everything we have to do, and finally I'm
00:07:55
gonna tell it "get the text attribute and print it". Get the text attribute because we don't want only the element but the text that is inside; in this case the text is the name of the book.
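The three pieces named above (tag name, attribute name, attribute value) map directly onto an XPath expression. Here is a minimal sketch of that mapping. The helper `build_xpath` is hypothetical, and the class value is a placeholder because the transcript never states the real one.

```python
def build_xpath(tag, attr_name, attr_value=None):
    """Build //tag[@attr="value"], or //tag[@attr] when no value is given."""
    if attr_value is None:
        return f"//{tag}[@{attr_name}]"
    if '"' in attr_value:
        raise ValueError("attribute value must not contain double quotes")
    return f'//{tag}[@{attr_name}="{attr_value}"]'


# Placeholder only: copy the real class value from the browser's Inspect panel.
TITLE_CLASS = "example-title-class"

xpath = build_xpath("span", "class", TITLE_CLASS)
print(xpath)
```

Note that `@class="..."` is an exact string match on the whole attribute, so the full class string has to be copied as it appears in the developer tools.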
00:08:08
All right, now that we have all the instructions and the prompt is completed, I'm going to press Submit and hope that ChatGPT is gonna create the right code that scrapes this website. So
00:08:20
as I told you before, it doesn't always work and you have to try again when it doesn't work, but hopefully it's gonna work this time. So now I'm gonna delete the code that I generated for the previous website and let's test this out.
00:08:33
So I'm going to run this, and here it tells me that I need the path for the ChromeDriver. As you might remember, we need to introduce a path, and here I pasted my path. So you have to copy and
00:08:46
paste the path where your ChromeDriver is located. Obviously ChatGPT doesn't know where your ChromeDriver is located on your computer, so you have to do this manually, and it's the only thing that you have to do manually. So now that I
00:08:59
copied and pasted my path, I run this. Then I go to the left, and probably this is gonna open; the ChromeDriver is going to open with the Amazon website. Then we wait five seconds
00:09:11
and hopefully all the book titles are gonna be scraped. So we wait five seconds and then I go to the right and, well, now as we can see here we have all
00:09:24
the book titles. For example, the first one is "Stop Overthinking, nine steps to eliminate stress". So here I go and we have this book, "Stop Overthinking, nine steps to eliminate stress", and we can see
00:09:38
that all the books here work great and we print them here. So we have all the books successfully scraped, and we did this without writing code, only giving the right instructions to ChatGPT.
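The Amazon run above follows a simple sequence: open the page, wait a fixed number of seconds, find every element matching the XPath, and print each element's text. The sketch below shows that sequence under some assumptions: the function names `scrape_texts` and `open_chrome` are hypothetical, the browser start-up uses the Selenium 4 `Service` style (the video mentions Selenium 3, which took the driver path differently), and the path and URL in the usage comment are placeholders.

```python
import time


def scrape_texts(driver, url, xpath, wait_seconds=5.0):
    """Open url, wait a fixed time, and return the text of every XPath match."""
    driver.get(url)
    time.sleep(wait_seconds)  # plain fixed wait, as in the prompt
    # "xpath" is the string value of selenium's By.XPATH constant.
    elements = driver.find_elements("xpath", xpath)
    return [element.text for element in elements]


def open_chrome(chromedriver_path):
    """Start Chrome through Selenium 4 (assumed), given a local ChromeDriver path."""
    # Imported here so the helper above can be read and tested without Selenium.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=chromedriver_path))


# Example usage (placeholders, not values from the video):
#   driver = open_chrome("/path/to/chromedriver")
#   try:
#       for title in scrape_texts(driver, SEARCH_URL, TITLE_XPATH):
#           print(title)
#   finally:
#       driver.quit()
```

A fixed sleep is the simplest way to mirror the "wait five seconds" instruction; Selenium also offers explicit waits that return as soon as the elements appear.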
00:09:51
Finally, I'm gonna show you how to scrape tweets from Twitter. First, what we have to do is go to twitter.com and then go to the Search Twitter bar and write anything you want. I'm going to write "ChatGPT" and then I'm
00:10:04
gonna get all the tweets related to this word. So first I have this one, for example, and what we have to do is inspect the website again. So I right-click and then click on Inspect so we get the element that represents the
00:10:18
tweet here. For example, we got this element, and this is not the element I actually want; the one I want is this one, because this represents the whole text. Before, we only got one word, but here
00:10:31
this one that I'm highlighting right now represents the whole tweet. So let's tell ChatGPT to build an XPath that locates this element. If we do that, we're going to be able to extract all the
00:10:45
tweets from Twitter. So let's identify first what elements we're gonna use. First, the tag is div, and then we have to choose one attribute: we have dir, we have lang, we have class
00:10:59
and we have some others. In this case I'm going to use lang, because this lang represents language, and all the tweets are gonna be in one language; for example, this one is in English. And this
00:11:12
lang attribute is going to be always present, and that's why I'm gonna choose this one. So I'm gonna tell ChatGPT to create an XPath that has the div tag and the lang attribute name. So let's do
00:11:25
this. First I'm going to copy the link that I got after searching the word "ChatGPT", and then I'm gonna start creating the prompt. So I go to Playground again
00:11:38
and then I write the following: "let's scrape this website", this is the link I copied before, and then I say "using Python, Selenium and ChromeDriver". All right, that's very easy. Now let's
00:11:49
continue, and then I'm going to say something like this: "maximize the window". In this case I'm just testing maximizing the window; you can do this with Selenium and that's why I indicated this.
00:12:01
Then I say "wait 15 seconds", and in this case I say wait 15 seconds because I'm using a version of Selenium that is pretty slow. I'm doing this because, as you might have noticed, the
00:12:15
code generated with ChatGPT is written in Selenium 3, and I installed a version of Selenium 3 that is not so efficient. You can try installing a version of Selenium 3 that is very efficient; I don't know
00:12:27
which version that is, but unfortunately I had a version that is a bit slow. But anyway, then we have to continue with the instructions, and then I say "locate all the elements that have the following XPath". Again I indicate all the elements
00:12:40
because I want all the tweets that have the following XPath. Then I say the div tag, and well, the div tag is because I have this div tag, and then I have to indicate the lang attribute
00:12:53
name, and then I say "the attribute name lang", and that's everything we have to do. Then I'm gonna finish this by saying "print the text inside these elements" so I get the tweets that are
00:13:06
here on this page. So now I'm gonna click on Submit and hopefully we're gonna generate the code that does the job. Now ChatGPT is generating all the code very fast, so I'm gonna copy it and I'm
00:13:20
gonna paste this here in PyCharm. I'm just gonna make sure that I copied everything. Now I go to PyCharm and then I'm gonna test this out, but before I do this I'm gonna copy and paste the path of
00:13:34
my ChromeDriver. Okay, I just pasted the path of my ChromeDriver and now I'm gonna test this out. So I'm gonna run this and then I go to the left, and now ChromeDriver is going to open, so let's give it a second. So now it
00:13:46
maximizes the window as I indicated in the instructions, then it's going to open the Twitter link, and then it's gonna extract all the tweets that are on this first page. Something you have to know
00:13:59
is that on Twitter we can only extract the tweets that are visible, so we're gonna extract only the first tweet, I think, or maybe the first, the second and the third. What you have to do to extract all the tweets is scroll down. So
00:14:12
let's see if this is true. As you can see, we only extracted the first tweet, and this happens because Twitter works like that: you have to load all the tweets by scrolling down and only then can you extract all the tweets. That's
00:14:26
in case you want to extract all the tweets on the page. In this case, for this demonstration, it's enough for me to extract the first tweet, as you can see here. But in case you want to extract all the tweets on the page, what you have to
00:14:37
do here is add one more instruction saying something like "scroll down x times and then scrape the tweets" or "scrape the data". That's something that I'm gonna leave to you, and you can test it out on your own. And that's it
00:14:50
for this video. In this video we learned how to create prompts to scrape websites using ChatGPT. Hey, thanks for watching this video. Don't forget to give it a like and subscribe to this channel for more content like this. That's it for
00:15:04
this video. I'll see you in the next one.
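The "scroll down x times and then scrape" instruction that the video leaves as an exercise can be sketched as follows. This is an assumption-laden illustration and not code from the video: the function name `scrape_while_scrolling` is hypothetical, the driver is any Selenium-style object, and texts are collected after every scroll in case earlier items are dropped from the page as new ones load.

```python
import time

# div elements that carry a lang attribute, as chosen in the video.
TWEET_XPATH = "//div[@lang]"


def scrape_while_scrolling(driver, xpath, scrolls=3, pause_seconds=2.0):
    """Collect texts matching xpath, scrolling down `scrolls` times to load more."""
    seen = []  # keeps first-seen order and skips duplicates
    for round_number in range(scrolls + 1):
        for element in driver.find_elements("xpath", xpath):
            text = element.text
            if text and text not in seen:
                seen.append(text)
        if round_number < scrolls:
            # Jump to the bottom so the page loads the next batch.
            driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
            time.sleep(pause_seconds)
    return seen


# Example usage with a real Selenium driver (placeholder URL variable):
#   driver.maximize_window()
#   driver.get(SEARCH_URL)
#   time.sleep(15)
#   for tweet in scrape_while_scrolling(driver, TWEET_XPATH, scrolls=5):
#       print(tweet)
```

On a live page an element can go stale between lookup and reading `.text`, so a production version would need error handling that this sketch omits.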