Asked by Editorial Team on May 23, 2026

Hi everyone.

I need to parse a webpage that sets a cookie with JavaScript for every link. I can already parse the normal search: every product is shown and imported into a MySQL database.

I was able to scrape every product, with all of its fields, from a search result with this code:

    require 'rubygems'
    require 'logger'
    require 'uri'
    require 'mechanize'
    require 'nokogiri'
    require 'mysql2'
    
    # Mechanize 1.0 and later dropped the WWW:: namespace
    agent = Mechanize.new { |a| a.log = Logger.new(STDERR) }
    #agent.set_proxy('a-proxy', '8080')
    agent.read_timeout = 60
    
    # Adds a cookie, given as a Set-Cookie style string, to the agent's jar
    def add_cookie(agent, uri, cookie)
      uri = URI.parse(uri)
      Mechanize::Cookie.parse(uri, cookie) do |c|
        agent.cookie_jar.add(uri, c)
      end
    end
    
    # XPath (relative to a <tr>) for each field of a product row
    FIELDS = {
      sku:   'td[3]/text()',
      desc:  'td[4]/text()',
      qty:   'td[5]/text()',
      qty2:  'td[5]/p/b/text()',
      price: 'td[6]/text()'
    }
    
    # Returns one hash per row of the results table
    def parse_rows(doc)
      doc.css("table.articulos tr").map do |row|
        FIELDS.transform_values { |xpath| row.at_xpath(xpath).to_s.strip }
      end
    end
    
    # get main page
    page = agent.get "http://www.site.com.mx"
    
    # fill in login form
    form = page.forms.first
    form.correo_ingresar = "user"
    form.password = "password"
    
    # submit login form
    page = agent.submit form
    
    # the site sets its session cookies from JavaScript (SetCookie("name", "value")),
    # so copy them into the agent's cookie jar by hand
    session_cookies = page.body.scan(/SetCookie\("([^"]+)", "([^"]*)"\)/)
    
    session_cookies.each do |name, value|
      add_cookie(agent, 'http://www.site.com.mx', "#{name}=#{value}; path=/; domain=www.site.com.mx")
    end
    
    # show 1000 search results per page
    add_cookie(agent, 'http://www.site.com.mx', "tampag=1000; path=/; domain=www.site.com.mx")
    
    # order results
    add_cookie(agent, 'http://www.site.com.mx', "orden_articulos=existencias asc; path=/; domain=www.site.com.mx")
    
    # restrict results to one section
    add_cookie(agent, 'http://www.site.com.mx', "codigoseccion_buscar=14; path=/; domain=www.site.com.mx")
    
    # get store index page and submit its search form
    page = agent.get "http://www.site.com.mx/tienda/index.php"
    search_result = agent.submit page.forms.first
    
    doc = Nokogiri::HTML(search_result.body)
    details = parse_rows(doc)
    
    # walk through paginator links (uniq, not uniq!, which returns nil when nothing is removed)
    links = doc.css("a.paginar").map { |l| "http://www.site.com.mx#{l['href']}" }.uniq
    
    links.each do |l|
      page = agent.get l
      details.concat(parse_rows(Nokogiri::HTML(page.body)))
    end
    
    # update db
    client = Mysql2::Client.new(:host => "localhost", :username => "myusername", :password => "mypassword", :database => "mydatabase")
    
    details.each do |d|
      next if d[:sku] == ""   # skip header and empty rows
    
      price = d[:price].split
      currency = price[1] == "D" ? 144 : 168
      cost = price[0].to_s.gsub(",", "").to_f
      qty = d[:qty] == "" ? d[:qty2] : d[:qty]
    
      # escape scraped text so quotes in descriptions don't break the SQL
      sku  = client.escape(d[:sku])
      desc = client.escape(d[:desc])
      qty  = client.escape(qty)
    
      results = client.query("SELECT * FROM jos_vm_product WHERE product_sku = '#{sku}' LIMIT 1;")
      if results.count == 1
        product_id = results.first['product_id']
        client.query("UPDATE jos_vm_product SET product_sku = '#{sku}', product_name = '#{desc}', product_desc = '#{desc}', product_in_stock = '#{qty}' WHERE product_id = #{product_id};")
        client.query("UPDATE jos_vm_product_price SET product_price = '#{cost}', product_currency = '#{currency}' WHERE product_id = '#{product_id}';")
      else
        client.query("INSERT INTO jos_vm_product(product_sku, product_name, product_desc, product_in_stock) VALUES('#{sku}', '#{desc}', '#{desc}', '#{qty}');")
        last_id = client.last_id
        client.query("INSERT INTO jos_vm_product_price(product_id, product_price, product_currency) VALUES('#{last_id}', '#{cost}', #{currency});")
      end
    end
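
The price, currency, and stock handling inside the database loop can also be pulled into a pure function, which makes it testable without a MySQL connection. A sketch that mirrors the loop's logic; the name `normalize_row` is hypothetical, and the currency IDs 144 and 168 are copied from the script rather than explained by it:

```ruby
# Hypothetical helper: turns one scraped row (a Hash with :sku, :desc,
# :qty, :qty2, :price) into the values the update loop writes.
def normalize_row(d)
  return nil if d[:sku].to_s.empty?   # header/blank rows have no SKU

  amount, flag = d[:price].to_s.split
  # Same IDs as the script: 144 when the price has a "D" suffix, else 168.
  currency = flag == "D" ? 144 : 168
  cost = amount.to_s.delete(",").to_f # "1,234.50" -> 1234.5

  # Some rows put the stock figure inside <p><b>, hence the fallback.
  qty = d[:qty].to_s.empty? ? d[:qty2] : d[:qty]

  { sku: d[:sku], desc: d[:desc], qty: qty, cost: cost, currency: currency }
end
```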

Now I don't want to search; I want to parse from the category list instead.
Link to the main category page: http://www.site.com.mx/tienda/articulos.php?opcion=lineas&seccion_mostrar=11
It shows a table like the one below (every entry is a link).
The top name, ACCESORIOS, links to the ACCESORIOS category; the bold names listed below it (shown before each colon here) are subcategories, and the names below each bold name are brands. If I click on ACCESORIOS, it shows every brand and every subcategory mixed together, and so on.

ACCESORIOS
  • Accesorios Multimedia (6): ACTECK DE MEXICO (5), MANHATTAN (1)
  • Accesorios P/impres. Punto De Venta (1): EPSON CORPORATION (1)
  • Accesorios Para Cableados De Patch Panels (1): INTELLINET NETWORK SOLUTIONS (1)
  • Accesorios Para Camaras Digitales (1): MANHATTAN (1)
  • Accesorios Para Computadoras De Escritorio (32): ACTECK DE MEXICO (2), GENERICA (1), MANHATTAN (28), TARGUS (1)
  • Accesorios Para Computadoras Portatiles (60): ACTECK DE MEXICO (3), GENIUS (2), HP COMERCIAL (2), HP IMPRESION (1), MANHATTAN (17), PERFECT CHOICES (32), SOLIDEX (1), TARGUS (1), TECH ZONE (1)
  • Accesorios Para Ipod (3): ACTECK DE MEXICO (1), PERFECT CHOICES (2)
  • Accesorios Para Mesas (3): MANHATTAN (2), PERFECT CHOICES (1)
  • Accesorios Para Redes (13): INTELLINET NETWORK SOLUTIONS (5), MANHATTAN (8)
  • Accesoriso Para Celulares (14): BLACKBERRY (14)
  • Adaptador Bluetooth (6): ACTECK DE MEXICO (1), MANHATTAN (2), PERFECT CHOICES (3)
  • Adaptadores Para Mouse Y Teclado (3): MANHATTAN (2), PERFECT CHOICES (1)
  • Audifono/diademas Y Microfonos (49): ACTECK DE MEXICO (14), BTO (1), GENIUS (3), LOGITECH (2), MANHATTAN (11), PERFECT CHOICES (18)

Here is the HTML for that table. Every link sets its cookies from JavaScript, which is why I have been having a hard time scraping it.

    <table width="95%" cellspacing="0" cellpadding="3" border="0">
    <tbody>
    <tr>
    <td valign="top" align="left" style="font-family: verdana; font-size: 12px" colspan="2"><a onClick="fijar_filtro('codigoseccion_buscar','11')" href="javascript:void(0)" class="busquedas"><b>ACCESORIOS</b></a></td>
    </tr>
    <tr>
    <td width="20" valign="top" align="left"></td>
    <td valign="top" align="left" style="font-family: verdana; font-size: 12px"><a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','338')" href="javascript:void(0)" class="busquedas"><b>Accesorios Multimedia</b>(6)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','338');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','602');" href="javascript:void(0)" class="busquedas">ACTECK DE MEXICO (5)</a>, <a onClick="SetCookie('codigolinea_buscar','338');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','585');" href="javascript:void(0)" class="busquedas">MANHATTAN (1)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','540')" href="javascript:void(0)" class="busquedas"><b>Accesorios P/impres. Punto De Venta</b>(1)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','540');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','106');" href="javascript:void(0)" class="busquedas">EPSON CORPORATION (1)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','542')" href="javascript:void(0)" class="busquedas"><b>Accesorios Para Cableados De Patch Panels</b>(1)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','542');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','635');" href="javascript:void(0)" class="busquedas">INTELLINET NETWORK SOLUTIONS (1)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','361')" href="javascript:void(0)" class="busquedas"><b>Accesorios Para Camaras Digitales</b>(1)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','361');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','585');" href="javascript:void(0)" class="busquedas">MANHATTAN (1)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','277')" href="javascript:void(0)" class="busquedas"><b>Accesorios Para Computadoras De Escritorio</b>(32)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','277');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','602');" href="javascript:void(0)" class="busquedas">ACTECK DE MEXICO (2)</a>, <a onClick="SetCookie('codigolinea_buscar','277');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','530');" href="javascript:void(0)" class="busquedas">GENERICA (1)</a>, <a onClick="SetCookie('codigolinea_buscar','277');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','585');" href="javascript:void(0)" class="busquedas">MANHATTAN (28)</a>, <a onClick="SetCookie('codigolinea_buscar','277');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','586');" href="javascript:void(0)" class="busquedas">TARGUS (1)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','357')" href="javascript:void(0)" class="busquedas"><b>Accesorios Para Computadoras Portatiles</b>(60)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','357');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','602');" href="javascript:void(0)" class="busquedas">ACTECK DE MEXICO (3)</a>, <a onClick="SetCookie('codigolinea_buscar','357');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','167');" href="javascript:void(0)" class="busquedas">GENIUS (2)</a>, <a onClick="SetCookie('codigolinea_buscar','357');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','694');" href="javascript:void(0)" class="busquedas">HP COMERCIAL (2)</a>, <a onClick="SetCookie('codigolinea_buscar','357');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','107');" href="javascript:void(0)" class="busquedas">HP IMPRESION (1)</a>, <a onClick="SetCookie('codigolinea_buscar','357');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','585');" href="javascript:void(0)" class="busquedas">MANHATTAN (17)</a>, <a onClick="SetCookie('codigolinea_buscar','357');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','532');" href="javascript:void(0)" class="busquedas">PERFECT CHOICES (32)</a>, <a onClick="SetCookie('codigolinea_buscar','357');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','212');" href="javascript:void(0)" class="busquedas">SOLIDEX (1)</a>, <a onClick="SetCookie('codigolinea_buscar','357');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','586');" href="javascript:void(0)" class="busquedas">TARGUS (1)</a>, <a onClick="SetCookie('codigolinea_buscar','357');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','691');" href="javascript:void(0)" class="busquedas">TECH ZONE (1)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','1302')" href="javascript:void(0)" class="busquedas"><b>Accesorios Para Ipod</b>(3)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','1302');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','602');" href="javascript:void(0)" class="busquedas">ACTECK DE MEXICO (1)</a>, <a onClick="SetCookie('codigolinea_buscar','1302');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','532');" href="javascript:void(0)" class="busquedas">PERFECT CHOICES (2)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','1175')" href="javascript:void(0)" class="busquedas"><b>Accesorios Para Mesas</b>(3)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','1175');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','585');" href="javascript:void(0)" class="busquedas">MANHATTAN (2)</a>, <a onClick="SetCookie('codigolinea_buscar','1175');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','532');" href="javascript:void(0)" class="busquedas">PERFECT CHOICES (1)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','292')" href="javascript:void(0)" class="busquedas"><b>Accesorios Para Redes</b>(13)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','292');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','635');" href="javascript:void(0)" class="busquedas">INTELLINET NETWORK SOLUTIONS (5)</a>, <a onClick="SetCookie('codigolinea_buscar','292');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','585');" href="javascript:void(0)" class="busquedas">MANHATTAN (8)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','1378')" href="javascript:void(0)" class="busquedas"><b>Accesoriso Para Celulares</b>(14)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','1378');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','714');" href="javascript:void(0)" class="busquedas">BLACKBERRY (14)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','1313')" href="javascript:void(0)" class="busquedas"><b>Adaptador Bluetooth</b>(6)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','1313');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','602');" href="javascript:void(0)" class="busquedas">ACTECK DE MEXICO (1)</a>, <a onClick="SetCookie('codigolinea_buscar','1313');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','585');" href="javascript:void(0)" class="busquedas">MANHATTAN (2)</a>, <a onClick="SetCookie('codigolinea_buscar','1313');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','532');" href="javascript:void(0)" class="busquedas">PERFECT CHOICES (3)</a><br>
    <br>
    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','555')" href="javascript:void(0)" class="busquedas"><b>Adaptadores Para Mouse Y Teclado</b>(3)</a><br>
    <a onClick="SetCookie('codigolinea_buscar','555');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','585');" href="javascript:void(0)" class="busquedas">MANHATTAN (2)</a>, <a onClick="SetCookie('codigolinea_buscar','555');SetCookie('codigoseccion_buscar','11');fijar_filtro('codigomarca_buscar','532');" href="javascript:void(0)" class="busquedas">PERFECT CHOICES (1)</a><br>
    </td>
    </tr>
    </tbody>
    </table>
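
Every `onClick` handler in this table is a sequence of `SetCookie('name','value')` and `fijar_filtro('name','value')` calls, and the cookie named in the last `fijar_filtro` call tells you which level a link belongs to: `codigoseccion_buscar` for the top category, `codigolinea_buscar` for the bold subcategories, `codigomarca_buscar` for brands. A minimal sketch of extracting those calls with a regular expression; the helper names `parse_onclick` and `link_level` are hypothetical, and the regex assumes the single-quoted form used in this table:

```ruby
# Matches SetCookie('name','value') and fijar_filtro('name','value').
CALL_RE = /(SetCookie|fijar_filtro)\('([^']*)','([^']*)'\)/

# Returns the calls in order as [function, cookie_name, value] triples.
def parse_onclick(js)
  js.scan(CALL_RE)
end

# Which level of the category table each filter cookie corresponds to.
LEVELS = {
  "codigoseccion_buscar" => :section,     # top-level category, e.g. ACCESORIOS
  "codigolinea_buscar"   => :subcategory, # bold entries
  "codigomarca_buscar"   => :brand        # brands listed under each bold entry
}.freeze

# Classifies a link by the filter its last fijar_filtro call applies.
def link_level(js)
  call = parse_onclick(js).reverse.find { |fn, *| fn == "fijar_filtro" }
  call && LEVELS[call[1]]
end
```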

So the question is: what do I need to add to my code to be able to follow every link, given that each one sets its cookies from JavaScript?

Cookies used (name: value range):
  • codigoseccion_buscar: 11-30
  • codigomarca_buscar: 100-736
  • codigolinea_buscar: 15-1385
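Looping over these ranges blindly would be expensive. Assuming every value in each range could occur, a quick count of the combinations (a worked example, not something the site documents):

```ruby
# Cookie value ranges as listed above (inclusive).
RANGES = {
  "codigoseccion_buscar" => 11..30,    # 20 values
  "codigomarca_buscar"   => 100..736,  # 637 values
  "codigolinea_buscar"   => 15..1385   # 1371 values
}.freeze

# Upper bound on requests if every combination were tried.
combinations = RANGES.values.map(&:size).reduce(:*)
puts combinations # => 17466540
```

About 17.5 million requests, which is why reading the real IDs out of the links' `onClick` handlers is the more practical route.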

1 Answer

Answered by Editorial Team on May 23, 2026 at 4:47 am

I managed to scrape the contents of one of those links by setting these cookies in my Ruby code:

    # set cookies
    add_cookie(agent, 'http://www.site.com.mx', "codigoseccion_buscar=11; path=/; domain=www.site.com.mx")
    add_cookie(agent, 'http://www.site.com.mx', "codigolinea_buscar=; path=/; domain=www.site.com.mx")
    add_cookie(agent, 'http://www.site.com.mx', "codigomarca_buscar=; path=/; domain=www.site.com.mx")
    add_cookie(agent, 'http://www.site.com.mx', "textobuscar=; path=/; domain=www.site.com.mx")
    

The weird thing was that if I only set one of those cookies, it would not work. I had to set all of them, even though some have no value: every link sets a cookie, so sending all of them (empty ones included) overwrites or clears whatever value was saved before.
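That behavior can be modeled as a small pure function: start with all four filter cookies blank, overlay any context the link depends on (subcategory links only set `codigomarca_buscar` and `codigolinea_buscar`, so the section ID has to come from elsewhere), then apply the link's own calls in order. This is a sketch, not the site's actual logic; `cookies_for` and `FILTER_COOKIES` are hypothetical names, and it assumes `fijar_filtro(name, value)` stores `name=value` as a cookie before reloading, which the HTML shown does not confirm:

```ruby
# The four cookies that had to be sent together for a request to work.
FILTER_COOKIES = %w[codigoseccion_buscar codigolinea_buscar codigomarca_buscar textobuscar].freeze

# Hypothetical helper: given a link's calls as [function, name, value]
# triples, returns the cookie values to send. Every filter starts cleared,
# so a value left over from a previous link cannot leak through.
# ASSUMPTION: fijar_filtro(name, value) also sets name=value as a cookie.
def cookies_for(calls, base = {})
  cookies = FILTER_COOKIES.to_h { |name| [name, ""] }.merge(base)
  calls.each { |_fn, name, value| cookies[name] = value }
  cookies
end
```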

Now I need to scrape those cookie values from the links, use them as variables, and loop over them. Can anybody help? For reference, a typical subcategory link looks like this:

    <a onClick="SetCookie('codigomarca_buscar','');fijar_filtro('codigolinea_buscar','542')" href="javascript:void(0)" class="busquedas"><b>Accesorios Para Cableados De Patch Panels</b>(1)</a><br>
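One way to do that loop, sketched under assumptions: scan the category table's raw HTML for every `<a onClick="...">` link, start each link from the four cookies cleared (as described above), optionally pre-set the section ID, apply the link's `SetCookie`/`fijar_filtro` values, and yield Set-Cookie strings in the same format the question's `add_cookie` takes. It uses a plain regex so it runs without gems; with Nokogiri you could iterate `doc.css("a.busquedas")` instead. `each_link_cookies` is a hypothetical name, and treating `fijar_filtro` as a cookie setter is an assumption:

```ruby
# Matches each <a onClick="...">label</a> in the category table.
LINK_RE = /<a onClick="([^"]*)"[^>]*>(.*?)<\/a>/m
# Matches SetCookie('name','value') and fijar_filtro('name','value').
ONCLICK_CALL_RE = /(SetCookie|fijar_filtro)\('([^']*)','([^']*)'\)/
FILTERS = %w[codigoseccion_buscar codigolinea_buscar codigomarca_buscar textobuscar].freeze

# Hypothetical helper: yields each link's visible label together with the
# Set-Cookie style strings that reproduce what its onClick would set.
def each_link_cookies(html, domain: "www.site.com.mx", section: nil)
  html.scan(LINK_RE).each do |onclick, inner|
    label = inner.gsub(/<[^>]+>/, "").strip          # drop <b> tags
    cookies = FILTERS.to_h { |name| [name, ""] }      # clear every filter
    cookies["codigoseccion_buscar"] = section if section
    onclick.scan(ONCLICK_CALL_RE) { |_fn, name, value| cookies[name] = value }
    yield label, cookies.map { |name, value| "#{name}=#{value}; path=/; domain=#{domain}" }
  end
end
```

Inside the scraper, each yielded string would go to `add_cookie(agent, 'http://www.site.com.mx', s)`, followed by a request for the results page and the existing row parsing. Which page `fijar_filtro` reloads is not visible in the HTML shown, so that URL still has to be checked in the browser.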
    
© 2021 The Archive Base. All Rights Reserved