Universal website parser

Studio Si2 has developed MultiParser, a universal parser that extracts data from most websites. The software consists of a browser extension and a server component that performs the parsing. It is a handy tool for retrieving and analyzing information on web pages, designed to automate data collection and save time on repetitive tasks.

With our extension, you can select and extract text data from a web page in just a few clicks. You can also set up parsing rules that specify which data you are interested in, and the extension will automatically extract this data from the analyzed pages. Currently, the parser is free, with some restrictions on the volume of extracted data.

In addition, the extension can save and export the data in various formats, such as CSV or JSON, for further processing and use. The user can download the file with the extracted data to their computer.
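To illustrate the two export formats mentioned above, the sketch below serializes the same extracted records to CSV and to JSON. The record fields (name, brand, price) and the function names are hypothetical; the actual layout of the files produced by MultiParser is not described here.

```python
import csv
import io
import json


def to_csv(records: list[dict]) -> str:
    """Serialize records to CSV text; the first record's keys become the header."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records: list[dict]) -> str:
    """Serialize records to a JSON array, keeping non-ASCII text readable."""
    return json.dumps(records, ensure_ascii=False, indent=2)


# Hypothetical sample of extracted cards.
cards = [
    {"name": "Kettle", "brand": "Acme", "price": "19.99"},
    {"name": "Toaster", "brand": "Acme", "price": "24.50"},
]

print(to_csv(cards))
print(to_json(cards))
```

CSV suits spreadsheets and flat tables, while JSON is easier to load back into a program without losing field names.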

With a browser extension for data parsing, you can analyze information on web pages more efficiently and spend less time on manual data collection. The extension for the Google Chrome browser can be downloaded from this site.

Sequence of user actions when working with the parser:
1. Go to the site from which you need to collect information.
2. Open our browser extension.
3. Prepare the parsing task by filling in the forms on all of the extension's tabs.
4. Send the task to the server ("Start" button). The parsing process can take a long time! When it is complete, the "Start" and "Download" buttons are unlocked.
5. Download the resulting file (zip archive) to your computer with the "Download" button within 3 days. After this period, the file will no longer be available for download.

You can use the parser without registration, but this is inconvenient: the parsing project for a site disappears when the extension is removed from the browser.
When you register on the site, a user account is created that stores the prepared parsing tasks for all sites, as well as the resulting files, which can be downloaded later. Registration is free; you only need to specify a name/login and an email address.
Restrictions: the maximum number of projects is 2 without registration and 5 with registration; the maximum number of extracted records (cards) is 200 without registration and 500 with registration.
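The restrictions above can be summarized as a small lookup table. This is only an illustrative sketch: the names `Limits`, `can_create_project`, and `cap_cards` are hypothetical, and whether the server truncates results at the card limit or stops in some other way is an assumption, not something stated here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_projects: int
    max_cards: int


# Values taken from the restrictions listed above.
LIMITS = {
    False: Limits(max_projects=2, max_cards=200),  # without registration
    True: Limits(max_projects=5, max_cards=500),   # with registration
}


def can_create_project(registered: bool, existing_projects: int) -> bool:
    """Return True if one more project still fits within the limit."""
    return existing_projects < LIMITS[registered].max_projects


def cap_cards(registered: bool, cards: list) -> list:
    """Keep at most the allowed number of cards (assumed behaviour: truncation)."""
    return cards[: LIMITS[registered].max_cards]


print(can_create_project(registered=False, existing_projects=2))
print(len(cap_cards(registered=True, cards=list(range(1000)))))
```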

Creating a project and preparing tasks in the browser extension

Open the site of interest. Enable the extension by clicking its icon to the right of the browser address bar. The maximum number of parsing projects depends on the plan: 2 without registration, 5 with registration; the maximum number of extracted records (cards) is 200 without registration and 500 with registration. Within each project, you can prepare any number of parsing tasks.
The extension is multilingual. The language is selected automatically to match the browser interface language. So far, 2 languages have been implemented: English and Russian.
Recommendations for working with the extension are available via the "Help" button, which opens the corresponding page on the si2.biz website.
Tab "Base urls"
Contains the starting pages for parsing.
Fill in the "Insert name" and "Insert link" fields and press the "Add" button. To fill the "Insert link" field, you can left-click the field or simply copy the contents of the browser address bar into it. The link must include the protocol (https://)! The "Insert name" field is only a label and is filled in by hand. You can enter several base URLs in the table; to start parsing, select one of them in the "Active Base URL" field (drop-down list). You can also choose the active base URL by selecting the desired row in the table.
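The requirement that the link carries its protocol can be checked before a base URL is added. The sketch below is one possible check using Python's standard library; the function name is hypothetical, and accepting plain http:// in addition to https:// is an assumption.

```python
from urllib.parse import urlparse


def is_valid_base_url(link: str) -> bool:
    """Accept only absolute links that carry a protocol and a host name."""
    parts = urlparse(link.strip())
    # Assumption: both http and https are acceptable protocols.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


print(is_valid_base_url("https://example.com/catalog"))
print(is_valid_base_url("example.com/catalog"))
```

A link copied from the address bar normally passes this check, while a hand-typed "example.com/catalog" does not.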
Tab "Subcats"
If the base category (base URL) has subcategories, check the "Is subcategories" checkbox and fill in the "Subcat 1" and "Subcat 2" fields on the "Level 1" tab. Click the "Subcat 1" field to focus it (the focused field is highlighted with a frame), then CTRL + LeftClick any subcategory on the page. The lines Href =, ClassList =, Xpath = should appear, which means the program has recorded the block with the link to that subcategory. In the same way, click the "Subcat 2" field, then CTRL + LeftClick any other subcategory. Then click the "Subcats block" button so that the program records the block on the page that contains all the subcategories. The following lines will appear under the button: ClassList =, Xpath =, Blocks count on page =, Subcats count on page =.
If the categories on that page have subcategories of their own, open the page of any of those categories, select the "Level 2" tab on the "Subcats" tab, and repeat the same steps as on the "Level 1" tab.
Tab "Cards"
If the product cards in the general list already contain enough data for scraping (there is no need to open the page with the detailed product description), nothing needs to be filled in on this tab.
If you need to follow the link to the detailed description page of the object (card) for scraping, check the "Follow Link links" checkbox. Fill in the "Card link 1" and "Card link 2" fields: click the "Card link 1" field to focus it (the focused field is highlighted with a frame), then CTRL + LeftClick any card. The lines Href =, ClassList =, Xpath =, Cards counts on page = should appear, which means the program has recorded the block with the link to the card.
In the same way, click the "Card link 2" field, then CTRL + LeftClick any other card.
Then click the "Cards block" button so that the program records the block on the page that contains all the cards. The following lines will appear under the button: ClassList =, Xpath =, Blocks count on page =, Cards count on page =.
Tab "Details"
Contains the list of characteristics of the objects (product cards) to be scraped. Two options are possible:
1. With a visit to the detailed description page of the object (card). To fill the table, CTRL + LeftClick each characteristic of the object (card) in turn: the "Insert Xpath" field is filled in. Then type the name of that characteristic in the "Insert name" field and click the "Add" button. Some characteristics may be placed on additional screens that open via a link, for example a detailed product description. This case is also supported: click the "Additional page 2" tab, click the "Insert link title" field, then CTRL + LeftClick the link. The additional screen opens, and you fill in the additional characteristics in the same way as on the main screen. Up to 2 additional screens are possible.
2. Without a visit to the detailed description page of the object (card), when the list page (the cards themselves) already shows enough characteristics, for example when it is enough to extract only the product name, brand, and price. In this case, fill in the table as in the previous option, but before saving a characteristic you need to change its Xpath. All cards on the page share one Xpath template for a given characteristic (e.g. the name), so their Xpaths differ in only one number! Find that place in the template and replace the number with the sequence "???" (without quotes). For example, in the Xpath ".../div[1]/div[4]/div[1]/div/article[14]/div/div[14]/div/h2/h2[1]/span[1][2]", replace "article[14]" with "article[???]".
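The "???" substitution described above can be sketched in code. The example below is an illustration only, not the extension's actual implementation: `build_template` compares the Xpaths of the same characteristic on two different cards to find the one step whose number differs, and `expand_template` produces one Xpath per card. Both function names and the shortened sample Xpaths are hypothetical.

```python
import re

PLACEHOLDER = "???"
# One Xpath step with a numeric index, e.g. "article[14]".
_STEP = re.compile(r"^(?P<tag>[^\[\]]+)\[(?P<index>\d+)\]$")


def build_template(xpath_a: str, xpath_b: str) -> str:
    """Find the single step whose index differs and replace it with '???'."""
    steps_a, steps_b = xpath_a.split("/"), xpath_b.split("/")
    if len(steps_a) != len(steps_b):
        raise ValueError("the two Xpaths have different depths")
    differing = [i for i, (a, b) in enumerate(zip(steps_a, steps_b)) if a != b]
    if len(differing) != 1:
        raise ValueError("expected exactly one differing step")
    i = differing[0]
    match_a, match_b = _STEP.match(steps_a[i]), _STEP.match(steps_b[i])
    if not (match_a and match_b and match_a["tag"] == match_b["tag"]):
        raise ValueError("the differing step must be the same tag with another index")
    steps_a[i] = f"{match_a['tag']}[{PLACEHOLDER}]"
    return "/".join(steps_a)


def expand_template(template: str, card_count: int) -> list[str]:
    """Produce one Xpath per card; Xpath positions are numbered from 1."""
    return [template.replace(PLACEHOLDER, str(n)) for n in range(1, card_count + 1)]


# Hypothetical Xpaths of the product name on two different cards.
first = "/html/body/div[1]/article[14]/div/h2/span[1]"
second = "/html/body/div[1]/article[3]/div/h2/span[1]"

template = build_template(first, second)
print(template)
for xpath in expand_template(template, 3):
    print(xpath)
```

Comparing two cards is a convenient way to locate the varying number when the Xpath is long and contains several indexed steps, such as both "article[14]" and "div[14]" in the example above.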
Tab "Pagination"
Three types of pagination are supported:
Page range - set the range of pages. On the card list page, find the pagination bar (usually at the bottom of the page). Fill in the "First page" and "Last page" fields, then CTRL + LeftClick any page number or the "Last page" element to fill the "Pattern" field.
Button like "More" - find the "More" button (or one with a similar meaning) at the bottom of the card list page and CTRL + LeftClick it. The Title, Id, and Classlist fields will be filled in with whichever of these attributes are set on the page.
Infinite scroll - the page loads further cards as you scroll down.

Select one of the options, fill in its fields, and press the "Save" button.
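For the "Page range" option, the idea of combining a first page, a last page, and a pattern can be illustrated as follows. This is a sketch under assumptions: the real format of the "Pattern" field is not documented here, so the example assumes the page number is carried in a query parameter named `page`, and the function name and sample link are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_urls(sample_link: str, first_page: int, last_page: int,
              param: str = "page") -> list[str]:
    """Build the URL of every page in the range from one sample pagination link."""
    parts = urlsplit(sample_link)
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    if param not in query:
        raise ValueError(f"the sample link has no '{param}' parameter")
    urls = []
    for number in range(first_page, last_page + 1):
        query[param] = str(number)
        urls.append(urlunsplit(parts._replace(query=urlencode(query))))
    return urls


# Hypothetical link taken from a pagination bar.
for url in page_urls("https://example.com/catalog?sort=price&page=7", 1, 3):
    print(url)
```

Sites that encode the page number in the path rather than in a query parameter would need a different substitution rule.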
Tab "Settings"
Select the format of the stored data and press "Save".
To send the parsing task to the server, click the "Start" button. The "Start" and "Download" buttons then stay inactive until the parsing process finishes. After parsing is complete, the buttons become active again and you can download the resulting file with the "Download" button within 3 days.
To remove the extension window from the screen, press the "Hide" button. To bring the extension window back, click the extension icon to the right of the browser address bar.
The project is stored in the browser until the extension is removed from the list of extensions (see "Manage extensions"). You can also delete the project with the "Remove" button.

We offer site parsing services without limits on the volume of data, as well as parsing of sites on a regular basis.

Contact us by email: si2pars@gmail.com