Dynamic page queries with YQL in YahooPipes

This tutorial will show you how to query multiple pages with YQL. For example, you can query the web page of every news item in your RSS feed to gather extra information.

My example queries the tek-9 RSS feed to find out how many comments were made on each item.

1. Select the fetch feed module and enter the feed of your choice.

http://www.tek-9.org/rss/news/cs/

2. Select the “loop operator” and place the “string builder” into it.

3. In the first box, type the following
select * from html where url='

4. In the second box
item.link

5. In the third box
' and xpath='/html/body/div[5]/div[2]/div/h2'

Note: make sure the XPath matches the element you are looking for. Refer back to the earlier YQL tutorials if you get lost here.

6. Assign the results to item.field
(in my case, item.comments)

7. Add another Loop module

8. Place the YQL module into the new loop module

9. In the YQL module inside the loop, type in the field you created in step 6 (in my case, “item.comments”)

10. Assign “all” results to item.comments

11. The pipe will now run the query for every item in your RSS feed. Enjoy!
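The three String Builder boxes simply concatenate into one YQL statement per feed item. The minimal Python sketch below mirrors that first loop outside Yahoo Pipes; the item dictionaries and their example links are hypothetical stand-ins for the Fetch Feed output, and the XPath is the one used above.

```python
# Hypothetical feed items; in Yahoo Pipes these come from the Fetch Feed module.
items = [
    {"title": "Item one", "link": "http://www.tek-9.org/news/example-1.html"},
    {"title": "Item two", "link": "http://www.tek-9.org/news/example-2.html"},
]

XPATH = "/html/body/div[5]/div[2]/div/h2"


def build_query(link: str, xpath: str = XPATH) -> str:
    """Join the three String Builder boxes: prefix, item.link, suffix."""
    prefix = "select * from html where url='"
    suffix = "' and xpath='" + xpath + "'"
    return prefix + link + suffix


# First loop: store the finished query string on each item as item.comments.
for item in items:
    item["comments"] = build_query(item["link"])

for item in items:
    print(item["comments"])
```

Because the URL is wrapped in single quotes, a link that itself contains a single quote would break the query.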

Sorting YQL into RSS feed format using YahooPipes

This tutorial will show how to sort the raw output YQL produces in YahooPipes into a familiar RSS format.

1. The YQL query here is
select * from html where url='http://www.cadred.org' and xpath='/html/body/div/div/div[3]/div[5]/div[3]/div/div'
The output is not usable in an RSS feed as it stands and needs sorting out before it can be used.

2. Select the Loop module from operators

3. Place the “item builder” module into the “loop” module

4. This process is very difficult and requires some trial and error.
On the left-hand side of the “item builder”, enter the field you want to create.
Then, from the drop-down, select the path that most closely matches the raw output in the YQL debugger.
Sometimes you will get lucky and it will work first time! However, sometimes the drop-down path is incorrect and you will need to work out the correct path and enter it manually.

5. Repeat this process for every field you want.
Use the “rename module” to create virtual fields whilst you work out the exact path of the item you want, then add the exact path to the item builder. This saves time refreshing the loop module, which can be sensitive to incorrect paths.
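Conceptually, the Item Builder is a mapping from new field names to paths into the YQL output. The Python sketch below illustrates that idea; the nested `yql_node` dictionary, the dotted path syntax and the `get_path`/`build_item` helpers are all hypothetical and are not the actual structure YQL returns for cadred.org.

```python
from typing import Any

# Hypothetical shape of one YQL result node, as it might appear in the debugger.
yql_node = {
    "div": {
        "h3": {"a": {"href": "/news/1234", "content": "Match report"}},
        "p": "Team A beat Team B 2-0.",
    }
}


def get_path(data: Any, path: str) -> Any:
    """Follow a dotted path such as 'div.h3.a.href'; numeric parts index into lists."""
    current = data
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


# Left-hand side: the RSS field to create. Right-hand side: the path into the YQL output.
FIELD_PATHS = {
    "title": "div.h3.a.content",
    "link": "div.h3.a.href",
    "description": "div.p",
}


def build_item(node: Any) -> dict:
    """Mimic the Item Builder: one RSS field per (field, path) pair."""
    return {field: get_path(node, path) for field, path in FIELD_PATHS.items()}


print(build_item(yql_node))
```

As with the loop module, a single wrong path stops everything (here with a `KeyError`), which is why checking each path on its own first saves time.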

Using YQL to select specific information in Yahoo Pipes

YQL can be used instead of the fetch page module to select specific areas of a website. It is useful for selecting very specific elements, such as the comments on a topic within a forum. It reduces the need for complex regex, which can be a real time saver.

To follow the steps in this tutorial you must first install Firebug. See the Firebug tutorial for help.

1. Select the YQL module from the sources area of YahooPipes

2. Paste the following YQL query into the YQL module, replacing the website address and the XPath with your own.

select * from html where url='http://www.tek-9.org/news/announcing_the_powercup_on_esports_heaven-2095.html' and xpath='/html/body/div[5]/div[2]/div/h2'

3. Make sure no stray spaces or line breaks creep into the quoted URL and XPath when you paste the query.

4. Depending on how precise your YQL query is, you may find several nodes (0, 1, 2, 3, 4) in the debugger. You can either try to write a more specific XPath by going back to Firebug (this is not always possible) or move on to the next tutorial to learn how to use the “item builder”.
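To see why a loose XPath returns several nodes while a more specific one returns one, here is a small Python sketch using the standard library's `xml.etree.ElementTree`, which supports a subset of XPath including positional predicates such as `[2]`. The page is a made-up, well-formed snippet; real web pages are rarely valid XML, so this only illustrates the selection logic.

```python
import xml.etree.ElementTree as ET

# A small, well-formed stand-in for a forum page.
PAGE = """<html><body>
<div><h2>Site header</h2></div>
<div><h2>First topic</h2><h2>Second topic</h2></div>
</body></html>"""

root = ET.fromstring(PAGE)  # root is the <html> element

# ElementTree paths are relative to the root, so '/html/body/div' becomes 'body/div'.
# A loose path matches every h2 in every div: several nodes, as in the YQL debugger.
broad = [h2.text for h2 in root.findall("body/div/h2")]

# Positional predicates ([2], [1]) narrow the result down to a single node.
narrow = [h2.text for h2 in root.findall("body/div[2]/h2[1]")]

print(broad)   # ['Site header', 'First topic', 'Second topic']
print(narrow)  # ['First topic']
```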

Using firebug to find the Xpath on a webpage (needed for YQL)

1. Google “firebug” and install the add-on

2. Navigate to the website you wish to find the XPath for and select the blue arrow in the upper right-hand side of Firebug

3. Using the arrow select the area of the site you need. This should highlight an area of HTML in the debugger

4. Right click on the highlighted zone and select “copy xpath”

5. Note down/copy this “xpath” for later use!

Increasing RSS description detail by using fetch page with YahooPipes

This tutorial will demonstrate how to add more content to an RSS feed's description. Most RSS feeds lack detail to encourage you to click through to the site. This method gets around the problem!

1. Select the fetch feed module and enter in your RSS feed

2. Under operators select the “loop” module

3. Under operators select the “fetch page” module and place it in the “loop” module

4. In the “URL” box of the fetch page module, type “item.link”

5. Follow the steps in the “Using the fetch page module to extract information” tutorial to cut the page down for the description

6. Make sure the “Loop” module is set to assign “all” to the item.description

7. Link the “loop” module to the “pipeout” module
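The loop above amounts to: for each item, fetch its link and store the page as the description. A minimal Python sketch of that loop follows; `http_fetch`, `enrich_descriptions` and `fake_fetch` are hypothetical helpers, and the stand-in fetcher lets the example run without a network connection.

```python
import urllib.request
from typing import Callable


def http_fetch(url: str) -> str:
    """Download a page as text (roughly what the fetch page module does)."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def enrich_descriptions(items: list, fetch_page: Callable[[str], str]) -> list:
    """Loop over the feed: fetch item.link and assign the result to item.description."""
    for item in items:
        item["description"] = fetch_page(item["link"])
    return items


# Offline usage with a stand-in fetcher; pass http_fetch instead to download real pages.
def fake_fetch(url: str) -> str:
    return "<p>Full article text from " + url + "</p>"


feed = [{"title": "Short item", "link": "http://example.com/post-1", "description": "Teaser"}]
enrich_descriptions(feed, fake_fetch)
print(feed[0]["description"])
```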

Using the fetch page module to extract information using YahooPipes

1. Under operators select the fetch page module

2. Copy and paste the URL of the website you want into the fetch page

3. Highlight the fetch page module and, in the debugger, open item.description

4. In item.description, click on ‘Source’ to view the raw HTML

5. Find the HTML where you want the extracted content to start

6. Enter the HTML code in the ‘Cut content from’ box of the fetch page

7. Find the HTML where you want the content to end and put that in the “to” box

8. Check the item.description for the cut down version of the web page
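The ‘Cut content from’ and ‘to’ boxes keep only the part of the page between two HTML markers. This Python sketch assumes the markers themselves are excluded and that an empty result is returned when a marker is missing; Yahoo Pipes may behave differently, and `cut_content` is a hypothetical name.

```python
def cut_content(html: str, start: str, end: str) -> str:
    """Return the text between the first `start` marker and the next `end` marker.

    Assumption: markers are excluded, and "" is returned if either marker is missing.
    """
    begin = html.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    finish = html.find(end, begin)  # search for the end marker only after the start
    if finish == -1:
        return ""
    return html[begin:finish]


page = '<div id="nav">menu</div><div id="story">The full story.</div><div id="footer">(c)</div>'
print(cut_content(page, '<div id="story">', "</div>"))  # The full story.
```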

Finding digits with regex using Yahoo Pipes

The regex module allows you to modify, remove or add any text or images to your RSS feed.

In this tutorial I will demonstrate my methods of tackling regex code.
I do not come from a programming background and found regex a huge challenge. My methods are not efficient, so if you're looking for efficiency, look elsewhere :)

  1. Load up http://gskinner.com/RegExr/
    This is my favorite regex tester. There are others about.
  2. In this example I am going to extract the digits from a link.
    Images/Icons/Category/48.gif

    I want to cut this link to only say ‘48’

  3. Enter the text you want to edit into the body of the regex tester.
  4. You want to start at the beginning of the line. The regex for this is ‘^’
  5. You then want to match all the text up to the digits you are after.
    The regex for this is ‘.+’ (any character, one or more times)
  6. Now you have the entire line highlighted (this is effectively deleting the entire line)

  7. Now you want to select the digits. The regex for a single digit is ‘\d’
    Because there are two digits you need to add a ‘+’, which means one or more of the preceding item: ‘\d+’
  8. We now have the correct amount of the line highlighted.

  9. Put brackets around the digit part of the regex and add a ‘?’ after the first ‘.+’. In this case it would look like this: ^.+?(\d+)
    The brackets make the digits an output variable which can be extracted. The ‘?’ makes ‘.+’ lazy, so it stops at the first digit; without it, ‘.+’ also swallows the ‘4’ and only the ‘8’ is captured.
  10. Now add a ‘.+’, which highlights (deletes) the rest of the line. Finished regex:
    ^.+?(\d+).+

  11. In the regex module, put ‘$1’ in the ‘with’ box to tell the module to output the captured digits.
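You can check the finished regex outside the tester with Python's `re` module. Python writes the captured group as `\1` in the replacement, where Yahoo Pipes uses `$1`. The sketch also shows what goes wrong without the ‘?’.

```python
import re

link = "Images/Icons/Category/48.gif"

# Greedy .+ takes as much as possible, leaving (\d+) only the last digit.
greedy = re.sub(r"^.+(\d+).+", r"\1", link)

# Lazy .+? stops before the first digit, so (\d+) captures both digits.
lazy = re.sub(r"^.+?(\d+).+", r"\1", link)

print(greedy)  # 8
print(lazy)    # 48
```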

Remove images from an RSS feed using Yahoo Pipes

In this tutorial we will look at removing images from an RSS feed. Images can be bulky and slow to load, especially in mobile phone RSS readers.

  1. Under operators select the regex module
  2. Attach the regex module to the fetch feed module, then connect the regex module to the pipe output module.
  3. Highlight the pipe output module.
  4. In the debugger, click on an item title and find its ‘description’
  5. Change from ‘html’ to ‘source’ view.
  6. Search for the HTML relating to the image you want removed. Normally this will start with ‘<img’
  7. Load up http://gskinner.com/RegExr/
    This is my favorite online regex tester. There are others available.
  8. Copy and paste the HTML from the description into the main body of the regex tester.
  9. In the tester's expression box, type the HTML that starts the image tag. You should notice the text you are typing is highlighted in the main body.
  10. If the ‘<img’ text in the body is highlighted, add ‘.+’ after it
    This will highlight the rest of the HTML in the body.
  11. The highlighted text will be removed from the description when this is added to Yahoo Pipes. At the moment, the ‘.+’ is taking too much content out of the description.

  12. Search for the end of the image HTML. In my example this is ‘/>’
  13. Now that the correct area of HTML is highlighted we can add the new regex code to Yahoo Pipes
  14. Select ‘item.description’ from the drop down box in the regex module
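  15. In the ‘replace’ box, add your regex code: <img.+/> (leave the ‘with’ box empty). If a description contains more than one image, the greedy ‘.+’ will also delete any text between them; <img[^>]*> stops at the first ‘>’ and removes each image tag on its own.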
  16. Check the debugger to see if the image is removed from the description.
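The same check can be done in Python with `re.sub`. The sample description below is invented and deliberately contains two images, to show how the greedy `<img.+/>` behaves compared with `<img[^>]*>`.

```python
import re

description = (
    '<p>Match preview</p><img src="a.jpg" alt="" />'
    '<p>Kick-off at 8pm</p><img src="b.jpg" alt="" />'
)

# Greedy: matches from the first "<img" to the last "/>", deleting the text in between.
greedy = re.sub(r"<img.+/>", "", description)

# [^>]* cannot cross a ">", so each image tag is matched and removed on its own.
safe = re.sub(r"<img[^>]*>", "", description)

print(greedy)  # <p>Match preview</p>
print(safe)    # <p>Match preview</p><p>Kick-off at 8pm</p>
```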

Adding content to the Title of an RSS feed using Yahoo Pipes

The ‘regex’ module is incredibly powerful. It allows you to alter your RSS feeds. You can add in content, remove content, add images and remove images. (excellent for removing RSS adverts).

In this tutorial we will look at adding text to the title.

Adding content to the title using the Yahoo Pipes Regex operator

  1. Under operators select the regex module
  2. Attach the regex module to the fetch feed module.
  3. Select ‘Item.title’ from the drop down box
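  4. In the ‘replace’ box, add the caret symbol (^), which matches the start of the title.
  5. In the ‘with’ box, add a word of your choice (myfeed). Include a trailing space or separator so the word does not run into the title.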
  6. Check your debugger to see the word of your choice has been added to the front of the title.
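Replacing ‘^’ works because ‘^’ matches the empty position at the start of the title, so the ‘with’ text is inserted rather than substituted for anything. A minimal Python equivalent, with hypothetical titles:

```python
import re

titles = ["Powercup announced", "Season results"]

# "^" matches the empty position at the start of each string, so re.sub inserts the prefix.
prefixed = [re.sub(r"^", "myfeed: ", title) for title in titles]

print(prefixed)  # ['myfeed: Powercup announced', 'myfeed: Season results']
```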

Why is this useful?

When you begin to take RSS feeds from multiple sites, it is useful to add the site name to the start of each title. When reading your finished RSS feed, you then know which source each item is coming from.

Here is an example of this in use

http://pipes.yahoo.com/pipes/pipe.info?_id=2d1493950ee470fb1735f26d3622d21c

Sorting RSS feeds by date using Yahoo Pipes

This tutorial is a continuation of tutorial 1 and uses the pipe created in tutorial 2.

  1. Under ‘operators’ select the sort module
  2. Connect the bottom of the ‘sort module’ to the top of the ‘pipe output’ module
  3. From the drop-down box, select ‘item.y:published’
    item.y:published is the publication date field that Yahoo Pipes adds to each feed item. Use this field for anything date-related.
  4. Select ‘descending’ to put the newest posts at the top of the feed, or ‘ascending’ to put the oldest posts at the top.
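Outside Yahoo Pipes, the same sort can be sketched in Python. This assumes each item carries an RSS `pubDate` string in the usual RFC 822 format (the items are invented); `email.utils.parsedate_to_datetime` turns it into a comparable datetime.

```python
from email.utils import parsedate_to_datetime

# Hypothetical items; pubDate uses the RFC 822 date format RSS feeds normally carry.
items = [
    {"title": "Older post", "pubDate": "Mon, 01 Mar 2010 09:00:00 +0000"},
    {"title": "Newest post", "pubDate": "Wed, 03 Mar 2010 18:30:00 +0000"},
    {"title": "Middle post", "pubDate": "Tue, 02 Mar 2010 12:15:00 +0000"},
]


def by_date(item: dict):
    """Sort key: the item's publication date as a timezone-aware datetime."""
    return parsedate_to_datetime(item["pubDate"])


newest_first = sorted(items, key=by_date, reverse=True)  # 'descending'
oldest_first = sorted(items, key=by_date)                # 'ascending'

print([i["title"] for i in newest_first])  # ['Newest post', 'Middle post', 'Older post']
```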