
How to create an SGML/XML filter

Note (Déjà Vu X3): You can disable the Start screen under File>Options>General>Start-up options.
Note (Déjà Vu X3): Deleting all examples is especially helpful when you have imported a great number of XML files, which can blow up the size of the filter to several hundred megabytes.

Unlike other file formats (FrameMaker, Word, Excel, etc.), SGML (Standard Generalized Markup Language) and XML (eXtensible Markup Language) are not file formats in the strict sense. They are standards for tagging files and for defining those tags. Because every set of SGML/XML files uses its own set of tags, you must create a separate SGML/XML filter for each set of SGML/XML files.

SGML defines a standard for creating DTDs (Document Type Definitions). For example, the World Wide Web Consortium (W3C) publishes DTDs for the various HTML specifications, which means that HTML is a markup language defined according to SGML rules. You are probably somewhat familiar with the structure and tags of HTML, so we will use it as an example in our explanations.

Tags and Attributes

SGML files are text files that encode formatting, layout, and image information using tags. Tags take the following form:

<TAGNAME ATTRIBUTE1="VALUE1" ATTRIBUTE2="VALUE2">

A tag can contain attributes, each with a value, that further define the tag.
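As a rough illustration of this format, the following Python sketch uses the standard `xml.etree.ElementTree` module to read a single element and list its tag name, attributes, and text. You do not need anything like it to use Déjà Vu. TAGNAME, ATTRIBUTE1, and ATTRIBUTE2 are the placeholders from the format above. XML, unlike SGML or HTML, requires every element to be closed, so the sketch adds a closing tag and some sample text.

```python
import xml.etree.ElementTree as ET

# Placeholder element from the format above; XML needs a matching end tag.
snippet = '<TAGNAME ATTRIBUTE1="VALUE1" ATTRIBUTE2="VALUE2">text</TAGNAME>'

element = ET.fromstring(snippet)
print(element.tag)  # tag name: TAGNAME
for name, value in element.attrib.items():
    print(f"{name} = {value}")  # each attribute with its value
print(element.text)  # text between the start and end tag: text
```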

Because Déjà Vu X2 Professional does not need to interpret tags and keys, there are only three pieces of information that you must provide:

  • Embeddable tags: An embeddable tag is one that can appear in the middle of a segment, and Déjà Vu should not split the segment before or after this tag. For example, the <B> and <I> tags in HTML (which specify bold and italic attributes) are embeddable, while the <P> tag (which specifies a paragraph change) is not.
  • Extractable text between tags: You can define whether the text between certain tags is extractable (the default) or not extractable, i.e., not translatable. For example, if the text between certain tags always contains dates or numbers that do not need to be translated, you can choose not to extract that text.
    If you define the text between tags that contain nested subtags (for example, <tag1> text <tag2> text </tag2></tag1>) as non-extractable, the text between the nested subtags will not be extracted either.
    However, attributes (see below) are not affected by the choice not to extract the text between tags. For example, the attributetext in <tag1 attribute="attributetext"> text </tag1> is extracted if that attribute is defined as extractable, even if the text of tag1 is defined as not extractable.
  • Extractable attributes: certain tags may contain attributes whose values are translatable, and must therefore be extracted. For example, the <IMG> tag in HTML (which inserts an image into the text) has the ALT=“[alternate text for the image]” attribute, which specifies the text to display if the browser cannot load the image. This text should be translated, so the attribute is extractable.
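The following Python sketch models the three settings above. It is a simplified illustration under stated assumptions, not Déjà Vu's actual implementation. The tag names and the settings `EMBEDDABLE`, `NOT_EXTRACTABLE`, and `EXTRACTABLE_ATTRS` are hypothetical, and the input must be well-formed XML. The rules it applies:

  • Embeddable tags stay inside a segment, and every other tag ends the current segment.
  • Text inside a non-extractable tag, including its nested subtags, is skipped.
  • Extractable attribute values are collected even where the surrounding text is not extracted.

```python
import xml.etree.ElementTree as ET

# Hypothetical filter settings (illustrative names, not Déjà Vu's file format).
EMBEDDABLE = {"B", "I"}               # tags that may sit inside a segment
NOT_EXTRACTABLE = {"NOTE"}            # text inside these tags (and nested tags) is skipped
EXTRACTABLE_ATTRS = {("IMG", "ALT")}  # (tag, attribute) pairs whose values are translated


def extract(root):
    """Return (segments, attribute_values) according to the settings above."""
    segments, attribute_values, current = [], [], []

    def flush():
        # Close the segment being built, if it contains any text.
        text = "".join(current).strip()
        if text:
            segments.append(text)
        current.clear()

    def walk(elem, parent_extracts):
        # Attribute values are extracted even where the element's text is not.
        for name, value in elem.attrib.items():
            if (elem.tag, name) in EXTRACTABLE_ATTRS:
                attribute_values.append(value)
        extracts = parent_extracts and elem.tag not in NOT_EXTRACTABLE
        inline = elem.tag in EMBEDDABLE
        if not inline:
            flush()  # a non-embeddable tag ends the current segment
        elif extracts:
            current.append(f"<{elem.tag}>")  # embeddable tag stays inside the segment
        if extracts and elem.text:
            current.append(elem.text)
        for child in elem:
            walk(child, extracts)
            if extracts and child.tail:
                current.append(child.tail)  # text after the child belongs to this element
        if not inline:
            flush()
        elif extracts:
            current.append(f"</{elem.tag}>")

    walk(root, True)
    flush()
    return segments, attribute_values


doc = ET.fromstring(
    "<BODY>"
    "<P>Click <B>Save</B> to continue.</P>"
    '<NOTE>Internal <B>draft</B> only<IMG SRC="stamp.png" ALT="Draft stamp"/></NOTE>'
    "</BODY>"
)
segments, attribute_values = extract(doc)
print(segments)          # ['Click <B>Save</B> to continue.']
print(attribute_values)  # ['Draft stamp']
```

In this run, the <B> tag stays inside the first segment. The text of <NOTE>, including its nested <B>, is skipped. The ALT value inside <NOTE> is still extracted.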

Déjà Vu offers two possibilities for creating an SGML filter file:

  • from the DTD file
  • directly from the SGML/XML files

In general, it is advisable to combine the methods to allow for greater accuracy of the SGML filter.
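When a filter is generated directly from sample files, the tags and attributes that actually occur in those files have to be collected, together with examples. The minimal Python sketch below shows that idea. It assumes well-formed XML held in strings; for files on disk, `ET.parse(path).getroot()` could replace `ET.fromstring(text)`. The function name `survey` and the sample documents are hypothetical. SGML files that are not well-formed XML would need a different parser.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def survey(xml_texts):
    """Collect tag counts, attribute examples, and text examples from sample documents."""
    tag_counts = Counter()
    attribute_examples = {}  # (tag, attribute) -> first value seen
    text_examples = {}       # tag -> first non-empty text seen
    for text in xml_texts:
        for elem in ET.fromstring(text).iter():  # iter() also yields the root element
            tag_counts[elem.tag] += 1
            for name, value in elem.attrib.items():
                attribute_examples.setdefault((elem.tag, name), value)
            if elem.text and elem.text.strip():
                text_examples.setdefault(elem.tag, elem.text.strip())
    return tag_counts, attribute_examples, text_examples


# Two hypothetical sample documents.
samples = [
    '<DOC><P>Press <B>OK</B>.</P><IMG SRC="ok.png" ALT="OK button"/></DOC>',
    "<DOC><P>Press <B>Cancel</B>.</P></DOC>",
]
tags, attrs, texts = survey(samples)
print(dict(tags))  # {'DOC': 2, 'P': 2, 'B': 2, 'IMG': 1}
print(attrs)       # {('IMG', 'SRC'): 'ok.png', ('IMG', 'ALT'): 'OK button'}
print(texts)       # {'P': 'Press', 'B': 'OK'}
```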

The procedures below are described separately for Déjà Vu X2 and Déjà Vu X3.

Déjà Vu X2

To create an SGML filter from SGML/XML files

  1. Open Déjà Vu X2.
  2. Click the New button on the toolbar.
    The New File dialog appears.

    Double-click SGML/XML Filter, or select it and click OK.
    –Or–
    On the File menu, click New>SGML/XML Filter.
  3. The New SGML/XML Filter Wizard appears.
  4. Click Next.
  5. The wizard prompts you to create an SGML/XML filter.
  6. Click Create, select the folder where you want to save the SGML/XML filter, and type a name for the filter.
  7. Click Open.
  8. Click Next. The wizard prompts you to either specify a DTD file or to generate the SGML/XML filter directly from an SGML/XML file.
  9. For this exercise, we will use an XML file. Click Next.
  10. Click the Add button.
  11. Select your SGML/XML file(s) and click Open.
  12. Click Next.
  13. The New SGML/XML Filter Wizard displays the current settings.
  14. Click Finish.
  15. The wizard displays the import progress.
  16. Click Finish after the import process has finished.
  17. The Tags and Attributes tab is displayed.
    The newly created SGML filter has made the following definitions:
    • all the tags of the imported SGML/XML file(s) are interpreted as not embeddable (by having the ...<>... column in the Tags field unchecked),
    • all text between tags is defined as extractable (by having the <>...<> column in the Tags field checked), and
    • all attributes are defined as not extractable (by not having the <=...> column in the Attributes field checked).
  18. You will have to review each of these tags and attributes and decide whether the default setting is appropriate or not. To ease that process, Déjà Vu X2 Professional displays examples from the occurrences of the tags and attributes in the respective file(s) under and to the right of Examples.
    • Typically, the vast majority of tags should not be embedded. For the few tags that should be embedded, such as the <B> and <I> formatting tags in HTML, check the ...<>... column in the Tags field.
    • Typically, the majority of text between tags should be extracted. For text that should probably not be extracted, such as dates or numbers, uncheck the <>...<> column in the Tags field.
    • Most attributes contain only internal, non-translatable information. Leaving their checkboxes in the <=...> column in the Attributes field unchecked ensures that they are not extracted, i.e., not displayed in the Déjà Vu X2 project. For attributes that should be translated, add a check mark.
  19. When you are finished defining the tags and attributes, you can reduce the size of your SGML/XML filter by deleting all the examples. This is especially helpful when you have imported a great number of SGML or XML files, which can blow up the size of the filter to several hundred megabytes.
  20. Select Edit>Delete All Examples.
    Other SGML/XML-specific options include the deletion of all tags and attributes and all entities. These options are only used on very rare occasions.
  21. Select the Entities and Characters tab.

    Here you can find the definitions of the Begin Tag and the End Tag as well as of the Begin Entity and the End Entity. These are standard settings and typically do not need to be changed.
    On this tab you can also find a large number of predefined special characters. The definition of each determines how Déjà Vu X2 displays the character and exports it again. The copyright sign (©), for instance, is written as &copy; in the SGML file before and after the translation, but displayed as © in the project file.
  22. In this example, while generating the SGML/XML filter file, Déjà Vu X2 detected one character, a y with an acute accent (ý), that is not in its predefined lists of special characters. You can now use the appropriate Unicode sequence to define how this character is handled.
  23. Type the appropriate Unicode sequence (for ý, 00FD) into the field to the right of U+. The correct character is then displayed in the adjacent field.
  24. Click Replace.
  25. The new entity is now displayed correctly as ý in Déjà Vu X2, but exported as &yac;.
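The entity handling described in the last steps works like a two-way mapping between entity names and characters. The Python sketch below illustrates only that idea; it is an assumption, not Déjà Vu's implementation. It starts from the HTML entity table in Python's standard `html.entities` module and adds the custom entry `yac` → U+00FD from the example. It converts entities to characters on import and turns the selected characters back into entities on export. The helper names `on_import` and `on_export` are hypothetical.

```python
import re
from html.entities import name2codepoint

# Predefined HTML entities plus a custom entry, as in the steps above (yac -> U+00FD, ý).
ENTITIES = dict(name2codepoint)
ENTITIES["yac"] = 0x00FD

ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def on_import(text):
    """Replace known named entities with their characters; leave unknown ones as-is."""
    def repl(match):
        codepoint = ENTITIES.get(match.group(1))
        return chr(codepoint) if codepoint is not None else match.group(0)
    return ENTITY_RE.sub(repl, text)


def on_export(text, names=("copy", "yac")):
    """Turn the characters of the selected entities back into entity references."""
    for name in names:
        text = text.replace(chr(ENTITIES[name]), f"&{name};")
    return text


imported = on_import("&copy; 2024 Kr&yac;sa")
print(imported)             # © 2024 Krýsa
print(on_export(imported))  # &copy; 2024 Kr&yac;sa
```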

To create an SGML filter from the DTD file

  1. Open Déjà Vu X2.
  2. Click the New button on the toolbar.
    The New File dialog appears.

    Double-click SGML/XML Filter, or select it and click OK.
    –Or–
    On the File menu, click New>SGML/XML Filter.
  3. The New SGML/XML Filter Wizard appears.
  4. Click Next.
  5. The wizard prompts you to create an SGML/XML filter.
  6. Click Create, select the folder where you want to save the SGML/XML filter, and type a name for the filter.
  7. Click Open.
  8. Click Next. The wizard prompts you to either specify a DTD file or to generate the SGML/XML filter directly from an SGML/XML file.
  9. For this exercise, we will use an existing DTD file. Click Select and select your DTD file.
  10. Click Open. The wizard displays the current settings.
  11. Click Next.
  12. The wizard prompts you to specify the location of your SGML/XML files.
  13. Click Next.
  14. The New SGML/XML Filter Wizard displays the current settings.
  15. Click Finish.
  16. The Tags and Attributes tab is displayed.
    The newly created SGML filter has made the following definitions:
    • most of the tags that are listed in the DTD file are interpreted as not embeddable (by having the ...<>... column in the Tags field unchecked),
    • all text between tags is defined as extractable (by having the <>...<> column in the Tags field checked), and
    • all attributes are defined as extractable (by having the <=...> column in the Attributes field checked).
  17. You can now choose to review each of these tags and attributes and decide whether the default setting is appropriate or not.
    To ease this process, it is advisable to combine the DTD import with the import of some representative SGML/XML files. Déjà Vu X2 Professional will then display examples from the occurrences of the tags and attributes in the respective file(s) under and to the right of Examples.
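The Python sketch below shows what a DTD can contribute to a filter. It lists the element names and (element, attribute) pairs declared in a small, hypothetical XML-style DTD. It is only an approximation under stated assumptions:

  • It uses regular expressions.
  • It expects one attribute definition per line inside each <!ATTLIST> declaration.
  • It ignores comments, parameter entities, and SGML-specific declaration syntax.

A DTD yields no example text, which is one reason to combine it with sample files.

```python
import re

# A small, hypothetical XML-style DTD.
DTD = """
<!ELEMENT doc (p | img)*>
<!ELEMENT p (#PCDATA | b)*>
<!ELEMENT b (#PCDATA)>
<!ELEMENT img EMPTY>
<!ATTLIST img src CDATA #REQUIRED
              alt CDATA #IMPLIED>
"""

ELEMENT_RE = re.compile(r"<!ELEMENT\s+([\w.:-]+)")
ATTLIST_RE = re.compile(r"<!ATTLIST\s+([\w.:-]+)\s+(.*?)>", re.DOTALL)


def dtd_names(dtd_text):
    """Return declared element names and (element, attribute) pairs."""
    elements = ELEMENT_RE.findall(dtd_text)
    attributes = []
    for element, body in ATTLIST_RE.findall(dtd_text):
        # Assumption: one "name type default" definition per line.
        for line in body.splitlines():
            if line.strip():
                attributes.append((element, line.split()[0]))
    return elements, attributes


elements, attributes = dtd_names(DTD)
print(elements)    # ['doc', 'p', 'b', 'img']
print(attributes)  # [('img', 'src'), ('img', 'alt')]
```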

Déjà Vu X3

To create an XML filter from XML files

  1. Open Déjà Vu X3.
  2. The Start screen appears.
  3. Select the XML Filter button under XML Filter.

    -Or-
    Select File>New or click the New button in the Quick Access Toolbar if the Start screen is disabled.
  4. The New File dialog appears.
  5. Double-click SGML/XML Filter, or select it and click OK.
  6. You are prompted to create an XML filter.
  7. Click Create, select the folder where you want to save the XML filter, and type a name for the filter.
  8. Click Next. The wizard prompts you to either specify a DTD file or to generate the XML filter directly from an XML file.
  9. For this exercise, we will use an XML file. Click Next.
  10. Click the Add button.
  11. Select your XML file(s) and click Open.
  12. Click Next.
  13. The wizard displays the import progress.
  14. Click Close after the import process has finished.
  15. The Tags and Attributes tab is displayed.
    The newly created XML filter has made the following definitions:
      • all the tags of the imported XML file(s) are interpreted as not embeddable (by having the ...<>... column in the Tags field unchecked),
      • all text between tags is defined as extractable (by having the <>...<> column in the Tags field checked), and
      • all attributes are defined as not extractable (by not having the <=...> column in the Attributes field checked).
  16. You will have to review each of these tags and attributes and decide whether the default setting is appropriate or not. To ease that process, Déjà Vu X3 displays examples from the occurrences of the tags and attributes in the respective file(s) under and to the right of Examples.
    • Typically, the vast majority of tags should not be embedded. For the few tags that should be embedded, such as the <B> and <I> formatting tags in HTML, check the ...<>... column in the Tags field.
    • Typically, the majority of text between tags should be extracted. For text that should probably not be extracted, such as dates or numbers, uncheck the <>...<> column in the Tags field.
    • Most attributes contain only internal, non-translatable information. Leaving their checkboxes in the <=...> column in the Attributes field unchecked ensures that they are not extracted, i.e., not displayed in the Déjà Vu X3 project. For attributes that should be translated, add a check mark.
  17. When you are finished defining the tags and attributes, you can reduce the size of your XML filter by deleting all the examples.
  18. Select Home>Delete All Examples.

    Other XML-specific options include the deletion of all tags and attributes and all entities. These options are only used on very rare occasions.
  19. Select the Entities and Characters tab.

    Here you can find the definitions of the Begin Tag and the End Tag as well as of the Begin Entity and the End Entity. These are standard settings and typically do not need to be changed.
    On this tab you can also find a large number of predefined special characters. The definition of each determines how Déjà Vu X3 displays the character and exports it again. The copyright sign (©), for instance, is written as &copy; in the XML file before and after the translation, but displayed as © in the project file.
  20. In this example, while generating the XML filter file, Déjà Vu X3 detected one character, a y with an acute accent (ý), that is not in its predefined lists of special characters. You can now use the appropriate Unicode sequence to define how this character is handled.
  21. Type the appropriate Unicode sequence (for ý, 00FD) into the field to the right of U+. The correct character is then displayed in the adjacent field.
  22. Click Add.
  23. The new entity is now displayed correctly as ý in Déjà Vu X3, but exported as &yac;.

To create an XML filter from the DTD file

  1. Open Déjà Vu X3.
  2. The Start screen appears.
  3. Select the XML Filter button under XML Filter.

    -Or-
    Select File>New or click the New button in the Quick Access Toolbar if the Start screen is disabled.
  4. The New File dialog appears.
  5. Double-click SGML/XML Filter, or select it and click OK.
  6. You are prompted to create an XML filter.
  7. Click Create, select the folder where you want to save the XML filter, and type a name for the filter.
  8. Click Next. The wizard prompts you to either specify a DTD file or to generate the XML filter directly from an XML file.
  9. For this exercise, we will use an existing DTD file. Click Select and select your DTD file.
  10. Click Save. The wizard displays the current settings.
  11. Click Next.
  12. The wizard prompts you to specify the location of your XML files.
  13. For the purpose of this tutorial, there is no need to select any XML files, so click Next.
  14. You are informed that the project has been created.
  15. Click Close.
  16. The Tags and Attributes tab is displayed.
    The newly created XML filter has made the following definitions:
      • all the tags defined in the DTD file are interpreted as not embeddable (by having the ...<>... column in the Tags field unchecked),
      • all text between tags is defined as extractable (by having the <>...<> column in the Tags field checked), and
      • all attributes are defined as not extractable (by not having the <=...> column in the Attributes field checked).
  17. You can now choose to review each of these tags and attributes and decide whether the default setting is appropriate or not.