Pages

Tuesday, November 18, 2025

1st contact, conversation online

[Boilerplate for future online conversation
Boilerplate pro budoucí online konverzaci

partner.
partner.]


We can try online chat. Here is how it works.
Můžeme zkusit online chat. Zde je návod, jak to funguje.

You decide when you can meet online with me. Tell
Vy rozhodujete, kdy se se mnou můžete setkat online. Sdělit

me date and time. If I am available, I will tell
mě datum a čas. Pokud budu k dispozici, řeknu

you. I do not have meetings before 10 in the
Vy. Nemám schůzky před 10

morning.
ráno.

Five or ten minutes before the scheduled meeting,
Pět nebo deset minut před plánovanou schůzkou,

I will send you an email with a link.
Pošlu vám e-mail s odkazem.

You click on the link and enter the Google Meet
Kliknete na odkaz a vstoupíte do Google Meet

space. There we can see and talk with each other.
plocha. Tam se můžeme vidět a mluvit spolu.

While conversing, I make chat notes. After we say
Během konverzace si dělám poznámky do chatu. Poté, co řekneme

good-bye, I clean up the notes and send a copy
sbohem, uklidím poznámky a posílám kopii

for you to study or review.
abyste si je mohli prostudovat nebo prohlédnout.

These meetings are conversations, not classroom
Tato setkání jsou rozhovory, nikoli třída

lessons. So we talk about what you want to talk
lekce. Takže mluvíme o tom, o čem chcete mluvit

about. I ask questions. You ask questions. We
o. kladu otázky. Kladete otázky. My

tell lies. It goes like that till the end. It is
lhát. Tak to jde až do konce. To je

a 45 minute meeting. The first is always free.
schůzka na 45 minut. První je vždy zdarma.

When do you want to meet?
Kdy se chceš sejít?


==================================================
Display Order: Source first
Splitting: Character
Service: Google
Source Characters: 806
Translated Characters: 767
Source Words: 160
Translated Words: 134
Generated: 18 November 2025

Best practices text segmentation

Best practices for text segmentation in translation include using Computer-Assisted Translation (CAT) tools to break text into meaningful, logical units such as sentences or phrases; keeping segments concise, so each retains a complete unit of meaning and fits in short-term memory; and prioritizing consistency through defined rules and Translation Memories (TMs) to improve quality and efficiency. In addition, proper source-document formatting, including clear structure and avoiding unnecessary formatting, helps CAT tools parse content correctly and create clearer segments.

Understanding Text Segmentation
  • Definition: Text segmentation is the process of dividing a source text into smaller, translatable units, called "segments".
  • Purpose: It makes translation faster, easier, and more consistent by allowing translators to focus on smaller, logical chunks of text.
  • Tools: Segmentation is a foundational step in Computer-Assisted Translation (CAT) tools and is configured using specific segmentation rules.

Best Practices for Text Segmentation

  1. Break into meaningful units: Segment text at natural linguistic boundaries, such as sentences, rather than arbitrary points like every 5-10 words.
  2. Keep segments concise: Segments should be short enough for a translator to easily retain the information in their short-term memory.
  3. Ensure completeness: Each segment should represent a complete thought or unit of meaning to avoid confusion and an unnatural translation.
  4. Utilize CAT Tools & Translation Memories: CAT tools, when configured with appropriate segmentation rules, help identify and manage these units. This promotes consistency and allows for the reuse of previously translated segments in a Translation Memory.
  5. Format source content well: Clear, well-organized source documents with consistent formatting (e.g., proper use of paragraph breaks, hard returns, and page breaks) ensure that CAT tools can parse the content correctly and produce unambiguous segments.
  6. Define segmentation rules: Establish clear rules, often in a format like SRX (Segmentation Rules Exchange), to define how text should be broken down for specific projects or language pairs.
  7. Perform automated QA checks: After segmentation and translation, automated quality assurance checks help identify and correct errors like misspellings or incorrect terminology, preventing them from being added to the Translation Memory.
  8. Prioritize consistency: Over time, consistent segmentation practices, combined with well-maintained TMs, significantly increase content reuse, reduce translation costs, and boost overall translation quality.
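As a minimal illustration of practices 1-3 above, sentence-boundary segmentation can be done in the browser or in Node.js with the built-in Intl.Segmenter API. The segmentSentences helper below is a hypothetical sketch, not part of any particular CAT tool:

```javascript
// Sketch: sentence-level segmentation using the standard Intl.Segmenter API
// (available in modern browsers and Node.js 16+). Splitting at sentence
// boundaries keeps each segment a complete, concise unit of meaning.
function segmentSentences(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "sentence" });
  // Each yielded item carries the raw sentence text in its .segment property.
  return Array.from(segmenter.segment(text), (s) => s.segment.trim())
    .filter((s) => s.length > 0);
}

// Three sentences become three translation-ready segments.
console.log(segmentSentences(
  "We can try online chat. Here is how it works. When do you want to meet?"
));
```

Because Intl.Segmenter is locale-aware, it handles abbreviations and punctuation conventions better than a naive split on periods, which is the kind of "arbitrary point" practice 1 warns against.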

New kid on the block

 Try inter-linear.com

Enter text or upload a file, select a translation service, and submit the source for translation; the resulting display can be copied or downloaded. The display, in contrast to side-by-side or parallel texts, can be segmented by sentences, paragraphs, words/phrases, or characters per line.

Additionally, there are tools for converting a source text and its translation into inter-linear displays, as well as for segmenting text into sentences.
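As an informal sketch of what such a conversion tool might do (the toInterlinear helper below is my own illustration, not inter-linear.com's actual code), interleaving parallel lists of source and translated segments yields a source-first inter-linear display like the one in the post above:

```javascript
// Sketch: build a source-first inter-linear display from parallel arrays of
// source segments and their translations (e.g. output of a segmenter plus a
// translation service). Each pair is separated by a blank line.
function toInterlinear(sourceSegments, translatedSegments) {
  const lines = [];
  const n = Math.max(sourceSegments.length, translatedSegments.length);
  for (let i = 0; i < n; i++) {
    lines.push(sourceSegments[i] ?? "");     // source line first
    lines.push(translatedSegments[i] ?? ""); // translation directly beneath
    lines.push("");                          // blank line between pairs
  }
  return lines.join("\n").trimEnd();
}

console.log(toInterlinear(
  ["When do you want to meet?"],
  ["Kdy se chceš sejít?"]
));
```

The nullish fallback ("?? ''") keeps the display well-formed even when the two segment lists end up with different lengths.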

Expect output suitable for everyday language enthusiasts, not for scholarly or biblical exegesis.

Provide feedback on your experience with inter-linear displays and the results you got or wanted, or just enjoy the possibilities.

 

Thursday, October 2, 2025

Best strategy? for AI inter-linear display

 The best strategy for producing an interlinear display of a translation using Google Translate is to translate the complete submission first and then use client-side logic (JavaScript/CSS) for display conversion.


Analysis of Strategies

  1. Translate Complete Submission First (Recommended)

    • Process: Send the full sentence or segment to the Google Cloud Translation API. The API uses Neural Machine Translation (NMT), which translates the entire sentence at once to ensure contextual accuracy and capture nuances, idiomatic expressions, and complex grammar. The complete translated text is returned. Client-side code (JS/CSS) is then used to align the original text with the translated text for an interlinear (line-by-line or word-under-word) display, often by segmenting the text into smaller chunks (e.g., words, phrases) and attempting to map them, or simply displaying the source text and the full translation on separate lines.

    • Pros:

      • Highest Quality Translation: NMT produces the most human-like and contextually accurate result.

      • Efficiency: Fewer API calls are made (one per sentence/segment), which is generally faster and more cost-effective.

    • Cons:

      • Alignment Challenge: The primary difficulty is that Google's NMT does not provide a direct word-to-word or phrase-to-phrase mapping in its standard API response. Perfect alignment for an interlinear display is often an extremely complex, if not impossible, computational challenge without the underlying translation model data.

  2. Translate Words and Phrases Iteratively (Not Recommended)

    • Process: Break the source text down into individual words or small phrases and make a separate API call for each one. Then, display the original word and its corresponding translation.

    • Pros:

      • Simple Alignment: Alignment is trivial since each call corresponds to a single word/phrase.

    • Cons:

      • Poor Translation Quality: Google's NMT is specifically designed to work on full sentences to understand context. Translating isolated words or short phrases loses context, resulting in poor, literal, and often incorrect translations.

      • Inefficiency & Cost: This dramatically increases the number of API calls, leading to higher latency and significantly greater costs.


Conclusion

Since Google Translate's core strength is its contextual Neural Machine Translation, you should prioritize translation quality by sending the full segment. The trade-off is that you will need to implement a sophisticated (and likely imperfect) client-side logic to attempt to align the words for the interlinear display.
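To make that trade-off concrete, here is a deliberately naive, hypothetical alignment sketch (naiveWordAlign is my own illustration, not a Google API feature): it pairs words purely by position, which breaks down as soon as word order or word counts diverge between the two languages.

```javascript
// Sketch of naive positional word alignment: zip source words with translated
// words by index. Because languages differ in word order and word counts,
// pairs drift out of sync; this is why accurate interlinear alignment needs
// alignment data that the standard translation API response does not provide.
function naiveWordAlign(source, translation) {
  const src = source.split(/\s+/);
  const tgt = translation.split(/\s+/);
  const n = Math.max(src.length, tgt.length);
  const pairs = [];
  for (let i = 0; i < n; i++) {
    pairs.push([src[i] ?? "", tgt[i] ?? ""]); // pad the shorter side
  }
  return pairs;
}

console.log(naiveWordAlign("The first is always free.", "První je vždy zdarma."));
// Five English words map onto four Czech words, so the last pair is ["free.", ""].
```

In practice this is why falling back to whole-line pairing (source line over translation line) is often the more honest display choice.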

The other strategy involves losing the key benefit of Google Translate's technology.

To see how to get started with the Google Translate API, watch ... video: How to use the Google API to translate texts. This video is relevant because it demonstrates using the Google Translate API, which is the necessary first step for either strategy.


Saturday, September 20, 2025

Environment and AI solutions

[Image: Google Gemini logo]

20.09.25, ref. per Gemini

Question. Which is more friendly to the environment, for example in terms of electricity usage, carbon footprint, etc., client-side or server-side AI solutions? 

Response. In general, client-side AI solutions are more environmentally friendly than server-side solutions, but the answer is complex and depends on many factors. Client-side solutions, which run on a user's local device, eliminate or reduce the need for massive, energy-intensive data centers for inference (the use of a trained AI model). However, the training of these models, regardless of where they are ultimately deployed, remains a major environmental concern.

***

### Server-Side AI: The Environmental Burden

Server-side AI relies on large-scale data centers, which are one of the biggest contributors to the environmental footprint of AI. These data centers consume immense amounts of electricity for three main purposes:

* **Training and Inference:** The process of training a large language model (LLM) like GPT-4 can consume a staggering amount of energy. While a single query (inference) uses a tiny fraction of that energy, the sheer volume of billions of daily queries for popular models adds up to a massive cumulative carbon footprint.
* **Cooling:** Data centers generate a tremendous amount of heat. To prevent hardware from overheating, they require sophisticated and highly energy-intensive cooling systems, which can also consume millions of gallons of water annually.
* **Idle Capacity:** To ensure high availability and responsiveness, data centers must maintain a degree of idle capacity, which still consumes energy while waiting to handle traffic spikes.

The carbon footprint of server-side AI also depends on the **location** of the data center and its energy source. A data center running on a grid powered by renewable energy like wind or solar will have a significantly lower carbon footprint than one relying on fossil fuels.

### Client-Side AI: The Greener Alternative

Client-side, or on-device, AI runs directly on a user's device, such as a smartphone or laptop. Because the processing happens locally, it avoids the energy consumption and carbon emissions associated with data transmission and large data centers.

While client-side AI is more efficient for **inference**, it's crucial to understand a few key points:

* **Training is still a major factor:** The initial training of the AI model, whether it's for on-device or server-side use, is the most energy-intensive part of the process. This training is almost always done in a data center.
* **Limited Capabilities:** On-device models are typically smaller and less powerful than their server-side counterparts. This is because consumer devices lack the massive computational power and memory of a data center.
* **Device-level Consumption:** While they bypass the data center, on-device models still use a device's battery and CPU, which consumes energy. This can lead to faster battery drain and generate some heat.

### A Nuanced Comparison

Ultimately, the choice between client-side and server-side AI for a given application is a trade-off between performance and environmental impact. For computationally demanding tasks like training a large generative AI model, a server-side solution is currently the only option. For tasks that can be performed by smaller models, such as basic text completion or image classification, client-side AI is a much more energy-efficient and environmentally responsible choice.

As on-device hardware becomes more powerful and AI models are optimized to be smaller and more efficient, we'll likely see a shift toward more client-side solutions, reducing the reliance on energy-hungry data centers for many everyday AI applications.