In this article we describe how the application “Image to Speech” was made, with some code hints and documentation links along the way. The application reads aloud, and saves to an audio track, any text on an image you give it, and is based on Google’s Cloud ML technology. It is built with the Flutter framework using the Dart language and is available for free on Google Play and the Apple App Store.
You can check the application’s source code in the public GitHub repository.
Prologue.
Before we start, a bit of history: while building the application, we started with on-device image-to-text recognition, but later switched to a cloud-based API, because at that time the on-device library for Flutter supported English only. We hope this has improved since.
Episode 1: Grab the image and recognize the text in it.
Wouldn’t it be nice to have an application that can recognize text from a picture or photo, and even read this text aloud and save the audio track separately? It would be very useful for the visually impaired, for foreigners who don’t know the correct pronunciation, or for fans of audiobooks.
So, create a new Flutter project, then connect Firebase for iOS and Android, as described in this document.
In this application we will use Google Cloud OCR and Google Cloud TTS. Of course, there are ready-made dependencies, such as firebase_ml_vision or mlkit, that will do everything for you and work without the Internet, but their functionality is cut down: they recognize English only. The Cloud Vision documentation can be found here.
Now, in the Google Cloud Platform console, we need to add the following APIs to the project:
- Cloud Functions API
- Cloud Vision API
- Google Cloud APIs
Add the camera, image_picker and http dependencies, with which we will take a photo or pick an already taken photo from the gallery and send it to the server.
So, choose a photo from the gallery:
Future<void> pickGallery() async {
  // Pick an image from the device gallery.
  var tempStore = await ImagePicker.pickImage(source: ImageSource.gallery);
  if (tempStore != null) {
    recognizePhoto(tempStore.path);
  }
}
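Taking a new photo with the camera works the same way; below is a minimal sketch, assuming the same image_picker API as above (the pickCamera name is ours):

Future<void> pickCamera() async {
  // Capture a photo with the device camera and send it to recognition.
  var tempStore = await ImagePicker.pickImage(source: ImageSource.camera);
  if (tempStore != null) {
    recognizePhoto(tempStore.path);
  }
}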
Convert the photo to base64:
recognizePhoto(filePath) async {
  try {
    File image = File(filePath);
    // Read the image bytes and encode them to base64 for the Vision API.
    List<int> imageBytes = image.readAsBytesSync();
    String base64Image = base64Encode(imageBytes);
    // rep is our repository class that talks to the Cloud Vision API (see below).
    TextRecognize text = await rep.convert(base64Image);
    getVoice(text);
  } catch (e) {
    print(e);
  }
}
Map the response data to models:
class TextRecognize {
  List<Response> responses;
  TextRecognize({this.responses});
  factory TextRecognize.fromJson(Map<String, dynamic> parsedJson) {
    var list = parsedJson["responses"] as List;
    List<Response> response = list.map((e) => Response.fromJson(e)).toList();
    return TextRecognize(responses: response);
  }
}

class Response {
  List<TextAnnotations> textAnnotations;
  Response({this.textAnnotations});
  factory Response.fromJson(Map<String, dynamic> parsedJson) {
    var list = parsedJson["textAnnotations"] as List;
    List<TextAnnotations> textAnnotation =
        list.map((e) => TextAnnotations.fromJson(e)).toList();
    return Response(textAnnotations: textAnnotation);
  }
}

class TextAnnotations {
  String locale;
  String description;
  BoundingPoly boundingPoly;
  TextAnnotations({this.locale, this.description, this.boundingPoly});
  factory TextAnnotations.fromJson(Map<String, dynamic> parsedJson) {
    return TextAnnotations(
        locale: parsedJson["locale"],
        description: parsedJson["description"],
        boundingPoly: BoundingPoly.fromJson(parsedJson["boundingPoly"]));
  }
}

class BoundingPoly {
  List<Vertices> vertices;
  BoundingPoly({this.vertices});
  factory BoundingPoly.fromJson(Map<String, dynamic> parsedJson) {
    var list = parsedJson["vertices"] as List;
    List<Vertices> vertice = list.map((i) => Vertices.fromJson(i)).toList();
    return BoundingPoly(vertices: vertice);
  }
}

class Vertices {
  int x;
  int y;
  Vertices({this.x, this.y});
  factory Vertices.fromJson(Map<String, dynamic> parseJson) {
    return Vertices(x: parseJson["x"], y: parseJson["y"]);
  }
}
Send the JSON with the base64 image to Google Vision:
static const _apiKey = "Your Api Key";
String url = "https://vision.googleapis.com/v1/images:annotate?key=$_apiKey";

Future<TextRecognize> convert(base64Image) async {
  // Build the Vision API request: one image, TEXT_DETECTION feature.
  var body = json.encode({
    "requests": [
      {
        "image": {"content": base64Image},
        "features": [
          {"type": "TEXT_DETECTION"}
        ]
      }
    ]
  });
  final response = await http.post(url, body: body);
  var jsonResponse = json.decode(response.body);
  return TextRecognize.fromJson(jsonResponse);
}
Get text from model:
getVoice(TextRecognize text) async {
  for (var response in text.responses) {
    for (var textAnnotation in response.textAnnotations) {
      print("${textAnnotation.description}");
      if (textAnnotation.locale != null) {
        var locale = textAnnotation.locale;
        // Pick a TTS voice for the recognized locale (see Episode 2).
        Voice voice = await rep.getVoice(locale);
        writeAudio(voice);
      }
    }
  }
}
The response from the cloud gives us both the recognized text and its locale.
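Note that in the Cloud Vision response the first element of textAnnotations typically holds the whole detected text, while the remaining elements describe individual words. A minimal sketch of pulling out the full text with the models above (the extractFullText helper name is ours):

String extractFullText(TextRecognize text) {
  // The first annotation of the first response contains the complete detected text.
  final annotations = text.responses.first.textAnnotations;
  if (annotations == null || annotations.isEmpty) return null;
  return annotations.first.description;
}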
Episode 2: Convert the text to speech and save the track to a local file.
Once we have the text and locale from ML Vision, we pass this data to the Google Text-to-Speech API.
For this we create an HTTP request to the text:synthesize method:
Future<dynamic> synthesizeText(
    String text, String name, String languageCode) async {
  try {
    final uri = Uri.https('texttospeech.googleapis.com', '/v1beta1/text:synthesize');
    // Request body for the text:synthesize method.
    final Map json = {
      'input': {'text': text},
      'voice': {'name': name, 'languageCode': languageCode},
      'audioConfig': {'audioEncoding': 'MP3', 'speakingRate': 1}
    };
    final jsonResponse = await _postJson(uri, json);
    if (jsonResponse == null) return null;
    // The synthesized audio comes back as a base64 string in 'audioContent'.
    final String audioContent = jsonResponse['audioContent'];
    return audioContent;
  } on Exception catch (e) {
    print("$e");
    return null;
  }
}
where (typical values are shown right after this list):
- 'input' is a SynthesisInput object; its "text" field is the raw text to be synthesized;
- 'voice' is a VoiceSelectionParams object, where we set "name" (the type of voice) and "languageCode" (the language);
- 'audioConfig' describes the audio data to be synthesized; see AudioConfig.
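For illustration only, here is a request map with typical values; the voice name below is just one of the standard Google Cloud TTS WaveNet voices, not something the application hard-codes:

// Illustrative values for the text:synthesize request body.
final exampleRequest = {
  'input': {'text': 'Hello, world!'},
  'voice': {'name': 'en-US-Wavenet-D', 'languageCode': 'en-US'},
  'audioConfig': {'audioEncoding': 'MP3', 'speakingRate': 1},
};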
We create the request with the '_postJson' method:
Future<Map<String, dynamic>> _postJson(Uri uri, Map jsonMap) async {
  try {
    final httpRequest = await _httpClient.postUrl(uri);
    final jsonData = utf8.encode(json.encode(jsonMap));
    final jsonResponse =
        await _processRequestIntoJsonResponse(httpRequest, jsonData);
    return jsonResponse;
  } on Exception catch (e) {
    print("$e");
    return null;
  }
}

Future<Map<String, dynamic>> _processRequestIntoJsonResponse(
    HttpClientRequest httpRequest, List<int> data) async {
  try {
    // Authenticate with the API key and send the JSON body.
    httpRequest.headers.add('X-Goog-Api-Key', 'Google API Key');
    httpRequest.headers.add(HttpHeaders.contentTypeHeader, 'application/json');
    if (data != null) {
      httpRequest.add(data);
    }
    final httpResponse = await httpRequest.close();
    if (httpResponse.statusCode != HttpStatus.ok) {
      print("httpResponse.statusCode " + httpResponse.statusCode.toString());
      throw Exception('Bad Response');
    }
    final responseBody = await httpResponse.transform(utf8.decoder).join();
    print("responseBody " + responseBody.toString());
    return json.decode(responseBody);
  } on Exception catch (e) {
    print("$e");
    return null;
  }
}
Create Voice model:
class Voice {
  final String name;
  final String gender;
  final List<String> languageCodes;
  Voice(this.name, this.gender, this.languageCodes);

  static List<Voice> mapJSONStringToList(List<dynamic> jsonList) {
    return jsonList.map((v) {
      return Voice(
          v['name'], v['ssmlGender'], List<String>.from(v['languageCodes']));
    }).toList();
  }
}
Fetch the available voices and map them to the model:
Future<List<Voice>> getVoices() async {
  try {
    final uri = Uri.https('texttospeech.googleapis.com', '/v1beta1/voices');
    final jsonResponse = await _getJson(uri);
    if (jsonResponse == null) {
      return null;
    }
    final List<dynamic> voicesJSON = jsonResponse['voices'].toList();
    if (voicesJSON == null) {
      return null;
    }
    final voices = Voice.mapJSONStringToList(voicesJSON);
    return voices;
  } on Exception catch (e) {
    return null;
  }
}

Future<Map<String, dynamic>> _getJson(Uri uri) async {
  try {
    final httpRequest = await _httpClient.getUrl(uri);
    final jsonResponse =
        await _processRequestIntoJsonResponse(httpRequest, null);
    return jsonResponse;
  } on Exception catch (e) {
    return null;
  }
}
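Having the list of voices, it is convenient to pick one that matches the locale recognized by Cloud Vision. A minimal sketch, with a voiceForLocale helper of our own:

Voice voiceForLocale(List<Voice> voices, String locale) {
  // Return the first voice whose language codes start with the recognized
  // locale (e.g. "en" matches "en-US"); fall back to the first voice otherwise.
  return voices.firstWhere(
      (v) => v.languageCodes.any((code) => code.startsWith(locale)),
      orElse: () => voices.first);
}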
Then we create an audio file in the app's temporary directory, naming it by creation time:
String _getTimestamp() => DateTime.now().millisecondsSinceEpoch.toString();

writeAudioFile(String text) async {
  // getVoices() is asynchronous and returns a list, so take a voice from it
  // (here simply the first one; or pick one matching the recognized locale,
  // see the sketch above).
  final List<Voice> voices = await getVoices();
  final Voice voice = voices.first;
  final String audioContent = await TextToSpeechAPI()
      .synthesizeText(text, voice.name, voice.languageCodes.first);
  // Decode the base64 audio and write it to an .mp3 file named by timestamp.
  final bytes = Base64Decoder().convert(audioContent, 0, audioContent.length);
  final dir = await getTemporaryDirectory();
  final audioFile = File('${dir.path}/${_getTimestamp()}.mp3');
  await audioFile.writeAsBytes(bytes);
  return audioFile.path;
}
And we can play the created file with the Flutter audioplayers plugin:
playAudio(String audioText) async {
  AudioPlayer audioPlugin = AudioPlayer();
  // writeAudioFile is asynchronous and needs the text to synthesize.
  String audioPath = await writeAudioFile(audioText);
  audioPlugin.play(audioPath, isLocal: true);
}
Epilogue.
Thank you for reading this article to the end. We hope you enjoyed it, and now you know kung fu.
Please check out the published application on Google Play and the Apple App Store.